MODELING AND RECOGNITION OF EVENTS FROM TEMPORAL SENSOR DATA FOR ENERGY APPLICATIONS

by

Om Prasad Patri

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

May 2017

Copyright 2017 Om Prasad Patri

Dedication

To my family, for their sacrifices, support and encouragement

Acknowledgments

I could not have completed my PhD journey without constant support from several people, and I would like to express my sincere gratitude to them. I would like to thank my advisor Prof. Viktor K. Prasanna for his guidance and support through my PhD years, and for consistently challenging me to do better research and making sure I stay motivated and focused. I thank Prof. Dennis McLeod and Prof. Iraj Ershaghi for taking their valuable time to serve on my dissertation committee and providing me with feedback. I am grateful to the USC Center for Smart Interactive Oilfield Technologies (CiSoft) for funding a large part of my research.

I sincerely thank my collaborators at USC and the postdoctoral research associates who worked with me at various stages: Dr. Vikram Sorathia, Dr. Anand Panangadan, Dr. Arash Tehrani and Prof. Rajgopal Kannan. I am also very thankful to my mentors from summer internships at NEC Labs and Cylance, including Abhishek Sharma, Michael Wojnowicz and Matt Wolff. A special thanks to Kathy Kassar for managing affairs in Prof. Prasanna's group, and to all my close friends, mentors and colleagues from the group over the years, including Yinuo Zhang, Mohammed Rizwan Saeed, Sanmukh Rao, Ajitesh Srivastava, Chung Ming Cheung, Charalampos Chelmis, Alok Kumbhare, Charith Wickramaarachchi and many others.

I thank my parents Dr. Bhabagrahi Patri and Dr. Sumana Panda, and my younger brother Jyoti Prasad Patri for their constant sacrifices for me in the past. And last but not least, I thank Suparna Dawalkar for being my better half and for her love, encouragement and patience throughout the years of my PhD. Once again, thank you all, and thanks to anyone I missed above because I am forgetful.

Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
1 Introduction
  1.1 Motivation: Energy Applications
    1.1.1 Digital Oil Field Applications
    1.1.2 Nonintrusive Load Monitoring in Smart Grids
  1.2 Event Modeling
  1.3 Event Recognition
  1.4 Contributions
  1.5 Outline
2 Preliminaries and Related Work
  2.1 Complex Event Processing
  2.2 Event Models and Semantics
  2.3 Semantic Rule-based Systems
  2.4 Multivariate Time Series Analysis
  2.5 Time Series Shapelets
3 Event Modeling with the Process-oriented Event Model
  3.1 Motivational Value Propositions
  3.2 Foundational Concepts
    3.2.1 Entity
    3.2.2 Observable Property
    3.2.3 Measurement
    3.2.4 Observation
    3.2.5 Data Stream
    3.2.6 Interpretation
  3.3 Event Concepts
    3.3.1 Event
    3.3.2 Event Profile
    3.3.3 Complex Events and Profiles
  3.4 Processing Concepts
    3.4.1 State Model
    3.4.2 Action
    3.4.3 Role
    3.4.4 Notification
    3.4.5 Filtering
    3.4.6 Event Processing Workflow
  3.5 Event Escalation Scenario
  3.6 Limitations of the Model
4 Event Recognition from Temporal Sensor Data
  4.1 Motivational Scenario in Non-intrusive Load Monitoring
  4.2 Semantic Computing in Event Processing
    4.2.1 Event Detection Semantics
    4.2.2 Event Filtering Semantics
    4.2.3 Context in Event Semantics
    4.2.4 Event Notification Semantics
    4.2.5 Action Semantics
    4.2.6 Prediction Semantics
  4.3 Incorporating Temporal Pattern Mining in the PoEM Model
    4.3.1 Time Series Shapelets for Classification
    4.3.2 Shapelet-based Event Detection
    4.3.3 Extending Event Recognition to Multivariate Temporal Data
  4.4 Algorithms for Multivariate Shape Mining
    4.4.1 Baseline approaches
    4.4.2 Interleaved Shapelets (ILS)
    4.4.3 Shapelet Forests (SF)
    4.4.4 Evaluation
5 Energy Applications
  5.1 Event Modeling Applications
    5.1.1 The PoEM Ontology
    5.1.2 Pump Failure in an Oil Field
    5.1.3 Power Blackout Event in a House
    5.1.4 Maritime Piracy Event
    5.1.5 Collision Avoidance in an Automobile
  5.2 Event Recognition Applications
    5.2.1 Energy Disaggregation in Smart Grids
    5.2.2 Pump Failure Detection in an Oil Field
    5.2.3 Gas Compressor Valve Failure Prediction
6 Future Work and Conclusions
  6.1 Conclusions
  6.2 Future Direction: Semantic Middleware for Event Modeling
    6.2.1 Motivational Scenario in Smart Energy Grids
    6.2.2 Related Work
    6.2.3 Proposed Approach
    6.2.4 Enterprise Integration Patterns Ontology
    6.2.5 Evaluation and Discussion
  6.3 Future Direction: Shapelets for Malware Detection
    6.3.1 Malware Detection
    6.3.2 Related Work
    6.3.3 Proposed Approach
    6.3.4 Evaluation
  6.4 Summary
Reference List

List of Tables

2.1 Existing Semantic Event Models
3.1 Common Event Composition Operators
4.1 Comparison of event-related semantics in some SCEP approaches
4.2 Classification accuracy on the Wafer dataset for Naive Shapelets, Shapelet Forests with majority voting (no feature selection) and Interleaved Shapelets. Four train/test percentage splits of the dataset are evaluated as shown. ILS performs the best overall for this dataset.
4.3 Classification accuracy on the gesture recognition dataset for Naive Shapelets, Shapelet Forests with majority voting (no feature selection) and Interleaved Shapelets. The train/test split shows how many days of data were used for training and how many for testing. The data contains 11 days of measurements in total. We used three splits for evaluation: first 3 days of data for training (next 8 days for testing), first 5 days for training (next 6 for testing), and first 7 days for training (next 4 for testing). SF with majority voting performs the best overall for this dataset.
4.4 Classification accuracy on the Wafer dataset using Shapelet Forests with various feature selection methods on different train/test splits.
4.5 Effect of class imbalance and segment size parameter on performance of our proposed Interleaved Shapelets (ILS) algorithm for different train/test splits of the Wafer dataset. The numbers show classification accuracy on test data (1.00 means 100% classification accuracy).
4.6 Effect of class imbalance and amount of training data
5.1 Putting BLUED dataset appliances into classes
5.2 Classification accuracy of different classifiers for event detection (detecting whether an appliance was switched on/off)
5.3 Classification accuracy of different classifiers for event classification (identifying which appliance(s) was switched on/off), which is the energy disaggregation step
5.4 Accuracy of detecting failures at the pump level
5.5 Accuracy of predicting failures at the pump level
5.6 Comparison of accuracy (%) of shapelet-based segment-level failure detection/prediction with other machine learning techniques. The following classifiers are compared to our proposed Fast Shapelets-based method (FS): Logistic Regression, Multi Layer Perceptron (MLP), Support Vector Machines (SVM) using either a Gaussian/radial basis function (RBF) kernel, a polynomial kernel, or a linear kernel, AdaBoost, Decision Tree (J48), Random Forests (RF) and our baseline classifier (ZeroR)
5.7 Results of detecting failures at the pump level. Rows in bold indicate pumps that were labeled correctly as failures/normal though their segment-level accuracy is less than 100%. Testsegs indicates the number of test segments, failsegs the number of detected failure segments, and threslabel the detected label of the pump after applying a threshold of 40%
5.8 Results of predicting failures at the pump level. Testsegs indicates the number of test segments, failsegs the number of segments predicted to fail, and threslabel the detected label of the pump after applying a threshold of 20%

List of Figures

1.1 A pump failure event in one area has repercussions across the oil field, motivating the need for event-driven information integration
1.2 An offshore oil well with a blowout preventer, malfunctioning of which can cause an explosion
1.3 Illustration of shapelets. These two shapelets (shown in red) were discovered automatically from intake pressure sensor data from a pump in an oil field. These shape patterns provide critical inputs towards predicting pump failures, and were found mathematically without using any domain knowledge about the application.
1.4 A four cylinder gas compressor
1.5 Overview of components of the PoEM event modeling framework
3.1 Event escalation capability
3.2 Predictive event detection with complex event processing
3.3 Reduction in data seeking effort through an event-driven system
3.4 Management by exception
3.5 Simple event identified in an oilfield
3.6 A 'Low Voltage' simple event observed for a refrigerator. Note the 1-1 correlation between entity, observable property, and observation. A simple or atomic event in PoEM is defined as an event which involves exactly one entity, one observable property and one observation.
3.7 In PoEM, moving from simple events to complex events can be done in one of four ways: multiplicity in observations, entities, or observable properties, or a combination of the above. This is in stark contrast to numerous existing works which are unable to define a finite number of ways to move from simple to complex events in a real world system.
3.8 Complex event caused by multiplicity in observations of the same property at different times
3.9 Complex event involving two different observable properties for the same entity
3.10 Complex event involving multiple properties, entities, observations and sub-events
3.11 From raw data values to events: context addition and abstraction in our model
3.12 State model for a pump in an oil field
3.13 PoEM Event Processing Workflow
4.1 An illustration of a shapelet extracted from our evaluation dataset. This shapelet (denoted by S1), extracted automatically from power use data in a household recorded by sensors, is able to semantically capture the event of an appliance switching on without any explicit background knowledge about the appliance or the non-intrusive load monitoring domain.
4.2 Shapelet-based decision tree for classification of a new test instance; the shapelet S1 is the one extracted in the previous figure.
4.3 The structure of an event-condition-action rule
4.4 Shapelets (shown in red) extracted from time series representing a shake gesture (top) and a pick-up gesture (bottom). The Y-axis shows accelerometer sensor values while the X-axis shows time at a 100 Hz frequency. Shapelets are discriminative features (time series subsequences) in data with the predictive power to perform classification on new instances.
4.5 Two multivariate instances, each consisting of two time series (two sensors). The peak in the data is a potential shapelet candidate but it is not discriminative enough to differentiate between the two instances (which belong to opposite classes), because a similar pattern occurs in all four time series.
4.6 Time series formed by concatenating data from all sensors for each instance in the previous example. Now the potential shapelet candidate from the concatenated time series can discriminate between the two classes.
4.7 Interleaving data from multiple sensors
4.8 Interleaved Shapelets approach
4.9 Shapelet Forests Algorithm
5.1 PoEM Ontology Classes and Properties. Classes are shown as rectangles and literals (resulting from datatype properties) are shown in double rectangles.
5.2 Pump failure event in PoEM. og: denotes an oil and gas domain ontology.
5.3 A power blackout event in a household; nilm: denotes a NILM domain ontology.
5.4 Maritime piracy events mapped to PoEM; sem: denotes the simple event model ontology [145] and ex: is a maritime domain ontology.
5.5 Side collision warning event in a car mapped to PoEM
5.6 Examples of intake pressure measurements during normal operation (blue) and failed operation (red)
5.7 Examples of current measurements during normal operation (blue) and failed operation (red)
5.8 Examples of voltage measurements during normal operation (blue) and failed operation (red)
5.9 Irregularity in sampling of oilfield sensor data. The period between consecutive data points varies between 1 hour and 1 day. The X-axis shows time (in days) and the Y-axis represents the insertion date of the data point (left) and the last good scan date (right).
5.10 Proposed pre-processing workflow
5.11 Failure Event Detection and Failure Event Prediction Scenarios. The red shaded portion denotes a failure operation period and the green portion denotes a normal operation period. All segments extracted are of the specified segment length. In the detection scenario, the segments are extracted from the actual failure or normal operation duration, while, for prediction, segments are extracted only from the lookback period without utilizing any data in the actual labeled failure or normal periods.
5.12 Two shapelets (shown in red) extracted for failure detection based on intake pressure sensor data (normalized). The segment length was 1 week. The first shapelet (left) is of 35 hours duration and the second one (right) is of 5 hours duration. The data was split into a 50%-50% train-test split (half of the data was used for training).
5.13 Three shapelets (shown in red) extracted for failure prediction based on intake pressure sensor data (normalized). The segment length was 1 week and the lookback period was 4 weeks. The shapelet durations are 50 hours, 36 hours and 5 hours. The data was divided into a 50-50 train-test split (half of the data was used for training).
5.14 Data from some of the sensors in the gas compressor: discharge temperature of a cylinder (top left), motor winding temperature (top right), motor vibration (bottom left) and cooler vibration (bottom right)
5.15 Pre-processing sensor data: a failure window is set just prior to the occurrence of each failure and we use these blocks of data to extract signals indicative of failure
5.16 Displaying the complete training data for one sensor
5.17 Classification Accuracy on Test Data vs. Segment Size Parameter in ILS (k). The highest accuracy is found for k = 20, 25 or 35.
6.1 Demand-response scenario in Smart Grid
6.2 Overview of the proposed approach
6.3 Aggregator
6.4 Dead Letter Channel
6.5 Dynamic Router
6.6 Smart Grid demand-response scenario after optimizations
6.7 Energy consumption plots for three buildings in the campus
6.8 Number of subscribers in each building
6.9 Savings in the number of alert messages sent in 2009
6.10 Shapelet found from file entropy data
6.11 Overview of our malware classification approach
6.12 Classification accuracy of our approach
6.13 Classification accuracy of our approach on balanced data
6.14 Illustrating the efficacy of the Distance from Shapelet (DFS) feature with Random Forests classifier
6.15 Illustrating the efficacy of the Distance from Shapelet (DFS) feature with Logistic Regression classifier
6.16 Illustrating the efficacy of the Distance from Shapelet (DFS) feature with Support Vector Machine classifier

Abstract

The ubiquitous nature of sensors and smart devices collecting more and more data from industrial and engineering equipment (such as pumps and compressors in oilfields or smart meters in energy grids) has led to new challenges in faster processing of temporal data to identify critical happenings (events) and respond to them.
We deal with two primary challenges in processing events from temporal sensor data: (i) how to comprehensively model events and related happenings (event modeling), and (ii) how to automatically recognize event patterns from raw multi-sensor data (event recognition).

The event modeling problem is to build a comprehensive event model enabling complex event analysis across diverse underlying systems, people, entities, actions and happenings. We propose the Process-oriented Event Model (PoEM) for event processing that attempts a comprehensive representation of these processes, particularly those seen in modern energy industries and sensor data processing applications. This model brings together, in a unified framework, the different types of entities that are expected to be present at different stages of an event processing workflow and a formal specification of relationships between them.

Using event models in practice requires detailed domain knowledge about a variety of events based on raw data. We propose to learn this domain knowledge automatically by using recent advances in time series classification and shape mining, which provide methods of identifying discriminative patterns or subsequences (called shapelets). These methods show great potential for real sensor data as they do not make assumptions about the nature, source, structure, distribution, or stationarity of input time series, provide visual intuition, and perform fast event classification. By combining shape extraction and feature selection, we extend this temporal shape mining paradigm for processing data from multiple sensors. We present evaluation results to illustrate the performance of our approaches on real-world sensor data.

Chapter 1

Introduction

The ubiquitous nature of sensors and devices continuously collecting more and more data has led to new challenges in analyzing growing streams of data in near real-time.
A decisive factor for an organization's success is the ability to react quickly to changing trends and make decisions based on things that happen (events). Event processing refers to a method for tracking and analyzing (processing) streams of information about things that happen (events), and deriving conclusions from them. The detection of event patterns in data streams is a key problem in real-time information processing. Event processing addresses this problem by matching continuously incoming events against a pattern. The result of a match is usually a composite (or complex) event derived from the input events. In contrast to traditional database management systems, where a query is executed on stored data, event processing executes data on a stored query. Thus, we obtain a shift in paradigm from processing 'data-at-rest' to processing 'data-in-motion'. The advantage of this paradigm shift is that it enables us to apply queries to a potentially infinite stream of data. Inputs can be processed immediately, and once the system has seen all events for a matching sequence, results are emitted, thus enabling real-time analytics. This event processing approach, which combines data from multiple sources, is called complex event processing. The goal of complex event processing is to identify meaningful events, such as opportunities or threats, and respond to them quickly.

This approach to processing events is applicable in diverse real-world scenarios. For instance, it can be used to monitor stock market trends or detect credit card fraud. It can be used to detect network intrusion from patterns of suspicious user behavior. It can be used for theft detection in a warehouse where items are not properly checked out or in their correct locations, based on sensor data associated with the items. It can also be used for equipment monitoring and information integration for enhancing business objectives and safety in industries such as modern oil fields.
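The pattern-matching idea described above can be sketched in a few lines. The following is a minimal, hypothetical illustration (not from the thesis): incoming timestamped events are matched in order against a pattern of event types within a sliding time window, and a complex event is emitted when the whole sequence is seen. Event names and the window size are made up for illustration.

```python
from collections import deque
from datetime import datetime, timedelta

def match_sequence(events, pattern, window):
    """Emit a complex event whenever `pattern` (an ordered list of event
    types) occurs within a sliding time `window` over the event stream."""
    recent = deque()
    for ts, etype in events:
        recent.append((ts, etype))
        # drop events that have fallen out of the time window
        while recent and ts - recent[0][0] > window:
            recent.popleft()
        # greedy in-order match of the pattern against the window contents
        idx = 0
        for t, e in recent:
            if e == pattern[idx]:
                idx += 1
                if idx == len(pattern):
                    yield ("COMPLEX", t)  # derived composite event
                    break

# Toy stream: three simple events arriving within 30 minutes.
stream = [
    (datetime(2017, 5, 1, 0, 0), "low_pressure"),
    (datetime(2017, 5, 1, 0, 5), "high_current"),
    (datetime(2017, 5, 1, 0, 7), "pump_stop"),
]
alerts = list(match_sequence(stream,
                             ["low_pressure", "high_current", "pump_stop"],
                             timedelta(minutes=30)))
print(len(alerts))  # 1
```

Note that the query (the pattern) is fixed while the data flows past it, which is exactly the 'data-in-motion' inversion of the database model described above.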
Due to the high frequency of event-related information in the oil and gas industry and the criticality of quick responses for business as well as health, safety and environmental aspects, it is pertinent to use event processing approaches that can handle structured, semi-structured and unstructured information extracted from system silos. Event processing approaches can offer faster response times, reduced data seeking efforts, efficient interaction patterns, and management by exception for the digital oilfield.

Petroleum production involves complex data-intensive processes that employ specialized tools and vendor products, resulting in system silos, particularly across maintenance and production databases. For instance, a pump's working condition is monitored in operations systems. However, in the case of its failure, the responsibility of repairs is handed over to the maintenance team, who use separate tools for computerized maintenance management systems. The pump failure event affects different teams such as operations, maintenance and production optimization, and these teams are required to collaborate across system boundaries to bring the pump back to functioning status. An event-based system can be used to specify the life-cycle of such events (starting from the sensor level), automate the generation of appropriate responses (such as initiating a repair ticket in the repair system), and escalate the event priority if the expected goal is not reached. An illustration of a pump failure is shown in Figure 1.1. An event-based system facilitating information integration would be able to correlate these happenings and process them to derive conclusions for a faster response.

Figure 1.1: A pump failure event in one area has repercussions across the oil field, motivating the need for event-driven information integration

More important than maintenance and production issues are health, safety and environmental hazards.
The concept of management by exception, which refers to dealing with emergency events immediately, is pertinent here. Figure 1.2 shows an offshore oil well with a blowout preventer component, similar to that on the Deepwater Horizon oil drilling rig, which met with tragedy in April 2010 (the BP Macondo well blowout). It was later discovered that the blowout preventer used on the BP Macondo well had a dead battery and a miswired solenoid, and so did not function properly (see http://www.nola.com/news/gulf-oil-spill/index.ssf/2013/04/electrical_engineer_says_dead.html). It was further discovered that the battery failure could have been avoided by having a system in place that monitored its power, preventing this critical tragedy 'event' from happening.

Figure 1.2: An offshore oil well with a blowout preventer, malfunctioning of which can cause an explosion

To manage events across an organization, we need a data model for events, which represents events and their relationships to other concepts in a unified complex event processing (CEP) system. These data models are called event models. Existing event models are typically based on a broad definition for events, classifying anything which happens as an "event" [20]. This broad scope makes it difficult to apply existing event models to a real-world application, since only a relatively small number of the possible events are important and need to be processed further. In this thesis, we provide a precise mathematical definition for an "event" which connects events to real-world entities, their properties and timestamps, enabling us to retain only relevant observations. State-of-the-art event models provide a framework to represent and reason about events, but the fundamental problem of transforming large-scale sensor measurements into a sequence of only relevant events has not been addressed in existing work.
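As a toy rendering of the idea just stated (the full formal definition appears in Chapter 3), an event can be represented as an entity, one of its observable properties, and a timestamped observation, with a relevance filter deciding which observations become events at all. All names and the threshold below are illustrative, not taken from the thesis.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Event:
    """A minimal event record: entity + observable property + timestamped
    observation. Hypothetical simplification of the thesis's definition."""
    entity: str          # e.g. a specific pump
    prop: str            # observable property, e.g. intake pressure
    value: float         # the observed measurement
    timestamp: datetime

def relevant(ev: Event, threshold: float) -> bool:
    """Retain only observations that cross a domain threshold; everything
    else is discarded rather than being treated as an event."""
    return ev.value < threshold

e = Event("pump-17", "intake_pressure", 42.0, datetime(2017, 5, 1))
print(relevant(e, threshold=50.0))  # True
```

The point of the filter is the one made above: of everything that "happens", only a small fraction is worth modeling and processing further.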
Therefore, in practice, using these event processing models requires the detailed definition of each variety of events based on raw data. However, raw sensor data is typically time series data, a time series being a sequence of data items, each having a timestamp, with the sequence of timestamps being non-decreasing. Existing event models do not work on the raw time series sensor data by design.

Time series analysis methods have traditionally been used to process data streams for classification and identifying anomalies. The areas of event processing and time series mining are intuitively connected, since all forms of event processing use temporal sequences of data elements. However, event models do not work on the raw data directly; they work on events, and they need event definitions to be specified (likely by domain experts). Time series approaches, which work directly on the sensor data, have not been incorporated into event models. Incorporating time series analysis for event detection in a comprehensive event processing framework is challenging, since a given semantic event model has specific rules for defining and composing events which do not match the internal representation of a time series classification algorithm. However, recent advances in shape mining approaches to time series analysis [164, 110, 95] provide methods for identifying discriminative subsequences (shapelets) from data which can be composed in a manner similar to how simple events are combined to define complex events.

A visual example of two shapelets can be seen in Figure 1.3. The time series data on the Y-axis represents (normalized) intake pressure data from a pump in an oil field, and the X-axis represents time. The shapelet subsequences within the time series are shown in red in the figure, and were discovered automatically by a shapelet extraction algorithm without any domain knowledge.
The discovered shape patterns (of changes in intake pressure) provide critical pointers towards possible failure of the pump in the future. Once these discriminative (and powerful) shapelet segments are found, the rest of the input data is discarded. More details about shapelet-based approaches are discussed in Section 2.5.

Figure 1.3: Illustration of shapelets. These two shapelets (shown in red) were discovered automatically from intake pressure sensor data from a pump in an oil field. These shape patterns provide critical inputs towards predicting pump failures, and were found mathematically without using any domain knowledge about the application.

An opportunity exists for using these shapelet-based methods as the basis for identifying relevant events as part of a comprehensive event representation and detection model. This thesis describes our approach to achieving this combination: developing an event model that is capable of directly processing sensor data from multiple sources using shapelet mining.

We use shapelet mining as a method to learn domain knowledge about events automatically by directly processing the sensor data. This is enabled by designing our event model's event detection system with a structure that can adapt to resemble the decision rules a shapelet-based classifier uses to classify time series instances. Both of these systems are similar in structure to event-condition-action rules [108] in event processing. These rules simply check for a specified condition upon the happening of an event and then trigger a specified action. For shapelet-based classifiers, the event is the discovery of a shapelet, the condition is related to mathematical distance comparisons between the shapelet and the new test time series, and the action is to make a prediction of a certain target class related to the domain. Thus, we can learn information about the domain directly from the information automatically learned during shapelet discovery.
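This mapping can be made concrete with a small sketch: a learned shapelet plus a distance threshold behaves exactly like an event-condition-action rule. The shapelet values and threshold below are made up for illustration; in practice both come out of the shapelet discovery step on training data.

```python
def min_dist(series, shapelet):
    """Euclidean distance from `shapelet` to its best-matching window in
    `series` -- the standard subsequence-distance primitive of shapelet
    methods."""
    m = len(shapelet)
    return min(
        sum((series[i + j] - shapelet[j]) ** 2 for j in range(m)) ** 0.5
        for i in range(len(series) - m + 1)
    )

def eca_rule(series, shapelet, threshold):
    """Event: a new time series arrives. Condition: its distance to the
    learned shapelet is within the split threshold. Action: emit the
    predicted label."""
    if min_dist(series, shapelet) <= threshold:   # condition
        return "failure_predicted"                # action
    return "normal"

# Illustrative numbers only (real shapelets/thresholds come from training).
shapelet = [1.0, 3.0, 1.0]
print(eca_rule([0, 1, 3, 1, 0], shapelet, threshold=0.5))  # failure_predicted
print(eca_rule([0, 0, 0, 0, 0], shapelet, threshold=0.5))  # normal
```

A shapelet decision tree (as in Figure 4.2) simply chains such rules, one per internal node.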
Further details of our approach are provided in Section 4.3.

We now motivate our proposed event modeling and event recognition approaches by illustrating the need for event-based data processing in multiple real-world energy systems and industries.

1.1 Motivation: Energy Applications

Our work is driven by real-world data from diverse applications. For event modeling, we focus on industrial events and processes. For event recognition, any source of temporal sensor data can be used for evaluation of our developed methods. Here, we describe use cases that we have already explored as part of our work.

1.1.1 Digital Oil Field Applications

The evolving nature of the modern digital oilfield requires large-scale instrumentation and monitoring. Data mining and machine learning approaches have become vital to digital oilfield operations as they move into the age of Big Data. A massive portion of oilfield data today is in the form of sensor streams, which necessitates rapid real-time data analysis techniques. Due to the high frequency of sensor data in the oil and gas industry, it is inevitable that oil and gas enterprises leverage efficient event mining techniques, especially those dealing with time series data. We delineate three specific use cases here which can benefit from our developed approaches.

Event-driven Information Integration

Modern oil fields are highly instrumented with sensors. Sensor data from various systems within an oil field, possibly across application silos, need to be integrated to make quick decisions about detecting and responding to events. A typical oil field scenario related to a pump failure is shown in Figure 1.1, which shows the cascading effect of events in a situation. From the figure, we can observe that a pump failure can seriously affect maintenance and production schedules, and in some cases, be a safety or environmental issue.
Semantic computing approaches are known to facilitate interoperability and information integration across heterogeneous data types. In conjunction with event processing approaches, a unified semantic event model is crucial to model and manage events within the oil field. There are several value propositions to having such a model, as delineated below.

• Faster response times
• Reduced data seeking efforts
• Consistent best practices maintained
• Management by exception

Electrical Submersible Pump Failures

Electric submersible pumps (ESPs) are one of the main artificial lift methods for extraction of fluid (oil). The failure of an ESP increases operating costs and affects production. There are several factors affecting ESP failures, including reservoir type, sand control, and whether the ESP is a replacement. In an effort to understand the causes of failures, the oil and gas industry has attempted to collect data on all potential factors and exchange this information for better data analysis. However, data aggregation is a difficult task considering the myriad ways of collecting and recording the diverse types of data.

Instead of attempting to build a complete model of ESP failures from these disparate data sets, we apply machine learning and time-series data processing techniques to automatically learn failure models from only the sensor measurements (with relatively high temporal frequency) collected directly from the ESP. The goal of our research is therefore to understand the limits of failure detection and prediction using only the most accessible sensor measurements. It has been estimated that just a 1% improvement in ESP performance world-wide would provide over a half-million additional barrels of oil per day.
Considering the high cost incurred by an ESP failure, early detection and prediction of pump failures for even a subset of all operational assets from readily available data can reduce OPEX (operational expenditure) and average maintenance cost.

Valve Failures in Gas Compressors

As the number of sensors deployed in the oilfield increases, there is a corresponding need to develop methods for fast, automated processing of large-scale sensor data streams. This problem is aggravated when the streams of sensor data are from multiple sensors with different characteristics and applicability to specific tasks of interest to the oilfield operator. Predictive analytics and data mining approaches have been proven to improve production efficiency, reduce downtime and identify safety hazards in real-world scenarios. In this use case, we focus on the task of predicting burnt valve failures in gas compressors using heterogeneous sensor measurements collected at a high temporal resolution. A typical gas compressor with multiple cylinders is shown in Figure 1.4.

Figure 1.4: A four-cylinder gas compressor

One of the most common failures in rotating equipment such as compressors is the breakdown of valves. This issue is of great value because a large proportion of production is dependent on rotating equipment. The aim of our undertaking is to find signature(s) in sensor data collected from compressors which are predictive of valve failures. The resulting information can be used to prioritize and monitor maintenance schedules for compressors, which are often on remote platforms. The data used in our evaluation is from a large number of sensors which measure various physical properties of compressors, ranging from compressor vibrations and motor winding temperatures to pressure and temperature for both suction and discharge at the various compression stages.
1.1.2 Nonintrusive Load Monitoring in Smart Grids

Smart grids [124] have enabled household electricity consumption monitoring at very fine time granularity. However, electricity consumption is typically recorded in the form of aggregate numbers. For encouraging energy conservation, it is more valuable to report energy usage to consumers at the level of individual appliances. Energy disaggregation refers to the technology that breaks down aggregate energy consumption into appliance-level itemized measurements without any plug-level sensors. Non-intrusive load monitoring (NILM) approaches attempt to perform such energy disaggregation, i.e., estimate the electricity consumption of individual appliances from aggregate power and/or voltage measurements [168].

United States utilities have recently been deploying millions of smart meters that collect energy consumption data from residential and commercial customers [123]. Despite the investment in infrastructure, energy savings and financial benefits have yet to reach their full potential. Access to appliance-level data through disaggregation of smart meters' measurements has numerous benefits from both the consumer's and the utility's perspective. From the user's perspective, several studies suggest that great energy reductions can be achieved as a result of appliance-specific electricity consumption feedback (e.g., which specific appliance could most effectively reduce energy use for a given household) and automated, personalized and cost-effective energy saving recommendations (e.g., what type of new appliance to purchase based on current use) [7]. From the utility's perspective, electricity disaggregation can lead to improved program design and diversification, as well as incentive recommendations. Perhaps the most important benefit for utilities is the development of enhanced prediction models for electrical energy consumption.
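A simple event-based flavor of the disaggregation idea can be sketched as follows: detect step changes ("on"/"off" events) in the aggregate signal and attribute each step to the appliance with the closest rated power. This is only an illustration of the problem setting, not a NILM algorithm from the literature; the appliance names, wattages, and tolerances are invented.

```python
def detect_events(aggregate, min_step=50.0):
    """Detect step changes (on/off 'events') in an aggregate power signal."""
    return [(i, aggregate[i] - aggregate[i - 1])
            for i in range(1, len(aggregate))
            if abs(aggregate[i] - aggregate[i - 1]) >= min_step]

def match_appliance(step, signatures, tolerance=20.0):
    """Attribute a step change to the appliance with the closest rated power."""
    name, rated = min(signatures.items(), key=lambda kv: abs(abs(step) - kv[1]))
    if abs(abs(step) - rated) <= tolerance:
        return name, ("on" if step > 0 else "off")
    return None

# Hypothetical appliance signatures in watts.
signatures = {"fridge": 120.0, "kettle": 1800.0, "lamp": 60.0}
aggregate = [200, 200, 320, 320, 2120, 2120, 320, 320]  # fridge on, kettle on/off
events = [match_appliance(s, signatures) for _, s in detect_events(aggregate)]
print(events)  # [('fridge', 'on'), ('kettle', 'on'), ('kettle', 'off')]
```

Real household loads overlap, drift, and share power levels, which is why practical NILM requires the richer temporal pattern mining developed later in this thesis rather than simple nearest-wattage matching.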
Recent years have seen an increasing interest in developing load forecasting models; however, to date, there has been little study of the differences in consumption characteristics of individual customers, and their impact on the accuracy of prediction models.

We now briefly describe the event modeling and event recognition problems related to these applications in the context of this thesis.

1.2 Event Modeling

An event model refers to a data model for representing events and their relationships to other concepts. A comprehensive event model should include event detection, filtering, notification, action determination, context awareness, and escalation mechanisms [20].

There are several value propositions for using a unified event model to represent happenings across an organization, such as events in an oil field. These value propositions are delineated in Section 3.1. One of them is event escalation. Consider the event of a pump failure. Suppose the operator was alerted about the failure promptly but, due to certain circumstances, saw the message quite late. Then, she tried to take the suggested corrective action but failed and reported it to her supervisor. She would have to wait for her supervisor's reply, and the supervisor, again, might not respond quickly. In another scenario, perhaps the concerned engineer repaired the pump successfully but missed reporting it. Such delays can accumulate at various stages, increasing the response time to the pump failure. An event-based system can use 'event escalation' in such situations to avoid bottlenecks in the process and greatly reduce the delays. Event escalation refers to the concept of waiting for an action response for a certain time and, upon no response, automatically looking up the recommended best practice for the event, implementing it, and alerting appropriate personnel.
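The escalation logic just described - wait for an acknowledgement, and on timeout fall back to a recommended best practice and alert a supervisor - can be sketched minimally as below. The best-practice table, timeout, and return strings are all hypothetical placeholders, not part of the PoEM specification.

```python
import time

# Hypothetical best-practice lookup table for known event types.
BEST_PRACTICES = {"pump_failure": "shut in well; dispatch field crew"}

def handle_event(event_type, acknowledged, wait_seconds=0.1):
    """Wait for an operator acknowledgement; on timeout, escalate by
    looking up the recommended best practice and alerting a supervisor."""
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if acknowledged():
            return "handled by operator"
        time.sleep(0.01)
    action = BEST_PRACTICES.get(event_type, "notify supervisor")
    return f"escalated: {action}"

# No acknowledgement ever arrives, so the event escalates.
print(handle_event("pump_failure", acknowledged=lambda: False))
# escalated: shut in well; dispatch field crew
```

In a deployed system the acknowledgement check would poll a message queue or case-management system rather than a callback, but the control flow is the same.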
For information integration, reuse and interoperability, semantic web and semantic computing approaches have heavily influenced event modeling. Semantic computing approaches represent data in a machine-understandable format, such as RDF triples (https://www.w3.org/RDF/). Any form of human-understandable data can be converted to such a machine-understandable format, thus providing a unified approach to facilitate data integration. A semantic data model can be represented in the form of one or more 'ontologies' (semantic data models) for reuse and linking to other models. A unified semantic event model facilitates seamless integration of event data from diverse sources.

There are various value propositions for using semantic computing to enhance event processing, particularly for industrial and enterprise applications. These are delineated below:

• Integration. Data from multiple diverse sources can be viewed in a unified manner at the required granularity by using a semantic representation.

• Interoperability. Syntactic and semantic heterogeneity between sources can be resolved by using ontological representations of events and related concepts. This enables interoperability while providing reasoning and inference capabilities over the data.

• Dynamism. Semantics can help bring dynamism to event-based systems. Specific events will trigger specified actions from relevant actors, and semantic approaches can help define correlations between background knowledge, observations and corrective actions.

• Management by Exception. Semantic approaches can help us determine the appropriate priority for a new event, and ensure that critical events are dealt with urgently. This can include dynamic selection of people to be notified of an event, based on its context, and automatic escalation of events in case of non-response.
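The triple-based representation underlying these value propositions can be illustrated with a toy store of (subject, predicate, object) facts and a wildcard query, in the spirit of RDF. The URIs, prefixes, and predicate names here are invented for illustration; a real system would use an RDF library and proper namespaces.

```python
# A toy triple store: each fact is a (subject, predicate, object) triple,
# mirroring the machine-understandable RDF representation used for
# semantic integration. All terms below are hypothetical examples.
triples = {
    ("ex:Pump42", "rdf:type", "ex:ElectricSubmersiblePump"),
    ("ex:Event7", "rdf:type", "ex:PumpFailureEvent"),
    ("ex:Event7", "ex:occurredAt", "ex:Pump42"),
    ("ex:Event7", "ex:detectedFrom", "ex:IntakePressureStream"),
}

def query(s=None, p=None, o=None):
    """Match triples against a pattern; None acts as a wildcard."""
    return [(ts, tp, to) for ts, tp, to in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

# Which events occurred at Pump42?
print(query(p="ex:occurredAt", o="ex:Pump42"))
# [('ex:Event7', 'ex:occurredAt', 'ex:Pump42')]
```

Because every source - sensor streams, maintenance logs, personnel rosters - can be flattened into the same triple shape, a single query mechanism spans all of them, which is the integration benefit listed above.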
The problem of event modeling is to build a comprehensive end-to-end semantic event model to enable complex event analysis across diverse underlying systems, people, entities, actions and happenings. We address this problem in Chapters 3 and 4. We propose a novel conceptual model called PoEM (Process-oriented Event Model) for complex event processing that attempts a comprehensive representation of processes, such as those seen in modern industries and sensor data processing applications. The PoEM model brings together, in a unified framework, the different types of entities that are expected to be present at different stages of an event-processing workflow, and a formal specification of the relationships between these entities. Figure 1.5 shows the major components of PoEM (these components are described in detail in Chapter 3). PoEM has a detailed representation of concepts related to converting data streams to events (measurements, observations, interpretations), since most industrial events are detected from sensor-based data. Events are detected using pre-defined event profiles, and this is used to drive goal planning. In PoEM, planning is performed using an explicit state model. The output of the planner defines the roles and actions that need to be taken to execute the plan. Chapter 3 describes the PoEM model formally.

Figure 1.5: Overview of components of the PoEM event modeling framework

In this thesis, we explore and establish a link between event processing and time series analysis. Integrating an automatic time series classification algorithm for event detection into PoEM results in this additional value proposition:

• Predictive Analytics. Event-driven systems require queries on data in motion, and semantic computing, through a richer representation model for events, can connect event concepts to automatically discovered patterns from sensor data.
These patterns can be discovered from the data in a localized manner, and exhibit predictive power for rapid classification of unseen data in the future.

1.3 Event Recognition

The event recognition problem is to automatically identify relevant events from sensor data, typically in the form of time series measurements. We view the event detection or event recognition problem as a temporal pattern mining problem, as events are essentially observed patterns over time. Time series data is inherently similar to that from event data streams; e.g., motion sensors on a user's arm can tell us when a certain gesture (an 'event') is performed by the user. Advances in machine learning, data science and data mining have opened up new and unconventional approaches for time series mining, and we build upon them to extend our PoEM model such that a machine learning based time series classification approach can be used as the basis for automatic semantic event detection from data streams.

The nature of real-world sensor data brings with it several challenges due to the volume, velocity and variety of data. Industrial and engineering systems, such as oil fields or manufacturing plants, are heavily instrumented, and are continuously monitored by a large number of sensors that can collect data at a high temporal rate (one sample every tens of milliseconds to seconds [32]). There are several practical challenges when dealing with such data for temporal pattern mining. Conventional time series approaches typically make several assumptions about the nature, source and structure of input data, which rarely hold for practical industrial sensor datasets. These assumptions include restrictions on the nature, structure, source, length and distribution of the data.
Typically, real sensor data is noisy; is not independent and identically distributed (i.i.d.); does not have a linear structure; does not have equal lengths across data instances; and does not follow a specific probability distribution (e.g., Gaussian) or a mixture of distributions from which it is drawn. The challenge is made even more complicated when the sensor data is from multiple sensors that may or may not be of the same type, and some sensors may not contribute positively towards the predictive power of the classifier. The overarching task of multivariate time series classification (MTSC) - multivariate referring to the multiple sensors - requires integrating information from all sensors. For instance, temperature, pressure and vibration sensors may be used together to determine the operating state of a machine. Even when the type of sensor is the same, sensors typically record orthogonal measurements which need to be combined later for prediction and analysis. For instance, accelerometer sensors typically used in human activity experiments record (synchronized) acceleration data in the x, y and z directions.

In this thesis, we propose a new approach to tackle this multivariate time series classification problem while considering the described challenges encountered when dealing with real sensor data. Real-world sensor data typically does not follow one or more assumptions made by traditional sensor data mining approaches. These assumptions include restrictions on the nature, characteristics or source of raw input sensor data, such as:

• stationarity (mean and variance do not change over time)

• smoothness of data, e.g., the data is assumed to be differentiable

• distribution of data, e.g.,
data is assumed to be generated from a Gaussian distribution or a Gaussian mixture model

• conditional independence of variables within the data

• dependence of the current data value on the previous k temporal values, as in commonly used autoregressive or moving average models for time series analysis, and also in hidden Markov models (where the current value depends only on the previous value)

• limited interpretability and expressivity of results, e.g., no visual intuition behind discovered causes of an anomaly

We focus on developing algorithms which can extract useful patterns from raw input data while mitigating these restrictions.

Our proposed MTSC approach is based on identifying discriminative patterns in time series data, known as shapelets. A shapelet is a subsequence or local temporal pattern in a time series that is a representative feature of the class to which the time series belongs [165]. Ye and Keogh [164, 165] were the first to define time series shapelets and propose an algorithm for shapelet mining targeted at univariate time series classification. Shapelets have been shown to be effective for diverse time series data mining tasks including classification, clustering, summarization, and visualization [165, 82, 110, 43, 44, 98]. We propose new algorithms for shape mining to perform event recognition from multisensor data so that they can extend the capabilities of our PoEM event model.

Though several other approaches exist for time series classification, the popularity of shapelets is due to four factors [165, 110, 43, 44]: (1) shapelet methods impose no assumptions or restrictions on the nature of the data, unlike autoregressive or ARIMA time series models [40]; (2) they are easy to interpret for domain experts [43, 44, 98, 91]; (3) they have been shown to be more accurate than other methods for some problems [43, 110]; and (4) once shapelets have been extracted from training data, a new time series can be classified quickly.
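Looking ahead to the multisensor setting, one simple way to bring synchronized multivariate data (such as the accelerometer x, y, z streams mentioned earlier) within reach of univariate shapelet methods is to interleave the streams sample by sample. This is a minimal sketch of only the interleaving step; the function name and the round-robin ordering are illustrative choices, and sensor ranking is omitted.

```python
def interleave(sensors):
    """Interleave synchronized sensor streams sample-by-sample into one
    univariate sequence: s1[0], s2[0], ..., s1[1], s2[1], ...
    Samples that are close in time across sensors stay adjacent, so local
    cross-sensor dependencies are preserved in the flattened sequence."""
    n = min(len(s) for s in sensors)  # truncate to the shortest stream
    return [s[t] for t in range(n) for s in sensors]

# Toy synchronized streams (values are illustrative, not real sensor data).
pressure = [10, 11, 12]
current  = [5, 6, 7]
voltage  = [220, 221, 219]
print(interleave([pressure, current, voltage]))
# [10, 5, 220, 11, 6, 221, 12, 7, 219]
```

The resulting single sequence can then be fed to an unmodified univariate shapelet extractor; in practice the streams would first be normalized so that no sensor's scale dominates the distance computations.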
These four factors influenced our choice of shapelets for solving the MTSC problem.

Most of the work on shapelet mining is aimed at univariate time series data. In this work, we propose new approaches for generalizing shapelet mining to multivariate or multisensor time series datasets. We propose an algorithm (called Shapelet Forests) which combines shapelet extraction with feature selection to solve this problem. Shapelet Forests performs univariate shapelet extraction first, and then uses feature ranking to create a multivariate classifier from multiple individual univariate classifiers. We also propose another algorithm (called Interleaved Shapelets) which works the opposite way - performing feature ranking first to generate a univariate representation of the original multivariate data, and then performing shapelet extraction. Our shapelet-based approaches automatically identify critical segments from time series which are discriminative and representative. We interpret each of these critical segments as simple events for the PoEM model. Thus, this temporal pattern mining approach is a means of learning event definition and detection parameters for the PoEM model from sensor data.

1.4 Contributions

This dissertation makes the following contributions in the area of modeling and recognition of events from time series sensor data.

1. We propose a comprehensive theoretical model for modeling events and the semantics of event processing at a level of abstraction that captures the different processes in industrial applications but is not limited to a specific application domain. This model, called the Process-oriented Event Model (PoEM), provides a formal approach to model real-world entities and their interrelationships, and specifies the process of moving from data streams to event detection to event-based goal planning. The model links event detection to states, actions, and roles, enabling event notification, filtering, context awareness and escalation.
PoEM defines event and non-event concepts and combines information from them to build an event processing workflow. PoEM also enables event escalation. A PoEM ontology is also presented, and usage of the PoEM model is illustrated in case studies.

2. Semantic computing has been largely underutilized for event processing applications, and a primary reason for this gap is the difference in the level of abstraction between the high-level semantic models for events and the low-level raw data values received from sensor data streams. We thoroughly investigate the need for semantic computing in various aspects of event processing, and bridge this gap by utilizing recent advances in time series analytics and machine learning. We build upon our PoEM model and extend it to facilitate semantic time series data mining directly over sensor data, which provides the advantage of automatically learning the required background knowledge without domain expertise. We evaluate our approach on time series data from a real-life NILM scenario.

3. We propose two new algorithms for multisensor event recognition from time series data, which we frame in the context of shapelet-based multivariate time series classification. Shapelet-based approaches have become popular in recent times for various data mining tasks, but most of the existing shapelet-based classification approaches are limited to univariate time series. We propose an algorithm called Shapelet Forests, which combines shapelet extraction with feature selection to make shapelet identification possible from multisensor datasets as well. This algorithm can choose from a suite of feature selection methods, which ensure that relevant and useful sensors are ranked higher in the final classifier predictions.
We also propose a complementary algorithm called Interleaved Shapelets, which takes into account the local temporal dependencies across various sensors, and interleaves data from all sensors to create a univariate time series representation of the original multivariate time series dataset, so that existing shapelet extraction methods can be directly applied. We show that both of our proposed algorithms are highly accurate and robust to several practical constraints encountered when dealing with real-world sensor data, such as data from industrial equipment. These constraints include class-imbalanced data, noisy or redundant sensors, smaller quantities of training data, and the choice of algorithmic parameters.

4. We motivate the problems of event modeling and event recognition from sensor data with diverse real-world use cases related to energy applications which drive our research, and in this dissertation, we show how these two problems are closely linked. To illustrate our event modeling approach, we model the following scenarios in our proposed model: (i) a pump failure event in an oil field, (ii) a piracy event when a ship is hijacked at sea by pirates, (iii) a power blackout event in a household, and (iv) an automobile collision avoidance event when driving. To illustrate our event recognition approach, we consider the following use cases: (i) detecting and predicting failures in electrical submersible pumps in an oil field from individual (univariate) analysis of intake pressure, current and voltage sensor data, (ii) predicting burnt valve failures in gas compressors from multisensor (multivariate) measurements, (iii) performing energy disaggregation for nonintrusive load monitoring (NILM) in smart grids to obtain appliance-level information from aggregate signals, (iv) mining sensor data from a semiconductor manufacturing plant to predict anomalies, and (v) recognizing hand gestures made by a user using accelerometer and gyroscope sensor data.

5.
In the course of developing our event modeling and event recognition approaches, we also lay down foundations for several interesting extensions in future work. Since our approach works on raw time series sensor data without making assumptions about the structure, distribution, or nature of the input data or the stationarity of the time series, it can be applied to a wide range of input data. A possible extension of the event recognition approach developed in this dissertation is to apply it to a cybersecurity application in malware detection from executable files, as shown in [101]. Each executable file is converted to an entropy time series representation, and then shapelet extraction is used to find patterns of changes in the entropy of the file's contents which indicate whether it is a malicious or benign file.

1.5 Outline

This thesis is organized as follows:

• In Chapter 2, we define necessary preliminaries and background concepts about event processing, semantic event modeling, and temporal pattern mining.

• We tackle the event modeling problem and describe our proposed Process-oriented Event Model (PoEM) in Chapter 3.

• In Chapter 4, we tackle the event recognition problem and propose our approaches to solve it. We describe the need for semantic computing in various event processing aspects, provide a method to incorporate time series shapelet mining in our proposed PoEM model, and propose new algorithms to extend shapelet mining to multivariate time series sensor data.

• Chapter 5 provides a detailed description of several energy applications that benefit from our proposed approaches.

• Finally, we provide pointers for future work and draw conclusions in Chapter 6.

Chapter 2
Preliminaries and Related Work

Big Data arising from real-world engineering applications is often collected from a large network of process monitoring sensors.
Analysis of such large-scale sensor data can enable identification of patterns that are indicative of anomaly events and suggest appropriate corrective actions. Analysis for these purposes thus focuses on identifying discriminative features, i.e., those patterns that are most relevant for distinguishing between normal and abnormal processes. We view the analysis and mining of complex sensor data from the perspective of events. In this chapter, we provide background on event models and event processing, as well as the multivariate time series analysis methods used for event recognition. We also survey related approaches from both of these areas.

2.1 Complex Event Processing

Complex event processing (CEP) approaches have emerged as potential solutions for processing streaming data in a range of applications [55]. A comprehensive CEP system would include event detection, filtering, notification, action determination, context awareness, visualization and escalation mechanisms [20]. Existing CEP systems are designed to perform continuous queries against the incoming data stream for detecting complex events [6, 25, 30]. These systems are configured to perform specific actions according to pre-defined business rules over the detected events. These CEP approaches are therefore focused on event detection and filtering, without making provisions for the other processes, such as suggesting appropriate follow-up actions from background knowledge, or escalating high-priority events to reduce response times. Event processing has also been viewed as a process by the business process management (BPM) community [154], but such works typically focus on representing business logic. A broad survey of existing event processing approaches can be found in Cugola and Margara [25].

Semantic complex event processing (SCEP) methods add the capability to harness background knowledge in CEP models.
SCEP approaches have been used in diverse applications comprising a variety of complex events, including ride sharing events [83], stock market events [141], security and threat detection events [50], user interface integration [102], RFID data integration [151], sensor networks [137], process management systems [66], ubiquitous logistics [113], smart grids [127], oil well management [171], and e-health and ambient assisted living [163]. Many enterprise information management systems have actively pursued the use of semantic web technologies to integrate information across diverse sources. Semantic computing approaches have also been used to integrate information from various sources, including natural language/text, numeric data, structured data and others, for applications such as situation awareness [140], smart grid load demand-response [170] and trip planning [96, 97].

We now briefly describe a few existing SCEP approaches, with a detailed comparison between them in Chapter 4. Stojanovic et al. [132] propose a logic-programming based approach for semantic CEP using the ETALIS event processing language. They describe a rich set of temporal composition operators (such as sequence, intersection, starts, finishes, equals, meets, etc.) for detecting complex events from atomic events, yet it is not made clear how concepts from the event ontologies can be utilized in this detection process. They also provide rules for detecting complex events through temporal reasoning over atomic events. Teymourian et al. [139] focus more on the ontology and its modular structure for enabling CEP.
They propose semantic enrichment of event streams, in which derived events are added to the event stream in addition to the observed events; however, such enrichment can also be used for correlating events or adding more context to atomic events from the event stream, without necessarily adding derived events. Taylor et al. [137] use event-based ontologies mainly for defining complex events in the domain, and thus as a guide to complex event processing. Hammar [50] proposes the concept of observation correlation, which refers to a CEP system detecting which observed situations are potentially interesting only when co-occurring. Zhou et al. [170] propose to incorporate semantic knowledge for event processing in smart grids. Liu et al. [72] aim to harness the power of Linked Data on the web in a CEP engine, and their model is based on the generic EVO-Core [104] event ontology. EVO-Core, while capturing some characteristics of events, still does not provide a comprehensive model for the various functions of event processing, including detection, filtering, notification, action determination, prediction, and escalation.

2.2 Event Models and Semantics

Usually, events are modeled as data tuples consisting of attributes, values and timestamps, such as the 3-tuple by Voisard and Ziekow [147], the (sensor-id; reading; timestamp) tuple by Zhou et al. [170], or XML schemas [14]. The E* event model [47] is useful for modeling multimedia events. A probabilistic event model for processing uncertain events is proposed by Wasserkrug et al. [153]. Hinze [54] proposed identifying event profiles and constructing an event algebra. Event algebras were also proposed by Zimmer and Unland [172], Eckert et al. [30] and Anicic et al. [6]. Raw sensor data, particularly from energy systems, can be used to learn semantics related to events.
An instance of this is a place learning system, such as SensLoc [64], which attempts to find meaningful places from raw sensor data based on location changes recorded by sensors.

Many event frameworks are tightly coupled with a specific domain (such as CIDOC CRM 1) and cannot be applied to other domains easily. On the contrary, generic models such as the Event Ontology 2, E* [47] or LODE [121] mainly provide a model for events and not the complete event processing workflow. They are not able to incorporate complex events, entities, actions, roles and inter-relationships between event and non-event concepts. Chandy [19] proposed ideas for an event model based on adapting existing computation approaches for 'data at rest' to 'data in motion'. The Simple Event Model (SEM), proposed by Van Hage et al. [145], can be used to model events in various domains without making assumptions about domain-specific concepts. However, these approaches do not provide a model for describing an end-to-end workflow, moving from data streams to detecting and responding to events. They also do not identify the key real-world concepts (how many and which) needed to define events, and the possible methods of transition from simple to complex events. We provide an overview of existing semantic event models in Table 2.1, and highlight whether each model is intended for a specific domain (and if so, which one), or is an independent generic upper-level model.
Table 2.1: Existing Semantic Event Models

Domain-dependent Models            Domain
ABC Ontology [67]                  Digital Libraries
Card Ontology [73]                 Smart-card Systems
CIDOC CRM [29]                     Museums/Libraries
Event-Model-E [155]                Multimedia
Geospatial Event Model [158]       GeoSpatial
Oil Well Ontology [171]            Oil Fields
Event Ontology [107]               Network Diagnosis
Snap Event Ontology                News Events

Independent and Generic Models
CEPAT Ontology [132]
DOLCE + DnS Ultralite (DUL)
Event Ontology
IPTC EventsML-G2
EVO-Core [72]
Event-Model-F [114]
LODE [121]
OpenCyc [76]
Simple Event Model (SEM) [145]
Upper Event Ontology (UEO) [62]

2.3 Semantic Rule-based Systems

Rule-based systems are closely related to event processing. The area of active databases [88] has explored the use of triggers for rule-based processing of data and events in databases. Snoop [18] is an example of a system defining an event specification language for active databases. A further extension of Snoop, called SnoopIB [1], extends this specification to account for interval-based semantics. Event-condition-action (ECA) rules [88, 85, 108] are a simple approach to modeling rule-based event processing systems. An ECA rule is of the form ON Event IF Condition DO Actions, and such rules can be used in conjunction with CEP systems. For instance, ECA rules can be hardcoded into an event processing system to detect events from real-time data streams, as well as to discover complex events arising from a combination of simple events according to a rule-based template. Many production rule systems perform inference based on the RETE algorithm [35], a pattern matching algorithm which decides which of the system's rules should be triggered based on its data store. An enhancement of the RETE algorithm, known as TREAT [78], was also proposed; a comparison between the two for testing database rule conditions can be found in [152].
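The ON Event IF Condition DO Actions pattern described above can be sketched in a few lines of Python. The rule contents here (pump readings, a temperature threshold of 90) are hypothetical illustrations, not taken from any particular CEP engine or from this work's implementation:

```python
# Minimal ECA (event-condition-action) rule sketch; rule contents are hypothetical.
alerts = []  # collects action outputs for this illustration

rules = [
    {
        "event": "pump_reading",                         # the Event part
        "condition": lambda e: e["temperature"] > 90,    # the Condition part (assumed threshold)
        "action": lambda e: alerts.append(f"overheat on {e['pump_id']}"),  # the Action part
    },
]

def on_event(event_type, event):
    """ON Event IF Condition DO Action: fire every rule whose condition holds."""
    for rule in rules:
        if rule["event"] == event_type and rule["condition"](event):
            rule["action"](event)

on_event("pump_reading", {"pump_id": "pump-1", "temperature": 95})
on_event("pump_reading", {"pump_id": "pump-2", "temperature": 50})  # condition fails, no action
```

In a real CEP engine the rule store, event matching and action dispatch are far more elaborate (and RETE-style engines index conditions for efficiency), but the ON/IF/DO decomposition is the same.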
However, the RETE and TREAT algorithms do not provide concepts of time-stamped events and temporal constraints between events. An extension of the RETE algorithm for this purpose was proposed by Berstel [11]. Enhanced approaches integrating RETE-based ECA rules with a specialized event detection system were proposed by Schmidt et al. [115] and Walzer et al. [150]. The addition of semantic web techniques to rule-based systems makes them reactive, i.e., able to automatically execute certain rules in an application based on event or condition triggers. An ECA rule language for operating on a graph/triple representation of RDF was proposed by Papamarkos et al. [85]. RuleML [15], a family of semantic web rule markup languages, was developed to facilitate the exchange of rules across various systems on the world wide web. A semantic web rule language combining OWL and RuleML, denoted SWRL [58], was also proposed. Reaction RuleML [86] is a particular branch of the RuleML family, which provides a standardized interchange format for reaction rules and semantic rule-based event processing [138]. ETALIS [6] is a rule-based event stream processing system. While semantic rule-based event processing approaches provide an efficient way to manage automatic execution of rules based on event-based triggers, they still require definition of the rules by domain experts. Our work aims to bridge this gap by automatically discovering relevant domain rules directly from sensor data streams using a machine learning approach based on multivariate time series analysis.

2.4 Multivariate Time Series Analysis

We formulate the event recognition problem as one of multivariate time series classification, on which we provide some background here. While multivariate time series classification is a well-studied problem [40, 61, 166], we focus on this problem in the context of complex real-world engineering applications such as manufacturing plants and automobiles.
For example, in the case of a manufacturing plant, a common scenario involves a process being continuously repeated to produce batches of goods. During the production of each batch, monitoring data in the form of time series is collected by multiple sensors. Quality control tests can be performed after a batch of goods is produced to determine whether they are normal, i.e., meet a set of pre-determined standards, or abnormal. Identifying discriminative features in this application could help operators determine the root cause(s) of abnormal operation by identifying only the most relevant portions of the Big Data for this purpose. Many forms of multimodal sensor data recorded in engineering and data science applications can be processed into a time series format. Thus, multimodal data received from heterogeneous sensors can be processed into multivariate time series data, with data from each sensor corresponding to one dimension of the multivariate time series. For instance, shapes of objects [165], electricity consumption data [98], hand gestures and signals [82, 59], medical ECG signals [59, 110], gyroscope and accelerometer data from physical activity and motion capture [165, 59, 110], as well as images, handwritten characters and audio samples (e.g., bird sounds from Xeno-Canto, www.xeno-canto.org) [165] can be easily converted into time series data. In this work, we specifically focus on such multivariate time series data. Our developed algorithm for multivariate time series classification thus has the potential to be used on a wide array of multimodal data.

Multivariate time series classification. The data consists of multiple labeled instances I = {(T_i, y_i)}, where y_i is the label for instance i and T_i = [T_i^1, ..., T_i^n] consists of n time series. Thus, T_i is n-dimensional and typically each of its time series represents the monitoring data collected periodically by a sensor. Each time series T_i^j is a sequence of real numbers t_1, t_2, ..., t_m.
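The instance structure just defined can be illustrated with a minimal sketch. The sensor names, values and labels below are made up for illustration; note that the per-sensor series may have different lengths, a point discussed further below:

```python
# Illustrative layout of labeled multivariate time series instances I = {(T_i, y_i)}.
# Each instance T_i holds n univariate series; label 0 = normal, 1 = abnormal.

T_1 = [
    [22.1, 22.4, 23.0, 22.8],   # sensor 1 (e.g., a temperature channel), 4 samples
    [101.2, 100.9, 101.5],      # sensor 2 (e.g., a pressure channel), 3 samples
]
I = [(T_1, 0)]                  # one labeled instance

for T_i, y_i in I:
    n = len(T_i)                            # dimensionality: number of sensors
    lengths = [len(ts) for ts in T_i]       # per-sensor sample counts may differ
    print(n, lengths, y_i)
```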
Given K labeled instances from I, we want to extract features from the multivariate time series T_i that are discriminative. Discriminative features can then accurately distinguish between different labels or classes of instances and can focus the attention of domain experts on those situations likely to provide the most insight.

Challenges. Solving the multivariate time series classification problem for complex physical systems involves two important challenges. First, labels are associated with instances, not with individual time series. For a normal instance, it is safe to assume that the multivariate time series in T_i is normal, but for an abnormal instance, often only a subset of the time series in T_i is abnormal. Similar scenarios arise in other domains [59, 43, 44], e.g., medical informatics, where only a subset of the clinical time series data (such as heart rate) collected from a patient might be capturing the disease symptoms. The second challenge arises from the volume and variety of time series data. We present examples from real-world case studies. Physical systems such as manufacturing plants are operated 24x7 and monitored by a large number of sensors that can collect data at a high rate, e.g., one sample every tens of milliseconds to a second [32]. Classification needs to be computationally fast even on large datasets. Training data from real-world use cases are imbalanced, i.e., the numbers of normal and abnormal instances are not similar. The time series data collected by these sensors exhibit different patterns: some exhibit variation with values over a large range, whereas others are stable with values concentrated within a short range. Moreover, the labels for training may be available only at the end of the multiple processes, making assignment of labels to specific data instances impossible. The training algorithm also has to accommodate the possibility of redundant sensors and correlated data streams from such sensors.
Another source of data variety is that the n time series belonging to T_i may not have the same number of samples, due to different sampling rates across sensors. Hence, when comparing time series from the same or different instances, we cannot assume that they have the same length.

2.5 Time Series Shapelets

A shapelet is a subsequence or local temporal pattern in a time series that is a representative feature of the class to which the time series belongs [165]. Shapelets have been shown to be effective for a variety of time series data mining tasks including classification, clustering, summarization, and visualization [165, 82, 110, 43, 44, 98]. Though several other approaches exist for time series classification, the popularity of shapelets is due to four factors [165, 110, 43, 44]: (1) shapelet methods impose no assumptions or restrictions on the nature of the data, unlike autoregressive or ARIMA time series models [40]; (2) they are easy to interpret for domain experts [43, 44, 98]; (3) they have been shown to be more accurate than other methods for some problems [43, 110]; and (4) once shapelets have been extracted from training data, a new time series can be classified quickly. We now explain how the time series shapelet extraction algorithm works. We consider a binary (two-class) classification scenario. The basic shapelet extraction algorithm was first proposed by Ye and Keogh [164], and several subsequent optimizations have made the initial method faster or more advanced. For our evaluation, we use Fast Shapelets [110], a randomized, faster version of the original [164] shapelet extraction method and the state of the art for supervised shapelet extraction. As shown in Algorithm 1, the input dataset I contains a set of time series instances (TS) and their corresponding labels (Label).
Given all the input time series and labels, the algorithm generates candidate subsequences of all possible lengths from all locations across all the training time series. The upper and lower length bounds (maxL and minL), as well as the step size increment used to move between the lower and upper length bounds, can be specified as parameters. For a completely parameter-free approach, we can set the minimum length to 2, the step size to 1, and the maximum length to the length of the shortest time series in the data.

Algorithm 1 Basic shapelet discovery algorithm
1: Given I = [TS, Label]
2: max_gain ← 0
3: for len = minL to maxL do
4:   Candidates ← GenAllCandidates(TS, len)
5:   for each cand in Candidates do
6:     create a split sp from cand
7:     gain ← ComputeInfoGain(I, sp)
8:     if gain > max_gain then
9:       max_gain ← gain
10:      shapelet ← cand
11:    end if
12:  end for
13: end for

The Euclidean distance (ed) between two vectors S and S' (each of length l) is defined by:

ed(S, S') = sqrt( (1/l) * Σ_{i=1}^{l} (S_i − S'_i)^2 )    (2.1)

The distance (dist) between a candidate subsequence S of length l and a time series T is calculated as:

dist(S, T) = min_i ed(S, T_i^l)    (2.2)

where T_i^l is the subsequence of length l starting at position i within the time series T. Thus, it is the minimum Euclidean distance obtained by sliding the candidate subsequence along the (longer) time series instance and recording the smallest distance observed at any position. The distance from each candidate to every data instance in the training set is calculated. A good shapelet candidate should be able to split the training instances into two sets (predicted as normal, and predicted as abnormal) based on their distances from itself. Each shapelet candidate would make this split differently, and we denote the two predicted sets created by a shapelet through a split (sp) variable. Given a candidate and split pair, we use an information gain metric to determine the best of the candidates.
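The distance computations of Equations 2.1 and 2.2, and the entropy-based gain used to score a split, can be sketched in plain Python. This is an illustration, not the optimized implementation used in the shapelet literature, and the gain uses the standard weighted form of the post-split entropy:

```python
import math

def ed(s, s2):
    """Length-normalized Euclidean distance between equal-length vectors (Eq. 2.1)."""
    l = len(s)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, s2)) / l)

def dist(s, t):
    """Minimum distance of candidate s slid across the longer series t (Eq. 2.2)."""
    l = len(s)
    return min(ed(s, t[i:i + l]) for i in range(len(t) - l + 1))

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / len(labels)
        h -= p * math.log2(p)
    return h

def info_gain(labels, distances, threshold):
    """Gain of splitting instances by their distance to a candidate (weighted form)."""
    left = [y for y, d in zip(labels, distances) if d <= threshold]
    right = [y for y, d in zip(labels, distances) if d > threshold]
    n = len(labels)
    h_after = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - h_after
```

A candidate that separates the classes perfectly by distance (all normal instances on one side of the threshold, all abnormal on the other) achieves the maximum gain, which is what the split search in Algorithm 1 looks for.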
The information gain (gain) is defined as:

gain = H_before − H_after    (2.3)

where H denotes entropy. H_before is the entropy of the original dataset when split into two classes (based on the label of each instance), and H_after is the entropy of the split generated by the shapelet, i.e., the weighted sum of the entropies of the two predicted sets in the split. A shapelet is the candidate with the maximum information gain across the split, which means it is the most discriminative subsequence found from the training data. A distance threshold is also learnt when identifying the shapelet. For complex datasets, one shapelet may not be enough to obtain a clear separation between the two classes. In these cases, multiple shapelets are extracted and arranged in the form of a tree, similar to a decision tree classifier. The tree has shapelets in the internal nodes and predicted class labels in the leaf nodes. As a test instance traverses this tree, the decision of whether to follow the left or right child at each internal node is made by comparing the shapelet's distance threshold to the distance of the test instance from the shapelet. The test instance is classified when a leaf node is reached. The basic shapelet discovery algorithm described above is slow because it computes all possible subsequence-information gain combinations. In our evaluation, we use the Fast Shapelets [110] approach, which is a significantly faster improvement over the basic shapelet extraction algorithm. This is a state-of-the-art supervised shapelet mining approach which uses random projections to obtain approximate, nearly optimal solutions. Fast Shapelets works with a change in data representation: instead of working on the raw time series, it works on a dimensionally-reduced symbolic representation of the time series known as Symbolic Aggregate Approximation (SAX), more details of which can be found in [68].
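A minimal sketch of the SAX conversion just mentioned follows. The segment count, alphabet size and Gaussian breakpoints below are illustrative choices, not Fast Shapelets' actual parameters:

```python
import math

def sax(ts, n_segments=4, alphabet="abcd"):
    """Convert a time series to a SAX word: z-normalize, reduce with
    piecewise aggregate approximation (PAA), then map each segment mean
    to a symbol via breakpoints from the standard normal distribution."""
    mean = sum(ts) / len(ts)
    std = math.sqrt(sum((x - mean) ** 2 for x in ts) / len(ts)) or 1.0
    z = [(x - mean) / std for x in ts]
    seg = len(z) // n_segments
    paa = [sum(z[i * seg:(i + 1) * seg]) / seg for i in range(n_segments)]
    breakpoints = [-0.67, 0.0, 0.67]   # N(0,1) quartiles for a 4-letter alphabet
    word = ""
    for v in paa:
        idx = sum(v > b for b in breakpoints)   # count breakpoints below the mean
        word += alphabet[idx]
    return word

print(sax([1, 2, 3, 4, 5, 6, 7, 8]))   # a steadily rising series -> "abcd"
```

The symbolic word is what the random masking operates on: masking a few positions of two SAX words and comparing the rest is far cheaper than comparing the raw series.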
SAX words are created from the time series, and similarity is compared via matching SAX words. The algorithm uses a random masking process in which a portion of the SAX word is randomly masked; if the remaining portion matches the desired target, it is still considered a match. This process reduces the dimensionality of the original data and improves the generalization of the classifier (helping to avoid overfitting). The average time complexity for shapelet extraction (training) in the Fast Shapelets algorithm is O(mn^2), where m is the number of time series and n is the length of each time series. Executing shapelet-based approaches can be slow, especially on long time series instances, since the complexity is quadratic in the length of the time series (n) but only linear in the number of time series (m). Therefore, efforts have focused on efficient pruning techniques, including the Logical Shapelets [82] and Fast Shapelets [110] methods. Another way to reduce complexity is to restrict the set of shapelet candidates being evaluated based on certain constraints; such an approach is presented in McGovern et al. [77] to build shapelet tries for weather prediction. Gordon et al. [45] propose to optimize the evaluation order of shapelet candidates using a randomized model for shapelet sampling. Chang et al. [21] propose to use GPUs for fast, parallel implementation of shapelet mining approaches. An unsupervised algorithm to find shapelets using a separation gap measure instead of information gain has also been proposed [167]. Alternate measures of shapelet quality, such as the Kruskal-Wallis and Mood's median tests, have been explored in [69]. Lines et al. [70] propose to disconnect the shapelet extraction approach from the classification step through a transformation step which uses the distance from the shapelet as input data for classification. Deng et al.
[27] propose to build a time series forest for classification using both entropy gain and a distance metric (referred to as the entrance metric); however, they do not use shapelets for classification. Spiegel et al. [131] adopt an alternative approach for finding recurring patterns (not shapelets) in multivariate data. They segment the multivariate time series using singular value decomposition (SVD), and use hierarchical agglomerative clustering on the resulting segments. The quality, length, and number of their discovered patterns depend on the SVD parameters, and the approach is subject to the computational complexity of SVD. Also, finding the critical inflection points in their algorithm requires the data to be smooth (second and third derivatives are computed). For certain domains like medical diagnosis, in addition to classification accuracy and interpretability, early detection is an important goal [43, 44]. The concept of local shapelets, proposed by Xing et al. [162], is also related to early classification. He et al. [51] provide early work on early prediction for imbalanced multivariate time series. Prieto et al. [109] propose stacking individual univariate classifiers to build an ensemble classifier for multivariate time series. None of these approaches works directly with multivariate data from complex physical systems while utilizing feature selection methods to combine their constituent univariate classifiers. Hu et al. [59] present an approach for classifying multivariate streaming time series data that shares certain similarities with the algorithm we propose in this work, Shapelet Forests (SF). Both approaches build classifiers for univariate time series and then combine the votes from multiple classifiers for classification. However, they use different algorithms: Hu et al. use the Nearest Neighbor classifier, whereas SF uses the shapelet-based decision tree, which is known to be an eager version of the lazy Nearest Neighbor classifier [165].
In [43], Ghalwash et al. propose the MSD (Multivariate Shapelets Detection) approach. In MSD, a multivariate shapelet is a collection of subsequences, one from each of the n time series in an instance, where each subsequence has the same start time and length. There are two major weaknesses of this approach. First, the restriction of the same start time and the same length is arbitrary. Second, MSD may not work well when a subset of the n time series is a better predictor for a class than all of them together. The authors of MSD provide an example of such a scenario from the biomedical domain [43]. In [44], Ghalwash et al. proposed IPED (Interpretable Patterns for Early Diagnosis), which improves upon MSD [43]. IPED constructs a binary matrix where the rows represent instances and the columns represent all possible subsequences of the time series data. For a dataset with k instances where each instance has n time series of length L, this binary matrix will have n × k × Σ_{l=minL}^{maxL} l(L − l + 1) columns (minL and maxL denote the minimum and maximum length of the subsequences). Then, IPED solves a convex-concave optimization problem to select exactly one representative subsequence from each of the n dimensions. While IPED addresses some shortcomings of MSD, it still suffers from two drawbacks: (1) the exhaustive search over all possible subsequences in the first step is computationally expensive, and (2) the restriction of extracting only one subsequence per dimension in the second step is somewhat arbitrary.

Chapter 3
Event Modeling with the Process-oriented Event Model

Based on the event processing challenges discussed, as well as the requirements identified in the motivational use cases, we propose a comprehensive new event processing model, known as the Process-oriented Event Model (PoEM).
The major contributions of this model are: 1) a comprehensive event model and ontology designed to describe the end-to-end process of moving from data to events to proactively responding to those events; 2) a step-by-step identification of the key entities, observable properties, measurement models, interpretations, goal planning, actions, and roles that define an event space applicable to practical industrial applications; 3) a state-based approach to goal planning based on detected events; and 4) application of the PoEM semantic modeling approach to multiple real-world use cases. We now describe the key elements of the PoEM model. The first step is to model real-world entities and their properties. This leads to modeling observations associated with entities and detecting events by interpretations of their instantaneous status. Events and event-related concepts are formally defined and linked to the identification of relevant states, actions, and roles. A mechanism for event escalation is also proposed. The PoEM model can be rapidly adapted to handle complex business rules and domain-specific concepts, as illustrated in the case studies in Chapter 5. First, we delineate some critical value propositions from the oil and gas sector to bring out the need for a unified semantic event modeling approach.

3.1 Motivational Value Propositions

A semantic model for event processing provides several value propositions for an enterprise or organization using it. In this section, we discuss five key value propositions relevant, but not limited, to the oil and gas industry.

Efficient Interaction Patterns. The interaction patterns among the personnel and data sources involved in an event-driven system become more efficient. For instance, in the event of a pump failure, suppose the operator was alerted about the failure promptly but, due to circumstances, saw the message only much later. She then tried to take the suggested corrective action but failed, and reported it to her supervisor.
She would then have to wait for her supervisor's reply, which again might not come quickly. In another scenario, the engineer may have repaired the pump successfully but missed reporting it. Such delays can accumulate at various stages, increasing the response time to the pump failure. A CEP-based system can use the 'event escalation' pattern in such situations to avoid bottlenecks in the process and greatly reduce the delays. Event escalation refers to waiting for an action response for a certain time and, upon no response, automatically looking up the recommended best practice for the event, implementing it, and alerting the appropriate personnel. An event escalation scenario is depicted in Figure 3.1.

Reduced Response Times. An event-driven framework reduces the time needed to react to events. The usual workflow for event detection systems is to detect the event, notify the appropriate personnel, obtain corrective actions from the personnel, and then suggest the actions. Since a CEP-based approach can use event-condition-action rules from its knowledge base, the most likely actions can be selected automatically without manual intervention. These actions can be suggested to the personnel in charge of action execution (actors). If a new action is executed, it can be added to the knowledge base for future use.

[Figure 3.1: Event escalation capability]

Using the historical data and the knowledge base as preconditions, a CEP-based system can predict similar events in the future and notify the appropriate actors before the event occurs. Such forecasting is particularly useful for predicting failures, anomalies, and exceptions. Figure 3.2 depicts how proactive CEP leads to reduced delays and quicker response times. In scenario A, we depict the simple event detection workflow: an event takes place, after some time it is detected by the system, and then notifications are made within a certain period of time.
Scenario B shows the event pattern detection workflow. The occurrence of an event (possibly a complex event) is defined by a certain pattern of rules and pre-conditions. Based on these pre-conditions and background information, it is possible to detect whether the complex event has occurred. Scenario C incorporates complex event processing in the workflow. CEP can not only detect complex events, but also take on an adaptive configuration to make smart decisions, such as looking up corrective actions and deciding the list of persons to be notified (subscribers). After this, the notified personnel can take corrective actions to handle the event. However, if a critical failure has already occurred, there are delays before an action is taken. These delays include the time for preprocessing and enrichment of event information, event processing by the CEP engine, the search for a corrective action, the determination of relevant personnel, and message delivery to the actor(s). Ideally, we would want our system to be able to prevent future failures instead of taking corrective actions to redress them after they have occurred. This capability can be achieved through the predictive complex event processing workflow depicted in Scenario D. Based upon pre-conditions and contextual information from the knowledge base, the predictive CEP system is able to predict future anomalies and failure events. When it detects that such an event is about to occur, it can take up the adaptive configuration mentioned above and notify the relevant personnel with suggested actions to prevent the failure.

Reduction in Data Seeking Effort. The data seeking effort of the end-user is significantly reduced with an integrated event-driven architecture. This is because all user queries are initiated through the data access component to a single query engine, instead of querying multiple sources and contacting several staff members.
The complexity of seeking relevant data can be quite high, especially for complex queries spanning multiple resources. For instance, if an analyst wants to predict the future production from a well, she may need access to the historical production data for the well, the well information system, maintenance data, and the failure and repair job history. The analyst herself may not have access to the well information system, and might need to request access from an administrator.

[Figure 3.2: Predictive event detection with complex event processing]

This administrator may not be actively responding to requests. Also, the analyst's access to the data might need to be revoked after she has finished reading it. Taking such factors into consideration, it may be quite a long time before the analyst actually obtains the required data, which may not be in the desired form or granularity. A CEP-based system automates these processes and keeps proper records, thus minimizing the end-user's waiting time to receive data. If the process does not require manual intervention, it may even be completed in near real time. Figure 3.3 shows how CEP helps in reducing data seeking effort following an event. Without CEP, the user (subscriber) has to make multiple queries to data sources and knowledge bases to learn the suggested action to be taken. Under a CEP framework, the complex event is detected automatically, processed in context, and appropriate notifications with action specifications are made to the entities holding the subscriber role, without the need for human intervention.

[Figure 3.3: Reduction in data seeking effort through an event-driven system]

Consistent Best Practices. A CEP-based system maintains a knowledge base consisting of event-condition-action rules with recommended actions for specific types of events. These rules can provide the foundation of consistent best practices for the enterprise as well as the community.
When a new rule is added to the knowledge base, it can instantaneously be brought into practical use for relevant events. This ensures that all business units of the enterprise are well informed about changes in policy and avoids confusion.

Management by Exception. A CEP-based system can prioritize events and notifications based on their importance and potential impact. This ensures that the most critical events are brought to the attention of the staff early enough for immediate response, in contrast to a system that reports events chronologically. The CEP system goes a step further by customizing the priorities according to the preferences of different users. Notifications for critical events can be made to multiple personnel in different teams across the enterprise, and this list of subscribers can be generated dynamically using background knowledge. Further, suggested actions can be displayed for the staff member to choose from, and in some cases chosen automatically for corrective action. Figure 3.4 depicts the management by exception scenario with our proposed framework. Several pump failure events are happening simultaneously throughout the enterprise, monitored by the production team (denoted by the dots in the figure). We propose to put an estimated cost (based on production rate and expected downtime) on each of the failure events. If a well producing 900 BOPD is down due to an ESP failure which typically takes 4 days to repair, the cost of the well downtime would be 900 * 4 = 3600 BO. If the other pump failures lead to expected costs much less than 3600 barrels of oil, then the ESP failure is the most critical failure event (dark red dot in the figure) and should be brought to the immediate attention of the staff, giving it higher priority over the other failure events (lighter shades of red in the figure). This failure may also be relevant to persons in the operations and maintenance teams, who should be informed as well.
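The downtime-cost prioritization described above can be sketched as follows; the well names, production rates and repair times other than the 900 BOPD / 4-day ESP example are illustrative:

```python
# Rank concurrent failure events by expected downtime cost:
# cost (barrels of oil) = production rate (BOPD) x expected repair time (days).
failures = [
    {"well": "W-1", "bopd": 900, "repair_days": 4},   # the ESP failure example
    {"well": "W-2", "bopd": 300, "repair_days": 2},   # illustrative
    {"well": "W-3", "bopd": 500, "repair_days": 3},   # illustrative
]

for f in failures:
    f["cost_bo"] = f["bopd"] * f["repair_days"]

# Highest expected cost first: this is the event to escalate to staff.
ranked = sorted(failures, key=lambda f: f["cost_bo"], reverse=True)
print(ranked[0]["well"], ranked[0]["cost_bo"])        # W-1 3600
```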
3.2 Foundational Concepts

We start by modeling the key elements in the universe of discourse (UoD). Beginning with the entities of interest, the modeling procedure leads to the identification of other related concepts, properties and interdependencies.

[Figure 3.4: Management by exception]

The PoEM model adopts some foundational concepts from the dynamic information management methodology proposed by Sorathia [129]. This modeling approach, originally proposed for situational awareness, provides an effective framework to build a conceptual model.

3.2.1 Entity

Entities are basic elements in the conceptual model. An entity may be physical or logical. We represent the set of entities in the UoD as K, while individual entities are represented as κ. In the oilfield use case, entities would include physical entities such as pumps, wells, pipes, vehicles and human operators, as well as logical entities such as databases and software applications.

K = {κ_1, κ_2, κ_3, ...}
Examples: pump-1, operator-A, vehicle-ZXV34

3.2.2 Observable Property

Each identified entity may have several properties. The next step is to identify and enumerate the set of relevant observable properties, represented by Π, with individual members denoted by π. Observable properties may range from detecting the mere presence of an entity to various physical and chemical properties requiring sensing techniques.

Π = {π_1, π_2, π_3, ...}
Examples: temperature, pressure, motion, vibration

Observable properties can be grouped by the chemical, physical, and other scientific methods used for measurement, and they can be measured in different ways. To accurately capture properties associated with entities, we need a model for measurement.

3.2.3 Measurement

Identification of observable properties for the entities of interest leads to a specification of how observations can be recorded in various ways.
For instance, temperature is an observable property measured using digital and analog sensors (thermometers). Thermometers may provide readings in degrees Celsius or Fahrenheit. Also, the sensitivity of the measuring device may vary, significantly affecting precision and accuracy. Some sensors provide a continuous stream of readings, whereas others do so at intervals. As subtle changes in observable properties can be critical in high reliability operations, the modeling procedure must comprehensively handle these aspects, for which we propose the concept of measurement. A set of measurements is denoted by M, while an individual measurement is denoted by μ. A measurement record captures the value of a sensor reading, along with its unit of measure (UoM), type and other details related to measurement. An instance of measurement can be indicated by the 3-tuple:

μ_{S,π} = ⟨value, type, uom⟩

Here, S is a specific sensor type (thermometer) and π is the observable property (temperature) supported by the sensor. The same property can be measured using different sensors, each of which may have specific units of measure, sensitivity and operating ranges. This representation could include additional features related to precision, accuracy or sensor update frequency. However, once such information is identified for a specific sensor, it remains static. Therefore, a measurement model leads to a stream of readings stored as observations.

3.2.4 Observation

Measurement concepts enumerate all possible ways the observable properties can be measured. However, in order to interpret the current status, it is useful to retrieve instantaneous values from the sensor. This is achieved by introducing the concept of observations (Ω), the set of all measurement observations. Individual observations (ω) can be recorded at time t, for a given entity κ, as per a specific measurement model μ.
Ω κ,μ ={ω κ,μ t 1 ,ω κ,μ t 2 ,ω κ,μ t 3 ,...} Example : Ω k 1 =pump,μ 1 =temp ={25, 26, 28...} Fromthemeasurementmodel(μ i ), propertyπ, UoMandotherrelevantcontextcan be determined. Sensors typically provide such values in the form of data streams. Digital oilfields are typically highly instrumented and provide many such sensor streams. These streams are often configured to feed into data historian systems. 3.2.5 Data Stream Sensors reporting values as per specific observations (ω κ,μ ) result in continuous streams of data. We define Θ as the set of all data streams available for a given entity with individual instances of streams represented as θ as follows: Θ ={θ 1 ,θ 2 ,θ 3 ,...} Examples : temperature stream, pressure stream A specific data stream (θ) is linked with observations (ω) for a specific entity (κ) according to a specific measurement model (μ) that determines the type of sensors and details regarding UoM and other relevant details. 3.2.6 Interpretation Rawdatavaluesrecordedandreportedbysensorsdonotprovideinsightsunless they are evaluated in a specific context. For instance, a thermometer reporting 48 25 ◦ C for a tool room might be a standard condition, however the same value for a cold storage facility may be an exception. We introduce the concept of interpretations to resolve this issue. An interpretation provides meaning to the recorded values. Users can plug in their own event detection rules and filters here. For instance, for a given observation ω t , the recorded value may fall into a range that might be normal or critical. This directly provides contextual information about the entity. Therefore, it is useful to identify such ranges of values that makes them critical. Interpretation set (denoted byX orX μ ) is a set of all possible interpretationinstances(χ)thatcanbeidentifiedforaspecificmeasurementmodel (μ) associated with an observable property. 
X ={χ μ 1 ,χ μ 2 ,χ μ 3 ...} Examples :χ [a,b] 1 ,χ [c,d] 2 ,χ [e,f] 3 Here, χ 1 is the interpretation provided when the value of ω t 1 is in the interval [a,b]. Similarly, domain experts and practitioners can provide all the ranges and respective interpretations. An interpretation set for a sensor measuring pH of water may be represented as: X pH = Failed : Sensor reading N/A Invalid : (pH< 0)∪ (pH> 14) Acidic : 0≤ pH< 7 Neutral : pH = 7 Basic : 7< pH≥ 14 (3.1) 49 Each interpretation (χ μ i ) is associated with an evaluation condition, forming a branch (i) of the set (X). As seen from the above example, there may be a branch dedicated to find out whether the sensor is functioning and providing values in the first place. Placing this branch at the top of the interpretation set may lead to faster detection of failures, in case no data is being read. The evaluation condition is usually a function of the observation (ω), and is denoted as c(ω). Then, an interpretation set (X) can be defined as: X μ = m [ i=1 {χ μ i |c i =⇒ χ μ i } (3.2) which, when expanded, leads to Equation 3.3. X μ = χ μ 1 : if c 1 (ω) = true χ μ 2 : if c 2 (ω) = true ... ... χ μ m : if c m (ω) = true (3.3) For each measurement modelμ, there is at least one interpretation set delineating all possible interpretations which are related to values of the given property. The branches of the interpretation set are non-overlapping and only one of the branches is activated during evaluation. Individual interpretation sets can be defined based onpiece-wisefunctions, booleanfunctionsorothersuitablemathematicalorlogical representations recommended by domain experts or practitioners. So far, we have focused on a single observation value determined at a spe- cific point. However, there can be additional interpretations derived for multiple 50 observations as well. 
Considering readings at multiple time instances enables rep- resentation of complex scenarios such as a sudden increase or decrease in property measurements. For instance, temperature readings of 31 ◦ C at t 1 and 36 ◦ C at t 2 may be considered normal as they both belong to the normal range of [30− 37] ◦ C. However, two consecutive readings indicating an increase in excess of 10% may be a critical change according to a rule in a specific domain. X μ t = χ μ : if Δ(ω t 1 ,ω t 2 )≥ 10% (3.4) Once all relevant interpretation sets are identified by domain experts and practi- tioners, they can also be utilized to determine the state of an entity. For instance, in case of the pH of water, given interpretation set and instantaneous observations, it may be possible to determine if water is in the “potable” state or not. Therefore, it is intuitive to determine the state of the entity based on various interpretation sets. We present the state model in Section 3.4.1. 3.3 Event Concepts The foundational concepts proposed provide a mechanism to identify key enti- ties, attributes, measurements, and interpretations. This lays the foundation for identification of events. The interpretations link the current measurement to iden- tification of changing situations in the UoD. Therefore, interpretations directly link to identification of events and related concepts. 51 3.3.1 Event In conventional event definitions, every change is declared to be an event. Therefore, as per our model, all recorded interpretations should result in events; however, this leads to an exponential number of events. For instance, even if temperature is recorded to be normal, any minor change in temperature will be reported as a new event. To address this issue, we introduced the concept of inter- pretation for specific ranges. In addition, changes in interpretation may not be relevant to the user’s interest. For example, only extreme variations in temper- ature might be of interest. 
Taking these factors into account, we propose a new definition for an event that identifies a subset of interpretations to be considered as events. From all interpretations in an interpretation set, any one can be true at given point in time (τ), which leads to identification of a simple event (e s ). Definition 1. An event is the interpretation of an observation of interest. Con- ceptually, a simple event is represented as e s =χ ω κ,π,τ (3.5) wheree refers to an event,τ is the time stamp of the event,κ is the entity associated with the event,π is the observable property associated with the entity for which this event was recorded, and χ is the chosen interpretation of the event. In the digital oilfield use case, a “high temperature” interpretation for a pump may lead to detection of a simple event as depicted in Figure 3.5. The figure gives a complete picture of the foundational concepts. The entity of interest, κ pump , has observable property π temp which was observed using some measurement model. The observation ω t 1 was 100 ◦ F, which, by referring to the interpretations for a pump leads to the identification of a simple event (e s ). 52 Figure 3.5: Simple event identified in an oilfield In the NILM use case, a reading of 20 Volts from a refrigerator appliance may cause an “Low Voltage” interpretation and lead to detection of a simple event. The entity of interest here is the refrigerator (κ refrigerator ), the observable property is voltage (π voltage ) and the observation (ω t 1 ) is the measurement (with scope) recorded from the sensor at a particular timestamp t 1 . The observation ω t 1 was 20 Volts, which, by referring to the interpretations for an appliance leads to the identification of a simple event (e 1 ). This example is shown in Figure 3.6. Figure 3.6: A ‘Low Voltage’ simple event observed for a refrigerator. Note the 1-1 correlation between entity, observable property, and observation. 
A simple or atomic event in PoEM is defined as an event which involves exactly one entity, one observable property and one observation. 3.3.2 Event Profile As defined by Hinze [54], the concept of an event profile typically involves a query that is used to determine if a specific event has occurred or not. We can use interpretations and simple events to determine the event profile. Event profiles suggest what needs to be queried to determine the occurrence of an event. For a simple event, the event profile consists of just an observation that can be performed once or repeated at specific time intervals. Equation 3.6 indicates a 53 simple event profile that requires a single (one-time) observation of property π for an entity κ at time instance τ. P o s =ω κ,π,τ (3.6) However, in real world scenarios, the requirements can be more complex. An event profile may constantly need to be evaluated at a specific time interval Δτ. In such scenarios, a recurring event profile P r s can be represented as in Equation 3.7. An example of an event profile to check the temperature of a pump every 30 seconds is also shown. P r s =ω κ,π,Δτ P r s =ω pump,temp,30sec (3.7) The outcome of an event profile query is a set of observations that can be evaluated based on the interpretation sets. For instance, suppose the value of pH of water was observed at a certain time to be 1.5. On evaluating this value based on the interpretationsetprovidedinEquation3.1, itisascertainedthattherecordedvalue belongs to the “Acidic” branch in the interpretation set and all other branches are set to false as shown below. Hence, occurrence of a specific event related to the “Acidic” interpretation can be perceived. False : Failed False : Invalid True : Acidic False : Neutral False : Basic 54 So far, we have only discussed atomic or simple events. Simple events may trigger complex events and to capture them, event profiles can be extended to cover com- plex events. 
Event profiles provide a mechanism to weave observations of related properties to lead to identification of a complex event. 3.3.3 Complex Events and Profiles In real-world applications, it may not be sufficient to determine the occurrence of an event by evaluating a single property at a specific time instance or over mul- tiple time intervals. It may involve multiple entities and their properties evaluated at different times. To detect such complex events (which are generalizations of simple events), the event profiles can be defined in various ways. A simple event involves one entity, one property and one observation. A com- plex event occurs due to multiplicity in the number of observations, properties, entities or any combination of the above. Thus, a novel contribution of our model is that it quantifies that there are only four simple ways to combine simple events to form complex events, in contrast to numerous existing works which fail to pro- vide a finite number of ways of combining simple events to form complex events. These four ways are enumerated below, and an illustrative NILM example with some of these scenarios is shown in Figure 3.7. • Multiplicity in Observations Multiple observations related to the same entity and observable property taken at different times can lead to a complex event, the profile for which may be represented as follows. P o c =ω κ 1 ,π 1 ,τ 1 ∧ω κ 1 ,π 1 ,τ 2 (3.8) 55 Figure 3.7: In PoEM, moving from simple events to complex events can be done by one of these four ways - having a multiplicity in observations, entities, observable properties or a combination of the above. This is in stark contrast to numerous existing works which are unable to define a finite number of ways to move from simple to complex events in a real world system. For instance, the sequence of two atomic events occurring within a temporal distance is a complex event. 
Figure 3.8 depicts the scenario in which a complex event occurs based on an interpretation model that involves two observations of the same property at different times. In this scenario, the difference in temperature observations exceeds the pre-defined limit thereby triggering a temperature rise event. Note that averages over moving windows are captured in this category. 56 Figure 3.8: Complex event caused by multiplicity in observations of same property at different times • Multiplicity in Observable Properties The next level of complexity might occur via different observable properties associated with the same entity, as shown by the profile below. P o c =ω κ 1 ,π 1 ,T 1 ∧ω κ 1 ,π 2 ,T 1 (3.9) Figure 3.9 represents a complex event caused by measurements in two dif- ferent observable properties (temperature and pressure) of the same pump entity. Figure 3.9: Complex event involving two different observable properties for the same entity • Multiplicity in Entities Theinvolvementofmultipleentities, whichmayofthesameordifferenttype, lead to complex events. An event profile involving two different entities may 57 simply be shown as the intersection of the two related simple events, as in 3.10. P o c =ω κ 1 ,π 1 ,T 1 ∧ω κ 2 ,π 2 ,T 1 (3.10) • Combination of any of the above A combination of multiplicities in any of the above three criteria can cause a complex event. To illustrate this scenario, consider Figure 3.10 where the entity, observable property, and observations are all different. It indicates an interpretation where a complex event of production loss in the oilfield is triggered by multiple sub-events and observations. Figure 3.10: Complex event involving multiple properties, entities, observations and sub-events Considering the above criteria, we propose a definition for complex events according to our model. Definition2. A complex evente c is the interpretation of an observation of interest which depends on multiple simple events. 
It may be represented as: e c =X ω K,Π,T, (3.11) 58 where e c refers to the complex event, T is the timestamp of the complex event, K is the set of entities associated with the complex event, Π is the set of observable properties associated with the complex event, and X is the interpretation of the complex event. If the simple events constituting the complex event are e 1 ,e 2 ,... as defined in Equation 1, then K = S n(entities) i=1 κ i , Π = S n(obsproperties) i=1 π i and T is the timestamp of the last observation related to the complex event T = max{t i } if t i are timestamps of all related observations within the complex event. X is an interpretation which is based on the individual simple event interpretations through a complex event profile. In order to obtain complex events, we perform a composition of simple events using logical or temporal operators such as conjunction, disjunction, negation, or sequence. The algebraic semantics of complex event operators are analyzed in many CEP approaches for combining events and forming rules, and more details can be found in related event algebra works such as ETALIS [6], CERA [30], CEDL [26] and by Hinze [54]. Temporal operators are instances of a broad class of contextual operators, and other contexts, such as a spatial context, could be used instead. Table 3.1 lists some popular complex event operators and their notations in our model. Using a combination of these operators, it is possible to build complex rules for representing real world event processing scenarios, especially for detecting complex events. We provide a few examples of representing complex events based on the definitions and the operators introduced. • A sequence of events within a temporal gap of t e sequence =e i ∧e j ∧ (|τ i −τ j |<t) 59 Table 3.1: Common Event Composition Operators Operator Syntax Conjunction e i ∧e j ∧... = T n i=1 e i Sequence [e i ;e j ] T Disjunction e i ∨e j ∨... 
= S n i=1 e i Negation ¯ e i • High temperature recorded for two different pumps leading to a complex event e C =e i ∧e j =χ ω pump i ,temp,τ i , i ∧χ ω pump j ,temp,τ j j ∧i6=j • High temperature in pump-1 and low pressure in well-2 within 1 hour of each occurrence e C = χ ω pump 1 ,temp,τ i i ∧χ ω well 2 ,pressure,τ j j ∧(|τ i −τ j |< 1 hour) ∧χ i = [HighTemp]∧χ j = [LowPressure] • Voltage drop of more than 25% recorded for heater within 30 seconds e C = e i ∧e j =χ ω heater,voltage,τ i , i ∧χ ω heater,voltage,τ j j ∧(i6=j)∧ (τ j −τ i < 30 seconds) ∧(ω μ j −ω μ i > 0.25∗ω μ i ) Figure 3.11 is an illustration of most of the foundational as well as event con- cepts introduced above. An important observation from the figure is that as we progress through the various elements of the model, some contextual dimensions 60 are added to the previous concept to build a new concept. We begin with raw data values from the data stream. Measurements add the context of unit of measure, measurement type, and bounds to the data values. Observations are abstractions of measurements which deal with instantaneous measurement values, and also have a temporal context associated with them about when the value ceases to be valid. Events add further context to observations by only including those observations which have a corresponding match in an event profile and are relevant for future processing. Complex events are generalizations of simple events and include mul- tiple events. Each of the contextual additions or enrichment can be expedited by using a semantic knowledge repository. With identification of all relevant interpre- Figure 3.11: From raw data values to events: context addition and abstraction in our model tations that lead to simple or complex events of interest, it is possible to define the event space as the set of all possible events that can occur in the given universe of discourse. 
61 3.4 Processing Concepts The model as described so far captures the static aspects of CEP, enabling us to model entities, properties, and their interdependence. However, these concepts are applicable at design time only. In a practical scenario, these rules should be applied over incoming streams of data. This requirement leads to various process- ing steps including event detection, notification, and filtering. We identify process- ing requirements from event profiles that leads to detection of events, states, and associated actions and processes in the application domain. Once actions are iden- tified for the detected events, we introduce CEP concepts such as event notification and filtering. 3.4.1 State Model Throughout its life-cycle, an entity goes through several states, e.g., a pump in our oilfield scenario might be in the procurement, working, failed or decommis- sioned states at various times. We incorporate the states of all associated entities and relationships of states to other concepts in a state model. An entity belongs to exactly one state at any point in time and has one desirable goal state (obtained from background knowledge). Entities change their states from time to time based on certain actions. A detailed description about actions can be found in Sec- tion 3.4.2. States in our model are denoted by ψ for individual state instances or Ψ for the set of states. The concept of states in our model is similar to “entity status” in the E* event model [47]. State-based models are frequently used in event processing. A popular choice is to use state machines or automatons, either deterministic finite automatons (DFA), or nondeterministic finite automatons (NFA), for instance, Siddhi [133] 62 uses NFAs. However, the primary purpose of using state models (such as NFAs) in these systems is to perform complex event detection by modeling states in the systemandmonitoringchangesinthesestatesleadingtoevents. 
Weproposetouse a DFA state model not for complex event detection (which we efficiently achieve through interpretations and event profiles as described in Section 3.3.2), but to manage state and action models. A state model specifies the sequence of states which should be followed so that an entity can reach its desirable goal state. It is also the basis for determining actions which are responsible for transitions between states so that the system can undertake these actions as specified by the sequence of states to reach the goal state. For the state model, we propose to use a DFA in which states of the DFA correspond to states of the entity in our case and input symbols of DFA correspond to actions. An initial start state and a final state for the entity need to be provided. A state-action transition table is also needed to complete the model which may be obtained from domain expertise or learning from historical data. This state model approach allows identification of additional observable prop- erties that further lead to various measurements and interpretations. It allows identification and specification of multiple properties of the entity that can be rel- evant during specific life-cycle phases thereby ensuring complete coverage of the modeling process. The state model also provides the additional benefit of estab- lishing relationships with other foundational concepts. For instance, each life-cycle phase may lead to a specific set of entities and business processes. In the case of a pump in an oilfield, the state related to procurement of the pump is ascertained usingspecificobservablepropertiesinaprocurementsystemordatabase. Thisalso leads to identification of the procurement process itself and other entities involved, e.g. people, companies, and systems. 63 A state model for a pump in our oilfield scenario is given in Figure 3.12. Vari- ous states and actions associated with the pump are shown. 
The goal state (which is “working” here) is denoted by double circles. Along with the actions, the actor role who is supposed to perform the corresponding action, is also depicted. It is Figure 3.12: State model for a pump in an oil field interesting to note that observable properties such as vibration, pressure or tem- perature are only relevant to the “working” state of the pump. In other phases of the pump’s life-cycle, these observable properties are not applicable. For instance, while a pump is in a procurement state, its relevant observable properties may be the associated order status or shipping status. Such properties are measured, tracked and updated in different systems. Similarly, while under repairs, the pump may have a separate set of observable properties that relate to work-order informa- tion. Change in property values can lead to a change in the current interpretation, and a change in interpretation may lead to change in state. From Figure 3.12 it can be noted that some of the states are desirable for an entity at a certain time while others are not. For instance, “failed” state of the pump is not desirable, hence immediately after the failed state is detected, the 64 pump is subjected to the “maintenance” action, in order to return it back to the desirable “working” state, referred to as the “goal” state. However it may not be possible to directly achieve the goal state in a single action. Hence, it is required to determine a path to the goal state requiring a series of intermediate states. In order to accommodate these requirements, the state model should be able to capture rules for determining the goal states, next states and actions in a specific local context. 3.4.2 Action A conceptual model of all possible actions requires careful identification of var- ious types of actions in the given context. Among different kinds of actions, we have already identified actions that are relevant to changes of state for a particular entity. 
In Section 3.4.1, we established that actions and processes are associ- ated with state transitions. It is possible to determine actions that lead to the desired state. Since these actions are specific to a real-world business or organi- zational context we identify them as domain actions. For instance, in the digital oilfield, actions performed by operators, maintenance engineers and other teams are domain-specific and therefore classified as domain actions. In addition to domain actions, CEP also requires certain actions to be per- formed in order to detect and process events. For instance, on detection of simple events, it is required to evaluate complex event profiles. Once simple and complex events are identified, the approach should lead to identification of states and nec- essary processing actions (such as notification to subscribers) required in response to the detected event. These actions are identified as CEP actions. These types of actions are covered in most CEP approaches. A simple method to represent such actions is by using Event-Condition-Action rules. 65 Another type of action is required to implement the CEP actions. CEP actions provide a list of tasks to be performed at specific time instances or intervals. How- ever, these tasks are realized in the form of database queries, computations, etc. These sets of tasks are performed automatically in the specific system implemen- tation, and thus identified as system actions or middleware actions. action = α domain e.g. maintenance, operations α CEP e.g. event detection, notification α system e.g. patterns life-cycle management (3.12) A simple system action identified for event detection may have complex require- ments. For instance, a complex event profile may require sensor readings from two sensor streams with separate update frequencies and data formats. Once these values are extracted and converted into desired form, they need to be evaluated as per the interpretation set. 
When an interpretation is identified, associated state and domain actions needs to be determined. An implementation of these actions therefore involves a multi-step workflow that should be realized automatically by the CEP system. In implementing these tasks, the system can employ specific enterprise integration patterns (EIPs) [57] to handle inputs/outputs such as sen- sor readings, temporary storage, processing and communication across software or logical entities. We have proposed semantic representations for EIPs [98]. The con- ceptual model should be able to make provisions for covering these issues. Upon detection of an event, the CEP system should be able to create patterns, utilize these patters in the information processing workflows and destroy patterns once processing is complete. Event escalation is a special type of CEP action, which gets triggered in exception circumstances. 66 3.4.3 Role Based on the identified types of actions, it is possible to determine the roles of actors who are required to perform particular actions. For the identified domain actions, a person who is interested or responsible in specific actions and resulting state transitions can be identified for playing the role of an actor. Since the person or agent is responsible for performing specific domain actions, the role is identified as a domain role. In an oilfield scenario, a person responsible for maintenance of a failed pump can be considered to play a domain role of maintenance engineer. However, just as in the case of actions, roles can be of various types and can be played by real-life or logical (software agent) entities. In case of CEP actions, the correspondingrolescanbeidentified. Forinstance, theCEPengineshouldperform a query to determine the recipient of the notification message about the detected event. In other words, publisher and subscriber roles can be identified for a given event. 
Itisinterestingtonotethatanactorplayingadomainrole(e.gmaintenance engineer) may also be assigned a CEP role (e.g. event subscriber). Additionally, as discussedinthecontextoflife-cyclemanagementofenterpriseintegrationpatterns, this conceptual modeling approach also allows life-cycle management of various roles. For instance, upon detection of an event, set of appropriate roles (like subscribers)areidentified. Whentheeventishandledappropriately, thesubscriber and related roles are destroyed. The third kind of roles corresponds to system roles or middleware roles that determines various middleware actions performed by a software agent. For instance, a software agent responsible for determining subscription to events can be assigned a role of broker. Similarly a software agent 67 responsible for generating event related information can be identified by the role of publisher. System roles can also be realized using rule-based representations. role = β domain e.g. maintenance engineer, operator β CEP e.g. publisher, subscriber β system e.g. query manager, broker (3.13) The sensor stream can be assigned a role of event source whereas, a particular domain role can be considered as anevent sink. Message subscription can also be realized in various ways to subscribe software agents and human beings appropri- ately. Subscriptions can be implemented by selecting proper enterprise integration patterns. Equation 3.14 defines rules to determine appropriate patterns while determining subscription for human or software agents. IF Software agent THEN Constant Polling IF Human agent THEN Request-Reply/Publish-Subscribe (3.14) 3.4.4 Notification When an event (e) is detected, and corrective actions (α) and roles (β) are identified, the next processing step is event notification. Here, the CEP system should be able to determine the content of the notification message N shown in Equation 3.15. 
Event notification is typically limited to core event detection information but we introduce additional background information to enrich the notification message. For instance, each corrective action (α) can be associated with an expected time for completion τ comp . This not only provides guidance to the actor but also enables a mechanism for validation of the notification. The CEP 68 system can evaluate the state of an entity associated with the event to determine changes. If the action is not taken and entity remains in the same state, the CEP systemmaytriggeradditionalactionstoescalatetheevent. Inadditiontoexpected time of completion, the CEP system might include best practices related to the required action and additional information that is available in the background knowledge base (KB). N =hβ,α,τ comp ,event-contexti (3.15) Using this approach it is possible to determine how long a subscription is valid. Subscriptions identified for an event can be maintained until the goal state is achieved. If the requisite action is not performed on the first notification, an addi- tional escalation message is generated. Here, the role β and action α can change according to the escalation process adopted in the specific enterprise. Similarly, when a message is escalated, the granularity of its content can also change based on the rules. As subscriptions and notifications are managed in a rule-driven process, it leads to possibility of redundancies and duplication. This leads us to identify provisions for filtering mechanisms. 3.4.5 Filtering Filtering mechanisms play a key role in avoiding duplicate or unwanted occur- rences of instances to report unique happenings from the deluge of observations. An event profile evaluated at certain time intervals may detect an event that trig- gers actions, roles and notifications. However, at the next time interval, it might detect the same event again and generate duplicate notifications. 
In case of the oilfield scenario, suppose an event profile to determine pump failure is evaluated 69 Figure 3.13: PoEM Event Processing Workflow every minute. If a failure event is detected at time τ i , the system would detect the same event again at τ i+1min (unless the failure is fixed within such a short time) resulting in redundancy. To avoid this situation, filtering can be employed. Existing patterns for event filtering, as mentioned by Paschke et al. [87] can be used here. 3.4.6 Event Processing Workflow The complete event processing workflow enabled by the proposed model is depicted in Figure 4.8. The top half of the figure delineates various elements of the knowledge base (KB) to be looked up by the components involved in event processing at runtime, which are shown in the bottom half. The process begins with reading values from data streams. Simple as well as complex events can be detected by using the respective event profiles and reporting if any interpretations within the event profile are evaluated to be true (event-to-profile match is found, similar to the “event matching” operator in [30]). An event (e) is detected in such a case. Once an event is detected, the current state (ψ current ) and goal state (ψ goal ) of the entity involved are determined. If they are the same, then the entity is in the desired state, and no further processing needs to be done. Thus, the event 70 and its related information can be discarded (after logging, if necessary). If the entity is not in the goal state, a sequence of states which needs to be traversed to reach the goal state is determined by referring to the state transition model. The first state from this sequence is the next state, and an action (α) leading to the next state is also looked up. After obtaining the action, the action-role mapping is looked up in the KB to determine appropriate roles (β) of actors who can perform the specific action. From the role, particular actors, e.g. 
employees who need to perform the action, are found. Finally, a notification message is sent to the actors (as well as any other event subscribers), the content of the message being enriched with relevant context. However, the event processing workflow does not end here, since it is unknown whether the suggested action was actually performed. In such cases, escalation may need to be performed to ensure that appropriate action is taken and the entity reaches its desired goal state.

3.5 Event Escalation Scenario

Escalation refers to the case when an event is detected and a notification is sent to an actor to take a certain action, but no reply from the actor is received (probable reasons for which may be the actor being inactive, the actor being unable to perform the action, or the action being performed but the reply message getting lost). Conventional event models would stop processing the event here, or retry transmission of the action message, and wait until a response is received or someone finds the root cause of the problem. It would be better if a proactive CEP system could monitor the repair action suggested to the actor, and upon exceeding a certain expected time to reply, would automatically escalate the event and perform a set of predefined actions (which could include sending a message to the supervisor of the actor or another actor/team to carry out the repair action, or aborting the event altogether). Such predefined actions can be determined from background knowledge and information about the organization and use case domain.

A step-by-step algorithm for an event processing scenario which includes event escalation is depicted in Algorithm 2. The process involves several lookup operations to obtain information from the knowledge base (KB). The algorithm shown assumes a simple event. The detected event has an associated entity, timestamp and property, as defined in Definition 1.
The process can be extended to a complex event by replacing the individual instances of entities and properties with their corresponding sets. Within the escalation loop, the algorithm begins by obtaining an estimated time for completion of the action. A message is sent to the actors with appropriate roles with details about the entity, event and this time value. The system then waits for this amount of time and then looks up associated escalation roles who can perform or manage the required action. This new set of roles is added to the role set which serves as the subscriber list for the action notification message. By including the set of roles instead of a single role, it is ensured that any actor can perform the required action and that all of them are kept coordinated about the results. If the state of the entity has not reached the desirable next state, this process is repeated. Otherwise, the escalation process is canceled and the program returns execution to the outer loop to progress further in the state model towards the goal state. If the nature of the event is such that the goal state may never be reached, appropriate error messages can be passed to the user and escalation can be terminated.

Algorithm 2 Event Escalation Algorithm
  e ← χ^ω_{τ,κ,π}
  Get ψ_current, ψ_goal
  while ψ_current ≠ ψ_goal do
    ψ_next ← LookupNextState(ψ_current, κ)
    α ← LookupAction(ψ_current, ψ_next)
    β ← LookupRole(α)
    B ← {β}
    repeat
      Get estimated t_α
      SendMsg(α, β, t_α, e)
      Sleep(t_α)
      B ← B ∪ LookupEscalationRole(B, e)
    until ψ_current = ψ_next
    ψ_current ← ψ_next
  end while

3.6 Limitations of the Model

We observed that existing generic event modeling approaches do not provide a means of representing an end-to-end event processing workflow. We designed a conceptual model, PoEM, to be comprehensive while not being restricted to one application domain. We introduced the foundational, event and processing concepts for building the PoEM model.
PoEM includes state-based goal planning and event escalation. The event model presented here is limited to identification of key concepts and relationships between them. It assumes that all data is available at a central location holding decision-making capability. The middleware required to implement such an event processing model has not been discussed in detail and needs to be driven semantically as well. There are various components of background knowledge (some of which are obtained from domain experts) which we need to know prior to using the model. In the next chapter, we show how we can learn parameters for the event model automatically from raw sensor data by using an advanced machine learning approach for time series mining. This provides us a way to automatically learn event detection rules without explicit and expert domain knowledge about the application.

Chapter 4
Event Recognition from Temporal Sensor Data

Detecting and responding to real-world events is an integral part of any enterprise or organization, but semantic computing has been largely underutilized for complex event processing (CEP) applications. A primary reason for this gap is the difference in the level of abstraction between the high-level semantic models for events and the low-level raw data values received from sensor data streams. In this chapter, we investigate the need for semantic computing in various aspects of CEP, and intend to bridge this gap by utilizing recent advances in time series analytics and machine learning. We build upon our Process-oriented Event Model from the previous chapter, which provides a formal approach to model real-world objects and events, and specifies the process of moving from sensors to events. We extend this model to facilitate semantic time series data mining directly over sensor data, which provides the advantage of automatically learning the required background knowledge without domain expertise.
We illustrate the expressive power of our model in case studies from diverse applications, with particular emphasis on non-intrusive load monitoring (NILM) in smart energy grids. We also demonstrate that this powerful semantic representation is still highly accurate and performs on par with existing approaches for event detection and classification.

4.1 Motivational Scenario in Non-intrusive Load Monitoring

With the rise in the number of sensors and instrumentation, modern electricity grids have become Smart Grids, serving as a source of energy use data, often at a high temporal frequency. However, energy use in households is typically measured in aggregate numbers. It is more useful to report appliance-level information at a finer granularity to consumers, so they can take appropriate steps for energy conservation by managing individual appliances based on their power use patterns. The broad area of non-intrusive load monitoring [168], which includes approaches such as energy disaggregation [39], has drawn increasing attention from researchers in recent times. Energy disaggregation refers to methods that break down aggregate energy consumption into appliance-level itemized measurements without any explicit plug-level sensors.

Existing approaches [120, 5, 39, 48] for energy disaggregation include methods such as estimation of individual appliance usage by differentiating the loads of various appliances and identifying 'signatures' [48] associated with most existing consumer electronic appliances. These signatures can be measured by special sensors, along with analysis of the current-voltage patterns. This commonly used approach has several drawbacks though, as (i) installing specialized hardware is costly and time-consuming, (ii) the number of appliances in use in households is large and diverse, along with variations in human usage patterns, (iii) the signature of an appliance can vary over time depending on its mode of operation (e.g.
a washing machine operating as washer or dryer), and most importantly, (iv) if a new appliance is added to the household, this approach won't be able to learn its signature automatically. In this work, we aim to mitigate all of these drawbacks by using a machine learning approach aided by an interpretable semantic model.

NILM approaches for energy disaggregation are surveyed in [168] and [173]. Though several techniques for NILM disaggregation have been proposed, we frame the problem in the context of time series classification. ElectriSense [48] uses electromagnetic interference (EMI) signals during appliance operation to identify and classify individual appliance use. Gemello [8] provides a machine learning approach for generating a more fine-grained electricity bill for a household (from aggregate data) by comparing to similar households. Anderson et al. [5] survey different approaches for event detection, broadly classified as based on expert heuristics (such as [34]), probabilistic models (such as [10]) or matched filters (such as [122]). However, none of these methods takes a time series based approach. Shao et al. [119] mine for temporal motifs from energy consumption time series; however, they do not work with labeled data or extract discriminative features. Motifs are frequent patterns, but they are not discriminative like shapelets.

The energy disaggregation problem in a typical household poses a challenging framework for semantic computing and machine learning. The semantic computing challenge is to develop a rich representation for all the objects (i.e. individual appliances or the household as a whole), events (i.e. an appliance switching on/off) and processes involved (i.e. sending notifications in case of a power blackout). A better semantic representation for all actors and objects involved will undoubtedly aid the machine learning portion, which is to automatically detect appliance-level events from aggregate data.
Even though the event processing and semantic web communities have focused on many smart grid related applications, there has been surprisingly little work to relate them to the energy disaggregation scenario. In this chapter, we propose a comprehensive model for capturing the semantics of events and event processing, and illustrate how the model can be driven by advanced machine learning approaches to automatically find patterns from temporal power consumption data. But first, we investigate in detail the need for semantics in various common aspects of event processing, and also compare how some of the existing works try to address these issues.

4.2 Semantic Computing in Event Processing

In this section, we investigate how semantic approaches are needed in various aspects of event processing tasks. We also survey existing semantic computing based event processing systems on the extent to which they fulfill these aspects.

4.2.1 Event Detection Semantics

Detection of complex events involves monitoring multiple heterogeneous data sources, and checking certain conditions to see if an event has occurred. Event profiling is commonly used to capture the relevant details for event detection. Event profiles [56] define a predefined condition which, if satisfied, leads to detection of an event. For a simple event, profiling involves monitoring conditions and observation values to check if they are in a certain critical range (defined as an event). For complex events, profiling involves dealing with correlated events or co-occurrence of events. Semantic computing can improve event detection and profiling by reducing ambiguity in the meaning of data, and providing detection rules based upon instantaneous values of data. Semantic techniques also improve the expressivity and reasoning power of the system.
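To make the notion of an event profile concrete, a simple critical-range profile can be sketched as a predicate over observation values. The property name and range below are illustrative assumptions, not taken from a real deployment:

```python
# A simple event profile: a predicate over an observed property value.
# An event is detected when the value leaves its acceptable range.

def make_range_profile(prop, low, high):
    """Return a profile that detects an event when `prop` leaves [low, high]."""
    def profile(observation):
        value = observation[prop]
        return value < low or value > high
    return profile

# Hypothetical critical range for a temperature sensor reading.
temperature_profile = make_range_profile("temperature", low=10.0, high=80.0)

print(temperature_profile({"temperature": 95.2}))  # True  -> event detected
print(temperature_profile({"temperature": 55.0}))  # False -> no event
```

A complex-event profile would combine several such predicates over correlated observations, which is where semantic reasoning over an ontology adds expressivity.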
Even though a large amount of existing work on event processing focuses on event detection, there are only a few approaches which propose semantic event detection, such as Stojanovic et al. [132], Teymourian et al. [139], Hammar et al. [50] and Zhou et al. [170]. Moser et al. [80] propose related approaches for semantic event correlation (connecting related events and eliminating duplicates). We propose to enable semantic event detection by incorporating concepts from our proposed event model into an event ontology. Then, instances from the ontology can be used for reasoning whether certain conditions are satisfied, and certain property values fall in the critical range (for an event to occur). By using an integrated ontological repository as our data store, event correlation and elimination of duplicates is possible.

4.2.2 Event Filtering Semantics

Filtering of relevant events is another functional requirement for event processing. Filtering is necessary to eliminate large portions of event related data and only focus on the portions which are relevant to a certain context. Event filters may perform filtering based upon event type, event priority or some other context. This context can be temporal, spatial, segmentation-oriented or state-oriented, as enumerated in Section 4.2.3. For instance, in an event data stream which contains information about all automobiles in the city, a user may be interested only in the portions related to vehicles owned by her, or only those vehicles which are currently located within 10 miles of the vehicle she is located in.

Since an integrated semantic knowledge base includes information from several data sources, it can provide a wider scope for the basis of event filtering. Semantic methods can increase the expressivity and reasoning power of event filtering by using powerful semantic (SPARQL) queries.
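The automobile example above can be sketched as a context-driven filter combining segmentation context (ownership) with spatial context (distance). All data values and field names below are hypothetical:

```python
import math

# Hypothetical vehicle events: (vehicle id, owner, latitude, longitude).
events = [
    {"vehicle": "car-1", "owner": "alice", "lat": 34.05, "lon": -118.25},
    {"vehicle": "car-2", "owner": "bob",   "lat": 34.06, "lon": -118.24},
    {"vehicle": "car-3", "owner": "alice", "lat": 40.71, "lon": -74.00},
]

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Segmentation context (owner) combined with spatial context (within 10 miles).
me = {"lat": 34.05, "lon": -118.25}
mine_nearby = [e for e in events
               if e["owner"] == "alice"
               and miles_between(me["lat"], me["lon"], e["lat"], e["lon"]) <= 10]
print([e["vehicle"] for e in mine_nearby])  # ['car-1']
```

In a semantic CEP system, the same filter would be expressed declaratively as a SPARQL query over the integrated knowledge base rather than hand-coded.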
In Table 4.1, Complex Event Filtering refers to whether the SCEP system implements some sort of filtering for complex events. Semantic Event Filtering refers to the criterion that event filtering is driven by semantic concepts/rules. We also report on whether there is some special treatment for critical/high priority events, which can be filtered by assigning priorities to events and filtering those with the highest priority. Such critical event detection is central to many practical applications such as threat detection [50]. In many industries, this concept of 'management by exception' is a key indicator of the usefulness of the technology deployed.

4.2.3 Context in Event Semantics

The role of context in event processing systems has been explored in detail by Etzion et al. [33]. Four types of context are identified: temporal context, spatial context, segmentation context and state context. Most current SCEP systems implement temporal parameters and aspects, but spatial context is not found to be present in many works. Segmentation and state contexts are also not found in several state-of-the-art systems.

4.2.4 Event Notification Semantics

Notification about events is a major constituent of CEP systems. Usually such notification is sent in the form of alerts/triggers to human agents, so that they can execute further actions and decide the future course of action. Sometimes, these triggers may lead to automatic execution of certain actions in the case of software agents. The CEP system is responsible for deciding the content of the notification message, the method of notification (e.g. SMS/email), to whom the notification should be sent (the subscriber list) and how to route the notification message.
The message content can contain relevant information regarding the event, such as instantaneous and historical data values, various timestamps (when the event occurred, when it was detected, when it was reported), actions related to the event (what actions have been taken already, what are the best practices associated with such an event, what actions still need to be taken), processes associated with the event (which processes must be rescheduled in order for the action associated with this event to take place), and entities related to the event (people who are involved, people who can act as experts about this event, resources which are involved, agents which must be instructed to take further actions). Advanced CEP systems should be able to dynamically build the message content as well as subscription lists for events based on the event context, possibly benefiting from background knowledge provided by a semantic model.

Notifications may need to be repeated, or sent iteratively with a certain frequency in many cases. This frequency can be determined from background knowledge as well as the user's preferences. We propose to implement the event escalation pattern for CEP. For instance, take the example of a patient whose health and vital stats are being monitored by a CEP system. Assume that the patient needs to be administered a certain medicine by the nurse when her heart rate is found to be abnormally high. At a certain point of time, this event occurred and the nurse in charge was notified. However, she does not see the message or respond for a certain time. In such a case, the doctor in charge of the patient, or other hospital staff, can automatically be notified (possibly based on their proximity to the patient). This capability to wait for a response to an action for a suggested time, and if there is still no response, implement a pre-defined action, is known as event escalation.
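The escalation pattern just described (notify, wait for the expected completion time, then widen the subscriber set) can be sketched as below. The function and role names are illustrative; a real CEP system would check the entity state against the knowledge base rather than calling a stubbed callback:

```python
import time

def escalate(event, roles_in_order, action_done, wait_seconds, notify):
    """Notify successively wider sets of roles until the action is confirmed.

    `roles_in_order` lists who to add at each escalation step (e.g. nurse,
    then doctor, then nearby staff); `action_done` checks the entity state.
    """
    subscribers = []
    for role in roles_in_order:
        subscribers.append(role)      # widen the subscriber set
        notify(subscribers, event)    # everyone added so far stays coordinated
        time.sleep(wait_seconds)      # wait the expected completion time
        if action_done():
            return True               # desired state reached; stop escalating
    return False                      # escalation chain exhausted

# Example run with a stubbed state check that succeeds on the second attempt.
attempts = []
def done():
    return len(attempts) >= 2
def log(subscribers, event):
    attempts.append(list(subscribers))
escalate("high-heart-rate", ["nurse", "doctor", "staff"], done, 0, log)
print(attempts)  # [['nurse'], ['nurse', 'doctor']]
```

Keeping all previously notified roles on the subscriber list mirrors Algorithm 2, where the escalation role set B only grows until the next state is reached.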
Notification rules can use concepts from the ontology, and the ontology needs to make provisions for efficient notification processes.

Event escalation is another useful feature in a CEP system, and we discuss it in detail in Section 3.5. The role of the CEP system does not end with determination of what action should be taken for which event; the system needs to have a 'feedback' loop to verify that the action was actually performed. Event escalation provides this feedback loop. Existing semantic models do not incorporate event escalation.

4.2.5 Action Semantics

After an event is detected, it is important to take the right action for that event, since the usefulness of event processing is in reacting to events quickly. Based on semantic knowledge and historical data, actions can be suggested and best practices can be listed for a certain event. These actions can be included in the event notification message. Actions to be suggested could be reporting actions (such as logging or monitoring), or correcting actions (such as repair or maintenance). Actions can be driven by semantic concepts and rules. Best practices discovery is very useful in enterprise systems where finding expert advice can be costly. These best practices can be integrated with business rules or company-specific practices.

4.2.6 Prediction Semantics

Predicting future event occurrences accurately based on historical data and recent trends is important for an effective CEP system. Semantics can help in providing relevant background knowledge to serve as training data for a prediction system. Prediction involves predicting simple events, complex events, and situations which are inferred from these events. Discovery of event patterns and adding them to the knowledge base for future reference is also a desired functionality. In this work, we focus our efforts on the detection of representative subsequences
The patterns detected by our approach have predictive power to classify future time series data quickly into one of several previously seen categories. We compare several aspects of these types of identified semantics in some exist- ingSCEPsystemsbaseduponthecontextofevent-basedsystemsinTable4.1. The approaches compared are (1) Stojanovic et al. [132], (2) Teymourian et al. [139], (3) Taylor et al. [137], (4) Liu et al. [72], (5) Hammar [50], and (6) Zhou et al. [170]. 4.3 Incorporating Temporal Pattern Mining in the PoEM Model In this section, we propose an enhancement to our event model to incorporate theminingoftemporalpatterndirectlyfromrawsensordata. Integratingtemporal pattern mining helps us automatically learn parameters for the PoEM model, and also facilitates the use of advanced machine learning approaches with predictive capabilities. We focus on a particular time series classification approach called time series shapelets [164], which are well-suited to our motivational use case in non-intrusive load monitoring. We also exhibit the utility of our shapelet-based approach on a real, large-scale, open NILM dataset. 4.3.1 Time Series Shapelets for Classification Shapeletshavebeenappliedforavarietyoftimeseriesdataminingapplications in diverse fields [91, 92, 95, 90, 94, 100, 89, 101, 53, 82, 164, 162]. A shapelet is a subsequence in a time series that is discriminative and has predictive power based on the distance of a new time series to the shapelet. 
Mathematically, they are identified using an information gain criterion (trying to find the subsequence in the training time series data which can discriminate best between the classes in the data).

Table 4.1: Comparison of event-related semantics in some SCEP approaches

Approach                              [132] [139] [137] [72] [50] [170]
Overview of Capabilities
  Semantic CEP Engine                   X     X     ×    ×    –    X
  Semantic Enrichment/Annotation        X     X     X    X    X    X
  Generic Event Ontology                X     X     ×    X    ×    ×
  Isolated Domain Ontology              ×     X     X    X    ×    X
  Connection to Linked Data             ×     ×     ×    X    ×    ×
Detection Semantics
  Semantic Event Detection              X     X     –    –    X    X
  Complex Event Detection               X     X     X    X    X    X
  Semantic Event Correlation            ×     ×     ×    ×    X    ×
  Event-driven Detection Rules          X     X     X    X    X    X
Filtering Semantics
  Semantic Filtering/Inferencing        X     –     ×    ×    X    X
  Complex Event Filtering               X     X     –    ×    X    X
  Event Priority Assignment             ×     ×     ×    ×    X    ×
Context Semantics
  Temporal Context                      X     X     –    X    X    X
  Spatial Context                       ×     X     ×    ×    X    X
  Segmentation/State Context            ×     ×     ×    ×    ×    ×
Notification Semantics
  Semantic Triggers/Alerts              X     X     X    –    X    X
  Dynamic Message Content               ×     ×     ×    ×    ×    ×
  Dynamic Subscription Lists            ×     ×     ×    ×    ×    ×
  Event Escalation                      ×     ×     ×    ×    ×    ×
Action Semantics
  Semantic Action Selection             –     X     ×    ×    ×    ×
  Best Practices Discovery              ×     ×     ×    ×    ×    ×
Prediction Semantics
  Pattern Discovery from Sensor Data    ×     ×     ×    ×    ×    ×
  Incorporating Time Series Mining      ×     ×     ×    ×    ×    ×

Useful extensions of shapelets are Logical Shapelets [82], which consider logical combinations of multiple shapelets, and Local Shapelets [162], which consider approximate shapelet mining for early classification of time series and can be used in a real-time streaming scenario. In our evaluation, we use a faster, enhanced, state-of-the-art version of the original supervised shapelet-based classification algorithm, called Fast Shapelets [110]. Shapelets find local discriminative patterns in time series data, which, in contrast to other methods which find global patterns, gives us more predictive power with faster classification times for many use cases.
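The information gain criterion can be sketched as follows: the training series are split by their distance to a candidate subsequence, and the split that best separates the classes is preferred. The distances and labels below are toy values, not drawn from our dataset:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class label multiset."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(distances, labels, threshold):
    """Gain from splitting series by their distance to a candidate shapelet."""
    left  = [y for d, y in zip(distances, labels) if d < threshold]
    right = [y for d, y in zip(distances, labels) if d >= threshold]
    n = len(labels)
    split_entropy = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - split_entropy

# Distances of six training series to one candidate; a threshold of 5.0
# separates the 'event' class (small distances) perfectly.
dists  = [1.2, 2.0, 3.1, 8.5, 9.0, 10.4]
labels = ["event", "event", "event", "no-event", "no-event", "no-event"]
print(information_gain(dists, labels, 5.0))  # 1.0 (perfect split of 2 classes)
```

Shapelet mining searches over all candidate subsequences and thresholds for the pair maximizing this gain; Fast Shapelets accelerates that search with a randomized symbolic approximation.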
More details of the shapelet extraction algorithm, as well as existing related work, have been discussed in Section 2.5.

An illustration of a shapelet is shown in Figure 4.1. This shapelet was automatically extracted from our evaluation dataset for the task of event detection, or in this evaluation context, the task of determining whether or not an appliance was switched on within this evaluation time window given the aggregate power use data (details of our evaluation are provided in Section 5.2.1). This shapelet has a length of 156 and a distance threshold of 11.0253, and was extracted from a time series in our training set which belonged to the 'event' class (appliance was switched on). Visually, we can see that the shapelet can be interpreted as referring to the shape of change in aggregate power use following what looks like an event of an appliance being switched on. The power consumption suddenly increases and then decreases and remains stable, which is typical following the switching on of an appliance. In this experiment, this was the only shapelet extracted from our training dataset (of 922 time series, each being 360 data points long), and thus, after shapelet extraction we discard the entire training dataset of 922 × 360 = 331,920 data points and store just the 156 data points in the shapelet subsequence and the distance threshold value.

Figure 4.1: An illustration of a shapelet extracted from our evaluation dataset. This shapelet (denoted by S1), extracted automatically from power use data in a household recorded by sensors, is able to semantically capture the event of an appliance switching on without any explicit background knowledge about the appliance or the non-intrusive load monitoring domain.

Figure 4.2: Shapelet-based decision tree for classification of a new test instance; the shapelet S1 is the one extracted in the previous figure.
Figure 4.2 shows how classification of a new instance happens with the shapelet-based decision tree formed from this shapelet. If the distance of the new test time series is less than the threshold, the test instance is predicted to be an event (an appliance switched on within the time window of the test instance); otherwise, it is predicted to be a non-event. This shapelet, by itself, is able to achieve a 98.6% classification accuracy on the test dataset, as shown in Table 5.2. Semantically, this shapelet captures the event of an appliance switching on automatically without any background knowledge. Shapelets are a simple but powerful tool to capture the local discriminative characteristics of events within sensor data.

4.3.2 Shapelet-based Event Detection

Time series shapelets provide an intuitive way to realize the design of our proposed event model in practice. They enable the model to operate directly on raw sensor data (typically time series), and automatically learn the required domain-specific parameters by the use of machine learning. As shapelets relate to the detection of critical happenings in time series, they are closely related to the idea of detecting 'events' from time series. Initial efforts towards finding a similar automatic link between time series shapelet mining and event processing have recently been proposed in autoCEP [81], but no event processing rules or interpretations, or event model, is explicitly provided, and their approach is not driven by the rich expressive and reasoning power of semantic computing.

To explore the link between these two areas further, we start by revisiting how a shapelet-based classifier classifies a new test time series instance. For simplicity, consider the case of a dataset with two classes, and just one shapelet, sh, of length m, with a distance threshold d_th, in the decision tree.
The predicted class of a test time series t_test, which has a distance d_test from sh (found by sliding the smaller-length shapelet across the longer test time series, computing the Euclidean distance at each point, and reporting the minimum), can be represented as:

prediction(t_test) = { same class as the shapelet,      if d_test < d_th
                     { opposite class from the shapelet, if d_test ≥ d_th    (4.1)

Notice the similarity of the structure of this decision rule to the structure of our concept of interpretations and interpretation sets, proposed in our PoEM event model. The example shown here is a binary classification instance with just one shapelet, but it is straightforward to extend this to a multi-class scenario with multiple shapelets, by just adding branches to this decision rule. This decision rule is equivalent in structure to our interpretation set, and the decision rules used to make predictions for a shapelet-based classifier essentially become the new event processing and event detection rules. All parameters in this rule, including the distance threshold and class label identification, are found automatically from the data by the powerful shapelet extraction method. An automatic representation for an event learned from this interpretation can be the interpretation itself.

Based on the output of a shapelet-based classifier (denoted by SBC), we can build an interpretation set for events automatically. Here is an example with multiple classes. The task here is to identify which appliance caused a detected event (which appliance was switched on).

X_switch-on-event = { Refrigerator, if SBC predicts the Refrigerator class
                    { Lights,       if SBC predicts the Lights class
                    { Fan,          if SBC predicts the Fan class
                    { Other,        if SBC predicts another appliance class    (4.2)

Timestamped data (time series) can be easily collected from several different applications, possibly at very high frequencies, making this connection between shapelets and our proposed event model applicable to a large number of domains.
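The sliding-window distance and the decision rule of Equation 4.1 can be sketched directly. The toy shapelet and series below are illustrative, not the shapelet S1 from our evaluation:

```python
import math

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    equal-length subsequence of the (longer) series."""
    m = len(shapelet)
    return min(
        math.sqrt(sum((series[i + j] - shapelet[j]) ** 2 for j in range(m)))
        for i in range(len(series) - m + 1)
    )

def predict(series, shapelet, d_th, shapelet_class, other_class):
    """The decision rule of Equation 4.1: same class as the shapelet
    if the distance falls below the learned threshold."""
    return shapelet_class if shapelet_distance(series, shapelet) < d_th else other_class

# Toy example: the shapelet is a sharp step (an appliance switching on).
shapelet = [0.0, 0.0, 5.0, 5.0]
event_series     = [0.1, 0.0, 0.0, 5.1, 5.0, 4.9, 5.0]
non_event_series = [0.1, 0.0, 0.1, 0.0, 0.1, 0.0, 0.1]
print(predict(event_series, shapelet, 1.0, "event", "non-event"))      # event
print(predict(non_event_series, shapelet, 1.0, "event", "non-event"))  # non-event
```

Extending this to the multi-class rule of Equation 4.2 amounts to comparing the test series against one shapelet per branch of the learned decision tree.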
As the shapelet mining component fits within the PoEM model by providing a new way to approach interpretations, the rest of the model does not need to be modified to accommodate this component, and the event parameters can now be learned automatically. Background knowledge from the domain can also be added on top of the shapelet mining framework to further enrich the interpretations of extracted events. An interesting future direction of research is to also incorporate semantics into the event processing middleware, for instance, to automatically deploy CEP as well as time series mining components based on the situation and maintain their lifecycle. An effective approach for managing these middleware components is enterprise integration patterns [57]. These enterprise integration patterns propose middleware components including message construction patterns, message routing patterns and message transformation patterns. An initial approach towards semantically managing these patterns in a smart grids application is proposed by us in [93].

Further, this correlation between time series shapelets (pertaining to event recognition from raw sensor data) and event detection (pertaining to event modeling as in PoEM) can be viewed from the perspective of event-condition-action (ECA) rules. A typical ECA rule is of the form "ON Event IF Condition TRIGGER Action", and is shown in Figure 4.3.

Figure 4.3: The structure of an event-condition-action rule

The structure of a shapelet-based classifier as seen in Equation 4.1 and the structure of an event-condition-action rule are actually quite similar and interchangeable. For establishing this connection, we look at the shapelet being found as the event, the distance comparison check, i.e. d_test < d_th (or vice-versa), as the condition, and the target class being predicted from Equation 4.2 as the action.
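Under this reading, a learned shapelet can be packaged as an ECA rule object. The sketch below is one possible encoding; the class and field names are our own, not a standard CEP API, and it reuses the distance threshold of shapelet S1 from Figure 4.1 as the condition:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ECARule:
    """ON event IF condition TRIGGER action (field names illustrative)."""
    event: str                          # the shapelet being matched
    condition: Callable[[float], bool]  # the d_test < d_th check
    action: Callable[[], str]           # e.g. report the predicted class

def rule_from_shapelet(name, d_th, target_class):
    """Build an ECA rule from one learned shapelet and its threshold."""
    return ECARule(
        event=f"shapelet {name} matched against an incoming window",
        condition=lambda d_test: d_test < d_th,
        action=lambda: f"report '{target_class}' event to subscribers",
    )

rule = rule_from_shapelet("S1", d_th=11.0253, target_class="appliance switched on")
d_test = 4.2  # distance of a new window to S1 (illustrative value)
if rule.condition(d_test):
    print(rule.action())  # report 'appliance switched on' event to subscribers
```

One such rule per branch of the shapelet decision tree yields a rule set that a conventional CEP rule engine can evaluate without knowing how the rules were mined.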
Thus, this approach of incorporating shapelet-based event recognition into an event model like PoEM can now be established as creating new ECA rules in an event engine in a complex event processing system.

However, all of our work till now has focused on univariate temporal sensor data. In real energy systems, we have heterogeneous sensor data coming in at a high temporal rate. In the next section, we propose new algorithms which can discover shapelets from multivariate temporal sensor data. With these new algorithms, we make it possible to extend the event recognition paradigm to multi-sensor data.

4.3.3 Extending Event Recognition to Multivariate Temporal Data

In this chapter, we are addressing the critical problem of resolving the difference in representation between high-level semantic event models and low-level sensor data streams, and bringing together the communities of semantic event processing and time series data mining. We explored the need for semantic computing in several aspects and functioning components of complex event processing systems. For efficiently detecting and processing events from Big Data streams, we proposed an approach to incorporate an advanced machine learning approach based on time series shapelets into the Process-oriented Event Model. Our combined event representation and detection framework is able to rapidly represent events in new application domains, and we also demonstrated its high accuracy in classification and prediction of new events from data streams.

As the shapelet-based approaches do not account for the background, structure, nature or source of data, they are effective for dealing with data from a diverse variety of sources, as usually observed in modern Big Data streams [159, 22]. An approach utilizing shapelet-based time series classification for heterogeneous multidimensional time series sensor data, in the context of Big Data, has been proposed by us in [95].
Further enhancements to the shapelets algorithm, such as Local Shapelets [162], can operate on data streams without needing all of the training time series in memory for shapelet extraction. This enhancement opens up the potential to extend our temporal pattern mining approach to high-velocity Big Data streams. Thus, our approach can be adapted for automatic detection of domain-specific temporal data patterns even for Big Data streams. The approach proposed in this chapter still has a critical limitation: it is limited to processing and analyzing univariate (or single sensor) time series data. In practical applications, we often have multiple sensor sources, possibly heterogeneous, generating time series data, and we need to automatically identify shapelets from multivariate time series data to process these complex datasets. In this section, we propose algorithms for solving this problem of multivariate time series classification (MTSC). Specifically, we describe our efforts on developing algorithms for detecting shapelets from multivariate time series data, and using these shapelets for classification. There are some existing methods for extracting shapelets from univariate time series data; the state of the art is Fast Shapelets [110]. However, related work (such as [43, 44, 59]) on shapelet extraction from multidimensional time series data is limited and has several restrictions. This related work and its shortcomings have been reviewed in detail in Chapter 2. An illustration of shapelets can be seen in Figure 4.4, which shows shapelets (in red) extracted from a gesture identification dataset used in our evaluation, representative of shake and pick-up gestures. This dataset records accelerometer sensor data for different human users performing gestures, along the x, y and z axes at a sample rate of 100 Hz. These shapelets were extracted automatically from the raw sensor data using a shape mining algorithm.
They now act as discriminative features and contain sufficient information (without using any of the remaining training data) to classify a new test data instance, based on its distance from the shapelets, as either a shake or a pick-up gesture. The different shapes of shapelets extracted from the two activities can be easily seen.

Figure 4.4: Shapelets (shown in red) extracted from time series representing a shake gesture (top) and a pick-up gesture (bottom). The Y-axis shows accelerometer sensor values while the X-axis shows time at a 100 Hz frequency. Shapelets are discriminative features (time series subsequences) in data with the predictive power to perform classification on new instances.

We propose two complementary shape mining approaches for MTSC to solve the multisensor event recognition problem. The first, called Interleaved Shapelets (ILS), performs data manipulation and feature ranking to generate a univariate representation of the original multisensor data, and then performs shapelet-based classification to solve the MTSC problem. By interleaving temporal segments of data across sensors, ILS enables us to capture local temporal dependencies across sensor dimensions. A complementary approach is to perform univariate shapelet extraction first, and then feature weighting to create a multivariate classifier from multiple individual univariate classifiers. We propose an algorithm called Shapelet Forests (SF) based on this idea of combining shapelet extraction with feature selection. Our proposed SF algorithm can use a suite of feature selection approaches. We evaluate our approaches against a baseline classifier built using a state-of-the-art shapelet mining algorithm.
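To make the notion of a "discriminative subsequence" concrete, the following is a brute-force sketch of univariate shapelet discovery. Real extractors such as Fast Shapelets avoid this exhaustive search through pruning and randomized projections; the toy series, labels and split criterion below are invented for illustration.

```python
# Brute-force shapelet search sketch: try every subsequence of a given length as
# a candidate, and pick the one whose distance threshold best separates classes.

def subseq_dist(series, cand):
    m = len(cand)
    return min(
        sum((series[i + j] - cand[j]) ** 2 for j in range(m)) ** 0.5
        for i in range(len(series) - m + 1)
    )

def best_shapelet(dataset, labels, length):
    """Return (shapelet, threshold, errors) minimizing misclassified series."""
    best = (None, None, len(dataset) + 1)
    for series in dataset:
        for start in range(len(series) - length + 1):
            cand = series[start:start + length]
            dists = [subseq_dist(s, cand) for s in dataset]
            for th in dists:  # try each observed distance as a split point
                pred = [1 if d < th else 0 for d in dists]
                errors = min(
                    sum(p != y for p, y in zip(pred, labels)),  # class 1 near shapelet
                    sum(p == y for p, y in zip(pred, labels)),  # class 0 near shapelet
                )
                if errors < best[2]:
                    best = (cand, th, errors)
    return best

series = [[0, 0, 5, 0, 0], [0, 5, 0, 0, 0], [0, 0, 0, 0, 0], [0, 1, 0, 1, 0]]
labels = [1, 1, 0, 0]  # peak-containing series are class 1
shapelet, threshold, errors = best_shapelet(series, labels, length=3)
```

On this toy data the peak subsequence perfectly separates the two classes.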
For our experiments in the following sections, we use real sensor data from two use cases: (i) multisensor data from a silicon wafer chip manufacturing industry (the target being to predict whether the wafer was manufactured normally or has an anomaly), and (ii) accelerometer data from human activity/gesture identification records (the target being to identify the gesture performed). Both these datasets are publicly available. In Chapter 5, we describe in detail several use cases related to energy applications, and the performance of our algorithms on those datasets.

4.4 Algorithms for Multivariate Shape Mining

In this section, we explore various strategies for extending the existing univariate shapelet extraction method to multisensor time series data, and propose new approaches to do so.

4.4.1 Baseline approaches

First, we describe a baseline approach for multivariate time series classification using shapelets, which we call Naive Shapelets (NS). This approach simply uses the univariate shapelet extraction method on time series data from individual sensors and then aggregates the predictions of the univariate shapelet-based decision trees by majority voting. We assign the instance label y_i to each of the n time series in T_i. This gives us n univariate time series with label y_i. Suppose there are two classes, i.e. y_i ∈ {0, 1}, with k_0 instances of Class 0 and k_1 instances of Class 1, where each instance has n time series. Using this approach, we end up with k_0 × n univariate instances of Class 0 and k_1 × n univariate instances of Class 1. We can then use algorithms such as Fast Shapelets [110] to extract shapelets. To classify a new multivariate instance T_i, NS uses the decision tree learned by the Fast Shapelets algorithm (as in the case of classifying univariate time series). For a multivariate instance with n time series, it first uses the decision tree to learn the label for each univariate time series. It then computes the final label, e.g.
normal or abnormal, by majority voting, i.e., the final label is the label assigned to the majority of the univariate time series. This approach ignores any "correlations" between time series from the same instance, and thus we refer to it as Naive Shapelets (NS), analogous to the Naive Bayes approach. It implicitly assumes that all n time series are useful for classification, and may not perform well when only a subset of the n time series are best suited for distinguishing between different classes. Hu et al. make the same observation [59]. Another simple approach for converting the multisensor time series of an instance to a single time series is to concatenate them; a few works have suggested this approach (including Logical-Shapelets [82]). A drawback of this approach is high computational complexity. If the concatenated time series has a large number of samples (as is the case for our manufacturing dataset), the training phase can take a long time, since the running time of shapelet extraction is quadratic in the number of samples in a time series. Additionally, as in Naive Shapelets, this approach may not work well when only a subset of the n time series capture the differences between the normal and abnormal classes. In practice, moving from univariate to multivariate time series shapelet mining typically involves a shapelet extraction phase (on univariate time series) and a data joining phase (to combine either the time series data instances or the predictions of univariate classifiers based on individual sensors).
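The per-sensor majority voting used by Naive Shapelets can be sketched as follows; the per-sensor classifiers here are stand-in functions rather than the shapelet-based decision trees learned by Fast Shapelets.

```python
# Sketch of the Naive Shapelets (NS) aggregation step: each sensor's univariate
# classifier votes independently and the majority label wins.
from collections import Counter

def naive_shapelets_predict(instance, per_sensor_classifiers):
    """instance: list of n univariate series; one classifier per sensor."""
    votes = [clf(series) for clf, series in zip(per_sensor_classifiers, instance)]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical 3-sensor instance where two of the three sensors vote "normal".
classifiers = [
    lambda s: "normal" if max(s) < 10 else "anomaly",
    lambda s: "normal" if s[0] == 0 else "anomaly",
    lambda s: "anomaly",
]
label = naive_shapelets_predict([[0, 1, 2], [0, 5], [3, 3]], classifiers)
```

Because voting treats sensors independently, any cross-sensor correlation is lost, which is exactly the limitation the approaches below address.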
Depending on the order of these two operations, we can observe two broad strategies: (1) first create a univariate time series representation of the original multivariate data (early join), and then use existing univariate shapelet mining approaches, or (2) modify the shapelet mining algorithm discussed in the previous section to work directly with multivariate data, possibly using the univariate shapelet extraction method first on individual sensor dimensions and then combining the individual predictions of these univariate classifiers (late join). We propose two new complementary algorithms: the first (Interleaved Shapelets) following the early join strategy, and the second (Shapelet Forests) following the late join strategy.

4.4.2 Interleaved Shapelets (ILS)

From Chapter 2, we observed two different methods for a strategic solution to the MTSC problem using shapelets: concatenation methods and ensemble-based methods. We now describe how aspects from both concatenation and ensemble-based methods can be combined to yield a novel multivariate shapelet-based time series classification algorithm that is more effective than existing methods for cases where the relative position of discriminative subsequences in different sensors varies. We first illustrate the intuition behind our approach. Consider a case with two sensors where a time series is to be classified as belonging to the positive class if a distinct shape (e.g., a peak) appears in Sensor 1's data stream followed by its appearance in Sensor 2. The appearance of these shapes in any other order indicates that the time series is to be classified as the negative class. Two instances of this type are shown in Figure 4.5. In this case, most existing multivariate shapelet-based classification algorithms that extract shapelets from each sensor independently of the other would fail, since the mere presence or absence of the shapelets is not sufficiently discriminative.
On the other hand, if the two sensor streams are concatenated, a (single) discriminative shapelet composed of the distinct shapes from each sensor can be identified, as shown in Figure 4.6. However, the shapelets extracted based on concatenating fixed-length segments are not local: the length of the shapelet is dependent on the length of the window used for concatenation. In addition, the discriminative capacity of such multivariate shapelets depends on the relevance of each sensor stream to the class; the inclusion of time series segments from irrelevant sensors in the concatenated shapelet will increase the error rate in shape pattern matching during classification. We develop an interleaving and sensor ranking-based approach to make the extracted multivariate shapelets invariant to the length of segments and the number of sensors. Even though the concatenated signature or shapelet found may cross boundaries between the two sensors (thus consisting of a pattern which does not actually exist in any one sensor), this does not matter as long as it is a discriminative pattern, since we apply the same pre-processing to concatenate both the training and testing data.

Figure 4.5: Two multivariate instances, each consisting of two time series (two sensors). The peak in the data is a potential shapelet candidate, but it is not discriminative enough to differentiate between the two instances (which belong to opposite classes), because a similar pattern occurs in all four time series.

Figure 4.6: Time series formed by concatenating data from all sensors for each instance in the previous example. Now the potential shapelet candidate from the concatenated time series can discriminate between the two classes.

We explore interleaving measurements from multiple sensors as a means of applying univariate shapelet-based classification algorithms to multivariate time series.
Following this step, shapelets can be extracted by jointly considering the discrimination ability across all sensors. Interleaving can be treated as a generalization of concatenating time series from different sensors. While naive concatenation also enables measurements from multiple time series to be considered together, shapelets cannot be extracted since the size of the shapelet search window depends on the length of each time series itself. Our proposed approach is called Interleaved Shapelets (ILS), and the overall approach is depicted in Figure 4.8. The idea is to interleave time series segments across sensors from multiple dimensions to form the final concatenated one-dimensional time series for each instance. The simplest way to do this is fixed-interval interleaving with segment size k: we regularly cut the time series and interleave the segments, i.e. the first k elements of the first sensor, then the first k elements of the second sensor, and so on until the first k elements of the last sensor, after which we concatenate the (k+1)-th to 2k-th elements of the first sensor again, and so on, as shown in Figure 4.7. In the end, we have the same number of univariate time series as the number of instances.

Figure 4.7: Interleaving data from multiple sensors

To improve upon our proposed approach, we rank the sensors before performing the interleaving and order the data according to the sensor ranks, from most important to least. By doing this, we ensure that (i) the shapelet extraction algorithm encounters data from the highly ranked sensors first, where a shapelet is more likely to be found, and (ii) we can eliminate data from the lower ranked sensors completely, leaving them out of the interleaving process (they would otherwise be at the end of each segment concatenation round). To implement this ranking scheme, we divide the training data into a training and validation set.
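The fixed-interval interleaving step can be sketched as follows; this is a minimal illustration, and the sensor readings and ordering below are invented.

```python
# Sketch of ILS fixed-interval interleaving: cut each sensor's series into
# segments of length k and alternate segments across sensors (optionally ordered
# by a sensor ranking) to form one univariate series per instance.

def interleave(instance, k, sensor_order=None):
    """instance: list of equal-length univariate series, one per sensor."""
    if sensor_order is None:
        sensor_order = range(len(instance))  # ranked most to least important
    ordered = [instance[i] for i in sensor_order]
    length = len(ordered[0])
    out = []
    for start in range(0, length, k):
        for series in ordered:
            out.extend(series[start:start + k])
    return out

# Two sensors, segment size k = 2: the first two samples of sensor 0, then the
# first two of sensor 1, then the next two of sensor 0, and so on.
merged = interleave([[1, 2, 3, 4], [5, 6, 7, 8]], k=2)
```

Passing a `sensor_order` list reorders segments so higher-ranked sensors appear first in each round; dropping low-ranked sensors from the list removes them entirely.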
The ranking of the sensors is based upon using the shapelets extracted from the training set to perform classification on the validation set. We also z-normalize the univariate time series finally constructed from each instance to have zero mean and unit standard deviation.

Figure 4.8: Interleaved Shapelets approach

4.4.3 Shapelet Forests (SF)

The ILS algorithm converts multivariate time series data to an equivalent univariate representation. The alternative is to design new shapelet mining algorithms that can work directly with multivariate data. We propose such a new approach called Shapelet Forests (SF) that works directly with multivariate data (and is equivalent to Fast Shapelets or Naive Shapelets for univariate instances). SF training phase. For extracting shapelets, i.e. in its training phase, SF takes the three steps shown in Figure 4.9. In the first step, it uses a univariate shapelet mining algorithm separately for each of the n time series. E.g., given k_0 instances of Class 0 and k_1 instances of Class 1, where each instance has n time series, SF first performs n different univariate shapelet extractions, where for time series i, i = 1, ..., n, we have k_0 univariate time series labeled as Class 0 and k_1 labeled as Class 1. This step can give more than n shapelets. The number of shapelets extracted from each of the n dimensions depends on the data. Intuitively, time series that capture the differences between classes are more likely to provide one or more shapelets than time series that are the same or similar across classes. The shapelets extracted in the first step by SF represent the features of multivariate time series data, and typically a subset of these are discriminative, i.e. help us classify a new instance as normal or abnormal.
The second and third steps in SF, labeled as Compute Data/Distance Matrix and Learn Feature Weights in Figure 4.9, accomplish the goal of identifying the best subset of shapelets for differentiating between normal and abnormal instances. The Compute Data/Distance Matrix step is needed to convert the multivariate time series data into the format accepted by the typical feature selection algorithms that SF uses to learn feature weights. Our key contribution here is to do this using the shapelets extracted in the first step of SF.

Figure 4.9: Shapelet Forests Algorithm

Distance Matrix. In typical feature selection approaches [105, 17, 169], a data matrix D = {d_ij} is provided as input, where instances are rows and features are columns of this matrix. However, in our data, each column or feature is a time series dimension or sensor, and hence d_ij is not a scalar but a sequence of numerical values. We propose an intuitive transformation of multivariate time series data into the data matrix format that is required by state-of-the-art feature selection algorithms. Instead of using a sensor or time series dimension, we use the extracted shapelets as features for the columns of our data matrix, and the instances make up the rows. Each entry d_ij is filled with the distance of the time series instance to the corresponding shapelet, computed on the sensor the shapelet originated from. E.g., suppose we want to compute the entry d_{3,2}. This corresponds to the third instance and the second shapelet extracted from the data. We find out which sensor the second shapelet was from. Say it was from sensor 1; then the distance is computed between the second shapelet and the univariate time series corresponding to sensor 1 in the third instance, i.e. distance(T^1_3, s_2), where s_2 is the second shapelet. Since the shapelets represent the columns, it is possible to have more than n columns in the data matrix.
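The distance matrix construction can be sketched as follows; the shapelet, its originating sensor and the instances below are invented stand-ins.

```python
# Sketch of the Compute Distance Matrix step: entry D[i][j] is the distance from
# the sensor series of instance i that produced shapelet j to shapelet j itself.

def subsequence_distance(series, shapelet):
    m = len(shapelet)
    return min(
        sum((series[i + k] - shapelet[k]) ** 2 for k in range(m)) ** 0.5
        for i in range(len(series) - m + 1)
    )

def distance_matrix(instances, shapelets, origins):
    """instances: p instances, each a list of n sensor series.
    shapelets: z shapelets; origins[j] is the sensor index that produced shapelet j."""
    return [
        [subsequence_distance(inst[origins[j]], s) for j, s in enumerate(shapelets)]
        for inst in instances
    ]

instances = [
    [[0, 0, 9, 0], [1, 1, 1, 1]],  # instance 0: sensor 0 contains the peak
    [[0, 0, 0, 0], [1, 1, 1, 1]],  # instance 1: sensor 0 is flat
]
shapelets = [[0, 9, 0]]  # one shapelet, extracted from sensor 0
D = distance_matrix(instances, shapelets, origins=[0])
```

The resulting p × z matrix is exactly the input format expected by the feature selection methods discussed next.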
The size of the data matrix is equal to the number of instances times the total number of shapelets. Learning weights for shapelets. After transforming time series data into a matrix using shapelets, we can use state-of-the-art feature selection algorithms [37, 49] to learn the weights for each shapelet. These weights are then used for weighted voting to classify a new instance, as shown in Figure 4.9. In this paper, we explore several feature selection methods (some of them being commonly used machine learning techniques) to obtain weights/coefficients of the features for use in our feature selection approaches. We used the scikit-learn Python package [103] to implement most of these feature selection approaches (with the default parameters). mRMR [105]. mRMR uses the minimum Redundancy Maximum Relevance criterion, which ranks features based on how closely related they are to the class labels, while requiring that the set of highly ranked features not be redundant (i.e., not correlated with each other). Lasso [142, 37, 52]. Lasso (least absolute shrinkage and selection operator) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces. It is a linear model that estimates sparse coefficients. It is useful due to its tendency to prefer solutions with fewer parameter values, effectively reducing the number of variables upon which the given solution is dependent. Mathematically, it consists of a linear model trained with an l1 prior as regularizer. The objective function to minimize is:

min_w  (1 / (2 n_samples)) ||Xw − y||_2^2 + α ||w||_1     (4.3)

The lasso estimate thus solves the minimization of the least-squares penalty with α ||w||_1 added, where α is a constant and ||w||_1 is the l1-norm of the parameter vector.
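A small numeric illustration of this objective (pure Python, not the scikit-learn implementation): on toy data where only the first feature predicts y, the l1 penalty makes a sparse weight vector score better than a dense one. The data and candidate weight vectors are invented.

```python
# Evaluating the lasso objective of Equation 4.3 for two candidate weight vectors.

def lasso_objective(X, y, w, alpha):
    n = len(X)
    residual = sum(
        (sum(x_i * w_i for x_i, w_i in zip(row, w)) - y_j) ** 2
        for row, y_j in zip(X, y)
    )
    return residual / (2 * n) + alpha * sum(abs(w_i) for w_i in w)

# y depends only on feature 0 (y = 2 * x0); feature 1 is irrelevant.
X = [[1, 3], [2, 1], [3, 2]]
y = [2, 4, 6]
sparse = lasso_objective(X, y, w=[2.0, 0.0], alpha=0.1)  # zero weight on feature 1
dense = lasso_objective(X, y, w=[1.9, 0.1], alpha=0.1)   # small weight on feature 1
```

The sparse solution fits the data exactly and pays only the penalty term, so it attains a lower objective value; this preference for zeroed-out coefficients is what makes lasso useful for selecting a subset of shapelets.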
The α parameter controls the degree of sparsity of the estimated coefficients, and we use cross validation [31] to estimate this parameter in our implementation, as in scikit-learn [103]. Discriminant Analysis [60]. Linear Discriminant Analysis (LDA) is commonly used to perform supervised dimensionality reduction of data by projecting the input data to a linear subspace consisting of the directions which maximize the separation between classes. LDA provides a linear decision surface; an extension of this method to quadratic decision surfaces is called Quadratic Discriminant Analysis (QDA) and is also considered in our evaluation. Ensemble and tree-based methods. We also evaluate several ensemble and tree-based classifiers [28, 41] and use the feature weights they provide when they are used as classifiers. These methods include:

• Random Forests [16] - This is an ensemble classifier where each tree in the ensemble is built from a sample drawn with replacement from the training set. When splitting a node during the construction of the tree, the split that is chosen is the best split among a random subset of the features. As a result of this randomness, the bias of the forest usually slightly increases (with respect to the bias of a single non-random tree), but, due to averaging, its variance also decreases, usually yielding an overall better model.

• Extremely Randomized Trees [42] - In extremely randomized trees (Extra Trees), the randomness is taken a step further than in Random Forests. As in random forests, a random subset of candidate features is used, but instead of looking for the most discriminative thresholds, thresholds are drawn at random for each candidate feature and the best of these randomly-generated thresholds is picked as the splitting rule.

• AdaBoost [36, 24] - AdaBoost fits a sequence of weak learners on repeatedly modified versions of the data.
The predictions from all of them are then combined through a weighted majority vote (or sum) to produce the final prediction. The data modifications at each boosting iteration consist of applying and adjusting weights on each of the training samples, so the classifier can focus on learning increasingly difficult instances.

• Gradient Tree Boosting [38] - Gradient Boosting produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and generalizes them by allowing optimization of an arbitrary differentiable loss function. Thus, it is robust to outliers and can handle data with heterogeneous features.

The two methods below, MCFS [17] and SPEC [169], are based on spectral clustering and require a similarity matrix along with the data matrix. We used common similarity matrices such as the dot product matrix and Gaussian kernels. MCFS [17]: stands for Multi-Cluster Feature Selection, and it can be used for both supervised and unsupervised feature selection. In the supervised MCFS approach, the dot product kernel, Gaussian kernel or 0-1 kernel are commonly used for the similarity matrix [17]. In the unsupervised approach, k-nearest neighbors are used to construct the similarity matrix; we use k = 5. It is a two-step process based on spectral clustering [148]. First, it uses the k smallest eigenvalues (and their eigenvectors) of the graph Laplacian to embed the high-dimensional data into a low-dimensional space. Then, L1-regularized least squares optimization is used to find the best features. SPEC [169]: stands for SPECtral Feature Selection, and this algorithm provides a unified approach for both supervised and unsupervised feature selection using spectral graph theory. For the unsupervised case, the linear dot product kernel and RBF kernel are popular choices for the similarity matrix. The SF algorithm.
Algorithms 3 and 4 present the pseudo-code for the training and classification phases of our SF algorithm. FEATURE-WEIGHT() represents any of the feature selection algorithms used to assign weights to shapelet features. Since there are z ≥ n shapelets in total, we have z weights w_1, ..., w_z. A new instance is classified by first obtaining the predictions from the n univariate classifiers. Then, we compute a weighted combination of these decisions (e.g., weighted majority voting) to compute the final label. The weights for the n classifiers are computed from the weights of the shapelets provided by the feature selection algorithms. The REDUCE() method represents any procedure for computing the weights for n classifiers using the weights of the z ≥ n shapelets. A simple method is to sum up or average the weights of all shapelets from a given sensor when that sensor provides multiple shapelets. We use a more sophisticated approach that adjusts the weights of the classifiers based on the test instance being classified. We use shapelet-based decision trees to classify univariate time series. When classifying time series j from a multivariate instance using its corresponding decision tree classifier, we track the "path" through the decision tree to remember which shapelets were responsible for the label. We use only the weights of these shapelets, instead of all the shapelets in the decision tree, to set the weight of the classifier (as the average of the weights of those shapelets). This gives us an effective data reduction approach, as it is specific to each test instance and can capture its classification path to provide better results.

Algorithm 3 Shapelet Forests Training
1: procedure Shapelet-Forests-Train(I)
2:   I = {(T_i, y_i)}, T_i = [T^1_i, ..., T^n_i]
3:   for f = 1 to n do                          ▷ n features
4:     I_f = [(T^f_1, y_1), ..., (T^f_p, y_p)]  ▷ p instances
5:     [S_f, DT_f] ← FAST-SHAPELETS(I_f)
6:   end for
7:   z ← Σ |S_f|
8:   D is a matrix (p × z) such that
                                                ▷ z shapelets
9:     d_{i,j} ← distance(T^g_i, s_j)
10:      where g is the dimension which produced s_j
11:  [w_1, ..., w_z] ← FEATURE-WEIGHT(D, y_i)
12: end procedure

Algorithm 4 Shapelet Forests Testing
1: procedure Shapelet-Forests-Test(J)
2:   J = {(T_j)}, T_j = [T^1_j, ..., T^n_j]
3:   for f = 1 to n do                          ▷ n features
4:     for j = 1 to q do
5:       c^f_j ← CLASSIFY(T^f_j, DT_f)
6:     end for
7:   end for
8:   for f = 1 to z do                          ▷ z shapelets
9:     for j = 1 to q do                        ▷ q instances
10:      w^j_f ← REDUCE(w_f, S_f)               ▷ classifier weights
11:    end for
12:  end for
13:  pred(j) ← Σ_f w^j_f c^f_j                  ▷ predicted label
14: end procedure

4.4.4 Evaluation

In this section, we describe our experimental methodology and evaluation results. Datasets and experimental methodology. Our approaches can be used with any multivariate time series dataset with recordings from multiple sensors. Both of our proposed approaches also apply to any univariate time series dataset, but in that case they simply reduce to the univariate shapelet extraction method, and thus we only evaluate them on multivariate time series datasets. Even though there is a plethora of open time series sensor datasets available, the majority of them are either univariate, or synthetically generated, or not large enough for rigorous experiments, or contain measurements from different sensors which are not synchronized in time. For our evaluation here, we use the Wafer dataset [84], which captures sensor measurements (from 6 sensors) for the manufacturing process of silicon wafer chips. Each time series instance is labeled according to whether the manufacturing process for the chip was normal or anomalous. We use this information as target labels for our classification task. In total, the Wafer dataset contains 7,194 time series instances (equivalent to 1194 multivariate instances, as there are 6 sensors).
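Returning briefly to the classification phase of Algorithm 4: the simple averaging variant of REDUCE() and the subsequent weighted vote can be sketched as follows. The shapelet weights, origins and per-sensor predictions below are invented stand-ins.

```python
# Sketch of SF classification: average shapelet weights per originating sensor
# (a simple REDUCE), then take the sign of the weighted sum of sensor votes.

def reduce_weights(shapelet_weights, origins, n_sensors):
    """Average the weights of the shapelets contributed by each sensor."""
    sensor_weights = []
    for f in range(n_sensors):
        ws = [w for w, g in zip(shapelet_weights, origins) if g == f]
        sensor_weights.append(sum(ws) / len(ws) if ws else 0.0)
    return sensor_weights

def weighted_vote(per_sensor_labels, sensor_weights):
    """per_sensor_labels in {+1, -1}; the sign of the weighted sum is the prediction."""
    score = sum(w * c for w, c in zip(sensor_weights, per_sensor_labels))
    return 1 if score >= 0 else -1

# Three shapelets from sensors 0, 0 and 1; sensor 0 is far more informative.
weights = reduce_weights([0.9, 0.7, 0.1], origins=[0, 0, 1], n_sensors=2)
pred = weighted_vote([+1, -1], weights)  # sensor 0 votes +1, sensor 1 votes -1
```

The path-tracking refinement described above would average only the weights of the shapelets actually visited in the decision tree for a given test instance, rather than all shapelets from that sensor.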
This dataset is also highly imbalanced in the distribution of instances across the positive (normal) and negative (anomaly) classes, with only 127 of the 1194 multivariate instances belonging to the negative class (and the rest being positive). This imbalanced behavior is typical of practical real-world sensor datasets, such as those from modern manufacturing plants, since such systems do not fail very often. We evaluated the effect of this class imbalance on classification performance for various parameters. We also used a second dataset for evaluation, from a recent user and gesture identification study performed by Guna et al. [46]. This dataset contains accelerometer time series data, which can be treated as a 3-sensor dataset, with the sensors measuring acceleration along the x, y and z axes. The data is recorded from various users performing several gestures over the course of multiple days at a 100 Hz sample rate. We consider two gestures, pick-up and shake, performed by many users over the course of 11 days, and the target is to identify whether the gesture performed was a pick-up or a shake from the provided sensor data. We plan to make the various training and testing data subsets used in this paper openly available for reproducibility. Apart from these two datasets, we highlight several energy-related applications providing temporal sensor data in Chapter 5 and report on how our proposed algorithms perform on them. For shapelet extraction from univariate time series, we used Fast Shapelets [110], the state of the art for univariate shapelet extraction. Unless otherwise specified, we set the minimum shapelet length parameter to 20, the step size for shapelet candidate search to 1 (so all candidates are searched), and the maximum shapelet length parameter depending on the length of the shortest time series in the dataset.
We experimented with several splits between the training and testing sets, sometimes using as little as 10% of the data for training. Since shapelet-based algorithms are based on the philosophy of discarding most of the data and keeping only the most significant or discriminative portions, they typically perform almost as well with less training data as with more, and we observe this in our experiments as well. For the SF algorithm, we evaluated various feature selection methods, and for the ILS algorithm, we experimented with various segment lengths as parameters. Classification performance. First, we evaluate the performance of our proposed approaches against the baseline Naive Shapelets (NS) approach proposed in Section 4.4.1, which is based on the state-of-the-art univariate shapelet extraction algorithm (Fast Shapelets). The evaluation results for the Wafer dataset are presented in Table 4.2, and for the gesture recognition dataset in Table 4.3, with a score of 1 implying perfect classification accuracy on the test dataset. Several training/testing splits of each dataset are considered. The comparison includes three algorithms: NS, the base version of our Shapelet Forests algorithm with majority voting (no feature selection), and our Interleaved Shapelets (ILS) algorithm (for which segment sizes from 5 to 50 were considered). We can observe that either Shapelet Forests (SF) or Interleaved Shapelets (ILS) outperforms NS for all splits of both datasets, with ILS performing best for Wafer and SF performing best for the gesture recognition dataset.

            Train/Test % split
Algorithm   10/90   25/75   50/50   75/25
NS          0.96    0.92    0.92    0.89
SF          0.97    0.97    0.96    0.98
ILS         1.00    1.00    0.97    0.99

Table 4.2: Classification accuracy on the Wafer dataset for Naive Shapelets, Shapelet Forests with majority voting (no feature selection) and Interleaved Shapelets. Four train/test percentage splits of the dataset are evaluated as shown.
ILS performs the best overall for this dataset.

Feature selection for Shapelet Forests. We observed that even without any feature selection, our SF algorithm performs competitively with or outperforms the NS algorithm every time. To enhance it further, we evaluate adding feature selection to SF. As described in Section 4.4.3, SF can choose from a suite of feature selection algorithms. We evaluate several feature selection approaches as additions to SF, and the evaluation results on the Wafer dataset are shown in Table 4.4. We can observe that SF with feature selection outperforms both the baseline NS algorithm as well as SF without feature selection (using majority voting) for almost all the experimental cases considered.

            Train/Test split (days)
Algorithm   3d/8d   5d/6d   7d/4d
NS          0.91    0.93    0.93
SF          0.93    0.96    0.95
ILS         0.93    0.89    0.89

Table 4.3: Classification accuracy on the gesture recognition dataset for Naive Shapelets, Shapelet Forests with majority voting (no feature selection) and Interleaved Shapelets. The train/test split shows how many days of data were used for training and how many for testing. The data contains 11 days of measurements in total. We used three splits for evaluation: the first 3 days of data for training (next 8 days for testing), the first 5 days for training (next 6 for testing), and the first 7 days for training (next 4 for testing). SF with majority voting performs the best overall for this dataset.

Classification with class imbalanced data. We now consider the problem of dealing with imbalanced datasets, which are extremely common in practical real-world sensor data. Typically, modern industrial plants (such as the one behind the wafer manufacturing dataset we evaluate) do not have errors or anomalies often, and so there is much more data of the 'normal' (positive) class than of the 'anomaly' (negative) class.
To evaluate how the classification performance of our algorithms varies with different class balance ratios, we resampled the Wafer dataset in various ratios of positive to negative examples. We retained all of the negative instances in our derived data subset, but used only a selected number of positive instances (uniformly randomly picked from the remaining data) for each class balance ratio. We experimented with negative:positive class balance ratios of 1:1 and 1:4. We also report results on the original full dataset, which has a class balance ratio of 1:8.4. The same ratio was maintained for the train and test sets, and we experimented across multiple train/test splits. The results for the ILS algorithm are shown in Table 4.5, which also shows results over a range of segment sizes between 5 and 50. The results for the SF algorithm are shown in Table 4.6.

Table 4.4: Classification accuracy on the Wafer dataset using Shapelet Forests with various feature selection methods on different train/test splits.

                           Train/Test % split
Algorithm              10/90   25/75   50/50   75/25
NS                      0.96    0.92    0.92    0.89
SF                      0.97    0.97    0.96    0.98
SF+mRMR                 0.98    0.99    0.99    0.99
SF+Lasso                0.98    0.99    0.99    0.99
SF+LDA                  0.98    0.99    0.99    0.97
SF+QDA                  0.98    0.99    0.99    0.97
SF+RandomForest         0.97    0.99    0.99    0.99
SF+AdaBoost             0.97    0.99    0.95    0.98
SF+ExtraTrees           0.98    0.99    0.99    0.99
SF+GradientBoosting     0.98    0.99    0.99    0.99

We observe that, in general, the accuracy of ILS is not affected much by changes in the segment size parameter. For NS, SF and ILS, the accuracy is also not affected much by class imbalance in the data, and the methods are robust to varying amounts of training data as well. These algorithms perform well even when using just 10% of the data for training. This is an advantage of shapelet-based methods, as they rely on throwing away most of the data and keeping only a few discriminative subsequences (shapelets).
If a good shapelet candidate is found early in the data, the algorithm can perform competitively even with very small training sets.

Summary. We analyzed the problem of multivariate time series classification that arises in Big Data applications. We described two new complementary multivariate shape mining algorithms: Shapelet Forests (SF) and Interleaved Shapelets (ILS). SF applies the state-of-the-art shapelet-based algorithm for univariate time series to build an ensemble of classifiers, one for each time series dimension in our multivariate instances. ILS considers the local temporal dependencies across all sensors and interleaves data segments from them to build a univariate representation of the multivariate dataset, then uses the univariate shapelet extraction method for classification. We evaluated both algorithms on real-world sensor datasets, where they outperformed baselines based on state-of-the-art approaches. We also considered various practical issues arising when dealing with real sensor data, and showed that our methods are robust to imbalance in the data, small amounts of training data, and the choice of algorithmic parameters. Our approaches add new tools to existing work on multisensor event recognition from data with practical constraints, particularly those seen in complex engineering domains such as smart grids and digital oil fields.
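The interleaving step of ILS can be illustrated with a small sketch: length-k segments are taken round-robin from each sensor's series, preserving local temporal alignment across sensors. The function name and the equal-length assumption are ours, not the exact implementation:

```python
def interleave(multivariate, k):
    """Interleave length-k segments from each sensor's series (round-robin)
    into one univariate series, keeping nearby time steps of different
    sensors adjacent in the output."""
    n = len(multivariate[0])  # all sensor series assumed equal length
    out = []
    for start in range(0, n, k):
        for series in multivariate:
            out.extend(series[start:start + k])
    return out
```

The univariate result can then be fed to a standard shapelet extractor such as Fast Shapelets.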
Table 4.5: Effect of class imbalance and segment size parameter on the performance of our proposed Interleaved Shapelets (ILS) algorithm for different train/test splits of the Wafer dataset. The numbers show classification accuracy on test data (1.00 means 100% classification accuracy).

Class        k for      Train/Test % split
ratio        ILS     10/90   25/75   50/50   75/25
Full data      5      1.00    0.99    0.96    0.97
(1:8.4)       10      1.00    0.99    0.96    0.99
              15      1.00    1.00    0.96    0.99
              20      1.00    1.00    0.97    0.99
              25      0.83    0.98    0.95    0.98
              30      1.00    0.94    0.96    0.97
              40      0.86    0.92    0.90    0.96
              50      0.94    0.94    0.93    0.96
1:1            5      0.86    0.89    0.84    0.89
              10      0.86    0.83    0.84    0.93
              15      0.86    0.89    0.89    0.89
              20      0.71    0.89    0.89    0.88
              25      0.71    0.89    0.43    0.93
              30      0.71    0.78    0.86    0.91
              40      0.71    0.61    0.41    0.46
              50      0.71    0.78    0.62    0.47
1:4            5      0.83    0.91    0.88    0.94
              10      1.00    0.89    0.88    0.96
              15      1.00    0.91    0.88    0.94
              20      0.83    0.94    0.89    0.97
              25      0.89    0.72    0.87    0.96
              30      1.00    0.72    0.92    0.96
              40      0.94    0.72    0.75    0.96
              50      0.94    0.72    0.86    0.77

Table 4.6: Effect of class imbalance and amount of training data on the accuracy of Naive Shapelets (NS) and Shapelet Forests (SF) with majority voting, for different train/test splits of the Wafer dataset.

Class balance   Train/Test     NS     SF w/
ratio           split (%)             majority
1:1             10/90         0.50    0.89
                25/75         0.89    0.96
                50/50         0.85    0.91
                75/25         0.94    0.97
1:2             10/90         0.92    0.95
                25/75         0.94    0.97
                50/50         0.92    0.91
                75/25         0.96    0.94
1:4             10/90         0.97    0.93
                25/75         0.94    0.96
                50/50         0.95    0.98
                75/25         0.99    0.99
Full data       10/90         0.98    0.97
(1:8.4)         25/75         0.95    0.97
                50/50         0.92    0.98
                75/25         0.92    0.98

Chapter 5

Energy Applications

In this chapter, we describe several energy-related applications, in both event modeling and event recognition, which can benefit from our proposed approaches.

5.1 Event Modeling Applications

5.1.1 The PoEM Ontology

We developed an ontology for the PoEM model, closely following the concepts described in Chapter 3. Each concept translates to an OWL class in the ontology, and inter-relationships between concepts are represented by properties. Figure 5.1 shows an overview of the ontology schema [1]. Having an ontology for PoEM enables us to connect to open linked data.
For instance, time-related concepts can be borrowed from the W3C Time ontology [2], and geospatial concepts can be reused from the GeoNames ontology [3]. It also enables us to compare against existing models, such as the Simple Event Model (SEM) [145], and to provide cross-links to other models.

[1] The ontology was developed using TopBraid Composer(TM).
[2] http://www.w3.org/TR/owl-time/
[3] http://www.geonames.org/ontology/

Figure 5.1: PoEM Ontology Classes and Properties. Classes are shown as rectangles and literals (resulting from datatype properties) are shown in double rectangles.

5.1.2 Pump Failure in an Oil Field

Using the PoEM ontology, we model a pump failure scenario in an oilfield. As shown in Figure 5.2, consider a pump (an entity in PoEM) which has sensors to constantly monitor the pressure exerted by the load. Excessively high or low pressure values can lead to anomaly events. Consider an event detection rule to detect rapid rates of change in the pressure value of the pump. Using the concepts of measurements, observations and interpretations described earlier, such a high-pressure event can be detected, serving as an indicator of pump failure. The state-based goal planning method is triggered next, leading to identification of the current and goal states, and the sequence of actions required to move the pump to the goal state. The roles associated with the actions are queried, and relevant personnel, such as maintenance engineers, are notified. The event context should be stored, as the same notification can be sent to the production team as well, so as to avoid application or system silos. The proposed event escalation mechanism ensures reduced response times and, ultimately, higher oil production. Even though a complex event resulting from multiplicity in observations is shown here, more complex scenarios are possible as described earlier.

Figure 5.2: Pump failure event in PoEM. og: denotes an oil and gas domain ontology.
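A rate-of-change detection rule of the kind just described can be sketched as follows; the function name, window and threshold values are illustrative, and the actual rule evaluation is part of the PoEM interpretation step:

```python
def rapid_pressure_change(pressures, window, threshold):
    """Return the indices where intake pressure changed by more than
    `threshold` over `window` consecutive (e.g. hourly) samples."""
    return [i for i in range(window, len(pressures))
            if abs(pressures[i] - pressures[i - window]) > threshold]
```

Each returned index would correspond to a candidate high-pressure-change event to be interpreted and escalated through the PoEM pipeline.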
5.1.3 Power Blackout Event in a House

We model a simple power blackout event in a household. During a blackout event, all appliances in a house lose power, so a sensor measuring voltage will either not show a reading or show a flat zero. In a simplistic case, we formulate a heuristic of repeatedly checking the voltage of a few appliances (heater, refrigerator and lightbulb) to see whether their voltage is at zero. It is simple to map this complex event to our PoEM model and ontology, as shown in Figure 5.3. The appliances are entities in PoEM, voltage is an observable property with a measurement value, and the 'constant zero voltage' interpretation leads to a blackout event. If a blackout scenario is detected, the appliance entities will also be moved into an appropriate PoEM state (if not already there). Details of roles and actions are not shown in our illustration. Appropriate notifications to family members can be sent automatically if a blackout event is detected. Also, if specified, this notification event can be escalated as required; for instance, if the blackout is not fixed over a period of time, an electrician may be automatically notified. Our illustration links to a domain ontology (nilm:), which contains domain concepts such as the details of appliances and schematics for processes which might occur. Further distinction between event ontologies and domain ontologies can be found in [99].

5.1.4 Maritime Piracy Event

Next, we use the example of modeling maritime piracy events to show the extensible and adaptable capabilities of our model. Van Hage et al. [145] illustrated the Simple Event Model (SEM) with this example. We represent the same domain using the PoEM model by a simple identification and mapping of piracy-event-related concepts to PoEM ontology classes. This mapping is shown in Figure 5.4.
Note that PoEM enables adding more context than the originally intended purpose of modeling just a piracy event detection scenario based on the location of the yacht. Using PoEM, we can also model event notifications, actions, roles, and escalation to complete the event processing cycle. This model can be used to send notifications if a vessel at sea is under possible attack from pirates.

Figure 5.3: A power blackout event in a household; nilm: denotes a NILM domain ontology.

Figure 5.4: Maritime piracy events mapped to PoEM; sem: denotes the Simple Event Model ontology [145] and ex: is a maritime domain ontology.

5.1.5 Collision Avoidance in an Automobile

We consider the process of automatically monitoring sensor measurements in an automobile for collision avoidance, and show how this process can be mapped onto the PoEM model through an example. We consider an automobile fitted with two types of sensors that are relevant for prediction of side collisions during merging: a side-facing Light Detection and Ranging (LIDAR) sensor, to measure the distance to the nearest object at the side, and an Inertial Measurement Unit (IMU), to measure the vehicle's velocity and orientation. Measurements from these sensors are continuously matched against a complex event profile to detect side collision warning events. Note that this is a complex event with multiplicity in observable properties, since the collision warning event depends on both the distance to the side obstacle and the turn angle of the vehicle. Detection of the warning event can change the state of the side collision avoidance state model, which in turn initiates a corrective action based on returning to the non-warning goal state. Figure 5.5 illustrates the event detection concepts using the PoEM ontology (the states, actions, and roles of this process are not shown).
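The multiplicity in observable properties can be sketched as a predicate over both measurements. The threshold values below are illustrative assumptions, not calibrated figures:

```python
def side_collision_warning(side_gap_m, turn_angle_deg,
                           min_gap=1.5, min_angle=5.0):
    """A complex event over two observable properties: the warning
    fires only when the side gap is small AND the vehicle is turning
    toward that side."""
    return side_gap_m < min_gap and turn_angle_deg > min_angle
```

Neither condition alone triggers the event, which is what distinguishes this profile from a simple single-sensor threshold rule.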
Figure 5.5: Side collision warning event in a car mapped to PoEM.

5.2 Event Recognition Applications

5.2.1 Energy Disaggregation in Smart Grids

We demonstrate the utility of shapelets for energy disaggregation on the publicly available Building-Level fUlly-labeled dataset for Electricity Disaggregation (BLUED) [4], which contains voltage, current and power measurements for a single-family household in the United States over one week. Every appliance transition (switching on/off) for each appliance in the house is labeled and time-stamped, providing the ground truth for the energy disaggregation task. The available data is downsampled to 60 Hz (from the collection sampling rate of 12 kHz). Real and reactive power, for both phase A and phase B, are available in the dataset. We use the real power values from both phase A and phase B appliances in our experiments. The complete BLUED dataset (one week long) contains nearly 37 million data points. We used the first 50% of the data for training and the remaining 50% for testing. A typical advantage of shapelet-based algorithms is that they find localized shapelet patterns; once these patterns are found, the rest of the data is discarded, so they often perform well with a smaller fraction of the data used for training than other methods. To evaluate the utility of shapelets on the BLUED dataset, we first need to preprocess the data into the appropriate (labeled time series) format for training and testing. Ground truth events, in the form of timestamps when a certain appliance was switched on/off, are available to us. However, the power data is just one long time series, and ideally we want multiple training and testing time series instances. To resolve this, we consider a window around each event. As BLUED suggests that each event lasts at least 5 seconds, we define this window to contain all data points from 1 second before the event up to 5 seconds after the event.
Since each second of data contains 60 data points, the length of our training and testing time series is set at 360 (but can be changed by varying the parameters above). We evaluate the performance of shapelets on detecting events and classifying appliance activity from aggregate electrical power data. The first task is to detect events, i.e., differentiate event data segments from non-event segments. To achieve this, we extract all event instances according to the pre-processing steps described earlier. Then, we extract several non-event instances of the same length from the power consumption data (ensuring that no events overlap with any of the non-event segments). We perform this sampling of non-event instances in a balanced manner (1:1 ratio of events:non-events) as well as an imbalanced manner (1:4 ratio of events:non-events). This experiment is performed for the power data in both phase A and phase B. The second task is to perform actual disaggregation, i.e., to classify which appliance is responsible for which event. This is achieved by a multi-class shapelet-based classifier. We divide the appliances into groups based on the nature of the appliances and their average power consumption, giving us multiple classes of appliances for training and classification. The classes we use, along with example appliances of each class, are listed in Table 5.1.

Table 5.1: Grouping BLUED dataset appliances into classes

Phase A appliance classes
Class                      Examples
1. Refrigerator            refrigerator
2. Lights                  backyard lights, washroom light, bedroom light
3. High-Power (> 150 W)    hair dryer, air compressor, kitchen chopper

Phase B appliance classes
Class                      Examples
1. Lights                  desktop lamp, basement light, closet lights
2. High-Power (> 150 W)    printer, iron, garage door
3. Low-Power (< 150 W)     computer, LCD monitor, DVR/Blu-ray player

The approach proposed in this thesis can be used to model many complex events and represent them in a richer, more expressive manner (such as the blackout event case study in Section 5.1.3). This can be done without much additional effort, since the shapelet mining method is the same but can now be connected to a comprehensive event model in PoEM. To evaluate the efficacy of our shapelet-based approach, we compare the classification accuracy (on test data) to that of ten other popular classifiers. We formed a baseline classifier which always assigns the class label of the majority class in the training data to any test data instance. The other nine classifiers are implemented using scikit-learn [103] in Python with their default parameters: Naive Bayes, Linear Discriminant Analysis (LDA), 1-Nearest Neighbor (1-NN), Logistic Regression, Support Vector Machines (SVM), Multilayer Perceptron (MLP), Decision Tree, Random Forests and AdaBoost. For the event detection experiment, we evaluate our methods on both the balanced (1:1 ratio of events:non-events) and the imbalanced (1:4 ratio of events:non-events) datasets. For every experiment, evaluation is done on both phase A and phase B. The event detection experiment is a binary classification task, while the event classification experiment is a 3-class classification task.
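The event-window construction described above (1 s before to 5 s after each labeled transition, at 60 Hz, giving 360 samples) can be sketched as follows; the helper name is ours:

```python
RATE = 60         # samples per second after downsampling
PRE, POST = 1, 5  # seconds kept before and after each labeled event

def event_window(power, event_index):
    """Cut a fixed-length window around a labeled appliance transition;
    returns None if the event is too close to the edge of the series."""
    start, end = event_index - PRE * RATE, event_index + POST * RATE
    if start < 0 or end > len(power):
        return None
    return power[start:end]  # 360 samples at these settings
```

Varying PRE, POST or RATE changes the instance length accordingly, as noted above.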
Table 5.2: Classification accuracy of different classifiers for event detection (detecting whether an appliance was switched on/off).

Phase             Phase A   Phase A   Phase B   Phase B
Dataset balance     1:1       1:4       1:1       1:4
Shapelets          98.6      99.0      98.3      97.9
Baseline           50.0      75.0      50.0      75.0
Naive Bayes        68.1      84.3      65.8      82.4
LDA                68.4      87.4      66.9      83.2
1-NN               99.4      99.5      95.6      97.8
Logistic           94.9      96.8      63.0      83.2
SVM                95.4      95.4      50.0      80.0
MLP                99.1      95.3      89.5      83.0
Decision Tree      94.3      98.1      85.6      93.6
Random Forests     96.6      99.0      91.9      96.0
AdaBoost           91.2      92.1      66.6      83.4

Table 5.3: Classification accuracy of different classifiers for event classification (identifying which appliance(s) was switched on/off), which is the energy disaggregation step.

                  Phase A   Phase B
Shapelets          83.8      77.9
Baseline           77.3      41.3
Naive Bayes        70.5      52.4
LDA                56.0      48.6
1-NN               90.2      63.7
Logistic           78.8      53.0
SVM                78.0      34.0
MLP                77.3      42.3
Decision Tree      84.5      58.0
Random Forests     85.8      63.4
AdaBoost           80.0      50.8

We present our evaluation results (classification accuracy on test data) for event detection in Table 5.2 and for event classification in Table 5.3. We note that our shapelets approach performs on par with other state-of-the-art classifiers. It has been previously reported [160] that the 1-nearest-neighbor classifier is hard to beat for time series classification tasks on many datasets, and our results agree. In all the experimental cases considered, our shapelet-based classifier is either close in performance to the best-performing classifier or performs the best of all classifiers. In particular, we observe that shapelets perform better when the data is more complex or challenging: they perform the best of all classifiers on the imbalanced phase B data, and they perform significantly better than any other classifier on the multi-class event classification task for phase B.
Additionally, our shapelet-based approach has several other advantages: (i) the classification step is extremely fast, as the training data is discarded once shapelets are found; (ii) it makes no assumptions about the structure of the data, which need not be smooth, differentiable, or drawn from any specific distribution; and (iii) it provides visual intuition and interpretability in the form of shapelets, which can be used for feedback from users and domain experts, unlike many machine learning classifiers that behave as black boxes.

5.2.2 Pump Failure Detection in an Oil Field

Electric submersible pumps (ESPs) are one of the main artificial lift methods for extraction of fluid. The failure of an ESP increases operating costs and affects production. Statistical analysis of ESP failures shows that their failure rates vary considerably. Several factors affect ESP failures, including reservoir type, sand control, and whether the ESP is a replacement. In an effort to understand the causes of failures, the oil and gas industry has attempted to collect data on all potential factors and exchange this information for better data analysis. However, data aggregation is a difficult task considering the myriad ways of collecting and recording the diverse types of data. Instead of attempting to build a complete model of ESP failures from these disparate data sets, we apply machine learning and time-series data processing techniques to automatically learn failure models from only the sensor measurements (with relatively high temporal frequency) collected directly from the ESP. The goal of our research is therefore to understand the limits of failure detection and prediction using only the most accessible sensor measurements. It has been estimated that just a 1% improvement in ESP performance worldwide would provide over half a million additional barrels of oil per day. Considering the high cost incurred by an ESP failure, early detection and prediction of pump
Considering the high cost incurred by an ESP failure, early detection and prediction of pump 126 failures of even a subset of all operational assets from readily available data can reduce OPEX (Operational expenditure) and can reduce average maintenance cost We develop a pre-processing method to address the irregular nature of real worldsensordataandtransformthesensormeasurementsintoaformatsuitablefor shapelet-based time series analysis. This method can handle invalid data points, irregularity in sampling intervals and reduce redundant data segments. These are three frequent problems with industrial sensor datasets and are particularly applicable to datasets from the highly instrumented oil and gas industry. Dataset Description. We used data from electrical submersible pumps (ESPs) from a single onshore oilfield in North America. We consider a supervised classification scenario. The classification task is to detect and predict the behav- ior of a new ESP instance over a specified period of time, given sensor readings of its physical attributes (current, voltage and intake pressure). We are provided with labeled data from ESPs defining periods of normal and failure operation. The data ranges from 1 Aug 2011 to 29 July 2013, and measurements are nomi- nally recorded at hourly intervals. We use data from 11 normal operation periods (from 11 ESPs) and 10 failure operation periods (from 8 ESPs, with two ESPs providing two failures each). For these periods, we have sensor data containing time-stamped data points for various attributes or physical properties of the pump. In this paper, we focus on three of these attributes Ð current, voltage, and intake pressure, which are the most readily available sensor measurements. We also have timestamps for all of the data with separate timestamps for insertion date of the data points as well as the last good scan dates (which we use later for ensuring that the data points are valid). 
Instances of sensor measurements from selected ESPs during both normal and failure operation periods are shown in Figure 5.6 (intake pressure), Figure 5.7 (current), and Figure 5.8 (voltage). These plots show measurements from two normally functioning ESPs and two failed ESPs. They do not represent all the patterns of normal or failure periods, since there is considerable diversity in the data. Our prediction methods need to accommodate such diverse instances in order to make decisions even for previously unseen instances without overfitting. Shapelets are suitable for this purpose because they are designed to identify the most discriminative segments in the time series instances, irrespective of the nature of the data.

Figure 5.6: Examples of intake pressure measurements during normal operation (blue) and failed operation (red).

Figure 5.7: Examples of current measurements during normal operation (blue) and failed operation (red).

Issues with using the raw dataset. Since real oilfield data is irregularly recorded and contains large missing portions, we have developed an appropriate pre-processing method for use before the described shapelets technique can be applied for classification. In particular, the pre-processing method extracts clean data segments: it divides each sensor data stream into windows of a specified segment size (preferably the time duration over which predictions need to be made) and ensures that the elements within each segment are regularly sampled. Only a specified degree of overlap is allowed between consecutive segments, so as to avoid overly similar time segments. A check is also performed on whether each sensor value read was new, ultimately providing regularly sampled data segments suitable as input to shapelet-based classification approaches.
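The three checks (validity, hourly regularity, bounded overlap) can be sketched as a single filtering pass. Here records are simplified to (insertion_hour, last_good_scan_hour, value) triples with timestamps in whole hours; the function name and this representation are our simplification of the actual pipeline:

```python
def clean_segments(records, seg_len, overlap=0.75):
    """Extract valid, regularly sampled, limited-overlap segments from
    (insertion_hour, last_good_scan_hour, value) records."""
    valid = [r for r in records if r[0] - r[1] <= 24]  # within 1 day
    step = max(1, int(seg_len * (1 - overlap)))        # overlap control
    segments = []
    for i in range(0, len(valid) - seg_len + 1, step):
        win = valid[i:i + seg_len]
        # regularity check: consecutive samples exactly 1 hour apart
        if all(win[j + 1][0] - win[j][0] == 1 for j in range(seg_len - 1)):
            segments.append([v for _, _, v in win])
    return segments
```

No values are modified or interpolated; irregular stretches simply yield no segments.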
Several factors make the raw dataset, collected directly from sensors in an oilfield, unsuitable for direct analysis via the shapelet mining algorithm (or any generic data mining method). We desire a regularly sampled version that can directly be input to a state-of-the-art shapelet finding method (such as Fast Shapelets). To process the dataset, a logical choice would be to obtain one time series per ESP (per failure/normal operation period). However, this is not possible here because of the following three issues. The first issue is validity. Since sensors can fail, some of the recorded data points may not be valid and need to be eliminated; keeping track of the last good scan timestamp in the data helps us perform this elimination. The next concern is regularity in the sampling frequency of the data points. In cases where the sensor was not operational, or for other reasons, we found that the data was sampled for some periods at a 1-day frequency instead of the standard 1-hour frequency imposed on the rest of the data.

Figure 5.8: Examples of voltage measurements during normal operation (blue) and failed operation (red).

Figure 5.9 shows plots of insertion dates and last good scan dates for a single pump. As seen in the plot, the sampling is not regular for the complete duration of the data stream. Regular intervals between data points are essential for the shapelet finding algorithm. One way to handle this is to interpolate the missing data points, but interpolation can introduce localized artifacts, which are then spuriously interpreted as distinctive patterns by the shapelet-based algorithm. Instead, as described later, we find clean segments where the data is regularly sampled. The third issue is redundancy.
There may be several data segments in our input training dataset that are very similar to each other, with only a few data points differing at the beginning and end of the segments. However, if we do not allow any overlap between segments at all, we are left with an extremely limited number of segments and the possibility that the most distinctive patterns span two segments. To handle this tradeoff, we introduce an overlap parameter, which controls the amount of overlap between consecutive segments.

Figure 5.9: Irregularity in sampling of oilfield sensor data. The period between consecutive data points varies between 1 hour and 1 day. The X-axis shows time (in days) and the Y-axis represents the insertion date of the data point (left) and the last good scan date (right).

Pre-processing. Since large portions of the data can be missing, we need an effective pre-processing technique to filter useful data. The following filtering step does not modify any of the data or interpolate any values. Our pre-processing method is summarized in Figure 5.10. We extract segments out of each time series for each pump. The first criterion for keeping a segment is that it is valid; we ensure this by requiring that the insertion date of each data entry is within 1 day of the last good scan timestamp. The second condition is a regular sampling interval: we only consider values recorded 1 hour apart. These two conditions give us regularly sampled segments of data. However, each segment is very similar to the previous one, with the exception of just one or two changed data points. To eliminate many of these near-duplicate segments, we add a third condition, which restricts the degree of overlap between consecutive segments. This parameter can be changed to reflect various requirements in specific use cases.
The more overlap we allow, the more segments we obtain after pre-processing. In our experiments, we allow 75% overlap between adjacent segments, thus forcing at least one-fourth of each pair of consecutive segments to differ.

Figure 5.10: Proposed pre-processing workflow.

Failure event detection and failure event prediction. We describe the two major data mining scenarios addressed via the shapelet mining approach. The first task is failure detection. The failure detection experiment aims to detect whether a given data segment represents a failure or not; it is reactive rather than proactive. For failure event detection, we label the failure event durations as negative class instances, while all other segments are treated as the positive class. As described earlier in the pre-processing steps, given a segment length parameter, all possible clean data segments (of the specified length) satisfying the three processing checks may be extracted. Failure event prediction aims to predict failures before they occur, so that maintenance personnel can be notified in advance, taking a proactive anomaly detection approach. The failure prediction experiment aims to classify whether a given data segment will eventually lead to a failure. We define a user-customizable lookback parameter, used for finding precursor segments when building the training and testing datasets in the shapelet-based classification approach. The intuition is that a failure event precursor occurs at some point before the failure is detected, and we want to make predictions based on including this data point in our training and testing data. We label segments that are precursors to failure or normal event durations (within the lookback period) as negative or positive class instances respectively.
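The lookback-based precursor extraction can be sketched as follows. This is a simplification (non-overlapping segments for brevity, and the helper name is ours); only data strictly before the labeled period is used:

```python
def precursor_segments(series, period_start, lookback, seg_len):
    """Collect fixed-length segments from the lookback window that ends
    where the labeled failure/normal period begins; no data from the
    labeled period itself is included."""
    start = max(0, period_start - lookback)
    return [series[i:i + seg_len]
            for i in range(start, period_start - seg_len + 1, seg_len)]
```

Each returned segment inherits the (failure or normal) label of the period it precedes.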
Note that we do not extract or consider any data from the actual failure or normal duration while making a prediction. Typically, the lookback should be larger than the segment length parameter, so as to have multiple data segments for training the system. However, if it is too large, it covers many redundant data points that may not contain the real precursors of the impending failure event. Since we do not know the point in time when an exact precursor for a failure event occurs, we fix a moderately long lookback period, typically a few times the segment length. Nonetheless, the shapelet-based algorithm provides good results (as shown in the next section) for different lookback periods, irrespective of whether we have included the 'real' failure precursor data point or added too many redundant data points. Figure 5.11 illustrates the extraction of time segments from each sensor dataset for failure detection and prediction.

Figure 5.11: Failure event detection and failure event prediction scenarios. The red shaded portion denotes a failure operation period and the green portion denotes a normal operation period. All segments extracted are of the specified segment length. In the detection scenario, segments are extracted from the actual failure or normal operation duration, while for prediction, segments are extracted only from the lookback period, without utilizing any data in the actual labeled failure or normal periods.

We have two major categories of experiments, at the segment level and at the pump level, and for each category there are two scenarios, failure detection and failure prediction, as described earlier. We used the supervised shapelet finding algorithm, Fast Shapelets, for finding shapelets in all the experiments.

Segment-level failure detection. The first category of experiments is segment-level classification, either for failure detection or failure prediction.
In this category, we extract data segments from all pumps during pre-processing. All extracted segments (from all pumps) are collected to form the full segment-level dataset. This dataset is split randomly into training and testing sets, which are independent of each other. Note that in this method of evaluation, although a given segment appears only in either the training or the test portion, a particular pump can contribute sensor data segments to both portions. Two shapelets extracted from one such training experiment are shown in Figure 5.12. According to the shapelet framework, these segments are the most discriminative for distinguishing failure instances from working instances. We used various parameters in our experiments: segment lengths of 1 day, 2 days and 1 week, with the allowed overlap set such that at least one-fourth of the data differs between consecutive segments. We experimented with 25%-75% and 50%-50% random splits for the training and testing sets respectively. We compared our results to a baseline classifier, denoted by ZeroR, and to other common classifiers as well. The ZeroR classifier simply assigns the label of the majority class in the training data to a new class instance. Thus, if there are more positive class instances than negative in the training data, it assigns the positive class to every test data instance, and vice versa. This baseline classifier provides the minimum expected accuracy estimates for this dataset. The accuracy results are shown in Table 5.4 below.

Figure 5.12: Two shapelets (shown in red) extracted for failure detection based on intake pressure sensor data (normalized). The segment length was 1 week. The first shapelet (left) is of 35 hours duration and the second one (right) is of 5 hours duration.
The data was split into a 50%-50% train-test split (half of the data was used for training).

Table 5.4: Accuracy (%) of detecting failures at the segment level

    Segment length        1 day       2 days      1 week
    Data in training (%)  25    50    25    50    25    50
    Current               77    77    77    82    72    85
    Voltage               81    82    83    85    72    73
    Intake Pressure       90    89    90    91    85    96
    ZeroR (baseline)      72    72    72    74    73    75

Segment-level failure prediction. For our failure prediction experiment, we considered data segments that appear before the failure or normal events (i.e., data segments are collected from the lookback period). The lookback periods were set as follows: a lookback of 1 week for the 1-day segment length, and a lookback of 4 weeks for both the 2-day and the 1-week segment lengths. The accuracy results are shown in Table 5.5. The shapelets extracted from training a prediction classifier are shown in Figure 5.13.

Table 5.5: Accuracy (%) of predicting failures at the segment level

    Segment length        1 day       2 days      1 week
    Data in training (%)  25    50    25    50    25    50
    Current               60    68    67    73    71    66
    Voltage               58    68    64    69    70    67
    Intake Pressure       87    83    73    74    67    71
    ZeroR (baseline)      56    53    52    52    52    55

Figure 5.13: Three shapelets (shown in red) extracted for failure prediction based on intake pressure sensor data (normalized). The segment length was 1 week and the lookback period was 4 weeks. The shapelet durations are 50 hours, 36 hours and 5 hours. The data was divided into a 50-50 train-test split (half of the data was used for training).

Comparison with other machine learning techniques. We compared our results with those of several state-of-the-art machine learning approaches widely used for pump failure detection. The results are shown in Table 5.6. We observe that shapelet-based methods are comparable to other classifiers. In these experiments, the segment length is 1 week and the split for training and testing is 50%. The lookback window for failure prediction is 4 weeks.

Pump-level failure detection.
The segment-level classification experiments attempt to detect and predict failures given a single segment. However, in a typical oilfield monitoring application, it is sufficient to detect and predict failures at a given time using all the sensor measurements available up to that time. We can therefore combine the predictions of all the time segments from a given ESP to get a prediction of the failure for that pump. We call this analysis pump-level failure detection and prediction.

Table 5.6: Comparison of accuracy (%) of shapelet-based segment-level failure detection/prediction with other machine learning techniques. The following classifiers are compared to our proposed Fast Shapelets-based method (FS): Logistic Regression, Multi-Layer Perceptron (MLP), Support Vector Machines (SVM) using either a Gaussian/radial basis function (rbf) kernel, a polynomial kernel, or a linear kernel, AdaBoost, Decision Tree (J48), Random Forests (RF) and our baseline classifier (ZeroR)

    Classifier       Failure Detection   Failure Prediction
    Shapelets (FS)   96                  71
    Logistic         74                  55
    MLP              93                  69
    SVM (rbf)        96                  76
    SVM (poly)       73                  55
    SVM (linear)     73                  55
    AdaBoost         93                  73
    J48              91                  75
    RF               95                  80
    ZeroR            75                  55

For pump-level analysis, the train-test split is performed differently from the segment-level experiments. We use leave-one-out (LOO) cross-validation for evaluating pump-level failure detection and prediction. In LOO, only instances (the data segments extracted after pre-processing) from one of the pumps are set aside for testing the learned classifier, while all other segments (along with their failure/normal labels) are used for training with the Fast Shapelets algorithm as described earlier. This evaluation is repeated for each pump and the classification accuracy results are aggregated. Note that, unlike segment-level evaluation, none of the data segments from the well set aside for testing are used for training the shapelet-based failure detector or predictor.
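The leave-one-pump-out evaluation can be sketched as below (hypothetical names); the key property is that no segment from the held-out pump ever reaches the training set.

```python
def leave_one_pump_out(segments):
    """Yield (pump_id, train, test) splits.

    segments: list of (pump_id, segment_values, label) triples.
    Each pump's segments are held out exactly once for testing.
    """
    for held_out in sorted({p for p, _, _ in segments}):
        train = [s for s in segments if s[0] != held_out]
        test = [s for s in segments if s[0] == held_out]
        yield held_out, train, test
```

Each split trains a fresh shapelet classifier on `train` and scores it on `test`; the per-pump results are then aggregated.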
(In segment-level evaluation, the segments from all the wells were randomly split into training and test instances.)

Table 5.7 presents the accuracy of pump-level failure detection. There are 8 ESPs which failed and 11 which functioned normally in the available data (column failed/normal?). The number of data segments extracted after pre-processing is shown (column testsegs) for each of these 19 cases. The result of applying the shapelet-based classifier to each of the segments from an ESP (the classifier is trained on all the segments from the remaining ESPs) is shown in column failsegs (detected failure segments) and, equivalently, as a percentage in column segacc (segment-level accuracy). In order to combine the segment-level labels into a single label for the pump, we apply a threshold. Setting this threshold enables the precision and recall of the classifier to be traded off against each other. (Precision is defined as the fraction of all positives labeled by the classifier that are true positives; recall is defined as the fraction of all positives in the dataset that are identified as true positives by the classifier.) Increasing the threshold increases the precision (certainty that identified failures are genuine faults) at the cost of recall (missing some genuine faults). In our evaluation, we set the failure detection rule to classify a pump as a failure if 40% or more of its segments were classified as failure segments. This approach is then able to detect ESP failures with a precision of 89% (8 out of the 9 pumps labeled as failures had actually failed) and a recall of 100% (all 8 of the failed pumps were correctly identified). In the following table, setting this threshold enables Pumps 5, 9 and 16 to be labeled correctly even though their segment-level accuracy is less than 100%.
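The threshold rule and the precision/recall definitions above translate directly into code; this is an illustrative sketch (hypothetical names), not the exact evaluation script.

```python
def pump_label(segment_labels, threshold=0.4):
    """Label a pump 'F' if at least `threshold` of its segments are failures."""
    frac_fail = segment_labels.count("F") / len(segment_labels)
    return "F" if frac_fail >= threshold else "N"

def precision_recall(true_labels, predicted, positive="F"):
    """Precision and recall of the positive (failure) class."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Raising the threshold can only move pumps from 'F' to 'N', which is why it trades recall for precision.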
Rows in bold indicate pumps that were labeled correctly as failures/normal even though their segment-level accuracy is less than 100%. testsegs indicates the number of test segments, failsegs the number of detected failure segments, segacc the segment-level accuracy (%), and threslabel the detected label of the pump after applying a threshold of 40%.

    pump  failed/normal?  testsegs  failsegs  segacc  threslabel
    1     F               4         4         100     F
    3     F               1         1         100     F
    4     F               2         2         100     F
    5     F               6         4         67      F
    6     F               28        28        100     F
    7     F               2         2         100     F
    8     F               10        10        100     F
    9     F               10        4         40      F
    10    N               11        0         100     N
    11    N               13        13        0       F
    12    N               19        0         100     N
    13    N               22        0         100     N
    14    N               24        0         100     N
    15    N               13        0         100     N
    16    N               8         2         75      N
    17    N               14        0         100     N
    18    N               14        0         100     N
    19    N               22        2         91      N
    20    N               19        0         100     N

Pump-level failure prediction. The evaluation described earlier is repeated for pump failure prediction. In this case, all the segments for training and testing are taken from the lookback period. The accuracy results are shown in Table 5.8. Here we apply a threshold of 20%: a pump is predicted to fail if 20% or more of its segments are classified as failure segments. The pump-level shapelets-based predictor is able to predict ESP failures with a precision of 78% (7 out of the 9 pumps labeled as failures will actually fail) and a recall of 78% (7 out of the 9 pumps that would fail were correctly identified).

The computational cost of the Fast Shapelets method is very different for training and classification. Classification is typically fast since it depends only on the number and length of the shapelets identified during training, and these are few in number. The shapelets identified during training can be viewed by domain experts and an interpretation with regard to failure causes can be attempted. As expected, prediction of failures is harder than detection: the accuracies of the prediction experiments are lower than those of the corresponding detection experiments.
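As a sanity check, the 89% precision and 100% recall quoted above can be reproduced directly from the (pump, true label, test segments, failure segments) rows of Table 5.7 with the 40% threshold:

```python
# Rows of Table 5.7: (pump, true_label, test_segments, detected_failure_segments)
rows = [
    (1, "F", 4, 4), (3, "F", 1, 1), (4, "F", 2, 2), (5, "F", 6, 4),
    (6, "F", 28, 28), (7, "F", 2, 2), (8, "F", 10, 10), (9, "F", 10, 4),
    (10, "N", 11, 0), (11, "N", 13, 13), (12, "N", 19, 0), (13, "N", 22, 0),
    (14, "N", 24, 0), (15, "N", 13, 0), (16, "N", 8, 2), (17, "N", 14, 0),
    (18, "N", 14, 0), (19, "N", 22, 2), (20, "N", 19, 0),
]
# Apply the 40% threshold rule to each pump
pred = ["F" if fails / total >= 0.4 else "N" for _, _, total, fails in rows]
tp = sum(1 for (_, t, _, _), p in zip(rows, pred) if t == "F" and p == "F")
fp = sum(1 for (_, t, _, _), p in zip(rows, pred) if t == "N" and p == "F")
precision = tp / (tp + fp)                              # 8/9, about 0.89
recall = tp / sum(1 for _, t, _, _ in rows if t == "F")  # 8/8 = 1.0
```

Pump 11 is the single false positive; pumps 5, 9 and 16 land on the correct side of the threshold despite imperfect segment-level accuracy.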
Intake pressure is generally a better indicator and predictor of failures compared to current and voltage, given the accuracy results in the earlier tables. As expected, increasing the amount of data used in training gives higher accuracies (25% versus 50%), but the accuracies are comparable in magnitude. Moreover, the accuracy is comparable with that of other machine learning techniques even with small amounts of training data. Our novel threshold-based method of detecting pump-level failures is able to achieve a relatively high precision of 89% and recall of 100% when the classifier is trained on all data segments available from other pumps. Note that in a fault monitoring application with low failure rates, the operational cost of implementing the monitoring workflow is largely determined by the false alarm rate (100 - precision). Pump-level failure prediction has lower precision and recall (both 78%). The pre-processing procedure and the shapelets algorithm have a few parameters that need to be determined before they can be applied for classification. In our case, a segment length of 1 week and a lookback period of 4 weeks were found to give the best results. The optimal parameters may have to be recomputed for a different dataset.

Table 5.8: Results of predicting failures at the pump level. testsegs indicates the number of test segments, failsegs the number of segments predicted to fail, segacc the segment-level accuracy (%), and threslabel the detected label of the pump after applying a threshold of 20%.

    pump  failed/normal?  testsegs  failsegs  segacc  threslabel
    1     F               2         2         100     F
    2     F               8         0         0       N
    3     F               9         2         22      F
    4     F               12        7         58      F
    5     F               12        11        92      F
    6     F               5         5         100     F
    7     F               2         2         100     F
    8     F               12        12        100     F
    9     F               12        1         8       N
    10    N               5         1         80      N
    11    N               5         2         60      N
    14    N               8         2         75      N
    15    N               8         2         75      N
    16    N               8         7         13      F
    17    N               6         1         83      N
    18    N               10        5         50      N
    19    N               9         9         0       F
    20    N               6         1         83      N

Shapelets-based time series classification is attractive for oil and gas applications due to its fast classification time and the possibility of interpreting the underlying shapelets by domain experts.
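The reason classification is fast is that a trained shapelet classifier only needs a sliding minimum-distance computation per shapelet. A minimal sketch of that matching step (hypothetical names; squared Euclidean distance, as is standard in the shapelet literature):

```python
import math

def shapelet_min_dist(series, shapelet):
    """Minimum Euclidean distance from a shapelet to any same-length
    subsequence of the series -- the core matching step at test time."""
    m = len(shapelet)
    best = math.inf
    for i in range(len(series) - m + 1):
        d = sum((series[i + j] - shapelet[j]) ** 2 for j in range(m))
        best = min(best, d)
    return math.sqrt(best)

def shapelet_rule(series, shapelet, split_dist):
    """One node of a shapelet decision tree: which side of the learned
    distance threshold does this series fall on?"""
    return "F" if shapelet_min_dist(series, shapelet) <= split_dist else "N"
```

Because only a handful of short shapelets survive training, classifying a new segment costs a few sliding-window passes, which is why deployment is cheap even when training is not.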
We adapted the shapelets approach for detecting and predicting ESP failures using only readily available sensor data. This required developing an appropriate pre-processing procedure. In our experiments to evaluate the accuracy of this method, we found that the accuracy is comparable to other machine learning-based classifiers even with relatively small amounts of training data. The method is able to detect failures from intake pressure measurements alone with a precision of 89% and 100% recall. The accuracy of predicting failures is lower, with a precision of 78% at 78% recall. This approach has a few customizable parameters that may have to be adapted to different datasets.

5.2.3 Gas Compressor Valve Failure Prediction

Gas compressor failures are frequently caused by the breakdown of valves. Since production is dependent on rotating equipment, it is useful to minimize downtime caused by such valve failures and to try to predict them in advance. This is a challenging problem, which we address using our event recognition approach for analysis of the data gathered by a large number of sensors deployed on various parts of the compressor. These sensors take periodic readings (every few minutes) of various physical properties of the compressors, including motor winding temperatures, compressor vibrations, and pressure and temperature for both suction and discharge at various compression stages. We frame this problem as a multivariate time series classification task.

Many existing MTSC approaches make the assumption that the readings of sensors are independent, which is not the case for sensor data in gas compressors, as a variation or anomaly in a valve affects the readings of adjacent sensors. Since all the sensors record data synchronized in time, the temporal dependencies across them need to be captured. We propose the Interleaved Shapelets (ILS) method, which attempts to incorporate these dependencies into the final shapelet-based classification framework.
We achieve this using a heuristic of interleaving time series data across the sensors. This helps us reduce the multivariate time series data to a univariate format such that existing univariate shapelet extraction methods can be applied directly to the data. We evaluate our approach on real sensor data taken from gas compressors in an oil field in North America.

Our dataset consists of sensor data from four-cylinder gas compressors in an oilfield. A compressor has approximately fifty sensors. The sensor functions range from measuring compressor vibrations and motor winding temperatures to measuring the pressure and temperature for both suction and discharge at the various compression stages. Data from some of the sensors is shown in Figure 5.14. Along with this operation data for the compressors, we utilized a subset of work orders for compressor failures that focused on everything related to valves. The work orders listed dates of reported failure and completion along with comments for each repair. We used information from the work orders to build our labels, thus framing our failure prediction problem as a time series classification problem.

To convert the sensor data streams into our required time series training and testing datasets, we partitioned the full sensor stream around the occurrences of failures. A failure report date is available as part of the maintenance records for the gas compressor. However, this report date does not necessarily correspond to the time when the compressor actually failed, but rather to when a technician created a work order to address a deficiency or respond to a prior open work order. In this work, we label segments as failures if they appear a short while before this failure report date. We expect the actual failure to have occurred before it is noticed by a human operator. Failure windows are set just before these calculated failure times.
The size of this window can be set as a parameter in our experiments, but we focused on one-week windows. Since the data in this window is just prior to the failure occurrence, it is likely to contain signals indicative of failure before the failure happens. For each labeled failure occurrence, we obtain the data in this failure window, as shown in Figure 5.15, and extract shapelets from this window. Each such data segment will have a failure label associated with it.

Figure 5.14: Data from some of the sensors in the gas compressor: discharge temperature of a cylinder (top left), motor winding temperature (top right), motor vibration (bottom left) and cooler vibration (bottom right)

Next, we select normal instances only from those periods that are not close to any failures. Thus, we pick data segments of the same window size from the rest of the data (normal operation), ensuring that there is no intersection between the clean data segments and the failure periods or the pre-failure periods, i.e., the failure windows. These clean data segments, taken from normal operation, are assigned a normal class label for classification.

Figure 5.15: Pre-processing sensor data: a failure window is set just prior to the occurrence of each failure and we use these blocks of data to extract signals indicative of failure.

Figure 5.16 shows the complete duration of the training dataset for one of the sensors. Green indicates those sections that are labeled as periods of normal operation. Red indicates periods of failed operation. Blue indicates the remaining periods. The vertical lines indicate the reported date of failure in the maintenance record. We mark 1-week windows corresponding to the failure periods.

For our evaluation, we use data from six sensors in a compressor, which were identified as important to the functioning of the system by domain experts. We have time series data for a few years, collected at a frequency of once every few minutes.
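The window-labeling logic described above can be sketched as follows. The guard margin that keeps normal segments "not close to any failures" is a hypothetical choice (twice the window size here); the thesis does not specify its exact value.

```python
WEEK = 7 * 24 * 3600  # window size in seconds

def label_segment_start(t, report_times, window=WEEK, guard=2 * WEEK):
    """Label a candidate segment starting at time t:
    'failure' if it lies in the window just before a reported failure,
    'normal' if it is well clear of every failure,
    None if it is too close to a failure to use (excluded)."""
    if any(r - window <= t < r for r in report_times):
        return "failure"
    if all(abs(t - r) > guard for r in report_times):
        return "normal"
    return None  # near a failure but outside the pre-failure window
```

Segments labeled None are discarded, guaranteeing no intersection between the clean segments and the failure or pre-failure periods.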
We use the pre-processing approaches described above to generate our training and testing time series datasets. We use about one-fourth of the provided training dataset for our validation set. We experiment with a range of segment sizes for fixed-interval ILS, denoted by ILS-k where k is the segment size. We also compare our approach to the univariate baseline approach, which ignores the multivariate structure of the data (using Fast Shapelets), as well as to Shapelet Forests, which ignores the local temporal dependencies across sensors.

We have 211 multivariate training instances (equivalent to 211 * 6 = 1266 univariate time series) and 90 test instances (equivalent to 540 univariate time series). We used 53 of the training instances for the validation set, leaving 158 instances in our final training set.

Figure 5.16: The complete training data for one sensor

Using the baseline approaches, we achieved an accuracy of 90% on our dataset (486/540 correct predictions). We observed the same accuracy when using the Shapelet Forests approach. We experimented with ILS for various segment sizes, and for many of the segment sizes (the k parameter), we were able to achieve the same accuracy of 90%. This shows that our new approach, which works for multivariate time series and can capture more complex shapes in the data, is able to perform on par with other state-of-the-art approaches. Figure 5.17 shows a plot of the classification accuracy versus the segment size parameter (k) in our evaluation. As we can observe, the accuracy is high and stable in the mid-range of sizes (20-35), but is lower for very small segment sizes (below 20) or large segment sizes (above 35). We conclude that, given the choice of a good k parameter, this approach performs on par with other approaches.

Figure 5.17: Classification accuracy on test data vs. segment size parameter in ILS (k). The highest accuracy is found for k = 20, 25 or 35.
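The fixed-interval ILS-k reduction used above can be sketched as follows (hypothetical function name): segments of k samples are taken from each sensor in turn, producing one univariate series per multivariate instance, on which univariate shapelet extraction then runs unchanged.

```python
def interleave(mts, k):
    """Fixed-interval ILS-k: take k samples from each sensor in turn to
    produce a single univariate series from a multivariate instance.

    mts: list of equal-length univariate series, one per sensor.
    """
    n = len(mts[0])
    out = []
    for start in range(0, n, k):
        for sensor_series in mts:
            out.extend(sensor_series[start:start + k])
    return out
```

Because neighboring segments from different sensors end up adjacent in the output, a shapelet spanning a segment boundary can capture a cross-sensor pattern, which is what the univariate baseline cannot do.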
Our results illustrate that time series approaches based on shapelet mining are valuable for fast prediction of failures from sensor data in oil and gas fields. These approaches provide key insights into the functioning of the individual sensors as well as a visual aid to domain experts for further root cause analysis.

Chapter 6

Future Work and Conclusions

In this thesis, we tackled two primary problems - event modeling (how to comprehensively model events and related happenings), and event recognition (how to automatically recognize event patterns in raw sensor data). In this chapter, we summarize the contributions of this thesis and provide guidelines for related future work.

6.1 Conclusions

We observed that existing generic event modeling approaches do not provide a means of representing an end-to-end event processing workflow. We designed a conceptual model, PoEM, to be comprehensive while not being restricted to one application domain. We introduced the foundational, event and processing concepts for building the PoEM model. PoEM includes state-based goal planning and event escalation. We illustrated the applicability of PoEM to energy industries, such as managing failure events in an oil field. The event model presented here is limited to the identification of key concepts and the relationships between them. We currently assume that all data is available at a central location holding decision-making capability. A future extension to PoEM could account for distributed data sources. Another extension could focus on using semantic computing for event-based middleware and other implementation aspects. We describe initial steps towards this implementation later in this chapter.

In the second part of this thesis, we addressed the critical problem of resolving the difference in representation between high-level semantic event models and low-level sensor data streams, bringing together the communities of semantic computing based event processing and time series data mining.
We explored the need for semantic computing in several aspects and functioning components of complex event processing systems. For efficiently detecting and processing events from Big Data streams, we proposed an approach to incorporate an advanced machine learning approach based on time series shapelets into the Process-oriented Event Model. Our combined event representation and detection framework is able to rapidly represent events in new application domains, and we also demonstrated its high accuracy in classification and prediction of new events from data streams. We extended the use of this paradigm to multi-sensor data processing as well.

We analyzed the problem of multivariate time series classification that arises in Big Data applications. We described two new complementary multivariate shape mining algorithms: Shapelet Forests (SF) and Interleaved Shapelets (ILS). SF uses the state-of-the-art shapelet-based algorithm for univariate time series to build an ensemble of classifiers, one for each time series dimension in our multivariate instances. ILS considers the local temporal dependencies between the time series data across all sensors and interleaves data segments from them to build a univariate representation of the multivariate dataset, and then uses the univariate shapelet extraction method for classification. We evaluated both of our algorithms on real-world sensor datasets, and they outperformed baselines based on state-of-the-art approaches. We also considered various practical issues arising when dealing with real sensor data, and showed that our methods are robust to imbalance in data, smaller amounts of training data and the choice of algorithmic parameters. Our approaches add new tools to existing work on multisensor event recognition from data with practical constraints, particularly those seen in complex energy applications.
Later in this chapter, we describe an exciting future extension applying our work on event recognition to the information security domain for malware detection.

6.2 Future Direction: Semantic Middleware for Event Modeling

Enterprise integration patterns [57] are a set of design patterns for linking multiple systems using asynchronous messaging interfaces. This approach to system integration is increasingly popular due to its relatively simple loose coupling requirement. Implementations of these patterns are available in current integration frameworks, but these are not semantic in nature. A possible future direction of research is to introduce the concept of automatic management of messaging resources in an integration application via the use of a semantic representation of the enterprise integration patterns. We have developed semantic representations of some of the commonly used integration patterns, which include a description of the expected resource requirements for each pattern. We then demonstrate this approach by considering the design of an application to connect mobile customers to Smart Power Grid companies (for the purpose of near real-time regulation of electricity usage). We illustrate potential savings in messaging resources and automatic lifecycle management using real-world sensor data collected in a Smart Grid project. This research would provide the basis for a semantic middleware to plug into our proposed PoEM model, thus making implementation of the event model even more straightforward.

A set of integration design patterns, collectively known as "Enterprise Integration Patterns" (EIPs), has been proposed by Hohpe and Woolf [57] that uses asynchronous messaging as the predominant framework for system integration. The chief advantage of an asynchronous message-oriented interface for integration is that it requires only a loose coupling between the systems.
Thus, relatively few assumptions need to be made regarding the functional capabilities and implementations of each system. The EIPs proposed are a well-defined set of typical integration scenarios and the functional requirements of a message-oriented interface for each of these cases. These integration patterns have become popular and are natively supported by commercial as well as open-source integration frameworks and enterprise service bus (ESB) products such as Apache Camel, Mule ESB (http://mulesoft.org), and WSO2 Carbon (http://wso2.com/products/carbon/).

Current implementations of EIPs are not semantic. In typical usage, an application would make use of multiple integration patterns connected together. We believe that a semantic wrapper around each of these integration patterns that specifies the message requirements for each pattern would (1) ease the design of a complex integrated system (by formalizing the representation of the integration design), and (2) enable automatic management of the underlying message passing resources. Specifically, if the systems to be integrated are event-driven [136], then a semantic implementation of the asynchronous interfaces would enable the systems to automatically manage the lifecycle of the underlying message channels.

We propose to introduce the concept of semantic management of messaging interfaces for system integration. In our initial work, we selected a few common EIPs and designed an ontology to represent these selected patterns. This ontology includes the most relevant messaging properties for each pattern. We then demonstrate the value of this approach by showing how it can be applied to a particular application linking mobile customers to the demand-response outputs of smart electricity grid operators.
6.2.1 Motivational Scenario in Smart Energy Grids

Traditional power grids are now moving to smart grids [75] for more efficient, reliable and sustainable production and distribution of electricity. Huge amounts of data are generated by the swarm of sensors in buildings that record electricity consumption. The data are integrated and analyzed so that dynamic scaling of resources or alternative pricing may be implemented. In smart energy grids, this is known as demand-response. Demand-response further requires communicating with multiple data sources, detecting events and anomalies, and evaluating possible responses in real time. As listed by Wagner et al. [149], supporting diverse participants, using a flexible data schema, and performing complex event processing are all open challenge areas for Smart Grid research. Semantic representations of concepts and middleware used in Smart Grid systems can enable these requirements through a generic integration architecture.

Using EIPs from Hohpe and Woolf [57], we consider a typical demand-response scenario for the Smart Grid in Figure 6.1. The figure shows an alert message, usually generated by an event stream processing system in response to a certain event, such as high electricity consumption in a building. Once the alert is generated, it needs to be enriched with contextual information as well as its intended recipients (e.g., the occupants of the building). When this information is obtained, the alert is sent over an appropriate message channel (such as email or SMS) to the recipients. The message content of the alert may suggest a reduction in usage or convey information about excessive charges, if needed. The challenge then is to manage the creation and deletion of these message channels, and to decide between types of channels (email or SMS) in response to failed messages. Traditionally, this functionality would be explicitly programmed when creating the integration application.
We propose that this functionality be included within the implementation of the EIPs and that this capability be exposed using Semantic Web standards.

Figure 6.1: Demand-response scenario in the Smart Grid

6.2.2 Related Work

Integration Patterns: Enterprise integration patterns [57] have been used in various integration architectures to connect messaging-based systems and perform enterprise application integration [71]. Umapathy et al. [143] integrate EIPs with web services to build conversation policies for enterprise integration tasks. DSL Route [135] implements EIPs to build a domain-specific language for improved messaging. Chen et al. [23] provide an overview of further enterprise integration methods, and Vernadat [146] focuses on laying out principles for interoperability in enterprise systems. Such interoperability can be achieved by using semantic and ontology-based approaches. Design patterns other than EIPs have been proposed for the integration of disparate systems; not all of these patterns are restricted to message-oriented architectures.

Semantic Middleware: Semantic web methods have been used in several enterprise integration approaches, such as semantic business process management [104] and semantic middleware for the Internet of Things [63]. The semantic representation for EIPs proposed by us is not stand-alone; the power of the semantic web and linked data enables us to find other ontologies which can be integrated with the proposed approach to provide greater value. Such ontologies include the semantic sensor network ontology and the Enterprise Ontology [144]. The sensor network ontology can be used to represent sensor data measurements, such as those from smart meters, and the Enterprise Ontology can represent organizations, people, groups and various other associated enterprise components.
Even though there are other related efforts using EIPs and semantic middleware, there is no existing approach which brings them together or provides a generic semantic representation of EIPs for use in ontology-based approaches. It is precisely this sort of representation that we aimed to provide in this work.

Smart Grids: An overview of challenges and opportunities for using semantics in the Smart Grid is provided by [149]. A model-based semantic method is used for energy management and distribution in the Smart Grid by [112]. Information integration approaches for the Smart Grid are proposed in [127] through semantic complex event processing and data mining over streams. A cloud-based semantic software architecture for the Smart Grid is proposed by Simmhan et al. [126] to perform big data analytics and demand-response forecasting, and to securely store data on the cloud. However, EIPs are not investigated in any of these Smart Grid approaches.

6.2.3 Proposed Approach

Our approach assumes a semantic workflow for both the data being exchanged between systems and the implementation of the messaging interface between the systems. Figure 6.2 shows the main components in the proposed approach. If the data that is to be exchanged between the systems is not in semantic format, it is first transformed into an appropriate semantic representation, such as triples in RDF (http://www.w3.org/RDF/). Events are then detected from this data stream, typically using a (semantic) event processing engine [20]. The system architect is then expected to integrate the systems with appropriate integration patterns using a semantic implementation of the integration middleware. The detected events are inputs to the integration middleware. Each integration pattern implemented in the middleware then creates and destroys its required messaging resources (channels, filters, routers, etc.) in response to the event stream.
Figure 6.2: Overview of the proposed approach

In our semantic representation of the integration patterns, each integration pattern is an OWL class and the messaging resources are properties of this class representing the different types of messaging components used in the patterns. Not only the EIPs, but also the data records coming from the sensors themselves can be represented semantically. The semantic sensor network ontology (http://purl.oclc.org/NET/ssnx/ssn) can be used to effectively represent such data records and observations. The ontology of EIPs is described in detail in the following subsection.

6.2.4 Enterprise Integration Patterns Ontology

We represent each EIP as an OWL class and refer to the ontology of all patterns as the enterprise integration patterns ontology (represented by the prefix eip:). Each EIP (class) has its own set of properties linking it to other classes and specifying its messaging resource requirements, as described by Hohpe and Woolf [57]. Three example patterns are described here with their semantic representation.

An aggregator pattern (eip:Aggregator) is shown in Figure 6.3. Based on the definition in [57], an aggregator is used to combine related messages into one aggregated output message. As shown in the figure, the aggregator is a type of eip:EI_Pattern, the class of all integration patterns. eip:Aggregator has two properties specifying its input and output messaging resource requirements: (i) eip:givesOutputMsg, which maps to eip:Message, provides the aggregated output message and is a one-one property (i.e., each aggregator gives exactly one output message), and (ii) eip:takesInputMsg, which also has eip:Message as its range but is a one-many mapping (i.e., each aggregator takes multiple input messages). The eip:Message class, in turn, has object-type properties specifying the message content, sender and receiver, as well as a data-type property recording the timestamp of the message.
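The eip:Aggregator description above can be mocked up as plain triples. This sketch uses Python tuples rather than a real OWL/RDF toolkit, and the eip:hasSender / eip:hasReceiver / eip:Participant names are illustrative stand-ins for the sender and receiver properties mentioned above, not the actual ontology terms.

```python
# A toy triple store for the eip:Aggregator fragment of the ontology.
triples = {
    ("eip:Aggregator", "rdfs:subClassOf", "eip:EI_Pattern"),
    # one-one: exactly one aggregated output message per aggregator
    ("eip:Aggregator", "eip:givesOutputMsg", "eip:Message"),
    # one-many: an aggregator consumes multiple input messages
    ("eip:Aggregator", "eip:takesInputMsg", "eip:Message"),
    ("eip:Message", "eip:hasSender", "eip:Participant"),    # illustrative name
    ("eip:Message", "eip:hasReceiver", "eip:Participant"),  # illustrative name
}

def properties_of(subject):
    """All predicates asserted for a given subject class."""
    return {p for s, p, _ in triples if s == subject}
```

A middleware component could query such a representation to discover, for instance, that instantiating an aggregator requires provisioning one output channel and an input channel able to carry multiple messages.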
The dead letter channel pattern (eip:Dead_Letter_Channel), a more complex pattern, is shown in Figure 6.4. The dead letter channel is a type of message channel with the additional restriction that all messages in this channel must be dead messages (eip:Dead_Message). Figure 6.5 shows a composite pattern, the dynamic router (eip:Dynamic_Router), which is composed of a message router (eip:Message_Router) and an external rule-base. The rule-base (eip:Rulebase) is a type of resource (eip:Resource) which other patterns may also use, and is a sibling class of eip:EI_Pattern. Similarly, other EIPs are implemented in the ontology following their descriptions in [57] as closely as possible.

Figure 6.3: Aggregator

Figure 6.4: Dead Letter Channel

6.2.5 Evaluation and Discussion

In this section, we describe how semantic EIPs can be applied to the motivating scenario described earlier to reduce its messaging resources.

In the original integration method in Figure 6.1, we observe three buildings whose power consumption is monitored and whose recipients are notified in case the consumption is high. The recipients for each building include any person who might be associated with consuming electricity in that building. Since it may not be possible to track individuals and their locations, it is useful to include all possible persons who are expected to be present in the building at any time. For instance, in a classroom, this would include everyone registered for the class, whether or not they show up for the class.

Figure 6.5: Dynamic Router

Figure 6.6: Smart Grid demand-response scenario after optimizations

In a Smart Grid scenario such as the USC campus microgrid, on which we base our evaluation, a key observation is that several recipients are common to two or more buildings. The same student may reside in a dormitory building, attend lectures in a lecture hall (in a separate building) and work at a third building, all on the same day.
As per the scenario in Figure 6.1, such a student would receive three alerts simultaneously if all three buildings have a case of high electricity consumption on a particular day. Further, each individual may be contacted over multiple messaging channels (such as SMS or email), further increasing the number of messages in the system. If we include the number of message re-transmissions due to unavailable subscribers or faulty email/SMS addresses, the total number of alerts is even higher. There is thus a need for efficient message filtering and aggregation so that alert messages are not duplicated as the number of buildings and subscribers grows.

To optimize the original scenario, we start by adding an aggregator to the input message channel after its content has been enriched with information about the destination recipient. The aggregator pattern can efficiently combine multiple messages routed to the same recipient and provide exactly one output message for each recipient, irrespective of the number of buildings the recipient is associated with. Instead of a separate alert for each building, the user now receives exactly one message with information about all the buildings he/she is associated with. Next, a dynamic router (which is composed of a message router and a dynamic rule base) is added to decide (dynamically) through which messaging channel (e.g., email or SMS) the alert should be delivered. The content or destination of the message may also be modified based on other rules from the rule base. To optimize further, we propose to keep track of which recipients are not actively receiving messages (for instance, due to incorrect email/SMS addresses). This information can be stored in the dynamic rule base associated with the dynamic router and can be used to make future decisions about sending messages to those subscribers (e.g., remove them from the recipient list, or retry using alternative messaging channels).
The resulting optimized scenario in Figure 6.6 eliminates redundant messages from the original scenario, thus saving messaging resources. We next demonstrate the potential savings in messages achieved by the use of EIPs in this scenario. To quantify such savings, we compare the number of alert messages sent.

We use power consumption data from campus buildings in the University of Southern California (USC) campus microgrid. This data is recorded through smart meters located in each campus building. The dataset contains energy consumption data recorded at 15-minute intervals for approximately three years (from January 1, 2008 to November 21, 2010). We aggregate the 15-minute data for each building to obtain a cumulative daily total of power consumption and use this value in our simulations. To demonstrate our proposed approach in various scenarios, we choose three buildings of different categories for analysis. One of the buildings is a dormitory/residential building, the second building has laboratories/lecture rooms, and the third has academic/administrative offices. Daily power consumption for each of these three buildings is shown in Figure 6.7.

Figure 6.7: Energy consumption plots for three buildings in the campus

For our evaluation setup, we use OpenRDF Sesame 5 as our triple store, Python for data processing and scripting, and SPARQL as the semantic query language. The generation of alerts is typically done by an event processing engine which processes the incoming sensor data streams and detects interesting events, which are linked with alerts. A common alert is to find out whether energy consumption for a day exceeds a certain threshold obtained from historical data, such as a moving average. In our scenario, we implemented a simple query to generate an alert for a particular day when the aggregate daily power consumption for a building on that day exceeds the aggregate power consumption for the same building on the same date in the previous year.
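The aggregation and alert condition just described can be sketched in plain Python. This is an illustrative sketch, not the actual pipeline (which stores the data semantically and uses a SPARQL query); the function names and the (date, kWh) input layout are our own assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

def daily_totals(readings):
    """Sum 15-minute kWh readings into daily totals for one building.
    `readings` is an iterable of (date, kwh) pairs (layout is illustrative)."""
    totals = defaultdict(float)
    for day, kwh in readings:
        totals[day] += kwh
    return totals

def alert_days(totals, offset_days=365):
    """Flag days whose total exceeds the total exactly 365 days earlier,
    mirroring the threshold condition described in the text. Days with no
    prior-year value never trigger an alert (compare against +infinity)."""
    return [day for day, total in totals.items()
            if total > totals.get(day - timedelta(days=offset_days),
                                  float("inf"))]
```

For example, a day totaling 12 kWh triggers an alert if the day exactly 365 days earlier totaled 10 kWh; 2008 days produce no alerts because there is no earlier data, matching the evaluation setup.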
The analysis was performed on 1055 days between Jan 1, 2008 and Nov 21, 2010. Out of these, we report results for each day in 2009 and 2010, since there is no previous data to compare against the 2008 values. We simulated the list of occupants of each building using a realistic estimate of the occupancy or association of persons with each building. The simulated occupant list was used as the recipient list for messages about particular buildings. These numbers are shown in Figure 6.8.

Since our concept representations and data representations are semantic, the query to generate alerts must also be semantic. The SPARQL representation of the query used is listed below. This query lists all records which satisfy the condition of having a higher cumulative daily total energy use compared to the same day in the previous year for the same building. We query over records of the same class type (building) and retain only those records whose timestamp is exactly 365 days ahead of the previous record and whose energy consumption value is higher than the previous value. Based upon the output results of this query, semantic representations for alerts are created. The results of such a query can also be used to create, destroy and modify the instances of EIP classes themselves (via SPARQL UPDATE and CONSTRUCT statements), thus leading to overall semantics-driven lifecycle management of enterprise integration patterns.

5 www.openrdf.org

Figure 6.8: Number of subscribers in each building

SELECT ?record WHERE {
    ?record a ?type ;
        eip:hasValue ?val ;
        eip:hasTimestamp ?time .
    ?rec_old a ?type ;
        eip:hasTimestamp ?time_old ;
        eip:hasValue ?val_old .
    FILTER (?time = ?time_old + 365 && ?val > ?val_old)
}

We perform the described experiment to compare the number of alert messages generated in the workflow in Figure 6.1 to the number of alerts in the optimized workflow in Figure 6.6. The resulting difference in the total number of alerts (for all three buildings) is shown in Figure 6.9.
As expected, optimizing the demand-response scenario with the aggregator and dynamic router EIPs has reduced the number of redundant alerts. Over the complete dataset, 469,100 alerts are generated in the basic workflow and 441,155 alerts are generated by the optimized workflow with enriched EIPs, leading to 27,945 fewer messages being sent in the optimized scenario. This is an improvement of 5.96% over the original number of alerts.

Figure 6.9: Savings in the number of alert messages sent in 2009

Enterprise integration patterns are used to loosely couple systems with messaging interfaces. We introduced the concept of a semantic implementation of these integration patterns and described how such a representation can encapsulate automatic management of messaging resources. This concept was demonstrated in the context of a Smart Grid demand-response communication application. Our simulations using data from this application showed the potential savings of communication messages in the system.

Future work would include developing semantic representations of all EIPs. In order to easily realize the potential benefits of managing the messaging interfaces using semantic representations, the EIP patterns in existing middleware suites have to be wrapped with a semantic layer; this layer would implement the lifecycle of the messaging resources (such as messaging channels) via the specific middleware implementation.

6.3 Future Direction: Shapelets for Malware Detection

Malicious software ('malware') detection systems usually rely on virus signatures and cannot stop attacks by malicious files they have never encountered. To stop these attacks, we need statistical learning approaches to identify root patterns behind the execution of malware. Using our proposed event recognition approach with time series shapelets, we propose a machine learning approach for detection of malware from portable executable (PE) files.
We create an 'entropy time series' representation of the content of each file, and then apply a shapelet-based classifier to identify malware. The shapelet-based approach picks up local discriminative features from the entropy signals. Our approach is file-format agnostic, can deal with varying lengths of input instances, and provides fast classification. We evaluate our method on an industrial dataset containing thousands of executable files, and comparison with state-of-the-art methods illustrates the performance of our approach. This preliminary work [101] is the first to use time series shapelets for malware detection and information security applications, and provides an interesting future direction of research.

6.3.1 Malware Detection

The evolving volume, variety and intensity of vulnerabilities due to malicious software (malware) call for smarter malware detection techniques [125]. Most existing antivirus solutions rely on signature-based detection, which requires past exposure to the malware being detected. Such systems fail at detecting new, previously unseen malware (zero-day attacks) [13]. A new zero-day was discovered each week on average in 2015 6. Effective statistical learning approaches can automatically find root patterns behind the execution of a malicious file to build a model that can accurately and quickly classify new malware [2, 3, 134, 116, 117, 118, 125]. We propose a new approach for malware detection from Microsoft portable executable (PE) files [106] using an advanced time series classification approach, which can pick up local discriminative features from data. Existing approaches to classification that use entropy analysis [74] or wavelet energy spectrum analysis [79] often use only global properties of the input signal. Modern malware may contain sophisticated malicious code hidden within compressed or encrypted files [74, 9].
For instance, parasitic infections and injected shellcode (which is computationally infeasible to model [128]) often rely on packed (compressed) code segments with concealed malicious code. Entropy analysis [130, 156, 157] has been used to detect such attacks. As observed by Lyda et al. [74], sections of code with packing or encryption tend to have higher entropy.

6 Symantec Corporation, 2016 Internet Security Threat Report.

We choose to represent each PE file by an 'entropy time series'. We get byte-level content from each file, make non-overlapping chunks of the file content, and compute the entropy of each chunk. This gives us the entropy time series (ETS) representation: one time series for each file. Given labeled training data (both benign and malware files), our aim is to classify a test file as safe or malicious. Thus, we frame the malware detection task as a time series classification task.

There are multiple challenges involved in malware detection from complex executable files found in the wild [3, 65, 116, 118]. The input data varies in structure, nature, patterns and lengths of time series. Our data has ETS lengths ranging from a couple of data points to more than a hundred thousand data points. Some subsections of ETS show multiple rapid changes in entropy while others are largely flat or zero. There is no obvious hint or giveaway within the structure of the data differentiating the benign files from the malicious ones.

We introduce and motivate the use of shapelets in the cybersecurity domain for malware detection and file content analysis. We convert the malware detection problem into a time series classification task, and demonstrate the efficacy of our shapelet-based classification method on entropy time series representations of executable files. From our results, we observe that shapelets can learn useful discriminative patterns from a very small fraction of the training data and perform accurate classification.
Further, we show that the distance from the shapelet can be used as a single (or additional) feature for any supervised learning task using an existing classifier.

6.3.2 Related Work

Malicious software has become rampant in our daily lives, affecting our computers and mobile devices. Static analysis methods examine the content of a file without executing it. As reported in Moser et al. [79], deception techniques such as code transformations, obfuscations and metamorphism can overcome static approaches used by many virus/malware scanners. Siddiqui et al. [125] provide an overview of malware detection approaches using file features. A common source of file features is N-grams (of mnemonics of assembly instructions) [134]. N-grams are sequences of bytes of a certain length, and contain bytes adjacent to each other. These approaches typically process an extremely large number of N-grams. Wavelet transformations [157] are another source of file features. Bilar [12] proposed using mnemonics of assembly instructions from file content as a predictor for malware. Statistical machine learning and data science methods [37] have been increasingly used for malware detection, including approaches based on support vector machines, logistic regression, Naïve Bayes, neural networks, deep learning, wavelet transforms, decision trees and k-nearest neighbors [125, 65, 116].

Entropy analysis [74, 9, 156, 157, 130] is an effective technique for aiding detection of malware by pointing to the possible presence of deception techniques. Despite polymorphism or obfuscation, files with high entropy are more likely to have encrypted sections in them. When a PE file switches between content regimes (i.e., native code, encrypted section, compressed section, text, padding), there are corresponding shifts in its entropy time series [157].
Usually, in entropy analysis of files for malware detection, either the mean entropy of the entire file or the entropy of chunks of code sections in the file (as in our approach) is computed. Such simplistic entropy statistics may not be sufficient to detect expertly hidden malware, which, for instance, may carry additional padding (zero-entropy chunks) to pass through high-entropy filters. In our approach, we develop a machine learning method to automatically identify patterns of changes in entropy which are indicative of malicious file behavior.

We believe shapelets are inherently well-suited to malware detection as they identify local discriminative patterns, and identifying these patterns helps detect unknown malware. Shapelets are also appropriate for classifying new time series very quickly, as shown in our experiments: (i) they perform classification on test data very quickly, and (ii) they do not need a lot of training data to learn the shapelet patterns. As an example, consider the shapelet in Figure 6.10, where the red portion shows the extracted shapelet subsequence within its time series. This was a discriminative shapelet extracted from our dataset; full details of our work are in [101]. This shapelet belongs to a time series instance from the malware class, and might be indicative of malicious behavior, with the power to predict similar (or dissimilar) subsequences in new time series. At a broad view, this shapelet signals that a sharp drop in entropy could be indicative of malicious code behavior. In Figure 6.10, the file had constant, high entropy for a large portion and then there was a large entropy drop, both of which could be suggestive of a code obfuscation section.
This feature would have been detected if the drop occurred at a different location in the file, but might not have been detected if the scale of the drop was different (shapelet feature matching is invariant to horizontal translation but not to vertical dilation). These invariance properties are well-suited for signaling the potential presence of obfuscated code to a malware detector: the existence, rather than the location, of the obfuscation is what is important (hence, horizontal invariance), and moreover the drop from high entropy to different levels of low entropy could reflect shifts to different types of content (plain text, native code, padding, etc.; hence, vertical non-invariance).

Figure 6.10: Shapelet found from file entropy data

6.3.3 Proposed Approach

An overview of our classification approach is shown in Figure 6.11; it can be summarized in three steps (after we have access to the training and testing files and class labels).

Figure 6.11: Overview of our malware classification approach

Step (1) is entropy time series creation. In information theory, entropy (more specifically, Shannon entropy [111]) is the expected value of the information contained in a message. Generally, the entropy of a variable refers to the amount of disorder (uncertainty) in that variable (for example, the entropy is zero when one outcome is certain). We have multiple labeled instances of both malicious and benign files. We convert each of them to the entropy time series representation. We perform the same preprocessing on the test data files as well (before classification). To do this, we consider contiguous, non-overlapping chunks of the file content and compute the entropy of each chunk. The size of each chunk is set at 256 bytes (but can be changed). The decimal representations in these chunks are numbers from 0-255 (hexadecimal equivalent of 00-FF) and correspond to assembly instructions.
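This chunking step can be sketched in a few lines of Python, using 256-byte chunks and base-2 entropy as in equation (6.2); the function name is ours, not from the thesis implementation.

```python
import math

def entropy_time_series(data: bytes, chunk_size: int = 256):
    """Split file content into contiguous, non-overlapping chunks and
    return the Shannon entropy (in bits, 0 to 8) of each chunk's byte
    distribution. A constant chunk yields 0; a uniform chunk yields 8."""
    ets = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        counts = [0] * 256            # frequency of each byte value 0-255
        for byte in chunk:
            counts[byte] += 1
        n = len(chunk)
        ets.append(-sum((c / n) * math.log2(c / n) for c in counts if c))
    return ets
```

For example, 256 identical bytes give the single value 0.0, while one chunk containing each byte value exactly once gives 8.0, the two extremes of the entropy range discussed below.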
Entropy is usually defined as:

H = -\sum_{x} p(x) \log p(x)    (6.1)

Here, the probability p(x) of each byte value (ranging from 0 to 255) is the count (frequency) of how many times it occurs within the current chunk, normalized by the chunk size. So, the entropy of each chunk is defined as:

H = -\sum_{i=0}^{255} f_i \log_2 f_i    (6.2)

Entropy only takes into account the probability (here, frequency) of observing a specific outcome, so the meaning of the bytes themselves does not matter here (only their frequencies). The entropy of each chunk is measured in bits and is a non-negative number between 0 and 8, both inclusive.

Step (2) is shapelet extraction (training). Once we have the input data in time series format, we can perform shapelet extraction. We use the Fast Shapelets method for this step. This process gives us the shapelets themselves and a shapelet-based decision tree, along with the corresponding distance thresholds. Each internal node of the tree is a shapelet, while the leaf nodes contain final class label predictions. In the shapelet extraction process, we can choose the minimum and maximum lengths of the shapelets we desire to be picked. Alternatively, if the minimum length is set to 2 and the maximum length to the length of the smallest time series in the dataset, then the method can be completely parameter-free. We can see that our approach gradually reduces the files to entropy time series and then to shapelets, in the process extracting the most relevant and discriminative features from the data.

Step (3) is shapelet-based classification (testing). In the final step of our approach, we use the shapelet-based decision tree to predict the class label of a new test file, presented in the ETS format. The distance of the new time series to the shapelet in the root of the tree is calculated, and based on whether the distance is less or more than the threshold provided, we follow either the left or right subtree.
This process continues iteratively until we reach a leaf node, at which point a class prediction for the instance is made. Similarly, we classify all the test data instances.

Apart from treating shapelets as a stand-alone classifier, we can also view them as a method to obtain a single feature (or multiple features) to assist an existing classifier that uses many features. This feature is based on the distance of a time series instance to a shapelet discovered from the training data [164, 161]. To reiterate, the distance of a (typically longer) time series to a (typically shorter) shapelet subsequence is obtained by sliding the shapelet across the time series and noting the minimum Euclidean distance between the shapelet subsequence and the corresponding time series subsequences. Intuitively, this is similar to compressing the whole length of a time series instance into just one scalar value: its distance from a shapelet. We may have multiple shapelets extracted from a dataset, and so the distance from each shapelet can become a feature by itself, or the distances can be aggregated into a single feature. From our evaluation, we notice that this distance is indeed a powerful feature; even though it greatly reduces the dimensionality of the data, it still carries a lot of predictive power.

6.3.4 Evaluation

We provide a brief, selective evaluation of our approach here; for detailed evaluation, results and discussion, we refer the reader to [101]. We use a real industrial dataset of nearly 40,000 Windows portable executable files. Half of them are labeled as malware, and the other half as benign. These files contain a wide range of executables in their type and nature. We focus our experimental evaluation on a subset of 1,599 files from this set, which has also been used by Wojnowicz et al. [156, 157]. We perform several experiments on many different ratios of random training-testing splits of the data, ranging from using 10% to 90% of the data for training.
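The shapelet-to-series distance described earlier (slide the shapelet along the series and keep the minimum Euclidean distance over all aligned subsequences) can be sketched as follows; the function name is ours.

```python
import math

def distance_from_shapelet(series, shapelet):
    """DFS feature sketch: minimum Euclidean distance between the
    (shorter) shapelet and every aligned subsequence of the (longer)
    time series, i.e. a sliding-window best match."""
    m = len(shapelet)
    return min(
        math.sqrt(sum((series[i + j] - shapelet[j]) ** 2 for j in range(m)))
        for i in range(len(series) - m + 1)
    )
```

A distance of 0 means the shapelet occurs exactly somewhere in the series; the decision tree compares this scalar against the learned threshold at each node.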
Any data instance that is not used for training is held out for the test set. Unless otherwise specified, we set the minimum shapelet length to 5, and the length step size for the shapelet search to 1. As the Fast Shapelets approach we use is a randomized algorithm, the accuracy results vary slightly when the method is run multiple times. Unless otherwise noted in our evaluation, we report the mean accuracy of 10 runs (along with the standard deviation) of the shapelets classifier on a dataset. We do not normalize the time series in our dataset: all the entropy values lie between 0 and 8 (both inclusive), and a pilot test showed no significant difference in performance between the z-normalized and unnormalized datasets.

A challenge with our dataset is that the lengths of the ETS (following directly from the lengths of the files generating them) are vastly different, ranging from just a few data points to more than 100,000, with a mean length of 3,725 and a median length of 920. For convenience, the authors in [157] proposed grouping the dataset into file size groups based on ETS lengths. We follow the same convention and group all ETS of lengths in [2^j, 2^(j+1)) into a group J = j. We focus our experiments on the set of 1,599 files satisfying J = 5 (ETS lengths between 32 and 64), thus limiting our shapelet search to lengths between 5 and 32 (the smallest time series in the dataset restricts the maximum length of shapelet that can be found). This J = 5 set has also been picked for evaluation using wavelet transformations by Wojnowicz et al. [157], and thus we picked it to allow comparisons with their wavelet-based method. They proposed a Suspiciously Structured Entropic Change Score (SSECS) feature for malware classification, and we compare the efficacy of our proposed Distance from Shapelet (DFS) feature to that of SSECS.

A key observation here is that our shapelet classifier learns from data very quickly, with only a small percentage used for training.
We attempted an extreme experiment to illustrate this. We randomly sampled 1% of our original (nearly 40,000 time series) dataset with the restriction that the ETS lengths were between 100 and 10,000 (which still includes about 80% of our full dataset). We extracted shapelets (minL=5, maxL=100) and built a shapelet-based decision tree from this 1% set (containing 321 time series from 179 benign and 142 malware files). Nine shapelets were extracted in this case to form the decision tree. We used the rest (99%) of the data (31,980 time series) for testing, and obtained a test accuracy of 67.3%, which is comparable to the accuracy obtained by the wavelet transformation and entropy approach in [157] using 80% of the data for training. The first and most discriminative shapelet found in this experiment, which might be indicative of malware, is shown in Figure 6.10.

To use existing popular classifiers and evaluate the utility of the DFS feature, we create a feature representation of the dataset (with DFS, SSECS and statistical features). A set of eight basic statistical features (also used in [157]) captures information about each ETS. They are:

• Mean
• Standard deviation (std. dev.)
• Signal-to-noise ratio (mean/std. dev.)
• Maximum entropy
• Fraction of signal with high entropy (>= 6.5 bits)
• Fraction of signal with zero entropy
• Length of time series
• Square of length of time series

We present results of our classification approach on our chosen dataset of 1,599 files in Figure 6.12. We compare our shapelet-based method to a baseline classifier, which assigns every test data instance the label of the majority class in the training dataset. Our dataset is quite imbalanced across the two classes (as shown by the baseline classifier), but we also experimented on a resampled dataset with a balanced class ratio, with results shown in Figure 6.13. The mean accuracy varies from 87.84% (using 10% of the data for training) to 90.55% (using 90% of the data for training).
We can observe that even when using just 10% of the data for training, our classifier has quickly gained a lot of predictive power. We experiment on 9 train-test ratios of the dataset, with the fraction used for training ranging from 0.1 to 0.9 (in increments of 0.1). Every time we form a new train-test split, we randomly choose whether an instance in the original dataset goes to the training or testing set. Due to this random splitting, the randomized nature of shapelets itself, and possibly overfitting, it may sometimes happen that even though the fraction of data used for training increases, there is no improvement (or even a dip) in the prediction accuracy on the test set. The difference between our algorithm's performance and the baseline is much stronger for the balanced dataset.

Figure 6.12: Classification accuracy of our approach

Figure 6.13: Classification accuracy of our approach on balanced data

We intend to explore the utility of the Distance from Shapelet (DFS) feature, possibly as an additional feature in an existing classifier. This existing classifier can be any classifier; we use three: support vector machines (SVM), logistic regression (LR) and random forests (RF). To implement these classifiers, we used the scikit-learn Python package [103] with default parameters. It is likely possible to optimize the parameters of each classifier and obtain higher accuracy, but our main goal is a relative comparison of classification accuracy across our chosen feature sets. We compare the performance of each classifier on these feature sets:

1. SSECS – using only the SSECS score
2. DFS – using only the DFS feature
3. SSECS+Stat – using the SSECS score and the eight statistical features defined earlier
4. SSECS+DFS+Stat – using the SSECS score, DFS, and the eight statistical features

The evaluation results are shown in Figure 6.14 using random forests, Figure 6.15 using logistic regression, and Figure 6.16 for support vector machines.
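The four feature sets can be assembled as plain per-file feature matrices before being handed to any off-the-shelf classifier. The helper below is an illustrative sketch: the input names (per-file SSECS scores, DFS values and eight-element statistics lists, computed elsewhere) are our own assumptions.

```python
def build_feature_sets(ssecs, dfs, stats):
    """Assemble the four per-file feature matrices compared in the text.

    ssecs: list of SSECS scores (one per file),
    dfs:   list of distance-from-shapelet values (one per file),
    stats: list of eight-element statistical feature lists (one per file).
    """
    return {
        "SSECS": [[s] for s in ssecs],
        "DFS": [[d] for d in dfs],
        "SSECS+Stat": [[s] + st for s, st in zip(ssecs, stats)],
        "SSECS+DFS+Stat": [[s, d] + st for s, d, st in zip(ssecs, dfs, stats)],
    }
```

Each resulting matrix can then be passed, together with the class labels, to a scikit-learn estimator such as `RandomForestClassifier.fit`.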
For the DFS feature in our evaluation, we use only the distance from the first (most discriminative) shapelet extracted; this node is the root node of the shapelet decision tree and all instances must pass through the root. For all three classifiers (RF, LR and SVM), SSECS is always outperformed by DFS. When statistical features are added to SSECS, the accuracy increases. However, adding the DFS feature still improves the overall accuracy in a large number of cases. Thus, the DFS feature is useful (as a single feature) to add to an existing classifier. For SVMs, DFS alone outperforms all three other combinations in cases where the fraction of data used for training is small.

Figure 6.14: Illustrating the efficacy of the Distance from Shapelet (DFS) feature with the Random Forests classifier

In future work, a wide range of information security applications can be tested with our proposed event recognition methods using shape mining from temporal data. As shown here, the input data does not need to be temporal sensor data; anything that can be converted to a time series representation is suitable for our approaches.

Figure 6.15: Illustrating the efficacy of the Distance from Shapelet (DFS) feature with the Logistic Regression classifier

6.4 Summary

In this thesis, we have laid foundations for exciting future work in event modeling and event recognition from temporal sensor data. We proposed a comprehensive event model, incorporated temporal pattern and shape mining into it to make it applicable to raw sensor data, and proposed new algorithmic extensions to current state-of-the-art shape mining methods. Our proposed approaches are applicable to versatile applications ranging beyond energy systems, and the future holds great promise for active research in event-based sensor data processing.
Figure 6.16: Illustrating the efficacy of the Distance from Shapelet (DFS) feature with the Support Vector Machine classifier

Reference List

[1] R. Adaikkalavan and S. Chakravarthy. SnoopIB: Interval-based event specification and detection for active databases. Data & Knowledge Engineering, 59(1):139–165, 2006.

[2] M. Ahmadi, D. Ulyanov, S. Semenov, M. Trofimov, and G. Giacinto. Novel feature extraction, selection and fusion for effective malware family classification. In Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pages 183–194. ACM, 2016.

[3] M. Alazab, S. Venkatraman, P. Watters, and M. Alazab. Zero-day malware detection based on supervised learning algorithms of API call signatures. In Proceedings of the Ninth Australasian Data Mining Conference - Volume 121, pages 171–182. Australian Computer Society, Inc., 2011.

[4] K. Anderson, A. Ocneanu, D. Benitez, D. Carlson, A. Rowe, and M. Berges. BLUED: A fully labeled public dataset for event-based non-intrusive load monitoring research. In Proceedings of the 2nd KDD Workshop on Data Mining Applications in Sustainability (SustKDD), pages 1–5, 2012.

[5] K. D. Anderson, M. E. Bergés, A. Ocneanu, D. Benitez, and J. M. Moura. Event detection for non intrusive load monitoring. In IECON 2012 - 38th Annual Conference on IEEE Industrial Electronics Society, pages 3312–3317. IEEE, 2012.

[6] D. Anicic, P. Fodor, S. Rudolph, R. Stühmer, N. Stojanovic, and R. Studer. ETALIS: Rule-based reasoning in event processing. Reasoning in Event-Based Distributed Systems, 347:99–124, 2011.

[7] K. C. Armel, A. Gupta, G. Shrimali, and A. Albert. Is disaggregation the holy grail of energy efficiency? The case of electricity. Energy Policy, 52:213–234, 2013.

[8] N. Batra, A. Singh, and K. Whitehouse. Gemello: Creating a detailed energy breakdown from just the monthly electricity bill. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
Abstract
The ubiquitous nature of sensors and smart devices collecting more and more data from industrial and engineering equipment (such as pumps and compressors in oilfields or smart meters in energy grids) has led to new challenges in faster processing of temporal data to identify critical happenings (events) and respond to them. We deal with two primary challenges in processing events from temporal sensor data: (i) how to comprehensively model events and related happenings (event modeling), and (ii) how to automatically recognize event patterns from raw multi-sensor data (event recognition).

The event modeling problem is to build a comprehensive event model enabling complex event analysis across diverse underlying systems, people, entities, actions and happenings. We propose the Process-oriented Event Model for event processing that attempts a comprehensive representation of these processes, particularly those seen in modern energy industries and sensor data processing applications. This model brings together, in a unified framework, the different types of entities that are expected to be present at different stages of an event processing workflow and a formal specification of relationships between them.

Using event models in practice requires detailed domain knowledge about a variety of events based on raw data. We propose to learn this domain knowledge automatically by using recent advances in time series classification and shape mining, which provide methods of identifying discriminative patterns or subsequences (called shapelets). These methods show great potential for real sensor data as they don't make assumptions about the nature, source, structure, distribution, or stationarity of input time series, provide visual intuition, and perform fast event classification. By combining shape extraction and feature selection, we extend this temporal shape mining paradigm for processing data from multiple sensors.
We present evaluation results to illustrate the performance of our approaches on real-world sensor data.
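As a rough illustration of the shapelet paradigm described in the abstract: a candidate subsequence (a shapelet) is turned into a feature by computing its minimum distance to each time series, so that series containing the shape score near zero and series lacking it score high. The sketch below is illustrative only and is not the dissertation's implementation; the example arrays and the `spike` shapelet are invented for demonstration.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    equal-length subsequence of the series (a shapelet feature)."""
    m = len(shapelet)
    best = float("inf")
    for i in range(len(series) - m + 1):
        d = np.linalg.norm(series[i:i + m] - shapelet)
        best = min(best, d)
    return best

# A shapelet is discriminative when the distances separate classes:
normal = np.array([0., 0., 0., 1., 2., 1., 0., 0.])
faulty = np.array([0., 0., 5., 5., 5., 0., 0., 0.])
spike = np.array([5., 5., 5.])  # candidate shapelet

print(shapelet_distance(normal, spike) > shapelet_distance(faulty, spike))  # True
```

In a full pipeline, each candidate shapelet yields one such distance feature per series; feature selection then keeps only the most discriminative shapelets, which is the combination the abstract refers to for multi-sensor data.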
Conceptually similar
Modeling dynamic behaviors in the wild
Efficient and accurate in-network processing for monitoring applications in wireless sensor networks
A complex event processing framework for fast data management
From matching to querying: A unified framework for ontology integration
Prediction models for dynamic decision making in smart grid
Efficient data collection in wireless sensor networks: modeling and algorithms
Robust real-time algorithms for processing data from oil and gas facilities
Transforming unstructured historical and geographic data into spatio-temporal knowledge graphs
Data-driven methods for increasing real-time observability in smart distribution grids
Model-driven situational awareness in large-scale, complex systems
Heterogeneous graphs versus multimodal content: modeling, mining, and analysis of social network data
Application of data-driven modeling in basin-wide analysis of unconventional resources, including domain expertise
Mining and modeling temporal structures of human behavior in digital platforms
Utilizing real-world traffic data to forecast the impact of traffic incidents
Responsible AI in spatio-temporal data processing
Adaptive and resilient stream processing on cloud infrastructure
Event-centric reasoning with neuro-symbolic networks and knowledge incorporation
Integrated reservoir characterization for unconventional reservoirs using seismic, microseismic and well log data
Multiple humans tracking by learning appearance and motion patterns
Efficient and accurate object extraction from scanned maps by leveraging external data and learning representative context
Asset Metadata
Creator
Patri, Om Prasad (author)
Core Title
Modeling and recognition of events from temporal sensor data for energy applications
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Publication Date
04/19/2017
Defense Date
02/17/2017
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
complex event processing,data science,digital energy,event model,events,feature selection,machine learning,multivariate time series classification,OAI-PMH Harvest,oilfields,sensor data,shapelets,smart meters,time series
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Prasanna, Viktor K. (committee chair), Ershaghi, Iraj (committee member), McLeod, Dennis (committee member)
Creator Email
ompatri@gmail.com,patri@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c40-362277
Unique identifier
UC11258993
Identifier
etd-PatriOmPra-5238.pdf (filename), usctheses-c40-362277 (legacy record id)
Legacy Identifier
etd-PatriOmPra-5238.pdf
Dmrecord
362277
Document Type
Dissertation
Rights
Patri, Om Prasad
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA