USC Computer Science Technical Reports, no. 939 (2013)
Flexible and Efficient Sensor Fusion for Automotive Apps

Yurong Jiang, Hang Qiu, Matthew McCartney, William G. J. Halfond, Fan Bai, Donald Grimm, Ramesh Govindan
{yurongji,hangqiu,mmcartn,halfond,ramesh}@usc.edu, {fan.bai,donald.grimm}@gm.com

ABSTRACT

Automotive apps can improve efficiency, safety, comfort, and longevity of vehicular use. These apps achieve their goals by continuously monitoring sensors in a vehicle, and fusing them with information from cloud databases in order to detect events that are used to trigger actions (e.g., alerting a driver, turning on fog lights, screening calls). However, modern vehicles have several hundred sensors that describe the low-level dynamics of vehicular subsystems; these sensors can be fused in complex ways together with cloud information, and the parameters of the fusion algorithms themselves may depend upon user behavior. Furthermore, fusion algorithms may incur significant costs in acquiring sensor and cloud information. In this paper, we propose a programming framework called AUTOLOG to simplify the task of programming these event detection algorithms. AUTOLOG uses Datalog to express sensor fusion, but incorporates novel query optimization methods that can be used to minimize bandwidth usage, energy, or latency, without sacrificing correctness of query execution. Experimental results on a prototype show that AUTOLOG can reduce latency by 4-7× relative to an unoptimized Datalog engine.

1. INTRODUCTION

Many mobile app marketplaces feature automotive apps that provide in-car infotainment, or record trip information for later analysis. With the development of systems like Mercedes-Benz mbrace [3], Ford Sync [1], and GM OnStar [2], it is clear that auto manufacturers see significant value in integrating mobile devices into the car's electronic ecosystem as a way of enhancing the automotive experience (§6).
Because of this development, in the near future we are likely to see many more automotive apps in mobile marketplaces. An important feature of automobiles that is likely to play a significant part in the development of future automotive apps is the availability of a large number of vehicular sensors. These sensors describe the instantaneous state and performance of many subsystems inside a vehicle, and represent a rich source of information, both for assessing vehicle behavior and driver behavior. At the same time, there has been an increase in the availability of cloud-based information that governs the behavior of vehicles: topology and terrain, weather, traffic conditions, speed restrictions, etc. As such, we expect that future automotive apps will likely fuse vehicular sensors with cloud-based information, as well as sensors on the mobile device itself, to enhance the performance, safety, comfort, or efficiency of vehicles (§2). For example, apps can monitor vehicular sensors, GPS location, and traffic and weather information to determine whether the car is being driven dangerously, and then take appropriate action (e.g., screen calls, alert the driver). Similarly, an app may be able to warn drivers of impending rough road conditions, based both on the availability of cloud-based road surface condition maps and an analysis of vehicle comfort settings (e.g., suspension stiffness).

In this paper, we consider automotive apps that perform sensor and cloud information fusion. Many of these apps can be modeled as continuously fusing vehicular sensors with cloud information in order to detect events. In the examples above, a car being driven dangerously, or over a patch of rough road, constitutes an event, and sensor fusion algorithms continuously evaluate sensor readings to determine when an event occurs or to anticipate event occurrence. Within this space of apps, we focus on two programming pain points.
First, because cars can have several hundred sensors, each of which describes low-level subsystem dynamics, and the cloud-based information can be limitless, determining the right combinations of sensors and cloud information to detect events can be challenging. For instance, whether someone is driving dangerously can depend not just on vehicle speed, but on road curvature, the speed limit, road surface conditions, traffic, visibility, etc. As such, programmers will likely need to build their event detectors in a layered fashion, first by building lower-level sensing abstractions, and then combining these abstractions to develop more sophisticated event detectors. In the example above, a programmer can layer the dangerous driving detector by first building an abstraction for whether the driver is speeding (using car speed sensors and cloud speed-limit information), then an abstraction for whether this speed is likely to cause the driver to lose control (by analyzing the car's turn radius vis-a-vis the curvature of the road), and combining these two abstractions to design the final detector. Beyond comprehensibility and ease of programming, this layered approach has the benefit of re-use: sensor abstractions can be re-used in multiple situations. For example, the abstraction for analyzing whether driver speed is likely to cause a driver to lose control can be used in an app that tells drivers what speed to take an impending curve on the road. Finally, many of these event detectors may need to be tailored to individual users, since different users have different tolerances for safety, comfort, and performance.

The second programming pain point is having to reason about the costs of accessing sensors and cloud-based information. These accesses incur energy costs, latency, and bandwidth usage, and designing efficient sensor fusion algorithms that minimize these costs for every automotive app can be difficult, if not impossible.
Moreover, expecting mobile app developers to reason about this cost can increase programming burden significantly.

Contributions. In this paper, we address the first pain point by using a programming language called Datalog, which provides rule-based conjunctive queries. Datalog is based on the predicate calculus of first-order logic, and supports negation of rules. In our use of Datalog, sensors and cloud information are modeled as (time-varying) facts, and applications define event detectors as rules, which are conjunctions of facts. An event is said to occur at some time instant if the predicate corresponding to a specific rule is true at that instant. The use of Datalog addresses the first pain point: in Datalog, rules can be expressed in terms of other rules, allowing a layered definition of rules, together with re-usability. Furthermore, since rule descriptions are fairly compact, applications can tailor the definitions of rules to suit individual users, perhaps by observing user behavior.

To address the second pain point, we have developed automatic optimization methods for rule evaluation that attempt to minimize latency, energy, or bandwidth costs. In particular, our optimizer re-orders fact assessment (determining facts from sensors or the cloud) to minimize the expected cost of rule evaluation. This expected cost is derived from a priori probabilities of predicates being true, where these probabilities are obtained from training data.

We have embodied these contributions in a programming framework called AUTOLOG (§3). AUTOLOG includes several kinds of optimizations, including provably-optimal fact assessment for a single detector, and joint fact assessment for concurrent detectors (§4).
Experiments on a prototype of AUTOLOG, and evaluations on vehicle data collected over 1,000 miles of driving, show that it is 4-7× more efficient than Datalog's naïve fact assessment strategy, and consistently outperforms heuristic alternatives, sometimes by 3× (§5).

2. BACKGROUND AND MOTIVATION

Automotive Sensing. For nearly four decades now, cars have exposed on-board diagnostic information to enable health assessment and troubleshooting. These diagnostics report on the internal status of various subsystems in a vehicle. Today, with the increasing use of electronics in cars, it is possible to obtain much more detailed information than diagnostics in vehicles. Vehicles are partitioned into distinct subsystems that control the behavior of individual aspects of the vehicle (the transmission, braking, engine operation, in-vehicle entertainment, and so forth). Cars have several hundred sensors that can continuously provide the instantaneous internal state of all vehicular subsystems.

The CAN Standard. Modern cars contain one or more internal controller area network (CAN) buses interconnecting the electronic control units (ECUs) that regulate internal subsystems [23]. Cars can have more than 70 ECUs, and these communicate using the Controller Area Network (CAN) protocols. In a typical vehicle, there may be multiple CAN buses that support high-level vehicle functions. A typical design uses a dedicated communication bus for powertrain functions (e.g., engine and transmission control) and a separate bus for body functions (e.g., user-activated switches, user display information). Other dedicated buses may be implemented to support other areas of vehicle functions, such as multimedia systems, chassis systems, or object detection systems. All cars built in the US after 2008 are required to implement the CAN standard. The CAN standard defines two types of CAN buses.
A high-speed (up to 1 Mbps) communication bus is used for timing-critical information; modules connected to a high-speed bus typically communicate high-frequency sensed information to enable receiving modules to accurately track and quickly respond to a sensed condition (e.g., a stability control system needs information on a timely basis from a variety of systems to properly respond to loss of control). Low-speed (up to 50 Kbps) resilient communication buses are typically used to communicate event-triggered information or information that does not change frequently (e.g., a window switch that is pressed by the driver, or vehicle speed information that is intended for a driver display). Lower-speed buses can be implemented using single-wire communications, which is important for reducing the cost and weight of vehicles.

Messages on a CAN bus have a simple format. A message payload of 8 data bytes encapsulates one or more data signals that contain information about a sensed condition, a control operation, or a system status indication. Messages are identified by a CAN identifier, which may consist of either 29 bits or 11 bits. ECUs generate CAN messages either periodically, or periodically when a condition is sensed, or in response to sensor value changes or threshold crossings. The frequency of periodic sensing depends upon the specific data requirements for a vehicle system. Certain types of information may be reported by a module at up to 100 Hz, whereas other types of information may be communicated only at 1 Hz. Given bus speed limits, a high-speed bus can concurrently access up to 40 CAN IDs.

Accessing Sensors. The CAN standard simply defines a communication protocol and does not identify specific sensors that a car must support. A companion standard, the On-Board Diagnostics or OBD-II [6] standard, describes diagnostic sensors and diagnostic information that every vehicle is required to support.
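The packing of data signals into an 8-byte payload can be sketched as follows. This is a minimal illustration only: the signal layout (start byte, length, scale) and the CAN ID used here are hypothetical, since actual layouts are manufacturer-defined and not specified by the CAN standard itself.

```python
# Sketch of extracting one data signal from an 8-byte CAN message payload,
# in the common (start, length, scale, offset) style of signal definitions.
# The vehicle-speed layout below is hypothetical, for illustration only.

def decode_signal(payload: bytes, start: int, length: int,
                  scale: float, offset: float) -> float:
    """Interpret `length` bytes of `payload` starting at byte `start`
    as a big-endian unsigned integer, then apply scale and offset."""
    raw = int.from_bytes(payload[start:start + length], "big")
    return raw * scale + offset

# Hypothetical frame carrying vehicle speed in bytes 0-1,
# scaled at 0.01 km/h per count: raw 0x1194 = 4500 -> 45.0 km/h.
payload = bytes([0x11, 0x94, 0, 0, 0, 0, 0, 0])
speed_kmh = decode_signal(payload, start=0, length=2, scale=0.01, offset=0.0)
print(round(speed_kmh, 2))
```

A real adapter would first match the frame's 11- or 29-bit CAN identifier against a signal database before decoding the payload.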
Beyond the diagnostic reporting requirements, manufacturers are free to implement their own sensors and define their own messaging strategy. Indeed, many manufacturers do just that, defining several hundred internal sensors that are used to monitor and regulate internal subsystems. Examples of such sensors include: vehicle speed, throttle position, transmission lever position, automatic gear, cruise control status, radiator fan speed, fuel capacity, transmission oil temperature, and so forth. In addition to accessing sensors, the CAN bus can also be used to program or actuate internal subsystems, a capability we have left to future work to exploit.

While the CAN bus is used for internal communication, it is possible to export CAN sensor values to an external computer. All vehicles are required to have an OBD-II port, and it is possible to export CAN bus messages using a special OBD-II port adapter designed to understand CAN message framing and data content. In this paper, we use a Bluetooth-capable OBD-II adapter that we have developed to access CAN sensor information from late-model GM vehicles. This capability permits Bluetooth-enabled mobile devices (smartphones, tablets) to have instantaneous access to internal car sensor information.

Automotive Apps. The availability of a large number of sensors provides rich information about the instantaneous behavior of internal subsystems. This can be used to develop mobile apps for improving the performance, safety, efficiency, reliability, and comfort of vehicles [20]. External factors can also affect many of these goals: the lifetime of vehicle components can be affected by severe climate, fuel efficiency by traffic conditions and by terrain, safety by road surface and weather, and so forth.
Increasingly, information about these external conditions is available in cloud databases, and because mobile devices are Internet-enabled, it is possible to conceive of cloud-enabled mobile apps that fuse cloud information with car sensors in order to achieve the goals discussed above.

In this paper, we focus on automotive apps that fuse this information in order to detect events in near real-time. This class of event-driven apps is distinct from automotive apps that record car sensor information for analytics (e.g., for assessing driver behavior, or long-term automotive health).

Event-driven apps. Consider an app that would like to alert a driver when he or she is executing a dangerous sharp turn. Detecting a sharp turn can be tricky because one has to rule out legitimate sharp turns at intersections, or those that follow the curvature of the road. Accordingly, an algorithm that detects a sharp turn has to access an online map database to determine whether the vehicle is at an intersection, or to determine the curvature of the road. In addition, this algorithm needs access to the sensor that provides the turn angle of the steering wheel, and a sensor that determines the yaw rate (or angular velocity about the vertical axis). Continuously fusing this information can help determine when a driver is making a sharp turn; this event can be used to trigger appropriate actions. Finally, we note that any such algorithm will include thresholds that determine safe or unsafe sharp turns; these thresholds are often determined by driver preferences and risk tolerance.

Consider a second example: an application that would like to block incoming phone calls or text messages when a driver is driving dangerously. Call blocking can be triggered by a collection of different sets of conditions: a combination of bad weather and a car speed above the posted speed limit, or bad weather and a sharp turn.
This illustrates an event-driven app where events can be defined by multiple fusion algorithms. More important, it also illustrates layered definitions of events, where the call block event is defined in terms of the sharp turn event discussed above. In §5, we describe several other event-driven apps.

Datalog. Datalog is a natural choice for describing sensor fusion for event-driven apps. It [29] is a highly mature logic programming language whose semantics are derived from the predicate calculus of first-order logic. Datalog permits the specification of conjunctive rules, supports negation and recursion, and is often used in information extraction, integration, and cloud computing [22].

Facts and Rules. Operationally, a Datalog system consists of two databases: an extensional database (EDB), which contains ground facts, and an intensional database (IDB), which consists of rules. Facts describe knowledge about the external world; in our setting, sensor readings and cloud information provide facts instantiated in the EDB. Rules are declarative descriptions of the steps by which one can infer higher-order information from the facts. Each rule has two parts, a head and a body. The head of a rule is an atom, and the body of a rule is a conjunction of several atoms. Each atom consists of a predicate, which has one or more variables or constants as arguments. Any predicate which is the head of a rule is called an IDB-predicate, and one that occurs only in the body of rules is called an EDB-predicate.

For example, the code snippet shown below describes a rule that defines a dangerous driving event. The head of the rule contains the predicate DangerousDriving, with four variables, and the body is a conjunction of several predicates, some of which are automotive sensors (like the Yaw_Rate and the Steer_Angle) and others access cloud information such as SpeedLimit.
    DangerousDriving(x,y,z,w) :-
        Yaw_Rate(x), x > 15,
        Steer_Angle(y), y > 45,
        Vehicle_Speed(z), SpeedLimit(w),
        MULTIPLY(w, 1.2, a), a < z.

According to this rule, dangerous driving is said to occur whenever the yaw rate exceeds 15 rad/s, the steering angle exceeds 45°, and the vehicle speed exceeds the speed limit by a factor of more than 1.2. Thus, for example, when the Yaw_Rate sensor has a value of 20 rad/s (when this happens, a fact Yaw_Rate(20) is instantiated in the EDB), and the steering angle is 60°, and the car is being driven at 45 mph in a 30 mph zone, a new fact DangerousDriving(20,60,45,30) is instantiated into the EDB and signals the occurrence of a dangerous driving event.

More generally, the head of a rule is true if there exists an instantiation of values for variables that satisfies the atoms in the body. As discussed above, one or more atoms in the body can be a negation, and a rule may be recursively defined (the head atom may also appear in the body). An atom in the body of one rule may appear in the head of another rule.

Rule Evaluation and Optimization. Datalog is an elegant declarative language for describing computations over data, and a Datalog engine evaluates rules. In general, given a specific IDB, a Datalog engine will apply these rules to infer new facts whenever an externally-determined fact is instantiated into the EDB. Datalog also permits queries: queries describe specific rules of interest to a user. For example, while the IDB may contain several tens or hundreds of rules, a user may, at a given instant, be interested in evaluating the DangerousDriving rule. This is expressed as a query ?-DangerousDriving(x,y,z,w). In this paper, we focus on long-standing (or continuous) queries: queries that are posed to the Datalog engine and continuously evaluated in response to changes in sensor readings.
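The body of the dangerous driving rule is simply a conjunction over instantiated fact values, which can be sketched directly. This is a minimal illustration of the rule's semantics, not a Datalog engine: a real engine would bind the variables via unification rather than through an ordinary function call.

```python
# Minimal sketch of the DangerousDriving rule body as a conjunction
# over currently instantiated facts (yaw rate, steering angle,
# vehicle speed, and the cloud-provided speed limit).

def dangerous_driving(yaw_rate, steer_angle, vehicle_speed, speed_limit):
    """True when yaw rate > 15 rad/s, steering angle > 45 degrees,
    and vehicle speed exceeds 1.2x the posted speed limit."""
    return (yaw_rate > 15 and
            steer_angle > 45 and
            vehicle_speed > speed_limit * 1.2)

# The example from the text: yaw 20 rad/s, steering 60 degrees,
# 45 mph in a 30 mph zone -> the event fact would be instantiated.
print(dangerous_driving(20, 60, 45, 30))  # True, since 45 > 30 * 1.2 = 36
```

Note that Python's `and` already short-circuits left to right; §4 is concerned with choosing the conjunct order so that this short-circuiting avoids the most expensive predicate acquisitions.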
In general, rule evaluation in Datalog has a long history of research, and many papers have explored a variety of techniques for optimizing evaluation [29, 16]. These techniques have proposed bottom-up evaluation, top-down evaluation, and a class of program transformations called magic sets (§6). All of these approaches seek to minimize or eliminate redundancy in rule evaluation, and we do not discuss these optimizations further in this paper. Our paper discusses an orthogonal class of optimizations.

3. AUTOLOG DESIGN

In this section, we describe the design of a programming system called AUTOLOG that simplifies the development of event-driven automotive apps. AUTOLOG models car sensors and cloud-based information as Datalog predicates, and apps can query AUTOLOG to identify events.

Figure 1 shows the internal structure of AUTOLOG. The Sensor Acquisition and Cloud Acquisition modules access information from the car's sensors and the cloud, respectively, and provide these to the Interface module in the form of Datalog facts. The Interface module takes (1) app-defined queries and (2) facts from the sensors, and passes these to a modified Datalog query processing engine that performs query evaluation.

AUTOLOG introduces two additional and novel components, the Query Optimizer and the Query Plan Evaluator. The Query Optimizer statically analyzes a query's associated rules and determines an evaluation plan for rule execution. Unlike traditional Datalog optimization, the Query Optimizer attempts to minimize query evaluation cost based on the cost of acquiring car sensor and cloud information, instead of the number of rules to be evaluated. The output of the Query Optimizer is a query plan executed by the Query Plan Evaluator.

[Figure 1—AUTOLOG Design]
In the remainder of this section, we describe AUTOLOG in more detail, and in §4 we discuss the Query Optimizer and Query Plan Evaluator.

How Apps use AUTOLOG. Event-driven apps instantiate Datalog rules in AUTOLOG. Typically, these rules define events for which an app is interested in receiving notifications. In Datalog terminology, these rules constitute the IDB. Rules instantiated by one app may use IDB-predicates (heads of IDB rules) instantiated by other apps.

Apps can then pose Datalog queries to AUTOLOG. Our approach assumes queries are relatively long-lived, rather than single-use queries. When a query is posed, AUTOLOG first identifies the facts needed to evaluate the query. Then it continuously evaluates the query by monitoring when predicates from the relevant sensors become facts. As discussed in the previous section, instantiation of the query predicate as a fact corresponds to the occurrence of an event, and therefore the interested app is notified when this occurs. Using this approach to query evaluation allows AUTOLOG to also support multiple concurrent queries.

AUTOLOG Sensor and Cloud Predicates. AUTOLOG provides substantially the same capabilities as Datalog, and inherits all of its benefits (these are discussed below). Like Datalog, AUTOLOG supports conjunction and negation. Unlike Datalog, AUTOLOG does not support recursion: in our experience, we have not seen rules or events in the automotive domain that require recursion.

AUTOLOG extends Datalog to support acquisitional query processing [26]: the capability to process queries that depend on dynamically instantiated sensor and cloud data. To do this, sensor and cloud information are modeled as EDB-predicates; we use the terms sensor predicate and cloud predicate, respectively, to denote the source of the predicate.
For example, Yaw_Rate(x) is a sensor predicate that models the yaw rate sensor in a vehicle, and SpeedLimit(w) is a cloud predicate that models the speed limit at the current location (§2). These predicates are predefined EDB-predicates that applications can use when defining new rules.

The Role of Time. Unlike traditional EDB-predicates, sensor and cloud predicates can vary with time. To capture this, AUTOLOG associates an explicit time variable with each sensor or cloud predicate. To illustrate, the two predicates in the previous paragraph are actually represented as: Yaw_Rate(t, x) and SpeedLimit(t, w). AUTOLOG requires that any IDB-predicate defined by an app must include a time variable if its body includes a sensor or cloud predicate. However, our subsequent descriptions will generally omit the time variable to simplify exposition.

In AUTOLOG, sensor and cloud facts are materialized dynamically (either periodically, or by dynamically querying the sensor). Thus, the Yaw_Rate predicate may be materialized periodically: for example, if the corresponding sensor is sampled at 5 Hz, then new values are available for the sensor predicate every 200 ms. On the other hand, the SpeedLimit predicate may only be materialized when AUTOLOG decides to query the cloud for this information.

Temporal Semantics. Given this temporal dimension, AUTOLOG supports two kinds of temporal semantics for query evaluation, periodic and sensor-triggered. These semantics are necessary because car sensors can be sampled periodically in the background (e.g., Yaw_Rate can be sampled at 10 Hz), or car and cloud sensors can be queried on demand.

In periodic temporal semantics, a query is evaluated periodically, with the periodicity defined by the user. In this case, if the query evaluates to true at time T, all relevant sensor and cloud predicates are materialized at T (modulo sensor acquisition and network delays).
Thus, under these semantics, if DangerousDriving is true at time T, then Yaw_Rate, Steer_Angle, and SpeedLimit must all have been computed at time T.

In sensor-triggered temporal semantics, a query is evaluated when a sensor, which is being sampled periodically, returns a new value. Assume that the sensor returns a value at time T; then, if the query predicate evaluates to true, all relevant cloud predicates and dynamically-queried sensors must be evaluated at time T, and all periodically-sampled sensors must have been evaluated within a small window T − δ, where δ is the largest sampling interval of the corresponding sensors. Within the time window δ, the sensor value that causes the query predicate to evaluate to true is the most recent sensor value. Thus, in our example, suppose that Yaw_Rate is sampled at 10 Hz and Steer_Angle at 5 Hz. If DangerousDriving evaluates to true at time T, triggered by a reading of Yaw_Rate, then SpeedLimit must have been evaluated at T, but Steer_Angle may have been evaluated within the interval [T − 200 ms, T].

Benefits of AUTOLOG. Prior work [20] has proposed a procedural abstraction for programming automotive apps. Compared to such an abstraction, AUTOLOG is declarative due to its use of Datalog, so apps can define events without having to specify or program sensor or cloud data acquisition. Furthermore, apps can easily customize rules for individual users: the dangerous driving rule in §2 has several thresholds (e.g., 45° for Steer_Angle), and customizing these is simply a matter of instantiating a new rule.

Since cars have several hundred sensors and Datalog is a mature rule processing technology that can support large rule bases, AUTOLOG inherits scalability from Datalog. This scalability comes from several techniques to optimize rule evaluation. Of particular importance for our work are incremental rule evaluation when a new fact is installed in the EDB, and the magic sets optimization [11].
AUTOLOG also inherits other benefits from Datalog. In AUTOLOG, as discussed above, rule definitions can include IDB-predicates defined by other apps. As such, rule definitions can be layered, permitting significant rule re-use and the definition of increasingly complex events. As discussed in §2, CallBlock can be defined in terms of a DangerousDriving IDB-predicate instantiated by another app.

Finally, and perhaps most important, apps that use AUTOLOG need not explicitly distinguish between sensor predicates and cloud predicates. This permits AUTOLOG to optimize the cost (e.g., latency) of query execution.

4. AUTOLOG QUERY OPTIMIZATION

In AUTOLOG, programmers do not need to distinguish sensor and cloud predicates from other EDB-predicates. However, unlike other Datalog EDB-predicates, sensor and cloud predicates incur a predicate acquisition cost (PAC), which is the cost associated with acquiring the data necessary to evaluate the predicate. A novel feature of AUTOLOG is that it can perform PAC optimization in the back-end (during rule evaluation), in a way that is transparent to the user. In this section, we first motivate PAC optimization, and then describe the PAC optimizations that AUTOLOG performs.

4.1 Optimizing Predicate Acquisition Costs

Like several prior sensor-based query processing languages (e.g., [26]), AUTOLOG supports acquisitional query processing, where sensor data and cloud information are modeled as predicates, but may be materialized on-demand. However, an important difference is that in the automotive environment, materializing sensor and cloud predicates can be expensive, and it is important to minimize this cost during query evaluation. For example, accessing cloud-based information can incur energy costs for network transmission, latency, and bandwidth usage. This last factor is particularly important for users with data usage caps. Acquiring sensor data also has associated costs.
In our automotive environment, the car can be tasked to stream sensor readings at a pre-defined frequency over Bluetooth. This incurs an energy cost for Bluetooth transmission and, to a lesser extent, network propagation and transmission latency. As discussed earlier, due to CAN bus limitations, there is a limit on the number of sensors that the car can stream concurrently. If the set of app-provided concurrent queries uses more sensor predicates than this limit, sensors may need to be acquired on-demand. The request-response latency for a single sensor reading can be on the order of tens of milliseconds. While Datalog research has explored many types of query optimization, these techniques do not consider predicate acquisition costs.

Although the cost of a sensor may be small in absolute terms, in practice the cost of a query can become quite significant. First, costs can be incurred at a high frequency, since sensor and cloud readings can change dynamically, in some cases at sub-second granularity. Second, queries often need to be evaluated in near real-time, often at sub-second intervals, in order to detect and respond to events while the user is driving, so the costs associated with a query will be incurred frequently. Third, a user can run multiple concurrent apps, each of which instantiates a set of queries that themselves can be built on top of multiple other queries or predicates that have an associated cost.

AUTOLOG performs PAC optimization by statically analyzing each query and computing an optimal order of execution for the query's predicate acquisition. This computation is performed once, when an application instantiates a query. Subsequently, whenever a query needs to be reevaluated, this order of predicate acquisition is followed. AUTOLOG's PAC optimization builds upon short-circuit evaluation of Boolean operators.
In a conjunctive rule, if one predicate happens to be false, the other predicates do not need to be evaluated. AUTOLOG takes this intuition one step further, and is based on a key observation about the automotive setting: some predicates are more likely to be false than others. Consider our dangerous driving example in §2. During experiments in which we recorded sensor values, we found that the predicate Yaw_Rate(x), x > 15 was far more likely to be false than Steer_Angle(y), y > 45. Intuitively, this is because drivers do not normally turn at high rates of angular velocity (yaw), but do turn (steer) often at intersections, parking lots, etc. In this case, evaluating Yaw_Rate first will avoid the cost of predicate acquisition for Steer_Angle, thereby incurring a lower overall expected cost for repeated query execution as compared to when Steer_Angle is evaluated first.

In general, determining the optimal order of sensor acquisition can be challenging, as it depends both on the PAC and on the probability of the predicate being true. Continuing with the example, if it were less expensive to acquire Steer_Angle than Yaw_Rate, then the optimal order would depend both upon the acquisition cost and the probability of a predicate being true. AUTOLOG optimizes using both the PAC and the predicate probability.

A key challenge for PAC optimization is to estimate the probability of a predicate being true. We estimate these probabilities using training data, obtained by collecting sensor and cloud information continuously, for a short while, while a car is being driven. When an application instantiates a query, AUTOLOG's Query Optimizer statically analyzes the query, extracts the sensor and cloud predicates, and computes the a priori probability of each predicate being true from the training data.
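The a priori probability estimate described above is simply the fraction of training samples on which a predicate holds. A minimal sketch, using synthetic sample values for illustration:

```python
# Sketch of estimating a predicate's a priori probability of being
# true as the fraction n/N of training samples satisfying it.
# The yaw-rate sample values below are synthetic.

def predicate_probability(samples, predicate):
    """Fraction of training samples on which `predicate` is true (n/N)."""
    n = sum(1 for s in samples if predicate(s))
    return n / len(samples)

yaw_rate_samples = [2, 4, 16, 3, 20, 1, 5, 2, 18, 4]  # synthetic readings
p = predicate_probability(yaw_rate_samples, lambda x: x > 15)
print(p)  # 3 of the 10 samples exceed the threshold -> 0.3
```

These per-predicate probabilities, together with the acquisition costs, are exactly the inputs the ordering algorithms below consume.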
For example, if the training data has N samples of Yaw_Rate, but only n of these are above the threshold of 10, then the corresponding probability is n/N. These probabilities, together with the PAC, are inputs to the optimization algorithms discussed below. The output is a predicate acquisition order that minimizes the expected cost.
Figure 2—Expansion Proof Tree for Rule 2
We make two observations about the training procedure and query optimization. First, the accuracy of the probability estimates affects only performance, not correctness. One corollary of this is that training data from one driver can be used to estimate probabilities for similar drivers without impacting correctness, only performance. Second, query optimization optimizes predicate acquisition cost for the expected common case, namely, that the event does not occur. As discussed above, we assume that events occur infrequently with respect to the frequency at which sensors are sampled. When an event does happen, AUTOLOG pays the total cost of predicate acquisition. In some cases, we can reduce this cost, as discussed below.
4.2 Terminology, Notation and Formulation
In Datalog, a query can be represented as a proof tree. The internal nodes of this proof tree are IDB-predicates, and the leaves of the proof tree are EDB-predicates. In AUTOLOG, leaves represent sensor and cloud EDB-predicates.¹ Figure 2 shows the proof tree for the dangerous driving example rule. In general, a proof tree will have a set G of n leaf predicates G_1, ..., G_n. Each G_i is also associated with a cost c_i and a probability p_i of being true. The order of predicate evaluation generated by AUTOLOG is a permutation of G such that there exists no other permutation of G with a lower expected acquisition cost.
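The n/N probability-estimation step described above is straightforward to sketch. The following is a minimal illustration, not AUTOLOG's actual code; the sample values are made up, and only the predicate name and threshold come from the running example:

```python
def estimate_probability(samples, predicate):
    """Estimate the a priori probability that a predicate is true,
    as the fraction n/N of training samples satisfying it."""
    if not samples:
        raise ValueError("training data is empty")
    n = sum(1 for v in samples if predicate(v))
    return n / len(samples)

# Example: N = 8 hypothetical training samples of Yaw_Rate, threshold 10.
yaw_rate_samples = [2.1, 4.0, 12.5, 3.3, 0.7, 11.2, 5.9, 1.4]
p = estimate_probability(yaw_rate_samples, lambda v: v > 10)
print(p)  # n/N = 2/8 = 0.25
```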
For Figure 2, the expected cost E of evaluating the predicates in the order G_1, G_2, G_3 can be defined recursively as:

E[G_1, G_2, G_3] = p_1 · E[G_2, G_3 | G_1 = 1] + (1 − p_1) · E[G_2, G_3 | G_1 = 0] + c_1    (1)

Because evaluation can be short-circuited when G_1 is false (so E[G_2, G_3 | G_1 = 0] = 0), this results in the following expression:

E[G_1, G_2, G_3] = p_1 · E[G_2, G_3] + c_1    (2)

This expected cost calculation can be applied to a predicate set of any size. Using a brute-force approach, one can find the expected cost for each permutation of a set G and identify the permutation with the lowest cost. In the following sections, we explore algorithms for determining the optimal evaluation order for: (a) conjunctive rules without negation, (b) conjunctive rules with negation, and (c) concurrent conjunctive rules with no negation and shared predicates. Exploring optimizations for concurrent conjunctive rules with negation and shared predicates is left to future work.
¹In AUTOLOG, leaves can represent EDB-predicates which are not sensors or cloud predicates. We omit further discussion of this generalization as it is straightforward.
4.3 PAC Optimization: Algorithms
Single Conjunctive Query without Negation. Consider a single conjunctive query with n leaf sensor and cloud predicates, where none of the predicates are negated. Intuitively, the lowest expected cost evaluation order prioritizes predicates with a low cost and a low probability of being true. For conjunctive queries without negation, this intuition enables AUTOLOG to use an optimal greedy algorithm with O(n log n) complexity to compute an ordering with the minimal expected cost. The correctness of the resulting ordering follows from the correctness of short-circuit evaluation of Boolean predicates. (Datalog predicates do not have side-effects, so short-circuiting preserves correctness.)

THEOREM 4.1. If

c_1/(1 − p_1) ≤ c_2/(1 − p_2) ≤ ... ≤ c_n/(1 − p_n)    (3)

then G_1, G_2, ..., G_n is the predicate evaluation order with the lowest expected cost.
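The recursion in Equations (1) and (2) and the ordering of Theorem 4.1 can be checked with a short script. This is an illustrative sketch, not AUTOLOG's implementation, and the predicate costs and probabilities are made up: unrolling Equation (2) gives an expected cost of c_1 + p_1 c_2 + p_1 p_2 c_3 + ..., and sorting by c_i/(1 − p_i) matches an exhaustive search over all permutations.

```python
from itertools import permutations

def expected_cost(order):
    """Expected acquisition cost of conjunctive predicates with
    short-circuiting: G_{i+1} is acquired only if G_1..G_i are all true.
    Each predicate is a (cost, probability-of-true) pair."""
    total, p_all_true = 0.0, 1.0
    for cost, p in order:
        total += p_all_true * cost   # acquired with probability p_1*...*p_{i-1}
        p_all_true *= p
    return total

preds = [(80, 0.1), (120, 0.5), (400, 0.3)]  # hypothetical (c_i, p_i) pairs

# Theorem 4.1: the greedy order sorts by c_i / (1 - p_i), ascending.
greedy = sorted(preds, key=lambda g: g[0] / (1 - g[1]))

# A brute-force search over all n! orders agrees with the greedy order.
best = min(permutations(preds), key=expected_cost)
print(expected_cost(greedy) == expected_cost(best))  # True
```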
The proof of this theorem may be found in Appendix A.
Single Query with Negation. In Datalog, queries or rules can have negated IDB-predicates. In the automotive domain, we have found event descriptions that are more naturally expressed using negation. A simple example of a proof tree for a query with negation is shown in Figure 3. In this example, the IDB-predicate R_1 is negated. Short-circuit evaluation for negated predicates is different than in the purely conjunctive case. For example, in Figure 3, we can only short-circuit the evaluation of the query when both G_2 and G_3 are true; if one is false, we must continue the evaluation.
The optimality of the ordering generated by AUTOLOG in the case of negated predicates relies on an exchange argument, which we illustrate using Figure 3(a). Suppose that the optimal order of evaluation of R_1 is (G_2, G_3). Then, in the optimal order of evaluation for the overall query R_H, G_1 cannot be interleaved between G_2 and G_3. Assume the contrary and consider the following order of evaluation: (G_2, G_1, G_3). For this ordering, it can be shown that the expected cost is c_2 + c_1 + p_1 p_2 c_3: G_2 must be evaluated, and regardless of whether G_2 is true or false, G_1 must be evaluated; G_3 is only evaluated if G_1 and G_2 are both true. By similar reasoning, it can be shown that the cost of (G_1, G_2, G_3) is c_1 + p_1 c_2 + p_1 p_2 c_3. Comparing term-wise, the cost of this order is less than or equal to that of (G_2, G_1, G_3).
Now consider the other possible ordering, (G_2, G_3, G_1). In this case, the expected cost is c_2 + p_2 c_3 + (1 − p_2 p_3) c_1. Consider predicate R_1 of Figure 3(a) in isolation. This predicate has an effective cost of c_2 + p_2 c_3 (for similar reasons as above) and an effective probability of 1 − p_2 p_3 (since R_1 is negated, it is true only when G_2 and G_3 are not simultaneously true).
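The three expected-cost expressions derived above can be checked numerically. The sketch below is illustrative (the probabilities and costs are made up, and the simulation is our own model of the evaluation, not AUTOLOG's code); the one assumption it encodes is that a predicate is fetched only while its value can still affect the outcome of the query G_1 AND NOT(G_2 AND G_3):

```python
from itertools import product

NAMES = ("G1", "G2", "G3")
p = {"G1": 0.4, "G2": 0.6, "G3": 0.3}   # illustrative probabilities
c = {"G1": 100, "G2": 50, "G3": 70}     # illustrative costs

def query(v):
    # The Figure 3 query: G1 AND NOT(G2 AND G3).
    return v["G1"] and not (v["G2"] and v["G3"])

def relevant(name, known):
    """True if the query outcome can still depend on `name`."""
    others = [n for n in NAMES if n not in known and n != name]
    for truth in product([True, False], repeat=len(others)):
        v = dict(known, **dict(zip(others, truth)))
        if query({**v, name: True}) != query({**v, name: False}):
            return True
    return False

def expected_cost(order):
    """Average acquisition cost over all truth assignments, weighted by
    probability; predicates that can no longer matter are skipped."""
    total = 0.0
    for truth in product([True, False], repeat=len(order)):
        actual = dict(zip(order, truth))
        weight = 1.0
        for n, v in actual.items():
            weight *= p[n] if v else 1 - p[n]
        known, cost = {}, 0.0
        for n in order:
            if relevant(n, known):
                cost += c[n]
                known[n] = actual[n]
        total += weight * cost
    return total

# These reproduce the closed forms in the text:
print(round(expected_cost(("G2", "G1", "G3")), 2))  # c2 + c1 + p1*p2*c3
print(round(expected_cost(("G1", "G2", "G3")), 2))  # c1 + p1*c2 + p1*p2*c3
print(round(expected_cost(("G2", "G3", "G1")), 2))  # c2 + p2*c3 + (1-p2*p3)*c1
```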
By Theorem 4.1, AUTOLOG produces an optimal order of (R_1, G_1) only if (c_2 + p_2 c_3)/(1 − (1 − p_2 p_3)) ≤ c_1/(1 − p_1). After simplifying the expression on the LHS, this order implies that c_3/p_3 ≤ c_1/(1 − p_1). Therefore, the cost of (G_2, G_3, G_1) is less than or equal to the cost of (G_2, G_1, G_3) only if c_3/p_3 ≤ c_1/(1 − p_1). Therefore, an evaluation order in which G_1 is interleaved between G_2 and G_3 is equal or greater in cost than other orders where it is not.

Algorithm 1: OPTIMAL EVALUATION ORDER FOR QUERIES WITH NEGATION
INPUT: Proof tree T
1: FUNCTION OPTORDER(T)
2:   NS = set of minimal negated subtrees in T
3:   for all t ∈ NS do
4:     Compute the optimal evaluation order for t using Theorem 4.1
5:     c_eff(t) = expected cost of the optimal evaluation order for t
6:     p_eff(t) = 1 − ∏_{i=1}^{k} p_i, where the p_i are the probabilities associated with the leaf predicates of t
7:     Replace t with a single node (predicate) whose cost is c_eff(t) and whose probability is p_eff(t)
8:   NS = set of minimal negated subtrees in T
9:   Compute the optimal evaluation order for T using Theorem 4.1

This result motivates the use of an algorithm that independently processes subtrees of the proof tree, using the algorithm of Theorem 4.1 as a building block. This insight is captured by Algorithm 1, which computes the evaluation order with the minimal expected cost for queries with negated predicates. This algorithm operates on minimal negated subtrees, which are subtrees of the proof tree whose root is a negated predicate but which contain no other negated predicate. Intuitively, Algorithm 1 computes the effective cost and effective probability for each minimal negated subtree and replaces the subtree with a single node (or predicate) to which the effective cost and probability are associated. At the end of this process, no negated subtrees exist, and Theorem 4.1 can be directly applied. In Appendix B, we prove the optimality of this algorithm.
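The collapse step of Algorithm 1 for a single level of negation can be sketched compactly; the values below are illustrative, not from the paper's traces. The negated subtree NOT(G_2 AND G_3) is first replaced by one effective predicate with cost c_eff and probability p_eff, and Theorem 4.1 is then applied to the flattened rule:

```python
def collapse_negated_subtree(leaves):
    """Replace a minimal negated subtree (a conjunction of leaves under
    a NOT) by one effective predicate, as in Algorithm 1.
    leaves: (cost, prob) pairs, pre-sorted into their Theorem 4.1 order."""
    c_eff, p_all = 0.0, 1.0
    for cost, prob in leaves:
        c_eff += p_all * cost   # expected cost of evaluating the conjunction
        p_all *= prob
    p_eff = 1.0 - p_all         # NOT(...) is true unless all leaves are true
    return (c_eff, p_eff)

# Figure 3 example: query = G1 AND NOT(G2 AND G3); values are hypothetical.
g1 = (100, 0.4)
subtree = sorted([(50, 0.6), (70, 0.3)], key=lambda g: g[0] / (1 - g[1]))
r1 = collapse_negated_subtree(subtree)

# Final step: Theorem 4.1 over the effective predicate R1 and G1.
order = sorted([("R1", r1), ("G1", g1)],
               key=lambda kv: kv[1][0] / (1 - kv[1][1]))
print([name for name, _ in order])  # ['G1', 'R1'] for these values
```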
Algorithm 1 is simplified for ease of presentation in one respect. For purely conjunctive queries, Theorem 4.1 implies that there is a single evaluation order. Because of the more complex short-circuit evaluation rules, this is not always the case for queries with negated predicates. The output of Algorithm 1 is actually a binary decision tree that defines the order in which predicates should be evaluated. For example, in Figure 3(a), if the evaluation order is (G_2, G_3, G_1), the decision tree is as shown in Figure 3(b). In this tree, if G_2 is false, then G_1 must be evaluated. G_1 is also evaluated if G_2 is true but G_3 is false. For space reasons, we have omitted a more complete description of the decision tree generation from Algorithm 1.
Figure 3—Sample Negation Proof Tree and Decision Tree: (a) Proof Tree of a Sample Query with Negation; (b) Sample Decision Tree
Multiple Queries without Negation. In AUTOLOG, multiple automotive apps can concurrently instantiate queries. These queries can also share predicates. Consider two queries, one which uses predicates X and Y, and another which uses Y and Z; i.e., they share the predicate Y. Now, suppose the probabilities of X, Y, and Z are 0.39, 0.14, and 0.71, respectively, and their costs are 201, 404, and 278. Jointly optimizing these queries (by realizing that evaluating Y first can short-circuit the evaluation of both queries) results in the order (Y, X, Z), which has an expected cost of 471.1. Alternative approaches, like individually optimizing these queries using Theorem 4.1 and evaluating the shared predicate only once, or using Theorem 4.1 but assigning half the cost of Y to each query, incur higher costs (643.9 and 521.6, respectively). For this reason, AUTOLOG uses a brute-force search for the optimal joint evaluation order. However, we are currently exploring efficient algorithms for this joint optimization.
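The numbers in the shared-predicate example can be reproduced with a small brute-force search. The sketch below is our own illustration, not AUTOLOG's implementation: it simulates a global acquisition order in which a predicate is fetched only while at least one still-undecided query mentions it, and averages the cost over all truth assignments.

```python
from itertools import permutations, product

# Shared-predicate example from the text: one query uses X and Y,
# the other uses Y and Z.
probs = {"X": 0.39, "Y": 0.14, "Z": 0.71}
costs = {"X": 201, "Y": 404, "Z": 278}
queries = [{"X", "Y"}, {"Y", "Z"}]

def joint_expected_cost(order):
    """Expected total acquisition cost of a global predicate order."""
    total = 0.0
    for truth in product([True, False], repeat=len(order)):
        actual = dict(zip(order, truth))
        weight = 1.0
        for name, val in actual.items():
            weight *= probs[name] if val else 1 - probs[name]
        known, cost = {}, 0.0
        for name in order:
            # A query is decided once a member is known false, or all
            # members are known true; fetch `name` only if it appears
            # in some undecided query.
            undecided = [q for q in queries
                         if not any(known.get(m, True) is False for m in q)
                         and not all(m in known and known[m] for m in q)]
            if any(name in q for q in undecided):
                cost += costs[name]
                known[name] = actual[name]
        total += weight * cost
    return total

joint_best = min(joint_expected_cost(o) for o in permutations(probs))
print(round(joint_best, 1))  # 471.1, achieved by evaluating Y first
# Evaluating X first (the individually optimal orders, sharing Y)
# is consistent with the 643.9 figure in the text:
print(round(joint_expected_cost(("X", "Y", "Z")), 1))  # 643.9
```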
As we show in §5, joint query optimization can result in significant performance benefits.
4.4 Putting it all together
When an automotive app instantiates an AUTOLOG query, the Query Optimizer statically analyzes the query and assigns costs and probabilities to each sensor or cloud predicate, as discussed above. The cost of each EDB-predicate is, in general, derived from cost models. AUTOLOG supports several forms of cost: energy incurred in evaluating the predicate, cellular data usage for a cloud predicate (data usage for a sensor predicate is zero), or latency for cloud and sensor predicates. For each form of cost, we expect AUTOLOG will contain a library of cost models, which map each predicate to a cost. For example, for energy, an appropriate cost model for a sensor predicate might simply be the average cost of acquiring a CAN sensor value on-demand. For a cloud predicate, a linear model of the form a + bx, where x is the size of the downloaded data, might capture energy usage. These models can be empirically derived from measurements of a few cloud and sensor predicates. A similar strategy can be used to build models for latency and data usage.
Using these costs and probabilities, the Query Optimizer applies the appropriate form of PAC optimization discussed above. This is a one-time computation performed when the query is instantiated. The output of this optimization is a decision tree (e.g., Figure 3(b)) that is passed to the Query Plan Evaluator, which repeatedly evaluates queries when new sensor facts are materialized.
AUTOLOG does not yet support joint optimization of multiple queries where some queries may contain negated predicates. In this case, our current implementation of AUTOLOG optimizes each of those queries individually. We have left joint optimization of queries in this scenario to future work.
Latency as a cost.
Energy and data usage are additive costs: the energy or data usage of acquiring two predicates G_1 and G_2 is the sum of their individual costs. This is not true for latency, since predicate acquisition can be performed for these predicates in parallel: the latency cost of acquiring both is the larger of the latency costs for G_1 and G_2. However, parallel acquisition does not benefit from short-circuit evaluation. Acquiring X and Y in parallel is beneficial only if the minimal expected cost of acquiring both of them sequentially is larger than the cost of acquiring them in parallel. More generally, consider n predicates and, without loss of generality, assume an evaluation order G_1, G_2, ..., G_n. Suppose that G_1, G_2, ..., G_i have already been evaluated and all of those predicates are true. Then, consider the minimal residual expected cost of evaluating G_{i+1}, ..., G_n. If this residual cost is greater than the latency cost of evaluating those predicates in parallel, AUTOLOG can optimize latency through parallel acquisition. AUTOLOG's implementation does not yet include this latency optimization, but in §5 we evaluate the potential efficacy of this optimization.
5. EVALUATION
In this section, we present evaluation results for several event-driven automotive apps in AUTOLOG. We begin by describing our methodology and metrics, and then quantify the performance benefits of AUTOLOG for both single and concurrent queries.
5.1 Methodology and Metrics
AUTOLOG Implementation. Our implementation of AUTOLOG has two components: one on the mobile device and the other on the cloud. The mobile device implementation includes the query optimization algorithms described in §4 and code for acquiring local sensors from the CAN bus over Bluetooth and implementing the streaming and request-response access methods for local sensors. Our query evaluation engine is a modified version of a publicly available Java-based Datalog evaluation engine called IRIS [12].
Our modifications implement the Query Plan Evaluator, which executes the decision tree returned by the Query Optimizer. The local sensor acquisition code is 14,084 lines of code, and the query processing code, including optimization and plan evaluation, is 6,639 lines.
The cloud sensor acquisition component of AUTOLOG accesses a cloud service that we implemented. This cloud service is written in PHP, and supports access to a variety of cloud IDB-predicates: the curvature of the road, whether it is a highway or not, the current weather information, a list of traffic incidents near the current location, the speed limit on the current road, whether the vehicle is close to an intersection or not, the current real-time average traffic speed, and a list of nearby places. This cloud service aggregates information from several other cloud services; the map information is provided by Open Street Map (OSM [21]). The cloud service is about 700 lines of PHP code.
Methodology and Datasets. To demonstrate some of the features of AUTOLOG, we illustrate results from an actual in-vehicle experiment. However, in order to be able to accurately compare AUTOLOG's optimization algorithms against other alternatives, we use trace analysis. For the experiment, we collected 40 CAN sensors (sampled at the nominal frequency, which can be up to 100Hz for some sensors), together with all the cloud information discussed above (sampled every 5s), from 10 drivers over 1 month. When collecting these readings, we also recorded the latency of accessing the sensors and cloud information. Our dataset has nearly 1GB of sensor readings, obtained by driving nearly 1,000 miles. We use this dataset to evaluate AUTOLOG as described below.
Event Definitions. To evaluate AUTOLOG, we created 19 different Datalog rules that cover different driving-related events drawn from our collective experience, but are not intended to be exhaustive.
These include: a sudden sharp turn (Sharpturn); speeding in bad weather (SpeedingWeather); a sharp turn in bad weather (SharpTurnWeather); a left turn executed with the right turn indicator on (BadRTurnSignal) and vice versa (BadLTurnSignal), and sharp turn variants of these (BadRSharpTurnSignal and BadLSharpTurnSignal); high fuel consumption requiring refueling (GasStationOp); a slow left turn (SlowLTurn); tailgating while driving (Tailgater); several events defined for highway driving at speed (HwySpeeding), having the wrong turn indicator on the highway (HwyBadRTurnSignal and HwyBadLTurnSignal), or executing a sharp turn on the highway (HwySwerving); a legal turn at an intersection at high speed (FastTurn); driving slowly on a rough road surface (SlowRoughRoad), turning on such a surface (RoughRoadTurn), or driving on a rough road during bad weather (RoughRoadWeather); and, finally, executing a turn without activating the turn signal (CarelessTurn).
Many of these event descriptions are, by design, layered. For example, the SharpTurnWeather event uses the Sharpturn rule. This allows us to also evaluate multi-query execution and quantify the benefits of joint optimization of multiple queries. As discussed before, we expect that programmers will naturally layer event descriptions, because this is a useful form of code reuse. We omit the Datalog code for these rules, but on average each rule uses 3.6 sensor predicates and 1.4 cloud predicates. The largest and smallest numbers of sensor predicates in a rule are 7 and 2, respectively, and of cloud predicates 3 and 0. Finally, three of these rules use negated predicates.
Trace-driven evaluations and comparison. Our evaluations use about 40% of the dataset to compute the predicate probabilities for the 19 rules, and the remaining 60% of the dataset to evaluate the optimization algorithms.
This choice of training set size is conservative; as shown below, much smaller training sets give comparable accuracy. In all of our evaluations, we use latency as the cost metric; we have left the evaluation of energy and bandwidth as cost metrics to future work. The cost values for each predicate are derived from the trace data.
Our evaluation compares the PAC optimization algorithms against four alternatives: a naive approach that acquires all relevant sensors before query execution, which models what a traditional Datalog engine would have done, and three approaches that use different predicate acquisition orders: lowest-probability first, lowest-cost first, and local-sensors first. These latter three approaches use short-circuit predicate evaluation.
5.2 AUTOLOG in Action
To demonstrate the practical benefits of AUTOLOG's PAC optimizations, we show results from an actual run of AUTOLOG during a 1-hour drive (Figure 4). During this drive, an Android smartphone was configured with AUTOLOG and ran five rules concurrently; these rules collectively invoked 7 sensor predicates and 3 cloud predicates. We used multi-query optimization to determine the optimal predicate acquisition order, since all 5 rules shared at least one predicate with another rule. All of the queries were evaluated with a periodicity of 8s.
Figure 4 shows the screenshot of an app that tracks these events on a map in real time; whenever an event occurs, the app transmits the event location to a cloud-based map server, which updates the map immediately. The map shows the locations at which the various events were triggered; the numbers indicate the indices of the events that were triggered. Our hour-long experimental run triggered a relatively large number of events (about 94, or a little over 1 per minute); we would not expect this for normal runs, but to demonstrate the system in action we set low thresholds for triggering events.
Figure 4 also shows the time taken to evaluate the rules. AUTOLOG evaluates the 5 rules concurrently in about 261ms; as will become clear later, this number means that, on average, AUTOLOG acquires 2 local sensors before short-circuiting the evaluation. To put these performance numbers in context, we conducted another experimental run on the vehicle where, instead of evaluating rules, we simply acquired all of the sensor and cloud predicates periodically. This mimics a naive predicate evaluation strategy, whose average acquisition time was 4.51s, or about 17× AUTOLOG's. Furthermore, we note that the AUTOLOG time series never overlaps the naive time series. This is because, in our experiment, not all of the events are triggered at the same time, so short-circuit evaluation results in fewer fetched predicates.
Figure 4—AUTOLOG in Action
This experiment is adversarial along many dimensions: it demonstrates a large number of concurrent rules, uses many local and cloud sensors, and has a large number of events (more than 1 per minute). Even under this setting, AUTOLOG's benefits are evident. We now explore AUTOLOG's performance for a wide range of queries and compare it with other candidate approaches.
5.3 Single Query Performance
We compare the performance of AUTOLOG against the other candidate strategies discussed above for each query individually; that is, in these experiments, we assume that only a single query is active at any given point in time. We cannot conduct such comparisons using live experiments on the vehicle, since during each run of the vehicle we can only evaluate a single strategy, and different runs may produce different conditions. Instead, we used trace analysis to evaluate all 19 queries for the five different strategies. In our trace, there were a total of 9,000 events covering the 19 queries. In the initial experiments, local sensors are queried on-demand and so incur a non-zero latency cost.
We also consider the case where local sensor readings are streamed from the CAN bus, resulting in a zero cost for the local sensors.
Figure 5 shows the average cost and the 1st and 3rd quartile costs for each strategy and query. We first observe that AUTOLOG consistently outperforms all other strategies (except in 2 cases, discussed below, where all strategies perform equally). Across all the queries, the alternative strategies have a higher cost: naive is 7.09× higher, lowest-probability 1.64× higher, lowest-cost 1.18× higher, and local-sensors 1.13× higher. (Because the naive strategy is almost an order of magnitude slower than AUTOLOG, we had to introduce a discontinuity in the y-axis to better present these and subsequent results.)
In absolute terms, single query execution times for AUTOLOG are within about 150ms, while the naive strategy can take up to 5 seconds. We should caution that the absolute numbers, for strategies with short-circuit evaluation, are really a function of the rule definitions and the thresholds. For example, if the thresholds are low and events are triggered often, the average execution times will be high. That said, we believe the relative performance numbers are indicative of the actual performance benefits of AUTOLOG vis-a-vis other strategies. Another important point is that, for AUTOLOG, query evaluation times in the 100-150ms range are indicative of its efficiency; the cost of acquiring a local sensor is about 80ms, so AUTOLOG's average times show that, in most cases, predicate evaluation is short-circuited after acquiring a single local sensor!
Our results also illustrate that it is possible to write rules for which AUTOLOG does not provide any performance improvement. For example, the SpeedingWeather predicate is true if the car is being driven above the speed limit in bad weather. To evaluate this predicate, it is necessary to access the weather and speed limit information on the cloud.
There is no performance difference among the alternative strategies, since at least one cloud predicate has to be acquired, and it turns out that the bad weather predicate is mostly false (due to the good weather in Los Angeles) and is also cheap to acquire, so all of the strategies perform similarly. The GasStationOp rule has two predicates: the first checks if the current fuel consumption rate is high, and the second (a cloud predicate) checks for gas stations nearby. In our experiments, we set the threshold for the former to be low, so that that predicate was frequently true. So, regardless of the execution strategy, cloud predicate evaluation was incurred, resulting in comparable costs for all the strategies.
The relative performance of these strategies follows from the fact that acquiring a cloud predicate is 6 to 18 times more expensive than acquiring a local sensor predicate. The naive strategy always fetches all sensors, so it incurs a significant cost. As discussed above, AUTOLOG is able to short-circuit evaluation in most cases after fetching a single local sensor (the best possible scenario for optimization). Lowest-probability first, by ignoring cost, performs worse when a higher cost predicate (e.g., a cloud predicate or an expensive sensor predicate) has a low probability. This behavior is also evident for queries with purely local sensors (BadRTurnSignal, BadLTurnSignal), where the lowest probability local sensor has a slightly higher acquisition cost than other local sensors.² Similarly, the lowest-cost first strategy ignores probability and often evaluates a predicate that is true, thereby missing the opportunity to perform a short-circuit evaluation. Finally, the local-sensor first strategy picks a local sensor without regard to cost or probability, and ends up paying a higher cost even if it short-circuits evaluation after acquiring more than one local sensor.
Figure 5—Single Query Performance Comparison
When local sensors are streamed, the relative performance of the candidate strategies is slightly different. Figure 6 shows that, in this case, as expected, the heuristics local-sensors first and lowest-cost first perform comparably to AUTOLOG. However, interestingly, lowest-probability first is often more expensive, indicating that for some rules a cloud predicate often has a lower probability. On average, naive incurs 9× the cost of AUTOLOG, lowest-probability incurs 1.7×, lowest-cost 1.07×, and local-sensors first 1.05×.
²Local sensors can differ in acquisition cost by tens of milliseconds. Although the request-response times are comparable, different sensors have different nominal frequencies, so the time from when the request is made to when the next sensor reading is available for response varies.
5.4 Multiple Query Performance
In §4, we argued that jointly optimizing across multiple queries can provide a lower overall cost. To test this hypothesis, we created 9 different subsets of our 19 rules (Table 1), and one combination that included all 19 rules. For each set, we computed the joint optimal decision tree using the brute-force method discussed in §4. We compared the performance of this optimization against our candidate strategies and against individual single-query optimization.
As Figure 7 shows, joint multi-query optimization can significantly outperform other strategies. The naive strategy is, on average, 5.16× slower than the joint multi-query optimization; lowest-probability is 1.47× slower, local-sensors first is 1.64× slower, and lowest-cost is 1.40× slower. More interestingly, single-query optimization is, on average, 1.88× slower than multi-query optimization, indicating that the latter is a necessary component of AUTOLOG.
For individual combinations, the performance ratios vary significantly. For example, AUTOLOG outperforms naive for combination 6. This combination has 4 rules: SharpTurnWeather, BadRTurnSignal, BadLTurnSignal, and BadLSharpTurnSignal. Both SharpTurnWeather and BadLSharpTurnSignal share a rule, Sharpturn, and BadLTurnSignal is a predicate in BadLSharpTurnSignal's rule body. However, only one of BadRTurnSignal or BadLTurnSignal can be true at a given instant, and AUTOLOG can leverage this to significantly reduce latency in the multi-query optimization, while single-query optimization cannot. On the other hand, combination 1 contains HwySpeeding, combination 2 contains GasStationOp, and combinations 3, 4, and 5 contain SpeedingWeather, each of which requires cloud sensor access. Therefore, all schemes incur a high absolute cost even though multi-query optimization outperforms the other strategies.
Combination 9 contains BadRSharpTurnSignal and BadLSharpTurnSignal, of which only one can be true, and Sharpturn and FastTurn share two sensor predicates (Steering wheels, Close to intersection), so multi-query optimization can leverage this to outperform the other strategies. Thus, multi-query optimization performs best when rules share sensors or have related predicates.
We have also conducted experiments for multiple queries with zero local cost for sensors. We omit the detailed graph for space reasons, but we find that our results are similar: on average, the performance ratios for naive, lowest-probability, local-sensors, lowest-cost, and single-query are, respectively, 4.86, 1.23, 3.46, 1.10, and 1.55. A fascinating aspect of this result is that the cost of the local-sensors-first heuristic is consistently high. Unlike the single-query zero cost case, with multiple queries it is highly likely that evaluation is not short-circuited before at least one cloud sensor is invoked. Since this heuristic is not careful about selecting low-cost cloud sensors, it can have higher costs than the other approaches. Furthermore, in this case as well, single-query optimization has, on average, a 50% higher cost than multi-query optimization.
Figure 6—Single Query Performance with zero local sensor cost

Combination     Rules
Complete        All Rules
Combination 1   BadRSharpTurnSignal, BadLSharpTurnSignal, Tailgater, HwySpeeding
Combination 2   HwySpeeding, HwyBadRTurnSignal, HwyBadLTurnSignal, HwySwerving, Tailgater, GasStationOp
Combination 3   SlowLTurn, FastTurn, SlowRoughRoad, SpeedingWeather
Combination 4   RoughRoadTurn, RoughRoadWeather, SlowRoughRoad, SpeedingWeather, SharpTurnWeather
Combination 5   RoughRoadTurn, RoughRoadWeather, FastTurn, HwyBadRTurnSignal, HwyBadLTurnSignal, SpeedingWeather
Combination 6   SharpTurnWeather, BadRTurnSignal, BadLTurnSignal, BadLSharpTurnSignal
Combination 7   BadRTurnSignal, BadLTurnSignal, BadRSharpTurnSignal, BadLSharpTurnSignal
Combination 8   CarelessTurn, Sharpturn, BadRTurnSignal, BadRSharpTurnSignal, FastTurn
Combination 9   Sharpturn, BadRSharpTurnSignal, BadLSharpTurnSignal, CarelessTurn, FastTurn
Table 1—Rule Combinations

5.5 Other Results
Sensitivity to Training Set. In conducting the evaluations above, we conservatively used training data of about 40% of the total dataset size. This corresponds to about 400MB of sensor readings, or about 8 hours of continuous driving. We also conducted experiments with much smaller training set sizes that use 4%, 8%, and 12% of the total data, corresponding to 1hr, 2hrs, and 3hrs of driving time, respectively. We find that the performance of AUTOLOG is relatively insensitive to the size of the training data set: the 4% training data set is only 1.13× worse than the 40%, the 8% only 1.02×, and the 12% only 1.01×. These numbers are encouraging, indicating that relatively small amounts of training data (even 2 hours' worth) can produce good performance.
Parallel Predicate Acquisition. As discussed in §4, when latency is the cost, AUTOLOG can acquire predicates in parallel when the residual minimum expected cost exceeds the parallel acquisition cost. We analyzed the 19 queries to determine whether this condition was true.
For eight of the 19 queries, this parallel acquisition optimization was possible. Put another way, for 11 of the 19 queries, parallel predicate acquisition does not provide performance gains; so, even when latency is used as the metric, AUTOLOG's optimization algorithms are necessary for good performance.
6. RELATED WORK
Industry developments are progressing to the point where automotive apps will become much more widespread than they are today, at which point an AUTOLOG-like platform will be indispensable. Several applications, such as OBDLink [4] and Torque [5], are popular on both Android and iOS, and allow users to view very limited real-time OBD-II scan data (a subset of the information available on the CAN bus).
[Figure 7—Multi-query optimization]
Torque also supports extensibility through plug-ins that can provide analysis and customized views. Automotive manufacturers are moving towards producing closed automotive analytics systems like OnStar [2] by General Motors and Ford Sync [1] by Ford. Currently, these systems do not provide an open API, but if and when car manufacturers decide to open up their systems for app development, AUTOLOG can be a candidate programming framework.
Recent research has also explored complementary problems in the automotive space.
Many pieces of work explore the problem of sensing driving behavior using vehicle sensors, phone sensors, and specialized cameras [13, 7, 34, 35, 32, 33]. These algorithms can be modeled as individual predicates in AUTOLOG, so that higher-level predicates can be defined using these detection algorithms. Our own prior work has also explored procedural abstractions for programming vehicles [20]; that work focuses on tuning vehicles and, unlike AUTOLOG, does not consider cost optimization. Finally, recent work has examined user interface issues in the design of automotive apps [25], which is complementary to our work.
Datalog optimization [16] has been studied for decades, and many different optimization strategies have been proposed and analyzed. These fall into four main classes: top-down evaluation, bottom-up evaluation, logic rewriting methods (magic sets), and algebraic rewriting. Bottom-up evaluation [15, 18, 8, 10] was originally designed to eliminate redundant computation in reaching a fixpoint in Datalog evaluation. Top-down evaluation [30, 31, 9] is a complementary approach with the similar goal of eliminating redundant computation in goal- or query-directed Datalog evaluation. The Magic Sets method [17, 9, 11], and a related Counting method [9, 11], are logical rewriting methods that insert extra IDB predicates into the existing program; these serve as constraints for bottom-up evaluation, thus eliminating redundant computations of intermediate predicates. In recent years, Datalog has been optimized for applications in specific areas; for example, [28] applies Datalog to graph queries. In contrast to all of these, our algorithms optimize the order of predicate acquisition for sensor and cloud predicates, a problem motivated by our specific setting.
The theory community has explored optimizing the evaluation order of Boolean predicates.
Laber [14] suggests reordering conjunctive predicates with no negation based on the properties of the relational table on which the predicates are evaluated. Another work by the same author [19] deals with more complicated queries that include negation, in a similar setting. These kinds of optimizations are special cases of the evaluation of game trees [27]. In general, these works have not addressed a setting such as ours, where predicates have both a cost and an associated probability. Closest in this regard is the work of Kempe et al. [24], who prove a result similar to Theorem 4.1, but in the context of optimizing ad placement on websites.
7. CONCLUSION
In this paper, we discuss AUTOLOG, a programming system for automotive apps. AUTOLOG allows programmers to succinctly express fusion of vehicle sensor and cloud information, a capability that can be used to detect events in automotive settings. It contains novel optimization algorithms designed to minimize the cost of predicate acquisition. Using experiments on a prototype of AUTOLOG, we demonstrate that it can provide 4-7× lower cost than a baseline sensor and cloud predicate acquisition strategy.
Acknowledgements. We thank David Kempe for suggesting the proof of Theorem 4.1 and for discussions on PAC cost optimization.
8. REFERENCES
[1] Ford Sync. http://www.ford.com/technology/sync/.
[2] GM OnStar. http://www.gm.com/vision/design technology/onstar safe connected.html.
[3] Mercedes-Benz mbrace. http://www.mbusa.com/mercedes/mbrace.
[4] OBDLink. http://www.scantool.net/.
[5] Torque: Engine performance and diagnostic tool for automotive professionals and enthusiasts. http://torque-bhp.com/.
[6] Society of Automotive Engineers. E/E Diagnostic Test Modes (J1979), 2010.
[7] S. Al-Sultan, A. Al-Bayatti, and H. Zedan. Context-aware driver behavior detection system in intelligent transportation systems. IEEE Transactions on Vehicular Technology, 62(9):4264–4275, 2013.
[8] F. Bancilhon.
Naive evaluation of recursively defined relations. Springer, 1986.
[9] F. Bancilhon, D. Maier, Y. Sagiv, and J. D. Ullman. Magic sets and other strange ways to implement logic programs. In Proceedings of the Fifth ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, pages 1–15. ACM, 1985.
[10] R. Bayer. Query evaluation and recursion in deductive database systems. Bibliothek d. Fak. für Mathematik u. Informatik, TUM, 1985.
[11] C. Beeri and R. Ramakrishnan. On the power of magic. The Journal of Logic Programming, 10(3):255–299, 1991.
[12] B. Bishop and F. Fischer. IRIS: integrated rule inference system. Advancing Reasoning on the Web: Scalability and Commonsense, page 18, 2010.
[13] M. Canale and S. Malan. Analysis and classification of human driving behaviour in an urban environment. Cognition, Technology & Work, 4(3):197–206, 2002.
[14] R. Carmo, T. Feder, Y. Kohayakawa, E. Laber, R. Motwani, L. O'Callaghan, R. Panigrahy, and D. Thomas. Querying priced information in databases: The conjunctive case. ACM Trans. Algorithms, 3(1):9:1–9:22, Feb. 2007.
[15] S. Ceri, G. Gottlob, and L. Lavazza. Translation and optimization of logic queries: the algebraic approach. In Proceedings of the 12th International Conference on Very Large Data Bases, pages 395–402. Morgan Kaufmann Publishers Inc., 1986.
[16] S. Ceri, G. Gottlob, and L. Tanca. What you always wanted to know about Datalog (and never dared to ask). IEEE Transactions on Knowledge and Data Engineering, 1(1):146–166, 1989.
[17] S. Ceri, G. Gottlob, and L. Tanca. Logic Programming and Databases. 1990.
[18] S. Ceri and L. Tanca. Optimization of systems of algebraic equations for evaluating Datalog queries. In Proceedings of the 13th International Conference on Very Large Data Bases, pages 31–41. Morgan Kaufmann Publishers Inc., 1987.
[19] F. Cicalese and E. S. Laber. A new strategy for querying priced information.
In Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing, STOC '05, pages 674–683, New York, NY, USA, 2005. ACM.
[20] T. Flach, N. Mishra, L. Pedrosa, C. Riesz, and R. Govindan. CARMA: towards personalized automotive tuning. In Proceedings of the 9th ACM Conference on Embedded Networked Sensor Systems, pages 135–148. ACM, 2011.
[21] M. Haklay and P. Weber. OpenStreetMap: User-generated street maps. Pervasive Computing, IEEE, 7(4):12–18, 2008.
[22] S. S. Huang, T. J. Green, and B. T. Loo. Datalog and emerging applications: an interactive tutorial. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, pages 1213–1216. ACM, 2011.
[23] K. H. Johansson, M. Törngren, and L. Nielsen. Vehicle applications of controller area network. In Handbook of Networked and Embedded Control Systems, pages 741–765. Springer, 2005.
[24] D. Kempe and M. Mahdian. A cascade model for externalities in sponsored search. In Proceedings of the 4th International Workshop on Internet and Network Economics, WINE '08, pages 585–596, Berlin, Heidelberg, 2008. Springer-Verlag.
[25] K. Lee, J. Flinn, T. Giuli, B. Noble, and C. Peplin. AMC: Verifying user interface properties for vehicular applications. In Proceedings of the 11th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '13, pages 1–12. ACM, 2013.
[26] S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. TinyDB: An acquisitional query processing system for sensor networks. ACM Transactions on Database Systems (TODS), 30(1):122–173, 2005.
[27] M. Snir. Lower bounds on probabilistic decision trees. Theoretical Computer Science, pages 69–82, 1985.
[28] K. T. Tekle, M. Gorbovitski, and Y. A. Liu. Graph queries through Datalog optimizations. In Proceedings of the 12th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming, pages 25–34. ACM, 2010.
[29] J. D. Ullman. Principles of Database Systems.
Galgotia Publications, 1985.
[30] L. Vieille. Recursive axioms in deductive databases: The query/subquery approach. In Expert Database Conf., pages 253–267, 1986.
[31] L. Vieille. A database-complete proof procedure based on SLD-resolution. In ICLP, pages 74–103, 1987.
[32] Y. Wang, J. Yang, H. Liu, Y. Chen, M. Gruteser, and R. P. Martin. Sensing vehicle dynamics for determining driver phone use. In Proceedings of the 11th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '13, pages 41–54. ACM, 2013.
[33] J. Yang, S. Sidhom, G. Chandrasekaran, T. Vu, H. Liu, N. Cecan, Y. Chen, M. Gruteser, and R. P. Martin. Detecting driver phone use leveraging car speakers. In Proceedings of the 17th Annual International Conference on Mobile Computing and Networking, pages 97–108. ACM, 2011.
[34] C.-W. You, M. Montes-de Oca, T. J. Bao, N. D. Lane, G. Cardone, L. Torresani, and A. T. Campbell. CarSafe app: Alerting drowsy and distracted drivers using dual cameras on smartphones. In Proceedings of the 11th International Conference on Mobile Systems, Applications, and Services. ACM, 2013.
[35] Z. Zhu and Q. Ji. Real time and non-intrusive driver fatigue monitoring. In Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems, pages 657–662. IEEE, 2004.
APPENDIX
A. PROOF OF SINGLE QUERY WITHOUT NEGATION
Theorem 4.1 can be proved as follows. The expected cost of the evaluation order G_1, G_2, ..., G_N is:

C(G_1, ..., G_N)
  = c_1 + p_1 \cdot C(G_2, ..., G_N)
  = c_1 + p_1 \cdot (c_2 + p_2 \cdot C(G_3, ..., G_N))
  = ...
  = C_C + p_C \cdot (c_i + p_i \cdot c_{i+1} + C_{res})    (4)

where C_C, p_C and C_{res} are defined by:

C_C = c_1 + \sum_{m=2}^{i-1} \left( c_m \prod_{n=1}^{m-1} p_n \right)
p_C = \prod_{n=1}^{i-1} p_n
C_{res} = p_i \cdot p_{i+1} \cdot C(G_{i+2}, ..., G_N)    (5)

Suppose that G_1, G_2, ..., G_N is not optimal, and swapping G_i and G_{i+1} results in a lower cost.
With such a swap, the expected cost would change to:

C(G_1, ..., G_{i+1}, G_i, ..., G_N) = C_C + p_C \cdot (c_{i+1} + p_{i+1} \cdot c_i + C_{res})    (6)

Since \frac{c_i}{1-p_i} \le \frac{c_{i+1}}{1-p_{i+1}}, we have:

c_i (1 - p_{i+1}) \le c_{i+1} (1 - p_i)
c_i - p_{i+1} c_i \le c_{i+1} - p_i c_{i+1}
c_i + p_i c_{i+1} \le c_{i+1} + p_{i+1} c_i    (7)

It follows that:

C(G_1, ..., G_i, G_{i+1}, ..., G_N) \le C(G_1, ..., G_{i+1}, G_i, ..., G_N)    (8)

which leads to a contradiction.
B. OPTIMALITY OF SINGLE QUERY WITH NEGATION
Algorithm 1 relies on a crucial property: in any optimal order, a negated predicate (or, equivalently, a negated subtree of the proof tree) can be considered an atomic predicate with respect to other non-negated predicates. The proof of this negation atomicity requires two steps. The first step formalizes the intuitive exchange argument discussed in §4, but assumes that negated predicates are not nested. The second step proves negation atomicity for nested negated predicates as well.
LEMMA 1. Negation Atomicity. Consider a query with K positive predicates G_{p1}, ..., G_{pK}, and L negative predicates G_{n1}, ..., G_{nL}. Each positive (negative) predicate can be viewed as a single query with, if any, A_{pk} (A_{nl}) non-negated atoms. Any evaluation order that interleaves atoms in a negated predicate G_{nx} with atoms, if any, in other predicates G_{pk} or G_{nl} at the same level as G_{nx} costs more than evaluating the negated predicate G_{nx} as a whole.
PROOF. According to the previous discussion, the probabilities for predicate G_{pk} and negated predicate G_{nl} to be true are, respectively:

p_{pk} = \prod_{a_{pk}=1}^{A_{pk}} p_{a_{pk}},  p_{nl} = 1 - \prod_{a_{nl}=1}^{A_{nl}} p_{a_{nl}}    (9)

Since each positive predicate has only non-negated atoms, without loss of generality, we can treat each and every atom as a directly evaluatable atom set {G_{p1}, ..., G_{pK'}} at the same level as G_{nl}, where K' = \sum_{k=1}^{K} A_{pk}.
Assume the evaluation order yielded by Algorithm 1 is {G_{m1}, ..., G_{m(K'+L)}}; then, according to Equation 3, we have:

\frac{C_{m1}}{1-p_{m1}} \le \frac{C_{m2}}{1-p_{m2}} \le ... \le \frac{C_{m(K'+L)}}{1-p_{m(K'+L)}}    (10)

Assume that G_{mx} is a negated predicate G_{nl}, which has A_{nl} directly evaluatable atoms, each with a cost of C_{a_{nl}} and a probability of p_{a_{nl}} of being true. Inside the negated predicate, assume that the optimal evaluation order is {r_1, ..., r_{A_{nl}}}, which satisfies Equation 3 according to Theorem 4.1. Hence, the whole evaluation order would be:

{G_{m1}, ..., G_{m(x-1)}, r_1, ..., r_{A_{nl}}, G_{m(x+1)}, ..., G_{m(K'+L)}}    (11)

Assume each predicate G_{my} has a cost of C_{my} and a probability of p_{my} of being true. From here, we separate the proof into two parts: one for interleaving predicates G_{my}, y \ne x, as a whole with one negated G_{mx}; the other for interleaving atoms {r_1, ..., r_{A_{nl1}}} and {r_1, ..., r_{A_{nl2}}} of any two different negated predicates G_{nl1} and G_{nl2}. We prove that in either case, the interleaving costs more than the original optimal order.
Part 1: Consider moving G_{m(x-1)} into the negation part, between r_i and r_{i+1}.
The expected cost of the whole query before the move is:

C_c + p_c (C_N + p_N \cdot C_{res})    (12)

where

C_c = C_{m1} + \sum_{a=2}^{x-1} \left( C_{ma} \prod_{b=1}^{a-1} p_{mb} \right)    (13)
p_c = \prod_{b=1}^{x-1} p_{mb}    (14)
C_N = C_1 + \sum_{a=2}^{A_{nl}} \left( C_a \prod_{b=1}^{a-1} p_b \right)    (15)
p_N = 1 - \prod_{b=1}^{A_{nl}} p_b    (16)
C_{res} = C_{m(x+1)} + \sum_{a=x+2}^{K'+L} C_{ma} \prod_{b=x+1}^{K'+L-1} p_{mb}    (17)

The expected cost of the whole query after moving G_{m(x-1)} to the position between r_i and r_{i+1} is:

C^*_c + p^*_c \left( C_{N1} + p_{N1} \left( C_{m(x-1)} + p_{m(x-1)} (C_{N2} + p_{N2} C_{res}) \right) \right)    (18)

where

C^*_c = C_{m1} + \sum_{a=2}^{x-2} \left( C_{ma} \prod_{b=1}^{a-1} p_{mb} \right)    (19)
p^*_c = \prod_{b=1}^{x-2} p_{mb}    (20)
C_{N1} = C_1 + p_1 C_2 + (1-p_1) C^*_{res} + C'_{N1}    (21)
C'_{N1} = \sum_{a=3}^{i} \left[ C_a \prod_{b=1}^{a-1} p_b + (1-p_{a-1}) C^*_{res} \prod_{b=1}^{a-2} p_b \right]    (22)
p_{N1} = \prod_{b=1}^{i} p_b    (23)
C_{N2} = C_{i+1} + p_{i+1} C_{i+2} + (1-p_{i+1}) C_{res} + C'_{N2}    (24)
C'_{N2} = \sum_{a=i+3}^{A_{nl}} \left[ C_a \prod_{b=i+1}^{a-1} p_b + (1-p_{a-1}) C_{res} \prod_{b=i+1}^{a-2} p_b \right]    (25)
p_{N2} = \prod_{b=i+1}^{A_{nl}} p_b    (26)
C^*_{res} = C_{m(x-1)} + p_{m(x-1)} C_{res}    (27)

Note that C^*_{res} differs from C_{res} in that if any atom r_j, j \le i, ...    (30)

With the coefficient of C_{res} being p_c \prod_{b=1}^{A_{nl}} p_b = p^*_c \cdot p_{m(x-1)} \prod_{b=1}^{A_{nl}} p_b in both cases, we can conclude that moving G_{m(x-1)} to the position between r_i and r_{i+1} brings an extra expected cost of (1-p_{m(x-1)}) \sum_{j=1}^{i} C_j.
With Theorem 4.1, we proved that moving any G_{my}, y < x-1, or any G_{my}, y > x, to the position between G_{m(x-1)} and G_{mx} has greater or equal cost. Thus, combining Theorem 4.1 with what we just proved above, we can conclude that interleaving any predicate G_{my}, y \ne x, with the negated predicate G_{mx} has a higher expected cost than the original optimal evaluation order.
Note that this conclusion equivalently proves that moving any atom r_i in G_{mx} out to any position in between would have greater or equal cost. The reason is that the latter movement can be interpreted as the following two steps.
Moving to the position between G_{mz} and G_{m(z+1)}, z < x (the other case is symmetric), is equivalent to first moving r_{yi} before r_{y1} and then moving all predicates between G_{mz} and G_{mx} into the negated predicate G_{mx}. Both the first step (Theorem 4.1) and the second step (proved above) have been shown to have greater or equal cost. That concludes the proof of Part 1.
Part 2: By a similar derivation, we can prove that it costs more to interleave atoms of any two different negated predicates G_{nl1} and G_{nl2}. To simplify the exposition, we omit the closed-form expressions and explain the gist by comparing the coefficients directly.
To start with, assume G_{nl1} and G_{nl2} are two consecutive negated predicates, G_{mx} and G_{my}, y = x+1, in the optimal order. G_{mx} (G_{my}) has a set of directly evaluatable atoms {r_{x1}, ..., r_{xA_{mx}}} ({r_{y1}, ..., r_{yA_{my}}}). Thus, the optimal evaluation order would be:

{r_{x1}, ..., r_{xA_{mx}}, r_{y1}, ..., r_{yA_{my}}}    (31)

Consider moving r_{xA_{mx}} to the position between r_{yi} and r_{y(i+1)}. Before moving, the coefficient of r_{xA_{mx}}, which is also the probability of evaluating r_{xA_{mx}}, is:

p_{(xA_{mx})} = \prod_{b=1}^{x-1} p_{mb} \cdot \prod_{b=1}^{A_{mx}-1} p_b    (32)

Whether or not the atoms r_{yj}, j \le i, fail, the atom r_{xA_{mx}} still has to be evaluated. Therefore, after moving, the coefficient remains p_{(xA_{mx})}. Similar to the analysis in Part 1, the coefficient of C_{res} remains the same, where in this case:

C_{res} = C_c + p_c \cdot C^*_{res}    (33)

where

C_c = C_{y(i+1)} + \sum_{a=i+2}^{A_{my}} C_a \prod_{b=i+1}^{A_{my}-1} p_b    (34)
p_c = 1 - \prod_{b=i+2}^{A_{my}} p_b    (35)
C^*_{res} = C_{m(y+1)} + \sum_{a=y+2}^{K'+L} C_{ma} \prod_{b=y+1}^{K'+L-1} p_{mb}    (36)

and the coefficient remains

Cp_{res} = p_{(xA_{mx})} \left( 1 - \prod_{b=1}^{A_{mx}} p_b \right) \prod_{b=1}^{yi} p_b    (37)

The only difference under interleaving lies in the coefficient of each r_{yj}, j \le i.
Originally, the coefficient of r_{yj}, j \le i, is:

Cp_{yj} = p_{(xA_{mx})} \left( 1 - \prod_{b=1}^{A_{mx}} p_b \right) \prod_{b=1}^{y(j-1)} p_b    (38)

whereas after moving, it changes to the value of Cp'_{yj}, because whether or not r_{xa}, 1 \le a \le A_{m(x-1)}, fails, r_{yj} still has to be evaluated:

Cp'_{yj} = p_{(xA_{mx})} \prod_{b=1}^{y(j-1)} p_b    (39)

Since 0 \le 1 - \prod_{b=1}^{A_{mx}} p_b \le 1, moving r_{xA_{mx}} to the position between r_{yi} and r_{y(i+1)} has greater or equal cost compared with the original optimal order. According to Theorem 4.1, moving any r_{xi} to a position after r_{A_{mx}} has greater or equal cost. Thus we conclude that interleaving any r_{xi} into any position between r_{yi} and r_{y(i+1)} has greater or equal cost.
By symmetry, we can also prove that moving r_{y1} to the position between r_{xi} and r_{x(i+1)} has greater or equal cost than the original optimal order. Similarly, according to Theorem 4.1, moving any r_{yj} to a position before r_{y1} has greater or equal cost. Thus, we conclude that interleaving any r_{yj} into any position between r_{xi} and r_{x(i+1)} has greater or equal cost.
Since we now know that interleaving neighboring negated predicates has greater or equal cost, consider interleaving any two negated predicates within a query. The interleaving process can be interpreted as three steps. First, move the two predicates to neighboring positions, which cannot decrease the cost (Theorem 4.1). Second, interleave the two neighboring negated predicates, which also cannot decrease the cost, as just proved. Finally, interleave the predicates in between into a negated predicate, which cannot decrease the cost, as proved in Part 1. With Theorem 4.1 and the above two-part proof, we conclude that interleaving atoms of any two negated predicates G_{mx}, G_{my}, \forall x, y, costs at least as much as the original optimal order. That concludes Part 2.
Having proved that interleaving any predicates (Part 1), or any atoms of any negated predicates (Part 2), with the atoms of a negated predicate incurs a higher or equal cost, we conclude the proof of Negation Atomicity, combining Part 1 and Part 2.
The above proof does not consider nested negation. In what follows, we prove that negation atomicity applies recursively when negated predicates are nested.
LEMMA 2. Nested Negation Atomicity. In the most general case, interleaving any atoms of positive or negated predicates, which are positively or negatively nested at any level, costs at least as much as evaluating the negated predicate at each level as a whole.
PROOF. We separate the proof into two parts: one for moving an atom into any level of negated predicates, and the other for moving one atom of a negated predicate out of any level of negation.
MoveIN: Consider a negated predicate G_{nl} with only one level of nested negated atoms, and suppose all positive predicates G_{pK} have no negated atoms. As analyzed in the proof of Lemma 1, without loss of generality, we can still assume there are K' directly evaluatable atoms in all positive predicates. Interleaving G_{pk}, 1 \le k \le K', with any nested negated atom r_i, 1 \le i \le B_{nl}, can be interpreted as two exchanges. First, move G_{pk} into G_{nl} among its positive atoms. Second, move G_{pk} from the positive atoms into the negated atoms of G_{nl}. According to Lemma 1, both of these steps have higher or equal cost; hence interleaving a directly evaluatable atom into one level, or any levels, of consecutively nested negated predicates costs at least as much.
Note that, by Theorem 4.1, interleaving atoms into a positive predicate also costs at least as much. Therefore, by induction, we prove that interleaving a directly evaluatable atom into any level of discretely nested negated predicates costs at least as much.
MoveOUT: In Part 1 of Lemma 1, we proved that moving one directly evaluatable atom out of a negated predicate costs at least as much. By induction, it costs at least as much to move one directly evaluatable atom out of any level of consecutively (negatively) nested negated predicates.
Consider a positive predicate G_{px} which has A_{pk} positive predicates and B_{pk} negated predicates, where each negated predicate has only one directly evaluatable atom. Moving one of these atoms, r_i, out of a positively nested negation to the position between G_{pz} and G_{p(z+1)} can be interpreted as two steps. Assuming z < x, it is equivalent to moving all predicates between G_{pz} and G_{px} to the beginning of G_{px}, and then moving r_i to the position before G_{pz}. The former movement (Theorem 4.1) and the latter (Lemma 1) have already been proved to cost at least as much. Hence, moving a directly evaluatable atom out of one level of positively nested negation costs at least as much. By induction, moving a directly evaluatable atom out of any level of discretely nested negated predicates costs at least as much.
Finally, the process of interleaving atoms with any level of consecutively or discretely nested negation can be resolved into multiple MoveIN and MoveOUT operations. With the proof of these two parts, MoveIN and MoveOUT, we draw the conclusion that any evaluation order that interleaves atoms in a negated predicate G_{nx} with atoms, if any, in other predicates, positive (G_{pk}) or negated (G_{nl}), nested or not, costs more than evaluating the negated predicate G_{nx} as a whole. Note that G_{nx} can be at any level of the expansion tree. That concludes the proof of Nested Negation Atomicity.
From this latter proof, the optimality of Algorithm 1, which relies on nested negation atomicity, follows.
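The exchange argument of Theorem 4.1 implies that ordering predicates by ascending c_i/(1-p_i) minimizes the expected short-circuit evaluation cost. As an illustrative sanity check (not part of the paper; the costs and probabilities below are made up), the greedy order can be compared against exhaustive search over all evaluation orders:

```python
import itertools

def expected_cost(order):
    """Expected cost of short-circuit conjunction evaluation:
    predicate k is acquired only if all earlier predicates held."""
    total, reach_prob = 0.0, 1.0
    for cost, p_true in order:
        total += reach_prob * cost   # pay the cost only if evaluation reaches here
        reach_prob *= p_true         # continue only while predicates keep holding
    return total

# Hypothetical (cost, probability-of-true) pairs for four predicates.
preds = [(5.0, 0.9), (1.0, 0.5), (3.0, 0.2), (2.0, 0.7)]

# Theorem 4.1: an optimal order sorts ascending by c_i / (1 - p_i).
greedy = sorted(preds, key=lambda cp: cp[0] / (1.0 - cp[1]))

# Exhaustive check over all 4! orders confirms the greedy order is optimal.
best = min(itertools.permutations(preds), key=expected_cost)
assert abs(expected_cost(greedy) - expected_cost(best)) < 1e-9
```

Here the greedy order evaluates the cheap, likely-to-fail predicates first, matching the ratio rule used throughout the proofs above.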
Yurong Jiang, Hang Qiu, Matthew McCartney, William G. J. Halfond, Fan Bai, Donald Grimm, and Ramesh Govindan. "Flexible and efficient sensor fusion for automotive apps." Computer Science Technical Reports (Los Angeles, California, USA: University of Southern California. Department of Computer Science) no. 939 (2013).