REINFORCEMENT LEARNING IN HYBRID ELECTRIC VEHICLES (HEVS) / ELECTRIC VEHICLES (EVS)

by

Xue Lin

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

May 2016

Copyright 2016 Xue Lin

Dedication

To my parents, Baozhi Lin and Yuemei Wei, my husband, Yanzhi Wang, and our lovely daughter, Ruiying Wang.

Acknowledgments

First and foremost, I would like to give my thanks to my Ph.D. advisor, Prof. Massoud Pedram, for his guidance and support throughout my graduate study. He taught me professional knowledge in the area of computer engineering, inspired me to pursue different research projects, and encouraged me to pursue an academic career. To me, he is not only my advisor but also a most respected elder. He is my role model as a successful professor and advisor. It was my great pleasure to be his student.

Next, I would like to thank the other committee members of my qualifying exam and dissertation defense: Prof. Sandeep K. Gupta, Prof. Aiichiro Nakano, Prof. Peter A. Beerel, Prof. Paul Bogdan, and Prof. Minlan Yu. Special thanks to Prof. Sandeep Gupta, who taught me the VLSI design course as a professor and, as the department chair, supported my graduate study and my academic job applications. His sharpness in research and kindness in personality are well known to everyone inside and outside the department. Thanks to Prof. Nakano for his kindness, help, and valuable suggestions for my defense. Also thanks to Prof. Peter Beerel for his valuable opinions on my research work and his willingness to discuss it. Thanks a million to Prof. Paul Bogdan, who gave me strong support in my academic job applications and valuable feedback on our collaborative research work. I was greatly inspired by his enthusiasm and rigor in research. Also thanks to Prof. Minlan Yu for her invaluable input to my proposal. She is a role model for me as a promising female scholar.

During my graduate study at USC, I was honored to have worked with many excellent professional scholars, especially Prof. Naehyuck Chang. My first project in the SPORT lab was in collaboration with him and his students on photovoltaic systems. I learned from him passion and dedication to work, a unique research style, and a positive attitude toward life. I would also like to thank Prof. Shahin Nazarian, who taught me my very first course on VLSI at USC, who was the course instructor for my teaching assistant job, and who helped me with circuit-related work. Thanks are also due to Prof. Chongwu Zhou for his guidance of my research work on carbon nanotubes.

I would like to thank all my colleagues in the SPORT lab: Inkwon, Yanzhi, Qing, Woojoo, Hadi, Alireza, Javad, Mehdi, Di, Siyu, Tiansong, Shuang, Majid, Luhao, Naveen, Hassan, and Mahdi. I was proud to have worked with this brilliant and productive team. Special thanks to the students of Prof. Naehyuck Chang in the ELPL lab at Seoul National University: Younghyun, Donghwa, Sangyoung, Jaehyun, Jaemin, and Kitae. I really enjoyed the time I spent with them in Seoul during the summer of 2012. Also thanks to the students in the NANO lab at USC: Chuan, Jialu, Yi, Yue, Kyoungmin, Po-Chiang, Lewis, Akshay, Fumiaki, Alexandr, Hsiaokang, Anuj, Yuchi, Zhen, Maoqing, Jia, Luyao, Xiaoli, Jing, Fiona, Haitian, Jiepeng, Mingyuan, Noppadal, and Liang. Because of them, I had a really good time at the beginning of my graduate study.
My sincere thanks also go to my colleagues and friends in the department: Yuankun, Ji, Jianwei, Yue, Da, Wentao, Yanting, Chenxiao, Shuo, Fangzhou, Jizhe, Lizhong, and Bo.

Last but not least, my deepest gratitude goes to my family. I would like to thank my parents for their unconditional love and support throughout my life. I would like to thank my husband, who is my most intimate friend and best companion in both work and life. My thanks also go to my parents-in-law for their care and support. Finally, thanks to my daughter, who brings tremendous happiness to our family.

Contents

Dedication
Acknowledgments
List of Figures
List of Tables
Abstract

1 Introduction
  1.1 Electric and Hybrid Electric Vehicles
  1.2 Drivetrain Structures
  1.3 Energy Management Strategies
  1.4 EVs/HEVs in the Smart Grid
2 HEV Modeling
  2.1 Operation Modes
  2.2 Internal Combustion Engine (ICE)
  2.3 Electric Motor (EM)
  2.4 Drivetrain Mechanics
  2.5 Vehicle Dynamics
  2.6 Backward-Looking Optimization
3 MDP-Based HEV Energy Management
  3.1 MDP Concepts and Definitions
  3.2 Stochastic Driving Cycle Modeling
  3.3 Stochastic Battery Modeling
  3.4 MDP Modeling of HEV Energy Management
    3.4.1 State Transition Probability Matrix
    3.4.2 Immediate Reward Matrix
  3.5 MDP Optimal Policy Derivation
  3.6 Experimental Results
4 Reinforcement Learning-Based HEV Energy Management
  4.1 Reinforcement Learning Background
  4.2 Backward-Looking Approach
  4.3 Motivations
  4.4 State, Action and Reward of HEV Energy Management
    4.4.1 State Space
    4.4.2 Action Space
    4.4.3 Reward
  4.5 TD(λ)-Learning Algorithm for HEV Energy Management
    4.5.1 Action Selection
    4.5.2 Q-Value Update
    4.5.3 Algorithm Description
  4.6 Model-Free Property Analysis
  4.7 Complexity and Convergence Analysis
  4.8 Application-Specific Implementations
  4.9 Experimental Results
5 Battery SoH-Aware HEV Energy Management
  5.1 Battery SoH Degradation Model
    5.1.1 SoH Degradation in Cycled Charging/Discharging Pattern
    5.1.2 Cycle-Decoupling Method
  5.2 SoH-Aware Reinforcement Learning for HEV Energy Management
    5.2.1 Motivation
    5.2.2 Inner-Loop Reinforcement Learning Model
    5.2.3 TD(λ)-Learning Algorithm for the Inner Loop
    5.2.4 Outer-Loop Adaptive Learning
    5.2.5 Adaptive Learning Algorithm for the Outer Loop
  5.3 Experimental Results
6 Joint Control of the Powertrain and Auxiliary Systems in HEVs
  6.1 Auxiliary Systems
  6.2 Joint Control Framework of Powertrain and Auxiliary Systems
    6.2.1 Motivations
    6.2.2 Prediction of Future Driving Profile Characteristics
    6.2.3 Details of the Reinforcement Learning Process
  6.3 Experimental Results
7 Photovoltaic System Reconfiguration for Electric Vehicles
  7.1 Onboard PV System
    7.1.1 PV Cell Modeling
    7.1.2 PV Reconfiguration Structure
    7.1.3 PV Array Reconfiguration Algorithm
  7.2 Overview of the Two Proposed PV Reconfiguration Frameworks
  7.3 PV Reconfiguration Hardware Design
    7.3.1 IGBT-Based Reconfiguration Switch Network
    7.3.2 Solar Irradiance Sensor Network
    7.3.3 Overhead Analysis
  7.4 Event-Driven PV Array Reconfiguration
  7.5 Sensorless PV Array Reconfiguration
  7.6 Experimental Results
8 Optimal Pricing Policy for Aggregators in the Smart Grid
  8.1 System Model
    8.1.1 Aggregator
    8.1.2 Residential Users and EV/PHEV Users
  8.2 Problem Formulation and Algorithm
    8.2.1 Game Theoretic Optimization in Stage II
    8.2.2 Aggregator Overall Profit Optimization
  8.3 Experimental Results
Reference List
List of Figures

1.1 The drivetrain structure of an EV [1].
1.2 The drivetrain structure of a series HEV [1].
1.3 The drivetrain structure of a series-parallel HEV [1].
1.4 The drivetrain structure of a parallel HEV [1].
1.5 Classification of the hybrid drivetrain control strategies [2].
2.1 The fuel consumption map of an ICE.
2.2 The fuel efficiency map of an ICE.
2.3 The efficiency map of an EM.
2.4 The analysis of vehicle dynamics [3].
3.1 The ICE operation points from the proposed and rule-based policies.
4.1 The agent-environment iteration.
4.2 The ICE operation points of an ordinary HEV from the proposed and rule-based policies.
4.3 The ICE operation points of a PHEV from the proposed and rule-based policies.
5.1 Illustration of the cycle-decoupling method.
5.2 Battery SoC profile versus time with the set of turning points.
5.3 Six basic cases for turning point classification and cycle identification.
5.4 An example of cycle identification.
6.1 Normalized fuel consumption of RL-based HEV control frameworks with and without prediction.
6.2 The MPG values achieved by the proposed joint control framework and the rule-based policy.
7.1 System diagram of a PV system on the electric vehicle.
7.2 I-V and P-V output characteristics of a PV cell.
7.3 PV array reconfiguration structure.
7.4 An illustration of the solar irradiance estimation algorithm.
7.5 Output power profiles of the event-driven, sensorless, and baseline frameworks.
7.6 Energy overhead and average output power comparisons of periodic and event-driven frameworks.
8.1 The system structure.
8.2 Three sets of $p_k^0$ values.
8.3 BESS stored energy as a function of time.
8.4 Overall profit of the aggregator from the optimal pricing policy and the baseline pricing policy.

List of Tables

3.1 Honda Insight Hybrid component parameters.
3.2 Fuel consumption: proposed and rule-based policies.
4.1 Models needed for the original and reduced action spaces.
4.2 Fuel consumption of an ordinary HEV using proposed and rule-based policies.
4.3 Fuel consumption of a PHEV using proposed and rule-based policies.
5.1 PHEV key parameters.
5.2 Operating cost of the PHEV in different trips using the proposed, RL, and rule-based policies.
5.3 Operating cost of the HEV in different trips by the proposed, RL, and rule-based policies.
6.1 HEV key parameters.
6.2 Reward function values from the proposed joint control framework and the rule-based policy.
7.1 Average system output power (W).
Abstract

Conventional internal combustion engine (ICE)-powered vehicles have contributed significantly to the development of modern society.
However, they have also brought about large amounts of fuel consumption and pollution emissions due to the increasing number of vehicles in use around the world. Electric vehicles (EVs) and hybrid electric vehicles (HEVs) have been developed to improve fuel economy and reduce pollution emissions.

This thesis first introduces the basic components of EVs and HEVs and methods for EV/HEV energy management. After an accurate and detailed modeling of the HEV, this thesis provides two control strategies for HEV energy management to improve fuel economy. Different from previous work that relies on a priori knowledge of the driving profiles, the proposed control strategies, namely a Markov decision process-based strategy and a reinforcement learning-based strategy, only need stochastic knowledge of the driving profiles or do not rely on any prior knowledge of the driving profiles. In particular, the reinforcement learning-based control strategy can be model-free, which enables one to (partially) avoid reliance on complex HEV modeling while coping with driver-specific behaviors.

The state-of-health (SoH) of the battery pack degrades with the operation of an HEV. The battery pack reaches its end-of-life when it loses 20% or 30% of its nominal capacity, and battery pack replacement results in additional operational cost for an HEV. Therefore, this thesis also investigates the energy management problem in HEVs focusing on the minimization of the operating cost of an HEV, including both fuel and battery replacement cost. A nested learning framework is proposed, in which the inner-loop learning process is the key to minimization of the fuel usage, whereas the outer-loop learning process is critical to minimization of the amortized battery replacement cost.

On the other hand, auxiliary systems of HEVs/EVs, comprised of lighting, air conditioning (or more generally, heating, ventilation, and air conditioning), and other battery-powered systems such as GPS, may account for 10%-30% of the overall fuel consumption of an ordinary (fuel-based) vehicle. For HEVs and EVs, it is projected that auxiliary systems will take a larger portion of the overall energy consumption, partly because heating of an ordinary vehicle can be partially achieved by the heated internal combustion engine. Hence, in this thesis, the control of the HEV powertrain and auxiliary systems is considered jointly to minimize operational cost. We minimize the fuel cost induced both by propelling the vehicle and by the auxiliary systems, and meanwhile maximize a total utility function (representing the degree of desirability) of the auxiliary systems. To further enhance the effectiveness of the RL framework, the prediction of future driving profile characteristics is incorporated.

An EV with an onboard PV electrical energy generation system (PV system) is beneficial since PV cells can charge the EV battery pack when the EV is running or parked, mitigating the power demand from the grid. This thesis aims at maximizing the output power of a vehicular PV system with the string charger architecture, taking into account the non-uniform distribution of solar irradiance levels on different vehicle surface areas. This work is based on the dynamic PV array reconfiguration architecture from previous work, with accommodation of the rapidly changing solar irradiance in the onboard scenario.
Most importantly, this work differs from previous dynamic PV array reconfiguration work in that an event-driven and a sensorless PV array reconfiguration framework are proposed.

The concept of vehicle-to-grid (V2G) was developed to make use of the electrical energy storage ability of EV/HEV batteries for frequency regulation, load balancing, etc. This thesis also presents work on the smart grid optimal pricing policy problem, in which the aggregator maximizes its profit by designing a real-time pricing policy while taking into account the behaviors of both residential users and EV/HEV users. The aggregator pre-announces a pricing policy for an entire billing period; then, in each time interval of the billing period, the electricity users (both residential and EV/PHEV users) try to maximize their own utility functions based on the pricing model in the current time interval and awareness of the other users' behaviors. We use a dynamic programming algorithm, based on backward induction, to derive the optimal real-time pricing policy that maximizes the aggregator's overall profit.

Chapter 1

Introduction

1.1 Electric and Hybrid Electric Vehicles

Conventional internal combustion engine (ICE)-powered vehicles have been in existence for over 100 years. Automobiles have satisfied many of the requirements for mobility in everyday life. At the same time, the large amounts of fuel consumption and pollution emissions resulting from the increasing number of automobiles have drawn the attention of government agencies and developers towards more energy-efficient and environmentally friendly automobiles. Although the ICE technology has matured over the past 100 years with the aid of automotive electronic technology, it will mainly rely on alternative evolution approaches to significantly improve fuel economy and reduce pollution emissions [4].

Electric vehicles (EVs) are one of the solutions proposed to tackle the energy crisis and pollution emission problems. EVs are propelled by one or possibly more electric motors powered by rechargeable battery packs. The energy conversion efficiency of electric motors is much higher than that of ICEs, and therefore EVs are more energy-efficient than conventional ICE-powered vehicles. In addition, EVs are environmentally friendly due to their zero tailpipe pollutants [5]. However, some battery-related challenges have limited the development of EVs, such as short driving range, long charging time, and high battery cost [6]. Examples of EVs on the market are the BMW i3, BYD e6, Honda Fit EV, Nissan Leaf, and Tesla Model S, among which the BYD e6 (186 miles) and Tesla Model S (> 200 miles) have driving ranges longer than 150 miles.

Hybrid electric vehicles (HEVs) have been developed to overcome the disadvantages of ICE-powered vehicles and EVs [7]. Compared to ICE-powered vehicles, HEVs achieve both higher fuel economy and lower pollution emissions. Compared to EVs, HEVs do not have the battery-related issues, thanks to the hybrid drivetrain structure. An HEV features a hybrid propulsion system comprised of an ICE with an associated fuel tank and one or more electric motors (EMs) with an associated electrical energy storage system (e.g., battery packs), both of which are coupled to the drivetrain, forming a hybrid drivetrain. The ICE consumes fuel to provide the primary propulsion, whereas the EM converts the stored electrical energy in the battery pack to secondary propulsion when extra torque is needed.
Besides assisting the ICE with extra torque, the EM also serves as a generator for recovering kinetic energy during braking (known as regenerative braking) and for collecting excess energy from the ICE during coasting [8, 9]. There are several successful HEV models on the market, such as the Toyota Prius, Honda Insight, and Ford Fusion.

The plug-in HEV (PHEV) is a new generation of HEV, whose battery pack can be restored to full charge by connecting a plug to the electrical grid. A PHEV shares the characteristics of both a conventional HEV (by having an ICE and one or more EMs) and an EV (by having a plug to connect to the grid). With the added plug-in interface and extra battery capacity, a PHEV can run in all-electric mode for an extended range (as high as 70 miles). Therefore, the fuel consumption of a PHEV can be significantly reduced [10]. Examples of PHEVs are the Chevrolet Volt and the Honda Accord PHEV.

Figure 1.1: The drivetrain structure of an EV [1].
Figure 1.2: The drivetrain structure of a series HEV [1].

1.2 Drivetrain Structures

The drivetrain structure of an EV is straightforward, whereas there are a variety of HEV drivetrain structures, depending on how the ICE and EM(s) are coupled to the transmission shaft [4, 9, 11]. HEVs have a complex architecture comprised of numerous subsystems, which decreases the global reliability of the system compared with ICE-powered vehicles [12]. However, it is difficult to obtain real current data for reasons of industrial property. In the following, we discuss some common drivetrain structures.

1. EV: As shown in Figure 1.1, an EV is propelled by the EM. The EM either consumes the electrical energy stored in the battery pack to propel the wheels or operates as a generator to recover kinetic energy during braking (known as regenerative braking). Because the vehicle is powered only by batteries or other electrical energy sources, zero emission can be achieved. However, the high initial cost of EVs, as well as their short driving range and long refueling time, has limited their use.

2. Series HEV: Figure 1.2 shows the series HEV drivetrain structure, in which there are two EMs and an ICE. In this structure, all the traction power is converted from electricity [11]. The ICE mechanical output is first converted into electricity by EM 2. The converted electricity can either charge the battery or directly propel the wheels via EM 1 and the transmission, thus bypassing the battery. Due to the decoupling of the ICE and the driving wheels, series HEVs have the advantage of flexibility in the location of the ICE. For the same reason, the ICE can operate in its very narrow optimal region, independent of the vehicle speed [7]. However, such a cascade structure leads to relatively low efficiency ratings, and thus all these EMs and the ICE need to be sized for the maximum level of sustained power [13].

3. Series-parallel HEV: As shown in Figure 1.3, there are also two EMs and an ICE in a series-parallel HEV. Different from a series HEV, the EMs and ICE are coupled together by a planetary gear set. EM 1 and the transmission shaft are connected to the planetary ring gear, whereas the ICE is connected to the gear carrier and EM 2 is connected to the sun gear. Because of the planetary gear set, the ICE speed is a weighted average of the speeds of EM 1 and EM 2. The EM 1 speed is proportional to the vehicle speed.
For any given vehicle speed (or, equivalently, any given EM 1 speed), the EM 2 speed can be chosen to adjust the ICE speed. Therefore, the ICE can operate in an optimal region by controlling the EM 2 speed [7]. The drivetrain structure of a series-parallel HEV is somewhat complicated and costly, and therefore controlling this structure is quite complex [7].

4. Parallel HEV: The parallel HEV has a simplified drivetrain structure, since it only has one EM and one ICE, as shown in Figure 1.4. The EM and ICE are coupled to the transmission shaft in parallel. The traction power can be provided by the ICE alone, the EM alone, or both acting together. The EM can be used to charge the battery pack through regenerative braking or to store power from the ICE when its output is greater than the power required to drive the wheels. Since the drivetrain is more efficient than that of the series HEV, the EM and ICE do not need to be sized for the maximum level of power [7]. However, because of the mechanical coupling between the ICE and the transmission, the ICE cannot always operate in its optimal region, and thus clutches are often necessary [4].

Figure 1.3: The drivetrain structure of a series-parallel HEV [1].
Figure 1.4: The drivetrain structure of a parallel HEV [1].

1.3 Energy Management Strategies

As discussed in Section 1.2, the drivetrain structure of an EV is not as complicated as that of an HEV, since fewer power components are involved. Therefore, the energy management strategies of an EV mainly focus on the "battery" side rather than the "drivetrain" side. EV energy management strategies optimize the efficiency or the lifespan of the battery pack.

A research program titled "development of supercapacitors for electric vehicles" proposed to integrate the supercapacitor as a peak power unit with the on-board energy source of an EV, aiming at improving vehicle performance, battery life, and energy economy [14]. Carter et al. proposed two control strategies for EVs equipped with a hybrid energy storage system (i.e., batteries and supercapacitors) [15]. The first strategy aimed at maximizing efficiency, whereas the second strategy aimed at prolonging the lifespan of the batteries. They found that it is not usually possible to optimize for both efficiency and battery life; however, the two objectives can help each other to a certain extent. Ortúzar et al. designed, implemented, and tested an auxiliary energy system, composed of a supercapacitor bank and a buck-boost converter, to extend the driving range of an EV [16]. They evaluated the performance of the auxiliary energy system from an economic perspective, demonstrating that a battery life increase of about 50% was required to compensate for the cost of the auxiliary energy system. Cao et al. proposed a new battery/supercapacitor hybrid energy storage system for EVs [17], in which a smaller dc/dc converter is used to pump up the voltage of the supercapacitor while the battery is isolated from frequent charging/discharging. In this way, the battery life can be extended. Park et al. employed the charge migration technique for the hybrid energy storage system in an EV [18]. Different from other work, this work showed that charge migration during idle and cruise/stopping time can be beneficial in terms of energy efficiency and cruise range. They demonstrated in simulation that the proposed charge migration between the supercapacitor and battery improves energy efficiency by 19.4%.
On the other hand, due to the complexity of the HEV drivetrain, appropriate control or energy management strategies are needed to meet the driver's demand for traction power, sustain the battery charge, and optimize drivetrain efficiency, fuel consumption, and emissions [2]. An energy management strategy, which is usually implemented in the vehicle's central controller, is defined as an algorithm, i.e., a law regulating the operation of the vehicle's drivetrain [19]. Figure 1.5 shows the classification of the main proposed approaches to the HEV energy management problem [2]. As can be observed in that figure, there are two main categories: rule-based control and optimization-based control. The rule-based category includes fuzzy rule-based control and deterministic rule-based control, while the optimization-based category includes global optimization and real-time optimization.

Figure 1.5: Classification of the hybrid drivetrain control strategies [2].

Rule-based control strategies have been designed to determine the power split between the ICE and EM based on intuition, heuristics, human expertise, or fuzzy logic [20, 21, 22, 23]. Although rule-based approaches are effective for real-time supervisory control, their results may be far from optimal. On the other hand, dynamic programming techniques have been applied to the power management of various types of HEVs [24, 25, 26]. Dynamic programming techniques can derive a globally optimal solution that minimizes the total fuel consumption over a whole driving cycle, which is given as a vehicle speed versus time profile for a specific trip. Unfortunately, the DP techniques require a priori knowledge of the driving cycles as well as detailed and accurate HEV modeling; therefore, they are not applicable for real-time implementation.

The equivalent consumption minimization strategy (ECMS) approach has been proposed to reduce the global optimization problem (as in dynamic programming techniques) to an instantaneous optimization problem [27]. However, the ECMS approach strongly depends on the equivalence factors, which convert the electrical energy consumption of the EM into the equivalent fuel consumption of the ICE. The equivalence factors are quite sensitive to the driving cycles. For instance, equivalence factors that are suitable for one driving cycle may lead to poor performance on other driving cycles. To overcome this challenge, the adaptive ECMS (A-ECMS) approach has been applied to HEV power management based on driving cycle prediction within a finite horizon [28]. Although the A-ECMS approach has good performance, the detailed driving cycle prediction method was omitted. Gong et al. provided a trip modeling method using a combination of geographical information systems (GISs), global positioning systems (GPSs), and intelligent transportation systems (ITSs) [29]. However, the driving cycle constructed by this trip modeling method is synthetic and not accurate enough to capture real driving scenarios, such as the effect of traffic lights and unforeseen circumstances. In [30, 31], the authors proposed a stochastic control method for HEVs based on a Markov chain model of the driving cycles. This method does not rely on a priori knowledge of the driving cycles, but it is not adaptive to dynamic driving conditions.
1.4 EVs/HEVs in the Smart Grid

As EVs and HEVs (especially PHEVs) gain an increasing market share, their penetration may pose challenges to the power generation capability of the electricity grid [32]. A study from the National Renewable Energy Laboratory concluded that a large penetration of PHEVs would place increased pressure on peaking units if charging is completely uncontrolled [33]. Pacific Northwest National Laboratory concluded that existing electric power generation plants would have to be used at full capacity for most hours of the day to support up to 84% of the nation's cars, pickup trucks, and SUVs for a daily drive of 33 miles on average [34]. Shao et al. examined the impact of PHEV charging on a distribution transformer under different charging scenarios and concluded that new load peaks will be created, which in some cases may exceed the distribution transformer capacity [32]. On the other hand, the recently proposed smart grid technology, which takes advantage of modern communication systems to improve the efficiency, reliability, and sustainability of the power grid [35], can be employed to tackle the pressure on the grid caused by the penetration of EVs/HEVs.

Some reference works [36, 37, 38, 39, 40] focus on minimizing the charging cost for EV/PHEV users. An optimal charge control method was proposed in [36], which optimizes charging time and energy flow to reduce the daily electricity cost of vehicle charging. Cao et al. investigated the EV charging problem in the time-of-use electricity price scenario and proposed an intelligent method to minimize the charging cost, taking into account the relation between the acceptable charging power of the EV battery and the state of charge [37]. A charging scheduling problem was formulated for a group of PHEVs and solved to minimize charging cost [38]. Wu et al. proposed an operating framework for EV charging aggregators that first determines the purchase of energy in the day-ahead market and then distributes energy to EVs on the operating day [39]. Based on game theory, Escudero-Garzas et al. formulated a charging station selection problem, where EVs reduce their charging costs while satisfying their energy requirements, and the charging stations optimize their benefits [40].

As most vehicles are parked an average of 96% of the time [41], the concept of vehicle-to-grid (V2G) was introduced, where the electrical energy storage ability of EV/PHEV batteries is exploited for frequency regulation, load balancing, etc. [41, 42, 43, 44]. Shimizu et al. proposed a V2G model considering EV users' convenience and a state-of-charge control method executed by local control centers to suppress grid frequency fluctuation [45]. Han et al. designed an aggregator for V2G frequency regulation service utilizing active discharge of EVs while considering the energy constraints of the batteries [46]. Saad et al. formulated a non-cooperative game between PHEV groups (a PHEV group can, for example, represent a parking lot), which seek to sell part of their stored energy in a power market in order to maximize a utility function capturing the tradeoff between the economic benefits from energy trading and the associated costs [47].

As a result of the growing demand for green energy technology, integration of EV/PHEV charging/discharging and renewable energy scheduling into the grid can be mutually beneficial.
Li et al. explored the controllable nature of EV charging to accommodate intermittent wind energy, with the aims of optimizing electricity cost, replacing fossil fuel generation with wind energy, reducing ancillary services through controlled EV charging, and improving the quality of electricity service [48]. Wu et al. addressed the problem of integrating wind power into the grid by smart scheduling of end users' energy consumption using a game-theoretic pricing algorithm, considering both residential users and EV users, and both shiftable and non-shiftable loads [49]. Mets et al. presented a distributed algorithm to increase the usage of wind energy while minimizing imbalance costs and the disutility experienced by consumers [50].

Chapter 2

HEV Modeling

2.1 Operation Modes

By way of an example and without loss of generality, the parallel HEV drivetrain structure shown in Figure 1.4 is used in this thesis. In a parallel HEV, the internal combustion engine (ICE) and the electric motor (EM) can deliver power in parallel to drive the wheels. Since both the ICE and EM are coupled to the drivetrain shaft via two clutches, the propulsion power may be supplied by the ICE alone, by the electric motor alone, or by both [4]. The EM can serve as a generator to charge the battery when the vehicle is braking or when the ICE output power is greater than required. There are five different operation modes in a parallel HEV, depending on the flow of energy:

1. ICE only mode: the wheels are driven only by the ICE.
2. EM only mode: the wheels are driven only by the EM and the ICE is off.
3. Power assist mode: the wheels are driven by both the ICE and the EM.
4. Battery charging mode: part of the ICE power drives the EM as a generator to charge the battery pack, while the other part of the ICE power drives the wheels.
5. Regenerative braking mode: the wheels drive the EM as a generator to charge the battery pack when the vehicle is braking.

2.2 Internal Combustion Engine (ICE)

For the study of HEV energy management, ICE dynamics are ignored based on the quasi-static assumption [51]. Figure 2.1 presents a contour map of the fuel consumption of an ICE in the speed-torque plane, and Figure 2.2 shows the corresponding contour map of the fuel efficiency. It is a 1.0L VTEC-E SI ICE modeled by the advanced vehicle simulator ADVISOR [52]. The ICE has a peak power of 50 kW and a peak efficiency of 40%.

Figure 2.1: The fuel consumption map of an ICE.
Figure 2.2: The fuel efficiency map of an ICE.

The fuel consumption rate $\dot{m}_f$ (in g/s) of an ICE is a nonlinear function of the ICE speed $\omega_{ICE}$ (in rad/s) and torque $T_{ICE}$ (in Nm). The fuel efficiency of an ICE is calculated by

$$\eta_{ICE}(\omega_{ICE}, T_{ICE}) = \frac{T_{ICE}\,\omega_{ICE}}{\dot{m}_f\, D_f}, \quad (2.1)$$

where $D_f$ is the fuel energy density (in J/g). A "good" energy management policy should avoid ICE operation points $(\omega_{ICE}, T_{ICE})$ in the low-efficiency region. Superimposed on the contour maps is the maximum ICE torque $T_{ICE}^{max}(\omega_{ICE})$ (the dashed line). To ensure safe and smooth operation of an ICE, the following constraints should be satisfied:

$$\omega_{ICE}^{min} \le \omega_{ICE} \le \omega_{ICE}^{max}, \qquad 0 \le T_{ICE} \le T_{ICE}^{max}(\omega_{ICE}). \quad (2.2)$$
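For concreteness, the efficiency evaluation of (2.1) and the feasibility check of (2.2) can be sketched as follows. This is a minimal illustration: the fuel-rate function is a hypothetical stand-in for the tabulated ADVISOR engine map, and the fuel energy density value is an assumed round number for gasoline rather than a figure taken from this chapter.

```python
# Sketch of Eq. (2.1) and the constraints in Eq. (2.2).
# fuel_rate_g_per_s is a placeholder for the interpolated ICE map.

D_F = 42_600.0  # fuel energy density in J/g (assumed illustrative value)

def fuel_rate_g_per_s(omega_ice: float, torque_ice: float) -> float:
    """Hypothetical smooth surrogate for the tabulated fuel consumption map."""
    idle_rate = 0.15                      # g/s burned even at low load
    return idle_rate + 2.9e-5 * omega_ice * torque_ice

def ice_efficiency(omega_ice: float, torque_ice: float) -> float:
    """Fuel efficiency of an ICE operating point, Eq. (2.1)."""
    m_dot_f = fuel_rate_g_per_s(omega_ice, torque_ice)
    return (torque_ice * omega_ice) / (m_dot_f * D_F)

def ice_point_feasible(omega_ice, torque_ice, omega_min, omega_max, t_max_of_omega):
    """Speed and torque constraints of Eq. (2.2)."""
    return (omega_min <= omega_ice <= omega_max
            and 0.0 <= torque_ice <= t_max_of_omega(omega_ice))
```

In a real controller the surrogate would be replaced by interpolation over the measured map shown in Figure 2.1.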
2.3 Electric Motor (EM)

Figure 2.3 shows the contour map of the efficiency of an EM in the speed-torque plane. It is a permanent-magnet EM modeled by ADVISOR [52]. The EM has a peak power of 10 kW and a peak efficiency of 96%. The EM can work as either a motor or a generator. We use $\omega_{EM}$ and $T_{EM}$ to denote the speed and torque of the EM. When $T_{EM} \ge 0$, the EM operates as a motor, corresponding to the upper half plane in Figure 2.3; when $T_{EM} < 0$, the EM operates as a generator, corresponding to the lower half plane in Figure 2.3.

Figure 2.3: The efficiency map of an EM.

The efficiency of the EM is defined by

$$\eta_{EM}(\omega_{EM}, T_{EM}) = \begin{cases} (\omega_{EM}\, T_{EM}) / P_{batt} & T_{EM} \ge 0, \\ P_{batt} / (\omega_{EM}\, T_{EM}) & T_{EM} < 0, \end{cases} \quad (2.3)$$

where $P_{batt}$ is the output power of the battery pack. When $T_{EM} \ge 0$, the battery pack is discharging and $P_{batt}$ is a positive value; when $T_{EM} < 0$, the EM operates as a generator, the battery pack is charging, and $P_{batt}$ is a negative value. Superimposed on the contour map (Figure 2.3) are the maximum and minimum EM torques (the dashed lines), i.e., $T_{EM}^{max}(\omega_{EM})$ and $T_{EM}^{min}(\omega_{EM})$, respectively. To ensure safe and smooth operation of an EM, the following constraints should be satisfied:

$$0 \le \omega_{EM} \le \omega_{EM}^{max}, \qquad T_{EM}^{min}(\omega_{EM}) \le T_{EM} \le T_{EM}^{max}(\omega_{EM}). \quad (2.4)$$

2.4 Drivetrain Mechanics

In what follows, we discuss a simplified but sufficiently accurate parallel HEV drivetrain model as in [53, 54]. The following equations describe the drivetrain mechanics, showing the mechanical coupling between the different components and the vehicle.

Speed relation:

$$\omega_{wh} = \frac{\omega_{ICE}}{R(k)} = \frac{\omega_{EM}}{R(k)\,\rho_{reg}}. \quad (2.5)$$

Torque relation:

$$T_{wh} = R(k)\left( T_{ICE} + \rho_{reg}\, T_{EM}\, (\eta_{reg})^{\alpha} \right) (\eta_{gb})^{\beta}. \quad (2.6)$$

Here $\omega_{wh}$ and $T_{wh}$ are the wheel speed and torque, respectively, $R(k)$ is the gear ratio of the $k$-th gear, and $\rho_{reg}$ is the reduction gear ratio. $\eta_{reg}$ and $\eta_{gb}$ are the reduction gear efficiency and the gear box efficiency, respectively. The exponents account for the direction of power flow:

$$\alpha = \begin{cases} +1 & T_{EM} \ge 0, \\ -1 & T_{EM} < 0, \end{cases} \quad (2.7)$$

$$\beta = \begin{cases} +1 & T_{ICE} + \rho_{reg}\, T_{EM}\, (\eta_{reg})^{\alpha} \ge 0, \\ -1 & T_{ICE} + \rho_{reg}\, T_{EM}\, (\eta_{reg})^{\alpha} < 0. \end{cases} \quad (2.8)$$

2.5 Vehicle Dynamics

The vehicle is considered as a rigid body with four wheels, and the vehicle mass is assumed to be concentrated at a single point, as shown in Figure 2.4. The following force balance equation describes the vehicle dynamics:

$$m a = F_{TR} - F_g - F_R - F_{AD}. \quad (2.9)$$

Here $m$ is the vehicle mass, $a$ is the vehicle acceleration, and $F_{TR}$ is the total tractive force. The force due to road slope is given by

$$F_g = m g \sin\theta, \quad (2.10)$$

where $\theta$ is the road slope angle. The rolling friction force is given by

$$F_R = m g \cos\theta\, C_R, \quad (2.11)$$

where $C_R$ is the rolling friction coefficient. The air drag force is given by

$$F_{AD} = 0.5\,\rho\, C_D\, A_F\, v^2, \quad (2.12)$$

where $\rho$ is the air density, $C_D$ is the air drag coefficient, $A_F$ is the vehicle frontal area, and $v$ is the vehicle speed.

Figure 2.4: The analysis of vehicle dynamics [3].

Given $v$, $a$, and $\theta$, the total tractive force $F_{TR}$ can be derived using (2.9)-(2.12). Then, the wheel speed and torque are related to $F_{TR}$, $v$, and the wheel radius $r_{wh}$ by

$$\omega_{wh} = v / r_{wh}, \qquad T_{wh} = F_{TR}\, r_{wh}. \quad (2.13)$$
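As a concrete illustration of (2.9)-(2.13), the sketch below converts a speed/acceleration/slope sample into the required tractive force, wheel speed, wheel torque, and power demand. The mass, drag coefficient, frontal area, and wheel radius follow Table 3.1; the air density and rolling friction coefficient are assumed values for illustration.

```python
import math

M = 1000.0      # vehicle mass m (kg), Table 3.1
G = 9.81        # gravitational acceleration (m/s^2)
C_R = 0.009     # rolling friction coefficient (assumed)
RHO = 1.2       # air density (kg/m^3) (assumed)
C_D = 0.32      # air drag coefficient, Table 3.1
A_F = 1.48      # frontal area (m^2), Table 3.1
R_WH = 0.3      # wheel radius (m), Table 3.1

def wheel_demand(v: float, a: float, theta: float = 0.0):
    """Return (omega_wh, T_wh, p_dem) for speed v (m/s), acceleration a
    (m/s^2), and road slope theta (rad), following Eqs. (2.9)-(2.13)."""
    f_g = M * G * math.sin(theta)            # road slope force, Eq. (2.10)
    f_r = M * G * math.cos(theta) * C_R      # rolling friction, Eq. (2.11)
    f_ad = 0.5 * RHO * C_D * A_F * v ** 2    # air drag, Eq. (2.12)
    f_tr = M * a + f_g + f_r + f_ad          # tractive force, Eq. (2.9)
    omega_wh = v / R_WH                      # wheel speed, Eq. (2.13)
    t_wh = f_tr * R_WH                       # wheel torque, Eq. (2.13)
    return omega_wh, t_wh, omega_wh * t_wh   # p_dem = omega_wh * T_wh

# Example: cruising at 20 m/s on a flat road with mild acceleration.
print(wheel_demand(v=20.0, a=0.3))
```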
2.6 Backward-Looking Optimization

In this work, the backward-looking optimization approach [24, 25, 26, 30, 31, 27, 28, 29] is adopted, which implies that the HEV controller determines the operation of the ICE and EM so that the vehicle meets the target performance (speed $v$ and acceleration $a$) specified in benchmark driving cycles [55]. In reality, the driver determines the speed $v$ and power demand $p_{dem} = \omega_{wh} T_{wh}$ profiles for propelling the HEV (by pressing the acceleration or brake pedal). The backward-looking optimization is equivalent to actual HEV management because $p_{dem}$ and $a$ satisfy the relationship specified in Section 2.5.

With given values of vehicle speed $v$ and acceleration $a$ (or power demand $p_{dem}$), the required wheel speed $\omega_{wh}$ and torque $T_{wh}$ satisfy (2.9)-(2.13). In addition, the five variables, i.e., the ICE speed $\omega_{ICE}$ and torque $T_{ICE}$, the EM speed $\omega_{EM}$ and torque $T_{EM}$, and the gear ratio $R(k)$, should satisfy (2.5) and (2.6) to support the required wheel speed and torque. The HEV controller chooses two of them, say $T_{EM}$ and $R(k)$, as the control variables. The remaining variables (i.e., $\omega_{ICE}$, $T_{ICE}$, and $\omega_{EM}$) become dependent (associated) variables, the values of which are determined by $T_{EM}$ and $R(k)$ accordingly. The results of the HEV power management policy are the fuel consumption rate of the ICE and the battery pack output power (associated with the EM).

Chapter 3

MDP-Based HEV Energy Management

This work [56] aims at minimizing the HEV fuel consumption over any driving cycle. Unlike some previous approaches, we do not assume that complete information about the driving cycle is available to the HEV controller in advance. Therefore, this work models the HEV energy management problem as a Markov decision process (MDP), based on a Markov chain model of driving cycles and an MDP model of the battery pack, which capture the stochastic information in driving cycles and the rate capacity and recovery effects of the battery pack, respectively. The optimal power management policy is derived using a policy iteration algorithm, which is a standard dynamic programming-based algorithm for deriving the optimal policy of an MDP. Simulation results over real-world and testing driving cycles demonstrate that the proposed optimal power management policy improves HEV fuel economy by 23.9% on average compared to the rule-based policy.

Markov decision processes (MDPs) provide a powerful mathematical tool for sequential decision making in situations where outcomes are partly random and partly under the control of a decision maker [57]. MDPs have been widely applied to many areas including robotics, automated control, and dynamic power management for embedded systems [58].

3.1 MDP Concepts and Definitions

We focus on a discrete-time finite-state Markov decision process for a continual process-control task, which is best suited for modeling the HEV power management problem. The whole time horizon is discretized into a sequence of time steps, indexed by $t = 0, 1, 2, \ldots$. There are a finite set $\mathcal{S}$ of states and a finite set $\mathcal{A}$ of actions. At each time step, the process is in some state $s \in \mathcal{S}$, and the decision maker may choose any action $a \in \mathcal{A}_s$, where $\mathcal{A}_s (\subseteq \mathcal{A})$ is the set of actions available in state $s$. Matrix $P^a$ is the state transition probability matrix, where $P^a_{ss'}$ denotes the probability that action $a$ in state $s$ at time step $t$ will lead to state $s'$ at time step $t+1$. We have $\sum_{s' \in \mathcal{S}} P^a_{ss'} = 1$. Matrix $R^a$ is the immediate reward matrix, where $R^a_{ss'}$ denotes the (expected) immediate reward received after a transition from state $s$ (at time step $t$) to state $s'$ (at time step $t+1$) under action $a$. The matrices $P^a$ and $R^a$ (for all $a \in \mathcal{A}$) completely specify the most important aspects of the dynamics of an MDP.

A policy, denoted by $\pi$, for the decision maker is a mapping from each state $s \in \mathcal{S}$ to an action $a \in \mathcal{A}$ that specifies the action $a = \pi(s)$ that the decision maker will choose when the process is in state $s$. The MDP optimization problem targets finding the optimal policy such that

$$V^{\pi}(s) = E_{\pi}\left\{ \sum_{k=0}^{\infty} \gamma^k\, r_{t+k+1} \,\Big|\, s_t = s \right\} \quad (3.1)$$

is maximized for each state $s \in \mathcal{S}$. The value function $V^{\pi}(s)$ is the expected return when the process starts in state $s$ at time step $t$ and follows policy $\pi$ thereafter. $\gamma$ is a parameter, $0 < \gamma < 1$, called the discount rate, which ensures that the infinite sum (i.e., $\sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$) converges to a finite value. $r_{t+k+1}$ is the immediate reward received at time step $t+k+1$, the value of which can be obtained by indexing the matrices $R^a_{ss'}$ with $s = s_{t+k}$, $s' = s_{t+k+1}$, and $a = \pi(s_{t+k})$.

For any policy $\pi$ and any state $s \in \mathcal{S}$, the following consistency condition holds between $V^{\pi}(s)$ and $V^{\pi}(s')$, where $s'$ is a possible successor state of $s$:

$$V^{\pi}(s) = \sum_{s'} P^a_{ss'}\left( R^a_{ss'} + \gamma\, V^{\pi}(s') \right), \quad (3.2)$$

where $a = \pi(s)$.
3.2 Stochastic Driving Cycle Modeling

A driving cycle is given as a vehicle speed versus time profile for a specific trip. The HEV controller aims at deriving an energy management policy that minimizes the fuel consumption over a whole driving cycle. If the HEV controller had a priori knowledge of the whole driving cycle at the beginning of a trip, the globally optimal policy could be derived using dynamic programming (DP) techniques [24, 25, 26]. However, such dependency on a priori knowledge has become a major deterrent to utilizing DP approaches, i.e., the difficulty of implementing real-time control.

We capture the stochastic information in driving cycles using a discrete-time Markov chain model, which predicts the probability distribution of states in the next time step given the state in the current time step. We define the state space for the Markov chain model as a finite number of states, each represented by the power demand and vehicle speed levels:

$$\mathcal{S}^D = \left\{ s^D = [p_{dem}, v]^T \mid p_{dem} \in \mathcal{P}_{dem},\ v \in \mathcal{V} \right\} = \left\{ s^D_1, s^D_2, \ldots, s^D_M \right\}, \quad (3.3)$$

where $p_{dem} = T_{wh}\,\omega_{wh}$ is the power demand for propelling the HEV, $\mathcal{P}_{dem}$ and $\mathcal{V}$ are respectively the finite sets of power demand levels and vehicle speed levels (discretization is required because our MDP model is suitable for discrete state spaces), and $M$ is the total number of states in $\mathcal{S}^D$. Then, in the state transition probability matrix $P^D$ of the Markov chain, the element $P^D_{i,j}$ denotes the probability that the state in the next time step is $s^D_j$ given that the current state is $s^D_i$:

$$P^D_{i,j} = \Pr\left( s^D_{t+1} = s^D_j \mid s^D_t = s^D_i \right). \quad (3.4)$$

To estimate these state transition probabilities, one needs observation data for both power demand and vehicle speed. We obtain these observations from real-world and testing driving cycle profiles. These profiles provide histories of vehicle speed versus time, and we use the vehicle dynamics to extract the corresponding power demand histories as follows:

$$p_{dem} = F_{TR}\, v = m \frac{dv}{dt} v + m g \sin\theta\, v + m g \cos\theta\, C_R\, v + 0.5\,\rho\, C_D\, A_F\, v^3, \quad (3.5)$$

which is derived based on Section 2.5.

In this work, we use real-world and testing driving cycles developed by different organizations and projects, such as the EPA (US Environmental Protection Agency), INRETS (French National Institute for Transport and Safety Research), and MODEM, to compute the observation data and then derive the state transition probability matrix $P^D$ using the maximum likelihood estimation method [59]. In reality, the state transition probability matrix can be updated if new driving scenarios are observed by the HEV controller.
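The maximum likelihood estimate of $P^D$ reduces to counting observed transitions between discretized (power demand, speed) states and normalizing each row. The sketch below illustrates this; the bin edges and the 1 Hz sampling are illustrative assumptions, not the discretization actually used in the thesis, and the power trace would in practice be computed from the speed trace via Eq. (3.5).

```python
import numpy as np

def discretize(value, edges):
    """Map a continuous value to the index of its bin (len(edges)+1 bins)."""
    return int(np.searchsorted(edges, value))

def estimate_transition_matrix(speed_trace, power_trace, v_edges, p_edges):
    """Maximum likelihood estimate of P^D from sampled driving cycles.

    speed_trace, power_trace: vehicle speed and power demand samples.
    v_edges, p_edges: bin edges defining the discrete speed / power levels.
    """
    n_v, n_p = len(v_edges) + 1, len(p_edges) + 1
    n_states = n_v * n_p                        # M = |S^D|
    counts = np.zeros((n_states, n_states))

    def state_index(v, p):
        return discretize(v, v_edges) * n_p + discretize(p, p_edges)

    states = [state_index(v, p) for v, p in zip(speed_trace, power_trace)]
    for s_now, s_next in zip(states[:-1], states[1:]):
        counts[s_now, s_next] += 1              # count observed transitions

    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0               # avoid division by zero;
    return counts / row_sums                    # never-visited states keep an all-zero row
```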
3.3 Stochastic Battery Modeling

Although the battery pack in an HEV is not coupled to the drivetrain directly, it is an important and active power component of an HEV, since it provides electrical energy to power the EM and also stores the electrical energy generated by the EM (acting as a generator) during regenerative braking. A comprehensive understanding of the battery is necessary for deriving the HEV energy management policy.

The state of a battery pack is represented by the amount of charge stored in it. The majority of the literature on HEV energy management adopts a simple battery model as follows [60]:

$$Q_t = Q_{ini} - \sum_{k=0}^{t} I_k\, \Delta T, \quad (3.6)$$

where $Q_t$ is the amount of charge stored in the battery pack at the end of time step $t$, $Q_{ini}$ is the amount of charge stored in the battery pack at the beginning of time step 0, $I_t$ is the discharging current of the battery pack at time step $t$ ($I_t < 0$ means battery charging), and $\Delta T$ is the length of a time step. However, this battery model ignores the rate capacity effect, which causes the most significant power loss when the battery pack charging/discharging current is high [61]. The battery pack charging/discharging current is high during deceleration and acceleration, and therefore the rate capacity effect should be considered. The rate capacity effect specifies that if the battery pack is discharging ($I > 0$), the actual charge decreasing rate inside the battery pack is higher than $I$; and if the battery pack is charging ($I < 0$), the actual charge increasing rate inside the battery pack is lower than $|I|$. In addition, the battery model mentioned above also ignores the recovery effect, which specifies that the battery pack can partially recover the charge lost in previous discharges if relaxation time is allowed between discharges [61].

A stochastic battery model was proposed in [62]. Inspired by that model, we propose a Markov decision process model for the battery pack in an HEV that captures both the rate capacity effect and the recovery effect. We define the state space for the MDP battery model by discretizing the range of the stored charge of the battery pack, i.e., $[Q_{min}, Q_{max}]$, into a finite number of charge levels:

$$\mathcal{S}^{BA} = \left\{ s^{BA}_1, s^{BA}_2, \ldots, s^{BA}_N \right\}, \quad (3.7)$$

where $Q_{min} = s^{BA}_1 < s^{BA}_2 < \cdots < s^{BA}_N = Q_{max}$. Usually, $Q_{min}$ and $Q_{max}$ are 40% and 80% of the battery pack capacity, respectively, to ensure "healthy" operation of the battery pack [28]. An action $a (\in \mathcal{A})$ taken by the decision maker is to discharge the battery pack with a current value of $I$, where $I > 0$ denotes discharging, $I < 0$ denotes charging, and $I = 0$ denotes idle. The battery charging/discharging current range $[-I_{max}, I_{max}]$ is discretized into a finite number $L$ of values. Then the cardinality of $\mathcal{A}$ equals $L$.

Next, we need to derive the state transition probability matrix $Pb^a$, where the element $Pb^a_{i,j}$ denotes the probability that action $a$ in state $s^{BA}_i$ at time step $t$ will lead to state $s^{BA}_j$ at time step $t+1$. Specifically, if action $a$ sets the battery pack idle ($I = 0$), the battery pack has some probability of switching to a state with a higher charge level due to the recovery effect; if action $a$ discharges the battery ($I > 0$), the battery pack will switch to some state(s) with lower charge levels and the charge decrease is larger than $I\,\Delta T$ due to the rate capacity effect; and if action $a$ charges the battery pack ($I < 0$), the battery pack will switch to some state(s) with higher charge levels and the charge increase is smaller than $|I|\,\Delta T$ due to the rate capacity effect.
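A minimal sketch of how the qualitative rules above can be turned into one row of the battery transition matrix $Pb^a$ is given below. The rate-capacity scaling factor, the recovery probability, and the simple two-outcome randomization are all illustrative assumptions; the thesis does not prescribe specific numerical forms at this point.

```python
import numpy as np

def battery_transition_row(q_levels, i_level, current, dt=1.0,
                           rate_capacity_gamma=0.15, recovery_prob=0.1):
    """One row of Pb^a: distribution over next charge levels from level i_level.

    q_levels: ascending charge levels s^BA_1..s^BA_N.
    current:  battery current I (>0 discharge, <0 charge, 0 idle).
    The rate capacity effect is approximated by scaling the nominal charge
    change; the recovery effect by a small probability of gaining one level
    when idle. Both constants are assumptions made for illustration only.
    """
    n = len(q_levels)
    row = np.zeros(n)
    if current == 0.0:                       # idle: possible recovery
        row[i_level] = 1.0 - recovery_prob
        row[min(i_level + 1, n - 1)] += recovery_prob
        return row

    if current > 0:                          # discharging: loss exceeds I*dt
        delta = -current * dt * (1.0 + rate_capacity_gamma)
    else:                                    # charging: gain falls short of |I|*dt
        delta = -current * dt * (1.0 - rate_capacity_gamma)

    q_next = np.clip(q_levels[i_level] + delta, q_levels[0], q_levels[-1])
    j = int(np.argmin(np.abs(np.asarray(q_levels) - q_next)))
    # Split probability between the nearest level and its neighbor so that
    # the chain stays stochastic rather than deterministic.
    neighbor = min(j + 1, n - 1) if current < 0 else max(j - 1, 0)
    row[j] = 0.8
    row[neighbor] += 0.2
    return row
```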
3.4 MDP Modeling of HEV Energy Management

In this section, we model the HEV energy management problem as an MDP, based on the Markov chain model of driving cycles and the MDP model of the battery pack derived in Sections 3.2 and 3.3. We define the state space for the MDP model of HEV energy management as

$$\mathcal{S} = \left\{ s_{i,j} = [s^D_i, s^{BA}_j] \mid s^D_i \in \mathcal{S}^D,\ s^{BA}_j \in \mathcal{S}^{BA} \right\}. \quad (3.8)$$

An action $a (\in \mathcal{A})$ taken by the decision maker is to discharge the battery pack with a current value of $I$, where $I > 0$ denotes discharging, $I < 0$ denotes charging, and $I = 0$ denotes idle. The battery charging/discharging current range $[-I_{max}, I_{max}]$ is discretized into a finite number $L$ of values. Then the cardinality of $\mathcal{A}$ equals $L$.

3.4.1 State Transition Probability Matrix

We now derive the state transition probability matrix $P^a$ for the MDP, where the element $P^a_{(i,j),(i',j')}$ denotes the probability that action $a$ in state $s_{i,j}$ at time step $t$ will lead to state $s_{i',j'}$ at time step $t+1$. The value of $P^a_{(i,j),(i',j')}$ can be derived by

$$P^a_{(i,j),(i',j')} = P^D_{i,i'} \cdot Pb^a_{j,j'}, \quad (3.9)$$

where $P^D$ and $Pb^a$ are the state transition probability matrices of the Markov chain model of driving cycles and the MDP model of the battery pack, respectively. Because these two models are independent of each other, $P^a_{(i,j),(i',j')}$ can be derived by directly multiplying the corresponding matrix elements in $P^D$ and $Pb^a$.

3.4.2 Immediate Reward Matrix

Next, we derive the immediate reward matrix $R^a$ for the MDP model of HEV power management, where the element $R^a_{(i,j),(i',j')}$ denotes the immediate reward received after taking action $a$ in state $s_{i,j}$. MDP optimization problems aim to maximize the discounted sum of the immediate rewards in the long run, as shown in (3.1). We define the immediate reward as the negative of the fuel consumption in a time step, and therefore by solving the MDP optimization problem we minimize the overall fuel consumption.

The fuel consumption in a time step depends on the state $s_{i,j}$ and the action $a$ at that time step. Suppose that in state $s_{i,j}$ the power demand is $p_{dem}$ and the vehicle speed is $v$, and that action $a$ specifies a battery pack discharging current of $I$. Then, the wheel speed and torque are calculated by

$$\omega_{wh} = v / r_{wh}, \quad (3.10)$$

$$T_{wh} = p_{dem}\, r_{wh} / v. \quad (3.11)$$

The battery output power is calculated by

$$P_{batt} = V_{OC}\, I - R_{batt}\, I^2, \quad (3.12)$$

where $V_{OC}$ is the open-circuit voltage of the battery pack and $R_{batt}$ is the internal resistance of the battery pack. Note that if $P_{batt} < 0$, the battery pack is being charged and its input power is $|P_{batt}|$. In order to derive the fuel consumption in the time step, we need to solve the following fuel optimization (FO) problem: Given the values of $\omega_{wh}$, $T_{wh}$, and $P_{batt}$, find the EM torque $T_{EM}$ and gear ratio $R(k)$ that minimize the fuel consumption rate $\dot{m}_f$ subject to (2.2)-(2.6).

Usually, there are about five values that $R(k)$ can assume. For each of the possible $R(k)$ values, we first calculate $\omega_{ICE}$ and $\omega_{EM}$ using (2.5), next calculate $T_{EM}$ using (2.3) while satisfying (2.4), and then calculate $T_{ICE}$ using (2.6) while satisfying (2.2). With $\omega_{ICE}$ and $T_{ICE}$, the fuel consumption rate $\dot{m}_f$ is obtained based on the ICE model. We choose the $R(k)$ value that results in the minimum $\dot{m}_f$, i.e., $\dot{m}^{opt}_f$. We calculate $R^a_{(i,j),(i',j')}$ as $-\dot{m}^{opt}_f\,\Delta T$. Note that the immediate reward $R^a_{(i,j),(i',j')}$ is independent of the successor state $s_{i',j'}$. Therefore, the four-dimensional matrix $R^a_{(i,j),(i',j')}$ can be reduced to a two-dimensional matrix $R^a_{(i,j)}$.
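To make (3.9) and the reward construction concrete, the sketch below assembles the composite transition matrix as a Kronecker-style product of $P^D$ and $Pb^a$ and fills the reward table by iterating over driving states, mirroring the FO procedure. The helper `solve_fo` is hypothetical and stands in for the per-state enumeration over gear ratios described above; for brevity the sketch also ignores any dependence of $V_{OC}$ and $R_{batt}$ on the battery charge level.

```python
import numpy as np

def composite_transition(P_D, Pb_a):
    """Eq. (3.9): P^a over joint states (driving state, battery level).

    With the joint state (i, j) flattened as i * N + j, the composite matrix
    is exactly the Kronecker product of P^D (M x M) and Pb^a (N x N).
    """
    return np.kron(P_D, Pb_a)

def reward_table(driving_states, battery_actions, solve_fo, dt=1.0):
    """Reward matrix R^a_(i,j) = -m_dot_f_opt * dt (Section 3.4.2).

    driving_states:  list of (p_dem, v) pairs representing s^D_i.
    battery_actions: list of current values I, one per action a.
    solve_fo(p_dem, v, I): hypothetical helper that enumerates the ~5 gear
    ratios subject to (2.2)-(2.6) and returns the minimum fuel rate (g/s).
    """
    M, L = len(driving_states), len(battery_actions)
    R = np.zeros((L, M))
    for a, current in enumerate(battery_actions):
        for i, (p_dem, v) in enumerate(driving_states):
            R[a, i] = -solve_fo(p_dem, v, current) * dt
    return R
```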
3.5 MDP Optimal Policy Derivation

In Section 3.4, we modeled the HEV power management problem as an MDP with the four essential tuples: the state set $\mathcal{S}$, the action set $\mathcal{A}$, the state transition probability matrix $P^a$, and the immediate reward matrix $R^a$. We now derive the optimal policy that maximizes the value function $V(s_{i,j})$ for all states $s_{i,j} \in \mathcal{S}$. We adopt the policy iteration algorithm, which is a dynamic programming (DP)-based algorithm for deriving the optimal policy of an MDP [57]. The policy iteration algorithm is based on the consistency condition (3.2), which we rewrite here as

$$V(s_{i,j}) = \sum_{(i',j')} P^a_{(i,j),(i',j')}\left( R^a_{(i,j)} + \gamma\, V(s_{i',j'}) \right), \quad (3.13)$$

where $s_{i',j'}$ is a possible successor state of $s_{i,j}$.

The policy iteration algorithm consists of two basic steps: policy evaluation and policy improvement. The policy evaluation step derives the value function for each state $s_{i,j} \in \mathcal{S}$ under a given policy $\pi(s_{i,j})$ through iteration. The policy improvement step, for each state $s_{i,j} \in \mathcal{S}$, changes the action from the existing policy to a potentially new action that results in a larger value function. If no further improvement can be made, policy improvement terminates with the optimal policy; otherwise, the new policy goes through the policy evaluation and improvement steps once more. The full procedure is listed in Algorithm 1.

Algorithm 1: Policy Iteration algorithm
1:  Initialize $V(s_{i,j}) \in \mathbb{R}$ and $\pi(s_{i,j}) \in \mathcal{A}_{s_{i,j}}$ arbitrarily for all $s_{i,j} \in \mathcal{S}$
    {% Policy Evaluation %}
2:  repeat
3:    $\Delta \leftarrow 0$
4:    for each $s_{i,j} \in \mathcal{S}$ do
5:      $v \leftarrow V(s_{i,j})$
6:      $V(s_{i,j}) \leftarrow \sum_{(i',j')} P^{\pi(s_{i,j})}_{(i,j),(i',j')}\big( R^{\pi(s_{i,j})}_{(i,j)} + \gamma\, V(s_{i',j'}) \big)$
7:      $\Delta \leftarrow \max(\Delta,\, |v - V(s_{i,j})|)$
8:    end for
9:  until $\Delta < \epsilon$ (a small positive number)
    {% Policy Improvement %}
10: policy_stable $\leftarrow$ true
11: for each $s_{i,j} \in \mathcal{S}$ do
12:   $b \leftarrow \pi(s_{i,j})$
13:   $\pi(s_{i,j}) \leftarrow \arg\max_a \sum_{(i',j')} P^a_{(i,j),(i',j')}\big( R^a_{(i,j)} + \gamma\, V(s_{i',j'}) \big)$
14:   if $b \ne \pi(s_{i,j})$ then
15:     policy_stable $\leftarrow$ false
16:   end if
17: end for
18: if policy_stable then
19:   stop
20: else
21:   go to Policy Evaluation
22: end if

In summary, policy iteration is a DP-based algorithm for deriving the optimal policy of an MDP. It results in an optimal HEV energy management policy that specifies the action $a$ to take when the HEV is in some state $s_{i,j}$. The action $a$ itself is represented as a discharging/charging current level of the battery pack, and the actual control variables, i.e., the EM torque $T_{EM}$ and gear ratio $R(k)$, are obtained by solving the FO problem, whose solution was derived in Section 3.4.2. The policy iteration algorithm is executed offline, and therefore its execution complexity is not an important concern.
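A compact, runnable version of Algorithm 1 over the flattened composite state space is sketched below. It assumes the transition tensor `P[a]` and the reward table `R[a]` have already been built as in Sections 3.4.1 and 3.4.2, and it is written for illustration rather than as the exact offline implementation used in the experiments.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9, eps=1e-6):
    """Algorithm 1 (policy iteration) for the HEV energy management MDP.

    P: array of shape (L, S, S); P[a, s, s'] is the transition probability.
    R: array of shape (L, S); R[a, s] is the immediate reward (negative fuel).
    Returns the value function V and the policy pi (action index per state).
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    pi = np.zeros(n_states, dtype=int)

    while True:
        # Policy evaluation (lines 2-9): sweep Bellman backups under pi.
        while True:
            delta = 0.0
            for s in range(n_states):
                v_old = V[s]
                a = pi[s]
                V[s] = R[a, s] + gamma * P[a, s] @ V   # Eq. (3.13), rows sum to 1
                delta = max(delta, abs(v_old - V[s]))
            if delta < eps:
                break

        # Policy improvement (lines 10-17): greedy action per state.
        policy_stable = True
        for s in range(n_states):
            q_values = R[:, s] + gamma * P[:, s, :] @ V
            best = int(np.argmax(q_values))
            if best != pi[s]:
                pi[s] = best
                policy_stable = False
        if policy_stable:
            return V, pi
```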
First, we compare the value functions as defined in (3.1) of the proposed policy opt and the rule-based policy rb . The value functionV (s) of a policy demonstrates the negative of the expected discounted sum of fuel consumption in the long run, which is to be maximized by opt and rb . It is equivalent to minimizing the fuel consumption. We compare avg s V (s) value, which is the average value function over all states, of opt and rb . If a discount rate = 0:9 in (3.1) is used, we obtainavg s V opt (s) =3:41 and avg s V rb (s) =5:53, which shows the proposed policy achieves 38.3% reduction in fuel consumption. If a discount rate = 0:95 is used, we obtainavg s V opt (s) =6:63 andavg s V rb (s) =10:46, which shows the proposed policy achieves 36.6% reduction in fuel consumption. Overall, the proposed policy outperforms the rule-based policy in terms of value function. Next, we test the fuel consumption of the proposed policy and rule-based policy on real-world and testing driving cycles. The fuel consumptions over some driving cycles 30 Figure 3.1: The ICE operation points from the proposed and rule-based policies. are summarized in Table 3.2. We can observe that the proposed policy always results in lower fuel consumption and the maximum reduction in fuel consumption is as high as 46.93%. We plot the ICE operation points over a driving cycle on the ICE fuel efficiency map in Figure 3.1. The “x” points are from the rule-based policy and the “o” points are from our proposed policy. We can observe that the operation points from the proposed policy are more concentrated on the high efficiency region of the ICE. Furthermore, we compare the overall fuel economy of the proposed policy and the rule-based policy over 17 real-world and testing driving cycles with a total driving time of five hours and both local and highway driving conditions. The rule-based policy achieves a MPG value of 46 and the proposed policy achieves a MPG value of 57, demonstrating the proposed policy improves the fuel economy by 23.9% on average. 31 Chapter 4 Reinforcement Learning-Based HEV Energy Management This work [63] also aims at minimizing the HEV fuel consumption over any driving cycles. We propose to use the reinforcement learning technique for deriving the opti- mal HEV power management policy. Unlike some previous approaches, which require complete or stochastic information of the driving cycles, in this new method the HEV controller does not require any prior information about the driving cycles and uses only partial information about the HEV modeling, which avoids reliance on complex HEV modeling while coping with driver specific behaviors. Consequently, we carefully define the state space, action space, and reward in the reinforcement learning technique such that the objective of the reinforcement learning agent coincides with our goal of mini- mizing the HEV overall fuel consumption. We employ the TD()-learning algorithm to derive the optimal HEV energy management policy, due to its relatively higher conver- gence rate and higher performance in non-Markovian environment. Simulation results over real-world and testing driving cycles demonstrate that the proposed HEV power management policy can improve fuel economy by 42%. 32 Agent Environment action reward state t a t s t r 1 t r 1 t s Figure 4.1: The agent-environment iteration. 
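As a quick sanity check, the quoted reduction percentages follow directly from the ratios of the average value functions reported above:

# Average value functions (negative expected discounted fuel consumption).
v_opt_090, v_rb_090 = -3.41, -5.53        # discount rate 0.90
v_opt_095, v_rb_095 = -6.63, -10.46       # discount rate 0.95

reduction_090 = 1.0 - v_opt_090 / v_rb_090   # ~0.383 -> 38.3% reduction
reduction_095 = 1.0 - v_opt_095 / v_rb_095   # ~0.366 -> 36.6% reduction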
4.1 Reinforcement Learning Background
Reinforcement learning provides a mathematical framework for discovering or learning strategies that map situations onto actions with the goal of maximizing a reward function [57]. The learner and decision-maker is called the agent. The entity it interacts with, comprising everything outside the agent, is called the environment. The agent and environment interact continually: the agent selects actions, and the environment responds to those actions and presents new situations to the agent. The environment also gives rise to rewards, special numerical values that the agent tries to maximize over time. Figure 4.1 illustrates the agent-environment interaction in reinforcement learning.

Specifically, the agent and environment interact at each of a sequence of discrete time steps $t = 0, 1, 2, 3, \ldots$. At each time step $t$, the agent receives some representation of the environment's state, $s_t \in \mathcal{S}$, where $\mathcal{S}$ is the set of possible states, and on that basis selects an action $a_t \in \mathcal{A}(s_t) \subseteq \mathcal{A}$, where $\mathcal{A}(s_t)$ is the set of actions available in state $s_t$ and $\mathcal{A}$ is the set of all possible actions. One time step later, in part as a consequence of its action, the agent receives a numerical reward $r_{t+1} \in \mathbb{R}$ and finds itself in a new state $s_{t+1}$.

A policy of the agent, denoted by $\pi$, is a mapping from each state $s \in \mathcal{S}$ to an action $a \in \mathcal{A}$ that specifies the action $a = \pi(s)$ the agent will choose when the environment is in state $s$. The ultimate goal of an agent is to find the optimal policy such that
$$V^{\pi}(s) = E_{\pi}\Big\{ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\Big|\, s_t = s \Big\} \quad (4.1)$$
is maximized for each state $s \in \mathcal{S}$. The value function $V^{\pi}(s)$ is the expected return when the environment starts in state $s$ at time step $t$ and follows policy $\pi$ thereafter. $\gamma$ is a parameter, $0 < \gamma < 1$, called the discount rate, which ensures that the infinite sum $\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}$ converges to a finite value; more importantly, $\gamma$ reflects the uncertainty in the future. $r_{t+k+1}$ is the reward received at time step $t+k+1$.

4.2 Backward-Looking Approach
We still use the backward-looking optimization approach discussed in Section 2.6, in which the HEV controller determines the operation of the ICE and the EM so that the vehicle meets the target performance (speed $v$ and acceleration $a$) specified in benchmark driving cycles. However, the HEV controller now chooses the battery output power $P_{batt}$ (or, equivalently, the battery charging/discharging current) and the gear ratio $R(k)$ as the control variables. The remaining variables, i.e., $\omega_{ICE}$, $T_{ICE}$, $\omega_{EM}$, and $T_{EM}$, become dependent (associated) variables whose values are determined by $P_{batt}$ and $R(k)$. The result of the HEV power management policy is the fuel consumption rate of the ICE.

4.3 Motivations
Reinforcement learning provides a powerful solution to problems in which
1. different actions should be taken according to the change of system states, and the future state depends on both the current state and the selected action;
2. an expected cumulative return instead of an immediate reward is to be optimized;
3. the agent only needs knowledge of the current state and the reward it receives, and requires neither prior knowledge of the system input nor detailed system modeling;
4. the system might be non-stationary to some extent.
The second, third, and fourth properties differentiate reinforcement learning from other machine learning techniques, from model-based optimal control and dynamic programming, and from the Markov decision process-based approach.
The HEV power management problem, on the other hand, possesses all of the four above-mentioned properties. 1. During a driving cycle, the change of vehicle speed, power demand, and battery charge level necessitates different operation modes and actions as discussed in Section 2.1, and also the future battery charge level depends on the battery charg- ing/discharging current. 2. The HEV power management aims at minimizing the total fuel consumption dur- ing a whole driving cycle rather than the fuel consumption rate at a certain time step. 3. The HEV power management agent does not have a priori knowledge of a whole driving cycle, while it has only the knowledge of the current vehicle speed and 35 power demand values and the current fuel consumption rate as a result of the action taken. 4. The actual driving cycles are non-stationary [55]. Therefore, the reinforcement learning technique better suits the HEV power manage- ment problem than other optimization methods. 4.4 State, Action and Reward of HEV Energy Manage- ment 4.4.1 State Space We define the state space of the HEV energy management problem as a finite number of states, each represented by the power demand, vehicle speed, and battery pack stored charge levels: S = s = [p dem ;v;q] T jp dem 2P dem ;v2V;q2Q ; (4.2) where p dem is the power demand for propelling the HEV 1 , which can be interpreted from the positions of the acceleration pedal and the brake pedal; q is the battery pack stored charge;P dem ,V, andQ are respectively the finite sets of power demand levels, vehicle speed levels, and battery pack stored charge levels. Discretization is required 1 The power demandp dem instead of vehicle acceleration is selected as a state variable because (i) the power demand can be interpreted from positions of accelecration and brake pedals, and (ii) experiments show that the power demand has higher correlation with actions in the system. 36 when defining these finite sets. In particular,Q is defined by discretizing the range of the battery pack stored charge i.e., [q min ;q max ] into a finite number of charge levels: Q =fq 1 ;q 2 ;:::;q N g; (4.3) where q min q 1 < q 2 < ::: < q N q max . q min and q max are 40% and 80% of the battery pack capacity, respectively, in the SOC-sustaining energy management for ordinary HEVs [24, 25, 26]. On the other hand,q min andq max are 0% and 80% of the battery pack capacity, respectively, in the SOC-depletion energy management for plug- in HEVs [29], in which the battery pack can be recharged from the power grid during parking time. 4.4.2 Action Space We define the action space of the HEV power management problem as a finite number of actions, each represented by the discharging current of the battery pack and gear ratio values: A = a = [i;R(k)] T ji2I;R(k)2R ; (4.4) where an actiona = [i;R(k)] T taken by the agent is to discharge the battery pack with a current value ofi and choose thek-th gear ratio 2 . The setI contains within it a finite number of current values in the range of [I max ;I max ]. Please note thati > 0 denotes discharging the battery pack;i< 0 denotes charging the battery pack; andi = 0 denotes idle. The setR contains the allowable gear ratio values, which depend on the drivetrain design. Usually, there are four or five gear ratio values in total [23, 30]. 2 According to the discussions in Section 4.2, the selected action will be sufficient to determine the values of all dependent variables in HEV control. 
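In code, the discretized state and action sets of (4.2)-(4.4) might look as follows; all grid sizes, limits, and gear-ratio values here are illustrative assumptions rather than the dissertation's settings.

import numpy as np

# Illustrative discretization of the state set (4.2)-(4.3) and action set (4.4).
P_DEM_LEVELS = np.linspace(-30e3, 50e3, 17)   # power demand levels (W)
V_LEVELS     = np.linspace(0.0, 35.0, 15)     # vehicle speed levels (m/s)
Q_LEVELS     = np.linspace(0.40, 0.80, 11)    # stored-charge levels, SoC-sustaining
                                              # (use 0.00-0.80 for a plug-in HEV)
I_LEVELS     = np.linspace(-60.0, 60.0, 13)   # discharging current levels (A)
GEAR_RATIOS  = [3.4, 2.0, 1.4, 1.0, 0.8]      # allowable R(k) values (illustrative)

def nearest(value, levels):
    """Snap a continuous measurement onto the nearest element of a finite set."""
    return int(np.argmin(np.abs(np.asarray(levels) - value)))

def observe_state(p_dem, v, soc):
    """Discrete state s = [p_dem, v, q]^T of Eq. (4.2)."""
    return (nearest(p_dem, P_DEM_LEVELS),
            nearest(v, V_LEVELS),
            nearest(soc, Q_LEVELS))

# Original action space (4.4): (discharging current, gear ratio) pairs.
ACTIONS = [(i, r) for i in I_LEVELS for r in GEAR_RATIOS]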
37 The above definition of the action space enables that the reinforcement learning agent does not require detailed HEV modeling (we will elaborate this Section 4.6). The complexity and convergence speed of reinforcement learning algorithms are propor- tional to the number of state-action pairs [57]. In order to reduce computation com- plexity and accelerate convergence, we modify the action space to reduce the number of actions based on the HEV modeling. The reduced action space only contains charg- ing/discharging current values of the battery pack: A =fa = [i]ji2Ig: (4.5) The inherent principle of reducing the action space is: with the selected actiona = [i], we can derive the best-suited gear ratio analytically when we have the knowledge of the HEV modeling. More precisely, we derive the best-suited gear ratio by solving the following fuel optimization (FO) problem: Given the values of the current state s = [p dem ;v;q] T and the current action a = [i], find the gear ratio R(k) to minimize the fuel consumption rate _ m f subject to (2.2)(2.6). Based on the current states = [p dem ;v;q] T and the current actiona = [i],! wh ,T wh , and the battery output powerP batt are calculated according to ! wh = v=r wh ; (4.6) T wh = p dem r wh =v; (4.7) P batt = V OC iR batt i 2 ; (4.8) whereV OC is the open-circuit voltage of the battery pack andR batt is the internal resis- tance of the battery pack. 38 To solve the FO problem, for each of the possible R(k) values, we first calculate ! ICE and! EM using (2.5), next calculateT EM using (2.3) while satisfying (2.4), and then calculate T ICE using (2.6) while satisfying (2.2). With ! ICE and T ICE , the fuel consumption rate _ m f is obtained based on the ICE model. We choose theR(k) value that results in the minimum _ m f i.e., _ m opt f . We will refer to the action space shown in (4.4) and (4.5) as the original action space and the reduced action space, respectively, in the following discussions. 4.4.3 Reward We define the rewardr that the agent receives after taking actiona while in states as the negative of the fuel consumption in that time step i.e., _ m opt f T , where T is the length of a time step. Remember from Section 4.1 that the agent in reinforcement learning aims at maximizing the expected return i.e., the discounted sum of rewards. Therefore, by using the negative of the fuel consumption in a time step as the reward, the total fuel consumption will be minimized while maximizing the expected return. 4.5 TD()-Learning Algorithm for HEV Energy Man- agement To derive the optimal HEV energy management policy, we employ a specific type of reinforcement learning algorithm, namely the TD()-learning algorithm [64], due to its relatively higher convergence rate and higher performance in non-Markovian environ- ment. In the TD()-learning algorithm, a Q value, denoted by Q(s;a), is associated with each state-action pair (s;a), which approximates the expected discounted cumula- tive reward of taking actiona at states. There are two basic steps in the TD()-learning algorithm: action selection and Q-value update. 39 4.5.1 Action Selection A straightforward approach for action selection is to always choose the action with the highest Q value. If we do so, however, we are at the risk of getting stuck in a sub- optimal solution. A judicious reinforcement learning agent should exploit the best action known so far to gain rewards while in the meantime explore all possible actions to find a potentially better choice. 
We address this exploration versus exploitation issue by breaking the learning procedure into two phases: In the exploration phase, -greedy- policy is adopted, i.e., the current best action is chosen only with probability of 1. In the exploitation phase, the action with the highestQ value is always chosen. 4.5.2 Q-Value Update Suppose that actiona t is taken in states t at time stept, and rewardr t+1 and new state s t+1 are observed at time stept+1. Then at time stept+1, the TD()-learning algorithm updates theQ value for each state-action pair (s;a) as: Q(s;a) Q(s;a) +e(s;a); (4.9) where is a coefficient controlling the learning rate, e(s;a) is the eligibility of the state-action pair (s;a), and is calculated as r t+1 + max a 0 Q(s t+1 ;a 0 )Q(s t ;a t ): (4.10) In (4.10), is the discount rate. 40 At time stept + 1, the eligibilitye(s;a) of each state-action pair is updated by e(s;a) 8 < : e(s;a) + 1; s =s t \a =a t ; e(s;a); otherwise, (4.11) to reflect the degree to which the particular state-action pair has been chosen in the recent past, where is a constant between 0 and 1. In practice, we do not have to update Q values and eligibilitye of all state-action pairs. We only keep a list ofM most recent state-action pairs since the eligibility of all other state-action pairs is at most M , which is negligible whenM is large enough. 4.5.3 Algorithm Description The pseudo code of the TD()-learning algorithm for HEV power management is sum- marized as follows. Algorithm 2 TD()-Learning Algorithm for HEV Energy Management 1: InitializeQ(s;a) arbitrarily for all the state-action pairs. 2: for each time stept do 3: Choose actiona t for states t using the exploration-exploitation policy discussed in Section 5.2.3. 4: Take actiona t , observe rewardr t+1 and the next states t+1 . 5: r t+1 + max a 0Q(s t+1 ;a 0 )Q(s t ;a t ). 6: e(s t ;a t ) e(s t ;a t ) + 1. 7: for all state-action pair (s;a) do 8: Q(s;a) Q(s;a) +e(s;a). 9: e(s;a) e(s;a). 10: end for 11: end for 41 4.6 Model-Free Property Analysis Theoretically, the reinforcement learning technique could be model-free, i.e., the agent does not require detailed system model to choose actions as long as it can observe the current state and reward as a result of an action previously taken by it. For the HEV energy management problem, model-free reinforcement learning means that the con- troller (agent) should be able to observe the current state (i.e., power demand, vehicle speed, and battery pack charge levels) and the reward (i.e., the negative of fuel consump- tion in a time step) as a result of an action (i.e., battery pack discharging current and gear ratio selection), while the detailed HEV models are not needed by the controller. Now let us carefully examine whether the proposed reinforcement learning technique could be exactly model-free (or to which extent it could be model-free) in practical implemen- tations. For the reinforcement learning technique using the original action space: To observe the current state, the agent can use sensors to measure power demand level and the vehi- cle speed. And also, the reward can be obtained by measuring the fuel consumption. However, the battery pack charge level cannot be obtained directly from online mea- surement during HEV driving, since the battery pack terminal voltage changes with the charging/discharging current and therefore it could not be an accurate indicator of the battery pack stored charge level [61]. 
To address this problem, a battery pack model together with the Coulomb counting method [65] is needed by the agent. In summary, the reinforcement learning technique with the original action space is mostly model-free, i.e., only the battery pack model is needed. For the reinforcement learning technique using the reduced action space: Given the current state and the action (charging/discharing current) taken, the agent should decide the gear ratio by solving the FO problem, where the ICE, the EM, the drivetrain mechanics and the battery pack models are needed. On the other hand, the vehicle 42 Original action space Reduced action space ICE model no needed EM model no needed Drivetrain mechanics model no needed Vehicle dynamics model no no Battery pack model needed needed Future driving cycle profile no no Table 4.1: Models needed for the original and reduced action spaces. dynamics model (discussed in Section 2.5) is not needed by the agent. In summary, the reinforcement learning technique with the reduced action space is partially model-free. Table 4.1 summarizes the models needed by the agent for reinforcement learning technique with the original and reduced action spaces. 4.7 Complexity and Convergence Analysis The time complexity of the TD()-learning algorithm in a time step is O(jAj +M), wherejAj is the number of actions andM is the number of the most recent state-action pairs kept in memory. Generally,jAj andM are set to be less than 100. Therefore, the algorithm has negligible computation overhead when implementing in the state-of-the- art micro-controllers/processors. As for the convergence speed, normally, the TD()-learning algorithm can converge withinL time steps, whereL is approximately three to five times of the number of state- action pairs. The total number of states could be as large asjP dem jjVjjQj. However, some of the states do not have any physical meanings and will never be encountered by the system. And only 10% of the states are valid in the simulation. In summary, the TD()-learning algorithm can converge after two or three-hour driving, which is much shorter than the total lifespan of an HEV . To further speed up the convergence, the Q values can also be initialized by the manufacturers with optimized values. 43 4.8 Application-Specific Implementations The actual implementation of the TD()-learning algorithm for HEV energy manage- ment can be application-specific. For example, the range of the battery pack stored charge level in the state space for PHEVs (SoC-depletion mode) is different from that for ordinary HEVs (SoC-sustaining mode). In the former case, it is more desirable to use up the energy stored in the battery pack by the end of a trip since the battery can be recharged from the power grid. Also, the parameters (e.g.,, , and) used in the TD()-learning algorithm can be modified for different types of trips. For instances, the HEV controller can use different sets of parameters for urban trips from those for highway trips. Of course, the controller does not need the knowledge of detailed driving cycle profiles in prior. 4.9 Experimental Results We simulate the operation of an HEV based on Honda Insight Hybrid, the model of which is developed in ADVISOR [52]. Key parameters are summarized in Table 3.1. We compare our proposed optimal energy management policy derived by reinforcement learning with the rule-based energy management policy described in [23] using both real-world and testing driving cycles. 
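Putting the pieces of Sections 4.5.1-4.5.2 together, the per-step TD(λ) update of Algorithm 2 can be sketched as follows; the hypothetical env usage, the hyper-parameter values, and the trace-pruning threshold are assumptions, not the dissertation's implementation.

import random
from collections import defaultdict

def td_lambda_step(Q, e, s, a, r, s_next, actions,
                   alpha=0.1, gamma=0.95, lam=0.7):
    """One Q-value update of Algorithm 2 (Eqs. (4.9)-(4.11))."""
    delta = r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)]
    e[(s, a)] += 1.0                      # eligibility of the visited pair, Eq. (4.11)
    for sa in list(e):
        Q[sa] += alpha * delta * e[sa]    # Eq. (4.9)
        e[sa] *= gamma * lam
        if e[sa] < 1e-6:                  # drop negligible traces, mimicking the
            del e[sa]                     # list of the M most recent pairs
    return Q, e

def choose_action(Q, s, actions, epsilon=0.1, explore=True):
    """Epsilon-greedy selection in the exploration phase, greedy afterwards."""
    if explore and random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda b: Q[(s, b)])

# Usage with a hypothetical environment:
#   Q, e = defaultdict(float), defaultdict(float)
#   a = choose_action(Q, s, actions); r, s_next = env.step(a)
#   Q, e = td_lambda_step(Q, e, s, a, r, s_next, actions); s = s_next

The stored-charge component of the state is maintained by simple Coulomb-counting bookkeeping, since the terminal voltage is not a reliable SoC indicator; a minimal sketch:

class CoulombCounter:
    """Track battery stored charge by integrating current (Coulomb counting)."""
    def __init__(self, capacity_ah, q0_ah):
        self.capacity = capacity_ah
        self.q = q0_ah                       # charge currently stored (Ah)

    def update(self, i_batt, dt):
        """i_batt in amperes (positive = discharging), dt in seconds."""
        self.q -= i_batt * dt / 3600.0
        self.q = min(max(self.q, 0.0), self.capacity)
        return self.q

    @property
    def soc(self):
        return self.q / self.capacity        # stored charge as a fraction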
A driving cycle is given as a vehicle speed versus time profile for a specific trip. The driving cycles may come from real measurements or from specialized generation for testing purposes. In this work, we use the real-world and testing driving cycles provided by different organizations and projects such as U.S. EPA (Environmental Protection Agency), E.U. MODEM (Modeling of Emissions and Fuel Consumption in Urban Areas project) and E.U. ARTEMIS (Assessment and Reliability of Transport Emission Models and Inventory Systems project). 44 We improve the battery pack model used in ADVISOR to take into account the rate capacity effect and the recovery effect. Specifically, the majority of literature on HEV power management adopts a simple battery pack model as follows [60]: q t =q ini t X k=0 I k T; (4.12) where q t is the amount of charge stored in the battery pack at the end of time step t, q ini is the amount of charge stored in the battery pack at the beginning of time step 0, I t is the discharging current of the battery pack at time step t (I t < 0 means bat- tery charging), and T is the length of a time step. However, this model ignores the rate capacity effect, which causes the most significant power loss when the battery pack charging/discharging current is high [61]. We know that the battery pack charg- ing/discharging current is high during deceleration and acceleration, and therefore the rate capacity effect should be considered carefully. The rate capacity effect specifies that if the battery pack is discharging (I > 0), the actual charge decreasing rate inside the battery pack is higher thanI; and if the battery pack is charging (I < 0), the actual charge increasing rate inside the battery pack is lower thanjIj. In addition, the battery mode (4.12) also ignores the recovery effect, which specifies that the battery pack can partially recover the charge loss in previous discharges if relaxation time is allowed in between discharges [61]. First, we test the fuel consumption of an ordinary HEV in the battery SoC-sustaining mode using the proposed energy management policy and the rule-based policy. The fuel consumption over some driving cycles is summarized in Table 4.2. We can observe that the proposed policy always results in lower fuel consumption and the maximum reduction in fuel consumption is as high as 54.9%. The last row in Table 4.2 shows that the proposed policy can reduce the fuel consumption by 28.8% on average. We also 45 Driving Cycle Proposed Policy Rule-based policy Reduction IM240 68.5 g 92.2 g 25.7% LA92 426.6 g 585.3 g 27.1% NEDC 229.4 g 319.8 g 28.3% NYCC 38.8 g 86.1 g 54.9% HWFET 223.7 g 364.0 g 38.5% MODEM 1 151.7 g 228.6 g 33.6% MODEM 2 246.5 g 344.9 g 38.5% MODEM 3 75.8 g 137.1 g 44.7% Artemis urban 128.9 g 220.5 g 41.5% Artemis rural 460.3 g 499.7 g 7.9% total 2050.2 g 2878.2 g 28.8% Table 4.2: Fuel consumption of an ordinary HEV using proposed and rule-based poli- cies. compare the overall fuel economy of the proposed policy and the rule-based policy over the 10 real-world and testing driving cycles in Table 4.2. The rule-based policy achieves an MPG value of 48 and the proposed policy achieves an MPG value of 67. Therefore, the proposed policy improves the fuel economy by 39% compared to the rule-based policy in the ordinary HEV . We plot the ICE operation points over a driving cycle on the ICE fuel efficiency map in Figure 4.2. The “x” points are from rule-based policy and the “o” points are from our proposed policy. 
We can observe that the operation points from the proposed policy are more concentrated on the high efficiency region of the ICE, validating the effectiveness of the proposed policy. We also test the fuel consumption of a PHEV in the battery SoC-depletion mode using the proposed energy management policy and the rule-based policy in Table 4.3. Again, the proposed policy always results in lower fuel consumption. The proposed policy can reduce the fuel consumption by 60.8% in maximum and 30.4% on average. The MPG value of the rule-based policy over the 10 driving cycles is 55 and the MPG value of the proposed policy over the 10 driving cycles is 78. Therefore, the proposed 46 Figure 4.2: The ICE operation points of an ordinary HEV from the proposed and rule- based policies. policy improves the fuel economy by 42% compared to the rule-based policy in the PHEV . In addition, comparing Table 4.2 with Table 4.3, we can observe that the PHEV usually has higher fuel economy than the ordinary HEV . We also plot the ICE operation points over a driving cycle on the ICE fuel efficiency map in Figure 4.3. We can observe that the operation points from the proposed policy are more concentrated on the high efficiency region of the ICE, again validating the effectiveness of the proposed policy. 47 Driving Cycle Proposed Policy Rule-based policy Reduction IM240 42.4 g 52.8 g 19.7% LA92 408.1 g 544.6 g 25.1% NEDC 214.1 g 270.2 g 20.8% NYCC 24.4 g 62.3 g 60.8% HWFET 193.1 g 323.0 g 40.2% MODEM 1 110.6 g 192.7 g 42.6% MODEM 2 191.0 g 318.0 g 39.9% MODEM 3 50.0 g 108.0 g 53.7% Artemis urban 100.3 g 200.8 g 50.0% Artemis rural 422.1 g 451.8 g 6.6% total 1756.1 g 2524.2 g 30.4% Table 4.3: Fuel consumption of a PHEV using proposed and rule-based policies. Figure 4.3: The ICE operation points of a PHEV from the proposed and rule-based policies. 48 Chapter 5 Battery SoH-Aware HEV Energy Management Batteries are the most widely adopted electrical energy storage devices due to their good reliability, high energy density, low self-discharge rate, etc [61]. Even though some work proposed to incorporate supercapacitors into the electrical energy storage systems in EVs/HEVs [14, 15, 16, 17], batteries are still the main electrical energy stor- age devices. In EVs/HEVs, batteries can provide energy to power EMs and also store electrical energy generated by EMs. The cycle life of batteries, especially for batteries in EVs/HEVs, is one of the most important performance metrics that should be consid- ered carefully in the application of batteries, due to the frequent charging/discharging and the substantial cost of batteries in EVs/HEVs. The state-of-health (SoH) of the battery pack is degrading with the operation of an HEV . The battery pack will reach its end-of-time when it loses 20% or 30% of its nominal capacity and the battery pack replacement brings about additional operational cost of an HEV . Therefore, when improving the fuel economy of an HEV , the HEV energy management strategy should reduce the battery SoH degradation rate. This work [66] investigates the energy management problem in hybrid electric vehi- cles (HEVs) focusing on the minimization of the operating cost of an HEV , including both fuel and battery replacement cost. 
More precisely, the work presents a nested learning framework in which both the optimal actions (which include the gear ratio selection and the use of internal combustion engine versus the electric motor to drive 49 the vehicle) and limits on the range of the state-of-charge of the battery are learned on the fly. The inner-loop learning process is the key to minimization of the fuel usage whereas the outer-loop learning process is critical to minimization of the amortized bat- tery replacement cost. Experimental results demonstrate a maximum of 48% operating cost reduction by the proposed HEV energy management policy. 5.1 Battery SoH Degradation Model The battery cycle life is directly related to the state-of-health (SoH), which is defined as the ratio of full charge capacity of an aged battery to its designed (nominal) capac- ity. The metric captures the general condition of batteries and their ability to store and deliver energy compared to their initial state (i.e., compared to a brand new battery). The battery SoH degradation strongly depends on the charging/discharging operation condi- tions of the battery. A battery SoH degradation model was proposed in [67], which relates the battery SoH degradation with the average SoC 1 and the SoC swing 2 during a battery charging/discharging cycle as shall be discussed in the following. First, we formally define the SoC and capacity fading of a battery. The SoC of a battery is defined as SoC = C batt C full 100%; (5.1) whereC batt is the amount of battery stored charge, andC full is the amount of battery stored charge when it is fully charged. SoC can be interpreted as the state of a battery. 1 Battery SoC (state-of-charge) is defined as the available charge stored in the battery, given by a percentage of the full charge capacity of the battery. 2 SoC swing is defined as the SoC change during a charging/discharging cycle. 50 And the value of C full gradually decreases as the battery ages (i.e., capacity fading.) The battery capacity fading, denoted byC fade , is defined as C fade = C nom full C full C nom full 100%; (5.2) whereC nom full is the nominal value ofC full for a brand new battery. The battery capacity fading results from long-term electrochemical reaction, which involves the carrier concentration loss and internal impedance growth. These effects strongly relate to the operating condition of the battery such as charging/discharging current, number of cycles, SoC swing, average SoC, and operation temperature [68]. Accurate electrochemistry-based models [69] have been developed for characterizing battery capacity fading. However, they are difficult for use in practice due to the com- plexity. On the other hand, mathematical models provide us an effective and efficient way to estimating the capacity fading (or SoH degradation in general.) Hence, we dis- cuss in the following the SoH degradation model of Li-ion batteries proposed in [67], which shows a good match with real data but can only be applied to charging and dis- charging cycles with the same SoC swing and the same average SoC. 5.1.1 SoH Degradation in Cycled Charging/Discharging Pattern The SoH degradation model in [67] estimates the capacity fading of a Li-ion battery for cycled charging/discharging based on real battery measurements, where a (charg- ing/discharging) cycle is defined as a charging process of the battery cell fromSoC low toSoC high and a subsequent discharging process fromSoC high toSoC low . 
The battery capacity fading during one cycle depends on the average SoC level $SoC_{avg}$ and the SoC swing $SoC_{swing}$. $SoC_{avg}$ and $SoC_{swing}$ in one cycle are calculated as
$$SoC_{avg} = (SoC_{low} + SoC_{high})/2, \qquad SoC_{swing} = SoC_{high} - SoC_{low}. \quad (5.3)$$
The capacity fading $C_{fade,cycle}$ during one charging/discharging cycle is then given by
$$D_1 = K_{CO}\,\exp\!\left[(SoC_{swing} - 1)\,\frac{T_{ref}}{K_{ex}\,T_B}\right] + 0.2\,\frac{\tau}{\tau_{life}}, \quad (5.4)$$
$$D_2 = D_1\,\exp\!\left[4 K_{SoC}\,(SoC_{avg} - 0.5)\right](1 - C_{fade}),$$
$$C_{fade,cycle} = D_2\,\exp\!\left[K_T\,(T_B - T_{ref})\,\frac{T_{ref}}{T_B}\right],$$
where $K_{CO}$, $K_{ex}$, $K_{SoC}$, and $K_T$ are battery-specific parameters; $T_B$ and $T_{ref}$ are the battery temperature and the reference battery temperature, respectively; $\tau$ is the duration of this charging/discharging cycle; and $\tau_{life}$ is the calendar life of the battery. Please note that $C_{fade,cycle}$ is a function of $SoC_{avg}$ and $SoC_{swing}$. The total capacity fading after $M$ charging/discharging cycles is calculated by
$$C_{fade} = \sum_{m=1}^{M} C_{fade,cycle}(m), \quad (5.5)$$
where $C_{fade,cycle}(m)$ denotes the battery capacity fading in the $m$-th cycle. From (5.5), the capacity fading $C_{fade}$ increases over the battery lifetime from 0 (brand new) to 100% (no capacity left). Generally, $C_{fade} = 20\%$ or $C_{fade} = 30\%$, which is equivalent to 80% or 70% remaining capacity, respectively, is used to indicate the end of life of the battery.

Figure 5.1: Illustration of the cycle-decoupling method.

5.1.2 Cycle-Decoupling Method
In reality, a battery may not follow a cycled charging/discharging pattern, and therefore the SoH degradation model proposed in [67] has quite limited applicability in battery cycle life evaluation. To address this shortcoming, we proposed a cycle-decoupling method [70] to estimate capacity fading under arbitrary battery charging/discharging patterns (i.e., battery SoC profiles). The proposed cycle-decoupling method in [70] is based on the following observation:

Observation I: Consider the SoC profile of a battery cell in Fig. 5.1 (a). We can perceive it as a combination of two charging/discharging cycles as shown in Fig. 5.1 (b).

Based on the above observation, the cycle-decoupling method proceeds as follows. Suppose we are going to estimate the battery capacity fading over a given charging/discharging profile (SoC profile). First, a set of turning points $t_1, t_2, \ldots, t_n$ is identified from the battery SoC profile. At a turning point, the battery switches from charging to discharging or from discharging to charging. Fig. 5.2 shows an example of a battery SoC profile versus time with the set of turning points.

Figure 5.2: Battery SoC profile versus time with the set of turning points.

Second, any four consecutive turning points $(t_i, t_{i+1}, t_{i+2}, t_{i+3})$ can be classified into one of the following six cases:
(a) $SoC(t_i) \le SoC(t_{i+2}) < SoC(t_{i+1}) \le SoC(t_{i+3})$,
(b) $SoC(t_{i+2}) < SoC(t_i) < SoC(t_{i+1}) \le SoC(t_{i+3})$,
(c) $SoC(t_i) \le SoC(t_{i+2}) < SoC(t_{i+3}) < SoC(t_{i+1})$,
(d) $SoC(t_{i+3}) \le SoC(t_{i+1}) < SoC(t_{i+2}) \le SoC(t_i)$,
(e) $SoC(t_{i+1}) < SoC(t_{i+3}) < SoC(t_{i+2}) \le SoC(t_i)$,
(f) $SoC(t_{i+3}) \le SoC(t_{i+1}) < SoC(t_i) < SoC(t_{i+2})$.
The six cases are shown in Fig. 5.3 (a)-(f). Third, a complete charging/discharging cycle, shown by the shadowed area in Fig. 5.3 (a)-(f), can be identified. Take case (a) as an example: the average SoC and SoC swing of the identified charging/discharging cycle are given by
$$SoC_{avg} = \frac{SoC(t_{i+1}) + SoC(t_{i+2})}{2}, \qquad SoC_{swing} = SoC(t_{i+1}) - SoC(t_{i+2}). \quad (5.6)$$
The capacity fading in this identified cycle, $C_{fade,cycle}(SoC_{avg}, SoC_{swing})$, can be estimated based on the method in Section 5.1.1.
The identified cycle should be deleted 54 Figure 5.3: Six basic cases for turning point classification and cycle identification. from the SoC profile after calculatingC fade;cycle and the total capacity fading is updated by C fade C fade +C fade;cycle : (5.7) In this way, charging/discharging cycles can be identified and C fade (the cumulative capacity fading also accounting for charging/discharging processes before this specific profile) is updated in chronological sequence based on the battery SoC profile. Fig. 5.4 provides an example of cycle identification. We have proved that charging/discharging cycles can be decoupled from arbitrary battery SoC profile using this procedure. 55 Figure 5.4: An example of cycle identification. 5.2 SoH-Aware Reinforcement Learning for HEV Energy Management The state-of-health (SoH) of the battery pack is degrading with the operation of an HEV . Reducing the SoH degradation rate of the battery pack is beneficial for an HEV , since it can prolong the lifetime of the battery pack and therefore decrease the operational cost of an HEV . In this section, we propose a reinforcement learning model for the HEV 56 energy management taking into account the degradation of battery SoH during the HEV operation. We formulate a nested reinforcement learning model, where the inner loop mainly determines the operation modes of the HEV power components (i.e., the ICE, the EM, and the battery pack) taking into account both fuel economy and battery SoH degradation, and the outer loop aims at modulating the battery SoH degradation from a global view. 5.2.1 Motivation The fuel cost is one apparent component of the HEV operational cost, and therefore a lot of previous work on HEV energy management are focusing on improving the fuel economy. On the other hand, the battery pack has limited cycle life and the battery pack will reach the end-of-life when it loses 20% or 30% of its nominal capacity [67]. It means that the battery pack needs to be replaced, which brings about additional opera- tional cost. In the literature work, when improving the fuel economy, an ideal battery model is adopted, which does not take into account the battery SoH degradation. This may result in shortened battery lifetime. For example, some rule-based control strategies may tend to use up the available battery energy to improve fuel economy in some very short trips, which may harm battery SoH more seriously. However, it is not necessary to use up the available battery energy in short trips, since the battery can obtain energy dur- ing regenerative braking. Simulation results demonstrate that the battery state-of-charge (SoC) swing of 5% is enough to improve fuel economy in short urban trips consider- ing the regenerative braking. Therefore, we propose a battery SoH-aware HEV energy management strategy using the reinforcement learning technique to modulate the SoH degradation while optimizing the fuel economy. 57 In the following, we will first discuss the inner-loop reinforcement learning model and then discuss how to integrate the outer-loop learning model with the inner loop to modulate the battery SoH degradation from a global point of view. 5.2.2 Inner-Loop Reinforcement Learning Model Based on the reinforcement learning background in Section 4.1, we need define the triplets i.e., state space, action space, and reward in order to build a reinforcement learn- ing model. 
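Before detailing the inner loop, the per-cycle fading model (5.3)-(5.4) and the cycle-decoupling procedure just described can be sketched in code. The K_* constants below are placeholders for the battery-specific parameters of [67], and the residual half-cycle handling is a simplification of the full method in [70].

import math

def fade_per_cycle(soc_low, soc_high, tau_s, c_fade,
                   T_B=298.15, T_ref=298.15,
                   K_co=3.7e-5, K_ex=1.0, K_soc=0.9, K_T=0.07,
                   tau_life_s=10 * 365 * 24 * 3600):
    """Capacity fading in one charging/discharging cycle, Eq. (5.4).

    soc_low/soc_high define the cycle, tau_s is its duration (s), and c_fade
    is the cumulative fading so far; the constants are placeholders only.
    """
    soc_avg = 0.5 * (soc_low + soc_high)              # Eq. (5.3)
    soc_swing = soc_high - soc_low
    d1 = K_co * math.exp((soc_swing - 1.0) * T_ref / (K_ex * T_B)) \
         + 0.2 * tau_s / tau_life_s
    d2 = d1 * math.exp(4.0 * K_soc * (soc_avg - 0.5)) * (1.0 - c_fade)
    return d2 * math.exp(K_T * (T_B - T_ref) * T_ref / T_B)

def turning_points(soc):
    """Indices where the SoC profile switches between charging and discharging."""
    pts = [0]
    for k in range(1, len(soc) - 1):
        if (soc[k] - soc[k - 1]) * (soc[k + 1] - soc[k]) < 0:
            pts.append(k)
    pts.append(len(soc) - 1)
    return pts

def decouple_cycles(soc):
    """Extract (SoC_avg, SoC_swing) pairs from an arbitrary SoC profile.

    A rainflow-style simplification of the cycle-decoupling method of [70]:
    repeatedly identify the inner cycle spanned by four consecutive turning
    points, record it per Eq. (5.6), and remove it from the profile.
    """
    vals = [soc[k] for k in turning_points(soc)]
    cycles = []
    i = 0
    while i + 3 < len(vals):
        a, b, c, d = vals[i:i + 4]
        if abs(c - b) <= abs(b - a) and abs(c - b) <= abs(d - c):
            cycles.append(((b + c) / 2.0, abs(c - b)))   # inner cycle b -> c
            del vals[i + 1:i + 3]                        # delete the identified cycle
            i = max(i - 1, 0)
        else:
            i += 1
    # Whatever remains is counted leg by leg (a sketch-level simplification).
    for j in range(len(vals) - 1):
        cycles.append(((vals[j] + vals[j + 1]) / 2.0, abs(vals[j + 1] - vals[j])))
    return cycles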
In this section, we formulate the reinforcement learning model of the inner loop, which determines the operation modes of the HEV power components taking into account both fuel economy and battery SoH degradation. State Space of the Inner Loop We define the state space of the inner loop as a finite number of states, each represented by the power demand, vehicle speed, and battery pack stored charge levels: S = s = [p dem ;v;q] T jp dem 2P dem ;v2V;q2Q ; (5.8) wherep dem is the power demand for propelling the HEV ,v is the vehicle speed, andq is the battery pack stored charge. In the real implementation of the reinforcement learn- ing in the HEV energy management, the reinforcement learning agent can observe the current state by using sensors to measure the power demand level and the vehicle speed. However, the battery pack charge level cannot be observed from online measurement during HEV driving, since the battery pack terminal voltage changes with the charg- ing/discharging current and therefore it could not be an accurate indicator of the battery pack stored charge level [61]. To address this problem, a battery pack model together with the Coulomb counting method [65] is needed by the agent. 58 P dem ,V, andQ in (5.8) are respectively the finite sets of power demand levels, vehicle speed levels, and battery pack stored charge levels. Discretization is required when defining these finite sets. In particular,Q is defined by discretizing the range of the battery pack stored charge i.e., [q min ;q max ] into a finite number of charge levels: Q =fq 1 ;q 2 ;:::;q N g; (5.9) where q min q 1 < q 2 < ::: < q N q max . Generally, q min and q max are 40% and 80% of the battery pack nominal capacity, respectively, in the SOC-sustaining energy management for ordinary HEVs [24, 25, 26]. In our outer-loop reinforcement learning, we will optimizeq max value to modulate the battery SoH degradation. q min will not be changed to a lower value in the outer-loop to deal with emergent events during HEV operation. Action Space of the Inner Loop We define the action space of the inner loop as a finite number of actions, each repre- sented by the discharging current of the battery pack: A =fa = [i]ji2Ig: (5.10) where an actiona = [i] taken by the agent is to discharge the battery pack with a current value of i. The set I contains within it a finite number of current values in the range of [I max ;I max ]. Please note that i > 0 denotes discharging the battery pack; i < 0 denotes charging the battery pack; andi = 0 denotes idle. The inherent principle of using the battery discharge current as the action is that: with the selected action a = [i], we can derive the best-suited gear ratio analytically 59 when we have the knowledge of the HEV modeling. More precisely, we derive the best-suited gear ratio by solving the following fuel optimization (FO) problem: Given the values of the current state s = [p dem ;v;q] T and the current action a = [i], find the gear ratio R(k) to minimize the fuel consumption rate _ m f subject to (2.2)(2.6). Based on the current states = [p dem ;v;q] T and the current actiona = [i],! wh ,T wh , and the battery output powerP batt are calculated according to ! wh = v=r wh ; (5.11) T wh = p dem r wh =v; (5.12) P batt = V OC iR batt i 2 ; (5.13) whereV OC is the open-circuit voltage of the battery pack andR batt is the internal resis- tance of the battery pack. To solve the FO problem, for each of the possible R(k) values, we first calculate ! ICE and! 
EM using (2.5), next calculateT EM using (2.3) while satisfying (2.4), and then calculate T ICE using (2.6) while satisfying (2.2). With ! ICE and T ICE , the fuel consumption rate _ m f is obtained based on the ICE model. We choose theR(k) value that results in the minimum _ m f i.e., _ m opt f . Reward of the Inner Loop We define the rewardr that the agent receives after taking actiona while in states as the negative of the weighted sum of the fuel consumption and the battery SoH degradation in that time step i.e., _ m opt f Tw C fade , where T is the length of a time step, C fade is the battery capacity fading in that time step, andw is the weight of the battery SoH degradation. Remember from Section 4.1 that the agent in reinforcement 60 learning aims at maximizing the expected return i.e., the discounted sum of rewards. Therefore, by using the negative of the weighed sum of the fuel consumption and the battery SoH degradation in a time step as the reward, the fuel consumption and battery SoH degradation will be minimized while maximizing the expected return. In the real implementation of the reinforcement learning in the HEV energy manage- ment, the reinforcement learning agent should be able to observe the reward it receives after taking an action. In the above-mentioned reward, the _ m opt f T part can be obtained by measuring the fuel consumption. However, the C fade part cannot be obtained by online measurement. To address this problem, the reinforcement learning agent keeps a record of the battery charging/discharging profile i.e., i(t) and therefore the battery SoC profileSoC(t), which is achieved by the Coulomb counting method [65]. Based on these profiles, the reinforcement learning agent can calculate the total battery capac- ity fading i.e.,C fade after taking an action by using the battery SoH degradation model discussed in Section 5.1. And then C fade can be obtained by subtracting the battery capacity fading before taking the action i.e.,C 0 fade fromC fade . However, the computation complexity will be high if the accurate SoH degradation model is used. Instead, we propose an SoH degradation estimation method to derive the estimated C fade value. As discussed in Section 5.1, the battery capacity fading in one charing/discharging cycle is a function of the average SoC and SoC swing in that cycle, i.e., C fade;cycle (SoC avg ;SoC swing ). Then the estimated average SoC SoC avg and SoC swingSoC swing can be derived as following SoC high = max t SoC(t); (5.14) SoC low = min t SoC(t); (5.15) SoC avg = SoC high +SoC low 2 ; (5.16) SoC swing =SoC high SoC low : (5.17) 61 We can estimate the battery capacity fading in a cycle using C fade;cycle (SoC avg ;SoC swing ). The number of cycles that the system has experi- enced is estimated by N C = X t i(t) TI[i(t)< 0] C full SoC swing ; (5.18) where the indicator functionI[x]=1 when x is true. The total battery capacity fading after taking actiona is then estimated by C fade =C fade;cycle (SoC avg ;SoC swing )N C : (5.19) Therefore, the C fade value can be estimated as C fade =C fade C 0 fade ; (5.20) whereC 0 fade andC fade are the battery total capacity fading before and after taking action a. 5.2.3 TD()-Learning Algorithm for the Inner Loop We employ a specific type of reinforcement learning algorithm, namely the TD()- learning algorithm [64], due to its relatively higher convergence rate and higher per- formance in non-Markovian environment. 
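The reward estimation of Eqs. (5.14)-(5.20) can be sketched as follows, reusing fade_per_cycle() from the earlier sketch; the SoH weight w and the treatment of the cycle duration are assumptions, not the dissertation's calibrated values.

def soh_aware_reward(fuel_rate, dt, soc_profile, i_profile, capacity_ah,
                     c_fade_prev, w=1.0e4):
    """Inner-loop reward: -(fuel in the step) - w * (estimated SoH degradation).

    soc_profile and i_profile are the SoC and current histories recorded by
    Coulomb counting (current positive when discharging).
    """
    soc_high, soc_low = max(soc_profile), min(soc_profile)    # (5.14)-(5.15)
    swing = soc_high - soc_low                                 # (5.17)
    if swing > 1e-6:
        charged_ah = sum(-i * dt / 3600.0 for i in i_profile if i < 0)
        n_cycles = charged_ah / (capacity_ah * swing)          # (5.18)
        per_cycle = fade_per_cycle(soc_low, soc_high,
                                   tau_s=len(soc_profile) * dt,   # duration treated
                                   c_fade=c_fade_prev)            # as one cycle (simplification)
        c_fade_total = per_cycle * n_cycles                    # (5.19)
    else:
        c_fade_total = c_fade_prev
    delta_fade = max(c_fade_total - c_fade_prev, 0.0)          # (5.20)
    return -(fuel_rate * dt) - w * delta_fade, c_fade_total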
In the TD()-learning algorithm, aQ value, denoted byQ(s;a), is associated with each state-action pair (s;a), which approximates the expected discounted cumulative reward of taking actiona at states. There are two basic steps in the TD()-learning algorithm: action selection and Q-value update. 62 Action Selection A straightforward approach for action selection is to always choose the action with the highest Q value. If we do so, however, we are at the risk of getting stuck in a sub- optimal solution. A judicious reinforcement learning agent should exploit the best action known so far to gain rewards while in the meantime explore all possible actions to find a potentially better choice. We address this exploration versus exploitation issue by breaking the learning procedure into two phases: In the exploration phase, -greedy- policy is adopted, i.e., the current best action is chosen only with probability of 1. In the exploitation phase, the action with the highestQ value is always chosen. Q-Value Update Suppose that actiona t is taken in states t at time stept, and rewardr t+1 and new state s t+1 are observed at time stept+1. Then at time stept+1, the TD()-learning algorithm updates theQ value for each state-action pair (s;a) as: Q(s;a) Q(s;a) +e(s;a); (5.21) where is a coefficient controlling the learning rate, e(s;a) is the eligibility of the state-action pair (s;a), and is calculated as r t+1 + max a 0 Q(s t+1 ;a 0 )Q(s t ;a t ): (5.22) In (5.22), is the discount rate. At time stept + 1, the eligibilitye(s;a) of each state-action pair is updated by e(s;a) 8 < : e(s;a) + 1; s =s t \a =a t ; e(s;a); otherwise, (5.23) 63 to reflect the degree to which the particular state-action pair has been chosen in the recent past, where is a constant between 0 and 1. In practice, we do not have to update Q values and eligibilitye of all state-action pairs. We only keep a list ofM most recent state-action pairs since the eligibility of all other state-action pairs is at most M , which is negligible whenM is large enough. Algorithm Description The pseudo code of the TD()-learning algorithm is summarized as follows. Algorithm 3 TD()-Learning Algorithm for the Inner Loop 1: InitializeQ(s;a) arbitrarily for all the state-action pairs. 2: for each time stept do 3: Choose actiona t for states t using the exploration-exploitation policy discussed in Section 5.2.3. 4: Take actiona t , observe rewardr t+1 and the next states t+1 . 5: r t+1 + max a 0Q(s t+1 ;a 0 )Q(s t ;a t ). 6: e(s t ;a t ) e(s t ;a t ) + 1. 7: for all state-action pair (s;a) do 8: Q(s;a) Q(s;a) +e(s;a). 9: e(s;a) e(s;a). 10: end for 11: end for 5.2.4 Outer-Loop Adaptive Learning The action (i.e., the battery charging/discharging current) taken by the inner-loop rein- forcement learning agent can directly affect the battery SoH degradation as has been discussed previously. In this outer-loop adaptive learning process, the HEV controller modulates the battery SoH degradation from a global point of view by tuning the max- imum SoC range. More specifically, when we define the state space in the inner loop, we have actually limited the battery SoC range by [ q min C full ; qmax C full ]. We know from Sec- tion 5.1 that the SoC range (from which SoC swing and average SoC can be derived) 64 strongly affect the SoH degradation rate. Therefore, in this outer loop the HEV con- troller tunes theq min value in different driving trip types such that the SoH degradation can be reduced. 
State Space of the Outer-Loop Adaptive Learning We define the state s in the outer loop using the trip characteristics including the trip length, average speed, and road condition (urban or highway). In the real implementa- tion in an HEV , the outer loop agent can obtain such trip information from driver inputs at the beginning of a trip. The battery usage condition (i.e., charging/discharging pro- file) should be different for different types of driving trip, and therefore we use the trip characteristics as the state. The state in the supervised machine learning technique is also called the feature. Action Space of the Outer-Loop Adaptive Learning The action taken by the outer loop agent is to choose aq min value. Therefore, the SoC range during a trip can be clamped between the selectedq min value and theq max . The action in the supervised machine learning technique is also called the target. C(s;a) Cost Function of the Outer-Loop Adaptive Learning If the system is in a states (i.e., a specific trip type) and an actiona (i.e., aq min value) is taken by the machine learning agent, the agent will obtain a cost value C, which is associated with each state-action pair by the cost function C(s;a). The machine learning agent aims at minimizing the cost value when choosing an action for a state. In order to both improve the fuel economy and reduce the SoH degradation during a trip, we use the weighted sum of the fuel consumption and the SoH degradation during that trip as the cost function. The machine learning agent should be able observe the cost 65 value when take an action in a state. In the implementation of the machine learning, the fuel consumption is obtained by real measurement whereas the battery SoH degradation during the trip is obtained using the method similar to that discussed in the Section 5.2.2. 5.2.5 Adaptive Learning Algorithm for the Outer Loop The machine learning agent can choose the optimal action for the current state based on its past experiences. When the system is in states, the machine learning agent chooses the action that results in the minimum cost value i.e., a min a 0 C(s;a 0 ): (5.24) After taking actiona, the agent obtains the new cost value and updates theC(s;a) value accordingly. However, the machine learning agent does not have the knowledge of theC(s;a) val- ues and therefore could not make decision on the action selection for a brand new HEV . To address this issue, the manufacturer can pre-set the C(s;a) values by performing driving tests on the same type of HEV for different state and action combination. This initialization of theC(s;a) values is called regulation in the machine learning technique. The time complexity of the adaptive learning algorithm, which is performed for each driving trip, isO(jAj), wherejAj is the number of actions in the action space of the outer-loop adaptive learning. Normally, we choose theq min value from a finite set consisting of up to ten allowableq min levels. Therefore, the adaptive learning algorithm has negligible computation overhead. In addition, the outer-loop adaptive learning does not rely on accurate HEV modeling and only the battery SoH estimation method are needed. 66 Table 5.1: PHEV key parameters. Vehicle Transmission ICE m = 1254 kg reg = 1:75 peak power 41kW C R = 0:009 reg = 0:98 peak eff. 34% C D = 0:335 gb = 0:98 EM A F = 2 m 2 R(k) = [13:5; 7:6; peak power 56kW r wh = 0:282 m 5:0; 3:8; 2:8] peak eff. 
92% battery Capacity 25Ah V oltage 240V 5.3 Experimental Results We simulate the operation of a PHEV , the model of which is developed in the vehicle simulator ADVISOR [1]. The key parameters of the PHEV are summarized in Table 5.1. We test our proposed policy and compare with the reinforcement learning (RL) policy [63] and the rule-based policy [20]. We use both real-world and testing driving trip profiles, which are developed and provided by different organizations and projects such as U.S. EPA (Environmental Protection Agency) and E.U. MODEM (Modeling of Emissions and Fuel Consumption in Urban Areas project). Table 5.2 presents the simulation results of the operating cost of the PHEV dur- ing different driving trips when the proposed, the RL, and the rule-based policies are adopted. For example, as shown in Table 5.2, the proposed policy results in 0.0028% battery capacity fading and 344.17g fuel consumption in the MODEM5713 driving trip, which correspond to $0.76 amortized battery replacement cost and $0.37 fuel consump- tion cost, and the total operating cost is $1.13. When calculating the operating cost, we use the America average gasoline price of $3/gal and the total battery replacement cost of $8,000 for the PHEV . Generally, the battery replacement cost of a PHEV is in the range $10,000$12,000 [71] for battery pack with average capacity of 10kWh. We use the battery replacement cost of $8,000 for the 6kWh battery. We use the complete cycle-decoupling method [70] to evaluate the battery capacity fading during each trip. 67 From Table 5.2 we can observe that the proposed policy consistently achieves the low- est operating cost comparing with the RL and rule-based policies. The proposed policy achieves a maximum of 47% operating cost reduction comparing with the rule-based policy, and a maximum of 48% reduction comparing with the RL policy. Based on Table 5.2, we also have the following observations: (i) For a PHEV , the amortized battery replacement cost is a large portion of the total operating cost and is even higher than the fuel cost for some driving trips. (ii) The relative amortized battery replacement cost is more significant for shorter driving trip. (iii) Our proposed policy can prolong the battery life significantly besides reducing the operating cost. (iv) Although the RL policy can reduce the fuel consumption comparing with the rule-based policy, in some case the operating cost from the RL policy is even higher because the RL policy does not take into account the battery cost when optimizing the fuel consumption. (v) The amortized battery replacement cost is non-negligible when optimizing the total operating cost. Furthermore, we also simulate an HEV (without the plug-in feature) using the Honda Insight Hybrid model from ADVISOR. The battery pack replacement of an HEV is $2,000 [71]. Table 5.3 presents the operating cost of an HEV . We can observe that the proposed policy achieves the lowest operating cost comparing with the RL, and the rule-based policies. We also find that the amortized battery replacement cost is less significant for an HEV than for a PHEV . 68 Table 5.2: Operating cost of the PHEV in different trips using the proposed, RL, and rule-based policies. 
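Two small sketches make the cost accounting and the outer loop concrete. First, the per-trip operating cost; the gasoline mass per gallon and the 30% end-of-life threshold are assumptions chosen so that the example roughly reproduces the Table 5.2 entries (the $3/gal price and $8,000 replacement cost are from the text).

def operating_cost(fuel_g, c_fade_trip,
                   gas_price=3.0, g_per_gal=2835.0,
                   battery_cost=8000.0, eol_fade=0.30):
    """Per-trip operating cost: fuel cost plus amortized battery replacement."""
    fuel_cost = fuel_g / g_per_gal * gas_price
    battery_cost_amortized = (c_fade_trip / eol_fade) * battery_cost
    return fuel_cost + battery_cost_amortized

# Example: MODEM5713 under the proposed policy, 344.17 g fuel and 0.0028% fading
total = operating_cost(fuel_g=344.17, c_fade_trip=0.0028e-2)   # ~ $1.1 (cf. $1.13 in Table 5.2)

Second, the outer-loop adaptive learning of Section 5.2.4 reduces to a small cost table indexed by trip type and SoC bound; the smoothing factor, the SoH weight, and the fallback choice are assumptions.

from collections import defaultdict

class OuterLoop:
    """Adaptive learning of the SoC lower bound q_min per trip type (Eq. (5.24))."""
    def __init__(self, q_min_levels, beta=0.3):
        self.levels = q_min_levels
        self.beta = beta
        self.C = defaultdict(lambda: None)   # cost table; may be pre-set ("regulation")
                                             # by the manufacturer for a brand new HEV
    def choose_q_min(self, trip_state):
        known = [(a, self.C[(trip_state, a)]) for a in self.levels
                 if self.C[(trip_state, a)] is not None]
        if not known:
            return self.levels[len(self.levels) // 2]   # untried trip type
        return min(known, key=lambda kv: kv[1])[0]

    def update(self, trip_state, q_min, fuel_g, delta_fade, w=1.0e4):
        """Blend the observed trip cost (weighted fuel + SoH loss) into C(s, a)."""
        cost = fuel_g + w * delta_fade
        old = self.C[(trip_state, q_min)]
        self.C[(trip_state, q_min)] = cost if old is None \
            else (1 - self.beta) * old + self.beta * cost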
Trip Proposed RL Rule MODEM 5713 0.0028%($0.76) +344.17g($0.37) 0.0045%($1.22) +310.56g($0.33) 0.0044%($1.18) +383.30g($0.41) cost =($1.13) =($1.55) =($1.59) Hyzem motorway 0.0018%($0.50) +1991.9g($2.16) 0.0048%($1.28) +2001.9g($2.17) 0.0050%($1.36) +2093.6g($2.27) cost =($2.66) =($3.45) =($3.63) FTP75 0.0027%($0.73) +311.40g($0.33) 0.0043%($1.16) +295.97g($0.32) 0.0048%($1.30) +623.73g($0.67) cost =($1.06) =($1.48) =($1.97) US06 0.0028%($0.74) +414.17g($0.45) 0.0043%($1.17) +354.34g($0.38) 0.0036%($0.98) +321.02g($0.34) cost =($1.19) =($1.55) =($1.32) UDDS 0.0032%($0.85) +298.48g($0.32) 0.0044%($1.19) +355.85g($0.38) 0.0048%($1.30) +630.22g($0.68) cost =($1.17) =($1.57) =($1.98) OSCAR 0.0021%($0.57) +149.51g($0.16) 0.0043%($1.16) +222.75g($0.24) 0.0042%($1.12) +242.54g($0.26) cost =($0.73) =($1.40) =($1.38) Table 5.3: Operating cost of the HEV in different trips by the proposed, RL, and rule- based policies. Trip Proposed RL Rule LA92 0.0010%($0.07) +474.83g($0.51) 0.0039%($0.26) +460.03g($0.50) 0.0067%($0.44) +568.43g($0.61) cost =($0.58) =($0.76) =($1.05) Artemis urban 0.0015%($0.10) +110.34g($0.12) 0.0028%($0.18) +109.61g($0.11) 0.0051%($0.34) +209.20g($0.22) cost =($0.22) =($0.29) =($0.56) Modem1 0.0009%($0.06) +143.25g($0.15) 0.0029%($0.19) +138.33g($0.15) 0.0058%($0.39) +215.48g($0.23) cost =($0.21) =($0.34) =($0.62) Modem2 0.0012%($0.08) +221.62g($0.24) 0.0029%($0.19) +229.32g($0.24) 0.0056%($0.37) +330.26g($0.35) cost =($0.32) =($0.43) =($0.72) Modem3 0.0015%($0.10) +66.00g($0.07) 0.0026%($0.18) +58.30g($0.06) 0.0044%($0.29) +121.21g($0.13) cost =($0.17) =($0.24) =($0.42) 69 Chapter 6 Joint Control of the Powertrain and Auxiliary Systems in HEVs Autonomous driving has become a major goal of automobile manufacturers and an important driver for the vehicular technology. Hybrid electric vehicles (HEVs), which represent a trade-off between conventional internal combustion engine (ICE) vehicles and electric vehicles (EVs), have gained popularity due to their high fuel economy, low pollution, and excellent compatibility with the current fossil fuel dispensing and electric charging infrastructures. To facilitate autonomous driving, an autonomous HEV con- troller is needed for determining the power split between the powertrain components (including an ICE and an electric motor) while simultaneously managing the power consumption of auxiliary systems (e.g., air-conditioning and lighting systems) such that the overall electromobility is enhanced. Certain (partial) prior knowledge of the future driving profile is useful information for the automatic HEV control. In this work [72], we investigate a joint control framework of powertrain and aux- iliary systems in an HEV by means of reinforcement learning (RL). We minimize fuel cost induced both by propelling the vehicle and by the auxiliary systems since both are critical parts in the overall fuel consumption, and meanwhile maximize a total utility function (representing the degree of desirability) of the auxiliary systems. Unlike some previous approaches, the learning process does not require complete a priori informa- tion about driving profiles and uses only partial information about the HEV drivetrain modeling, i.e., it can be partially model-free. The learning process properly determines 70 the operating modes of the HEV components, such as battery discharging/charging power/current, gear ratio, operating power of auxiliary systems, etc., based on the proper definition of ”states”. 
We properly determine the reward of the RL agent such that the objective of the RL agent coincides with our goal of both minimizing the overall fuel consumption and maximizing the total utility function of the auxiliary systems. The TD()-learning algorithm [64] is employed as the RL algorithm due to its higher con- vergence rate and higher performance in non-Markovian environment. In order to further enhance the effectiveness of the RL framework, we incorporate prediction of future driving profile characteristics. The prediction results will serve as a part of the “state” classification in the main RL algorithm, and can enhance the perfor- mance of the RL agent because certain partial information of the future characteristics can be provided [64]. An exponential weighting function, although quite simple, can serve as a desirable prediction method of future driving profile characteristics, in order to strike a balance between effectiveness in prediction and additional complexity in the RL algorithm. Simulation results over real-world and testing driving cycles demonstrate the effectiveness of the proposed RL-based joint HEV control mechanism. 6.1 Auxiliary Systems The auxiliary system of HEV is comprised of lighting, air conditioning (or more gen- erally, heating, ventilation, and air conditioning or HV AC), and other battery-powered systems such as GPS. The auxiliary systems may account for 10% - 30% of the overall fuel consumption for an ordinary (fuel-based) vehicle. For HEVs and EVs, it is pro- jected that auxiliary systems will take a larger portion of the overall energy consump- tion partly because heating of an ordinary vehicle can be partially achieved by the heated internal combustion engine. Hence, the power consumption of auxiliary systems needs 71 to be jointly considered and optimized with the powertrain control for an HEV in order to achieve the global optimal solution. For example, if the battery stored charge is not enough or the battery output power is relatively large, it is desirable to limit the power consumption of auxiliary systems. The effect of auxiliary systems (or more specifically, the HV AC module) can be compensated later after the battery is charged by the ICE or when the battery output power is reduced. Let p aux denote the total operating power of auxiliary systems, which is a control variable of HEV controller (and partially for the driver.) We adopt a utility function f aux (p aux ) to represent the total satisfaction level when applying operating powerp aux to the auxiliary systems, which is widely adopted in modeling HV AC systems [73] 1 . The utility function is general in the sense that it demonstrates the combination of effects of multiple auxiliary systems such as lighting, HV AC, and other battery-powered systems. In general, the utility function is a uni-modal (quasi-concave) function since neither too high power consumption nor too low power consumption is desirable for the auxiliary system components (for example, too high power consumption for the HV AC means either too hot or too cold, and vice versa.) The utility functionf aux (p aux ) can be either inferred from driver behaviors (e.g., the target temperature set by the driver) or from past learning experiences, and may vary from time to time. The goal of HEV controller is to maximize the total utility function value over the whole driving profile. 
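To make the notion of a uni-modal (quasi-concave) utility concrete, the short sketch below implements one possible f_aux(p_aux) as a quadratic penalty around a preferred operating power. The 600 W preferred level echoes the assumption used in the experiments of Section 6.3; the peak utility u_max, the curvature kappa, and the power cap are illustrative values only, not quantities specified in this work.

# Illustrative quasi-concave utility for the auxiliary systems: the utility peaks
# at a preferred operating power and falls off for both lower and higher power.
# u_max, kappa, and the 450 W cap are hypothetical constants for illustration.

def f_aux(p_aux, p_pref=600.0, u_max=1.0, kappa=1e-6):
    """Utility (satisfaction level) of running the auxiliary systems at p_aux watts."""
    return u_max - kappa * (p_aux - p_pref) ** 2

if __name__ == "__main__":
    # Example: sweep candidate auxiliary power levels and pick the most satisfying one
    # subject to a cap imposed by the powertrain controller (e.g., when the battery is low).
    p_cap = 450.0
    candidates = [p for p in range(0, 1001, 50) if p <= p_cap]
    best = max(candidates, key=f_aux)
    print(f"chosen auxiliary power: {best} W, utility: {f_aux(best):.3f}")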
1 Please note that this utility function is a simplified version of the actual utility function, since the actual utility function of HVAC is not only a function of the instantaneous power consumption but also depends on the previous temperature.

6.2 Joint Control Framework of Powertrain and Auxiliary Systems

6.2.1 Motivations

The power consumption of auxiliary systems (including HVAC, lighting, etc.) may account for 10% - 30% of the overall fuel consumption of the HEV. The power management results may be sub-optimal if this important portion of power consumption is neglected. In order to mitigate this shortcoming, we aim to develop a more effective joint control framework for HEV propulsion and auxiliary systems, to minimize the overall fuel consumption and maximize the overall objective function of the auxiliary systems.

To further enhance the effectiveness of the RL-based joint control framework, we incorporate prediction of future driving profile characteristics. The prediction results can serve as a part of the "state" in the RL-based control algorithm, and can enhance the performance of the RL agent by providing partial information of the future characteristics of driving profiles. Details are described in the next subsection.

6.2.2 Prediction of Future Driving Profile Characteristics

In this subsection, we describe the prediction method of future driving profile characteristics. The prediction cannot be highly accurate for the following two reasons: (i) the prediction accuracy is inherently limited by the difficulty and randomness in driving profile prediction, and (ii) a more accurate prediction result (with higher precision levels) would significantly increase the computation complexity and reduce the convergence rate of the RL algorithm, because the prediction results add at least one dimension to the state space of the RL algorithm. Hence, we need to achieve a desirable tradeoff between the effectiveness in prediction and the additional complexity in the RL algorithm. Another important observation is that although we could predict both the future velocity and the future propulsion power demand (or acceleration), predicting the latter is more desirable for the RL agent. This is because the propulsion power demand is more directly related to the action chosen (e.g., the battery discharging current, gear ratio, etc.) by the RL agent than the velocity.

Based on the above-mentioned two observations, we adopt the exponential weighting function, which predicts the future data (propulsion power demand) based on the current measurement data as follows:

pre_i = (1 - α) · pre_{i-1} + α · meas_{i-1},    (6.1)

where pre_i is the i-th predicted future data (propulsion power demand), pre_{i-1} is the (i-1)-th predicted data, meas_{i-1} is the (i-1)-th measured data (propulsion power demand), and α is the learning rate. Experiments show that the exponential weighting function, though quite simple, can serve as a desirable prediction method to strike a balance between effectiveness in prediction and additional complexity in the RL algorithm. Other methods such as artificial neural networks (ANNs) can also be utilized for future driving profile prediction. Details are omitted due to space limitation.
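As a concrete illustration of (6.1), the following sketch applies the exponential weighting predictor to a short, made-up trace of propulsion power demand. The learning rate α = 0.3 and the demand values are placeholders rather than values used in this work.

# Minimal sketch of the exponential-weighting predictor in (6.1):
# pre_i = (1 - alpha) * pre_{i-1} + alpha * meas_{i-1}.

def exponential_weighting(measurements, alpha=0.3, initial=0.0):
    """Return one-step-ahead predictions of the propulsion power demand (kW)."""
    predictions = []
    pre = initial
    for meas in measurements:
        predictions.append(pre)                     # prediction made before observing meas
        pre = (1.0 - alpha) * pre + alpha * meas    # update with the new measurement
    return predictions

demand = [12.0, 15.0, 30.0, 28.0, 5.0, -8.0, 0.0]   # example power demand trace (kW)
print(exponential_weighting(demand))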
74 6.2.3 Details of the Reinforcement Learning Process State Space We define the state space of the RL technique as a finite number of states, each repre- sented by the propulsion power demand, vehicle speed, battery pack stored charge level, and predicted driving profile characteristics, given by S = s = [p dem ;v;q;pre] T jp dem 2P dem ;v2V;q2Q;pre2Pre (6.2) wherep dem is the power demand for propelling the HEV ,v is the vehicle speed,q is the amount of charge stored in the battery pack, andpre is the predicted characteristics of future driving profiles. Different actions may be taken in different states. For instance, if the propulsion power demand level is negative, i.e., during vehicle braking, the action chosen by the agent (HEV controller) should be charging the battery by using the EM as a generator. On the other hand, if the propulsion power demand level is a very large positive value, the selected action should be discharging the battery to power the EM, which propels the vehicle and provides power for auxiliary systems in assistance with ICE. A RL agent is able to observe a state from online measurement. In the actual imple- mentation, the current propulsion power demand level p dem and vehicle speed v are obtained by sensors to measure the driver-controlled pedal motion, and future driv- ing profile characteristics are predicted using methods described above. However, the charge levelq cannot be obtained from online measurement of battery’s terminal volt- age, because the terminal voltage of battery pack changes with the charging/discharging current and therefore it is not an accurate indicator of q [61]. In order to observe the charge levelq, the Coulomb counting method [65] is required by the RL agent, which is typically realized using a dedicated circuit implementation [74]. 75 P dem ,V,Q, andPre in (5.2) are, respectively, the finite sets of propulsion power demand levels, vehicle speed levels, levels of charge stored in the battery pack, and predicted driving profile characteristics. Discretization is required when defining these four finite sets. In particular,Q is constructed by discretizing the range of charge stored in the battery pack, i.e., [q min ;q max ], into a finite number of charge levels: Q =fq 1 ;q 2 ;:::;q N g (6.3) whereq min =q 1 <q 2 <:::<q N =q max . Generally,q min andq max are 40% and 80% of the nominal capacity of battery pack, respectively, for an ordinary HEV (charge- sustaining mode). Action Space and Reduced Action Space We define the action space of the RL framework as a finite number of actions, each represented by the discharging current of battery pack, the gear ratio, and operating power of auxiliary systems, i.e., A = a = [i;R(k);p aux ] T ji2I;R(k)2R;p aux 2P aux (6.4) where an actiona = [i;R(k);p aux ] T chosen by the RL agent denotes to discharge the battery using currenti, choose thek-th gear ratio, and apply operating powerp aux for the auxiliary systems. The set I in (6.4) contains within it a finite (discretized) number of discharging current values in the range of [I max ;I max ].i> 0 denotes discharging the battery pack, andi< 0 denotes charging the battery pack. The setR contains all allowable gear ratio values, which depend on the powertrain design. Usually, there are four or five gear ratio 76 values in total [75]. Finally,P aux represents a finite (discretized) set of operating power levels of auxiliary systems. 
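A minimal sketch of how the continuous measurements can be quantized into the discrete state of (6.2) is given below. The 40%-80% charge window and the 25 Ah pack follow the surrounding description, but the specific bin edges are illustrative assumptions, not levels specified in this work.

import numpy as np

# Sketch of the state-space discretization in (6.2): the continuous measurements
# (propulsion power demand, vehicle speed, battery charge, predicted demand) are
# mapped to bin indices, and the tuple of indices identifies one discrete state.

P_DEM_EDGES = np.array([-20.0, 0.0, 10.0, 25.0, 45.0])   # kW (illustrative levels)
V_EDGES     = np.array([0.0, 20.0, 50.0, 90.0, 130.0])   # km/h (illustrative levels)
Q_EDGES     = np.linspace(0.4, 0.8, 9) * 25.0            # Ah, 40%-80% of a 25 Ah pack
PRE_EDGES   = P_DEM_EDGES                                 # predicted demand reuses the demand bins

def observe_state(p_dem, v, q, pre):
    """Return the discrete state s = (i_pdem, i_v, i_q, i_pre) used to index the Q values."""
    return (int(np.digitize(p_dem, P_DEM_EDGES)),
            int(np.digitize(v, V_EDGES)),
            int(np.digitize(q, Q_EDGES)),
            int(np.digitize(pre, PRE_EDGES)))

print(observe_state(p_dem=18.0, v=65.0, q=14.2, pre=12.5))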
Alternatively, we define a reduced action spaceA re , in which an actiona re = [i] T only accounts for the discharging/charging current of the battery. Using this reduced action space, the gear ratio R(k) and auxiliary systems operating power p aux can be selected by solving an optimization problem such that the instantaneous reward func- tion (as shall be discussed later) can be maximized. Since the computation complexity and convergence speed of RL algorithms are proportional to the number of state-action pairs [57], the reduced action spaceA re significantly reduces the computation com- plexity and increase convergence speed of the RL algorithm. Another advantage of the reduced action space is that discretization ofp aux in the original action space is no longer required, which in turn enhances the control precision and performance. Of course, there is a side effect that the reduced action space relies on (partial) HEV component modeling when solving the optimization problem. However, due to the significant advantages, we would suggest to use the reduced action spaceA re for reduced computation complexity and increased convergence rate, and make the RL agent partially model-free. Reward Function The objective of the RL-based joint control mechanism is to minimize the total fuel cost, induced by both propelling the vehicle and auxiliary systems, and to maximize the overall utility function value of the auxiliary systems over the whole driving profile. Therefore, we define the rewardr that the agent receives after taking actiona in state s as the negative of the fuel consumption plus the utility function value of the auxiliary systems at that time step, i.e., ( _ m f +wf aux (p aux )) T , where T is the length of a time step, _ m f is the fuel consumption in that time step, andw is a weighting factor determining the relative importance of fuel consumption and the auxiliary system utility 77 function. The RL agent targets at maximizing the expected return, which is a discounted sum of rewards. Hence, by using the above-mentioned reward function, the overall fuel consumption will be minimized and the overall utility function value will be maximized (of course they are connected through the weighting factor w) while maximizing the expected return. In an actual reinforcement learning implementation, the RL agent (HEV controller) should be aware of the reward it receives after taking an action, since the observation of reward is critical in deriving the optimal policy. In the above-mentioned reward defini- tion, _ m f can be obtained by measuring the fuel consumption directly, and utility function f aux (p aux ) can be either inferred from driver behaviors (e.g., the target temperature set by the driver) or from past learning experience. TD()-Learning Algorithm for Joint HEV Control We employ the TD()-learning algorithm [64] to derive the optimal policy for the joint control of powertrain and auxiliary systems, because of (i) its relatively higher con- vergence rate and (ii) higher performance in non-Markovian environment. In TD()- learning, aQ value, denoted byQ(s;a), is associated with each state-action pair (s;a), where a states is represented by the propulsion power demandp dem , the vehicle speed v, the battery charge q, and predicted future driving profile characteristics pre, and an action a can be either a complete action or a reduced action as described before. TheQ(s;a) value approximates the expected (discounted) cumulative reward of taking actiona in states. 
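For concreteness, the sketch below combines the reward just defined, r = (-ṁ_f + w·f_aux(p_aux))·ΔT, with a tabular eligibility-trace update of the kind detailed in Algorithm 4 that follows. The environment interface (observe_state, apply_action), the auxiliary utility used inside reward_fn, and all hyperparameters (α, γ, λ, ε, w) are illustrative placeholders, not values prescribed by this work.

import random
from collections import defaultdict

# Compact tabular sketch of TD(lambda)-style Q-learning with eligibility traces.

def reward_fn(fuel_rate, p_aux, w=0.5, dt=1.0,
              f_aux=lambda p: 1.0 - 1e-6 * (p - 600.0) ** 2):
    """Reward of one time step: (-fuel_rate + w * f_aux(p_aux)) * dt."""
    return (-fuel_rate + w * f_aux(p_aux)) * dt

def td_lambda_control(env, actions, episodes=100, steps=1000,
                      alpha=0.1, gamma=0.95, lam=0.7, epsilon=0.1):
    Q = defaultdict(float)              # Q[(state, action)]
    for _ in range(episodes):
        e = defaultdict(float)          # eligibility traces, reset per episode
        s = env.observe_state()
        for _ in range(steps):
            # exploration versus exploitation: best action with probability 1 - epsilon
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            fuel_rate, p_aux, s_next = env.apply_action(a)
            r = reward_fn(fuel_rate, p_aux)
            delta = r + gamma * max(Q[(s_next, a_)] for a_ in actions) - Q[(s, a)]
            e[(s, a)] += 1.0
            for sa in list(e):          # in practice, only the M most recent pairs are kept
                Q[sa] += alpha * delta * e[sa]
                e[sa] *= gamma * lam
            s = s_next
    return Q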
Details of the TD(λ) algorithm are summarized as follows. In the TD(λ)-learning procedure, the Q values are initialized arbitrarily at the beginning of execution. At each time step t, the agent selects an action a_t for the current state s_t based on the Q(s, a) values. To avoid the risk of getting stuck at a sub-optimal solution, the exploration-versus-exploitation policy [64] is employed for the action selection, i.e., the agent does not always select the action a with the maximum Q(s_t, a) value for the current state s_t. Instead, the current best action is chosen only with probability 1 - ε, and the other actions are chosen with equal probability. Suppose that the chosen action is a_t at time step t; the learning agent observes a new state s_{t+1} and receives reward r_{t+1} at time step t+1. Then, based on the observed s_{t+1} and r_{t+1}, the agent updates the Q values for each state-action pair (s, a), in which the eligibility e(s, a) of each state-action pair (s, a) is updated and effectively utilized during Q value updating. The eligibility e(s, a) of a state-action pair (s, a) reflects the degree to which the particular state-action pair has been chosen in the recent past, where λ is a constant between 0 and 1. Due to the usage of the eligibility of state-action pairs, in practice we do not need to update the Q values and eligibility e of all state-action pairs. We only keep a list of the M most recent state-action pairs, since the eligibility of all other state-action pairs is at most λ^M, which is negligible for a large enough M.

Algorithm 4 TD(λ)-Learning Algorithm
1: Initialize Q(s, a) arbitrarily for all the state-action pairs.
2: for each time step t do
3:   Choose action a_t for state s_t using the exploration-exploitation policy.
4:   Take action a_t, observe reward r_{t+1} and next state s_{t+1}.
5:   δ ← r_{t+1} + γ max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t).
6:   e(s_t, a_t) ← e(s_t, a_t) + 1.
7:   for all state-action pairs (s, a) do
8:     Q(s, a) ← Q(s, a) + α δ e(s, a).
9:     e(s, a) ← γ λ e(s, a).
10:  end for
11: end for

6.3 Experimental Results

We simulate the operation of an HEV, the model of which is developed in the vehicle simulator ADVISOR [1]. The key parameters of the HEV are summarized in Table 6.1.

Table 6.1: HEV key parameters.
Vehicle: m = 1254 kg; C_R = 0.009; C_D = 0.335; A_F = 2 m^2; r_wh = 0.282 m
Transmission: reg = 1.75; η_reg = 0.98; η_gb = 0.98; R(k) = [13.5, 7.6, 5.0, 3.8, 2.8]
ICE: peak power 41 kW; peak eff. 34%
EM: peak power 56 kW; peak eff. 92%
Battery: capacity 25 Ah; voltage 240 V

Figure 6.1: Normalized fuel consumption of RL-based HEV control frameworks with and without prediction.

We test our joint HEV control framework and compare it with the reinforcement learning (RL) policy [63] and the rule-based policy [20]. We use both real-world and testing driving trip profiles, which are developed and provided by different organizations and projects such as the U.S. EPA (Environmental Protection Agency) and the E.U. MODEM (Modeling of Emissions and Fuel Consumption in Urban Areas) project.

One improvement of this work over [63] is that we introduce prediction of future driving profile characteristics. First, we measure the fuel economy improvement due to the prediction only. Figure 6.1 shows the normalized fuel consumption for three driving profiles (i.e., OSCAR, UDDS, and MODEM) under HEV control frameworks with and without the prediction. The fuel economy improvement due to prediction alone can be as high as 12%.

Table 6.2: Reward function values from the proposed joint control framework and the rule-based policy.
Trip | Proposed | Rule-based
OSCAR | -275.76 | -337.50
UDDS | -754.85 | -849.25
SC03 | -284.14 | -319.66
HWFET | -741.12 | -861.68

Figure 6.2: The MPG values achieved by the proposed joint control framework and the rule-based policy.

Furthermore, we compare the proposed joint control framework with the rule-based policy [20]. We assume that the most desirable power consumption of the auxiliary systems is 600 W, and that higher or lower power consumption from the auxiliary systems reduces the value of the utility function f_aux(p_aux). Table 6.2 shows the accumulation of the reward function (-ṁ_f + w·f_aux(p_aux))·ΔT over whole driving profiles. Please note that -ṁ_f is negative, and therefore the reward function value is also negative. We can observe that the proposed control framework always achieves higher reward function values than the rule-based policy. To compare the fuel economy of the proposed and rule-based policies, Figure 6.2 shows the corresponding MPG values from the two policies for different driving profiles. The proposed framework achieves up to 29% MPG improvement.

Chapter 7
Photovoltaic System Reconfiguration for Electric Vehicles

Thanks to the abundance and easy access of solar energy, photovoltaic (PV) cells provide a clean and quiet form of electrical energy generation. Moreover, PV cells can be an ideal power source for EVs due to their stability and controllability [76, 55]. The highest solar irradiance during a day is around 1000 W/m^2, the energy conversion efficiency of PV cells is about 30%, and therefore a PV module with 1 m^2 area can generate a peak power of 300 W. The total horizontal panel area including the rooftop, hood and trunk of a typical passenger vehicle is around 4 - 5 m^2. The electric motor power rating of a modern EV with similar or even higher driving performance than conventional vehicles is commonly over 100 kW [77]. A passenger vehicle needs high horsepower when accelerating and hill climbing, whereas moderate horsepower is needed for cruising (e.g., less than 10 kW during city driving). Therefore, although it may not be practical to realize a fully PV-powered EV with similar driving performance as a conventional vehicle, an EV with an onboard PV electrical energy generation system (PV system) is still beneficial, since PV cells can charge the EV battery pack when the EV is running or parked to mitigate the power demand from the grid [55].

To increase the PV electrical power generation capability of a partially PV-powered EV, we should enlarge the onboard PV cell modules by using all possible vehicle surface areas including the rooftop, hood, trunk and door panels. Even though PV cell modules may be mounted on different vehicle surface areas, the string charger architecture [78],
However, the solar irradiance levels on PV cell modules may be different from each other due to different solar incidence angles. For example, the solar incidence angle on the rooftop or hood PV module is smaller than that on the door PV module at noon, and therefore the solar irradiance level on the rooftop or hood PV module is larger. In addition, the solar irradiance profiles on the driver-side door panel and the passenger- side door panel are virtually opposite to each other determined by the vehicle direction and time of the day. Under the non-uniform distribution of solar irradiance, the output current and thereby the output power of a PV system with the string charger architecture is limited by the PV module with the lowest solar irradiance level [79]. This work [80] aims at maximizing the output power of a vehicular PV system with the string charger architecture taking into account the non-uniform distribution of solar irradiance levels on different vehicle surface areas. This work is based on the dynamic PV array reconfiguration architecture from previous work [79, 81] with the accommoda- tion of the rapidly changing solar irradiance in the onboard scenario. Most importantly, this work differs from previous dynamic PV array reconfiguration work [79, 81] in that (i) first, we propose an event-driven PV array reconfiguration framework replacing the periodic reconfiguration framework in previous work [79, 81] to reduce the computation 83 and energy overhead of the PV array reconfiguration; (ii) second, we provide a sensor- less (and also event-driven) PV array reconfiguration framework, which further reduces the cost of a vehicular PV system, by proposing a solar irradiance estimation algorithm for estimating the instantaneous solar irradiance level on each PV cell module. The solar irradiance estimation algorithm is supported by the dynamic PV array reconfiguration architecture by only activating PV cells in one PV module while bypassing all oth- ers. We implement a high-speed, high-voltage PV array reconfiguration switch network with IGBTs (insulated-gate bipolar transistors) and the CAN (controller area network.) Furthermore, we implement a solar irradiance sensor network for acquiring benchmark solar irradiance profiles on vehicle panels to evaluate our proposed PV array reconfig- uration frameworks. Experimental results demonstrate that the event-driven framework achieves up to 2.85X performance enhancement and the sensorless framework achieves up to 2.77X performance enhancement compared with the baseline. 7.1 Onboard PV System Fig. 7.1 shows the system diagram of an onboard PV system. PV cell modules mounted on the rooftop, hood, trunk and door panels constitute the whole PV array. Depending on the available area on each vehicle panel, PV modules may consist of different numbers of PV cells. As shown in Fig. 7.1, a charger (i.e., power converter) connects the whole PV array with the EV battery pack for regulating the operating point of the PV array. This is the string charger architecture [78], where PV modules in series share a single power converter. The power modeling of a charger was proposed in [82]. Generally speaking, the energy efficiency of a power converter is high when its input and output voltages are close to each other. Therefore, due to the high voltage of the vehicle battery 84 Figure 7.1: System diagram of a PV system on the electric vehicle. pack [31], the string charger architecture has the potential for achieving high overall system efficiency. 
7.1.1 PV Cell Modeling The whole PV array consists of multiple identical PV cells. We use the method in [83] to extract PV cell modeling. Fig. 7.2 shows the current-voltage (I-V) and power-voltage (P-V) output characteristics of a PV cell under different solar irradiance levels, where G STC = 1000 W/m 2 stands for the solar irradiance level under standard test condition. On the PV cell I-V output curves, the solid black dots represent the maximum power points (MPPs) of a PV cell, which correspond to the peak power points on the P-V output curves. The maximum output power of a PV cell increases as solar irradiance increases. A charger (or power converter) can regulate the operating point of the PV array by controlling the output current of the charger. Generally, the maximum power 85 Figure 7.2: I-V and P-V output characteristics of a PV cell. point tracking (MPPT) and maximum power transfer tracking (MPTT) techniques are employed in the charger controller to track maximum output power under changing solar irradiance. 7.1.2 PV Reconfiguration Structure A conventional PV array has a fixednm configuration, wheren PV cells are series- connected and m PV cells are parallel-connected. When the PV array receives uni- form solar irradiance, the PV cells can be set to operate at their MPPs simultaneously and therefore the PV array achieves the maximum output power. However, in reality, especially for an onboard PV system, PV modules mounted on different vehicle panels receive different solar irradiance levels, which also keep changing during driving. This non-uniform distribution of solar irradiance on a PV array results in significant output power degradation due to the fact that PV cells cannot operate at their MPPs simultane- ously. The dynamic PV array reconfiguration method [79, 81] was proposed to address the output power degradation problem under the non-uniform distribution of solar irradi- ance. The PV reconfiguration method has the potential to make PV cells operate at their MPPs simultaneously even under non-uniform solar irradiance. Fig. 7.3 presents the 86 Figure 7.3: PV array reconfiguration structure. structure of a reconfigurable PV array [81] with a total number ofN PV cells. Please note that in an onboard PV system, PV cells in the reconfigurable PV array come from all PV modules mounted on the rooftop, hood, trunk, and door panels. Fig. 7.3 (b) represents the electric connection of PV cells instead of their physical locations. As shown in Fig. 7.3 (a), each i-th PV cell is integrated with three solid-state switches: a top parallel switch S pT;i , a bottom parallel switch S pB;i , and a series switch S s;i . The reconfigurable PV array can change its configuration by controlling the ON/OFF states of the switches. The two parallel switches of a PV cell are always in the same state, and the series switch of a PV cell is in the opposite state of its parallel switches. The parallel switches connect PV cells in parallel forming PV cell groups, whereas the series switches connect PV cell groups in series forming a PV array config- uration. Now we provide the formal definition of the configuration of a reconfigurable PV array. Consider a reconfigurable PV array withN PV cells, it can have an arbitrary number (less than or equal toN) of PV cell groups. The number of parallel-connected PV cells in thej-th PV cell group (i.e.,r j > 0) should satisfy: g j=1 r j =N; (7.1) where g is the number of PV cell groups. We define the configuration as C(g;r 1 ;r 2 ;:::;r g ). 
The configuration can be viewed as a partitioning of the PV cell 87 index setA =f1; 2;:::;Ng. The partitioning is denoted by subsetsB 1 ,B 2 , ..., andB g ofA, which correspond to theg PV cell groups comprised ofr 1 ,r 2 , ..., andr g PV cells, respectively. The subsets satisfy [ g j=1 B j =A; (7.2) and B j \B k =;;8j;k2f1; 2;:::;gg;j6=k: (7.3) Due to the structure characteristics of the reconfigurable PV array, we also havei 1 <i 2 for8i 1 2 B j and8i 2 2 B k satisfying 1 j < k g. A partitioning satisfying the above properties is called an alphabetical partitioning. 7.1.3 PV Array Reconfiguration Algorithm A polynomial-time PV array reconfiguration algorithm was proposed in [81], which finds the optimal PV array configuration given the solar irradiance levels on all the PV cells to maximize the PV system output power. The PV array reconfiguration algorithm is comprised of an outer loop to find the optimal number of PV cell groups in the PV array and a kernel algorithm to determine the optimal configuration based on the optimal number of PV cell groups given by the outer loop. This reconfiguration algorithm (i) should be executed frequently to keep the PV array configuration updated under chang- ing solar irradiance; and (ii) needs the knowledge of solar irradiance levels on all the PV cells as the input, or at least solar irradiance levels on the five panels, i.e., rooftop, hood, trunk, right door and left door panels, for an onboard PV system if we assume the solar irradiance is uniform on each vehicle panel. It is straightforward to acquire such solar irradiance information via sensors if five sensors are attached to the five panels, respectively. However, in the sensorless case in which the system is not equipped with 88 solar sensors, we need additional steps to estimate the solar irradiance levels on the five panels. This reconfiguration algorithm will serve as a basis for our event-driven and sensorless PV array reconfiguration frameworks in the present work. 7.2 Overview of the Two Proposed PV Reconfiguration Frameworks The most straightforward reconfiguration method, i.e., periodic reconfiguration, was used in [81] in order to keep the configuration of the PV array updated periodically under rapidly changing solar irradiance. The reconfiguration period is a critical design parameter. A large reconfiguration period may not be able to capture the fast change in solar irradiance levels, whereas a small reconfiguration period will induce high timing and energy overheads and may eventually degrade the PV system performance. The optimal reconfiguration period may be quite different for various driving scenarios (i.e., city driving vs freeway driving.) Therefore, in the present work we propose an event- driven PV array reconfiguration framework, in which the PV array reconfiguration algo- rithm is triggered only if there is noticeable change in the solar irradiance levels. In this way, timing and energy overheads of PV reconfiguration can be largely reduced. By a thorough examination of the reconfigurable PV array structure, we found that it has the potential for achieving more flexible PV array configurations, in which part of the PV cells are active and the rest of the PV cells can be bypassed. Based on this obser- vation, we provide a solar irradiance estimation algorithm to estimate the instantaneous solar irradiance level on each PV cell module. 
Therefore, we go one step further to propose the sensorless PV array reconfiguration framework that reduces the capital cost of an onboard PV system due to the sensor node network. The sensorless framework is also event-driven to decrease the timing and energy overheads. 89 Next we will first describe the PV reconfiguration hardware design, including the IGBT (insulated-gate bipolar transistor)-based reconfiguration switch network, the solar irradiance sensor network to acquire instantaneous solar irradiance levels and bench- mark solar irradiance profiles, as well as a thorough overhead analysis of various com- ponents in the reconfiguration system. Finally we will describe the event-driven (and sensor-based) reconfiguration framework and the sensorless (and event-driven) recon- figuration framework, respectively. 7.3 PV Reconfiguration Hardware Design 7.3.1 IGBT-Based Reconfiguration Switch Network In the onboard scenario, the rapid change of vehicle driving direction results in fast changing of solar irradiance levels on PV modules, which demands fast PV array recon- figuration within a few milliseconds. In addition, high-voltage or high-current gate control is required for switches in the reconfigurable vehicular PV array. Therefore, we implement an IGBT (insulated-gate bipolar transistor)-based reconfiguration switch network to meet the above mentioned requirements. We carefully select commercial IGBTs and gate drivers for switches in the reconfig- urable PV array. The selected IGBT IXXK200N65B4 can handle voltage and current ratings of 650 V and 370 A, respectively, which are sufficient ratings for vehicular PV arrays. We select gate driver MC33153 that has a small propagation delay of few hun- dreds nanoseconds to control the IGBTs. The photo-coupler isolation is used between the high-voltage IGBT side and the controller logic side to prevent damage due to power surge. The stability of the IGBT and gate driver selections has been verified using square-wave input voltage on the gate driver. 90 We implement a communication system using the controller area network (CAN). The CAN employs a bus structure for the integration with sensor nodes (if sensors are incorporated). We carefully select ADM3053 as the isolated CAN physical layer transceiver with LM3S2965 as the control processor, which supports hardware layers of CAN communications. 1 Mbps communication speed in transmission makes the trans- mission delay below 1 ms. 7.3.2 Solar Irradiance Sensor Network In order to (i) acquire instantaneous solar irradiance levels on PV modules in the event- driven (with sensors) PV array reconfiguration framework and (ii) acquire benchmark solar irradiance profiles on vehicle panels for evaluating the proposed PV reconfigura- tion frameworks, we build a Zigbee-based solar irradiance sensor node network and a logger program. Zigbee is a wireless network protocol to create personal networks, which is com- monly used for low power and low data-rate applications. We use dual AAA-size bat- teries to supply power for each sensor node without DC-DC converter. Each sensor node is integrated with a Zigbee transceiver module, which automatically reads value from the sensor with its internal ADC and sends it to a receiving node every 50 ms with 250 kbit/s data transmission speed. A specially designed logger program collects sensor data from the receiving node with vehicle speed and location information from GPS. 
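A possible shape of the logger loop described above is sketched here: every 50 ms it pairs the latest irradiance readings from the five panel sensor nodes with GPS speed and position and appends a row to a trace file. read_sensor_nodes and read_gps are hypothetical stand-ins for the Zigbee receiving node and the GPS interface, and the CSV layout is only one convenient choice.

import csv
import time

# Minimal sketch of the benchmark-trace logger: sample the five irradiance sensors
# and the GPS at the 50 ms sensing period and write one CSV row per sample.

PANELS = ["roof", "hood", "trunk", "left", "right"]

def log_trace(read_sensor_nodes, read_gps, path="irradiance_trace.csv",
              period_s=0.05, duration_s=600.0):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["t"] + [f"G_{p}" for p in PANELS] + ["speed", "lat", "lon"])
        t0 = time.time()
        while time.time() - t0 < duration_s:
            g = read_sensor_nodes()                  # dict: panel name -> irradiance (W/m^2)
            speed, lat, lon = read_gps()
            writer.writerow([round(time.time() - t0, 3)] +
                            [g[p] for p in PANELS] + [speed, lat, lon])
            time.sleep(period_s)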
We install magnets to each corner of a sensor node to stick the sensor node to vehicle surface easily and firmly. We attach five sensor nodes at the rooftop, hood, trunk, left side and right side of a vehicle to measure benchmark solar irradiance profiles G roof , G hood ,G trunk ,G left , andG right , respectively. 91 7.3.3 Overhead Analysis To justify the proposed PV array reconfiguration frameworks, we need a thorough anal- ysis of both timing overhead and energy overhead of various components/processes in the reconfiguration system: 1. Sensing: With the above mentioned sensor network, each sensor node senses and converts the solar irradiance data every 50 ms, which is the sensing period. Based on the ADC setup, the sensing delayT sense is less than 10s. 2. Transmission: The transmission delay T trans of the sensor network is no more than 1 ms using CAN transmission protocol. 3. Computation: For a moderate-scale PV array with 60 PV cells, it only takes 3 - 4 ms to calculate the optimal configuration on a 3.0 GHz desktop computer. The computation delayT com should take less than 10 ms on a typical ARM-based embedded processor (as the reconfiguration controller) [84]. 4. Reconfiguration: Our experiments show that the gate drivers and IGBTs can reconfigure within 10 s with only a little distortion of waveform. Therefore, 1 ms should be a safe estimation of the reconfiguration delayT recon . 5. MPPT or MPTT control: The MPPT or MPTT technique used for tracking max- imum output power of the PV array or the PV system has a control delayT control less than 2.5 ms if the perturb & observe-based control is employed. As for the energy overhead, the vehicular PV system has zero output power dur- ing reconfiguration (i.e., changing the ON/OFF states of switches) and has sub-optimal output power during MPPT or MPTT control. 92 7.4 Event-Driven PV Array Reconfiguration In this section, we propose the event-driven PV array reconfiguration framework to over- come the difficulty in determining the optimal reconfiguration period. Inspired by the event-driven power modeling and power management techniques [85, 86], the PV array reconfiguration is triggered only by the event to avoid unnecessary timing and energy overheads involved in the reconfiguration. For example, if the solar irradiance is stable and the driving direction does not change, there is no reconfiguration performed in our event-driven framework (and there is also no reconfiguration required). Once the PV array reconfiguration is triggered, the reconfiguration controller uses the latest sensed solar irradiance levels on the five panels to calculate the optimal configuration (using the PV array reconfiguration algorithm discussed in Section 7.1.3), and subsequently perform reconfiguration and MPPT/MPTT control. The PV array should reconfigure with the solar irradiance changes, and therefore we use the change of the solar irradiance sensor readout as the event, i.e., only if the readout of any sensor has a change larger than G, the PV array reconfiguration will be triggered. Upon triggered by an event, the PV reconfiguration will take about 13.5 ms, i.e., the summation of the computation delay, reconfiguration delay, and MPPT or MPTT control delay as listed in Section 7.3.3, which is the timing overhead. The PV system will have zero output power during reconfiguration and MPPT/MPTT control (please note that this is a conservative estimation). We use the adaptive learning method to derive the optimal G value in an online manner [87]. 
We maintain multiple candidate G values, and choose one with the currently highest performance. After a period of time, we evaluate all the candidate G values and update their performance using an exponential weighting function [88]. Then, the candidate value with the highest performance is chosen as the current G. 93 Please note that to avoid over-frequent reconfiguration, we limit the maximum reconfig- uration frequency by 10 Hz (1=100 ms). Please refer to Algorithm 5 for details. Algorithm 5 Adaptive Learning-Based Event-Driven PV Array Reconfiguration 1: This algorithm is performed at a learning period of lengthD. Please note that this learning period is in the order of minutes or even hours and is much longer than typical time intervals between reconfigurations (in the order of 100ms to seconds.) 2: G the optimal one from the candidates with the highest performance. 3: MaintainG roof (t pre ),G hood (t pre ),G trunk (t pre ),G left (t pre ), andG right (t pre ) values. 4: while in this learning period do 5: Monitoring solar irradiance levels on the five vehicle panels: G roof (t),G hood (t), G trunk (t),G left (t), andG right (t). 6: G roof =jG roof (t)G roof (t pre )j; 7: G hood =jG hood (t)G hood (t pre )j; G trunk =:::; ... 8: ifmaxfG roof ; G hood ; G trunk ; G left ; G right g> G then 9: Execute the PV array reconfiguration algorithm in [81] and perform reconfig- uration. 10: G roof (t pre ) G roof (t);G hood (t pre ) G hood (t); 11: G trunk (t pre ) G trunk (t); ... 12: end if 13: end while 14: Evaluate all the candidate G values and update their performance using an expo- nential weighting function. 7.5 Sensorless PV Array Reconfiguration In this section, we propose the sensorless PV array reconfiguration framework to further reduce the capital cost of an onboard PV system due to the sensor node network (please note that the elimination of solar sensors will help make the vehicle surfaces smooth.) Due to the absence of solar sensors, we propose a solar irradiance estimation algorithm to estimate the instantaneous solar irradiance level on each PV module. According to the PV cell characteristics, the MPP power of a PV cell (or PV module) is proportional to the solar irradiance level on it, and therefore we can infer the solar irradiance level on a PV cell (or PV module) from its measured MPP power. Based on the structural 94 characteristics of the reconfigurable PV array, we can form annm configuration with PV cells from one PV module mounted on one vehicle panel, whereas the rest of the PV cells can be bypassed. Then we perform MPPT control on thenm configuration to measure the MPP power thereby inferring the solar irradiance level on this PV module. Fig. 7.4 shows how to achieve a 24 configuration out of the 16-cell reconfigurable PV array. In this 2 4 configuration, only PV cells 3 - 10 are active and the rest of the PV cells are bypassed. Suppose PV cells 3 - 10 belong to the PV module mounted on the hood panel, and we assume the solar irradiance is uniform on this panel. If we measure the MPP power of these PV cells using MPPT control, the solar irradiance level on these PV cells can be inferred. Since there are only five PV modules (each of which is mounted on a vehicle panel) in the vehicular PV system, we can estimate the solar irradiance levels on these PV modules after five times of reconfiguration and MPPT control. 
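The estimation step can be summarized in a few lines: activate one module, bypass the rest, run MPPT, and scale the measured MPP power by the module's MPP power at standard test conditions, using the proportionality between MPP power and irradiance noted above. In the sketch below, run_mppt_on_module and cells_per_module are assumed to be provided by the reconfiguration controller, and the per-cell STC rating matches the 20 V / 2.25 A cells assumed later in Section 7.6.

# Sketch of the sensorless irradiance estimation: for each panel, form a configuration
# with only that module's cells active, track its MPP, and scale by the STC MPP power.

G_STC = 1000.0                  # W/m^2, standard test condition
P_MPP_CELL_STC = 20.0 * 2.25    # W per cell at G_STC (illustrative, cf. Section 7.6)

def estimate_module_irradiance(run_mppt_on_module, cells_per_module):
    """Estimate the irradiance on each panel from the measured MPP power of its module."""
    estimates = {}
    for panel, n_cells in cells_per_module.items():
        p_mpp = run_mppt_on_module(panel)           # activate this module, bypass the rest, track MPP (W)
        estimates[panel] = G_STC * p_mpp / (n_cells * P_MPP_CELL_STC)
    return estimates

# cells_per_module would follow the per-panel installation areas,
# e.g., {"roof": 13, "hood": 10, "trunk": 4, "left": 11, "right": 11}.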
Therefore, the timing overhead of one reconfiguration in the sensorless PV reconfiguration framework should be calculated as 5 (T recon +T control ) +T com +T recon +T control ; (7.4) which is 31 ms according to the values of delay components listed in Section 7.3.3. The PV system will have zero output power during the whole 31 ms (please note that this is a conservative estimation). In the sensorless PV array reconfiguration framework, the timing and energy over- heads are larger due to the solar irradiance estimation procedure. Therefore, it is ben- eficial to make the sensorless framework also event-driven in order to limit the total number of reconfigurations and thereby total timing and energy overheads. Different from Section 7.4 that uses the change of sensor readout as the event to trigger PV recon- figuration, the sensorless PV array reconfiguration framework uses the change of system 95 Figure 7.4: An illustration of the solar irradiance estimation algorithm. output power (i.e., the output current of the charger) as the event because only the PV system output power is available to be measured at runtime in the sensorless setup. The adaptive learning method is also employed to derive the optimal P value in an online manner. Details are illustrated in Algorithm 6. 7.6 Experimental Results In this section, we justify the two proposed frameworks using the measured solar irradi- ance traces from the Zigbee-based solar irradiance sensor node network and the logger program. We drive a vehicle along six paths to collect Traces 1 - 6: Incheon airport, Ontario to Riverside, west Los Angeles to Indio, west Los Angeles to Carson, west Los Angeles to Riverside, and Riverside. Each trace consists of five solar irradiance profiles on the rooftop, hood, trunk, left door and right door, respectively. 96 Algorithm 6 Adaptive Learning-Based Sensorless PV Array Reconfiguration 1: This algorithm is performed at a learning period of lengthD. 2: P the optimal one from the candidates with the highest performance. 3: MaintainP out (t pre ) corresponding to the PV system output power after the previous reconfiguration. 4: while in this learning period do 5: Monitoring the PV system output powerP out (t). 6: ifjP out (t)P out (t pre )j> P then 7: for each PV cell module do 8: Activate the PV cells in the module while disabling the rest PV cells in the array. 9: Perform MPPT control to obtain the MPP and estimate the solar irradiance level on this PV module. 10: end for 11: Execute the PV array reconfiguration algorithm in [81] and perform reconfig- uration. 12: Measure the currentP out (t) and setP out (t pre ) P out (t); 13: end if 14: Evaluate all the candidate P values and update their performance using an exponential weighting function. 15: end while We measure a mid-size family sedan Renault-Samsung NEW-SM5 car and observe the following area parameters: roof: 1.99 m 2 (1.274 m by 1.565 m); hood: 1.6 m 2 (1.024 m by 1.565 m), trunk: 0.63 m 2 (0.400 m by 1.565 m), left and right door: 1.7 m 2 each (0.616 m by 2.760 m). These values are the available installation area for each PV module. We assume fixed-size PV cells with 0.15 m 2 area each, 20 V MPP voltage, and 2.25 A MPP current atG = 1000 W/m 2 . We assume 200 V terminal voltage of the EV battery pack. We consider a realistic PV charger model with efficiency variations [82]. We also consider a baseline setup, which has the same PV modules as in the proposed frameworks but without PV reconfiguration. 
Table 7.1 compares the average system output power values from the event-driven (and sensor-based) framework, the sensorless framework, and the baseline under all the 97 Table 7.1: Average system output power (W). Trace 1 Trace 2 Trace 3 Trace 4 Trace 5 Trace 6 Event-Driven 1055.1 524.1 849.9 808.0 475.9 1037.6 Sensorless 1034.6 514.9 822.4 760.6 461.2 1004.5 Baseline 687.8 329.1 535.9 479.7 166.7 576.8 six driving traces. Both the event-driven framework and the sensorless framework out- perform the baseline significantly, demonstrating the effectiveness of the even-driven PV array reconfiguration and the solar irradiance estimation algorithm. Specifically, comparing with the baseline, the event-driven framework achieves up to 2.85X perfor- mance enhancement, and the sensorless framework achieves up to 2.77X performance enhancement. From the output power values in Table 7.1, we can also observe that the even-driven (and sensor-based) framework achieves higher average system output power than the sensorless framework, which is due to the following reasons: (i) the sensorless frame- work has higher timing and energy overheads due to the solar irradiance estimation process, and (ii) in the sensorless framework, we can only use the system output power change rather than the solar irradiance change as the event to trigger reconfiguration. Fig. 7.5 shows the output power profiles of the event-driven framework, the sen- sorless framework, and the baseline under (a) Trace 5 and (b) Trace 6, which again demonstrates that the proposed two frameworks outperform the baseline and the event- driven framework achieves slightly higher output power than the sensorless framework. Please note that on the output power profiles of the event-driven and sensorless frame- works, there are some sudden power drops, which are the energy overhead of the PV reconfiguration. The power values in Fig. 7.5 are actually the average output power in a 50 ms time window. The amplitude of power drop in the sensorless framework is larger than that in the event-driven framework due to the energy overhead involved in the solar irradiance estimation process. However, the overall performance of the sensorless 98 Figure 7.5: Output power profiles of the event-driven, sensorless, and baseline frame- works. framework is quite close to the event-driven (and sensor-based) framework as shown in Table 7.1 due to the relatively infrequent reconfigurations. Fig. 7.6 compares the reconfiguration energy overhead and average output power of the event-driven framework and the periodic framework under Traces 1 - 4. We use reconfiguration periods 0.1 s, 0.2 s and 0.5 s in the periodic framework. The reconfigu- ration period of 0.5 s is the optimal one in terms of the overall system performance. We can observe that the event-driven framework reduces the energy overhead by around 50 % compared to reconfiguration with a period of 0.5 s. The energy overhead reduction implies fewer reconfiguration times and therefore prolonged system lifespan. Compared to the reconfiguration with a period of 0.1 s, the energy overhead of the event-driven framework is even negligible. Furthermore, the average output power of the event- driven framework is the highest. 99 Figure 7.6: Energy overhead and average output power comparisons of periodic and event-driven frameworks. 100 Chapter 8 Optimal Pricing Policy for Aggregators in the Smart Grid Traditional power grids are usually utilized to deliver electricity from central generators to a large number of users. 
On the contrary, in this work we consider the smart grid infrastructure, where an automated and distributed energy delivery network is estab- lished between the electricity suppliers and users [89]. Between the electricity suppliers and users, aggregators are incorporated to reduce the amount of computation and com- munication overheads associated with the direct interaction between them. An aggrega- tor coordinates the electricity consumption of (a group of) users by setting the electricity price in response to the imbalance between supply (the energy purchased from suppli- ers) and demand (the energy consumption of all users associated with the aggregator). For better reliability of the smart grid system, the aggregator employs a battery energy storage system (BESS) to buffer the mismatch between supply and demand. Each user (a residential user or an EV/PHEV user) is equipped with a software agent that sched- ules the user’s energy consumption to pursue its best interest. The system structure is shown in Figure 8.1. In this work [90], we aim at maximizing the overall profit of an aggregator in a billing period by designing a real-time pricing policy. The aggregator pre-announces a pricing policy for an entire billing period, then in each time interval of the billing period, the electricity users (both residential and EV/PHEV users) try to maximize their own utility functions based on the pricing model in the current time interval and the 101 Supplier Aggregator Residential User Supplier Supplier PHEV User EV User Storage Residential User Figure 8.1: The system structure. awareness of the other users’ behaviors. First, we formulate a nested two-stage game between the aggregator and the users for each time interval in the billing period. The aggregator is the leader and the users are the followers. In the first stage of the game, the aggregator provides a pricing model (including the price coefficient and the amount of energy purchased from the suppliers) for this time interval. In the second state of the game, the users maximize their own utility functions based on the current pricing model and the awareness of the other users. We find the unique Nash equilibrium in the second stage of the game, which is the subgame perfect equilibrium in the nested two-stage game. 102 Then we look at the original problem: how the aggregator determines the pricing policy for an entire billing period such that its overall profit can be maximized. We use a dynamic programming algorithm to derive the optimal real-time pricing policy for maximizing the aggragator’s overall profit, based on backward induction. Different from other works, we integrate a BESS with the aggregator to buffer the mismatch between supply and demand. More importantly, we derive the optimal pricing policy for an aggregator from a global point of view, taking into account the BESS energy state variation in a billing period. In this chapter, we proceed by discussing the system models including the pricing model, the overall profit of the aggregator in a billing period, the BESS stored energy, and the utility functions of users. We then propose the formulation of a nested two-stage game for each time interval of a billing period. We derive the optimal pricing policy for maximizing the aggregator’s overall profit using a dynamic programming algorithm based on backward induction. Simulation results are given to justify the effectiveness of the optimization procedure. 8.1 System Model We present the system model in this section. 
Let us consider a smart grid infrastructure as shown in Figure 8.1 and focus on the profit maximization of the aggregator. Between the electricity suppliers and users, the aggregator plays an important role in the system. It purchases electricity from suppliers and sells electricity to users. The aggregator can coordinate the electricity consumption of the users by setting the electricity price in response to the imbalance between supply (the energy purchased from suppliers) and demand (the energy consumption of all users). For better reliability of the smart grid system, the aggregator employs a BESS to buffer the mismatch between supply and demand. The users can be categorized into two types: the residential users and the EV/PHEV users. The residential users can only buy electricity from the aggregator, whereas the EV/PHEV users can both buy and sell back electricity. The users determine their energy consumption according to the real-time electricity price set by the aggregator.

8.1.1 Aggregator

We divide the entire billing period T (e.g., one day) into K discrete time intervals, each with length ΔT = T/K. Each time interval is indexed by an integer time index k. The aggregator purchases a total amount of D^0_k electrical energy from suppliers at the price of p^0_k in time interval k. The aggregator determines D^0_k for the entire billing period (for 1 ≤ k ≤ K), whereas p^0_k is set and pre-announced by the suppliers. Let D_k denote the total amount of energy consumption from all users in time interval k. In order to minimize the mismatch between D^0_k and D_k, the aggregator employs a pricing model as

p^s_k = p_{b1} + α_k (D_k - D^0_k),    (8.1)
p^b_k = p_{b2} + α_k (D_k - D^0_k),

where p^s_k is the price of selling energy to the users, p^b_k is the price of buying energy from EV/PHEV users, p_{b1} and p_{b2} are the base prices, and α_k is a positive coefficient. Deriving a pricing policy for an entire billing period is equivalent to determining the values of α_k and D^0_k for 1 ≤ k ≤ K. As can be observed in (8.1), when D_k > D^0_k, p^s_k and p^b_k become higher than the base prices, which discourages all users from consuming more energy and encourages EV/PHEV users to sell more energy back. On the other hand, when D_k < D^0_k, this pricing model encourages users to consume more energy and to sell back less energy.

Let D^neg_k ≤ 0 denote the total amount of negative energy consumption from all users in time interval k. Then, |D^neg_k| is the total amount of energy the aggregator buys from EV/PHEV users. The overall profit of the aggregator in a billing period is calculated as

profit = Σ_{k=1}^{K} [ (D_k - D^neg_k) p^s_k + D^neg_k p^b_k - D^0_k p^0_k ].    (8.2)

For better reliability of the smart grid system, the aggregator employs a BESS to buffer the mismatch between supply and demand. Assume that the BESS has a maximum energy capacity of E_full. Let E_k denote the energy stored in the BESS at the end of time interval k; then E_k can be derived by

E_k = E_ini + Σ_{t=1}^{k} (D^0_t - D_t),    (8.3)

where E_ini denotes the energy stored in the BESS at the beginning of the billing period. Please note that 0 ≤ E_k ≤ E_full should be satisfied for 1 ≤ k ≤ K.

8.1.2 Residential Users and EV/PHEV Users

Assume there are N users in the system. Let d^i_k denote the energy demand of user i in time interval k. If d^i_k ≥ 0, user i consumes energy purchased from the aggregator. If d^i_k < 0, user i provides energy to the aggregator. The negative energy demand only applies to EV/PHEV users.
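Before aggregating the individual demands, the aggregator-side relations (8.1)-(8.3) introduced above can be made concrete with the following sketch, which computes the two prices, one summand of the profit in (8.2), and the BESS update for a single time interval. The base prices, α_k, and the energy figures are illustrative numbers only.

# Illustrative implementation of (8.1)-(8.3) for one time interval k
# (energies in kWh, prices in $/kWh; all numbers are made up for the example).

def interval_prices(D_k, D0_k, alpha_k, p_b1, p_b2):
    p_s = p_b1 + alpha_k * (D_k - D0_k)      # selling price to users, (8.1)
    p_b = p_b2 + alpha_k * (D_k - D0_k)      # buying price from EV/PHEV users, (8.1)
    return p_s, p_b

def interval_profit(D_k, D_neg_k, D0_k, p_s, p_b, p0_k):
    # one summand of (8.2): revenue from consumption + payment for sell-back - wholesale cost
    return (D_k - D_neg_k) * p_s + D_neg_k * p_b - D0_k * p0_k

def bess_update(E_prev, D0_k, D_k, E_full):
    E_k = E_prev + (D0_k - D_k)              # recursion equivalent to (8.3)
    assert 0.0 <= E_k <= E_full, "BESS energy constraint violated"
    return E_k

p_s, p_b = interval_prices(D_k=95.0, D0_k=100.0, alpha_k=0.002, p_b1=0.15, p_b2=0.10)
print(interval_profit(95.0, -5.0, 100.0, p_s, p_b, p0_k=0.08))
print(bess_update(E_prev=20.0, D0_k=100.0, D_k=95.0, E_full=50.0))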
Then D_k, defined in Section 8.1.1, can be calculated as

D_k = Σ_{i=1}^{N} d^i_k.    (8.4)

And D^neg_k, defined previously, can be obtained by

D^neg_k = Σ_{i=1}^{N} d^i_k · I[d^i_k < 0],    (8.5)

where I[·] is an indicator function that equals 1 if the input Boolean variable is true, and equals 0 otherwise.

At any time interval, a residential user wants to minimize the cost of purchasing electricity from the aggregator and to maximize its own satisfaction level. As a combination of these two objectives, a residential user in a time interval maximizes a utility function of the following form:

UR^i_k = -b^i_k (d^i_k)^2 + c^i_k d^i_k - p^s_k(d^i_k, d^{-i}_k) · d^i_k,    (8.6)

where d^{-i}_k denotes the energy demands of all the other users except for user i. p^s_k(d^i_k, d^{-i}_k) is a function of both d^i_k and d^{-i}_k because it is an increasing function of the total demand D_k as given by (8.4). b^i_k and c^i_k in (8.6) are positive coefficients, and they may be different for different time intervals and different users. The utility function is a concave function. In any time interval, a residential user maximizes its utility function UR^i_k by finding the most desirable d^i_k value, which depends on b^i_k, c^i_k, and p^s_k(d^i_k, d^{-i}_k). Because residential users can only buy electricity from the aggregator, d^i_k ≥ 0 is a constraint that must be satisfied when finding the most desirable d^i_k value.

On the other hand, an EV/PHEV user can buy energy from the aggregator at the price of p^s_k as well as sell energy from its battery energy storage bank to the aggregator at the price of p^b_k. We define the following utility function for an EV/PHEV user:

UEV^i_k = f^i_k · √(g^i_k + d^i_k) - p · d^i_k,    (8.7)

where

p = p^s_k(d^i_k, d^{-i}_k) if d^i_k ≥ 0, and p = p^b_k(d^i_k, d^{-i}_k) if d^i_k < 0.    (8.8)

f^i_k and g^i_k are coefficients for user i in time interval k. The first term in (8.7) is a concave function of the energy level (g^i_k + d^i_k) at the end of time interval k, and the second term in (8.7) represents the cost of charging or the revenue of discharging. Each EV/PHEV user maximizes its utility function UEV^i_k by determining the most desirable d^i_k value according to the related coefficients and the behaviors of the other users.

8.2 Problem Formulation and Algorithm

The aggregator tries to find the optimal real-time pricing policy (i.e., α_k and D^0_k) over an entire billing period (for 1 ≤ k ≤ K), such that its overall profit in the billing period, given by (8.2), can be maximized. In order to derive the optimal pricing policy, we should understand the behavior of the users in response to a pricing model in a time interval. Therefore, we formulate a nested two-stage game between the aggregator and the users for each time interval in a billing period. The game is described as follows. For each time interval k:

Stage I: The aggregator provides the total amount of energy purchased from suppliers (i.e., D^0_k) as well as the price coefficient (i.e., α_k) to the users.

Stage II: The users maximize their own utility functions by determining their own energy demands d^i_k (for 1 ≤ i ≤ N) with the information provided by the aggregator and the awareness of the other users' behaviors.

We can find the unique Nash equilibrium in Stage II, which is the subgame perfect equilibrium in the nested two-stage game. Then, we propose a dynamic programming
8.2 Problem Formulation and Algorithm

The aggregator tries to find the optimal real-time pricing policy (i.e., $\alpha_k$ and $D^0_k$) over an entire billing period (for $1 \leq k \leq K$), such that its overall profit in the billing period, given by (8.2), is maximized. In order to derive the optimal pricing policy, we should understand the behaviors of the users in response to a pricing model in a time interval. Therefore, we formulate a nested two-stage game between the aggregator and the users for each time interval in a billing period. The game is described as follows. For each time interval $k$:

Stage I: The aggregator provides the total amount of energy purchased from suppliers (i.e., $D^0_k$) as well as the price coefficient (i.e., $\alpha_k$) to the users.

Stage II: The users maximize their own utility functions by determining their own energy demands $d^i_k$ (for $1 \leq i \leq N$) with the information provided by the aggregator and the awareness of the other users' behaviors.

We can find the unique Nash equilibrium in Stage II, which is the subgame perfect equilibrium of the nested two-stage game. Then, we propose a dynamic programming algorithm to derive the optimal real-time pricing policy in a billing period. The algorithm is based on the backward induction method, which encapsulates the sequential rationality of decision makers and is a powerful technique for obtaining the best strategies for the players in each stage of the nested game [91]. We discuss the optimization procedure in detail in what follows.

8.2.1 Game-Theoretic Optimization in Stage II

In Stage II of the nested game, considering time interval $k$, each user maximizes its own utility function (i.e., $UR^i_k$ or $UEV^i_k$) by setting its energy demand $d^i_k$ to a desirable value, according to $\alpha_k$ and $D^0_k$ (which have been provided by the aggregator in Stage I) and the awareness of the other users' behaviors. The total energy demand of all users, i.e., $D_k$, affects the energy prices $p^s_k$ and $p^b_k$ through (8.1), thereby affecting the utility function of user $i$ as given by (8.6) or (8.7). Therefore, the interactions among the users form a normal-form game, in which all users take actions simultaneously. We name this game the User Energy Demand Optimization (UEDO) game, which is a subgame of the nested two-stage game.

The Nash equilibrium of a normal-form game is the optimal strategy profile for all players in the sense that no player can find a better strategy (i.e., $d^i_k$ value) by deviating from its current strategy unilaterally. In other words, no player (residential or EV/PHEV user) has an incentive to leave its strategy at the Nash equilibrium. The Nash equilibrium of the UEDO game is the subgame perfect equilibrium of the nested two-stage game. Now we prove the existence and uniqueness of the Nash equilibrium of the UEDO game.

Theorem I: The Nash equilibrium of the UEDO game exists and is unique.

Proof: According to the user utility functions, given by (8.6) and (8.7), we are essentially maximizing a strictly concave utility function for each player on a closed convex set. Therefore, from the first and third theorems in [92], the existence and uniqueness of the Nash equilibrium follow.

Algorithm 7: Find the Nash equilibrium of the UEDO game.
Input: the total amount of energy purchased from suppliers, i.e., $D^0_k$, and the price coefficient, i.e., $\alpha_k$.
Output: the optimal energy demands of all users in time interval $k$, i.e., the optimal $d^i_k$ ($1 \leq i \leq N$).
1: Initialize the energy demands of all users, i.e., $d^i_k$ for $1 \leq i \leq N$.
2: repeat
3:   for each $1 \leq i \leq N$ do
4:     Find the optimal $d^i_k$ value for user $i$ to maximize its utility function, i.e., (8.6) for residential users or (8.7) for EV/PHEV users, assuming that the energy demands of the other users are given.
5:     Update the $d^i_k$ value for user $i$.
6:   end for
7: until the solution converges

The unique Nash equilibrium can be found using standard convex optimization techniques [93], as described in Algorithm 7. Since $\alpha_k$ and $D^0_k$ are provided by the aggregator in Stage I, we can derive different Nash equilibria for different $\alpha_k$ and $D^0_k$ values by applying Algorithm 7. We define two matrices $M_{D_k}$ and $M_{D^{neg}_k}$, where each entry $M_{D_k}(\alpha_k, D^0_k)$ or $M_{D^{neg}_k}(\alpha_k, D^0_k)$ denotes the total energy demand $D_k$ or the total negative energy demand $D^{neg}_k$ in the Nash equilibrium obtained from Algorithm 7 with the given $\alpha_k$ and $D^0_k$ values. These matrices are used later to reduce the computational complexity of the dynamic programming algorithm. Please note that for each time interval $k$ ($1 \leq k \leq K$) we build two matrices $M_{D_k}$ and $M_{D^{neg}_k}$, since the coefficients $b^i_k$, $c^i_k$, $f^i_k$, and $g^i_k$ differ for different $k$ values.
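The sketch below is one possible Python rendering of Algorithm 7: each user in turn plays a best response, found here by a simple grid search rather than by an exact convex solver, until the demand profile stops changing. The user encoding, search grid, and convergence tolerance are illustrative assumptions.

```python
import math

def utility(d, user, D_other, D0_k, alpha_k, pb1, pb2):
    """Utility of one user: (8.6) for residential, (8.7)-(8.8) for EV/PHEV."""
    if user["type"] == "res":                      # residential user, d >= 0
        p_s = pb1 + alpha_k * (D_other + d - D0_k)
        return -user["b"] * d**2 + user["c"] * d - p_s * d
    base = pb1 if d >= 0 else pb2                  # EV/PHEV: buy vs. sell-back price
    p = base + alpha_k * (D_other + d - D0_k)
    return user["f"] * math.sqrt(user["g"] + d) - p * d

def best_response(user, D_other, D0_k, alpha_k, pb1, pb2):
    """Grid-search best response for one user, others' demand held fixed."""
    lo = 0.0 if user["type"] == "res" else -user["g"]   # keep sqrt argument >= 0
    grid = [lo + j * (100.0 - lo) / 400 for j in range(401)]
    return max(grid, key=lambda d: utility(d, user, D_other, D0_k, alpha_k, pb1, pb2))

def uedo_nash(users, D0_k, alpha_k, pb1=8.0, pb2=8.0, tol=1e-3, max_iter=100):
    """Algorithm 7: iterate best responses until the demand profile converges."""
    d = [0.0] * len(users)
    for _ in range(max_iter):
        change = 0.0
        for i, user in enumerate(users):
            D_other = sum(d) - d[i]
            new_d = best_response(user, D_other, D0_k, alpha_k, pb1, pb2)
            change = max(change, abs(new_d - d[i]))
            d[i] = new_d
        if change < tol:
            break
    return d

# Two residential users and one EV/PHEV user (toy coefficients).
users = [{"type": "res", "b": 2.5, "c": 200.0},
         {"type": "res", "b": 2.8, "c": 180.0},
         {"type": "ev", "f": 300.0, "g": 100.0}]
print(uedo_nash(users, D0_k=150.0, alpha_k=0.2))
```

In a faithful implementation, step 4 of Algorithm 7 would instead solve each user's concave maximization exactly, e.g., with a convex optimization package; the grid search here is only a self-contained stand-in.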
8.2.2 Aggregator Overall Profit Optimization

Based on the analysis of user behaviors in response to a pricing model in one time interval, we are now ready to derive the optimal pricing policy over an entire billing period such that the aggregator's overall profit is maximized.

Consider the general problem of maximizing the profit made by the aggregator in the first $k$ time intervals, such that by the end of time interval $k$ the BESS stored energy is $E'$. We call this the $(E', k)$ problem. If we solve the $(E', k = K)$ problems for all possible $E'$ values, we can find the solution to our original problem by picking the one with the maximum profit among all $(E', k = K)$ problems. The $(E', k)$ problem exhibits the optimal substructure property stated below, which implies the applicability of a dynamic programming algorithm.

Optimal substructure property: Suppose that the $(E', k)$ problem has been optimally solved, and that in this solution the BESS stored energy is $E''$ by the end of time interval $k-1$. Then the optimal solution to the $(E', k)$ problem contains within it the optimal solution to the $(E'', k-1)$ problem.

To solve the original optimal pricing policy problem, we maintain matrices $Prof$, $Alpha$, and $M_{D^0}$. The entry $Prof(j, k)$ stores the maximum profit made by the aggregator in the first $k$ time intervals of a billing period when the energy stored in the BESS is $j \cdot \Delta E$ by the end of time interval $k$, where $\Delta E = E_{full}/M$ and $j$ is an integer between 0 and $M$. The $Prof(j, k)$ value is obtained by solving the $(j \cdot \Delta E, k)$ problem. In effect, we discretize the full range of the BESS energy state, i.e., $[0, E_{full}]$, into a set of $M+1$ values, i.e., $\{0, \Delta E, 2\Delta E, \ldots, M\Delta E\}$. The aggregator's profit made in the first $k$ time intervals can be calculated based on (8.2) as

\[ \text{prof}_k = \sum_{t=1}^{k} \left[ (D_t - D^{neg}_t)\, p^s_t + D^{neg}_t\, p^b_t - D^0_t\, p^0_t \right]. \tag{8.9} \]

$Alpha(j, k)$ and $M_{D^0}(j, k)$ store the corresponding optimal $\alpha_k$ and $D^0_k$ values, respectively, that result in $Prof(j, k)$. In the dynamic programming algorithm, the matrix $Prof$ is filled column by column (i.e., from $k = 1$ to $k = K$), and $Alpha$ and $M_{D^0}$ are filled together with $Prof$. When the whole matrices have been calculated, the last column of $Prof$ stores the maximum profits made by the aggregator in the entire billing period when the BESS ends up at different energy levels. We pick the maximum entry of $Prof(\cdot, K)$ as the maximum overall profit of the aggregator, and the corresponding pricing policy is obtained by tracing back [92].

Now let us look at how to derive the values of $Prof(j, k)$, $Alpha(j, k)$, and $M_{D^0}(j, k)$, i.e., how to solve the $(j \cdot \Delta E, k)$ problem. When we calculate $Prof(j, k)$, the values of $Prof(\cdot, k-1)$ are already known. Therefore, we only need to find the optimal $\alpha_k$ and $D^0_k$ values in time interval $k$. We provide more details in the following. When $\alpha_k$ and $D^0_k$ are given, the profit made by the aggregator in the first $k$ time intervals is calculated by

\[ Prof(j', k-1) + (D_k - D^{neg}_k)\, p^s_k + D^{neg}_k\, p^b_k - D^0_k\, p^0_k, \tag{8.10} \]

where $D_k = M_{D_k}(\alpha_k, D^0_k)$, $D^{neg}_k = M_{D^{neg}_k}(\alpha_k, D^0_k)$, $p^s_k$ and $p^b_k$ are obtained from (8.1), and $j'$ is determined by solving

\[ (j - j')\, \Delta E = D^0_k - D_k. \tag{8.11} \]

Equation (8.11) states that the change of the BESS stored energy from $j' \Delta E$ to $j \Delta E$ in time interval $k$ is due to the mismatch between the energy purchased from suppliers, i.e., $D^0_k$, and the energy demand of all users, i.e., $D_k$. We need to find the optimal $\alpha_k$ and $D^0_k$ values, i.e., $\alpha^{opt}_k$ and $D^{0,opt}_k$, such that (8.10) is maximized. Then we set $Prof(j, k)$ to the maximum value of (8.10), and set $Alpha(j, k)$ and $M_{D^0}(j, k)$ to $\alpha^{opt}_k$ and $D^{0,opt}_k$, respectively.
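A compact Python sketch of this table-filling recursion is shown below; the formal procedure is then summarized in Algorithm 8. Here `equilibrium(k, alpha, D0)` stands in for a lookup into the precomputed matrices $M_{D_k}$ and $M_{D^{neg}_k}$ (for instance, it could wrap the best-response iteration sketched above), and the candidate grids for $\alpha_k$ and $D^0_k$ as well as all helper names are illustrative assumptions.

```python
# Sketch of the dynamic programming recursion (8.10)-(8.11).
# Prof[j][k] = best profit over the first k intervals with BESS level j*dE at the end of k.

NEG_INF = float("-inf")

def optimal_pricing(K, M, dE, E_ini, p0, alpha_grid, D0_grid, equilibrium,
                    pb1=8.0, pb2=8.0):
    j_ini = round(E_ini / dE)
    Prof = [[NEG_INF] * (K + 1) for _ in range(M + 1)]
    Prof[j_ini][0] = 0.0
    choice = {}                                        # (j, k) -> (alpha, D0, j_prev)

    for k in range(1, K + 1):
        for j in range(M + 1):
            for alpha in alpha_grid:
                for D0 in D0_grid:
                    D, D_neg = equilibrium(k, alpha, D0)   # Nash-equilibrium demands
                    j_prev = j - round((D0 - D) / dE)      # from (8.11), rounded to the grid
                    if not (0 <= j_prev <= M) or Prof[j_prev][k - 1] == NEG_INF:
                        continue
                    ps = pb1 + alpha * (D - D0)            # prices from (8.1)
                    pb = pb2 + alpha * (D - D0)
                    cand = (Prof[j_prev][k - 1]
                            + (D - D_neg) * ps + D_neg * pb - D0 * p0[k - 1])
                    if cand > Prof[j][k]:
                        Prof[j][k] = cand
                        choice[(j, k)] = (alpha, D0, j_prev)

    # Best final BESS level, then trace back the pricing policy.
    j = max(range(M + 1), key=lambda jj: Prof[jj][K])
    policy = []
    for k in range(K, 0, -1):
        alpha, D0, j = choice[(j, k)]
        policy.append((alpha, D0))
    return Prof, list(reversed(policy))
```

Storing the maximizing $(\alpha_k, D^0_k, j')$ triple per table cell replaces the separate $Alpha$ and $M_{D^0}$ matrices used in the text; this is only a representational choice in the sketch.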
Algorithm 8 describes the dynamic programming algorithm for maximizing the aggregator's overall profit in a billing period.

Algorithm 8: Maximize the aggregator's overall profit.
Input: matrices $M_{D_k}$ ($1 \leq k \leq K$), matrices $M_{D^{neg}_k}$ ($1 \leq k \leq K$), and the BESS stored energy at the beginning of the billing period, i.e., $E_{ini}$.
Output: the optimal pricing policy in the billing period, i.e., the optimal $\alpha_k$ and $D^0_k$ for $1 \leq k \leq K$.
1: Initialize $Prof(\cdot, 0)$ to $-\infty$.
2: Initialize $Prof(j = E_{ini}/\Delta E,\, 0)$ to 0.
3: for $k$ from 1 to $K$ do
4:   for $j$ from 1 to $M$ do
5:     Find the $\alpha^{opt}_k$ and $D^{0,opt}_k$ values such that (8.10) is maximized.
6:     Set $Prof(j, k)$ to the maximum value of (8.10), and set $Alpha(j, k)$ and $M_{D^0}(j, k)$ to $\alpha^{opt}_k$ and $D^{0,opt}_k$, respectively.
7:   end for
8: end for
9: Perform tracing back to find the optimal $\alpha_k$ and $D^0_k$ for $1 \leq k \leq K$.

8.3 Experimental Results

In this section, we demonstrate experimental results of the optimal pricing policy for maximizing the aggregator's overall profit in a billing period. We compare the optimal pricing policy with a baseline pricing policy. The baseline pricing policy differs from the optimal pricing policy in that it finds the $\alpha_k$ and $D^0_k$ values for each time interval $k$ such that the mismatch between $D^0_k$ and $D_k$ is zero and the profit made by the aggregator in each time interval is maximized.

In our simulation, we consider a group of 20 users associated with an aggregator. Among all the users, 10 are residential users and 10 are EV/PHEV users. We consider a billing period of one day consisting of 12 time intervals, each with a length of 2 hours. For the residential users, $b^i_k$ is set to a randomized value between 2 and 3, and $c^i_k$ is set to a randomized value between 175 and 225. For the EV/PHEV users, $f^i_k$ is set to a randomized value between 10 and 600, and $g^i_k$ is set to a randomized value between 35 and 200. The maximum energy capacity of the BESS (i.e., $E_{full}$) is 100, and the initial energy stored in the BESS at the beginning of the billing period (i.e., $E_{ini}$) is 0. The base prices (i.e., $p^b_1$ and $p^b_2$) are set to 8. The range of $\alpha_k$ is set to [0.10, 0.30], and the range of $D^0_k$ is set to [100, 800]. We use three sets of $p^0_k$ values for $1 \leq k \leq 12$, as shown in Figure 8.2.

Figure 8.2: Three sets of $p^0_k$ values (price of energy from suppliers versus time of day).

We derive the optimal pricing policy with the different sets of $p^0_k$ values using our proposed dynamic programming algorithm. As a result of the optimal pricing policy, we plot the BESS energy as a function of time for the different sets of $p^0_k$ values in Figure 8.3. We can observe that the BESS energy starts from 0, reaches a peak value around noon, and finally goes back to 0. The mismatch between supply (energy from suppliers) and demand (consumption of users) is entirely buffered by the BESS.

Figure 8.3: BESS stored energy as a function of time.

Figure 8.4: Overall profit of the aggregator from the optimal pricing policy and the baseline pricing policy.

The comparison between the optimal pricing policy and the baseline pricing policy is shown in Figure 8.4. We can observe that the optimal pricing policy always achieves higher overall profit than the baseline pricing policy.
The improvements achieved by the optimal pricing policy are 19.5%, 24.3%, and 16.4%, respectively, for the different sets of $p^0_k$ values.

Reference List

[1] ADVISOR 2003 documentation. National Renewable Energy Lab.
[2] F. R. Salmasi, "Control strategies for hybrid electric vehicles: Evolution, classification, comparison, and future trends," Vehicular Technology, IEEE Transactions on, vol. 56, no. 5, pp. 2393-2404, 2007.
[3] V. Shah, R. Chaudhari, P. Kundu, and R. Maheshwari, "Performance analysis of hybrid energy storage system using hybrid control algorithm with BLDC motor driving a vehicle," in Power Electronics, Drives and Energy Systems (PEDES) & 2010 Power India, 2010 Joint International Conference on, pp. 1-5, IEEE, 2010.
[4] C. Chan, "The state of the art of electric, hybrid, and fuel cell vehicles," Proceedings of the IEEE, vol. 95, no. 4, pp. 704-718, 2007.
[5] M. Ehsani, Y. Gao, S. E. Gay, and A. Emadi, "Modern electric, hybrid electric and fuel cell vehicles," Fundamentals, Theory, and Design. Boca Raton, FL: CRC, 2005.
[6] S. Lukic, "Charging ahead," Industrial Electronics Magazine, IEEE, vol. 2, no. 4, pp. 22-31, 2008.
[7] C.-C. Chan, A. Bouscayrol, and K. Chen, "Electric, hybrid, and fuel-cell vehicles: Architectures and modeling," Vehicular Technology, IEEE Transactions on, vol. 59, no. 2, pp. 589-598, 2010.
[8] M. Ahman, "Assessing the future competitiveness of alternative powertrains," International Journal of Vehicle Design, vol. 33, no. 4, pp. 309-331, 2003.
[9] A. Emadi, K. Rajashekara, S. S. Williamson, and S. M. Lukic, "Topological overview of hybrid electric and fuel cell vehicular power system architectures and configurations," Vehicular Technology, IEEE Transactions on, vol. 54, no. 3, pp. 763-770, 2005.
[10] P. Tulpule, V. Marano, and G. Rizzoni, "Effects of different PHEV control strategies on vehicle performance," in American Control Conference, 2009. ACC'09., pp. 3950-3955, IEEE, 2009.
[11] M. Ceraolo, A. di Donato, and G. Franceschi, "A general approach to energy optimization of hybrid electric vehicles," Vehicular Technology, IEEE Transactions on, vol. 57, no. 3, pp. 1433-1441, 2008.
[12] M. A. Masrur, "Penalty for fuel economy-system level perspectives on the reliability of hybrid electric vehicles during normal and graceful degradation operation," Systems Journal, IEEE, vol. 2, no. 4, pp. 476-483, 2008.
[13] A. Sciarretta and L. Guzzella, "Control of hybrid electric vehicles," Control Systems, IEEE, vol. 27, no. 2, pp. 60-70, 2007.
[14] E. Faggioli, P. Rena, V. Danel, X. Andrieu, R. Mallant, and H. Kahlen, "Supercapacitors for the energy management of electric vehicles," Journal of Power Sources, vol. 84, no. 2, pp. 261-269, 1999.
[15] R. Carter and A. Cruden, "Strategies for control of a battery/supercapacitor system in an electric vehicle," in Power Electronics, Electrical Drives, Automation and Motion, 2008. SPEEDAM 2008. International Symposium on, pp. 727-732, IEEE, 2008.
[16] M. Ortúzar, J. Moreno, and J. Dixon, "Ultracapacitor-based auxiliary energy system for an electric vehicle: Implementation and evaluation," Industrial Electronics, IEEE Transactions on, vol. 54, no. 4, pp. 2147-2156, 2007.
[17] J. Cao and A. Emadi, "A new battery/ultracapacitor hybrid energy storage system for electric, hybrid, and plug-in hybrid electric vehicles," Power Electronics, IEEE Transactions on, vol. 27, no. 1, pp. 122-132, 2012.
[18] S. Park, Y. Kim, and N.
Chang, “Hybrid energy storage systems and battery man- agement for electric vehicles,” in Design Automation Conference (DAC), 2013 50th ACM/EDAC/IEEE, pp. 1–6, IEEE, 2013. [19] F. R. Salmasi, “Designing control strategies for hybrid electric vehicles,” Tutorial Presentation in EuroPes, pp. 15–17, 2005. [20] H. Banvait, S. Anwar, and Y . Chen, “A rule-based energy management strat- egy for plug-in hybrid electric vehicle (phev),” in American Control Conference, 2009. ACC’09., pp. 3938–3943, IEEE, 2009. [21] B. M. Baumann, G. Washington, B. C. Glenn, and G. Rizzoni, “Mechatronic design and control of hybrid electric vehicles,” Mechatronics, IEEE/ASME Trans- actions on, vol. 5, no. 1, pp. 58–72, 2000. 117 [22] H.-D. Lee, E.-S. Koo, S.-K. Sul, J.-S. Kim, M. Kamiya, H. Ikeda, S. Shinohara, and H. Yoshida, “Torque control strategy for a parallel-hybrid vehicle using fuzzy logic,” Industry Applications Magazine, IEEE, vol. 6, no. 6, pp. 33–38, 2000. [23] N. J. Schouten, M. A. Salman, and N. A. Kheir, “Fuzzy logic control for parallel hybrid vehicles,” Control Systems Technology, IEEE Transactions on, vol. 10, no. 3, pp. 460–468, 2002. [24] A. Brahma, Y . Guezennec, and G. Rizzoni, “Optimal energy management in series hybrid electric vehicles,” in American Control Conference, 2000. Proceed- ings of the 2000, vol. 1, pp. 60–64, IEEE, 2000. [25] C.-C. Lin, H. Peng, J. W. Grizzle, and J.-M. Kang, “Power management strategy for a parallel hybrid electric truck,” Control Systems Technology, IEEE Transac- tions on, vol. 11, no. 6, pp. 839–849, 2003. [26] L. V . P´ erez, G. R. Bossio, D. Moitre, and G. O. Garc´ ıa, “Optimization of power management in an hybrid electric vehicle using dynamic programming,” Mathe- matics and Computers in Simulation, vol. 73, no. 1, pp. 244–254, 2006. [27] G. Paganelli, M. Tateno, A. Brahma, G. Rizzoni, and Y . Guezennec, “Control development for a hybrid-electric sport-utility vehicle: strategy, implementation and field test results,” in American Control Conference, 2001. Proceedings of the 2001, vol. 6, pp. 5064–5069, IEEE, 2001. [28] P. Pisu and G. Rizzoni, “A comparative study of supervisory control strategies for hybrid electric vehicles,” Control Systems Technology, IEEE Transactions on, vol. 15, no. 3, pp. 506–518, 2007. [29] Q. Gong, Y . Li, and Z.-R. Peng, “Trip-based optimal power management of plug- in hybrid electric vehicles,” Vehicular Technology, IEEE Transactions on, vol. 57, no. 6, pp. 3393–3401, 2008. [30] C.-C. Lin, H. Peng, and J. Grizzle, “A stochastic control strategy for hybrid elec- tric vehicles,” in American Control Conference, 2004. Proceedings of the 2004, vol. 5, pp. 4710–4715, IEEE, 2004. [31] S. J. Moura, H. K. Fathy, D. S. Callaway, and J. L. Stein, “A stochastic opti- mal control approach for power management in plug-in hybrid electric vehicles,” Control Systems Technology, IEEE Transactions on, vol. 19, no. 3, pp. 545–555, 2011. [32] S. Shao, M. Pipattanasomporn, and S. Rahman, “Challenges of phev penetra- tion to the residential distribution network,” in Power & Energy Society General Meeting, 2009. PES’09. IEEE, pp. 1–8, IEEE, 2009. 118 [33] K. Parks, P. Denholm, and A. J. Markel, Costs and emissions associated with plug-in hybrid electric vehicle charging in the Xcel Energy Colorado service ter- ritory. National Renewable Energy Laboratory Golden, CO, 2007. [34] M. Kintner-Meyer, K. Schneider, and R. 
Pratt, “Impacts assessment of plug-in hybrid vehicles on electric utilities and regional us power grids, part 1: Technical analysis,” Pacific Northwest National Laboratory (a), 2007. [35] S. Kishore and L. V . Snyder, “Control mechanisms for residential electricity demand in smartgrids,” in Smart Grid Communications (SmartGridComm), 2010 First IEEE International Conference on, pp. 443–448, IEEE, 2010. [36] N. Rotering and M. Ilic, “Optimal charge control of plug-in hybrid electric vehi- cles in deregulated electricity markets,” Power Systems, IEEE Transactions on, vol. 26, no. 3, pp. 1021–1029, 2011. [37] Y . Cao, S. Tang, C. Li, P. Zhang, Y . Tan, Z. Zhang, and J. Li, “An optimized ev charging model considering tou price and soc curve,” Smart Grid, IEEE Transac- tions on, vol. 3, no. 1, pp. 388–393, 2012. [38] S. Sojoudi and S. H. Low, “Optimal charging of plug-in hybrid electric vehicles in smart grids,” in Power and Energy Society General Meeting, 2011 IEEE, pp. 1–6, IEEE, 2011. [39] D. Wu, D. C. Aliprantis, and L. Ying, “Load scheduling and dispatch for aggre- gators of plug-in electric vehicles,” Smart Grid, IEEE Transactions on, vol. 3, no. 1, pp. 368–376, 2012. [40] J. J. Escudero-Garzas and G. Seco-Granados, “Charging station selection opti- mization for plug-in electric vehicles: An oligopolistic game-theoretic frame- work,” in Innovative Smart Grid Technologies (ISGT), 2012 IEEE PES, pp. 1–8, IEEE, 2012. [41] W. Kempton and J. Tomi´ c, “Vehicle-to-grid power implementation: From sta- bilizing the grid to supporting large-scale renewable energy,” Journal of Power Sources, vol. 144, no. 1, pp. 280–294, 2005. [42] J. Tomi´ c and W. Kempton, “Using fleets of electric-drive vehicles for grid sup- port,” Journal of Power Sources, vol. 168, no. 2, pp. 459–468, 2007. [43] C. Guille and G. Gross, “Design of a conceptual framework for the v2g imple- mentation,” in Energy 2030 Conference, 2008. ENERGY 2008. IEEE, pp. 1–3, IEEE, 2008. 119 [44] B. Geng, J. K. Mills, and D. Sun, “Two-stage charging strategy for plug-in electric vehicles at the residential transformer level,” Smart Grid, IEEE Transactions on, vol. 4, no. 3, pp. 1442–1452, 2013. [45] K. Shimizu, T. Masuta, Y . Ota, and A. Yokoyama, “Load frequency control in power system using vehicle-to-grid system considering the customer convenience of electric vehicles,” in Power System Technology (POWERCON), 2010 Interna- tional Conference on, pp. 1–8, IEEE, 2010. [46] S. Han, S. Han, and K. Sezaki, “Optimal control of the plug-in electric vehicles for v2g frequency regulation using quadratic programming,” in Innovative Smart Grid Technologies (ISGT), 2011 IEEE PES, pp. 1–6, IEEE, 2011. [47] W. Saad, Z. Han, H. V . Poor, and T. Basar, “A noncooperative game for dou- ble auction-based energy trading between phevs and distribution grids,” in Smart Grid Communications (SmartGridComm), 2011 IEEE International Conference on, pp. 267–272, IEEE, 2011. [48] C.-T. Li, C. Ahn, H. Peng, and J. Sun, “Integration of plug-in electric vehicle charging and wind energy scheduling on electricity grid,” in Innovative Smart Grid Technologies (ISGT), 2012 IEEE PES, pp. 1–7, IEEE, 2012. [49] C. Wu, H. Mohsenian-Rad, and J. Huang, “Wind power integration via aggregator-consumer coordination: A game theoretic approach,” in Innovative Smart Grid Technologies (ISGT), 2012 IEEE PES, pp. 1–6, IEEE, 2012. [50] K. Mets, F. De Turck, and C. 
Develder, “Distributed smart charging of electric vehicles for balancing wind energy,” in Smart Grid Communications (SmartGrid- Comm), 2012 IEEE Third International Conference on, pp. 133–138, IEEE, 2012. [51] J.-M. Kang, I. Kolmanovsky, and J. Grizzle, “Dynamic optimization of lean burn engine aftertreatment,” Journal of Dynamic Systems, Measurement, and Control, vol. 123, no. 2, pp. 153–160, 2001. [52] T. Markel, A. Brooker, T. Hendricks, V . Johnson, K. Kelly, B. Kramer, M. OKeefe, S. Sprik, and K. Wipke, “Advisor: a systems analysis tool for advanced vehicle modeling,” Journal of power sources, vol. 110, no. 2, pp. 255– 266, 2002. [53] S. Kermani, S. Delprat, T. Guerra, and R. Trigui, “Predictive control for hev energy management: experimental results,” in Vehicle Power and Propulsion Conference, 2009. VPPC’09. IEEE, pp. 364–369, IEEE, 2009. [54] S. Delprat, J. Lauber, T.-M. Guerra, and J. Rimaux, “Control of a parallel hybrid powertrain: optimal control,” Vehicular Technology, IEEE Transactions on, vol. 53, no. 3, pp. 872–881, 2004. 120 [55] “Dynamometer drive schedules,” http://www.epa.gov/nvfel/testing/dynamometer.htm. [56] X. Lin, Y . Wang, P. Bogdan, N. Chang, and M. Pedram, “Optimizing fuel econ- omy of hybrid electric vehicles using a markov decision process model,” in Intel- ligent Vehicles Symposium (IV), 2015 IEEE, pp. 718–723, IEEE, 2015. [57] A. G. Barto, Reinforcement learning: An introduction. MIT press, 1998. [58] L. Benini, A. Bogliolo, A. Paleologo, and G. De Micheli, “Policy optimization for dynamic power management,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 18, no. 6, pp. 813–833, 1999. [59] T. W. Anderson and L. A. Goodman, “Statistical inference about markov chains,” The Annals of Mathematical Statistics, pp. 89–110, 1957. [60] M. Ehsani, Y . Gao, and A. Emadi, Modern electric, hybrid electric, and fuel cell vehicles: fundamentals, theory, and design. CRC press, 2009. [61] D. Linden and T. Reddy, “Handbook of batteries, 2002.” [62] T. Panigrahi, D. Panigrahi, C. Chiasserini, S. Dey, R. Rao, A. Raghunathan, K. Lahiri, et al., “Battery life estimation of mobile embedded systems,” in VLSI Design, 2001. Fourteenth International Conference on, pp. 57–63, IEEE, 2001. [63] X. Lin, Y . Wang, P. Bogdan, N. Chang, and M. Pedram, “Reinforcement learn- ing based power management for hybrid electric vehicles,” in Computer-Aided Design (ICCAD), 2014 IEEE/ACM International Conference on, pp. 33–38, IEEE, 2014. [64] R. S. Sutton, “Learning to predict by the methods of temporal differences,” Machine learning, vol. 3, no. 1, pp. 9–44, 1988. [65] G. L. Plett, “Extended kalman filtering for battery management systems of lipb- based hev battery packs: Part 1. background,” Journal of Power sources, vol. 134, no. 2, pp. 252–261, 2004. [66] X. Lin, P. Bogdan, N. Chang, and M. Pedram, “Machine learning-based energy management in a hybrid electric vehicle to minimize total operating cost,” in Computer-Aided Design (ICCAD), 2015 IEEE/ACM International Conference on, pp. 627–634, IEEE, 2015. [67] A. Millner, “Modeling lithium ion battery degradation in electric vehicles,” in Innovative Technologies for an Efficient and Reliable Electricity Supply (CIT- RES), 2010 IEEE Conference on, pp. 349–356, IEEE, 2010. 121 [68] M. Dubarry, V . Svoboda, R. Hwu, and B. Y . Liaw, “Capacity and power fading mechanism identification from a commercial cell evaluation,” Journal of Power Sources, vol. 165, no. 2, pp. 566–572, 2007. [69] Q. Zhang and R. E. 
White, “Capacity fade analysis of a lithium ion cell,” Journal of Power Sources, vol. 179, no. 2, pp. 793–798, 2008. [70] Y . Wang, X. Lin, Q. Xie, N. Chang, and M. Pedram, “Minimizing state-of-health degradation in hybrid electrical energy storage systems with arbitrary source and load profiles,” in Design, Automation and Test in Europe Conference and Exhibi- tion (DATE), 2014, pp. 1–4, IEEE, 2014. [71] http://batteryuniversity.com/learn/article/hybrid electric vehicle. [72] Y . Wang, X. Lin, M. Pedram, and N. Chang, “Joint automatic control of the powertrain and auxiliary systems to enhance the electromobility in hybrid electric vehicles,” in Design Automation Conference (DAC), 2015 52nd ACM/EDAC/IEEE, pp. 1–6, IEEE, 2015. [73] A.-H. Mohsenian-Rad and A. Leon-Garcia, “Optimal residential load control with price prediction in real-time electricity pricing environments,” Smart Grid, IEEE Transactions on, vol. 1, no. 2, pp. 120–133, 2010. [74] High-performance battery monitor IC with coulomb counter, voltage and, tem- perature measurement. Texas Instruments. [75] H. Borhan, A. Vahidi, A. M. Phillips, M. L. Kuang, I. V . Kolmanovsky, and S. Di Cairano, “Mpc-based energy management of a power-split hybrid elec- tric vehicle,” Control Systems Technology, IEEE Transactions on, vol. 20, no. 3, pp. 593–603, 2012. [76] I. Arsie, G. Rizzo, and M. Sorrentino, “Optimal design and dynamic simulation of a hybrid solar vehicle,” SAE Transactions-Journal of Engines, pp. 115–3, 2007. [77] A. Affanni, A. Bellini, G. Franceschini, P. Guglielmi, and C. Tassoni, “Battery choice and management for new-generation electric vehicles,” Industrial Elec- tronics, IEEE Transactions on, vol. 52, no. 5, pp. 1343–1349, 2005. [78] W. Xiao, N. Ozog, and W. G. Dunford, “Topology study of photovoltaic interface for maximum power point tracking,” Industrial Electronics, IEEE Transactions on, vol. 54, no. 3, pp. 1696–1704, 2007. [79] Y . Wang, X. Lin, Y . Kim, N. Chang, and M. Pedram, “Enhancing efficiency and robustness of a photovoltaic power system under partial shading,” in Quality Elec- tronic Design (ISQED), 2012 13th International Symposium on, pp. 592–600, IEEE, 2012. 122 [80] X. Lin, Y . Wang, M. Pedram, J. Kim, and N. Chang, “Event-driven and sensorless photovoltaic system reconfiguration for electric vehicles,” in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015, pp. 19–24, IEEE, 2015. [81] X. Lin, Y . Wang, S. Yue, D. Shin, N. Chang, and M. Pedram, “Near-optimal, dynamic module reconfiguration in a photovoltaic system to combat partial shad- ing effects,” in Proceedings of the 49th Annual Design Automation Conference, pp. 516–521, ACM, 2012. [82] Y . Wang, Y . Kim, Q. Xie, N. Chang, and M. Pedram, “Charge migration effi- ciency optimization in hybrid electrical energy storage (hees) systems,” in Pro- ceedings of the 17th IEEE/ACM international symposium on Low-power elec- tronics and design, pp. 103–108, IEEE Press, 2011. [83] W. Lee, Y . Kim, Y . Wang, N. Chang, M. Pedram, and S. Han, “Versatile high-fidelity photovoltaic module emulation system,” in Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design, pp. 91–96, IEEE Press, 2011. [84] Samsung Exynos 4 Dual 45nm (Exynos 4210) Microprocessor, 2012. [85] T. Simunic, L. Benini, P. Glynn, and G. De Micheli, “Event-driven power man- agement,” in IEEE Trans. Computer-Aided Design, Citeseer, 2001. [86] C. Yoon, D. Kim, W. Jung, C. Kang, and H. 
Cha, “Appscope: Application energy metering framework for android smartphone using kernel activity monitoring,” in Presented as part of the 2012 USENIX Annual Technical Conference (USENIX ATC 12), pp. 387–400, 2012. [87] C. M. Bishop, “Pattern recognition,” Machine Learning, 2006. [88] C.-H. Hwang and A. C.-H. Wu, “A predictive system shutdown method for energy saving of event-driven computation,” ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 5, no. 2, pp. 226–241, 2000. [89] X. Fang, S. Misra, G. Xue, and D. Yang, “Smart grid?the new and improved power grid: A survey,” Communications Surveys & Tutorials, IEEE, vol. 14, no. 4, pp. 944–980, 2012. [90] X. Lin, Y . Wang, and M. Pedram, “Designing the optimal pricing policy for aggregators in the smart grid,” in Green Technologies Conference (GreenTech), 2014 Sixth Annual IEEE, pp. 75–80, IEEE, 2014. [91] K. Leyton-Brown and Y . Shoham, “Essentials of game theory: A concise mul- tidisciplinary introduction,” Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 2, no. 1, pp. 1–88, 2008. 123 [92] T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, et al., Introduction to algorithms, vol. 2. MIT press Cambridge, 2001. [93] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2009. [94] D. Agrawal, S. Das, and A. El Abbadi, “Big data and cloud computing: current state and future opportunities,” in Proceedings of the 14th International Confer- ence on Extending Database Technology, pp. 530–533, ACM, 2011. [95] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, et al., “A view of cloud computing,” Commu- nications of the ACM, vol. 53, no. 4, pp. 50–58, 2010. [96] I. Foster, Y . Zhao, I. Raicu, and S. Lu, “Cloud computing and grid computing 360- degree compared,” in Grid Computing Environments Workshop, 2008. GCE’08, pp. 1–10, Ieee, 2008. [97] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, “Cloud computing and emerging it platforms: Vision, hype, and reality for delivering computing as the 5th utility,” Future Generation computer systems, vol. 25, no. 6, pp. 599–616, 2009. [98] R. H. Katz, “Tech titans building boom,” Spectrum, IEEE, vol. 46, no. 2, pp. 40– 54, 2009. [99] J. Koomey, “Growth in data center electricity use 2005 to 2010,” A report by Analytical Press, completed at the request of The New York Times, 2011. [100] D. Niyato, S. Chaisiri, and L. B. Sung, “Optimal power management for server farm to support green computing,” in Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, pp. 84–91, IEEE Computer Society, 2009. [101] X. Lin, Y . Wang, and M. Pedram, “A reinforcement learning-based power man- agement framework for green computing data centers,” in Proceedings of the International Conference on Cloud Engineering, IEEE, 2016. [102] Y . Wang, S. Chen, H. Goudarzi, and M. Pedram, “Resource allocation and con- solidation in a multi-core server cluster using a markov decision process model,” in Quality Electronic Design (ISQED), 2013 14th International Symposium on, pp. 635–642, IEEE, 2013. [103] J. Leverich, M. Monchiero, V . Talwar, P. Ranganathan, and C. Kozyrakis, “Power management of datacenter workloads using per-core power gating,” Computer Architecture Letters, vol. 8, no. 2, pp. 48–51, 2009. 124 [104] C. Subramanian, A. Vasan, and A. 
Sivasubramaniam, “Reducing data center power with server consolidation: Approximation and evaluation,” in High Per- formance Computing (HiPC), 2010 International Conference on, pp. 1–10, IEEE, 2010. [105] D. Meisner, B. T. Gold, and T. F. Wenisch, “Powernap: eliminating server idle power,” ACM SIGARCH Computer Architecture News, vol. 37, no. 1, pp. 205– 216, 2009. [106] A. Gandhi, Y . Chen, D. Gmach, M. Arlitt, and M. Marwah, “Minimizing data center sla violations and power consumption via hybrid resource provisioning,” in Green Computing Conference and Workshops (IGCC), 2011 International, pp. 1– 8, IEEE, 2011. [107] Y . Gao, Y . Wang, S. K. Gupta, and M. Pedram, “An energy and deadline aware resource provisioning, scheduling and optimization framework for cloud sys- tems,” in Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, p. 31, IEEE Press, 2013. [108] J. Xu and J. A. Fortes, “Multi-objective virtual machine placement in virtualized data center environments,” in Green Computing and Communications (Green- Com), 2010 IEEE/ACM Int’l Conference on & Int’l Conference on Cyber, Physi- cal and Social Computing (CPSCom), pp. 179–188, IEEE, 2010. [109] R. Raghavendra, P. Ranganathan, V . Talwar, Z. Wang, and X. Zhu, “No power struggles: Coordinated multi-level power management for the data center,” in ACM SIGARCH Computer Architecture News, vol. 36, pp. 48–59, ACM, 2008. [110] J. Wilkes, “More Google cluster data.” Google research blog, Nov. 2011. Posted at http://googleresearch.blogspot.com/2011/11/more-google-cluster-data.html. [111] M. Drozdowski, Scheduling for parallel processing. Springer, 2009. [112] L. Benini, A. Bogliolo, and G. De Micheli, “A survey of design techniques for system-level dynamic power management,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 8, no. 3, pp. 299–316, 2000. [113] M. Ghamkhari and H. Mohsenian-Rad, “Optimal integration of renewable energy resources in data centers with behind-the-meter renewable generator,” in Proceed- ings of the IEEE International Conference in Commmunications (ICC), IEEE, 2012. [114] E. Pakbaznia and M. Pedram, “Minimizing data center cooling and server power costs,” in Proceedings of the 2009 ACM/IEEE international symposium on Low power electronics and design, pp. 145–150, ACM, 2009. 125
Abstract
Conventional internal combustion engine (ICE)-powered vehicles have contributed significantly to the development of modern society. However, they have also brought about large amounts of fuel consumption and pollution emissions due to the increasing number of vehicles in use around the world. Electric vehicles (EVs) and hybrid electric vehicles (HEVs) have been developed to improve fuel economy and reduce pollution emissions.

This thesis first introduces the basic components of EVs and HEVs and methods for EV/HEV energy management. After an accurate and detailed modeling of the HEV, the thesis provides two control strategies for HEV energy management to improve fuel economy. Unlike some previous work that relies on a priori knowledge of the driving profiles, the proposed control strategies, namely a Markov decision process-based strategy and a reinforcement learning-based strategy, need only stochastic knowledge of the driving profiles or do not rely on any prior knowledge of the driving profiles. In particular, the reinforcement learning-based control strategy can be model-free, which enables one to (partially) avoid reliance on complex HEV modeling while coping with driver-specific behaviors.

The state-of-health (SoH) of the battery pack degrades with the operation of an HEV. The battery pack reaches its end-of-life when it loses 20% or 30% of its nominal capacity, and battery pack replacement results in additional operational cost for an HEV. Therefore, this thesis investigates the energy management problem in HEVs with a focus on minimizing the operating cost of an HEV, including both fuel and battery replacement cost. A nested learning framework is proposed, in which the inner-loop learning process is the key to minimization of the fuel usage, whereas the outer-loop learning process is critical to minimization of the amortized battery replacement cost.

On the other hand, the auxiliary systems of HEVs/EVs, comprised of lighting, air conditioning (or more generally, heating, ventilation, and air conditioning), and other battery-powered systems such as GPS, may account for 10%-30% of the overall fuel consumption of an ordinary (fuel-based) vehicle. For HEVs and EVs, it is projected that auxiliary systems will take a larger portion of the overall energy consumption, partly because heating of an ordinary vehicle can be partially achieved by the heated internal combustion engine. Hence, in this thesis, the control of the HEV powertrain and auxiliary systems is jointly considered to minimize operational cost. We minimize the fuel cost induced both by propelling the vehicle and by the auxiliary systems, and meanwhile maximize a total utility function (representing the degree of desirability) of the auxiliary systems. To further enhance the effectiveness of the RL framework, the prediction of future driving profile characteristics is incorporated.

An EV with an onboard PV electrical energy generation system (PV system) is beneficial because PV cells can charge the EV battery pack while the EV is running or parked, mitigating the power demand from the grid. This thesis aims at maximizing the output power of a vehicular PV system with the string charger architecture, taking into account the non-uniform distribution of solar irradiance levels on different vehicle surface areas.
This work is based on the dynamic PV array reconfiguration architecture from previous work, with the accommodation of the rapidly changing solar irradiance in the onboard scenario. Most importantly, this work differs from previous dynamic PV array reconfiguration work in that an event-driven and a sensorless PV array reconfiguration framework are proposed.

The concept of vehicle-to-grid (V2G) was developed to make use of the electrical energy storage ability of EV/HEV batteries for frequency regulation, load balancing, etc. This thesis also presents work on the smart grid optimal pricing policy problem, in which the aggregator maximizes its profit by designing a real-time pricing policy while taking into account the behaviors of both residential users and EV/HEV users. The aggregator pre-announces a pricing policy for an entire billing period; then, in each time interval of the billing period, the electricity users (both residential and EV/PHEV users) try to maximize their own utility functions based on the pricing model in the current time interval and the awareness of the other users' behaviors. We use a dynamic programming algorithm, based on backward induction, to derive the optimal real-time pricing policy for maximizing the aggregator's overall profit.