Yu He Final Dissertation
by
Yu He
Doctoral Committee:
Assistant Professor Youngki Kim, Chair
Assistant Professor Zhen Hu
Professor Dewey Dohoy Jung
Assistant Professor Doohyun Kim
Assistant Research Scientist Kyoung Hyun Kwak
Yu He
heyuz@umich.edu
ORCID iD: 0000-0003-4293-0049
© Yu He 2022
ACKNOWLEDGMENTS
First and foremost, I would like to express my deepest appreciation to my academic advisor, Professor Youngki Kim, for his unwavering support and guidance. His extensive knowledge and valuable advice not only inspired my academic work throughout my Ph.D. career but also solidified my confidence in pushing forward. Moreover, his attitude towards colleagues, students, and challenging problems shows that he is an excellent advisor well beyond the academic area. I am also very grateful to Prof. Dewey DoHoy Jung and Prof. Oleg Zikanov, who introduced me to Prof. Youngki Kim at the beginning of my Ph.D. journey.
I am thankful to the committee members, Prof. Zhen Hu, Prof. Dewey DoHoy Jung, Prof.
DooHyun Kim and Dr. Kyoung Hyun Kwak, for taking their time and interest in evaluating my
work and providing constructive feedback. I believe that their insightful comments have led this
dissertation to be more thorough and complete. I would like to acknowledge the assistance from
Dr. Kyoung Hyun Kwak. He constantly provides me with creative ideas throughout my Ph.D.
journey and helps me improve the structure of my papers.
Many thanks for the financial support from the Hyundai-Kia America Technical Center, Inc., also known as HATCI. I also wish to thank Brian Link and Dr. Jason Hoon Lee, who offered me a chance to work at HATCI during the summer of 2022. This opportunity allowed me to further improve Chapter 5's work within a human-in-the-loop environment. Thanks should also go to all collaborators in the CVD team, including Heeseong Kim, Justin Holmer, Yueming (Max) Chen, and John Harbor. Their help made the exciting experiment in Chapter 5 go smoothly. Special thanks to Shihong Fan, my friend and colleague during the internship, for setting up the HIL environment and repeatedly test-driving my algorithms.
I am also grateful to all my roommates and friends who stayed by my side during the COVID-19 pandemic. Their encouragement and mutual motivation made the journey much easier than it would otherwise have been.
Last but not least, I want to thank my parents, Zhizhou He and Baohong Tai, for their support
and faith in me. Without their unconditional and perennial support, I would not have completed
this journey.
TABLE OF CONTENTS
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
LIST OF ACRONYMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
CHAPTER
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Background of a 48V P0+P4 System . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Energy Management Strategies with Realistic Operational Constraints . . . . . . 5
1.2.1 Optimization-based Approaches . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Learning-based Approaches . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Optimizing Longitudinal Motion in a Car-following Scenario . . . . . . . . . . . 8
1.3.1 Adaptive Cruise Control . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.2 Braking Optimization in Deceleration Events . . . . . . . . . . . . . . . 10
1.4 Organization and Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 Real-time Torque-split Strategy for P0+P4 Mild Hybrid Vehicles with eAWD Capa-
bility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Vehicle and Powertrain Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.1 Longitudinal Vehicle Dynamics Model . . . . . . . . . . . . . . . . . . 18
2.2.2 Nonlinear Tire Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.3 Braking Force Distribution Constraints . . . . . . . . . . . . . . . . . . 21
2.2.4 Battery and Motor Power . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.5 Engine Fuel Consumption . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3 Optimal Torque Split Control Strategy . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.1 Optimal Torque-Split Problem . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.2 Dynamic Programming Results and Analysis . . . . . . . . . . . . . . . 24
2.3.2.1 Energy Consumption and Regeneration . . . . . . . . . . . . . . 24
2.3.2.2 Braking Distribution Analysis . . . . . . . . . . . . . . . . . . . 26
2.4 Real-time Torque-split Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4.1 Approximated Adaptive Equivalent Consumption Strategy for Propulsion 27
2.4.2 Suboptimal Braking Force Distribution Function for Regeneration . . . . 29
2.4.3 Adaptation of Different Driving Scenarios: A Parametric Study . . . . . 30
2.4.4 A Rule-based Real-time Control Algorithm . . . . . . . . . . . . . . . . 31
2.4.5 Performance of Real-time Control Strategies . . . . . . . . . . . . . . . 34
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3 HEV Energy Management Strategy Based on TD3 with Prioritized Exploration and
Experience Replay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2 Deep Reinforcement Learning with Expert Knowledge . . . . . . . . . . . . . . 41
3.2.1 Optimal Torque Split Problem for P0+P4 MHEV . . . . . . . . . . . . . 41
3.2.2 Expert Knowledge from Dynamic Programming . . . . . . . . . . . . . 42
3.2.3 Twin-delayed Deep Deterministic Policy Gradient . . . . . . . . . . . . 42
3.2.4 P4 Motor Power Control with on/off . . . . . . . . . . . . . . . . . . . . 43
3.2.5 Networks Updating Rule . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2.6 Prioritized Experience Replay . . . . . . . . . . . . . . . . . . . . . . . 46
3.2.7 Prioritized Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3 Learning Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4 Importance Study of Expert Knowledge . . . . . . . . . . . . . . . . . . . . . . 50
3.5 Comparison with Other Learning-based Methods . . . . . . . . . . . . . . . . . 53
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4 Defensive Ecological Adaptive Cruise Control Considering Neighboring Vehicles’
Blind-spot Zones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2 Driving Conditions Based on Blind Spot Zone . . . . . . . . . . . . . . . . . . . 59
4.2.1 Computation of Blind Spot Zone . . . . . . . . . . . . . . . . . . . . . . 59
4.2.2 Blind Spots Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2.3 N-many Neighboring Vehicles Scenarios . . . . . . . . . . . . . . . . . 63
4.3 MPC Formulation for DEco-ACC . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.3.1 Modeling of Vehicle Longitudinal Dynamics . . . . . . . . . . . . . . . 64
4.3.2 Optimal Control Problem Formulation . . . . . . . . . . . . . . . . . . . 65
4.4 Simulation and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.4.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.4.2 A Parametric Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.4.3 A Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5 Personalized One-pedal Driving for Electric Vehicles by Learning-based Model Pre-
dictive Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.2 Driving Behavior Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.2.1 Time Headway Constraints . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.2.2 Perceptual Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.3 Driving Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.3.1 Real-world Driving Data . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.3.2 Identification of Headway Constraints . . . . . . . . . . . . . . . . . . . 85
5.3.3 Identification of Perceptual Constraint . . . . . . . . . . . . . . . . . . . 86
5.3.4 Performance of Constraints Fitting . . . . . . . . . . . . . . . . . . . . . 88
5.4 Personalized One-Pedal-Driving Algorithm . . . . . . . . . . . . . . . . . . . . 88
5.4.1 Vehicle Longitudinal Dynamics . . . . . . . . . . . . . . . . . . . . . . 88
5.4.2 MPC Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.5 MPC Weights Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.5.1 Optimal Weight Learning . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.5.2 Prediction Method Selection . . . . . . . . . . . . . . . . . . . . . . . . 90
5.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.6.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.6.2 Desired Relative Distance-based Personalized Braking . . . . . . . . . . 92
5.6.3 Performance Comparison . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.7 Experimental Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.7.1 Driver Simulator Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.7.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.1 Summary of Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.2 Possible Future Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.2.1 MHEV Power-split Combined with Trip Information . . . . . . . . . . . 103
6.2.2 Thermal System Integrated Control . . . . . . . . . . . . . . . . . . . . 104
6.2.3 Impact of Different Risk Penalty Functions on DEco-ACC Performance . 104
6.2.4 Enable DEco-ACC to Learn from Human . . . . . . . . . . . . . . . . . 104
6.2.5 POPD Dealing with Traffic Signals and Stop Signs . . . . . . . . . . . . 105
APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
LIST OF FIGURES
FIGURE
1.1 CO2 emission from the year 2010 to 2050. (Projection) [1] . . . . . . . . . . . . . . 1
1.2 Projection of CO2 emission in the U.S. area: by sector and by fuel. [1] . . . . . . . . 2
1.3 Target CO2 emission of each type of vehicle from 2022 to 2026. [2] . . . . . . . . . . 2
1.4 MHEV powertrain architecture: electric machine locations. [3] . . . . . . . . . . . . 4
1.5 Dissertation outline. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.14 Comparison of operation points distribution under the WLTC driving schedule from
the DP results: (a) the engine, (b) the P0 motor, and (c) the P4 motor, from the pro-
posed strategy results: (d) the engine (e) the P0 motor and (f) the P4 motor and from
the rule-based strategy results: (g) the engine (h) the P0 motor and (i) the P4 motor.
The size of the bubble indicates frequency. . . . . . . . . . . . . . . . . . . . . . . . 38
3.1 The proposed power-split strategy for the P0+P4 MHEV: structure of expert TD3 with
prioritized experience replay and prioritized exploration. . . . . . . . . . . . . . . . . 44
3.2 Combined control of motor activation and motor power: the relationship between
motor normalized power and actor network output. . . . . . . . . . . . . . . . . . . . 46
3.3 (a): Training process of three TD3-PEER agents with different initial random seeds.
(b): The L of both critic networks of a selected agent. . . . . . . . . . . . . . . . . . . 50
3.4 SOC trajectory of the DP and the proposed strategy results under the five driving
cycles: (a) the WLTC, (b) the UDDS, (c) the HWFET, (d) the LA92 and (e) the US06 . 51
3.5 Greedy-run of both TD3-PEER agents and the DP results over the WLTC cycle. (a):
The engine torque over the time. (b): the P0 motor torque over the time. (c): the P4
motor torque over the time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.6 SOC trajectories of DP and several learning-based methods results under the five driv-
ing cycles: (a) the WLTC, (b) the UDDS, (c) the HWFET, (d) the LA92 and (e) the
US06 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.7 Greedy run of learning-based methods and the DP results over the WLTC cycle. (a):
The engine torque over the time. (b): The P0 motor torque over the time. (c): The P4
motor torque over the time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.1 An example diagram of blind spot zones of a sedan in orange color; visible region by
head tilt in yellow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 A concept of car-following in consideration of the BSZs of neighboring vehicles . . . 61
4.3 Graphical demonstration of the constraints to avoid the BSZ of the neighboring vehicle. 62
4.4 The average occurrence probability of NVs scenarios using 2403 vehicles from
NGSIM data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.5 The proposed algorithm that determines when to activate penalty of blind spots. . . . 66
4.6 Concept diagram of the penalty function for the normalized BSZ, which will be used
to formulate the slack variable when the ego vehicle enters the blind spot. . . . . . . . 67
4.7 Car-following simulation setup for a 2-NVs scenario. . . . . . . . . . . . . . . . . . 68
4.8 Parameter study results for 3000 cases. . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.9 A histogram shows 100-case fuel consumption and dwelling time of DEco-ACC and
Eco-ACC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.10 Comparison of trajectories with DEco-ACC and eco-ACC: (a) displacement of each
vehicle, relative to the preceding vehicle, (b) velocity, acceleration, and jerk. . . . . . 75
4.11 Acceleration distribution for different driving cycles. (a): HWFET. (b): WLTC. (c):
US06. (d): Preceding vehicle. (e): eco-ACC. (f): DEco-ACC. . . . . . . . . . . . . . 77
4.12 Jerk distribution for different driving cycles. (a): HWFET. (b): WLTC. (c): US06.
(d): Preceding vehicle. (e): eco-ACC. (f): DEco-ACC. . . . . . . . . . . . . . . . . . 78
5.1 Personalized One-pedal driving: the algorithm generates human-like deceleration before
the driver takes any action. The driver only needs to control the acceleration pedal
most of the time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.2 Wiedemann’s car following model describes the relationship between the relative dis-
tance and the relative velocity. SDV, OPDV, and SDX represent brake threshold, ac-
celeration pedal threshold, and max follow distance threshold, respectively. . . . . . . 83
5.3 Data of four selected drivers, used for identifying time headway constraints (a) and
for identifying perceptual constraints (b). . . . . . . . . . . . . . . . . . . . . . . . . 84
5.4 Headway constraints and perceptual constraints fitting of all drivers’ data: (a) Min-
imum time headway, (b) Maximum time headway, (c) Minimum distance headway
and (d) Maximum distance headway. . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.5 A fitted constraint function for a selected driver. . . . . . . . . . . . . . . . . . . . . 86
5.6 Headway constraints and perceptual constraints fitting of all drivers’ data: (a) Slope
of the perceptual constraint and (b) Bias of the perceptual constraint. . . . . . . . . . . 87
5.7 Statistics of all driver’s constraints fitting. . . . . . . . . . . . . . . . . . . . . . . . . 87
5.8 (a) The comparison of L∗ between CA and perfect information prediction. L∗ of CA
is normalized based on L∗ of the perfect information prediction method. (b) Averaged
L∗ /L of 50 drivers. N ∈ [7, 9] results in the highest performance. . . . . . . . . . . . . 91
5.9 Histogram comparison between POPD and DRD-PB: the L of POPD is normalized
based on DRD-PB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.10 Simulated time series comparison between POPD and DRD-PB method of a selected
driver. (a): ego vehicle velocity over the time. (b): ego vehicle acceleration over
the time. (c): relative distance between ego vehicle and the preceding vehicle over
the time. (d): relative velocity between ego vehicle and the preceding vehicle. (e):
acceleration pedal position and brake pedal actuation signal. . . . . . . . . . . . . . . 94
5.11 Probability distribution comparison between DRD-PB and POPD to the human driver:
the probability density function shows the brake action generated from POPD is more
similar to humans than the DRD-PB. The mean and standard deviation are listed as
driver #3 in Table 5.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.12 Human-in-the-loop co-simulation environment: (a) driving route from Ann Arbor
area, (b) simulator setup (c) simulator interface . . . . . . . . . . . . . . . . . . . . . 97
5.13 Human-in-loop experimental results: (a) the ego vehicle velocity compared to the
preceding vehicle, (b) desired and actual acceleration, (c) relative distance between
the ego vehicle and the preceding vehicle and constraints (d) relative velocity between
the ego vehicle and the preceding vehicle, (e) brake pedal position from POPD, brake
pedal position from human driver, algorithm activation indicator (Ipb). . . . . . . . . 99
A.1 A concept of car-following in consideration of the BSZs of four neighboring vehicles. 106
A.2 Performance comparison of the DEco-ACC in 10 real driving scenarios including four
neighboring vehicles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
A.3 Comparison of trajectories with DEco-ACC and Eco-ACC at lower speed operation
when there are four neighboring vehicles: (a) displacement of each vehicle, relative
to the preceding vehicle, (b) velocity, acceleration, and jerk . . . . . . . . . . . . . . . 110
LIST OF TABLES
TABLE
A.1 Time span and average preceding vehicle velocity of each car following scenario with
four neighboring vehicles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
LIST OF ACRONYMS
AWD All-Wheel-Drive
CA Constant Acceleration
CS Constraints Satisfaction
DP Dynamic Programming
ECU Electronic Control Unit
FE Fuel Economy
FWD Front-Wheel-Drive
HIL Human-in-the-Loop
NV Neighboring Vehicle
OPD One-Pedal-Driving
PE Prioritized Exploration
PI Proportional Integration
RWD Rear-Wheel-Drive
SOC State-of-Charge
TD Temporal Difference
TPD Two-Pedal-Driving
V2I Vehicle-to-Infrastructure
V2V Vehicle-to-Vehicle
ABSTRACT
Due to the increasing trend of greenhouse gas emissions, the United States Environmental Pro-
tection Agency (EPA) has started to publish strict regulations regarding emissions for different
types of vehicles. Battery electric vehicles (BEVs) have drawn much attention in recent years
because they potentially eliminate all tailpipe emissions. However, due to limitations on battery
charging speed and capacity, EV users currently face range anxiety and a lack of charging
stations. Hybrid electric vehicles (HEVs), which possess the advantages of both
conventional vehicles and BEVs, appear to be a viable solution to cope with such strict emission
regulations while mitigating range anxiety. Among all types of hybrid electric powertrain systems,
a P0+P4 system possesses distinct advantages: two electric motors located on the front and rear
axles allow brake energy to be recovered from both axles. Moreover, the dual motor configuration
enables the driver to switch among front-drive, rear-drive and all-wheel-drive modes. Particularly,
a 48V P0+P4 HEV requires less expensive wiring and electric shock protection and hence it is
considered to be the most cost-effective HEV for reducing GHG emissions.
This dissertation focuses on improving the energy efficiency, ride comfort, and safety of a 48V
P0+P4 MHEV. To achieve these goals, this dissertation proposes a hierarchical control design
spanning the power-split and longitudinal-motion domains of the 48V P0+P4 MHEV. In the do-
main of power-split, two real-time implementable controllers are proposed: (1) the optimization-
based controller and (2) the learning-based controller. In the optimization-based control design, the
approximated adaptive equivalent consumption minimization strategy (AA-ECMS) with a subopti-
mal braking distribution derived from dynamic programming (DP) analysis is proposed to capture
the globally optimal operating trends of the P0 motor and the front/rear tire force distribution. In
the learning-based control design, the twin delayed deep deterministic policy gradient with prioritized
exploration and experience replay (TD3+PEER), a novel prioritized exploration approach, is pro-
posed to encourage the deep reinforcement learning (DRL) agent to explore states with complex
dynamics. Both proposed power-split controllers achieve better fuel economy during the test trips
compared to state-of-the-art rule-based and learning-based controllers.
In vehicle longitudinal motion control design, two controllers have been developed using model
predictive control (MPC): (1) the defensive ecological adaptive cruise control (DEco-ACC) and (2)
the personalized one-pedal-driving (POPD). The DEco-ACC is a novel car-following algorithm
that balances fuel economy, ride comfort, and avoidance of blind spots from neighboring vehicles.
In DEco-ACC, a novel continuous and differentiable penalty function is proposed to describe the
projection of several neighboring vehicles’ blind spots to the ego vehicle’s traffic lane. The pro-
posed MPC-based controller considers this blind spot penalty function as a soft constraint within
its prediction horizon and is able to make its own decision to either yield, pass, or stay within the
blind spots based on the MPC’s cost function and the traffic scenario. The simulation results show
that with two neighboring vehicles present simultaneously, the defensive ecological adaptive cruise
control (DEco-ACC) reduces the dwelling time in blind spots by 29.5% while sacrificing only 0.4% of
fuel consumption. The POPD is a novel personalized one-pedal driving method that can learn the
individual driver’s preference during everyday driving. In POPD, two types of MPC constraints
that represent distinct driver behaviors are identified by analyzing 450 real-world drivers' data.
The POPD algorithm is then validated in both the simulation environment and the human-in-
the-loop (HIL) traffic simulator. The experiment shows that the brake pedal usage is reduced from
31.3% to 5.25% for human driver A and from 16.7% to 3.56% for human driver B.
In summary, a hierarchical control design approach between power-split and longitudinal mo-
tion improves the energy efficiency, ride comfort, and safety of a P0+P4 MHEV. Two power-split
algorithms (optimization-based and learning-based) improve energy efficiency. The DEco-ACC,
in automated driving scenarios, ensures safety when neighboring vehicles are present without sacrificing
energy efficiency or comfort. The POPD in human driving scenarios enhances comfort by gener-
ating desired braking profiles for a target driver.
CHAPTER 1
Introduction
Figure 1.1: CO2 emission from the year 2010 to 2050. (Projection) [1]
According to the EIA's International Energy Outlook, worldwide CO2 emissions are projected to reach 42.5 billion
metric tons in 2050, which is 21% higher than in 2021. In addition, the Annual Energy
Outlook 2022 (AEO2022) has reported that transportation-related CO2 emissions have dominated the
total CO2 emissions in the U.S. since 2016 due to the increasing demand for transportation. Although
coal consumption has declined over the projection period, the increasing CO2 from natural gas and
petroleum still intensifies the greenhouse effect.
Figure 1.2: Projection of CO2 emission in the U.S. area: by sector and by fuel. [1]
In 2017, the United States Environmental Protection Agency (EPA) published a strict regulation
regarding greenhouse gas emissions and corporate average fuel economy (CAFE) standards for
2017 and later model year light-duty vehicles [4]. In December 2021, EPA revised the greenhouse
gas emission standard again [2] for passenger cars and light trucks for the model year 2023 to 2026.
The updated standard requires that the CO2 emitted per mile by each type of vehicle decrease by at
least 5% each year from 2023 to 2026, amounting to a total reduction of more than 29% in emissions
per mile for each vehicle type over that period.
Figure 1.3: Target CO2 emission of each type of vehicle from 2022 to 2026. [2]
Meeting these increasingly stringent standards with conventional powertrains alone becomes more difficult. Battery electric vehicles (BEVs) eliminate all tailpipe emissions but face
major challenges on the battery front, such as charging time and range anxiety in long hauls [5].
Thus, many studies have explored hybrid electric vehicles (HEVs) from various perspectives, such
as the system architecture, the energy management strategy, and the potential fuel economy (FE)
gain. Compared to a conventional vehicle with a standalone internal combustion engine (ICE), it is
known that a well-designed HEV with a decent control strategy can bring a significant fuel-saving
benefit [6].
Among various HEVs, 48 V mild hybrid electric vehicles (MHEVs) have been drawing atten-
tion because of their potential to enhance the advantages of hybridization without compromising
electric power and performance. The increased voltage system from the conventional 12 V still
avoids the additional cost of the expensive wiring and electric shock protection mandated for a
higher-voltage arrangement [7, 8]. Moreover, it offers a platform for more capable electric ma-
chines that may empower an extended stop/start function on top of boosted torque assist and re-
generation [5].
To stretch such benefits of the 48 V system, hybrid configurations with more than one electric
machine have also started to be investigated actively in recent years [9–11]. Most of the 48 V
MHEVs on the market are under P0 architectures, which minimizes the necessity of powertrain
design modification compared to the P1/P2/P3 hybrid variations. Although a P0 architecture may
be the most attainable configuration in terms of cost [10], it inherits the limitations of a belt-driven
system and often excludes the usage of pure electric driving mode. On the other hand, to cope with
the recent market where the sales share of sport utility vehicles (SUVs) has substantially increased,
the additional P4 hybrid architecture is being spotlighted. A bigger electric motor integrated on
the rear axle of a vehicle enables electric all-wheel-drive (eAWD) capability as well as much more
aggressive regenerative braking.
To promote the utilization of the hybrid system and eAWD capability even further, a P0+P4
48 V MHEV system with a dual-motor configuration, such as a less-expensive P0 motor combined
with a P4 module, can be considered [11]. The P0+P4 system has the following advantages: (i)
regenerative braking at both axles allows for maximized energy recuperation; and (ii) additional P4
motor allows for eAWD capability that caters to the mass-market demands. Unlike the other hybrid
powertrain systems with a single motor, the P0+P4 MHEV has power sources on both the front
and the rear axles: an engine and a P0 motor on the front axle and a P4 motor on the rear axle. This
inherently allows the P0+P4 MHEV to switch among front-wheel-drive (FWD), rear-wheel-drive
(RWD), and eAWD during vehicle operation.
This dissertation aims at improving the energy efficiency, safety, and comfort of a 48V P0+P4
MHEV. To address these multiple goals at the same time, control optimization should be performed from both
the power-split and the vehicle longitudinal motion perspectives. Prior work [12] has shown that
Figure 1.4: MHEV powertrain architecture: electric machine locations (P0, P1, P2, P3, P4) relative to the ICE and the transmission (TRN). [3]
co-optimization between vehicle velocity and hybrid powertrain components introduces additional
control difficulties, which make the co-optimization method difficult to implement in real time.
Furthermore, the results from [12] show that the co-optimization approach only achieves negli-
gible improvements in energy efficiency when a certain level of passenger comfort is required.
Therefore, this dissertation develops a hierarchical control approach to optimizing the vehicle’s
power-split and longitudinal vehicle motion in a sequential manner.
Optimization-based and learning-based strategies for the considered unique P0+P4 HEV system
are proposed for the vehicle’s power-split to seek energy-optimal operations. For velocity control,
this dissertation further exploits the potential of Advanced Driver Assistance Systems (ADAS) and
vehicle connectivity technologies, proposing defensive ecological adaptive cruise control (DEco-
ACC) and personalized one-pedal-driving (POPD). The DEco-ACC further improves the existing
Eco-ACC algorithms, allowing the ego vehicle to avoid staying in neighboring vehicles’ blind spot
zone during car-following scenarios. The POPD is able to perform personalized braking by learn-
ing the current driver’s preferred driving style. The DEco-ACC is mainly designed for highway
operation, and the POPD is mainly developed for urban/highway mixed trips. A schematic of the
proposed hierarchical control approach is shown in Fig. 1.5.
Figure 1.5: Dissertation outline: hierarchical control of the 48V P0+P4 MHEV, consisting of power-split strategies (the optimization-based AA-ECMS strategy of Chapter 2 and the learning-based TD3+PEER strategy of Chapter 3) and longitudinal motion control (the DEco-ACC highway solution of Chapter 4 and the POPD urban/highway solution of Chapter 5).
hybrid powertrain systems in the literature have not considered important dynamics, including the
longitudinal load transfer, the nonlinear tire effects, and realistic constraints on the regeneration
of a P4 motor by braking force distribution. Moreover, detailed analysis and control design for
effective real-time torque-split for P0+P4 MHEVs has not yet been rigorously studied.
Despite the P0+P4 MHEVs’ many advantages, the overall energy performance heavily relies
on the energy management strategy (EMS) that coordinates between the engine and motors. Previ-
ously, rule-based methods were widely applied to the torque split of HEVs for their simple structure
and real-time implementability [15]. However, a preset rule can only achieve a limited level of
optimality [16]. Moreover, the performance is also heavily influenced by human calibration and
driving conditions [17, 18]. The authors in [14] have reported that a rule-based control method
calibrated by a human expert and one calibrated by particle swarm optimization achieve drastically
different powertrain operating behaviors.
An optimization-based method can be considered as a solution to eliminating human influence
and further increasing optimality. In [19], a cost function of equivalent fuel consumption, consisting of the instan-
taneous fuel rate and the battery power, is designed. By minimizing this equivalent fuel at every step
of control with model predictive control (MPC), the real-time energy consumption performance
can be further increased. However, MPC requires a control-oriented model to be known in advance
for prediction, and the problem complexity also affects the computation speed. For P0+P4 MHEVs,
the additional control variable brings additional computational expense, which makes real-time
implementation of an MPC-based method challenging.
To the author's best knowledge, there is no optimization-based power-split strategy available for
48 V P0+P4 MHEVs in the existing literature. Hence, this dissertation fills this research gap
by developing an optimization-based power-split strategy for the enhanced performance of a 48 V
P0+P4 MHEV.
a rule-based strategy by at least 29% for a parallel HEV. It is worth mentioning that Q-learning
was initially designed for problems with a finite number of discretized actions, such as turning left
or turning right [23]. However, HEV power management problems are usually formulated with
continuous actions, and hence the performance of Q-learning methods is inevitably limited by the
level of discretization.
Many studies have found an “actor-critic” structure to be a potential solution to address con-
tinuous action space problems. In an “actor-critic” structure, the actor network is responsible for
generating the control action as its output based on the defined state as input. The output of a
neural network is inherently continuous, eliminating the error caused by discrete action. The critic
network predicts the Q value, which estimates the long-term reward of each transition that the actor
performed. The training process is still done through backpropagation with the temporal difference
error (TD error) between the current critic prediction and a better TD target prediction for the same
transition. The deep deterministic policy gradient (DDPG) is a state-of-the-art technique of actor-
critic structure [28]. By converting Q-learning to DDPG, the studies in [29–31] observed different
extents of fuel economy improvement. However, as a deterministic policy, the DDPG considers
the optimal action at every step, which results in the overestimation of certain actions. This over-
estimation can easily distract the control decision with even random noise. Therefore, to prevent
DDPG from exploiting certain regions in the action space, two variants of DDPG were proposed
almost simultaneously: the soft actor-critic (SAC) [32] and the twin delayed DDPG (TD3) [33].
The SAC introduces an entropy term into the cost function of the actor network, which encourages
the selected actions to remain as stochastic, and hence as exploratory, as possible. Thanks to this
enhanced exploration, the training time of SAC is significantly reduced compared with DDPG and DQN.
For example, in [34], the authors adopted SAC for power management of a hybrid electric bus with
battery thermal and health constraints and have shown that their method achieves a 96.3% training
time reduction compared with DQN. On the other hand, TD3 resolved overestimation issues by
making the following three modifications to the DDPG [33]:
• Introduce an additional critic network, and use only the lower of the two Q predictions to
generate the TD error.
• Delay the updates of the actor and the target networks relative to the critic updates.
• Smooth the target policy by adding clipped random noise to the target action.
The first bullet resolves the overestimation of certain transitions. The second and third bullets
ensure the stability of the training process. In [35], TD3 was adopted for power management
of a single-motor HEV, showing that 2% further fuel-consumption reduction could be achieved
compared with DDPG. The authors in [36] showed that SAC and TD3 can outperform each other
on different tasks, and both perform better than DDPG in most scenarios.
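To make these modifications concrete, the following minimal NumPy sketch computes the clipped double-Q learning target described above. The target actor and the two target critics are illustrative stand-in functions rather than the networks used in this dissertation, and all hyperparameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for the target actor and the two target critics;
# in practice these are neural networks (e.g., mapping power-split states to torque commands).
def target_actor(next_state):
    return np.tanh(next_state.mean(axis=1, keepdims=True))        # action in [-1, 1]

def target_critic_1(next_state, action):
    return next_state.sum(axis=1, keepdims=True) + action         # placeholder Q1'

def target_critic_2(next_state, action):
    return next_state.sum(axis=1, keepdims=True) - 0.1 * action   # placeholder Q2'

def td3_target(reward, next_state, done, gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Clipped double-Q target: smooth the target action with clipped noise, then
    bootstrap from the smaller of the two target-critic predictions."""
    noise = np.clip(noise_std * rng.standard_normal((next_state.shape[0], 1)),
                    -noise_clip, noise_clip)
    next_action = np.clip(target_actor(next_state) + noise, -1.0, 1.0)
    q_next = np.minimum(target_critic_1(next_state, next_action),
                        target_critic_2(next_state, next_action))
    return reward + gamma * (1.0 - done) * q_next

# Example: a batch of 4 transitions with a 3-dimensional state.
reward = rng.uniform(-1.0, 0.0, (4, 1))
next_state = rng.standard_normal((4, 3))
done = np.zeros((4, 1))
print(td3_target(reward, next_state, done))
```

Both critics are regressed toward this common target, while the actor and the target networks are updated only once every few critic updates, which corresponds to the delayed update mentioned above.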
Aside from the algorithm architecture, the method called “experience replay” plays an essential
role in improving the convergence speed and convergence performance of the RL training process.
In [36], the authors showed that the same TD3 algorithm with hindsight experience replay (HER)
or prioritized experience replay (PER) can improve the fuel economy by 3.2% and 2.1%, respec-
tively, compared with random experience replay. Although both HER and PER can enhance the
utilization of existing samples in the experience buffer, they do not encourage agents to explore
complex dynamics regions in the state space. Without sufficient experience, the critic may not be
able to predict the correct value in those regions. In addition, the agent may miss a potentially
better solution from those regions. It is found that the current literature lacks a novel prioritized
exploration technique that can encourage the agent to actively explore states in which Q values are
sensitive to action selection.
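For reference, a minimal sketch of proportional prioritized experience replay in the spirit of PER is shown below; the list-based storage, hyperparameter values, and uniform initial priority are illustrative simplifications (practical implementations typically use a sum-tree for efficient sampling).

```python
import numpy as np

rng = np.random.default_rng(1)

class ProportionalPER:
    """Sample transition i with probability p_i^alpha / sum_j p_j^alpha, where
    p_i = |TD error| + eps, and correct the induced bias with importance weights."""
    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-3):
        self.capacity, self.alpha, self.beta, self.eps = capacity, alpha, beta, eps
        self.data, self.priorities = [], []

    def add(self, transition):
        # New samples receive the current maximum priority so they are replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size):
        scaled = np.asarray(self.priorities) ** self.alpha
        prob = scaled / scaled.sum()
        idx = rng.choice(len(self.data), size=batch_size, p=prob)
        weights = (len(self.data) * prob[idx]) ** (-self.beta)
        weights /= weights.max()                 # normalize for numerical stability
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps

# Example usage with dummy (state, action, reward, next_state) tuples.
buffer = ProportionalPER(capacity=100)
for k in range(20):
    buffer.add((k, 0.0, -1.0, k + 1))
idx, batch, w = buffer.sample(4)
buffer.update_priorities(idx, td_errors=rng.standard_normal(4))
```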
accuracy by using data from vehicle connectivity such as vehicle-to-vehicle (V2V) and vehicle-to-
infrastructure (V2I) [40, 41]. This availability of future information and the necessity of handling
safety constraints and multi-objective cost functions make model predictive control (MPC) one
of the notable trends toward eco-driving. For instance, the authors in [42] propose an adaptive
cruise controller that functions by setting control decisions as multi-stage MPC constraints. This
controller can handle both cruise control and adaptive cruise control scenarios. The work in [43]
presents an Eco-ACC system that takes advantage of radar and traffic light-to-vehicle communi-
cations to predict the future trajectory of the preceding vehicle, leading to a 17% improvement
compared to a traditional ACC. In [39], a nonlinear MPC-based ACC strategy is proposed for
energy-optimal operation of electric vehicles. Safety and comfort requirements are implemented
as state and input constraints and strictly enforced. In [44], an MPC-based ACC controller using
a control barrier function is proposed for improving ride comfort and safety for an autonomous
vehicle.
Although the aforementioned Eco-ACC algorithms have demonstrated their safety and energy
efficiency performance, maintaining a safe/comfortable distance from a preceding vehicle is not the
only factor that drivers consider during driving. Vehicles traveling in adjacent lanes also influence
the driving strategy of the ego vehicle. For example, a neighboring car cannot be observed by
the ego vehicle’s driver if this vehicle is located in the blind spot zone of the ego vehicle. The
blind spot zone (BSZ) is the region where an object is completely invisible to the driver without
sufficiently tilting his or her head. Making a lane shift without checking for the existence of
another car in the blind spot can be dangerous. Thus, to prevent collisions during lane changes,
blind spot detection (BSD) and lane-change alert systems have been developed and equipped in
modern vehicles [45]. These systems monitor the unnoticed vehicles in BSZs around the vicinity
of the ego vehicle and warn the driver.
Even though most modern vehicles are equipped with a lane departure warning system or a
BSD system, many do not have such advanced driver-assistant systems, particularly older vehicles.
Moreover, the BSD system does not provide any information on whether the ego vehicle is located
inside the BSZs of neighboring vehicles, which may not have BSD systems. Once the ego vehicle
enters the blind spot of the neighboring car, there exists a high risk that the neighboring car does
not have a BSD system and will make a lane shift into the ego vehicle. An experienced driver
usually tries to avoid entering the blind spot or passes the blind spot of the neighboring vehicle
at a faster speed to minimize the risk. For instance, the authors in [46] proposed a collision risk
assessment algorithm based on probabilistic motion prediction of surrounding vehicles. However,
the algorithm does not consider interactions between the ego vehicle and the surrounding vehicles.
In summary, the current literature lacks an adaptive cruise control algorithm that can actively avoid
the blind spots of neighboring vehicles for improved vehicle safety.
1.3.2 Braking Optimization in Deceleration Events
Under ADAS concepts, many researchers also focus on optimal regenerative braking for electrified
vehicles, aiming to maximize energy regeneration during the braking process [47–49]. However,
human factors such as ride comfort and sense of control have not been sufficiently considered. For
example, a large braking force is needed to achieve the highest regenerative braking performance,
which often results in discomfort to the human driver. In [47], the average deceleration values from
simulations range from -3.79 m/s2 to -7.09 m/s2, whose magnitudes are much larger than the average
deceleration of -2.1 m/s2 that human drivers usually perform [50]. The increase in the
magnitude of deceleration often introduces discomfort to the driver and reduces driving pleasure.
Realizing the importance of the human factor in regenerative braking control, researchers
also have considered the individual driver’s characteristics. In [51], the authors have proposed
a Pontryagin’s maximum principle (PMP)-based energy regeneration algorithm with the individ-
ual driver’s speed preference considered. First, the long-term optimal control is optimized with the
PMP approach. Thereafter, an MPC-based heuristic is proposed to track the PMP solution in real
time. The authors claim that the proposed algorithm achieves 98% of the optimal energy recovery
calculated by dynamic programming. The authors in [51] also point out that, depending on dif-
ferent drivers, there can be a difference of energy consumption as large as 5% between the human
desired operation and the energy-optimal operation. Therefore, it is crucial to take the human factor
into consideration while developing a regenerative braking algorithm. The research of [52] shows
a nonlinear model predictive control (NMPC)-based car-following controller with different phases
introduced. Based on relative distance and relative speed to the preceding vehicle, the ego vehi-
cle operations are classified into “free driving,” “approaching,” or “unconscious following” phases
and assigned with different reference velocity/distance values. The authors claim that after proper
calibration, the trajectory of the ego vehicle becomes smooth and human-like. In [53], the authors
have proposed a deceleration planning algorithm based on the intelligent driver model (IDM) [54],
which consists of parameters reflecting the driver’s personal characteristics. Those parameters are
updated online using the driver’s historical data to improve the prediction precision.
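As a concrete reference for this last approach, the acceleration law of the IDM [54] can be written as the short function below; the default parameter values are illustrative, whereas in [53] such parameters are identified from, and updated online with, the individual driver's data.

```python
import math

def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=2.0, s0=2.0, delta=4):
    """Intelligent Driver Model acceleration [m/s^2] for ego speed v [m/s], gap to the
    preceding vehicle [m], and approach rate dv = v - v_preceding [m/s]. The parameters
    v0 (desired speed), T (desired time headway), a_max, b (comfortable deceleration),
    and s0 (minimum gap) reflect the driver's personal characteristics."""
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

# Example: following at 20 m/s with a 25 m gap while closing at 2 m/s.
print(idm_acceleration(v=20.0, gap=25.0, dv=2.0))
```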
Although the design and implementation of regenerative braking methods considering the hu-
man factor have been systematically explored, most existing research is limited to two-pedal driv-
ing (TPD) vehicles. As one of the recent features of BEVs on the market, one-pedal driving (OPD)
has only a few algorithms considering the human factor. In [55], the tractive torque of OPD is
determined through acceleration pedal position (APS) and current vehicle velocity with rule-based
methods. On the other hand, in [56], similar inputs are used, but the tractive torque is determined
through a lookup table. In [57], a robust controller is designed to ensure OPD safety during a
car-following scenario. However, none of this available research bridges the human factor with the
OPD operation. Still, the on-market OPD vehicles such as Nissan Leaf, BMW i3, and Tesla Model
S have already attained their popularity [58] with a relatively simple logic:
• When the acceleration pedal is pressed, the vehicle accelerates, similar to a conventional
vehicle.
• With the acceleration pedal slightly released, the vehicle starts coasting.
• With the acceleration pedal further released, the vehicle performs a regenerative braking
action.
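This logic can be summarized by a simple pedal-to-torque map, sketched below; the coast band and the torque limits are illustrative assumptions and do not correspond to any particular production OPD calibration or to the references above.

```python
def opd_torque_request(aps, coast_band=(0.15, 0.25), t_max=1500.0, t_regen_max=-800.0):
    """Map accelerator pedal position aps (0.0 to 1.0) to a wheel torque request [Nm]:
    above the coast band the vehicle is propelled, inside it the vehicle coasts, and
    below it regenerative braking ramps in as the pedal is released further."""
    lo, hi = coast_band
    if aps >= hi:                              # propulsion region
        return t_max * (aps - hi) / (1.0 - hi)
    if aps > lo:                               # coasting region
        return 0.0
    return t_regen_max * (lo - aps) / lo       # regenerative braking region

for aps in (1.0, 0.5, 0.2, 0.0):
    print(aps, opd_torque_request(aps))
```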
The most noticeable benefit of OPD is that a driver can use the brake pedal less frequently during
city traffic, especially with many stop-and-go events. This feature also ensures that the EV recap-
tures as much energy as possible during braking. For example, the authors in [56] reported that
the OPD algorithm could save up to 2–9% of energy compared to a parallel regeneration algorithm
based on the same driving speed in city and rural driving.
However, to maintain the desired speed during driving, this OPD requires the driver to press
the pedal constantly and carefully to a certain angle with forces applied. The driver also needs to
pay extra attention to a single pedal-degree control. Although most test drivers can adapt to the
new feature quickly [59], the transition from TPD into OPD still confuses drivers. As proposed
in [58, 60, 61] and [57], the inter-vehicle spacing control methods based on OPD reduce a certain
amount of the driver’s effort within a platoon. However, these OPD algorithms do not sufficiently
consider the individual driver’s behavior. As a result, the driver might feel a sense of intrusion
after these algorithms are activated. Hence, the current literature lacks a personalized one-pedal-
driving algorithm that can learn a specific driver’s driving behaviors. Once this algorithm learns
the driver’s behaviors, it will significantly save the driver’s efforts on the brake pedal and increase
ride comfort during braking.
The torque-split optimization with nonlinear tire effects and longitudinal load transfer is a considerably complex optimization problem; hence, it is not tractable in
real-time. In Chapter 2, the control trajectories of the DP solutions satisfying global optimality are
analyzed, and a simple and effective torque-split strategy using an adaptive equivalent consump-
tion minimization strategy (A-ECMS) and a suboptimal force distribution strategy is proposed.
Chapter 3 proposes a real-time energy management strategy for P0+P4 MHEV based on TD3.
As prior art [29] has proven benefits of expert knowledge to DRL training speed and converged
returns, our proposed TD3 framework will incorporate the expert experience from DP analysis in
Chapter 2. Moreover, an innovative prioritized exploration technique is proposed to encourage
the TD3 agent to actively explore states in which the Q values are action-sensitive. The proposed
framework, named TD3 with prioritized exploration and experience replay (TD3-PEER), allows
the agent to learn a near-optimal control policy compared to existing DRL methods.
Chapter 4 proposes a novel DEco-ACC algorithm using MPC in consideration of neighboring
vehicles’ BSZs to further improve the potential safety of the ego vehicle without significant dete-
rioration of fuel economy and drivability. The optimal cruise control problem can easily include
constraints related to vehicle safety and riding comforts such as minimum and maximum distance
from a preceding vehicle, minimum and maximum acceleration, and speed limits; thus, MPC is
exploited to formulate the DEco-ACC problem. More specifically, the neighboring vehicles’ BSZs
are converted into state constraints, and a continuous and one-time differentiable penalty function
is introduced to penalize the dwelling time in the BSZs of neighboring vehicles. As recent studies
in the literature (e.g., [62, 63]) have shown that V2V technology is matured and capable of provid-
ing precise surrounding vehicle velocity information in actual operation, this study assumes that
the velocity and position information from neighboring vehicles and a preceding vehicle and their
BSZs are attainable. For optimizing and evaluating the performance of the proposed DEco-ACC,
real-world traffic data from Next Generation Simulation (NGSIM) are used to analyze and generate
car-following scenarios during highway driving. In particular, considering the most probable
case that one neighboring vehicle exists in an adjacent lane, a parametric study is conducted to
investigate the impact of the weighting factors on the performance of the DEco-ACC.
Chapter 5 develops a more advanced one-pedal driving system using MPC, personalized one-
pedal driving (POPD), inspired by the optimal regenerative braking of traditional two-pedal-
driving (TPD). Similar to OPD, the POPD allows the driver to control the vehicle’s acceleration
with a single acceleration pedal. The upcoming braking event can automatically be handled by
the POPD when the driver releases the acceleration pedal, thanks to the predictability of MPC
design. To mimic a driver’s braking behavior in the MPC control design, we consider headway
and perceptual constraints; in particular, we have analyzed 450 drivers’ real-world on-road data
to investigate the constraints’ dependence on a driver. In addition, we introduce a learning frame-
work in the POPD where the weights of the MPC cost function are optimized with particle swarm
optimization. In addition, to investigate the impact of prediction accuracy on POPD performance,
we have conducted a comparative case study about prediction methods and horizon lengths using
real-world driving data.
The main research contributions of this thesis are summarized as follows:
– [64] He, Y., Kwak, K.H., Kim, Y., Jung, D., Lee, J.H. and Ha, J., 2021. “Real-time
Torque-split Strategy for P0+ P4 Mild Hybrid Vehicles with eAWD Capability,” IEEE
Transactions on Transportation Electrification, 8(1), pp.1401-1413.
• Deep reinforcement learning-based torque split strategy for P0+P4 MHEVs with prioritized
exploration and experience replay
The state-of-the-art TD3 requires a critic network to generate a predicted Q value for state-action
pairs for updating the policy network. However, the critic network may struggle with predict-
ing Q values at certain states when Q values of these states are sensitive to action selection.
To address this issue, this dissertation proposes a prioritized exploration technique that en-
courages the agent to visit action-sensitive states more frequently in the application of HEV
energy management. Based on this expert twin-delayed deep deterministic policy gradient
with prioritized exploration and experience replay (TD3-PEER), a novel energy manage-
ment strategy is proposed for a 48V P0+P4 MHEV. Simulation results demonstrate that,
with expert knowledge considered for all learning-based methods, the proposed TD3-PEER
outperforms other RL-based energy management strategies including DDPG-PER and DQN
by 2.3% and 3.74% on average over the training and validation cycles, respectively. This
work has been submitted to:
– He, Y., and Kim, Y. “Energy Management Strategy for 48V MHEVs Based on Expert
Twin Delayed Deep Deterministic Policy Gradient Algorithm with Prioritized Explo-
ration and Experience Replay,” Submitted to 2023 American Control Conference.
– [65] He, Y., Kim, Y., Lee, D.Y. and Kim, S.H., 2021. Defensive ecological adaptive
cruise control considering neighboring Vehicles’ blind-spot zones. IEEE Access, 9,
pp.152275-152287.
relative distance-based personalized braking (DRD-PB) algorithm. Specifically, the human-
in-the-loop results from two drivers show that brake pedal use can be reduced on a specific
route by around 80%. This work has been submitted to:
– He, Y., Kwak, K.H., Kim, Y., Fan, S.. Personalized One-pedal Driving for Electric
Vehicles by Learning-based Model Predictive Control. Submitted to IEEE Transactions
on Systems, Man, and Cybernetics: Systems.
CHAPTER 2
Real-time Torque-split Strategy for P0+P4 Mild Hybrid Vehicles with eAWD Capability
2.1 Introduction
As introduced in Chapter 1, the torque split strategy is essential to the fuel consumption of hybrid
electric vehicles, especially dual-motor HEVs. The rule-based strategy has the advantage of low
computation cost; however, it suffers from a low level of optimality [16]. This chapter describes
an optimization-based approximated A-ECMS method for the P0+P4 MHEV, which improves its
fuel efficiency. First, a P0+P4 MHEV model is laid out, including realistic effects such as
nonlinear tire behavior, load transfer, and braking force distribution constraints.
Then, a dynamic programming analysis is conducted to determine the optimal fuel consumption
and the ideal torque-split behavior. Next, a modified logistic function, which captures this ideal
torque-split behavior, is combined with A-ECMS to develop a three-power-source torque-split algorithm.
Finally, a comparison is made with the existing rule-based torque-split strategy [14].
The main contributions of this chapter are summarized as follows:
• A realistic P0+P4 MHEV model is presented, with longitudinal load transfer, nonlinear
tire effects, and braking force distribution constraints for vehicle safety included in the
optimization problem.
• The fuel economy potential of the 48 V P0+P4 MHEV is investigated, and useful features
for developing a real-time control problem are derived from observations of the DP analysis.
Figure 2.1: Schematic of the 48V P0+P4 MHEV powertrain: the engine (ICE) and the belt-driven P0 electric machine drive the front axle through the clutch, the dual-clutch transmission (DCT), and the front differential, while the P4 electric machine drives the rear axle through the rear differential; both electric machines are connected to the 48V battery.
The rest of this chapter is organized in the following order: The vehicle/powertrain modeling
and the fuel consumption minimization problem using the DP algorithm are presented in Sections
2.2 and 2.3, respectively. The development of a real-time-implementable control method and sim-
ulation results using the proposed method are presented and discussed in Section 2.4. Finally, a
summary is given in Section 2.5.
2.2 Vehicle and Powertrain Model
2.2.1 Longitudinal Vehicle Dynamics Model
The vehicle dynamics model is constructed in consideration of the longitudinal transfer of the load
acting on axles. The overall wheel torque, τw , is computed by,
where Jw = 0.51 kg·m2 and M = 1725 kg are the wheel inertia and the vehicle gross mass, respec-
tively; rw = 0.347 m is the effective rolling radius of the wheel. With longitudinal acceleration of
the vehicle, ax , the angular acceleration of tire can be derived as:
ω̇w = ax / rw , (2.2)
The total resistance force, Fr , which lumps rolling resistance, aerodynamic drag, and road grade, is modeled as
Fr = C0 + C1 v + C2 v² + M g sin θ, (2.3)
The coefficients C0 , C1 and C2 were obtained from a vehicle test as 123.88 N, 2.83 N/(m/s) and
0.49 N/(m²/s²), respectively; θ is the road grade angle.
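As a quick numerical check of Eq. (2.3), the sketch below evaluates the road load with the coefficients reported above; the flat-road example call (θ = 0) is illustrative.

```python
import math

# Road-load coefficients identified from the vehicle test, Eq. (2.3).
C0, C1, C2 = 123.88, 2.83, 0.49   # [N], [N/(m/s)], [N/(m/s)^2]
M, G = 1725.0, 9.81               # vehicle gross mass [kg], gravitational acceleration [m/s^2]

def road_load(v, theta=0.0):
    """Total resistance force Fr [N] at vehicle speed v [m/s] on a road grade theta [rad]."""
    return C0 + C1 * v + C2 * v ** 2 + M * G * math.sin(theta)

print(road_load(20.0))   # roughly 376 N at 72 km/h on a flat road
```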
When tires slip, the angular speed of a wheel is different from the simple quotient of vehicle
speed and tire radius. The angular speed, ωw , of the front and rear wheels is then calculated by
considering the slip term as follows:
ωw,j = (1 + κj ) v / rw , (2.4)
where κ is the tire slip and subscript j indicates terms for the front or rear axle/wheel/tire.
The tire slip for each axle is given as a function of the tire force developed on the tires and the
normal load on the specific tire as follows:
κj = f (Fx,j , Wj /2) , (2.5)
where the tire force of the tires on a single axle is defined as,
The torque at the front wheels, τw,front , can be described as the difference between the total wheel
torque, τw , and the rear wheels torque, τw,rear , that is,
τw,front = τw − τw,rear . (2.7)
Figure 2.2: Free-body diagram of the vehicle in the longitudinal direction: gross mass M with acceleration ax, distances b and c from the center of gravity to the front and rear axles, center-of-gravity height h, longitudinal tire forces Fx,front and Fx,rear, and vertical axle loads Wfront and Wrear.
During acceleration or deceleration, inertia force causes vertical axle load transfers from front
to rear or vice versa. This changing of vertical axle load influences the maximum braking force
that the tires on each axle can handle. The vertical axle load on the front or rear axle is the normal
load on the tires in Eq. (2.5), which is expressed as
Wfront = (c M/(b + c)) g − (h M/(b + c)) ax , (2.8)
Wrear = (b M/(b + c)) g + (h M/(b + c)) ax , (2.9)
where b, c, and h, measured as 1.15 m, 1.52 m and 0.67 m, are the horizontal distances from the
vehicle center of gravity to the front and rear axles and the height of the center of gravity,
respectively, as illustrated in Fig. 2.2; g denotes the gravitational acceleration of 9.81 m/s2 .
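The load-transfer relations of Eqs. (2.8) and (2.9) follow directly from the stated geometry, as in the sketch below; the example acceleration values are illustrative.

```python
M, G = 1725.0, 9.81          # vehicle gross mass [kg], gravitational acceleration [m/s^2]
b, c, h = 1.15, 1.52, 0.67   # CG-to-front-axle and CG-to-rear-axle distances, CG height [m]

def axle_loads(ax):
    """Vertical loads on the front and rear axles [N] for a longitudinal acceleration
    ax [m/s^2], Eqs. (2.8)-(2.9); braking (ax < 0) shifts load toward the front axle."""
    w_front = (c * M / (b + c)) * G - (h * M / (b + c)) * ax
    w_rear = (b * M / (b + c)) * G + (h * M / (b + c)) * ax
    return w_front, w_rear

print(axle_loads(0.0))    # static front/rear load split
print(axle_loads(-3.0))   # moderate braking
```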
As for the powertrain, the torque and speed at the front axle are computed as follows:
τt = τe + γp τm1 , (2.10)
ωt = γfd γt ωw,front , (2.11)
τprop = ((τt − τgb,loss ) γt ηt − τf,loss ) γfd , (2.12)
where τt is the transmission input torque, τe is the engine torque, τm1 is the P0 motor torque, and
γp is the pulley ratio between the engine and the P0 motor; γfd and γt are the final drive ratio
and transmission ratio of the gear box, respectively. The transmission efficiency, ηt , is assumed
constant for each gear state for simplification. Then, the front wheels torque can be rewritten with
the propulsion torque to the front axle, τprop , and the front brake's friction torque, τf,front , as
τw,front = τprop − τf,front . (2.13)
Table 2.1: Relations between the Magic Formula coefficient D and the normal load.
The torque losses by the transmission, τgb,loss , and the final drive, τf,loss , are given by:
where Jt,in and Jt,out are the inertias of the transmission components and Jfd,in and Jfd,out are the
inertias of the final drive components, respectively; ω̇t,in , ω̇fd,in , and ω̇fd,out are the angular accelerations
at the input of the transmission, the input of the final drive, and the output of the final drive, respectively.
Since the P4 motor is mounted on the rear axle, the rear wheel torque, τw,rear , is expressed as
τw,rear = γm2 τm2 − τf,rear , (2.16)
where τm2 and τf,rear are the P4 motor torque and the rear brake's friction torque, respectively. The
gear ratio for the P4 motor drive is denoted by γm2 .
Since only longitudinal dynamics is considered in this study, the force coupling between longitudinal slip and lateral slip is ignored. The Magic Formula coefficients B, C, and E are dimensionless constants determined as 0.0735, 1.8704 and 0.686, respectively, using tire data from the commercial software CARSIM. The coefficient D is a function of the instantaneous normal load on the tire, and their relation is described in Table 2.1.
Figure 2.3 shows the relationship of tire force vs. tire slip generated by the nonlinear tire model
used in this study. In general, the tire slips when torque is applied to the wheel. The generated
Figure 2.3: Tire force vs. tire slip modeled with the Magic Formula Tire Model. The tire force is
a function of both the normal load on the tire and the tire slip.
tire force is greater with a higher normal load to the tire. At first, the generated tire force increases
monotonically as slip increases; then, the force declines as the tire keeps slipping. In the figure,
the tire performance reaches its peak value at around 20 % tire slip. After this point, increased
tire slip does not contribute to developing better tire force. Therefore, operating the tire with the
tire slip under 20 % is preferred. Under this condition, the tire force has a monotonic relationship
with the tire slip, allowing for a simple 2-D lookup table. It should be noted that the vehicle lateral
dynamics are not considered, and hence the coupling between longitudinal and lateral tire forces
is ignored.
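As an illustration of the monotonic low-slip behavior discussed above, the sketch below evaluates a Pacejka-style Magic Formula with the B, C, and E values reported in the text; the standard form Fx = D sin(C arctan(Bκ − E(Bκ − arctan Bκ))) and the linear dependence of the peak factor D on the normal load are assumptions standing in for the relation given in Table 2.1.

```python
import numpy as np

# Sketch of a Magic Formula longitudinal tire model,
#   Fx = D * sin(C * atan(B*k - E*(B*k - atan(B*k)))).
# B, C, E are the constants reported in the text; the dependence of the peak factor D on
# the normal load W is a hypothetical linear placeholder for Table 2.1.
B, C, E = 0.0735, 1.8704, 0.686

def peak_factor(normal_load: float) -> float:
    """Placeholder for Table 2.1: assume D scales roughly with the normal load."""
    return 1.0 * normal_load

def tire_force(slip_percent: float, normal_load: float) -> float:
    """Longitudinal tire force [N]; the reported B value appears consistent with slip in percent."""
    k = slip_percent
    D = peak_factor(normal_load)
    return D * np.sin(C * np.arctan(B * k - E * (B * k - np.arctan(B * k))))

# Example: the force rises up to roughly 20 % slip and then falls off, as in Fig. 2.3.
for s in (5, 10, 20, 40):
    print(s, round(tire_force(s, 4000.0), 1))
```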
where the friction utilization at each axle is defined as the ratio of the braking force Fx,j to the maximum braking force Fx,j^max at the axle:

kj = Fx,j / Fx,j^max. (2.19)
The friction utilization and braking intensity are regulated per requirement 3.1.1 of UNECE [71], which demands that the friction utilization of a given axle be between 0.2 and 0.8 for the braking intensity, z, in the following inequality:

with

z = Fx,total / (M g) = −ax / g. (2.21)
ṠOC = −Ib / Qb, (2.22)
where the battery capacity, Qb , is 19.4 Ah. The battery current, Ib , is calculated using the open
circuit voltage, Voc , the internal resistance, Rb , and the battery power, Pb , as follows:
Ib = −(Voc − √(Voc² − 4 Rb Pb)) / (2 Rb), (2.23)
where the internal resistance, Rb , is set to be 9 mΩ. The Li-ion battery provides the power to the
two motors as well as to the auxiliary load. Therefore, the battery power is expressed as follows:
where Pm1 and Pm2 are the electrical power consumption of the P0 and P4 motors, respectively;
Paux is the auxiliary power consumption. It should be noted that in this study, the battery temper-
ature is assumed to be well regulated around its target value, and hence the temperature effects are
ignored.
The electrical power at each motor is calculated by
Pm1 /m2 = ωm1 /m2 τm1 /m2 + Pm1 /m2 ,loss . (2.25)
where the power loss of each motor, Pm,loss , is a function of the motor speed and torque.
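A minimal sketch of the battery relations around Eqs. (2.22) and (2.23) is given below. The open-circuit voltage, the auxiliary-load value, the explicit-Euler update, and the sign convention (discharge current taken as positive so that the SOC decreases when the motors draw power) are illustrative assumptions, not values taken from the dissertation.

```python
import math

# Minimal sketch of the battery model around Eqs. (2.22)-(2.23).
# Assumptions (not from the dissertation): Voc = 48 V, Paux = 300 W, explicit-Euler update,
# and discharge current taken as positive so that SOC decreases when power is drawn.
Q_B = 19.4 * 3600.0    # battery capacity [As] (19.4 Ah)
R_B = 9e-3             # internal resistance [ohm]

def battery_current(p_batt: float, v_oc: float = 48.0) -> float:
    """Discharge current [A], the smaller root of Pb = Voc*Ib - Rb*Ib^2."""
    return (v_oc - math.sqrt(v_oc**2 - 4.0 * R_B * p_batt)) / (2.0 * R_B)

def soc_step(soc: float, p_m1: float, p_m2: float, p_aux: float = 300.0, dt: float = 1.0) -> float:
    """One Euler step of the SOC dynamics with Pb = Pm1 + Pm2 + Paux."""
    i_b = battery_current(p_m1 + p_m2 + p_aux)
    return soc - i_b / Q_B * dt

# Example: drawing 5 kW with the P4 motor for one second lowers the SOC slightly.
print(round(soc_step(0.60, 0.0, 5000.0), 6))
```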
motoring curve, which is the zero fueling torque curve. The minimum fuel curve can influence
fuel consumption during the regenerative braking events of an MHEV [72].
min J = Σ_{k=0}^{N−1} ṁf,k Δt, (2.26)
where the cost, J , is the total fuel consumption over a driving cycle; the fueling rate, ṁf,k , is a
function of the engine speed, ωe , and the engine torque, τeng ; and x is the SOC of the battery. The
subscripts 0 and f indicate the initial and final values, respectively. The control variables u1 and
u2 represent the P0 motor torque and the P4 motor torque, respectively. The feasible sets of state
and control variables are denoted by X and U , respectively.
A global solution to the problem given in Eq. (2.26) is numerically obtained by using dynamic
programming with the dpm function implemented in the MATLAB environment [73]. It is noted
that the dynamic equations with state and control variables need to be properly discretized with
consideration for the accuracy of the solution and the computation time. The level of discretization
is as follows:
• time: 1 s,
• SOC: 2 ×10−3,
• τm1 and τm2 : 2 Nm.
Figure 2.4: Power distribution by the DP algorithm under (a) the WLTC (b) the UDDS and (c) the
HWFET.
Three regulatory driving cycles are considered in this study: the World harmonized Light-duty
Test Cycle (WLTC), the Urban Dynamometer Driving Schedule (UDDS), and the Highway Fuel
Economy Test Cycle (HWFET). Figure 2.4 shows the power distribution by the DP algorithm of the
IC engine and the two motors with globally optimized torque-split control under the three driving
cycles. In the figures, the engine provides most of the propulsion power, especially when the vehicle is traveling at higher speeds, such as in the later stage of the WLTC driving cycle in Fig. 2.4 (a) or the HWFET driving cycle in Fig. 2.4 (c). This preference for using the engine is because the P0 and P4 motors are much smaller than the engine in the 48 V MHEV system.
Figures 2.5 (a) and (b) show the longitudinal load transfer and the corresponding slip ratios of the front and rear axles under the WLTC. Since the variation in slip ratio is relatively small (−1.69 % to 4.07 % for the WLTC, −1.29 % to 3.55 % for the UDDS, and −1.43 % to 2.77 % for the HWFET), the common rolling assumption can be applied in the development of torque-split strategies.
Figure 2.5: Simulation results of (a) longitudinal load transfer and (b) slip ratio under the WLTC (DP).
To compare overall energy distribution over the driving cycles, energy consumption and regen-
eration of the engine and the two motors are compared in Fig. 2.6. The overall energy consumption
also confirms that most energy is consumed by the engine in this 48 V MHEV. For propulsion, it
is clear that the P0 motor is rarely used in all three cycles as it uses less than 1 % of the engine en-
ergy consumption. The P0 motor is mechanically coupled with the engine through the belt-pulley system, which has lower transmission efficiency than the geared P4 motor. Therefore, the utilization of the P4 motor for torque assist is preferred.
The P4 motor is used for propulsion much more in the WLTC or the UDDS cases than in the
HWFET. Under the WLTC or the UDDS cycles, frequent stop-and-go operations, where the IC
engine efficiency is relatively low, make the use of the P4 motor preferable for assisting vehicle
acceleration. Thus, the P4 motor uses about 8 % and 14 % of the total engine energy consumption
under the WLTC and the UDDS cycles, respectively. In the HWFET case, the vehicle cruises at
higher speeds, where the engine can operate more fuel-efficiently, and the benefits of having P0
and P4 motors are small due to the less frequent opportunity for torque assisting in acceleration
and battery regeneration in deceleration.
Notably, the amount of regeneration energy by the P4 motor is similar to that of the P0 motor
despite its larger size. This is because the braking force is constrained by Eqs. (2.18) and (2.21),
so that the rear brakes are utilized less than the front brakes and the amount of energy regenerated
by the P4 motor is limited. More specifically, the P0 motor captures braking energy equal to 14 % and 10 % of the total engine energy consumption under the UDDS and WLTC, respectively.
Figure 2.6: Normalized energy consumption and regeneration of the engine, the P0 motor, and the P4 motor under the WLTC, the UDDS, and the HWFET driving cycles.
The P4 motor captures 17 % and 8 % under the same driving cycles. In the HWFET case, regeneration is
minimal due to the fact that the vehicle is mostly cruising at higher speeds.
Since the P0 motor is small and hardly used for assisting torque at the front axle, its use for providing additional propulsion and regeneration power can be limited to two cases: (1) when the torque demand exceeds the combined limit of the engine and the P4 motor, the P0 torque is set to the torque required to fulfill the remaining demand; (2) when the torque demand is less than the engine motoring torque limit, the P0 motor is used only to regenerate energy.
Under these two cases, the P0 motor torque is calculated without including P0 torque control in the optimization problem, especially for propulsion. Therefore, a reduced-scale, real-time-implementable optimal torque-split controller for the P0+P4 MHEV that retains performance close to the globally optimal solution is feasible. For deceleration, the ratio of brake utilization between the front and the rear wheels needs to be determined first, before the calculation of the P4 motor torque.
Figure 2.7 shows the scatter plots of the ratio of front tire force to the total braking force under the
three driving cycles. The braking force ratio constraints are calculated based on Eqs. (2.18) and
(2.21), and the derivation of the upper and lower limits can be found in [74].
The braking force distribution in the figures shows front-biased braking per the UNECE regu-
lation. The braking force distribution affects the regeneration of the P0 and P4 motors. In a mild
braking condition with deceleration lower than about 0.5 m/s2 , braking is done mostly at the front
axle; however, in a hard braking condition, the rear brake is utilized as much as possible, biasing
the braking force ratio toward the lower limit to maximize the energy recuperation. If the braking
Figure 2.7: DP results of braking distribution under (a) the WLTC, (b) the UDDS, and (c) the HWFET driving cycles. The size of the bubble indicates frequency.
force distribution is predetermined, then the P4 motor torque can be calculated without solving the
optimization problem, as can the torque of the P0 motor. This strategy gives the additional benefit
of reducing the computational load for the real-time implementable torque-split controller.
with

τm1 = τ̃e − τe,max   if τ̃e > τe,max,   and   τm1 = 0   otherwise.   (2.28)
This simplification reduces the number of control variables, allowing for use of the adaptive
equivalent consumption minimization strategy (A-ECMS) [67], which is widely used for real-time
torque-split control of HEVs because of its performance in terms of good optimality and low computational expense.
Figure 2.8: The proposed torque-split strategy for the P0+P4 MHEV: Approximated A-ECMS and suboptimal brake force distribution
function are used for propulsion and braking, respectively.
The formulation of the approximated A-ECMS is given as follows:
min_{u2,k}  ṁf,k + (λk / LHV) Pb,k
s.t.  xk+1 = f(xk, uk),  xk = SOCk,  uk = τm2,k,          (2.29)
      xk ∈ X,  uk ∈ U,  ϕ(xk, uk) ∈ Ω,
where λ and LHV are the equivalence factor and the lower heating value of the fuel, respectively. The state and
control constraints are denoted by X and U , respectively. The nonlinear function ϕ(xk , uk ) repre-
sents the vehicle and powertrain dynamics, accounting for the constraints on the IC engine, the P0
motor, and the battery, which are denoted by Ω.
In this formulation, Pm1 provides torque assist only if the front axle power demand exceeds the
engine capability. Thus, Pm1 is the dependent variable, and the only independent variable is Pm2 .
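A minimal sketch of the per-step A-ECMS minimization in Eq. (2.29) follows; a brute-force grid search over the P4 torque stands in for the actual solver, and the fuel-rate surrogate, motor-loss model, single shaft speed, torque limits, and equivalence-factor value are hypothetical placeholders for the vehicle maps and tuning used in the dissertation.

```python
import numpy as np

# Sketch of the instantaneous A-ECMS torque-split search of Eq. (2.29).
# fuel_rate(), motor_elec_power(), the single shaft speed, and the torque grid are
# hypothetical placeholders for the engine fuel map and P4 motor loss map.
LHV = 44.0e6  # lower heating value of gasoline [J/kg], typical value

def fuel_rate(tau_e: float, w_e: float) -> float:
    """Placeholder fueling rate [kg/s]; a real map would be a 2-D lookup table."""
    return max(tau_e, 0.0) * w_e / (LHV * 0.35)      # crude 35 %-efficiency surrogate

def motor_elec_power(tau_m2: float, w_m2: float) -> float:
    """Placeholder P4 electrical power [W] with a simple 90 %-efficiency loss model."""
    mech = tau_m2 * w_m2
    return mech / 0.9 if mech >= 0 else mech * 0.9

def ecms_split(tau_dmd: float, w_shaft: float, lam: float,
               tau_m2_min: float = -80.0, tau_m2_max: float = 80.0):
    """Return (tau_e, tau_m2) minimizing fuel rate + (lambda/LHV)*Pb over a torque grid.
    For simplicity the engine and the P4 motor are treated as acting at one shaft speed."""
    best = None
    for tau_m2 in np.linspace(tau_m2_min, tau_m2_max, 161):
        tau_e = tau_dmd - tau_m2                     # remaining demand assigned to the engine
        cost = fuel_rate(tau_e, w_shaft) + lam / LHV * motor_elec_power(tau_m2, w_shaft)
        if best is None or cost < best[0]:
            best = (cost, tau_e, tau_m2)
    return best[1], best[2]

# With a small equivalence factor (electricity "cheap", e.g. high SOC) the motor assists.
print(ecms_split(tau_dmd=120.0, w_shaft=200.0, lam=2.5))
```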
γBF = Fx,front / Fx,total
    ≈ [ A1 / (1 + exp(−A2 (−ax − A3))) + A4 ] + [ (c + h(−ax/g)) / (b + c) ],   (2.30)

in which the first bracketed term is a scaled and shifted logistic function and the second is the lower bound of γBF,
where A1 , A2 , A3 , and A4 are the coefficients determined by the least squares method using DP
results of three driving cycles. It is noted that these coefficients may need to be updated in different
driving conditions, which can be one of the directions for future work. This optimized regression
model is compared with the data from the DP results in Fig. 2.9. With this model, the negative
torque demand can be simply split and provided by the P0 and the P4 motors, and hence the
computational expense can be dramatically reduced.
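The sketch below evaluates the suboptimal braking-force-ratio function of Eq. (2.30). The vehicle geometry comes from Section 2.2.1, but the coefficients A1-A4 are hypothetical, chosen only to reproduce the qualitative trend described above (front-biased braking for mild deceleration, ratio approaching the lower bound for hard braking); the fitted values are not reproduced in the text.

```python
import math

# Sketch of the suboptimal braking force ratio gamma_BF of Eq. (2.30).
# b, c, h are from Section 2.2.1; A1-A4 are hypothetical placeholders chosen to mimic the
# DP trend described in the text, not the least-squares values used in the dissertation.
b, c, h, g = 1.15, 1.52, 0.67, 9.81
A1, A2, A3, A4 = 0.4, -4.0, 0.8, 0.0   # hypothetical coefficients

def gamma_bf(ax: float) -> float:
    """Front-to-total braking force ratio for a (negative) longitudinal acceleration ax [m/s^2]."""
    logistic = A1 / (1.0 + math.exp(-A2 * (-ax - A3))) + A4
    lower_bound = (c + h * (-ax / g)) / (b + c)
    return logistic + lower_bound

# Mild braking stays front-biased; hard braking approaches the lower bound.
for ax in (-0.3, -1.5, -3.0):
    print(ax, round(gamma_bf(ax), 3))
```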
Figure 2.9: Suboptimal braking force ratio function compared to γBF distribution of all three cycles
combined. The size of the bubble indicates frequency.
In order to maximize braking energy recuperation, the P4 motor is used first, and then the
rear friction brake is used when the torque demand at the rear axle exceeds the P4 motor torque
capacity. At the front axle, the P0 motor is used for regeneration when the demanded braking
torque is greater than the engine motoring torque.
These three approaches are not independent of each other, and their combinations have also been proposed. In particular, experimental results from [69] show that a parallel HEV could achieve near-optimal fuel economy even with a sole adoption of approach 3). As suggested by [68] and [69], in consideration of the memory and computation time of a production ECU, adaptation approach 3) is applied in this study. To guarantee the robustness of the time-varying equivalence factor λk, the relationship between λk and the current SOC, SOCk, is given by:
where the reference SOC, SOCref, is set to be 60 %. The parameters Kp and λ0 are constant values to be determined to maximize the overall fuel economy of the P0+P4 MHEV in consideration of charge sustainability. Equation (2.31) forces the IC engine to operate more by increasing λk when the SOC is low, and vice versa.
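A minimal sketch of this SOC-feedback adaptation is given below. The proportional form λk = λ0 + Kp (SOCref − SOCk) is an assumption consistent with the description above (the adaptation law itself appears as Eq. (2.31)); Kp, λ0, and SOCref are the values reported for the selected parameter set.

```python
# Sketch of the SOC-feedback adaptation of the equivalence factor (around Eq. (2.31)).
# The proportional form lambda_k = lambda_0 + Kp*(SOC_ref - SOC_k) is an assumption
# consistent with the description in the text.
KP = 8500.0
LAMBDA_0 = 10000.0
SOC_REF = 0.60

def equivalence_factor(soc: float) -> float:
    """A low SOC raises lambda, pushing the optimization toward using the IC engine."""
    return LAMBDA_0 + KP * (SOC_REF - soc)

for soc in (0.50, 0.60, 0.70):
    print(soc, equivalence_factor(soc))
```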
To properly select Kp and λ0 for the proposed strategy, 1736 combinations of Kp and λ0 were evaluated over the WLTC, the UDDS, and the HWFET cycles, considering both the fuel consumption and the deviation of the terminal SOC from the initial SOC. The electric power associated with the terminal SOC deviation for each case is converted into equivalent fuel and added to the total fuel consumption, noted as corrected fuel. Each driving cycle is run with the ECMS algorithm [76] and several different equivalence factors. With enough runs on the same cycle, the change of fuel with respect to SOC deviation, Δ(fuel)/Δ(SOC), is obtained as a near-constant value and used for fuel consumption correction. For the current architecture, with Kp and λ0 set to 8500 and 10000, respectively, the proposed strategy achieves the best overall fuel economy among all cases with a reasonable terminal SOC deviation.
Figure 2.10: Parameter study of Kp and λ0 for the WLTC, the UDDS and the HWFET driving cycles: the left column (a, c, e) represents corrected fuel, and the right column (b, d, f) represents SOC deviation. The highlighted point is selected for the subsequent studies.
Figure 2.11: Demand power zones on the engine brake specific fuel consumption (BSFC) map. The green curve denotes the engine optimal power level; the red curve denotes the EV mode on/off power; the yellow curve denotes the positive/negative power boundary.
The engine and two motors operate based on which zone the current demand power falls into:
• Zone 1: when the battery SOC is low, use the engine solely; when the battery SOC is high,
operate the engine at optimal power level and use the P4 motor to satisfy the total demand
power.
• Zone 2: when the battery SOC is high, use the engine solely; when the battery SOC is low,
operate the engine at optimal power level to charge the battery with excess power.
• Zone 3: when the battery SOC is high, use the P4 motor solely; when the battery SOC is
low, operate the engine at optimal power level to charge the battery with excess power.
• Zone 4: when the battery SOC is high, use the P4 motor solely; when the battery SOC is
low, use the engine solely.
• Zone 5: operate the engine at the maximum power level, and use the P0 and P4 motors to assist the engine.
• Zone 6: the engine provides motoring power; the P4 and P0 motors recapture braking energy
based on the method described in Fig. 2.9.
The thresholds for the EV mode and low/high states of the battery SOC are carefully calibrated
for the best performance using the WLTC, the UDDS, and the HWFET driving cycles.
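The sketch below expresses the zone rules above as a simple dispatch function. The zone classification itself (the boundaries defined by the curves in Fig. 2.11) is replaced by hypothetical fixed power thresholds, and Zones 3 and 4 are merged into a single low-demand branch, so this is only a rough illustration of the rule structure, not the calibrated controller.

```python
# Rough sketch of the rule-based zone logic described above. P_OPT, P_EV, and the SOC
# thresholds are hypothetical placeholders for the calibrated curves in Fig. 2.11.
# The returned tuple is (engine power, P4 motor power) in watts.
P_OPT = 15e3                 # hypothetical engine optimal power level
P_EV = 5e3                   # hypothetical EV-mode on/off power threshold
SOC_LOW, SOC_HIGH = 0.55, 0.65

def rule_based_split(p_dmd: float, soc: float, p_eng_max: float = 90e3):
    soc_high = soc >= SOC_HIGH
    if p_dmd < 0:                                # Zone 6 (simplified): motors recover energy
        return 0.0, p_dmd
    if p_dmd > p_eng_max:                        # Zone 5: engine at maximum, motors assist
        return p_eng_max, p_dmd - p_eng_max
    if p_dmd <= P_EV:                            # low-demand region (Zones 3/4, merged)
        return (0.0, p_dmd) if soc_high else (P_OPT, p_dmd - P_OPT)
    if p_dmd <= P_OPT:                           # Zone 2: charge with excess power if SOC is low
        return (p_dmd, 0.0) if soc_high else (P_OPT, p_dmd - P_OPT)
    # Zone 1: demand above the optimal power level
    return (P_OPT, p_dmd - P_OPT) if soc_high else (p_dmd, 0.0)

print(rule_based_split(20e3, 0.70))   # high SOC: engine at optimal level, P4 assists
print(rule_based_split(3e3, 0.70))    # high SOC, low demand: EV mode
```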
2.4.5 Performance of Real-time Control Strategies
The proposed torque-split algorithm is implemented in the P0+P4 MHEV model and evaluated
under various driving cycles by comparing the results with the DP and the rule-based strategy.
The WLTC, the UDDS, and the HWFET cycles are used to determine the optimal parameters of
the proposed strategy shown in Fig. 2.8, and the LA92 and the US06 are used for performance
validation.2
Table 2.2: Fuel consumption comparison between the DP, the proposed and the rule-based algorithms. Corrected fuel consumption is compared with the DP.
Simulation results from the dynamic programming, the proposed strategy and the rule-based
strategy for all five drive cycles are summarized in Table 2.2. As shown in the table, the termi-
nal SOC values from the proposed strategy and the rule-based strategy deviate from the initial
SOC. Therefore, for a fair comparison, the total fuel consumption is corrected by considering the
mismatch of terminal SOC. Each driving cycle is run again by the proposed strategy with several
different equivalence factors. With enough runs on the same cycle, the change of fuel with respect to SOC deviation, Δ(fuel)/Δ(SOC), is obtained and used for fuel consumption correction.
Compared to the DP results, the corrected fuel consumption values from the proposed strategy
are 0.9 %, 1.4 %, 0.51 %, 3.05 % and 1.63 % higher under the WLTC, the UDDS, the HWFET,
the LA92 and the US06 driving cycles, respectively. The rule-based strategy underperforms compared to the proposed strategy; the corrected fuel consumption values are 8.67 %, 5.23 %, 5.15 %,
6.45 % and 7.46 % higher than the DP results under the same driving cycles. On average, the
fuel consumption performance is 93.6 % of global optimality in the training cycles and 94.5 %
of global optimality in the validation cycles with the rule-based strategy. The performance of the proposed strategy is superior, as an average of 99.1 % of global optimality in the training cycles and an average of 97.7 % of global optimality in the validation cycles are achieved.
2 For a detailed discussion, only the results from the WLTC are presented in the paper; however, the fuel economy results over all driving cycles are reported in Table 2.2.
Figure 2.12: Torque trajectory of the DP, the proposed strategy and rule-based (RB) results under the WLTC cycle for (a) Engine, (b) P0
motor, and (c) P4 motor.
Figure 2.12 shows the torque trajectories of the engine, the P0 motor, and the P4 motor with
three strategies under the WLTC cycle. As shown in the figure, the proposed strategy controls the
torque split similarly to the DP. The P0 motor rarely operates for propulsion under the DP, as shown in Fig. 2.12 (b), and the P0 motor torque trajectory from the proposed strategy mostly matches that of the DP as well. In contrast, the rule-based strategy's torque split notably differs from those of the other strategies. At times the engine of the rule-based strategy provides higher power than the others and the P4 motor captures the excess power to charge the battery (the zoomed-in portion in Fig. 2.12 (a) and (c)). This double energy conversion makes the rule-based strategy inefficient.
Figure 2.13 shows the SOC trajectories from the DP, the proposed strategy and the rule-based strategy under all five drive cycles ((a) the WLTC, (b) the UDDS, (c) the HWFET, (d) the LA92 and (e) the US06). As discussed, the rule-based strategy pays an extra cost (double energy conversion) to maintain the SOC close to the reference SOC. Therefore, the resulting SOC trajectory differs from those obtained by the proposed strategy and the DP. For example, in Fig. 2.13 (c), unlike the other two strategies that drain the battery SOC from 300 s to 600 s, the rule-based strategy charges the battery around 420 s because the battery SOC falls below the low SOC threshold. Fig. 2.13 shows that the overall SOC trends of the proposed strategy are similar to those from the DP results in all five cycles, which can be explained by the similarity of the powertrain operations between the two strategies, as shown in Fig. 2.14.
Figure 2.14 compares the operating points of the engine and the two motors with the DP, the
proposed strategy and the rule-based strategy under the WLTC. It can be observed that the engine
(Fig. 2.14 (a) and (d)) and the P4 motor (Fig. 2.14 (c) and (f)) operate very similarly under the
DP and the proposed strategy. The frequency and range of the visited operating points are almost
identical except for a few high-load points in the engine operation. On the other hand, the P0
motor operations are slightly different, as seen from Fig. 2.14 (b) and (e), which is due to the
fact that the P0 motor is used mostly for regenerative braking under the proposed strategy. It is
noted that since the P0 motor is attached to the engine, the speed range of the P0 motor operation
is very similar between the two strategies. The engine and the P4 motor operation of the rule-
based strategy (Fig. 2.14 (g) and (f)), however, are very different from those with the DP and the
proposed strategy. The engine (Fig. 2.14 (g)) of the rule-based algorithm tends to operate at the
optimal power level that is defined in Fig. 2.11. As discussed earlier, for maintaining the battery
SOC level, the P4 motor operation with the rule-based strategy is also different from those with
the DP and the proposed strategy, as shown in Fig. 2.14 (i). However, the P0 motor operation
of the rule-based strategy is similar to the proposed strategy, which is due to the fact that both
Figure 2.13: SOC trajectories of the DP and the proposed strategy results under the five driving
cycles: (a) the WLTC, (b) the UDDS, (c) the HWFET, (d) the LA92 and (e) the US06
strategies adopt similar braking rules: the sub-optimal braking force function is utilized to find the force distribution between the two axles, and the P0 motor only recuperates braking energy when the braking demand power is larger than the engine motoring power.
Figure 2.14: Comparison of operation points distribution under the WLTC driving schedule from
the DP results: (a) the engine, (b) the P0 motor, and (c) the P4 motor, from the proposed strategy
results: (d) the engine (e) the P0 motor and (f) the P4 motor and from the rule-based strategy
results: (g) the engine (h) the P0 motor and (i) the P4 motor. The size of the bubble indicates
frequency.
2.5 Summary
A real-time-implementable torque-split strategy for minimizing the fuel consumption of a P0+P4
MHEV is proposed in this chapter. Since the optimal torque-split among the IC engine and the
P0 and P4 motors is complicated and computationally demanding, reducing the size of the opti-
mization problem is desired. In this chapter, the optimal torque-split problem is formulated with
a detailed modeling approach, including longitudinal load transfer, non-linear tire model with tire
slip, and brake distribution regulation, and then solved with dynamic programming. The DP results
reveal that (i) the commonly used rolling assumption is applicable, (ii) the P0 motor is rarely used
for propulsion, and (iii) the ratio of tire force (front-to-total) is highly related to deceleration. Based
on these observations, the proposed strategy combines an approximated A-ECMS and a subopti-
mal braking force distribution function for vehicle propulsion and regeneration, respectively. The
simulation results show that the proposed strategy for the considered P0+P4 MHEV can achieve more than 99.1% and 97.7% of global optimality compared with the DP results in the training and validation cycles, respectively, and that it is also capable of adapting to other drive cycles to which it has not been exposed. In contrast, a rule-based strategy only achieves 93.6% and 94.5% of global optimality under the same drive cycles.
CHAPTER 3
3.1 Introduction
Thanks to reinforcement learning’s astonishing adaptability to challenging problems, researchers
have developed several torque-split controllers for single-motor HEVs with RL methods: recurrent Q-learning [27], DDPG [29] and TD3 [36]. However, to the authors' knowledge, there is no
existing literature about the DRL EMS for the P0+P4 MHEV system. Also, existing literature
does not encourage the DRL agent to explore action-sensitive states during the training. To address
the issues of the aforementioned RL-based HEV energy management strategies in section 1.2.2,
this paper proposes a prioritized exploration and experience replay (PEER) technique as an add-
on to TD3-PER for the energy management of HEVs. During the early stage of the training,
the PEER encourages the agent to explore high-complexity regions in the transition space more
frequently. As prior work [29] has demonstrated the benefits of expert knowledge for DRL training speed and converged returns, the proposed TD3 framework incorporates the expert experience from a previous dynamic programming analysis [64]. The main contributions of this work
are threefold:
• a non-linear mapping between the actor output and the motor power is constructed, which condenses the on/off state of the motor and the motor power into a single variable;
• an expert-interposing DRL method is developed based on the state-of-the-art TD3 algorithm for P0+P4 HEVs;
• a novel exploration method for TD3 is proposed to encourage agents to explore regions of the system with complex dynamics, and its performance is compared with DDPG-PER and DQN.
The remaining sections are organized as follows. The detailed reinforcement learning framework is presented in section 3.2. The convergence performance of the proposed methods during training is discussed in section 3.3. Section 3.4 analyzes the importance of expert knowledge to the proposed method in this P0+P4 MHEV architecture. Section 3.5 investigates the improvements of the proposed method over existing DRL methods. Section 3.6 draws the summary and future work.
min J = Σ_{k=0}^{N−1} ṁf,k Δt, (3.1)
The cost J is the total fuel consumption over the trip. The instantaneous fuel consumption ṁf,k
depends on engine speed, ωe , and the engine torque, τeng . u1 and u2 represent the control variables
of the P0 motor torque and the P4 motor torque, respectively. x represents the battery SOC, and its
initial and terminal values are defined as SOC0 and SOCf , respectively. A global solution to the
problem in equation (3.1) is obtained through a numerical approach of dynamic programming with
dpm.m in MATLAB [73]. Unfortunately, the dynamic programming method is computationally
expensive and non-causal, hence cannot be converted into a real-time control strategy. However,
the control sequence from the global optimal solution usually contains a substantial trend that can
be taken advantage of while developing a real-time control strategy. The level of discretization is
chosen to be:
• time: 1 s,
• SOC: 2 ×10−3 ,
• τm1 and τm2 : 2 Nm.
and the further torque split between the engine and the P0 motor is controlled by a rule-based logic:
τm1 = τ̃e − τe,max   if τ̃e > τe,max,   and   τm1 = 0   otherwise,   (3.3)
where τe,max is the maximum engine torque. This simplified torque split problem is then solved
with twin-delayed deep deterministic policy gradient (TD3) methods.
main networks. The main actor network maps each state s to action a. Two main critic networks
map transitions to their estimated value Q(s, a), respectively. This actor-critic architecture aims to
select an action that leads to the largest possible Q(s, a).
Based on the MDP, the actions and states for the considered HEV power management problem
are chosen as
The trip ratio rtp is defined as the traveled distance divided by the total trip distance. The P4 motor
power Pm2 and its on/off are combined into a single action output of the actor-network. The reward
function is constructed as follows:
where ṁf,t is the fuel rate. The lower and upper bounds of the battery SOC are denoted by SOClb
and SOCub , respectively, and the ci s represent weighting factors for the terms in the reward func-
tion.
In each iteration, the actor network generates an action with policy π, the current state s, and a certain amount of random exploration noise ϵ:
at ∼ πϕ (st ) + ϵ, (3.5)
where ϵ follows the normal distribution with a mean of 0 and a standard deviation of σ,
The observed reward rt and next state st+1 also will be recorded into a tuple (st , at , rt , st+1 ) for
experience replay and the evaluation of the two critic networks.
Figure 3.1: The proposed power-split strategy for the P0+P4 MHEV: structure of expert TD3 with
prioritized experience replay and prioritized exploration.
where a ∈ [−1, 1] is the actor-network output and P̂m2 ∈ [−1, 1] is the normalized P4 motor power.
The motor torque automatically adjusts itself within the torque max/min limits at different motor
speeds by controlling the motor power. In addition, the control of the P4 motor on/off is merged
into the same power control variable, which simplifies the control problem.
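A sketch of this combined on/off and power mapping is given below. The piecewise-linear form with a dead zone of half-width ζ is an assumption read off Fig. 3.2 (the points (−ζ, 0), (ζ, 0), and (1, 1)), and ζ = 0.1 is an illustrative value only.

```python
# Sketch of the nonlinear actor-output-to-motor-power mapping of Fig. 3.2.
# A piecewise-linear map with a dead zone of half-width ZETA is assumed from the figure:
# |a| <= ZETA keeps the P4 motor off, and the remaining range is rescaled to [-1, 1].
ZETA = 0.1   # illustrative motor-activation threshold

def normalized_p4_power(a: float) -> float:
    """Map actor output a in [-1, 1] to normalized P4 motor power in [-1, 1]."""
    a = max(-1.0, min(1.0, a))
    if abs(a) <= ZETA:
        return 0.0                               # motor off inside the dead zone
    sign = 1.0 if a > 0 else -1.0
    return sign * (abs(a) - ZETA) / (1.0 - ZETA)

for a in (-1.0, -0.05, 0.05, 0.5, 1.0):
    print(a, round(normalized_p4_power(a), 3))
```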
Algorithm 1 TD3 with prioritized exploration and experience replay
1: initialization: critic network and actor network with weights θi and ϕ
2: copy target net θi′ ← θi , ϕ′i ← ϕi
3: Initialize Replay Buffer and random process for action exploration.
4: for episode 1:M do
5: get initial states: SOC, gear, ve , acceleration, rtp
6: for t=1:T do
7: Select action at ∼ πϕ (st ) + ϵ, according to the current policy and exploration noise
8: Execute action at , observe reward rt and new states st+1
9: Store transition (st , at , r, st+1 ) in Replay buffer
10: sample a mini-batch of N transitions (st , at , r, st+1 ) from Replay buffer
11: Set y = r + γmini=1,2 Qθi′ (s′ , ã)
12: Update critic parameters θi by minimizing the loss: L = wi (Qi − yi )2
13: if t mod d = 0 then
14: Update the actor policy using the deterministic policy gradient ∇ϕ J̃(ϕ)
15: Update the target networks: θ′i ← τ θi + (1 − τ) θ′i, ϕ′i ← τ ϕi + (1 − τ) ϕ′i
where γ is the discount factor for the future predicted value, usually chosen as less than 1; ã is the
action generated by the target action network based on the state st+1 and its policy parameter ϕ′
with a Gaussian-distributed but clipped random exploration noise:
The purpose of having both critic networks is to avoid overestimation of the Q value. Each main
critic network parameter θi is updated with the cost function, defined as the square of the TD error:
Li = (1/N) Σ_{m=1}^{N} (ym − Qθi(sm,t, am,t))², (3.11)
To ensure the stability of the training process, a traditional TD3 adopts a delayed update policy
for the actor network and all other target networks. For every d time steps, the parameter ϕ in the
actor network is updated with a deterministic policy gradient, given by
∇ϕ J(ϕ) = N⁻¹ Σ_{m=1}^{N} ∇a_m Qθ1(sm, am)|_{am = πϕ(sm)} ∇ϕ πϕ(sm), (3.12)
Figure 3.2: Combined control of motor activation and motor power: the relationship between
motor normalized power and actor network output.
θ′i ← τ θi + (1 − τ) θ′i, (3.13)
ϕ′i ← τ ϕi + (1 − τ) ϕ′i, (3.14)
where ϵp is a small value that prevents a zero chance of sampling for certain transitions. The chance
of each transition being sampled is given by
P(j) = p_j^α / Σ_k p_k^α. (3.16)
The hyper-parameter α balances between greedy search and random search. When α is equal to 0, the PER method becomes a random search. During sampling, the interval [0, Σk pkα] is divided into k segments, and one transition is sampled from each segment with the sum-tree method [79]. The sum tree is a structured binary tree, where each parent node equals the sum of its two children. The weighted priority of each transition, pjα, is stored in a distinct leaf node. The top node contains Σk pkα. When a random number is generated within the interval [0, Σk pkα], the sum tree allows the agent to search and locate the corresponding leaf and transition with a complexity of O(log(N)).
The PER method increases the utilization of high-priority transition. However, high utilization
also leads to a biased update of the neural network parameters compared with random-sampled
mini-batch training. To eliminate the bias caused by high utilization, an importance sampling (IS) weight is introduced into the update rule:
w̄j = ((1/N) · (1/P(j)))^β, (3.17)
where β determines the bias correction and is usually adjusted from β0 to 1 as training evolves, with β0 typically selected between 0 and 1. The philosophy of w̄j is to utilize high-priority transition experience more frequently but to let each replay contribute less to the update. To guarantee the stability of the update, w̄j is normalized by maxj(w̄j) to lie between 0 and 1. After simplification, the normalized form of the IS weight is
wj = P(j)⁻β / maxj(P(j)⁻β) = P(j)⁻β / (minj P(j))⁻β. (3.18)
Li=1,2 = (wj / N) Σ_{m=1}^{N} (ym − Qθi(sm,t, am,t))², (3.19)
During each time step, a new transition will be stored into the experience replay buffer. A high
initial priority value pinit will be assigned to this transition to guarantee that it will be replayed at
least once in the future. When the replay buffer reaches the maximum size k, the earliest experience
will be discarded.
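A compact sketch of proportional prioritized experience replay, covering the priority update, the sampling probability of Eq. (3.16), and the normalized IS weights of Eq. (3.18), is shown below. For brevity a direct probability draw with numpy replaces the O(log N) sum-tree search, and the priority p = |TD error| + εp is an assumption consistent with the role described for εp.

```python
import numpy as np

# Sketch of proportional prioritized experience replay (PER). Sampling probabilities follow
# Eq. (3.16) and the normalized IS weights follow Eq. (3.18). A direct numpy draw replaces
# the sum-tree search, and p = |TD error| + eps_p is an assumed priority definition.
class SimplePER:
    def __init__(self, capacity: int, alpha: float = 0.6, eps_p: float = 1e-3,
                 p_init: float = 1.0, seed: int = 0):
        self.capacity, self.alpha, self.eps_p, self.p_init = capacity, alpha, eps_p, p_init
        self.data, self.priorities = [], []
        self.rng = np.random.default_rng(seed)

    def store(self, transition):
        if len(self.data) >= self.capacity:          # discard the earliest experience
            self.data.pop(0); self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(self.p_init)          # high initial priority: replayed at least once

    def sample(self, batch_size: int, beta: float = 0.4):
        p_alpha = np.asarray(self.priorities) ** self.alpha
        probs = p_alpha / p_alpha.sum()              # Eq. (3.16)
        idx = self.rng.choice(len(self.data), size=batch_size, p=probs)
        weights = probs[idx] ** (-beta)
        weights /= probs.min() ** (-beta)            # normalized IS weights, Eq. (3.18)
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        for i, delta in zip(idx, td_errors):
            self.priorities[i] = abs(delta) + self.eps_p

buf = SimplePER(capacity=1000)
for t in range(10):
    buf.store((f"s{t}", 0.0, 0.0, f"s{t+1}"))
print(buf.sample(4))
```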
3.2.7 Prioritized Exploration
The cost function of the actor network in a traditional TD3 is selected such that the policy π will
take an action that maximizes Q at the current state, and, in (3.12), the back-propagation updating
rule of ϕ relies on the prediction of Q from the critic networks. However, during the cold start
of the first few training episodes, the critic networks may not be able to predict Q(s, a) precisely.
Therefore, updating the actor network in the first few episodes may not lead to the optimal solution.
There are usually two reasons that critic networks predict Q of a transition poorly:
• The experience of transition (s, a) is replayed few times; hence the estimated Q is not yet
converged,
• The estimation of Q(s, a) is sensitive to the action selection; hence, more interaction between
agent and environment in a similar state is needed to learn the correct dynamic.
The prioritized experience replay allows the replay buffer to select a transition with a bad prediction more frequently, which mitigates the issue of slow convergence. However, replaying the same transition does not provide more information for those states that are sensitive to the choice of action. For example, in a two-dimensional state space, the value surface around state A may vary much more in each direction than around state B; the cost of a transition (s, a) from state A is then more sensitive to the selection of action, and predicting transitions from state A is much harder than from state B. Therefore, to resolve the issue of insufficient exploration, this paper proposes prioritized exploration that actively explores the regions of the system with complex action dynamics.
During the TD target calculation in (3.8), the target action ã is perturbed by a random explo-
ration noise ϵ. If a certain state’s Q is sensitive to the selection of action, the TD error δ of that
transition is likely to be high. Therefore, in prioritized exploration, the cost function of the actor
network J is dynamically adjusted such that the actor is encouraged to actively explore the tran-
sitions that critic networks do not predict well in the first few epochs. As explained earlier, these
bad predictions are mostly due to the current state being highly sensitive to action selection. This
active exploration is achieved by including the mean square error between Q(s, a) and the better
prediction generated from (3.8) into the cost of actor networks J. After the critic networks capture
the dynamics of the environment, the policy shifts its focus to the maximization of Q. The cost
function of the actor network in prioritized exploration is designated as
where λ is a hyper-parameter that starts within (0, 1) and eventually decays to 0. Similar to a tra-
ditional TD3, only the first critic network is used to generate the gradient of the actor network. The
actor parameters are then updated with gradient descent through a deterministic policy gradient:
∇ϕ J̃(ϕ) = max_i { (2λ wj / N) Σ_{m=1}^{N} (ym − Qθi(sm,t, am,t)) } ∇ϕ J(ϕ) + (1 − λ) ∇ϕ J(ϕ), (3.21)

where

∇ϕ J(ϕ) = N⁻¹ Σ_{m=1}^{N} ∇a_m Qθ1(sm, am)|_{am = πϕ(sm)} ∇ϕ πϕ(sm), (3.22)
Note that in (3.21), L and Q may not be of the same order of magnitude. Therefore, the first term needs to be properly scaled. Also, note that in (3.8), the prediction of Q is coupled with the current policy πϕ. Therefore, the initial λ should be selected such that it does not overly perturb the goal of the actor network. The overall algorithm structure and pseudo code of TD3 with prioritized exploration and experience replay (PEER) are given in Fig. 3.1 and Algorithm 1.
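To make the blended actor objective concrete, the sketch below computes the scalar loss that combines the critics' weighted squared TD error with the usual −Q objective, following the description of Eq. (3.20); the stand-in critic callables, the use of the worse (max) critic, and the linear λ decay schedule are illustrative assumptions rather than the exact implementation.

```python
import numpy as np

# Sketch of the blended actor objective used in prioritized exploration (around Eqs. (3.20)-(3.21)):
# early in training the actor is also penalized by the critics' squared TD error, and the
# weighting lambda decays to zero so the objective reverts to maximizing Q.
# The critic callables and the decay schedule are illustrative stand-ins.
def blended_actor_loss(q1, q2, y_targets, states, actions, is_weights, lam: float) -> float:
    """Scalar loss: lam * weighted max-critic TD error  +  (1 - lam) * (-mean Q1)."""
    q1_vals, q2_vals = q1(states, actions), q2(states, actions)
    td_sq_1 = np.mean(is_weights * (y_targets - q1_vals) ** 2)
    td_sq_2 = np.mean(is_weights * (y_targets - q2_vals) ** 2)
    explore_term = max(td_sq_1, td_sq_2)       # use the worse (max) critic prediction
    exploit_term = -np.mean(q1_vals)           # traditional TD3 actor objective (maximize Q1)
    return lam * explore_term + (1.0 - lam) * exploit_term

def lambda_schedule(episode: int, lam0: float = 0.5, decay_episodes: int = 10) -> float:
    """Illustrative linear decay of lambda from lam0 to 0 over the first episodes."""
    return max(0.0, lam0 * (1.0 - episode / decay_episodes))

rng = np.random.default_rng(0)
fake_q = lambda s, a: (s * a).sum(axis=1)      # stand-in critic for demonstration only
s, a = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
print(blended_actor_loss(fake_q, fake_q, rng.normal(size=8), s, a, np.ones(8), lambda_schedule(2)))
```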
Figure 3.3: (a): Training process of three TD3-PEER agents with different initial random seeds. (b): The loss L of both critic networks of a selected agent.
Figure 3.4: SOC trajectory of the DP and the proposed strategy results under the five driving cycles:
(a) the WLTC, (b) the UDDS, (c) the HWFET, (d) the LA92 and (e) the US06
policy, the learned operations are obviously different from those suggested by the dynamic programming.
Figure 3.5: Greedy-run of both TD3-PEER agents and the DP results over the WLTC cycle. (a): The engine torque over the time. (b):
the P0 motor torque over the time. (c): the P4 motor torque over the time.
For example, from 660 seconds to 680 seconds in Fig. 3.5, the P0 motor of the non-expert TD3 recaptures energy while the engine is generating propulsion torque. As indicated in the same plot, the expert knowledge from dynamic programming does not suggest this double energy conversion, as it brings unnecessary energy consumption. From 1150 to 1175 seconds, both the expert TD3 and the dynamic programming prefer to use the P0 and P4 motors to recover energy with a similar trend. The non-expert TD3, however, only learns a sub-optimal solution that behaves differently in the aforementioned cases. The detailed fuel consumption statistics and SOC deviation for each cycle and each case are listed in Table 3.2.
Considering that the terminal SOC of each agent deviates slightly from the initial SOC, the fuel consumption is corrected based on the SOC deviation. The change in fuel in response to the SOC deviation, Δ(fuel)/Δ(SOC), is obtained through the case study in chapter 2 and our previous work on a 48V P0+P4 MHEV [64].
Table 3.2: Fuel consumption comparison between the DP and the proposed TD3 algorithm with and without expert knowledge.
Table 3.3: Fuel consumption comparison between the proposed method and two state-of-the-art methods: DDPG-PER and DQN.
Type   Driving cycle   Corrected fuel consumption (kg)
                       TD3-PEER    DDPG-PER           DQN
Tra.   WLTC            0.9339      0.9493 (+1.65%)    0.9633 (+3.15%)
Tra.   UDDS            0.4229      0.4368 (+3.29%)    0.4426 (+4.66%)
Tra.   HWFET           0.5659      0.5672 (+0.23%)    0.5773 (+2.01%)
Val.   LA92            0.6677      0.6906 (+3.43%)    0.7024 (+5.20%)
Val.   US06            0.6556      0.6748 (+2.93%)    0.6797 (+3.68%)
better fit the optimal control policy. Prioritized experience replay also improves sample utilization, speeding up the training process. With the DDPG-PER method, the agent consumes roughly 1.72% and 3.18% more fuel over the training and validation cycles, respectively, as compared with the proposed TD3-PEER.
Benefiting from the dual critic network setup, the TD3-PEER agent is less likely to be distracted by overestimated actions during training. The delayed policy updates and gradient clipping improve the stability of the training. The prioritized exploration mechanism encourages the actor to actively collect low-confidence transitions for the critics. Then, the critic networks
can provide a more precise gradient for policy training. As shown in Fig. 3.7 (b) and (c), the P0 and
P4 motor operation of TD3-PEER from 182 to 190 seconds is much closer to the DP’s operation,
as compared with the DQN and DDPG-PER. In Fig. 3.7 (c), the P4 motor operation from 1150
to 1170 seconds also demonstrates that the TD3-PEER agent has already learned a near-optimal
policy. Overall, the agent of TD3-PEER achieves fuel consumption closest to the DP results in
Figure 3.6: SOC trajectories of DP and several learning-based methods results under the five driv-
ing cycles: (a) the WLTC, (b) the UDDS, (c) the HWFET, (d) the LA92 and (e) the US06
both the training and the validation cycles among all three methods.
Figure 3.7: Greedy run of learning-based methods and the DP results over the WLTC cycle. (a): The engine torque over the time. (b):
The P0 motor torque over the time. (c): The P4 motor torque over the time.
3.6 Summary
This chapter presents a novel energy management strategy for P0+P4 HEVs that is based on an
expert twin-delayed deep deterministic policy gradient with prioritized exploration and experience
replay (TD3-PEER). To address the issue that the critic network in the state-of-the-art TD3 may
struggle with predicting Q value, this paper proposes prioritized exploration that encourages the
agent to visit action-sensitive states more frequently. The proposed algorithm is tested and vali-
dated on a P0+P4 HEV model, including consideration of realistic operational constraints such as
nonlinear tire effects, braking force distribution for safety, and motor/engine limitations are con-
sidered. To simplify the control design, the P4 motor’s on/off control and the power control are
condensed into a single variable by introducing a motor activation threshold into the final layer of
the agent’s actor. In addition, the dynamic programming results are incorporated into the training
of TD3, helping the agent avoid inefficient operations. The results from the case study with random
seeds show that the method is well stabilized during training, and all agents converge to a similar
level of optimal solutions after around 10 episodes. With expert knowledge provided to all methods,
the proposed TD3-PEER outperforms DDPG-PER and DQN, reducing fuel consumption over the
training and validation sets by an average of 2.3% and 3.74%, respectively.
CHAPTER 4
4.1 Introduction
Aside from the torque-split optimization described in chapters 2 and 3, optimizing a vehicle's longitudinal motion also has a significant impact on the fuel economy, safety, and driving comfort of the P0+P4 MHEV. Existing literature [43] has developed an adaptive cruise controller (ACC) with advanced V2V/V2I technology that reduces the driver's effort in highway scenarios and leads to a 17% fuel consumption reduction compared with traditional cruise control. However, no existing literature considers the potential threat from the adjacent lane. Dwelling within the blind spot of a neighboring vehicle can increase the risk of a lane-change collision. In this chapter, a defensive ecological adaptive cruise control (DEco-ACC) method for the P0+P4 MHEV, which is capable of predicting and avoiding neighboring vehicles' blind spots, is proposed. The main contribution of this chapter is twofold:
1. The BSZs of the vehicles in adjacent lanes are considered and described mathematically to
develop a DEco-ACC algorithm.
The remainder of this chapter is organized as follows. Section 4.2 describes the definition and computation of the BSZs of neighboring vehicles, which are used to design constraints for the considered control problem. Section 4.3 provides the formulation of the proposed DEco-ACC based on model predictive control. Section 4.4 presents simulation results to demonstrate the performance of the proposed controller in comparison with a traditional ACC and an Eco-ACC from other literature. Finally, a summary and directions for future work are presented in Section 4.5.
Figure 4.1: An example diagram of the blind spot zones of a sedan (orange); the region visible by head tilt is shown in yellow.
Table 4.1: Blind spot angle of sample vehicles obtained by ray method
Table 4.2: The symbols and corresponding definitions used in this paper
Symbol Definition
LBSZ Blind spot zone’s length
LE Ego vehicle’s bumper-to-bumper distance
LN Neighboring vehicle’s rear bumper to a side mirror distance
a Ego vehicle’s acceleration
∆a Ego vehicle’s acceleration change
ve Ego vehicle’s speed
vn Neighboring vehicle’s speed
vp Preceding vehicle’s speed
ye Ego vehicle’s displacement
yn Neighboring vehicle’s displacement
yp Preceding vehicle’s displacement
Y^ΔNE   yn − ye
Y^ΔPE   yp − ye
Y^ΔPN   yp − yn
Figure 4.2: A concept of car-following in consideration of the BSZs of neighboring vehicles
Y^ΔPE_min(t) ≤ Y^ΔPE ≤ Y^ΔPE_max(t), (4.1)

where Y^ΔPE_min and Y^ΔPE_max are the minimum safety distance and the maximum comfort distance, respectively. The minimum safety distance to the preceding vehicle is formulated as a constant time headway policy [40],

Y^ΔPE_min(t) = Y^ΔPE_min,0 + th ve, (4.2)

where Y^ΔPE_min consists of the constant time-gap th, the ego vehicle's speed ve, and the constant distance headway to the preceding vehicle Y^ΔPE_min,0. A constant distance dcom is introduced to determine Y^ΔPE_max for comfortable following as well as for preventing a neighboring vehicle's cutting in [86]:

Y^ΔPE_max(t) = Y^ΔPE_min(t) + dcom. (4.3)
The spacing policy introduced in (4.2) and (4.3) provides the ego vehicle with room for minimizing
both fuel consumption and dwelling time in the neighboring vehicles’ blind spots.
For simplicity, let us consider a single neighboring vehicle case. To avoid entering the BSZ, the
Figure 4.3: Graphical demonstration of the constraints to avoid the BSZ of the neighboring vehicle.
As shown in Fig. 4.3, constraint (4.4) requires that the ego vehicle stays in front of the neighboring
vehicle’s BSZ; on the other hand, constraint (4.5) requires that the ego vehicle stays behind the
BSZ of the neighboring vehicle. Since these constraints (4.4) and (4.5) cannot be satisfied at the
same time, the midpoint of the actual region in which the ego vehicle is completely hidden is
introduced as follows:
l = −LN + LE + LBSZ / 2. (4.6)

Then, constraints (4.4) and (4.5) can be given by

|Y^ΔNE*| ≥ LBSZ / 2, (4.7)

where

Y^ΔNE* = Y^ΔNE − l. (4.8)
Constraint (4.7) will be penalized with a slack variable to resolve feasibility issues when the ego
vehicle needs to pass the blind spots of the neighboring vehicles.
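The following sketch evaluates the BSZ midpoint and the avoidance condition of Eqs. (4.6)-(4.8). The 4.5 m car length and the 8.4 m BSZ length follow the values used later in the simulation setup, while using the full car length for LN (defined as a rear-bumper-to-side-mirror distance) and the example displacements are simplifying assumptions.

```python
# Sketch of the blind-spot-zone geometry of Eqs. (4.6)-(4.8).
# L_E and L_BSZ follow the simulation setup; using the full car length for L_N and the
# relative-displacement example values are simplifying assumptions.
L_E = 4.5      # ego vehicle bumper-to-bumper length [m]
L_N = 4.5      # neighboring vehicle rear-bumper-to-side-mirror distance [m] (assumed)
L_BSZ = 8.4    # blind spot zone length [m]

def bsz_midpoint_offset() -> float:
    """Offset l of the fully hidden region's midpoint, Eq. (4.6)."""
    return -L_N + L_E + L_BSZ / 2.0

def outside_bsz(y_ne: float) -> bool:
    """Check the avoidance condition |Y_NE*| >= L_BSZ/2 of Eqs. (4.7)-(4.8)."""
    y_ne_star = y_ne - bsz_midpoint_offset()
    return abs(y_ne_star) >= L_BSZ / 2.0

for y in (-10.0, 2.0, 4.2, 12.0):
    print(y, outside_bsz(y))
```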
Figure 4.4: The average occurrence probability of NVs scenarios using 2403 vehicles from NGSIM
data.
daily highway driving scenario. As can be seen, 3-NVs and 4-NVs scenarios are not observed when the speed of the ego vehicle is above 8 m/s, meaning that a 2-NVs scenario could be sufficient to cover normal (high-speed) highway driving, which is the main focus of this chapter. It is noted that the formulation of the DEco-ACC for 4-NVs scenarios and the corresponding simulation results are presented in the Appendix.
with
A = [ 1  0  0  −Ts  −Ts²/2
      0  1  0  −Ts  −Ts²/2
      0  0  1  −Ts  −Ts²/2        , (4.10)
      0  0  0   1     Ts
      0  0  0   0     1 ]

Bᵀ = [ −Ts³/6   −Ts³/6   −Ts³/6   Ts²/2   Ts ], (4.11)

dkᵀ = [ Ts vp   Ts vn1   Ts vn2   0   0 ], (4.12)

where the state and control vectors are defined by

xk = [ Y^ΔPE   Y^ΔNE1*   Y^ΔNE2*   ve   ae ]ᵀ_k ,   uk = ȧk . (4.13)
Note that term (4.12) includes velocity information about the neighboring vehicles and the preced-
ing vehicle; thus, in MPC, future velocity and displacement are treated as known disturbances. It
is also noted that the jerk ȧk is used as a control input to directly penalize for ride comfort and to
achieve zero-offset tracking performance.
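A minimal numpy sketch that assembles the prediction model of Eqs. (4.10)-(4.13) follows; the linear update x_{k+1} = A x_k + B u_k + d_k is implied by the matrix definitions, and the initial state and disturbance velocities used in the example are illustrative values.

```python
import numpy as np

# Sketch that assembles the discrete-time prediction model of Eqs. (4.10)-(4.12) and
# propagates the state x = [Y_PE, Y_NE1*, Y_NE2*, v_e, a_e] one step with input u = jerk.
def model_matrices(ts: float):
    a = np.array([
        [1, 0, 0, -ts, -0.5 * ts**2],
        [0, 1, 0, -ts, -0.5 * ts**2],
        [0, 0, 1, -ts, -0.5 * ts**2],
        [0, 0, 0, 1,    ts],
        [0, 0, 0, 0,    1],
    ])
    b = np.array([-ts**3 / 6, -ts**3 / 6, -ts**3 / 6, ts**2 / 2, ts])
    return a, b

def step(x, u, ts, v_p, v_n1, v_n2):
    """x_{k+1} = A x_k + B u_k + d_k, with the known disturbance d_k of Eq. (4.12)."""
    a, b = model_matrices(ts)
    d = np.array([ts * v_p, ts * v_n1, ts * v_n2, 0.0, 0.0])
    return a @ x + b * u + d

x0 = np.array([30.0, -5.0, 15.0, 25.0, 0.0])   # illustrative initial state
print(step(x0, u=0.2, ts=0.5, v_p=24.0, v_n1=22.0, v_n2=23.0))
```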
where Pi ’s are weighting factors for penalizing jerk, acceleration, velocity difference to the pre-
ceding vehicle, and the ego vehicle’s dwelling in BSZs, respectively. In (4.14), δslack1 and δslack2
are slack variables for implementing constraint (4.7) as two soft constraints for two neighboring
vehicles and are defined as follows:
δslack,i = cos( max( min( 2 Y^ΔNE*,i π / LBSZ,i , π ), −π ) ), (4.15)
In particular, mode1 and mode2 are boolean signals defined as follows: if neighboring vehicle i approaches from behind, modei is set to 0; if neighboring vehicle i approaches from the front, modei is set to 1. When a neighboring vehicle approaches the ego vehicle from behind, it is unlikely that the neighboring vehicle will make a dangerous lane change toward the ego vehicle, because the neighboring vehicle's driver can see the ego vehicle through the DLO. Thus, at each time instant, the controller determines whether to turn on/off the penalty on a neighboring vehicle's BSZ using mode1 and mode2. Note that modei is held at its current value over the prediction horizon.
Figure 4.5: The proposed algorithm that determines when to activate the penalty on blind spots.
As shown in Fig. 4.6, the distance LBSZ about its midpoint is mapped into [−π, π]. When the ego vehicle enters the BSZ of the neighboring vehicle, the term becomes a positive value; otherwise, it is zero. Moreover, this function is continuous and once differentiable.
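The sketch below implements the normalized BSZ penalty around Eq. (4.15) and Fig. 4.6. The clamped cosine follows the equation, while the additional shift and scaling (so that the penalty is zero at and outside the zone edges and one at its midpoint) is an interpretation of the description above; the exact normalization used in the dissertation may differ.

```python
import math

# Sketch of the normalized BSZ penalty around Eq. (4.15) and Fig. 4.6. The clamped cosine
# follows the equation; the shift/scale making the penalty zero outside the zone and one at
# its midpoint is an interpretation of the text, not necessarily the exact implementation.
L_BSZ = 8.4   # blind spot zone length [m]

def bsz_penalty(y_ne_star: float, l_bsz: float = L_BSZ) -> float:
    phase = 2.0 * y_ne_star * math.pi / l_bsz
    phase = max(min(phase, math.pi), -math.pi)       # clamp to [-pi, pi]
    return 0.5 * (math.cos(phase) + 1.0)             # 1 at the midpoint, 0 at/outside the edges

for y in (-8.0, -4.2, -2.0, 0.0, 2.0, 4.2, 8.0):
    print(y, round(bsz_penalty(y), 3))
```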
State and input constraints for the DEco-ACC formulation are summarized as follows:

Y^ΔPE_min,k ≤ Y^ΔPE_k ≤ Y^ΔPE_max,k, (4.16)
ve,min ≤ ve,k ≤ ve,max, (4.17)
ae,min ≤ ae,k ≤ ae,max, (4.18)
umin ≤ uk ≤ umax. (4.19)
Figure 4.6: Concept diagram of the penalty function for the normalized BSZ, which will be used
to formulate the slack variable when the ego vehicle enters the blind spot.
The formulated optimization problem is solved with the MPC solver CasADi [92] via the mpctools interface [93] in the MATLAB environment.
Figure 4.7: Car-following simulation setup for a 2-NVs scenario.
are located in adjacent lanes. Suppose a neighboring vehicle has a larger average speed than the
preceding vehicle. In that case, this neighboring vehicle will catch up with the ego vehicle from
the rear and drive away to the front of the ego vehicle. Therefore, the neighboring vehicle driver
can notice the ego vehicle from the front windshield during driving, and the DEco-ACC is not
necessary in this case. For this reason, the neighboring vehicle speed profiles are selected such
that all neighboring vehicles are generally slower than the preceding vehicle in terms of average
speed. There are two lines defined in Fig. 4.7: an entering line (solid red) and a vanishing line
(green dashed) of neighboring vehicles with respect to the preceding vehicle. Once a neighboring
vehicle falls behind the vanishing line, a new neighboring vehicle that is randomly chosen from
neighboring vehicle candidates will be assigned at the entering line. Two red dashed lines in
Fig. 4.7 represent the lower bound, Y^ΔPE_min, and the upper bound, Y^ΔPE_max, of Y^ΔPE, respectively. As discussed earlier, Y^ΔPE_min in (4.2) changes dynamically with th and ve.
The speed profiles of the considered neighboring vehicles candidates and the preceding vehicle
are obtained from the data collected through the Next Generation Simulation (NGSIM) project
[87]. Among all 2403 vehicles from the NGSIM data, 21 vehicles are selected based on their
average speed values. The detailed information of the preceding and neighboring vehicles is listed in Table 4.3.
In the vehicle simulation, a sampling time of Ts = 0.5 second is used, and the prediction horizon N is set to be 20, which corresponds to 10 seconds;
Table 4.3: Statistics of the preceding vehicle (ID: 0) and neighboring vehicle (ID: 1-20) speed trajectory candidates used in this study; units are in m/s.
that is, the accurate speed profiles of the preceding
and neighboring vehicles are available for the next 10 seconds.1 The car length of each vehicle is
assumed to be 4.5 m for simplicity, and hence the BSZ values of all vehicles are the same. The
road width and the blind spot angle for determining the length of BSZ are set to be 3.7 m and 74◦ ,
respectively. The road width is a typical value considered in the United States. Thus, the projection
of the blind spot zone of the neighboring vehicle to the ego vehicle lane is 3.7 tan(74◦ ) ≈ 12.9
meters, resulting in a BSZ length of 8.4 meters. The ego vehicle can be seen by the neighboring
vehicle as long as the ego vehicle is not completely in the BSZ.
The information about the upper and lower limits of the constraints are summarized in Table 4.4.
It is noted that the acceleration and jerk limits are chosen with consideration of passenger comfort.
Specifically, the acceleration limit is bounded by ±0.5 m/s² based on the analysis in [95], and the jerk u is bounded by ±2.5 m/s³, which is below the passenger-comfort limit of 2.94 m/s³ suggested in [96].
1 Many approaches to forecasting a vehicle's future speed have shown that 10-second prediction is reasonable in consideration of the state-of-the-art vehicle communication technologies (e.g., [94]).
Table 4.4: The minimum and maximum values for the considered constraints
In a typical car-following scenario, the vehicle velocity cannot be
negative, and hence vmin is set as 0 m/s. The maximum speed vmax is set as 33 m/s (118.8 km/h or 74 mph), which covers the highway speed limit in the United States. Regarding the minimum safe distance, Y^ΔPE_min,0 is set to 2 m. A time headway th of 2 seconds is considered in this study, which allows the ego vehicle to stay at a safe distance from the preceding vehicle when the fleet speed is significant, as suggested in [40], and dcom is set to 30 m for the purpose of preventing the neighboring vehicle from cutting in [86].
As the desired goal is to reduce the BSZ dwelling time without significant sacrifice in fuel con-
sumption, two metrics are considered: (i) dwelling time in BSZs, TBSZ , and (ii) fuel consumption,
mf . The computation of TBSZ is performed by integrating an indicator function that determines the
status of the ego vehicle as follows:
TBSZ = Σ_{k=1}^{N} I(k) Ts,

with

I(k) = 0   if (4.7) is satisfied OR both modes are zero,
I(k) = 1   otherwise.          (4.20)

The fuel consumption of the ego vehicle is computed by the integration of the engine fueling rate ṁf:

mf = Σ_{k=1}^{N} ṁf(k) Ts. (4.21)
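Both metrics can be accumulated directly from logged per-step signals, as in the short sketch below; the array names are illustrative.

```python
import numpy as np

# Sketch of the two evaluation metrics of Eqs. (4.20)-(4.21), accumulated from logged
# per-step signals. The array names (in_bsz_flags, fuel_rate) are illustrative.
TS = 0.5  # sampling time [s]

def bsz_dwelling_time(in_bsz_flags: np.ndarray, ts: float = TS) -> float:
    """T_BSZ: total time the ego vehicle is completely hidden in a BSZ [s]."""
    return float(np.sum(in_bsz_flags.astype(float)) * ts)

def fuel_consumption(fuel_rate: np.ndarray, ts: float = TS) -> float:
    """m_f: integral of the engine fueling rate [kg]."""
    return float(np.sum(fuel_rate) * ts)

flags = np.array([0, 0, 1, 1, 1, 0])
mdot = np.array([0.4e-3, 0.5e-3, 0.6e-3, 0.6e-3, 0.5e-3, 0.4e-3])
print(bsz_dwelling_time(flags), fuel_consumption(mdot))
```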
Since the target vehicle is a P0+P4 MHEV, the fuel consumption is corrected with the terminal SOC deviation and Δ(fuel)/Δ(SOC) obtained from section 2.4.3.
among all parameter sets is approximately 0.4034 kg and the BSZ dwelling time is around 47.4
seconds. When P4 is introduced, by slightly sacrificing the average fuel consumption to 0.4052 kg
(+0.45%), the average BSZ dwelling time can be significantly reduced to 31.3 s (−33.97%).
Among all the cases, Case A and Case B are of particular interest: Case B emphasizes the minimization of fuel consumption but places little weight on the ego vehicle's dwelling time in the BSZs of neighboring vehicles, and can thus be classified as an Eco-ACC in the literature (e.g., [38, 44, 98]).
Case A, however, sacrifices a small amount of fuel consumption but gains a dramatic reduction in the BSZ dwelling time, which possibly decreases the risk of collision with neighboring vehicles. Case B (or Eco-ACC) leads to the least fuel consumption: 0.4010 kg during the 480-second trip. As can be seen from the weighting factors in Table 4.6, the Eco-ACC penalizes only jerk, acceleration, and the velocity difference to the preceding vehicle, but not the BSZ dwelling time. In comparison to the Eco-ACC, Case A, which is a good candidate for DEco-ACC, sacrifices only 0.67% of fuel consumption and reduces its BSZ dwelling time by 52.9%. Since the DEco-ACC considers minimizing both BSZ dwelling time and fuel consumption, it penalizes jerk, acceleration, tracking error, and dwelling time in the blind spots, as shown in Table 4.6.
It is noted that the average computation time of solving the MPC problem at each time step is 0.015 seconds. Therefore, considering the time step of 0.5 seconds, the proposed controller is real-time implementable.
Figure 4.9: Histograms of the 100-case fuel consumption and dwelling time of DEco-ACC and Eco-ACC.
shown in Fig. 4.10(a). The weighting factors P1 and P2 in Eco-ACC make the ego vehicle follow the preceding vehicle with reduced acceleration and jerk, which satisfies the comfort needs of ACC and improves energy efficiency. In summary, the operation of the ego vehicle with Eco-ACC is not influenced by the motion of the neighboring vehicles.
On the other hand, the ego vehicle with the DEco-ACC behaves defensively and avoids blind spots when passing neighboring vehicles. The ego vehicle slows down to avoid entering the BSZ of the neighboring vehicle when possible, e.g., at around 180 seconds. When passing through a BSZ, the ego vehicle accelerates to minimize the BSZ dwelling time, for example, at around 450 and 710 seconds. It should also be noted that the DEco-ACC decides to pass using the best knowledge within its prediction horizon while satisfying all constraints. As can be seen from Fig. 4.10(b), all state (acceleration and velocity) and input (jerk) constraints are well satisfied, guaranteeing vehicle safety and comfort. Compared to the Eco-ACC algorithm, the DEco-ACC actively changes the ego vehicle's relative displacement to avoid the neighboring vehicles' blind spots; this change is reflected in the velocity, acceleration, and jerk trajectories shown in Fig. 4.10(b). Because of the proactive avoidance behavior, the DEco-ACC has a slower velocity than Eco-ACC at 180 seconds and a faster velocity at 450 and 710 seconds. As the price of reducing the BSZ dwelling time, the magnitude of jerk and acceleration with the DEco-ACC is slightly increased as compared to that with the Eco-ACC; however, the magnitudes remain well below the maximum limits.

Figure 4.10: Comparison of trajectories with DEco-ACC and Eco-ACC: (a) displacement of each vehicle relative to the preceding vehicle, (b) velocity, acceleration, and jerk.
Figures 4.11 and 4.12 show the comparison of the acceleration and jerk distributions, respectively, among various driving cycles. In particular, the first three plots are obtained from three federal driving cycles: (a) HWFET, (b) WLTC, and (c) US06. As these cycles were driven by human drivers, the results are assumed to be representative of human behavior. The other three plots are obtained from the trip shown in Fig. 4.10 with the preceding vehicle and two different car-following methods: (d) the preceding vehicle itself (PV), (e) Eco-ACC, and (f) DEco-ACC. The preceding vehicle from the trip of Fig. 4.10, which serves as a baseline for comparison, was human-driven and followed by the Eco-ACC and DEco-ACC. As shown in Figs. 4.11(a)–(d), the maximum acceleration magnitudes from human driving are close to or greater than 2 m/s². With regard to jerk performance, as shown in Figs. 4.12(b)–(d), human driving, except for the HWFET case, results in relatively high jerk values. In contrast, driving with the Eco-ACC and the DEco-ACC results in considerably milder operation since both controllers penalize acceleration and jerk with non-zero values of P1 and P2. Figures 4.11(e) and (f) show that, for more than 90% of the time, the acceleration magnitude of these two controllers is less than 0.3 m/s². Figures 4.12(e) and (f) show that the jerk distributions of the Eco-ACC and the DEco-ACC are similar to that of the HWFET cycle and much milder than those of the aggressive WLTC and US06 cycles.
The detailed acceleration and jerk statistics of the driving cycles are summarized in Table 4.7. As can be seen from Table 4.7, the maximum and minimum acceleration values from driving with the Eco-ACC are 0.05 m/s² and −0.02 m/s², respectively. Its average acceleration, 0.01 m/s², and average deceleration, −0.01 m/s², are the mildest among all driving cases. This low magnitude of average acceleration and deceleration allows the Eco-ACC controller to achieve the best fuel economy, as discussed earlier. The lowest magnitudes of the maximum jerk, 0.09 m/s³, and the minimum jerk, −0.01 m/s³, shown in Table 4.7, allow the Eco-ACC controller to offer the best ride comfort to the passengers.

Table 4.7: Acceleration statistics for different cycles; average acceleration is denoted as acc(+) and average deceleration as acc(−). Units for a are in [m/s²]; units for ȧ are in [m/s³].

Figure 4.11: Acceleration distribution for different driving cycles. (a): HWFET. (b): WLTC. (c): US06. (d): Preceding vehicle. (e): Eco-ACC. (f): DEco-ACC.
The magnitudes of max a, min a, ā+, and ā− with the DEco-ACC are greater than those with the Eco-ACC, demonstrating that the DEco-ACC controller sacrifices some fuel economy to avoid the neighboring vehicles' BSZs. However, Fig. 4.9 shows that the sacrifice in fuel economy is insignificant despite the considerable reduction in BSZ dwelling time. With regard to ride comfort, the maximum jerk, 0.64 m/s³, and the minimum jerk, −0.2 m/s³, of the DEco-ACC indicate that the magnitudes are still smaller than the maximum limit of 2.94 m/s³ for retaining passenger comfort. Therefore, the DEco-ACC operation is acceptable from both the fuel economy and comfort perspectives [44].
Figure 4.12: Jerk distribution for different driving cycles. (a): HWFET. (b): WLTC. (c): US06. (d): Preceding vehicle. (e): Eco-ACC. (f): DEco-ACC.
4.5 Summary
This chapter proposes a defensive ecological adaptive cruise control (DEco-ACC) method to proactively avoid the neighboring vehicles' BSZs during car-following. Unlike the existing Eco-ACC, which utilizes information about the preceding vehicle only, the proposed DEco-ACC utilizes the neighboring vehicles' speeds and positions and their BSZs. The DEco-ACC is formulated as a model predictive control problem such that car-following can be performed with consideration of energy consumption, ride comfort, and vehicle safety. Specifically, a continuous and once-differentiable penalty function is introduced to handle the constraints associated with the neighboring vehicles' BSZs.
The impact of the weighting factors of the proposed DEco-ACC is comprehensively evaluated through a realistic car-following scenario using real-world traffic data from the Next Generation Simulation (NGSIM) with two neighboring vehicles located in adjacent lanes. Optimal weighting factors for the DEco-ACC are selected based on the results, with 3000 sets generated by combining LHS (for P1, P2, and P3) with unevenly discretized P4 values. To statistically investigate the performance of the proposed DEco-ACC, 100 different driving scenarios are simulated, and the results show that, on average, the proposed DEco-ACC can further reduce the dwelling time in the BSZs of the neighboring vehicles by 29.5% without a significant fuel penalty (0.4% increase in fuel consumption) or deterioration of ride comfort compared to the Eco-ACC, and can successfully follow the preceding vehicle without violating the safety-related constraints.
In this study, exact information about the nearest surrounding vehicles’ future speed is assumed
to be known for the MPC controllers. Since uncertainties could lead to inaccuracy in speed predic-
tion, investigating their impact on the proposed DEco-ACC’s performance is an important direction
for future research.
CHAPTER 5
5.1 Introduction
As proposed in Chapter 4, the defensive ecological adaptive cruise control (DEco-ACC) for the P0+P4 MHEV can reduce the driver's effort during car-following, minimize the threat from neighboring lanes, and maintain a level of fuel economy similar to that of Eco-ACC. However, this type of ACC feature has limited application outside highway scenarios, especially in urban areas with many traffic signals and stop signs.
To maximize the range of electrified vehicles, optimal regenerative braking is being actively studied. Furthermore, to cope with scenarios containing many stop-and-go events, researchers have developed another feature, one-pedal-driving (OPD), which automatically starts energy regeneration once the driver begins to release the acceleration pedal. This feature has been implemented in many on-market vehicles such as the Nissan Leaf, BMW i3, and Tesla Model S. However, [59] reports that the transition from two-pedal-driving (TPD) to OPD still confuses drivers, and [57, 58] showed that existing OPD does not sufficiently consider an individual driver's behavior when performing braking actions. Hence, this chapter proposes a personalized one-pedal-driving (POPD) algorithm, which includes a learning framework that adapts to an individual driver's braking behavior. The POPD performs a braking action only when an upcoming braking event is predicted by the MPC controller; therefore, less effort is required for the driver to adapt to the POPD feature. Furthermore, thanks to the MPC's prediction of the car-following dynamics, all impending collision events in the experiments are prevented. The main contributions of this chapter are threefold:
• Constraints related to an individual driver’s preference during braking are designed based on
the analysis of real-world driving data.
• A personalized one-pedal driving (POPD) method is proposed using model predictive control with a learning framework that determines the optimal weights of the cost function.
• The performance of the proposed POPD is tested and validated in open-loop simulations and human-in-the-loop co-simulations.

Figure 5.1: Personalized one-pedal driving: the algorithm generates human-like deceleration before the driver takes any action; the driver only needs to control the acceleration pedal most of the time.
The remainder of this chapter is organized as follows: First, two constraints associated with the driver's characteristics are introduced in Section 5.2. Then, the identification of the constraints with real-world driving data is detailed in Section 5.3. Next, the proposed POPD algorithm and its learning method are presented in Section 5.4 and Section 5.5, respectively. The simulation and experimental validation of the POPD are discussed in Section 5.6 and Section 5.7. Finally, Section 5.8 presents the summary and future work.
5.2.1 Time Headway Constraints
The time headway, defined as the time interval between the front bumpers of two successive vehicles, has been considered a useful indicator for safety evaluation [101]. Thus, the comfortable distance range is often described with the time headway [65, 99] as follows:
$$d_{\mathrm{rel}} \ge t_h v_e + d_h, \tag{5.1}$$
$$d_{\mathrm{rel}} \le \bar{t}_h v_e + \bar{d}_h, \tag{5.2}$$
where ve is the ego vehicle's speed and drel is the distance from the rear bumper of the preceding vehicle to the front bumper of the ego vehicle. The maximum and minimum relative distances are determined by the maximum and minimum time headway values t̄h and th and the constants d̄h and dh, respectively. Note that these time headway values and constants reflect a driver's preference during car-following; therefore, these parameters could be driver-specific and need to be determined for an individual driver.
Figure 5.2: Wiedemann's car-following model describes the relationship between the relative distance and the relative velocity. SDV, OPDV, and SDX represent the brake threshold, acceleration pedal threshold, and max follow distance threshold, respectively.
penalizing the SDV, the CLDV may not be violated and hence can be excluded from the formulation. Note that this study focuses on developing a personalized braking algorithm for one-pedal driving; the constraints of OPDV and SDX in Fig. 5.2 are handled through the human's acceleration pedal position. Thus, only the case with negative ∆v is considered.
Given the constraint-like property of the reaction zone in Wiedemann's car-following model, the boundary line between the reaction zone and the unconscious zone, as shown in Fig. 5.2, can be simplified as a linear constraint, noted as
$$d_{\mathrm{rel}} \ge k\,\Delta v + b, \tag{5.3}$$
where k and b are the slope and bias values of the fitted constraint, which vary depending on the driver. Hence, they need to be determined through the identification method described in a later section.
Figure 5.3: Data of four selected drivers, used for identifying time headway constraints (a) and for
identifying perceptual constraints (b).
So that the algorithm mimics a specific driver's behavior, the constraints in Eqs. (5.1), (5.2), and (5.3) need to be identified for an individual driver. This section proposes a constraint identification method based on linear regression, which can automatically identify the headway constraints and the perceptual constraint with a small amount of pre-collected driving data from a specific driver.
Figure 5.4: Headway constraints and perceptual constraints fitting of all drivers’ data: (a) Mini-
mum time headway, (b) Maximum time headway, (c) Minimum distance headway and (d) Maxi-
mum distance headway.
Figure 5.5: A fitted constraint function for a selected driver.
defined by
$$\bar{t}_h = \left(X_{\max}^\top X_{\max}\right)^{-1} X_{\max}^\top \left(Y_{\max} - \bar{d}_h\right), \tag{5.4}$$
$$t_h = \left(X_{\min}^\top X_{\min}\right)^{-1} X_{\min}^\top \left(Y_{\min} - d_h\right). \tag{5.5}$$
As an example, the fitted constraint boundaries (5.1) and (5.2) of a selected driver are shown in Fig. 5.5(a). Note that the constraint satisfaction (CS) rate of these headway constraints is 0.95, meaning that over 95% of the driver's braking actions can be captured by the constraints (5.1) and (5.2) despite the nonlinear behavior of human drivers. Since the constraints (5.1) and (5.2) have limitations in capturing human behavior, they are handled as soft constraints in the control problem.
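As a sketch of how Eqs. (5.4)–(5.5) can be evaluated, the normal-equation fit below takes the selected boundary data points (ego speeds as the regressor, relative distances as the response) and a fixed distance offset; the function and variable names are illustrative assumptions, not the dissertation's code:

import numpy as np

def identify_time_headway(v_e, d_rel, d_offset):
    # v_e      : ego speeds of the selected boundary points (regressor X)
    # d_rel    : corresponding relative distances (response Y)
    # d_offset : fixed distance-headway constant (upper or lower d_h value)
    X = np.asarray(v_e, dtype=float).reshape(-1, 1)
    Y = np.asarray(d_rel, dtype=float).reshape(-1, 1)
    # Normal-equation solution of min || (Y - d_offset) - X * t_h ||^2
    t_h = np.linalg.solve(X.T @ X, X.T @ (Y - d_offset))
    return t_h.item()

# e.g., t_h_max = identify_time_headway(v_max_points, d_max_points, d_h_max)
#       t_h_min = identify_time_headway(v_min_points, d_min_points, d_h_min)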
Figure 5.6: Headway constraints and perceptual constraints fitting of all drivers’ data: (a) Slope of
the perceptual constraint and (b) Bias of the perceptual constraint.
in Fig. 5.5(b). In each interval, the data point with the minimum relative distance is identified. Then, these identified data points are collected into the vectors X_p and Y_p. The bias term b in (5.3) is identified with
$$b = \frac{1}{n}\sum_{j=1}^{n}\left(Y_{p,j} - k\,X_{p,j}\right), \tag{5.6}$$
where n is the number of identified data points.
Figure 5.5(b) shows the identified perceptual constraint of an example driver and his/her braking data in a drel vs. ∆v plot. Since this study focuses on braking rather than acceleration, only the constraint on the negative ∆v side is considered. It should be noted that this perceptual constraint is a simplified substitute for the left reaction zone in Fig. 5.2. The CS rate of the perceptual constraint in Fig. 5.5(b) is 0.998, meaning that most of the driver's braking behavior is captured by this constraint. Similar to the headway constraints, the nonlinearity of the reaction zone in Fig. 5.2 cannot be captured with a linear constraint. Therefore, this perceptual constraint is treated as a soft constraint in the control design and is allowed to be slightly violated.
5.3.4 Performance of Constraints Fitting
The CS statistics of all 450 drivers' time headway and perceptual constraints are shown in Fig. 5.7. Fig. 5.7(a) shows that most of the 450 drivers have a headway CS higher than 0.7, and Fig. 5.7(b) demonstrates that almost all drivers have a perceptual CS higher than 0.98. It can be concluded that both the proposed time headway and perceptual fitting methods describe drivers' behavior reasonably well.
with
$$A = \begin{bmatrix} 1 & 0 & 0 \\ T_s & 1 & 0 \\ -\tfrac{1}{2}T_s^2 & -T_s & 1 \end{bmatrix}, \tag{5.8}$$
$$B = \begin{bmatrix} T_s & \tfrac{T_s^2}{2} & -\tfrac{T_s^3}{6} \end{bmatrix}^\top, \tag{5.9}$$
$$d_k = \begin{bmatrix} 0 & 0 & \tfrac{T_s}{2}v_{p,k} + \tfrac{T_s}{2}v_{p,k+1} \end{bmatrix}^\top. \tag{5.10}$$
It is noted that the rate of acceleration is used as the control input in order to penalize jerk for enhanced ride comfort.
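A minimal numpy sketch of Eqs. (5.8)–(5.10) is given below; the state ordering (ego acceleration, ego velocity, relative distance), the function name, and the argument names are inferred from the matrices and should be read as assumptions:

import numpy as np

def build_prediction_model(Ts, v_p_k, v_p_k1):
    # Discrete-time model x_{k+1} = A x_k + B u_k + d_k with jerk u as the input.
    A = np.array([[1.0,            0.0,  0.0],
                  [Ts,             1.0,  0.0],
                  [-0.5 * Ts**2,  -Ts,   1.0]])
    B = np.array([[Ts],
                  [Ts**2 / 2.0],
                  [-Ts**3 / 6.0]])
    d = np.array([[0.0],
                  [0.0],
                  [0.5 * Ts * (v_p_k + v_p_k1)]])   # preceding-vehicle speed terms
    return A, B, d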
5.4.2 MPC Formulation
The objective function in the MPC-based POPD is formulated as the weighted sum of the squares of the acceleration a, the rate of acceleration u, and the velocity difference ∆v over the prediction horizon N, with two additional costs associated with slack variables:
$$J(u, \delta_1, \delta_2) = \sum_{k=0}^{N-1} (a_{e,k})^2 P_1 + u_k^2 P_2 + \Delta v_k^2 P_3 + \delta_1 P_4 + \delta_2 P_5, \tag{5.12}$$
where the Pi's are the weighting factors penalizing acceleration, rate of acceleration, and the velocity difference to the preceding vehicle. The slack variables δ1 and δ2 are introduced to implement the constraints (5.1), (5.2), and (5.3) as soft constraints. Other state and control constraints are
summarized as follows:
The proposed algorithm activates itself when the driver releases the acceleration pedal in the presence of a preceding vehicle, and it terminates when the driver presses the acceleration pedal again or when there is no preceding vehicle or stop sign. While the algorithm is active, a Proportional-Integral (PI) controller converts the optimized ȧ into the desired brake pedal position. The driver, therefore, only needs to control the acceleration pedal most of the time. If the optimized deceleration does not meet the driver's requirement, the driver can still press the brake pedal to override the brake signal generated by the proposed algorithm. This override action is recorded and used for learning the MPC weighting factors, as described in a later section. When there is no preceding vehicle or traffic stop, the algorithm allows the vehicle to coast and avoids double energy conversion.
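The supervisory behavior described above can be sketched as follows; mpc, pi_ctrl, and all signal names are placeholders for illustration only, not the dissertation's implementation:

def popd_step(accel_pedal, brake_pedal, preceding_detected, mpc, pi_ctrl, state):
    # Activate only when the driver has released the acceleration pedal
    # and a preceding vehicle (or stop target) is present.
    active = (accel_pedal == 0.0) and preceding_detected
    if not active:
        return 0.0, active, None            # coast; driver remains in full control

    a_dot_opt = mpc.solve(state)            # optimized rate of acceleration (jerk)
    brake_cmd = pi_ctrl.update(a_dot_opt, state)   # map to a brake pedal position

    if brake_pedal > 0.0:                   # driver overrides the generated braking
        override_log = {"state": state, "driver_brake": brake_pedal}
        return brake_pedal, active, override_log   # record for later weight learning

    return brake_cmd, active, None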
5.5.1 Optimal Weight Learning
The study in [105] trains a controller to capture a human's behavior with inverse reinforcement learning and a performance evaluation metric consisting of the mean square error between the controller's action and the historical human action.
To judge the similarity between our controller's action and the human driver's action, a similar performance metric L is considered:
$$L = \frac{1}{T}\sum_{j=1}^{T}\left[(a_{e,j} - \tilde{a}_{e,j})^2 + (\Delta v_j - \Delta\tilde{v}_j)^2 + (\dot{a}_j - \tilde{\dot{a}}_j)^2\right], \tag{5.16}$$
where ãe, ∆ṽ, and ˜ȧ represent the acceleration, relative velocity, and rate of acceleration of the human driver's actions during braking. The variable T denotes the total number of samples used for optimal weight learning. The performance metric L depends on the weights Pi in the cost function (5.12), and a driver with a different driving behavior may have different optimal weights.
Finding the Pi that minimize L for a specific driver can be considered an optimization problem. However, there is no guarantee of differentiability or convexity for such an optimization problem; thus, gradient-based optimization techniques struggle to find an optimal solution. Therefore, in this study, particle swarm optimization (PSO) is applied to optimize the weights of the cost function (5.12) for a specific driver. PSO is one of the most popular meta-heuristics [106, 107], inspired by group collaboration behavior in nature, such as a flock of birds. At the beginning of the optimization, PSO randomly generates a certain number of particles within the search space.
In this study, P1 to P3 are optimized with PSO, that is, the maximum dimension is jmax = 3. Each particle in the swarm represents a set of weights [P1, P2, P3] that leads to a certain evaluation metric L. The swarm size imax is chosen to be 50, and the maximum number of iterations tmax is 20. In (5.12), P4 and P5 are associated with the slack variables and only contribute to the cost if the constraints in Fig. 5.5 are violated. Since the constraints in Fig. 5.5 are fitted to be feasible in most scenarios, P4 and P5 are chosen to be universal constants to improve computational efficiency. The research in [107] has shown that in a low-dimensional search space, the PSO is less likely to get trapped in a local optimum. The upper and lower bounds of the search space are chosen as P1 ∈ [0.01, 500], P2 ∈ [0.01, 100], and P3 ∈ [0.01, 100].
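For illustration, a minimal PSO loop is sketched below; evaluate_L is a placeholder that replays the recorded braking data through the MPC with candidate weights [P1, P2, P3] and returns the metric L in Eq. (5.16). The inertia and acceleration coefficients are generic defaults, not values from this study:

import numpy as np

def pso_weight_learning(evaluate_L, lb, ub, swarm_size=50, iters=20,
                        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)

    pos = rng.uniform(lb, ub, size=(swarm_size, lb.size))    # particle positions
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([evaluate_L(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    gbest_val = pbest_val.min()

    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lb, ub)                      # keep weights in bounds
        vals = np.array([evaluate_L(p) for p in pos])

        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        if vals.min() < gbest_val:
            gbest, gbest_val = pos[np.argmin(vals)].copy(), vals.min()

    return gbest, gbest_val   # learned [P1, P2, P3] and its metric L

# Example bounds from this chapter: P1 in [0.01, 500], P2 and P3 in [0.01, 100].
# weights, L_best = pso_weight_learning(evaluate_L, [0.01, 0.01, 0.01], [500, 100, 100])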
Without autonomous driving and V2V technology, the preceding vehicle's future motion can only be predicted by collecting its previous operations. In the literature [94, 108, 109], various methods capable of predicting the preceding vehicle's speed with relatively small errors have been proposed through polynomial fitting of previous information. However, our study observed that a relatively simple constant acceleration (CA) prediction [110] is sufficient for predicting braking operations. The CA method predicts the preceding vehicle's l-step future speed vp,k+l by assuming that it maintains the same acceleration as at the current step k:
$$v_{p,k+l} = v_{p,k} + l\, T_s\, a_{p,k}, \quad l = 1, \dots, N.$$
Although the CA prediction over a long horizon contains errors due to changes in the preceding vehicle's acceleration, this prediction error has little impact on the performance of the MPC [108]. The reason is that, at each time step, the MPC only adopts the first control action from the control sequence generated within the prediction horizon and re-calculates the control sequence at the next time step [111]. Thus, the MPC performance can be guaranteed if the vp prediction over a short horizon is sufficiently precise.
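A minimal sketch of the CA prediction is given below; estimating the current acceleration from the two most recent speed samples and clipping the prediction at zero speed are assumptions of this sketch:

import numpy as np

def ca_predict(v_p_prev, v_p_now, Ts, N):
    # Constant-acceleration prediction: the preceding vehicle is assumed to keep
    # its current acceleration over the whole horizon of N steps.
    a_now = (v_p_now - v_p_prev) / Ts                  # current acceleration estimate
    v_pred = v_p_now + a_now * Ts * np.arange(1, N + 1)
    return np.maximum(v_pred, 0.0)                     # speed cannot become negative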
To assess the performance of the CA method in the current MPC formulation, a comparative case study is conducted between the CA method and perfect-information prediction, in which the preceding vehicle's future speed vp can be acquired with no error. In addition, the length of the prediction horizon can affect the performance of the proposed POPD, and hence different prediction lengths ranging from 3 to 10 are investigated.
Figure 5.8: (a) The comparison of L* between the CA and perfect-information prediction; L* of the CA is normalized based on L* of the perfect-information prediction method. (b) Averaged L*/L of 50 drivers; N ∈ [7, 9] results in the highest performance.
Figure 5.8(a) shows the performance comparison between the CA and perfect prediction among 50 selected driving cycles. The best performance metric L among all prediction horizons (from N = 3 to N = 10) is denoted by L* for a specific cycle. Note that L* of the CA is normalized based on L* of the perfect prediction. The average normalized L* of the CA method among all drivers is 0.997, which means the CA method does not lead to severe performance degradation.
The study also shows that the optimal prediction length N varies depending on the drive cycle.
For simplicity, the prediction horizon is selected to give the best performance for most of the drivers. To determine the optimal prediction horizon, the normalized metric L*/L is introduced to assess how different values of N perform in a specific cycle. The best N in a specific cycle leads to L*/L = 1, and other N's lead to L*/L < 1. By averaging L*/L among all 50 drivers, the N that fits all drivers best in general can be found. The relationship between the averaged L*/L and N is shown in Fig. 5.8(b). As can be seen, the prediction length N = 8 yields the highest L*/L; thus, in the remainder of this chapter, the optimal horizon of N = 8 is used.
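The horizon selection described above amounts to averaging L*/L over drivers and picking the N that maximizes the average. A minimal Python sketch, with an assumed table of L values per driver and candidate horizon, is:

import numpy as np

def select_horizon(L_table, horizons):
    # L_table : array of shape (n_drivers, n_horizons), metric L for each driver
    #           (row) and each candidate horizon N (column)
    # horizons: candidate horizon lengths corresponding to the columns
    L_table = np.asarray(L_table, dtype=float)
    L_star = L_table.min(axis=1, keepdims=True)        # best L per driver
    normalized = L_star / L_table                       # L*/L, equal to 1 at the best N
    return horizons[int(np.argmax(normalized.mean(axis=0)))]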
The desired distance headway ds, desired time headway τ, and personalized factor bdes are identified to capture the driver's behavior in active braking. Since the DRD-PB is also designed to perform braking only, the ∆v² term in Eq. (5.18) does not interfere with the braking performance of the algorithm. When the preceding vehicle speeds up, the ∆v² term increases and so does ddes. In this case, the DRD-PB will not generate acceleration and will wait for the driver to operate the acceleration pedal.
To accommodate the stochastic behavior of the human driver's braking actions, the desired distance in Eq. (5.18) is corrected by a factor cf expressed as follows:
$$c_f = \min\left(1,\; \hat{d}_{\mathrm{rel}}/\hat{d}_{\mathrm{des}}\right), \tag{5.19}$$
where d̂rel and d̂des are the relative distance and the computed desired relative distance at the moment the acceleration pedal is released. The correction factor is kept the same during continuous braking and is reset when the driver presses and releases the acceleration pedal in the next event. When the driver releases the acceleration pedal late, so that drel is initially shorter than the computed ddes, the correction factor reduces ddes to avoid harsh braking caused by a large initial relative distance error. In the DRD-PB, a PI controller is used to track the corrected desired relative distance.
Since this DRD-PB method only focuses on braking control, no positive acceleration is generated by the controller when drel is larger than d̃des; instead, the ego vehicle maintains a coasting mode. More detailed information about the DRD-PB can be found in [112].
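Assuming the correction in Eq. (5.19) is applied multiplicatively to the desired distance, a minimal sketch is:

def corrected_desired_distance(d_rel_at_release, d_des_at_release, d_des):
    # Correction factor of Eq. (5.19): if the driver releases the accelerator late and
    # d_rel is already shorter than the computed desired distance, scale d_des down
    # to avoid an initial harsh brake. Multiplicative application is an assumption.
    c_f = min(1.0, d_rel_at_release / d_des_at_release)
    return c_f * d_des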
Figure 5.9: Histogram comparison between the POPD and the DRD-PB: the L of the POPD is normalized based on that of the DRD-PB.
Figure 5.10: Simulated time-series comparison between the POPD and DRD-PB methods for a selected driver. (a): ego vehicle velocity over time. (b): ego vehicle acceleration over time. (c): relative distance between the ego vehicle and the preceding vehicle over time. (d): relative velocity between the ego vehicle and the preceding vehicle. (e): acceleration pedal position and brake pedal actuation signal. Note that drel is not plotted when the preceding vehicle is not detected.
acceleration pedal in the presence of a preceding vehicle, as shown in Fig. 5.10(a). However, Fig. 5.10(b) indicates that the peak deceleration generated by the DRD-PB is much larger than that of the human data and the POPD at around 100 seconds, 1400 seconds, and 2100 seconds. The zoomed-in portions of Fig. 5.10(a), (b), and (d) show that the ego vehicle encounters a preceding vehicle between 2110 and 2140 seconds. In this case, the time headway is small, and there is a risk of collision. The DRD-PB is not able to predict the upcoming collision event until the two vehicles are extremely close. Therefore, the DRD-PB lets the vehicle coast for a while and then applies a relatively harsh brake, which can deteriorate the driver's ride comfort. The shape of this acceleration profile in Fig. 5.10(b) does not match the driver's desired action, which can lead to a feeling of heterogeneity. In contrast, benefiting from the learned Wiedemann constraint, headway constraints, and personalized weights obtained from the driver's historical data, together with the predictive model, the proposed POPD algorithm can anticipate the collision before it happens. As shown in Fig. 5.10(a), (b), and (d), the proposed POPD generates a braking action that matches the human's behavior in a, ∆v, and drel. Moreover, the change of acceleration is smooth, which eliminates jerky operation during braking.
To statistically analyze the behavior of the DRD-PB and the POPD in each component of L, the differences of ȧ, a, and vrel from the human behavior for both methods are plotted as histograms, as shown in Fig. 5.11. Because the driver's behavior contains stochasticity and is also influenced by daily mood and the purpose of driving, which cannot be quantified easily, the errors in each component of L can only be reduced but not eliminated. The blue and red solid lines in each subplot of Fig. 5.11 represent the probability density functions (PDFs) of the DRD-PB and the POPD in the validation set. The standard deviation σ = [σȧ, σa, σv] of the POPD is [0.467, 0.413, 0.891], which is smaller than the [0.479, 0.425, 0.949] of the DRD-PB in every respect. Therefore, it can be concluded that with the POPD, the braking operation becomes more similar to human behavior than with the DRD-PB. The statistical analysis of three additional drivers is listed in Table 5.1, clearly indicating that the POPD outperforms the DRD-PB in terms of σ for every individual driver.
Figure 5.11: Probability distribution comparison of the DRD-PB and the POPD relative to the human driver: the probability density functions show that the brake action generated by the POPD is more similar to the human's than that of the DRD-PB. The mean and standard deviation are listed as driver #3 in Table 5.1.
Table 5.1: Statistics of three selected drivers.

Driver  Method    ȧ error (m/s³)        a error (m/s²)        ∆v error (m/s)
                  µ        σ            µ        σ            µ        σ
1       DRD-PB    0.032    0.479       -0.001    0.425        0.132    0.949
        POPD     -0.023    0.467       -0.088    0.413        0.192    0.891
2       DRD-PB    0.025    0.494       -0.038    0.617       -0.103    2.00
        POPD     -0.029    0.374       -0.027    0.295       -0.224    0.767
3       DRD-PB   -0.017    0.560       -0.023    0.706        0.093    1.626
        POPD     -0.023    0.522       -0.088    0.545        0.171    1.181
Figure 5.12: Human-in-the-loop co-simulation environment: (a) driving route from the Ann Arbor area, (b) simulator setup, (c) simulator interface.
acceleration/deceleration with the acceleration and brake pedals, like driving a real-world vehicle. These input signals are collected by CarMaker® and then passed to GT-Suite® [116] for powertrain and vehicle dynamics simulation. It is noted that lane shifts are made by the driver's decision when needed. In the meantime, PTV VISSIM® [117] generates random numbers of robot sedans and robot buses in the CarMaker® environment that share the same road with the test vehicle. These robot vehicles are programmed by VISSIM® and behave similarly to real-world drivers: they make lane shifts as they desire and wait at stop signs and red traffic signals. The traffic volume of the randomly generated robot vehicles is obtained from the Michigan Department of Transportation website [118]. There is a possibility that neighboring-lane vehicles cut in with a high relative speed, which provides an opportunity to test the safety level of the proposed POPD. The POPD algorithm is compiled from Matlab/SIMULINK® to C code and is called by the CarMaker® master environment.
5.7.2 Experimental Results
During the experiment, two different drivers are involved. Each driver performs two test drives on the same route, with and without the proposed algorithm implemented. To show the full potential of the POPD, each driver's behavior is fully learned before the test drives. The overall trip time, the braking time by the human, and the braking time by the algorithm for each driver are summarized in Table 5.2.
As shown in Table 5.2, because of differences in driving style, each driver completes the test drive with a different trip time. In addition, the randomness in the traffic simulation (e.g., neighboring vehicles and traffic signals) makes even the same driver complete the same route with different trip times. Without the POPD algorithm, the overall braking time occupies 31.3% of driver A's trip time and 16.7% of driver B's trip time. With the POPD included, however, the human braking time is reduced to 5.25% and 3.56% of the total trip time, respectively, which significantly reduces the driver's brake pedal effort. Note that the current POPD algorithm does not use any navigation information, which means that the human driver is responsible for braking events caused by traffic signals, stop signs, and slowing down for left/right turns.
The time-series response of driver A's ego vehicle velocity and the preceding vehicle velocity is shown in Fig. 5.13(a). The desired and actual acceleration, the relative distance to the preceding vehicle, and the relative speed to the preceding vehicle are shown in Fig. 5.13(b), (c), and (d), respectively. The algorithm-generated braking signal, human override signal, and algorithm activation indicator over time are shown in Fig. 5.13(e). Overall, Fig. 5.13(a) shows that driver A can track the preceding vehicle's speed relatively well with the POPD implemented on the ego vehicle. Fig. 5.13(b) demonstrates that the actual acceleration always matches the desired acceleration if the driver does not override the algorithm. It is worth noting that drel in Fig. 5.13(c) is set to 0 if the preceding vehicle does not exist; otherwise, the relative distance constraints are satisfied most of the time.
Table 5.2: Overall trip time, braking time by the human, and braking time by the algorithm for each driver.

                 Driver A               Driver B
                 w/o POPD   w/ POPD     w/o POPD   w/ POPD
Trip time (s)    1345       1383        1635       1623
Brake time (s)   425.6      72.7        273.0      57.8
POPD usage (s)   -          540.9       -          633

Figure 5.13: Human-in-the-loop experimental results: (a) the ego vehicle velocity compared to the preceding vehicle, (b) desired and actual acceleration, (c) relative distance between the ego vehicle and the preceding vehicle and its constraints, (d) relative velocity between the ego vehicle and the preceding vehicle, (e) brake pedal position from the POPD, brake pedal position from the human driver, and the algorithm activation indicator (Ipb).

From 1000 to 1200 seconds, there are several consecutive stop signs, and human driver A has to press the brake pedal himself, as shown in Fig. 5.13(e). With navigation information included in the future, the human brake usage in the POPD case can be further decreased from
5.25% (or 3.56%) to an even lower value. During driving, the human driver can override the operation of the POPD with the gas/brake pedal when feeling uncomfortable. As Table 5.2 indicates, the driver rarely overrides the POPD action except for traffic signals and stop signs, meaning that the driver accepts the generated deceleration most of the time. Throughout the entire trip, no collision was observed, and the test driver did not feel heterogeneity when the POPD engaged, as shown in Fig. 5.13.
From 1350 to 1360 seconds in Fig. 5.13(a) and (d), driver A intentionally pushes the acceleration pedal, approaches the preceding vehicle at high speed, and releases the acceleration pedal when the two vehicles are extremely close. At 1358 seconds, when the POPD was activated, it successfully recognized the danger of collision and reduced the vehicle's speed with more consideration for safety and less consideration for comfort, as shown in Fig. 5.13(b), (c), and (e).
5.8 Summary
This study proposes a personalized one-pedal-driving (POPD) algorithm using learning-based
MPC. To capture a driver’s unique characteristics, two personalized constraints, namely time head-
way and perceptual constraints, are identified based on the analysis of real-world on-road driving
data. Learning a driver’s braking behavior is realized through the optimization of the weighting
factors in the MPC cost function with particle swarm optimization using historical driver data.
Specifically, an evaluation metric is introduced to judge the similarity between the controller’s and
the human driver’s actions. With this evaluation metric, optimal weights learning is conducted to
ensure the controller’s action matches the individual driver’s expectation.
A comparative case study shows that predicting a preceding vehicle’s speed with a constant
acceleration assumption leads to a similar level of control performance compared to the case with
a precise estimation of the preceding vehicle’s speed. The comparative study also reveals that
the optimal prediction horizon for this car-following control problem is 8 seconds. An open-loop simulation with 450 cycles from different drivers was conducted to test the proposed POPD's capability of matching a driver's braking behavior. The simulation results show that, on average,
the proposed POPD algorithm outperforms our earlier developed desired relative distance model-
based personalized braking algorithm by over 50% in terms of similarity to the individual human
drivers.
In addition, a human-in-the-loop experiment was conducted in a GT-Suite®/CarMaker®/SIMULINK®/VISSIM® co-simulation environment. Over this simulator test with two different drivers, the proposed POPD algorithm proved its effectiveness and real-time implementability in a realistic driving environment. Furthermore, with the POPD algorithm implemented, both drivers could finish the same drive cycle with only one-pedal control in most scenarios, which dramatically reduces the brake pedal effort. Meanwhile, both drivers were satisfied with the deceleration generated by the POPD, as they rarely overrode the brake signals.
CHAPTER 6
Conclusion
computationally expensive hierarchical control approach to vehicle power-split and longitudinal motion in a sequential manner.
Despite the P0+P4 MHEV's advantages, the three-power-source structure brings extra difficulties in designing a real-time torque-split strategy. In this dissertation, the safety, ride comfort, and fuel economy of a P0+P4 MHEV are optimized in a sequential manner: torque-split optimization and vehicle longitudinal motion optimization. Moreover, the personalized behaviors of an individual driver are considered in the control design to further improve drivability and ride comfort through a one-pedal-driving feature.
To begin with, a detailed P0+P4 MHEV model considering realistic constraints is developed. These realistic constraints include nonlinear tire behavior, longitudinal load transfer, and brake constraints related to vehicle safety. Then, a dynamic programming analysis of the P0+P4 MHEV is conducted on three standard test cycles, namely the WLTC, the UDDS, and the HWFET. Finally, by analyzing the globally optimal solution of energy consumption and torque distribution, two important features are derived from the DP results:
1. The P0 motor is hardly used for propulsion unless the demanding torque at the front axle
exceeds the engine’s maximum torque.
2. In the case of braking, the torque distribution at the front and the rear axle has a clear rela-
tionship with the current deceleration.
Due to the first feature, the real-time torque-split strategy can exclude the P0 motor from the control variables. Hence, the power split during propulsion is solved with one of the well-known optimization-based techniques, A-ECMS. The second feature is captured by a modified logistic function in Equation (2.30). During braking events, the torque distribution can be determined through the diagram shown in Figure 2.8. The overall control framework is named the approximated A-ECMS (AA-ECMS) for P0+P4 MHEVs.
In addition to the optimization-based strategy, this dissertation proposes another, learning-based torque-split algorithm using a twin delayed deep deterministic policy gradient algorithm with prioritized exploration and experience replay (TD3-PEER). Due to its data-driven nature, the TD3-PEER can easily be migrated to a similar hybrid architecture or to a BEV with multiple different-sized motors. The simulated training process with different seeds demonstrates the stability of the proposed TD3 algorithm. The case study also shows that the expert knowledge from the DP results helps the agent achieve much better fuel consumption. The proposed TD3 method outperforms the state-of-the-art DDPG+PER and DQN in fuel economy on all five standard test cycles.
Regarding the vehicle longitudinal motion, this dissertation proposes a defensive ecological adaptive cruise control strategy (DEco-ACC). The DEco-ACC formulates the multi-lane car-following system as an MPC problem and models the blind spot zones of neighboring vehicles as a once-differentiable continuous penalty function. Unlike traditional ACC or Eco-ACC, the DEco-ACC allows the vehicle to proactively avoid the blind spot zones of the neighboring vehicles within its prediction horizon. The simulation results show that, with two neighboring vehicles present simultaneously, the DEco-ACC reduces the dwelling time in the blind spots by 29.5% while sacrificing only 0.4% in fuel consumption. A case study with four neighboring vehicles during car-following is presented in the Appendix.
In complex urban driving scenarios, the DEco-ACC may be disabled due to the many stop signs and traffic signals. To cope with such situations, this dissertation proposes a personalized one-pedal-driving algorithm (POPD) based on a learning-based model predictive controller whose parameters are trained with the driver's historical braking data. With the POPD, braking actions are performed automatically during car-following. During driving, the driver may press the brake/acceleration pedal to temporarily override the actions of the POPD controller. Such overriding behavior is recorded for future improvements of the parameter learning. The proposed POPD algorithm is validated through a human-in-the-loop co-simulation with complex driving scenarios duplicated from the Ann Arbor area in Michigan, including multiple traffic lanes, stop signs, roundabouts, and traffic signals. The experiment shows that the brake pedal usage is reduced from 31.3% to 5.25% for human driver A and from 16.7% to 3.56% for human driver B.
energy consumption over the trip. The studies in [123–132] have demonstrated the potential of such combinations. The energy efficiency of the proposed AA-ECMS and TD3 may be enhanced once trip information is included. Also, future work will include experimental verification of the fuel economy and dynamic performance of the proposed strategy through vehicle testing.
brake pedal position data, a driver usually does not override actions from the ACC controller unless they turn off the ACC mode. Therefore, a novel data collection approach should be developed. Several existing studies on personalized ACC can serve as references [105, 146–152].
Moreover, Chapter 4 assumes that all states required by the DEco-ACC can be observed accurately. However, in real-world operation, states acquired from sensors may be subject to measurement errors and noise. It is worth exploring the performance of the DEco-ACC in the presence of state estimation/observation errors. The implementation of robust MPC in the DEco-ACC could be a potential solution for this future direction.
APPENDIX
The simulation shown in Section 4.4.3 mainly focuses on the controller's performance in high-speed operation with two neighboring vehicles (2-NVs). When the traffic conditions change (e.g., heavy traffic or urban driving), the number of neighboring vehicles could increase. Therefore, in this Appendix, the DEco-ACC is modified to handle a 4-NVs scenario, and its performance is presented through low-speed car-following scenarios with more surrounding neighboring vehicles from the NGSIM data. More specifically, 10 cases of 4-NVs scenarios are extracted from the real-world NGSIM data.
Figure A.1: A concept of car-following in consideration of the BSZs of four neighboring vehicles.
In a 4-NVs scenario, the state-space matrices in Eq. (4.9) are modified as follows:
$$A = \begin{bmatrix}
1 & 0 & 0 & -T_s & -\tfrac{1}{2}T_s^2 & 0 & 0 \\
0 & 1 & 0 & -T_s & -\tfrac{1}{2}T_s^2 & 0 & 0 \\
0 & 0 & 1 & -T_s & -\tfrac{1}{2}T_s^2 & 0 & 0 \\
0 & 0 & 0 & 1 & T_s & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & -T_s & -\tfrac{1}{2}T_s^2 & 1 & 0 \\
0 & 0 & 0 & -T_s & -\tfrac{1}{2}T_s^2 & 0 & 1
\end{bmatrix},$$
$$B^\top = \begin{bmatrix} -\tfrac{T_s^3}{6} & -\tfrac{T_s^3}{6} & -\tfrac{T_s^3}{6} & \tfrac{T_s^2}{2} & T_s & -\tfrac{T_s^3}{6} & -\tfrac{T_s^3}{6} \end{bmatrix},$$
$$d_k^\top = \begin{bmatrix} T_s v_p & T_s v_{n1} & T_s v_{n2} & 0 & 0 & T_s v_{n3} & T_s v_{n4} \end{bmatrix},$$
where Y^{ΔNE3*} and Y^{ΔNE4*} are the blind-spot constraints of neighboring vehicle 3 and neighboring vehicle 4, which are defined in a similar way to Eq. (4.8). The following constraints are included to address neighboring vehicles 3 and 4:
$$\left|Y^{\Delta NE3*}\right| \ge \tfrac{1}{2} L_{\mathrm{BSZ}3},$$
$$\left|Y^{\Delta NE4*}\right| \ge \tfrac{1}{2} L_{\mathrm{BSZ}4}.$$
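A numpy sketch of assembling the augmented matrices above is given below; the state ordering (relative displacements to the preceding vehicle and neighboring vehicles 1–2, ego velocity, ego acceleration, and relative displacements to neighboring vehicles 3–4) is inferred from the matrices, and the function and variable names are illustrative assumptions:

import numpy as np

def build_4nv_model(Ts, v_p, v_n1, v_n2, v_n3, v_n4):
    # Augmented model for the 4-neighboring-vehicle (4-NVs) scenario.
    A = np.eye(7)
    rel_rows = [0, 1, 2, 5, 6]            # rows of the relative-displacement states
    A[rel_rows, 3] = -Ts                  # coupling to the ego velocity state
    A[rel_rows, 4] = -0.5 * Ts**2         # coupling to the ego acceleration state
    A[3, 4] = Ts                          # ego velocity integrates ego acceleration

    B = np.full((7, 1), -Ts**3 / 6.0)
    B[3, 0] = Ts**2 / 2.0
    B[4, 0] = Ts

    d = np.array([[Ts * v_p], [Ts * v_n1], [Ts * v_n2], [0.0], [0.0],
                  [Ts * v_n3], [Ts * v_n4]])
    return A, B, d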
After including the slack variable and mode signal for neighboring vehicles 3 and 4, the cost function of this incremental DEco-ACC controller is modified as follows:
$$J = \sum_{k=0}^{N-1} u_k^2 P_1 + (a_{e,k})^2 P_2 + (v_{e,k} - v_{p,k})^2 P_3 + \cdots$$
Table A.1: Time span and average preceding vehicle velocity of each car following scenario with
four neighboring vehicles.
As shown in Table A.1, a total of 10 real driving scenarios are extracted with a preceding vehicle's velocity ranging from 5.36 m/s to 8.89 m/s, which is much lower than the 23.9 m/s of the case study in Section 4.4.3. Since the ego vehicle is rarely surrounded by four neighboring vehicles simultaneously, each realistic scenario is relatively short, ranging from 20 seconds to 50 seconds.
The comparison of both controllers among all 10 cases is summarized in Fig. A.2. The fuel consumption and dwelling time of the DEco-ACC controller are normalized with respect to the values obtained with the Eco-ACC controller. It can be seen that the DEco-ACC controller can reduce the BSZ dwelling time in most cases, and that, as discussed in Section 4.4, the DEco-ACC could incur additional fuel consumption.
Figure A.3 compares the operations of the DEco-ACC and Eco-ACC for a specific 4-NVs scenario in detail. In Fig. A.3(a), the filled areas show the trajectory of the BSZ associated with each neighboring vehicle. From 2 s to 8 s in Fig. A.3(a), the DEco-ACC controller commands the ego vehicle to drive at a lower speed in order to avoid the blind spot of neighboring vehicle N4, whereas the ego vehicle with the Eco-ACC controller ignores the blind spot and drives into it. A similar behavior can be observed from 19 to 21 seconds: the DEco-ACC controller keeps the ego vehicle outside the blind spot of N4 and visible to the other neighboring vehicles' drivers, which could reduce the risk of a lane-change collision, whereas the ego vehicle with the Eco-ACC controller drives directly into the blind spots during this interval.
Figure A.2: Performance comparison of the DEco-ACC in 10 real driving scenarios including four neighboring vehicles.
Figure A.3: Comparison of trajectories with DEco-ACC and Eco-ACC at lower-speed operation when there are four neighboring vehicles: (a) displacement of each vehicle relative to the preceding vehicle, (b) velocity, acceleration, and jerk.
BIBLIOGRAPHY
[1] “Energy and the environment explained outlook for future emissions,”
https://www.eia.gov/energyexplained/energy-and-the-environment/outlook-for-future-
emissions.php, accessed: 2022-08-31.
[2] “Final rule to revise existing national ghg emissions standards for passenger cars and light
trucks through model year 2026,” https://www.epa.gov/regulations-emissions-vehicles-and-
engines/final-rule-revise-existing-national-ghg-emissions, accessed: 2022-08-31.
[4] National Highway Traffic Safety Administration Environmental Protection Agency, “2017
and later model year light-duty vehicle greenhouse gas emissions and corporate average fuel
economy standards; final rule,” 2017.
[5] Z. Liu, A. Ivanco, and Z. S. Filipi, “Impacts of real-world driving and driver aggressiveness
on fuel consumption of 48V mild hybrid vehicle,” SAE International Journal of Alternative
Powertrains, vol. 5, no. 2, pp. 249–258, 2016.
[6] E. Song, L. Fan, G. Liu, and W. Long, “Numerical simulation of combination engine hev on
fuel economy,” in 2010 WASE International Conference on Information Engineering, vol. 4.
IEEE, 2010, pp. 244–249.
[7] M. Kuypers, “Application of 48 volt for mild hybrid vehicles and high power loads,” in SAE
Technical Paper, 2014, no. 2014-01-1790.
[9] P. Biswas, “Adapting SUV AWD powertrain to P0/P2/P4 hybrid EV architecture: Integrative
packaging and capability study,” in 2017 IEEE Transportation Electrification Conference
(ITEC-India). IEEE, 2017, pp. 1–5.
[10] S. Lee, J. Cherry, M. Safoutin, A. Neam, J. McDonald, and K. Newman, “Modeling and
controls development of 48 V mild hybrid electric vehicles,” in SAE Technical Paper, 2018,
no. 2018-01-0413.
[11] M. Werra, A. Sturm, and F. Küçükay, “Optimal and prototype dimensioning of 48 V P0+P4
hybrid drivetrains,” Automotive and Engine Technology, pp. 1–14, 2020.
[13] D. Lodaya, J. Zeman, M. Okarmus, S. Mohon, P. Keller, J. Shutty, and N. Kondipati, “Op-
timization of fuel economy using optimal controls on regulatory and real-world driving
cycles,” in SAE Technical Paper, 2020, no. 2020-01-1007.
[15] T. Hofman, M. Steinbuch, R. Van Druten, and A. Serrarens, “Rule-based energy manage-
ment strategies for hybrid vehicles,” International Journal of Electric and Hybrid Vehicles,
vol. 1, no. 1, pp. 71–94, 2007.
[16] A. M. Ali and D. Söffker, “Towards optimal power management of hybrid electric vehicles
in real-time: A review on methods, challenges, and state-of-the-art solutions,” Energies,
vol. 11, no. 3, p. 476, 2018.
[17] J. Peng, H. He, and R. Xiong, “Rule based energy management strategy for a series–parallel
plug-in hybrid electric bus optimized by dynamic programming,” Applied Energy, vol. 185,
pp. 1633–1643, 2017.
[18] S. G. Li, S. M. Sharkh, F. C. Walsh, and C.-N. Zhang, “Energy and battery management of
a plug-in series hybrid electric vehicle using fuzzy logic,” IEEE Transactions on Vehicular
Technology, vol. 60, no. 8, pp. 3571–3585, 2011.
[19] G. Jinquan, H. Hongwen, P. Jiankun, and Z. Nana, “A novel mpc-based adaptive energy
management strategy in plug-in hybrid electric vehicles,” Energy, vol. 175, pp. 378–392,
2019.
[20] Y. Wang, H. Tan, Y. Wu, and J. Peng, “Hybrid electric vehicle energy management with
computer vision and deep reinforcement learning,” IEEE Transactions on Industrial Infor-
matics, vol. 17, no. 6, pp. 3857–3868, 2020.
[21] Y. Hu, W. Li, K. Xu, T. Zahid, F. Qin, and C. Li, “Energy management strategy for a hybrid
electric vehicle based on deep reinforcement learning,” Applied Sciences, vol. 8, no. 2, p.
187, 2018.
[22] Y. Li, H. He, J. Peng, and H. Wang, “Deep reinforcement learning-based energy manage-
ment for a series hybrid electric vehicle enabled by history cumulative trip information,”
IEEE Transactions on Vehicular Technology, vol. 68, no. 8, pp. 7416–7430, 2019.
[23] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing Atari with deep reinforcement learning," arXiv preprint arXiv:1312.5602, 2013.
[24] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, “Trust region policy optimiza-
tion,” in International conference on machine learning. PMLR, 2015, pp. 1889–1897.
[25] T. Liu, Y. Zou, D. Liu, and F. Sun, “Reinforcement learning of adaptive energy manage-
ment with transition probability for a hybrid electric tracked vehicle,” IEEE Transactions
on Industrial Electronics, vol. 62, no. 12, pp. 7837–7846, 2015.
[26] Y. Zou, T. Liu, D. Liu, and F. Sun, “Reinforcement learning-based real-time energy man-
agement for a hybrid tracked vehicle,” Applied energy, vol. 171, pp. 372–382, 2016.
[27] M. Sun, P. Zhao, and X. Lin, “Power management in hybrid electric vehicles using deep
recurrent reinforcement learning,” Electrical Engineering, vol. 104, no. 3, pp. 1459–1471,
2022.
[28] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra,
“Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971,
2015.
[29] R. Lian, J. Peng, Y. Wu, H. Tan, and H. Zhang, “Rule-interposing deep reinforcement learn-
ing based energy management strategy for power-split hybrid electric vehicle,” Energy, vol.
197, p. 117297, 2020.
[30] R. Liessner, C. Schroer, A. M. Dietermann, and B. Bäker, “Deep reinforcement learning for
advanced energy management of hybrid electric vehicles.” in ICAART (2), 2018, pp. 61–72.
[31] R. Huang, H. He, X. Meng, Y. Wang, R. Lian, and Y. Wei, “Energy management strategy for
plug-in hybrid electric bus based on improved deep deterministic policy gradient algorithm
with prioritized replay,” in 2021 IEEE Vehicle Power and Propulsion Conference (VPPC).
IEEE, 2021, pp. 1–6.
[32] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum
entropy deep reinforcement learning with a stochastic actor,” in International conference on
machine learning. PMLR, 2018, pp. 1861–1870.
[33] S. Fujimoto, H. Hoof, and D. Meger, “Addressing function approximation error in actor-
critic methods,” in International conference on machine learning. PMLR, 2018, pp. 1587–
1596.
[34] J. Wu, Z. Wei, W. Li, Y. Wang, Y. Li, and D. U. Sauer, “Battery thermal-and health-
constrained energy management for hybrid electric bus based on soft actor-critic drl algo-
rithm,” IEEE Transactions on Industrial Informatics, vol. 17, no. 6, pp. 3751–3761, 2020.
[35] J. Zhou, S. Xue, Y. Xue, Y. Liao, J. Liu, and W. Zhao, “A novel energy management strategy
of hybrid electric vehicle via an improved td3 deep reinforcement learning,” Energy, vol.
224, p. 120118, 2021.
[36] F. Pardo, “Tonic: A deep reinforcement learning library for fast prototyping and benchmark-
ing,” arXiv preprint arXiv:2011.07537, 2020.
[37] A. Sciarretta and A. Vahidi, “Energy-efficient speed profiles (eco-driving),” in Energy-
Efficient Driving of Road Vehicles. Springer, 2020, pp. 131–178.
[38] A. K. Madhusudhanan and X. Na, “Effect of a traffic speed based cruise control on an
electric vehicle’s performance and an energy consumption model of an electric vehicle,”
IEEE/CAA Journal of Automatica Sinica, vol. 7, no. 2, pp. 386–394, March 2020.
[39] Y. Jia, R. Jibrin, and D. Gorges, “Energy-optimal adaptive cruise control for electric vehicles
based on nonlinear model predictive control,” in 2019 IEEE Vehicle Power and Propulsion
Conference (VPPC), Oct 2019, pp. 1–7.
[40] Y. Zhu, D. Zhao, and H. He, “Synthesis of cooperative adaptive cruise control with feedfor-
ward strategies,” IEEE Transactions on Vehicular Technology, pp. 1–1, 2020.
[41] Y. He, Q. Zhou, M. Makridis, K. Mattas, J. Li, H. Williams, and H. Xu, “Multi-objective
co-optimization of cooperative adaptive cruise control and energy management strategy for
phevs,” IEEE Transactions on Transportation Electrification, pp. 1–1, 2020.
[42] M. I. Miftakhudin, A. Subiantoro, and F. Yusivar, “Adaptive cruise control by considering
control decision as multistage mpc constraints,” in 2019 IEEE Conference on Energy Con-
version (CENCON), Oct 2019, pp. 171–176.
[43] B. Sakhdari, M. Vajedi, and N. L. Azad, “Ecological adaptive cruise control of a plug-in
hybrid electric vehicle for urban driving,” in 2016 IEEE 19th International Conference on
Intelligent Transportation Systems (ITSC), 2016, pp. 1739–1744.
[44] D. He, W. He, and X. Song, “Efficient predictive cruise control of autonomous vehicles with
improving ride comfort and safety,” Measurement and control, vol. 53, no. 1-2, pp. 18–28,
2020.
[45] M. Á. Sotelo and J. Barriga, “Blind spot detection using vision for automotive applications,”
Journal of Zhejiang University-Science A, vol. 9, no. 10, pp. 1369–1372, 2008.
[46] J. Kim and D. Kum, “Collision risk assessment algorithm via lane-based probabilistic mo-
tion prediction of surrounding vehicles,” IEEE Transactions on Intelligent Transportation
Systems, vol. 19, no. 9, pp. 2965–2976, 2017.
[47] D. Kim, J. S. Eo, and K.-K. K. Kim, “Parameterized Energy-Optimal Regenerative Brak-
ing Strategy for Connected and Autonomous Electrified Vehicles: A Real-Time Dynamic
Programming Approach,” IEEE Access, vol. 9, pp. 103 167–103 183, 2021.
[48] S. Zhang and X. Zhuan, “Study on adaptive cruise control strategy for battery electric vehi-
cle,” Math. Probl. Eng., vol. 2019, 2019.
[49] J. Guo, W. Li, J. Wang, Y. Luo, and K. Li, “Safe and energy-efficient car-following con-
trol strategy for intelligent electric vehicles considering regenerative braking,” IEEE Trans.
Intell. Transp. Syst., vol. 23, no. 7, pp. 7070–7081, 2021.
[50] S. P. Deligianni, M. Quddus, A. Morris, A. Anvuur, and S. Reed, “Analyzing and modeling
drivers’ deceleration behavior from normal driving,” Transp. Res. Rec., vol. 2663, no. 1, pp.
134–141, 2017.
[53] K. Min, G. Sim, S. Ahn, M. Sunwoo, and K. Jo, “Vehicle Deceleration Prediction Model
to Reflect Individual Driver Characteristics by Online Parameter Learning for Autonomous
Regenerative Braking of Electric Vehicles,” Sensors, vol. 19, no. 19, p. 4171, Sep. 2019.
[54] M. Treiber, A. Hennecke, and D. Helbing, “Congested traffic states in empirical observa-
tions and microscopic simulations,” Physical Review E, vol. 62, no. 2, pp. 1805–1824, Aug.
2000.
[55] M. U. Cuma, Ç. D. Ünal, and M. M. Savrun, “Design and implementation of algorithms for
one pedal driving in electric buses,” Eng. Sci. Technol. Int. J., vol. 24, no. 1, pp. 138–144,
2021.
[56] J. Wang, I. Besselink, J. van Boekel, and H. Nijmeijer, “Evaluating the energy efficiency of
a one pedal driving algorithm,” in 2015 European Battery, Hybrid and Fuel Cell Electric
Vehicle Congress (EEVC 2015), 2015.
[57] S. Yang, Z. Su, and P. Chen, “Robust inter-vehicle spacing control for battery electric vehi-
cles with one-pedal-driving feature,” in 2021 IEEE Conference on Control Technology and
Applications (CCTA). IEEE, 2021, pp. 259–264.
[58] D. Schafer, M. Lamantia, and P. Chen, “Modeling and spacing control for an electric vehicle
with one-pedal-driving feature,” in 2021 American Control Conference (ACC). IEEE, 2021,
pp. 166–171.
[59] J. Van Boekel, I. Besselink, and H. Nijmeijer, “Design and realization of a one-pedal-driving
algorithm for the TU/e Lupo EL,” World Electr. Veh. J., vol. 7, no. 2, pp. 226–237, 2015.
[60] Y. Saito and P. Raksincharoensak, “Risk predictive haptic guidance: Driver assistance with
one-pedal speed control interface,” in 2017 IEEE International Conference on Systems,
Man, and Cybernetics (SMC). IEEE, 2017, pp. 111–116.
[61] L.-W. Chen and G.-L. Wang, “Risk-aware and collision-preventive cooperative fleet cruise
control based on vehicular sensor networks,” IEEE Transactions on Systems, Man, and Cy-
bernetics: Systems, vol. 52, no. 1, pp. 179–191, 2021.
[62] D. Lang, T. Stanger, and L. del Re, “Opportunities on fuel economy utilizing V2V based
drive systems,” SAE Technical Paper, Tech. Rep., 2013.
[63] S. Darbha, S. Konduri, and P. R. Pagilla, “Effects of V2V communication on time headway
for autonomous vehicles,” in 2017 American Control Conference (ACC). IEEE, 2017, pp.
2002–2007.
[64] Y. He, K. H. Kwak, Y. Kim, D. Jung, J. H. Lee, and J. Ha, “Real-time torque-split strategy
for P0+P4 mild hybrid vehicles with eAWD capability,” IEEE Transactions on Transportation
Electrification, vol. 8, no. 1, pp. 1401–1413, 2021.
[65] Y. He, Y. Kim, D. Y. Lee, and S.-H. Kim, “Defensive ecological adaptive cruise control con-
sidering neighboring vehicles’ blind-spot zones,” IEEE Access, vol. 9, pp. 152 275–152 287,
2021.
[66] L. Hewing, K. P. Wabersich, M. Menner, and M. N. Zeilinger, “Learning-based model pre-
dictive control: Toward safe learning in control,” Annu. Rev. Control Robot. Auton. Syst.,
vol. 3, pp. 269–296, 2020.
[67] C. Musardo, G. Rizzoni, Y. Guezennec, and B. Staccia, “A-ECMS: An adaptive algorithm
for hybrid electric vehicle energy management,” European Journal of Control, vol. 11, no.
4-5, pp. 509–524, 2005.
[68] Z. Zhu, Y. Liu, and M. Canova, “Energy management of hybrid electric vehicles via deep
Q-networks,” in 2020 American Control Conference (ACC). IEEE, 2020, pp. 3077–3082.
[69] A. Chasse, A. Sciarretta, and J. Chauvin, “Online optimal control of a parallel hybrid with
costate adaptation rule,” IFAC Proceedings Volumes, vol. 43, no. 7, pp. 99–104, 2010.
[70] H. B. Pacejka and E. Bakker, “The magic formula tyre model,” Vehicle System Dynamics,
vol. 21, no. sup001, pp. 1–18, 1992.
[71] United Nations Economic Commission for Europe, “Uniform provisions concerning the ap-
proval of vehicles of categories M, N and O with regard to braking. Addendum 12: Regu-
lation No. 13,” Mar. 2014. [Online]. Available: https://www.unece.org/
[72] P. Dekraker, D. Barba, A. Moskalik, and K. Butters, “Constructing engine maps for full
vehicle simulation modeling,” in SAE Technical Paper, 2018, no. 2018-01-1412.
[73] O. Sundstrom and L. Guzzella, “A generic dynamic programming Matlab function,” in 2009
IEEE Control Applications, (CCA) Intelligent Control, (ISIC), 2009, pp. 1625–1630.
[74] A. Pennycott, L. D. Novellis, P. Gruber, and A. Sorniotti, “Optimal braking force allocation
for a four-wheel drive fully electric vehicle,” Proceedings of the Institution of Mechanical
Engineers, Part I: Journal of Systems and Control Engineering, 2014.
[75] S. Onori and L. Serrao, “On adaptive-ECMS strategies for hybrid electric vehicles,” in Pro-
ceedings of the international scientific conference on hybrid and electric vehicles, Malmai-
son, France, vol. 67, 2011.
[76] L. Serrao, S. Onori, and G. Rizzoni, “ECMS as a realization of Pontryagin’s minimum
principle for HEV control,” in 2009 American Control Conference. IEEE, 2009, pp. 3964–
3969.
[77] R. Bellman, “A Markovian decision process,” Journal of Mathematics and Mechanics, pp.
679–684, 1957.
[78] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” arXiv
preprint arXiv:1511.05952, 2015.
[79] D. Horgan, J. Quan, D. Budden, G. Barth-Maron, M. Hessel, H. Van Hasselt, and D. Silver,
“Distributed prioritized experience replay,” arXiv preprint arXiv:1803.00933, 2018.
[80] B. Xu, X. Tang, X. Hu, X. Lin, H. Li, D. Rathod, and Z. Wang, “Q-learning-based super-
visory control adaptability investigation for hybrid electric vehicles,” IEEE Transactions on
Intelligent Transportation Systems, 2021.
[83] SAE J941, “Motor vehicle drivers’ eye locations,” SAE International, Standard, 2008.
[84] SAE J1050, “Describing and measuring the driver’s field of view,” SAE International, Stan-
dard, 2001.
[85] Laboratory Test Procedure for FMVSS 111 Rear Visibility, “Rear visibility (other than
school buses),” National Highway Traffic Safety Administration, Standard, 2018.
[86] R. Schmied, H. Waschl, R. Quirynen, M. Diehl, and L. del Re, “Nonlinear MPC for emission
efficient cooperative adaptive cruise control,” IFAC-PapersOnLine, vol. 48, no. 23, pp. 160–
165, 2015.
[87] V. Kovali, V. Alexiadis, and L. Zhang, “Video-based vehicle trajectory data collection,” in
Proceedings of the 86th annual meeting of the TRB, 2007.
[88] M. Montanino and V. Punzo, “Making NGSIM data usable for studies on traffic flow theory:
Multistep method for vehicle trajectory reconstruction,” Transportation Research Record,
vol. 2390, no. 1, pp. 99–111, 2013.
[89] V. Punzo, M. T. Borzacchiello, and B. Ciuffo, “On the assessment of vehicle trajectory data
accuracy and application to the next generation simulation (NGSIM) program data,” Trans-
portation Research Part C: Emerging Technologies, vol. 19, no. 6, pp. 1243–1262, 2011.
[90] X.-Y. Lu and A. Skabardonis, “Freeway traffic shockwave analysis: exploring the NGSIM
trajectory data,” in 86th Annual Meeting of the Transportation Research Board, Washington,
DC. Citeseer, 2007.
[91] B. Coifman and L. Li, “A critical evaluation of the next generation simulation (NGSIM) ve-
hicle trajectory dataset,” Transportation Research Part B: Methodological, vol. 105, pp.
362–377, 2017.
[92] J. A. E. Andersson, J. Gillis, G. Horn, J. B. Rawlings, and M. Diehl, “CasADi – A software
framework for nonlinear optimization and optimal control,” Mathematical Programming
Computation, vol. 11, pp. 1–36, 2019.
[93] M. Risbeck and J. Rawlings, “MPCTools: Nonlinear model predictive control tools for
CasADi,” 2016.
[94] E. Hyeon, Y. Kim, N. Prakash, and A. G. Stefanopoulou, “Short-term speed forecasting us-
ing vehicle wireless communications,” in Proceedings of the 2019 American Control Con-
ference (ACC). IEEE, 2019, pp. 736–741.
[95] Y. Du, C. Liu, and Y. Li, “Velocity control strategies to improve automated vehicle driving
comfort,” IEEE Intelligent transportation systems magazine, vol. 10, no. 1, pp. 8–18, 2018.
[96] G. A. Hubbard and K. Youcef-Toumi, “System level control of a hybrid-electric vehicle driv-
etrain,” in Proceedings of the 1997 American Control Conference (Cat. No. 97CH36041),
vol. 1. IEEE, 1997, pp. 641–645.
[98] S. Cheng, L. Li, M. Mei, Y. Nie, and L. Zhao, “Multiple-objective adaptive cruise control
system integrated with DYC,” IEEE Transactions on Vehicular Technology, vol. 68, no. 5, pp.
4550–4559, 2019.
[99] Y. Zhu, D. Zhao, and H. He, “Synthesis of cooperative adaptive cruise control with feedfor-
ward strategies,” IEEE Trans. Veh. Technol., vol. 69, no. 4, pp. 3615–3627, 2020.
[101] I. W. Suweda, “Time headway analysis to determine the road capacity,” Jurnal Spektran,
vol. 4, no. 2, 2016.
[103] U. Durrani, C. Lee, and H. Maoh, “Calibrating the Wiedemann’s vehicle-following model
using mixed vehicle-pair interactions,” Transp. Res. C: Emerg. Technol., vol. 67, pp. 227–
242, 2016.
[104] Z. Zhou, Z. Yang, Y. Zhang, Y. Huang, H. Chen, and Z. Yu, “A comprehensive study of
speed prediction in transportation system: From vehicle to traffic,” iScience, p. 103909,
2022.
[105] M. F. Ozkan and Y. Ma, “Personalized adaptive cruise control and impacts on mixed traffic,”
in 2021 American Control Conference (ACC). IEEE, 2021, pp. 412–417.
[106] Y. Fan, P. Wang, A. A. Heidari, H. Chen, M. Mafarja et al., “Random reselection particle
swarm optimization for optimal design of solar photovoltaic modules,” Energy, vol. 239, p.
121865, 2022.
[107] W. Zhao, L. Wang, and S. Mirjalili, “Artificial hummingbird algorithm: A new bio-inspired
optimizer with its engineering applications,” Comput. Methods Appl. Mech. Eng., vol. 388,
p. 114194, 2022.
[108] E. Hyeon, Y. Kim, T. Ersal, and A. Stefanopoulou, “Data-driven forgetting and discount
factors for vehicle speed forecasting in ecological adaptive cruise control,” J. Dyn. Syst.
Meas. Control, vol. 144, no. 1, 2022.
[109] W.-K. Lai, T.-H. Kuo, and C.-H. Chen, “Vehicle speed estimation and forecasting methods
based on cellular floating vehicle data,” Applied Sciences, vol. 6, no. 2, p. 47, 2016.
[110] Y. Zhang, J. Lv, and W. Wang, “Evaluation of vehicle acceleration models for emission
estimation at an intersection,” Transp. Res. D: Transp. Environ., vol. 18, pp. 46–50, 2013.
[111] E. F. Camacho and C. B. Alba, Model predictive control. Springer science & business
media, 2013.
[112] K. H. Kwak, Y. He, Y. Kim, Y. M. Chen, S. Fan, J. Holmer, and J. H. Lee, “(Accepted)
Desired relative distance model-based personalized braking algorithm for one-pedal driving
of electric vehicles,” in 2022 Modeling, Estimation, and Control Conference (MECC), 2022.
[113] S. Fan, Y. Sun, J. H. Lee, and J. Ha, “A co-simulation platform for powertrain controls
development,” in SAE Technical Paper, No. 2020-01-0265, 2020.
[114] S. Fan, J. Lee, Y. Sun, J. Ha, and J. Harber, “Virtual platform development for new control
logic concept test and validation,” in SAE Technical Paper, No. 2021-01-1143, 2021.
[115] S. Ziegler and R. Höpler, “Extending the IPG CarMaker by FMI compliant units,” in Proceed-
ings of the 8th International Modelica Conference; March 20th-22nd; Technical University;
Dresden; Germany, no. 063, 2011, pp. 779–783.
[116] C. Chen, “The development of hybrid electric vehicle control strategy based on GT-SUITE
and Simulink,” in 2015 International Conference on Intelligent Systems Research and Mecha-
tronics Engineering. Atlantis Press, 2015.
[119] Y. Choi, J. Guanetti, S. Moura, and F. Borrelli, “Data-driven energy management strategy
for plug-in hybrid electric vehicles with real-world trip information,” IFAC-PapersOnLine,
vol. 53, no. 2, pp. 14 224–14 229, 2020.
[120] M. Vajedi, “Real-time optimal control of a plug-in hybrid electric vehicle using trip infor-
mation,” 2016.
[122] A. Rajagopalan and G. Washington, “Intelligent control of hybrid electric vehicles using
GPS information,” SAE Technical Paper, Tech. Rep., 2002.
[123] F. Tianheng, Y. Lin, G. Qing, H. Yanqing, Y. Ting, and Y. Bin, “A supervisory control
strategy for plug-in hybrid electric vehicles based on energy demand prediction and route
preview,” IEEE Transactions on Vehicular Technology, vol. 64, no. 5, pp. 1691–1700, 2014.
[124] L. C. Fang, G. Xu, T. L. Li, and K. M. Zhu, “Real-time optimal power management for
hybrid electric vehicle based on prediction of trip information,” in Applied Mechanics and
Materials, vol. 321. Trans Tech Publ, 2013, pp. 1539–1547.
[125] X. Zeng and J. Wang, “Optimizing the energy management strategy for plug-in hybrid elec-
tric vehicles with multiple frequent routes,” IEEE Transactions on Control Systems Tech-
nology, vol. 27, no. 1, pp. 394–400, 2017.
[126] Y. Ma and J. Wang, “Integrated power management and aftertreatment system control for
hybrid electric vehicles with road grade preview,” IEEE Transactions on Vehicular Technol-
ogy, vol. 66, no. 12, pp. 10 935–10 945, 2017.
[127] T. S. Kim, C. Manzie, and R. Sharma, “Two-stage optimal control of a parallel hybrid
vehicle with traffic preview,” IFAC Proceedings Volumes, vol. 44, no. 1, pp. 2115–2120,
2011.
[128] L. Guo, H. Chen, B. Gao, and Q. Liu, “Energy management of HEVs based on velocity profile
optimization,” Sci. China Inf. Sci., vol. 62, no. 8, pp. 89 203–1, 2019.
[131] X. Zeng and J. Wang, “A parallel hybrid electric vehicle energy management strategy using
stochastic model predictive control with road grade preview,” IEEE Transactions on Control
Systems Technology, vol. 23, no. 6, pp. 2416–2423, 2015.
[132] Z. Yang, H. Chen, S. Dong, Q. Liu, and F. Xu, “Energy management strategy of hybrid
electric vehicle with consideration of road gradient,” in 2020 Chinese Control And Decision
Conference (CCDC). IEEE, 2020, pp. 2879–2885.
[133] J. Lin, X. Liu, S. Li, C. Zhang, and S. Yang, “A review on recent progress, challenges
and perspective of battery thermal management system,” International Journal of Heat and
Mass Transfer, vol. 167, p. 120834, 2021.
[134] A. Wei, J. Qu, H. Qiu, C. Wang, and G. Cao, “Heat transfer characteristics of plug-in oscil-
lating heat pipe with binary-fluid mixtures for electric vehicle battery thermal management,”
International Journal of Heat and Mass Transfer, vol. 135, pp. 746–760, 2019.
[135] A. H. Akinlabi and D. Solyali, “Configuration, design, and optimization of air-cooled bat-
tery thermal management system for electric vehicles: A review,” Renewable and Sustain-
able Energy Reviews, vol. 125, p. 109815, 2020.
[136] S. Arora, “Selection of thermal management system for modular battery packs of electric
vehicles: A review of existing and emerging technologies,” Journal of Power Sources, vol.
400, pp. 621–640, 2018.
[137] M. Akbarzadeh, J. Jaguemont, T. Kalogiannis, D. Karimi, J. He, L. Jin, P. Xie, J. Van Mierlo,
and M. Berecibar, “A novel liquid cooling plate concept for thermal management of lithium-
ion batteries in electric vehicles,” Energy Conversion and Management, vol. 231, p. 113862,
2021.
[138] S. Wiriyasart, C. Hommalee, S. Sirikasemsuk, R. Prurapark, and P. Naphon, “Thermal man-
agement system with nanofluids for electric vehicle battery cooling modules,” Case Studies
in Thermal Engineering, vol. 18, p. 100583, 2020.
[139] A. Verma, S. Shashidhara, and D. Rakshit, “A comparative study on battery thermal man-
agement using phase change material (PCM),” Thermal Science and Engineering Progress,
vol. 11, pp. 74–83, 2019.
[140] M. R. Amini, I. Kolmanovsky, and J. Sun, “Hierarchical MPC for robust eco-cooling of
connected and automated vehicles and its application to electric vehicle battery thermal
management,” IEEE Transactions on Control Systems Technology, vol. 29, no. 1, pp. 316–
328, 2020.
[141] J. Gou and W. Liu, “Feasibility study on a novel 3D vapor chamber used for Li-ion battery
thermal management system of electric vehicle,” Applied Thermal Engineering, vol. 152,
pp. 362–369, 2019.
[142] J. Han, H. Shu, X. Tang, X. Lin, C. Liu, and X. Hu, “Predictive energy management for plug-
in hybrid electric vehicles considering electric motor thermal dynamics,” Energy Conversion
and Management, vol. 251, p. 115022, 2022.
[143] G. Caramia, N. Cavina, A. Capancioni, M. Caggiano, and S. Patassa, “Combined optimiza-
tion of energy and battery thermal management control for a plug-in HEV,” SAE Technical
Paper, Tech. Rep., 2019.
[144] T. J. Boehme, M. Schori, B. Frank, M. Schultalbers, and B. Lampe, “Solution of a hybrid
optimal control problem for parallel hybrid vehicles subject to thermal constraints,” in 52nd
IEEE conference on decision and control. IEEE, 2013, pp. 2220–2226.
[145] Q. Hu, M. R. Amini, H. Wang, I. Kolmanovsky, and J. Sun, “Integrated power and thermal
management of connected HEVs via multi-horizon MPC,” in 2020 American Control Confer-
ence (ACC). IEEE, 2020, pp. 3053–3058.
[147] B. Gao, K. Cai, T. Qu, Y. Hu, and H. Chen, “Personalized adaptive cruise control based on
online driving style recognition technology and model predictive control,” IEEE Transac-
tions on Vehicular Technology, vol. 69, no. 11, pp. 12 482–12 496, 2020.
[148] Y. Wang, Z. Wang, K. Han, P. Tiwari, and D. B. Work, “Personalized adaptive cruise con-
trol via gaussian process regression,” in 2021 IEEE International Intelligent Transportation
Systems Conference (ITSC). IEEE, 2021, pp. 1496–1502.
[149] C. Su, W. Deng, R. He, J. Wu, and Y. Jiang, “Personalized adaptive cruise control consider-
ing drivers’ characteristics,” SAE Technical Paper, Tech. Rep., 2018.
[150] A. P. Bolduc, L. Guo, and Y. Jia, “Multimodel approach to personalized autonomous adap-
tive cruise control,” IEEE Transactions on Intelligent Vehicles, vol. 4, no. 2, pp. 321–330,
2019.
[151] J. Jiang, F. Ding, Y. Zhou, J. Wu, and H. Tan, “A personalized human drivers’ risk sensitive
characteristics depicting stochastic optimal control algorithm for adaptive cruise control,”
IEEE Access, vol. 8, pp. 145 056–145 066, 2020.
[152] Y. Liu, J. xiang Qin, and M. di Liao, “Analysis and design of personalized adaptive cruise
system,” SAE Technical Paper, Tech. Rep., 2020.
[153] Q. Xin, R. Fu, W. Yuan, Q. Liu, and S. Yu, “Predictive intelligent driver model for eco-
driving using upcoming traffic signal information,” Physica A: Statistical Mechanics and its
Applications, vol. 508, pp. 806–823, 2018.
[154] C. Sun, X. Shen, and S. Moura, “Robust optimal eco-driving control with uncertain traffic
signal timing,” in 2018 Annual American Control Conference (ACC). IEEE, 2018, pp.
5548–5553.
[155] S. Bae, Y. Choi, Y. Kim, J. Guanetti, F. Borrelli, and S. Moura, “Real-time ecological ve-
locity planning for plug-in hybrid vehicles with partial communication to traffic lights,” in
2019 IEEE 58th Conference on Decision and Control (CDC). IEEE, 2019, pp. 1279–1285.
[156] Y. Lu, J. Lu, S. Zhang, and P. Hall, “Traffic signal detection and classification in street views
using an attention model,” Computational Visual Media, vol. 4, no. 3, pp. 253–266, 2018.
[157] A. Salaymeh, “Machine learning techniques for automated traffic signal detection and tim-
ing,” Ph.D. dissertation, Wayne State University, 2021.
[158] K. Anirudh, M. S. Dhanoosh, A. Vamsi, and S. Latha, “Driver assisting feature for collision
avoidance, sign and traffic signal detection.”
[159] R. Zhang, A. Ishikawa, W. Wang, B. Striner, and O. K. Tonguz, “Using reinforcement learn-
ing with partial vehicle detection for intelligent traffic signal control,” IEEE Transactions
on Intelligent Transportation Systems, vol. 22, no. 1, pp. 404–415, 2020.
[160] R. J. Franklin et al., “Traffic signal violation detection using artificial intelligence and deep
learning,” in 2020 5th International Conference on Communication and Electronics Systems
(ICCES). IEEE, 2020, pp. 839–844.
[161] Y. Xiang, W. Niu, E. Tong, Y. Li, B. Jia, Y. Wu, J. Liu, L. Chang, and G. Li, “Conges-
tion attack detection in intelligent traffic signal system: combining empirical and analytical
methods,” Security and Communication Networks, vol. 2021, 2021.
[162] Z. Shi, Y. Huang, Z. Hu, and T. Li, “Design of traffic-signal condition detection system
based on intelligence,” in 2019 4th International Conference on Intelligent Green Building
and Smart Grid (IGBSG). IEEE, 2019, pp. 260–263.
[165] F. Ma, Y. Yang, J. Wang, X. Li, G. Wu, Y. Zhao, L. Wu, B. Aksun-Guvenc, and L. Gu-
venc, “Eco-driving-based cooperative adaptive cruise control of connected vehicles platoon
at signalized intersections,” Transportation Research Part D: Transport and Environment,
vol. 92, p. 102746, 2021.
[167] C. Pan, A. Huang, L. Chen, Y. Cai, L. Chen, J. Liang, and W. Zhou, “A review of the
development trend of adaptive cruise control for ecological driving,” Proceedings of the
Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering, vol. 236,
no. 9, pp. 1931–1948, 2022.
[168] L. Zhu, F. Tao, Z. Fu, N. Wang, B. Ji, and Y. Dong, “Optimization based adaptive cruise
control and energy management strategy for connected and automated FCHEV,” IEEE Trans-
actions on Intelligent Transportation Systems, 2022.