
Self-Driving Car Racing: Application of Deep Reinforcement Learning

Florentiana Yuwono, Gan Pang Yen and Jason Christopher


arXiv:2410.22766v1 [cs.AI] 30 Oct 2024

A0244109L, A0253516H, A0244120Y
{e0851439, e0959104, e0851450}@u.nus.edu
CS3263 Group 9

1 Problem Understanding and Formulation


1.1 Motivation and rationale
The motivation behind this project stems from the growing interest in autonomous driving systems, as evidenced by recent advancements in "smart car" technology, notably exemplified by companies such as Tesla. Furthermore, the landscape of AI-driven mobility has evolved towards implementing efficient systems for autonomous car racing, evident through new events such as the Abu Dhabi Autonomous Racing League [1] and the Indy Autonomous Challenge [6], both happening in 2024. These groundbreaking events underscore the urgency for robust algorithms that can adeptly navigate dynamic environments.
By leveraging RL techniques, we seek to develop an AI agent capable of learning to drive a car efficiently in
a simulated environment, given partial knowledge of the environment it is in. This research has implications
not only in the field of autonomous vehicles but also in areas such as robotics and control systems.

1.2 Innovativeness
The application of RL to the domain of car racing is relatively unexplored compared to other control tasks. We aim to demonstrate its effectiveness in a challenging and dynamic scenario. We prioritize responsible AI considerations, which include designing reward functions that prioritize safety as well as exploring techniques for interpretability and transparency in the learned policy. We explore advanced RL algorithms such as DQN, PPO, and transfer learning integration to determine which yields the best results.

1.3 Problem definition


The problem we address is training an AI agent to effectively control a car in the OpenAI Gymnasium
CarRacing environment. This involves learning a policy that maps observations of the environment (camera
images representing the car’s view of the track) to actions (steering, acceleration, and braking) in order
to navigate the track while maximizing a performance metric (completing laps quickly without crashing).
The objective is to achieve high performance in terms of both speed and safety, i.e. following the track,
demonstrating the ability of RL algorithms to learn complex behaviors in dynamic environments.

Observation space


The game environment consists of frames of the game state, where each frame is a 96 × 96 RGB image of the car and race track, represented as Box(0, 255, (96, 96, 3), uint8) in the Gymnasium package.

Action space
In the discrete setting there are 5 actions: 0 = do nothing, 1 = full steer left, 2 = full steer right, 3 = full gas, 4 = full brake, each represented as an int as indicated.
In the continuous setting there are 3 actions: steering (-1 is full left, +1 is full right), gas, and braking, represented as Box([-1, 0, 0], 1.0, (3,), float32), a 3-dimensional array where action[0] = steering direction, action[1] = % gas pedal and action[2] = % brake pedal.

Figure 1: Game environment

Rewards and implied goal

The agent receives a reward of -0.1 every frame and +1000/N for every track tile visited, where N is the total number of tiles visited in the track. The goal is therefore to finish the race successfully in the fewest frames possible (i.e. quickly). We define "solving" as achieving an average reward of 800 over 100 consecutive trials.


Starting state and episode termination

The car starts at rest in the center of the road. The episode finishes when all tiles are visited, or when the car goes off track, in which case it dies with a -100 reward.
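As a concrete illustration of the observation and action spaces above, the following minimal sketch (assuming a Gymnasium installation with the Box2D extra and the CarRacing-v2 environment id used in this project; the exact space representations printed may vary by version) instantiates both variants:

```python
import gymnasium as gym

# Continuous variant: Box(3,) action = [steering, gas, brake]
env = gym.make("CarRacing-v2", continuous=True)
print(env.observation_space)  # Box(0, 255, (96, 96, 3), uint8)
print(env.action_space)       # Box([-1.  0.  0.], 1.0, (3,), float32)

# Discrete variant: 5 actions (do nothing, left, right, gas, brake)
env_d = gym.make("CarRacing-v2", continuous=False)
print(env_d.action_space)     # Discrete(5)

obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
env_d.close()
```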

2 Knowledge and Technical Depth


2.1 Why reinforcement learning?
Reinforcement Learning (RL) is particularly effective in scenarios such as CarRacing, where an agent’s abil-
ity to learn through direct interaction with the environment and iterative feedback is crucial. RL also excels
in deriving complex policies from direct sensory inputs, such as pixels, eliminating the need for predefined
features. Conversely, alternative non-RL strategies, such as physics-based modeling and optimization tech-
niques, provide enhanced computational efficiency and improved interpretability by leveraging established
dynamics for analytical or simulation-based policy development. However, these methods depend heavily
on domain-specific knowledge and may lack robustness in unfamiliar settings.

2.2 First step: formulate the problem as MDP


To solve it with RL, the problem is first formulated as a Markov decision process (MDP), where outcomes are partly random and partly under the control of the agent. The goal is to discover an optimal policy, denoted as π∗, which is a strategy for the agent that maximizes the expected cumulative reward over time. In this project, to find π∗, both value-based and policy-based methods are adopted.

Value-based methods
This method tries to approximate the optimal action-value function (Q-function) given by:

$$Q^*(s, a) = \max_{\pi} \mathbb{E}\left[R_t \mid s_t = s, a_t = a, \pi\right]$$

which assesses the expected return of taking a certain action in a given state. The agent then selects the
action that has the highest expected return according to the Q-function. The models implemented here
include Deep Q-network and its self-customized variants, i.e. ResNet transfer learning and LSTM-ResNet
variant.

Policy-based methods
This method directly parameterizes and learns the policy that maps states to actions without explicitly
learning a value function. The model implemented here is Proximal Policy Optimization.

2.3 Deep Q-Network (DQN)


DQN approximates the Q-function with a deep neural network, i.e. Q(s, a; θ) ≈ Q∗(s, a). Similar to the Q-table in Q-learning, the Q-network takes a state as input and outputs the predicted Q-value for each action. It then stores the agent's experience at each time step in a replay buffer and randomly samples a subset of these experiences for training.

Full algorithm
The core of the DQN algorithm is encapsulated in the loss function used to train the Q-network. The loss function, Li(θi), quantifies the difference between the predicted Q-value and the target Q-value. It is given by the mean squared error [10]:

$$L_i(\theta_i) = \mathbb{E}_{s, a \sim \rho(\cdot)}\left[\left(y_i - Q(s, a; \theta_i)\right)^2\right], \quad \text{with target Q-value } y_i = \mathbb{E}_{s' \sim \varepsilon}\left[r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \,\middle|\, s, a\right]$$

To optimize the Q-function, stochastic gradient descent, namely RMSprop, is applied to the loss function
(equation 3 in algorithm below):


Figure 2: DQN. Source: https://livebook.manning.com/concept/deep-learning/q-network

$$\nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}_{s, a \sim \rho(\cdot);\, s' \sim \varepsilon}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) - Q(s, a; \theta_i)\right) \nabla_{\theta_i} Q(s, a; \theta_i)\right]$$

Figure 3: DQN algorithm. Source: https://arxiv.org/abs/1312.5602v1
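As a rough illustration of how this loss is computed in practice, the sketch below (PyTorch; all tensor and function names are hypothetical, and it assumes an online q_net, a frozen target_net, and a batch sampled from the replay buffer) builds the TD target and the mean squared error:

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    # batch: tensors sampled from the replay buffer (hypothetical field layout)
    states, actions, rewards, next_states, dones = batch

    # Q(s, a; theta): pick the Q-value of the action actually taken
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # y = r + gamma * max_a' Q(s', a'; theta_old), using the frozen target network;
    # (1 - dones) zeroes the bootstrap term on terminal transitions
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q

    # L(theta) = E[(y - Q(s, a; theta))^2]
    return F.mse_loss(q_values, targets)
```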

Advantages of DQN
DQN can handle large state spaces with raw sensory inputs, such as images or complex state representations. The target network provides a stable target for the online network to learn from, while experience replay reduces the correlation between consecutive samples, helping to break temporal dependencies and stabilize learning [9].

Improving DQN by Introducing Variants


The original DQN tends to overestimate Q-values due to its maximization step in the Bellman equation
update. To address this, the Double Q-learning variant[4] was introduced to reduce the positive bias by sep-
arating action selection and evaluation in the Q-value update. Meanwhile, Prioritized Experience Replay[12]
improves sample efficiency by prioritizing "surprising" experiences based on temporal difference error. Lastly,
Deep Exploration via Bootstrapped DQN[11] promotes a more effective exploration using an ensemble of
Q-networks to gauge uncertainty in action selection.


2.4 Transfer Learning and Recurrent Neural Networks


Transfer learning is the reuse of a pre-trained model on a new problem. It aims to improve the performance of target learners on target domains by transferring the knowledge contained in different but related source domains [17]. While transfer learning has been extensively studied for supervised learning, it is an emerging topic in reinforcement learning [16]. There is a multitude of use cases for transfer learning, for example in computer vision and natural language processing tasks.
Recurrent Neural Networks (RNNs) are neural network architectures that detect patterns in sequential data [13]. RNNs excel in their ability to capture temporal relationships in the data. Research has been done on integrating RNNs with DQN, which performs better due to the agent's ability to focus on particular previous states deemed important for predicting the action in the current state [2].

2.5 Proximal Policy Optimization (PPO)


Policy gradient methods, the foundation of Trust Region Policy Optimization (TRPO) and PPO, compute
an approximation of the policy gradient and integrate it into a stochastic gradient ascent approach [14]. The
prevalent gradient estimator is typically formulated as:
$$\hat{g} = \hat{\mathbb{E}}_t\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\hat{A}_t\right]$$

Here, πθ represents a stochastic policy, and Ât is an estimate of the advantage function at time step t.
The expectation Êt [·] denotes the empirical average over a finite batch of samples, within an algorithm that
iterates between sampling and optimization. Implementations employing automatic differentiation software
create an objective function whose gradient yields the policy gradient estimator. The estimator ĝ is derived
by differentiating the objective:
$$L^{PG}(\theta) = \hat{\mathbb{E}}_t\left[\log \pi_\theta(a_t \mid s_t)\,\hat{A}_t\right]$$

While it may seem enticing to perform multiple optimization steps on this loss $L^{PG}$ using the same trajectory, such an approach lacks sufficient justification. Empirically, it often results in excessively large policy updates.
TRPO maximizes a surrogate objective while adhering to a constraint on the magnitude of the policy update.
This optimization problem is formulated as:
 
$$\max_\theta \; \hat{\mathbb{E}}_t\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}\,\hat{A}_t\right] \quad \text{subject to} \quad \hat{\mathbb{E}}_t\left[\mathrm{KL}\left[\pi_{\theta_{\mathrm{old}}}(\cdot \mid s_t), \pi_\theta(\cdot \mid s_t)\right]\right] \le \delta$$

Here, θold represents the vector of policy parameters before the update. The theory justifying TRPO actually suggests using a penalty instead of a strict constraint, i.e. solving an unconstrained optimization problem, to maintain monotonic improvement. However, selecting a value for the penalty coefficient (β) that generalizes across different problems poses challenges.
PPO addresses the limitations of TRPO by introducing a clipped surrogate objective:
$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\; \mathrm{clip}\left(r_t(\theta), 1 - \epsilon, 1 + \epsilon\right)\hat{A}_t\right)\right]$$

where ϵ is a hyperparameter (e.g., ϵ = 0.2). This objective ensures that policy updates remain within a
reasonable range by constraining the probability ratio. By choosing the minimum between the clipped and
unclipped objectives, PPO maintains a lower bound on the unclipped objective, thus penalizing excessively
large updates.
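As a minimal sketch of how this clipped objective is computed (PyTorch; the log-probability and advantage tensors are assumed to come from a rollout buffer, and all names are illustrative):

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, eps=0.2):
    # r_t(theta) = pi_theta(a_t|s_t) / pi_theta_old(a_t|s_t), computed in log space
    ratio = torch.exp(log_probs_new - log_probs_old)

    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages

    # L^CLIP is maximized, so the gradient-descent loss is its negative
    return -torch.min(unclipped, clipped).mean()
```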
Advantages: PPO offers simplicity in implementation, greater generality, and improved empirical stability
and data efficiency. Notably, it performs well on continuous action spaces (where DQN struggles [15]) and
doesn’t require extensive hyperparameter tuning.


Algorithm 1 PPO, Actor-Critic Style [14]


for iteration = 1, 2, . . . do
for actor = 1, 2, . . . , N do
Run policy πθold in environment for T timesteps
Compute advantage estimates Â1 , . . . , ÂT
end for
Optimize surrogate L with respect to θ, with K epochs and minibatch size M ≤ N T
θold ← θ
end for

3 Methodology and Results


3.1 Preprocessing
Modify env.reset() to always skip the first 50 states
The game always gradually zooms in for the first 50 steps. Since this zoom-in phase is a very small part of the overall game, keeping it during training might hinder our agent from learning to control the car in the main frames after the zoom-in.

Convert the image to grayscale and resize to 84 × 84


This reduces the number of channels (from 3 to 1) and allows for more compact states, for example by truncating the black bar at the bottom of the frame and both horizontal ends of the frame.

Modify env.step() to use frame skipping technique


The agent sees and selects actions on every 4th frame instead of every frame, and its last action is repeated on the skipped frames. We then stack 4 frames into each observation so that the agent can tell whether it is moving forward or backward. This also helps to decrease training time, since we only need one action per four frames.

Figure 4: Shape of the observation after preprocessing: (4, 84, 84)
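A rough sketch of this preprocessing pipeline using standard Gymnasium wrappers plus two small custom wrappers (for the zoom-in skip and the frame skipping) is shown below; it assumes a pre-1.0 Gymnasium wrapper API (GrayScaleObservation, ResizeObservation, FrameStack), and wrapper names differ slightly in newer releases:

```python
import gymnasium as gym
from gymnasium.wrappers import GrayScaleObservation, ResizeObservation, FrameStack

class SkipZoomIn(gym.Wrapper):
    """Skip the first 50 zoom-in frames after every reset by taking no-op actions."""
    def __init__(self, env, skip=50):
        super().__init__(env)
        self.skip = skip

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        for _ in range(self.skip):
            obs, _, terminated, truncated, info = self.env.step(0)  # 0 = do nothing (discrete)
            if terminated or truncated:
                obs, info = self.env.reset(**kwargs)
        return obs, info

class FrameSkip(gym.Wrapper):
    """Repeat the chosen action for `skip` frames and accumulate the reward."""
    def __init__(self, env, skip=4):
        super().__init__(env)
        self.skip = skip

    def step(self, action):
        total_reward, terminated, truncated = 0.0, False, False
        for _ in range(self.skip):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            if terminated or truncated:
                break
        return obs, total_reward, terminated, truncated, info

def make_env():
    env = gym.make("CarRacing-v2", continuous=False)
    env = SkipZoomIn(env, skip=50)
    env = FrameSkip(env, skip=4)
    env = GrayScaleObservation(env)          # (96, 96, 3) -> (96, 96)
    env = ResizeObservation(env, (84, 84))   # (96, 96) -> (84, 84)
    env = FrameStack(env, 4)                 # -> (4, 84, 84), matching Figure 4
    return env
```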

3.2 Implementation and result of DQN [GitHub link]


We implemented DQN from scratch by defining the deep neural network class used in DQN, the experience replay buffer class, and the DQN agent, and we implemented a training algorithm that saves the model and evaluates it every 10,000 time steps. In addition, we implemented an epsilon decay schedule to maximize exploration at the beginning of training and gradually shift towards exploitation over the course of training.
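A sketch of such an epsilon-decay schedule and the epsilon-greedy action selection (the constants shown are illustrative, not the exact values used in our runs):

```python
import random

EPS_START, EPS_END, EPS_DECAY_STEPS = 1.0, 0.05, 500_000

def epsilon(step):
    # Linear decay from EPS_START to EPS_END over EPS_DECAY_STEPS steps
    frac = min(step / EPS_DECAY_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)

def select_action(q_net, state, step, action_space):
    # Explore with probability epsilon(step); otherwise exploit the Q-network.
    # `state` is assumed to be a (1, 4, 84, 84) float tensor.
    if random.random() < epsilon(step):
        return action_space.sample()
    return int(q_net(state).argmax(dim=1).item())
```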

Eventually, the algorithm took 15 hours to reach its maximum average performance of 910.12 after 1.45 million time steps, and started to oscillate afterwards. The full training process is shown in Figure 5.


Figure 5: DQN training process: [GitHub link]

3.3 Implementation of Transfer Learning [GitHub link]


We improved upon the classical DQN implementation by replacing the first two convolutional layers with PyTorch's pretrained ResNet-18 model, which incorporates deep residual learning for image recognition. We chose ResNet-18 as it is relatively lightweight compared to other variants such as ResNet-50 or ResNet-101, making it suitable for reinforcement learning training.

We changed the preprocessing stage to take in all three RGB color channels to fit the ResNet-18 input. Our model now processes one image frame at a time.
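One way to wire a pretrained ResNet-18 in as the Q-network's feature extractor is sketched below (assuming torchvision; the head sizes are illustrative and not our exact configuration):

```python
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class ResNetQNetwork(nn.Module):
    def __init__(self, n_actions=5):
        super().__init__()
        backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
        # Drop the ImageNet classification head, keep the 512-d feature extractor
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        # x: (batch, 3, H, W) RGB frame, as described above
        return self.head(self.features(x))
```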

Figure 6: Training performance with transfer learning

Our model was trained on Google Colab's L4 GPU for 23 hours. We observed that, compared to the DQN + CNN implementation, the model learned in relatively fewer steps, reaching an average return of 600 in less than 200,000 time steps. The model reached a peak performance of 912 after 1,200,000 time steps. This performance is likely driven by the image segmentation effect produced by the ResNet layers, which capture more meaningful spatial relationships than the traditional CNN implementation.


3.4 Implementation of Transfer Learning + RNN Combination [GitHub link]


To overcome the problem of capturing temporal relationships, we propose a new method that combines transfer learning with sequential models for reinforcement learning.
This method replaces the ResNet-18 image recognition layer with a combined ResNet-LSTM layer, which connects each ResNet-18 layer with an LSTM cell, as shown in Figure 7.
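One plausible sketch of such a ResNet-LSTM Q-network, here simplified to a single LSTM layer on top of the ResNet-18 feature extractor (layer sizes are illustrative):

```python
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class ResNetLSTMQNetwork(nn.Module):
    def __init__(self, n_actions=5, hidden_size=256):
        super().__init__()
        backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # (B*T, 512, 1, 1)
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_actions)

    def forward(self, x, hidden=None):
        # x: (batch, time, 3, H, W) sequence of RGB frames
        b, t = x.shape[:2]
        feats = self.features(x.flatten(0, 1)).flatten(1)  # (B*T, 512)
        feats = feats.view(b, t, -1)                        # (B, T, 512)
        out, hidden = self.lstm(feats, hidden)              # (B, T, hidden_size)
        return self.head(out[:, -1]), hidden                # Q-values from the last time step
```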

Figure 7: ResNet-LSTM training performance

We observe that capturing spatio-temporal relationships through the combination of an image segmentation layer and a memory layer contributed to faster convergence, reaching high average return values in less than 100,000 steps.
However, this approach demands substantially greater computational resources than the alternative methods. This limitation prompted our team to discontinue model evaluation after 90,000 time steps: our model was trained on Google Colab's A100 GPU for 8 hours and exceeded Colab's 83.5 GB system RAM at this iteration count. Possible improvements include reducing the model parameter size to improve computational efficiency, or applying distributed reinforcement learning algorithms [8].

3.5 Implementation of PPO


PPO on discrete action space [GitHub link]
We customized the stable-baselines3 implementation of PPO to train on the environment and observed that learning happened faster than with DQN over the first 400K timesteps, reaching an average score of 800. This performance became more stable, with smaller deviation, until 600K timesteps. After that, however, the performance suddenly became unstable, as indicated in Figure 8.
We suspect that this is due to a phenomenon called policy collapse, where, as the agent continues to interact with the environment, its performance degrades. Research 1 has shown that the standard use of Adam can lead to sudden large weight changes, even when the gradient is small, whenever there is non-stationarity in the data stream.
Following the paper, we customized the Adam optimizer with equal beta values of 0.99, and obtained a model that learns the policy very quickly (beating the others by reaching 700 in less than 100K steps) but also collapses very quickly. We suspect that the beta values are still subject to hyperparameter tuning, and we think this phenomenon of policy collapse is an interesting area for future research.
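A minimal sketch of this setup using stable-baselines3, with the Adam betas set equal at 0.99 via policy_kwargs (all other hyperparameters are library defaults, not our tuned values):

```python
import gymnasium as gym
import torch
from stable_baselines3 import PPO

env = gym.make("CarRacing-v2", continuous=False)

model = PPO(
    "CnnPolicy",
    env,
    policy_kwargs=dict(
        optimizer_class=torch.optim.Adam,
        # Equal betas, as suggested for non-stationary data streams
        optimizer_kwargs=dict(betas=(0.99, 0.99), eps=1e-5),
    ),
    verbose=1,
)
model.learn(total_timesteps=600_000)
model.save("ppo_carracing_discrete")
```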

PPO on continuous action space [GitHub link]


We normalized the action space to be continuous on the interval [-1, 1], matching the Gaussian distribution (mean 0, std 1) used for continuous actions. During the initial 350K time steps of training, we observed that the performance was still unstable, although there was an upward trend in average return. This is mainly caused by the huge continuous action space.
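A sketch of this continuous-action setup, normalizing all three action dimensions to [-1, 1] with a standard Gymnasium wrapper (names and timestep budget are illustrative):

```python
import gymnasium as gym
from gymnasium.wrappers import RescaleAction
from stable_baselines3 import PPO

# CarRacing's native continuous space is Box([-1, 0, 0], [1, 1, 1]);
# rescale every dimension to [-1, 1] to match the Gaussian policy outputs
env = gym.make("CarRacing-v2", continuous=True)
env = RescaleAction(env, min_action=-1.0, max_action=1.0)

model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=350_000)
```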
1 Shibhansh Dohare, Qingfeng Lan, A. Rupam Mahmood, "Overcoming Policy Collapse in Deep Reinforcement Learning", Published: 20 Jul 2023, Last Modified: 29 Aug 2023.


Figure 8: PPO with default Adam (left) vs. non-stationary Adam (right)

Figure 9: PPO training with continuous action space

We predict that PPO would still be able to achieve good performance as the number of timesteps increases (as shown by several other research results 2); however, we were constrained by compute power and could not verify this. Nevertheless, we think demonstrating this capability is important for the real-world scenario of self-driving, where one should be able to apply 20% gas and a 10-degree left turn (achievable with a continuous action space), instead of only full gas or a full left turn at a time (discrete actions).

3.6 Performance comparison


Notably, it can be seen from Figure 10 that incorporating transfer learning (ResNet) into DQN enhances the agent's performance, since it allows the agent to grasp a more robust and informed representation of the environment. This allows the agent to achieve near-peak performance in far fewer iterations. We believe this result is driven by the presence of quality information: replacing the convolutional layers with a pretrained ResNet layer allows the agent to better identify important features in the frame, enabling more efficient learning iterations.
ResNet with LSTM also seems to provide promising results from the first few iterations, as we believe the introduction of RNNs into the model allows the capture of spatio-temporal relationships within and between frames. However, as model learning was ended prematurely due to limited compute power, more research may be required to arrive at a firmer conclusion on ResNet-LSTM performance. Another notable observation lies in the time taken per iteration. While the transfer learning options achieve high performance in fewer iterations, the added complexity of the ResNet layer means each iteration takes longer, on average 45 minutes per 10,000 steps, far longer than DQN-CNN iterations.
Whereas PPO beats DQN in reaching a higher average return in a shorter time, it is much more unstable and more prone to policy collapse during training. This can be due to PPO's reliance on a fixed-size trust region: when the policy deviates too much from the previous policy, the trust-region constraint can lead to overly conservative updates, where the policy becomes stuck in a suboptimal solution.

2 https://github.com/elsheikh21/car-racing-ppo

Figure 10: DQN (blue) vs ResNet (red) vs ResNet-LSTM (green); and DQN (blue) vs PPO (red)

Figure 11: The agent handling a potential skid well, i.e. drifting

Comparing the performance of our AI agents with human players (ourselves) 3, the average score of these 3 human players is around 800, whereas the AI could reach an average of 850-900 reward consistently. We welcome human testers to play the game here.

3.7 Model behaviour


Intermediate behaviour

The videos uploaded to a Google Drive folder illustrate the exploration behavior of the model during training. The first video shows that the agent indeed treated all actions equally at the start of training, causing it to struggle to move forward even on a straight route. The second video shows that the agent has acquired the ability to drive decently fast on the track, despite consistently performing some minor turns along the way.

Final behaviour

The three demonstration videos in the Google Drive folder showcase the agent's advanced driving capabilities. In particular, the agent has learned to perform delicate drifting and to handle skidding when encountering U-turns, a challenging maneuver especially at high speed when the car is prone to skidding. Furthermore, the agent demonstrates the ability to slow down appropriately when navigating sharp turns. On straight routes, the agent consistently applies the "gas" to maintain optimal speed. These behaviors highlight the agent's adaptability and its capacity to make intelligent decisions based on the track's layout, ultimately resulting in a smooth and efficient driving performance.

3 https://drive.google.com/drive/folders/1ntYOZsL1ZZ1l8miHlr2z3E1mUHlHdVwD?usp=sharing


4 Effort and Initiative

4.1 Failed project: LuxAI Season 2


We spent around 2-3 weeks trying to debug LuxAI compatibility issues caused by its outdated implementation. But as we fixed one version, it became incompatible with something else (Python versioning, stable-baselines, Kaggle notebooks, Google Colab, the local machine).

4.2 Evidence of effort and work


We spent many hours reading research papers and other implementations and coding out the solutions, with more than 100 hours spent waiting for model training. We also held 2-3 meetings per week.

4.3 Team member contributions


Each member contributed towards writing the proposal, report, and presentation deck. The specific task allocations of each member are as follows:

Florentiana Yuwono: overall direction of the team, research on papers and implementation, in charge of
PPO implementation.
Gan Pang Yen: research on papers and implementation, in charge of DQN and PPO training.
Jason Christopher: research on papers and implementation, in charge of ResNet and ResNet + LSTM
model design.

5 Conclusion
This project has demonstrated the potential and effectiveness of various deep reinforcement learning algo-
rithms in navigating a car autonomously in a simulated environment. Through extensive experimentation
with DQN, PPO, and innovative adaptations incorporating transfer learning and RNNs, we have uncovered
significant insights into the strengths and limitations of each approach within the context of self-driving car
racing.

Our findings reveal that while DQN provides a robust foundation, the incorporation of advanced neural
network architectures like ResNet and LSTM can enhance the agent’s performance by enabling it to capture
complex spatial and temporal dependencies within the environment. Meanwhile, PPO has shown promising
results, particularly in scenarios requiring fine control over continuous action spaces, which are crucial for
realistic driving simulations.

The integration of ResNet with LSTM, while offering superior ability to capture spatio-temporal relation-
ships, poses significant computational challenges. To facilitate the scaling of such models to millions of time
steps, further enhancements in computational efficiency or access to more substantial computing resources
will be necessary. This could involve optimizing the architecture for better performance on available hard-
ware or employing more advanced parallel computing techniques. Future work will focus on refining these
models and exploring the integration of these techniques into actual autonomous driving systems. Addition-
ally, further research into the phenomenon of policy collapse in PPO could lead to more stable and reliable
learning algorithms.

This project not only advances our understanding of applying deep reinforcement learning to autonomous
driving but also sets the stage for future innovations in this exciting and rapidly evolving field.


References
[1] Baldwin, A. (2023). Driverless racecars on track for April Abu Dhabi debut. Reuters. Last Modified:
21 December 2023. Available from: https://www.reuters.com/sports/motor-sports/driverless-racecars-
track-april-abu-dhabi-debut-2023-12-20/
[2] Chen, C., Ying, V., Laird, D. (2016). Deep Q-Learning with Recurrent Neural Networks. Stanford
University.
[3] Dohare S, Lan Q, Mahmood AR. Overcoming Policy Collapse in Deep Reinforcement Learning. Pub-
lished: 20 Jul 2023, Last Modified: 29 Aug 2023.
[4] Van Hasselt, H., Guez, A., & Silver, D. (2016, March). Deep reinforcement learning with double q-
learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 30, No. 1).
[5] Hidden Beginner. CartRacing-v2 DQN. hiddenbeginner.github.io/study-notes/contents/tutorials/2023-
04-20 CartRacing-v2 DQN.html.
[6] Indy Autonomous Challenge Unveils Next Gen Autonomous Vehicle Platform IAC AV-24. AIthority.
Last Modified: 9 January 2024. Available from: https://aithority.com/technology/indy-autonomous-
challenge-unveils-next-gen-autonomous-vehicle-platform-iac-av-24/
[7] Johny Code (2024). Deep Q-Learning (DQL) / Deep Q-Network (DQN) Explained — Python+Pytorch
Deep Reinforcement Learning. https://youtu.be/EUrWGTCGzlA?si=7jeYbCsATmYaxBXZ
[8] Kapturowski, S., Ostrovski, G., Dabney, W., Quan, J., Munos, R. (2019). Recurrent Experience Replay in Distributed Reinforcement Learning. International Conference on Learning Representations 2019.
[9] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... & Hassabis, D.
(2015). Human-level control through deep reinforcement learning. nature, 518(7540), 529-533.
[10] Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M. Playing Atari
with Deep Reinforcement Learning. NIPS Deep Learning Workshop 2013. arXiv:1312.5602 [cs.LG]. DOI:
10.48550/arXiv.1312.5602.
[11] Osband, I., Blundell, C., Pritzel, A., & Van Roy, B. (2016). Deep exploration via bootstrapped DQN.
Advances in neural information processing systems, 29.
[12] Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2015). Prioritized experience replay. arXiv preprint
arXiv:1511.05952.
[13] Schmidt, R. (2019). Recurrent Neural Networks (RNNs): A gentle Introduction and Overview. arXiv
preprint arXiv:1912.05911v1.
[14] Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal Policy Optimization Algorithms.
arXiv:1707.06347 [cs.LG]. DOI: 10.48550/arXiv.1707.06347.
[15] Wang K, Bartsch A, Barati Farimani A. MAN: Multi-Action Networks Learning. arXiv:2209.09329
[cs.LG]. DOI: 10.48550/arXiv.2209.09329.
[16] Zhu, Z., Lin, K., Jain, A. K., Zhou, J. (2023). Transfer Learning in Deep Reinforcement Learning: A
Survey. arXiv preprint arXiv:2009.07888.
[17] Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H., He, Q. (2020). A Comprehensive
Survey on Transfer Learning. arXiv preprint arXiv:1911.02685.

