
Exp No: 5 Flappy Bird

27.03.2025

Aim: To train an AI agent to play Flappy Bird using Deep Q-Networks (DQN).
Objective: Flappy Bird is a game where the player controls a bird that must navigate through
gaps between pipes without hitting them. The goal is to maximize the agent's survival time
and score using reinforcement learning with a deep Q-network (DQN).

Simulation Tool: The Flappy Bird environment is implemented using gym or gymnasium along with
pygame for visualization.

Action Space        Discrete(2)
Observation Space   Continuous (processed via CNN in DQN)
Import              gymnasium.make("FlappyBird-v0")

Description: The game starts with the bird in the air, where it continuously falls due to
gravity. The player (agent) can either flap (jump) or do nothing. The objective is to pass
through as many pipes as possible without colliding.
Algorithm:

1. Initialize the deep Q-network with random weights.

2. Set hyperparameters such as the learning rate (α), discount factor (γ), exploration rate (ε), and
replay memory size.

3. For each episode:

Start at the initial state.

Choose an action using an ε-greedy policy.

Perform the action and observe the next state, reward, and done status.

Store the experience (state, action, reward, next state, done) in a replay buffer.

Sample a mini-batch from the replay buffer.

Compute the target Q-value using y = r + γ · max_a′ Q_target(s′, a′) for non-terminal
transitions, and y = r when the episode has ended (see the sketch after these steps).

Update the Q-network using backpropagation.



Reduce ε over time.

Repeat until the game is over.

4. Train the DQN for multiple episodes until convergence.

5. Use the trained model to find the optimal policy.
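
A minimal sketch of the ε-greedy choice and the target computation from step 3 is given below. It uses hypothetical names (q_network and target_network for the online and target Keras models; rewards, next_states, dones for NumPy arrays sampled from the replay buffer); the full program later in this report implements the same logic inside the DQNAgent class.

import numpy as np

def epsilon_greedy(q_network, state, epsilon, n_actions):
    # Explore with probability epsilon, otherwise act greedily on the predicted Q-values.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    q_values = q_network.predict(state[np.newaxis, :], verbose=0)
    return int(np.argmax(q_values[0]))

def dqn_targets(target_network, rewards, next_states, dones, gamma=0.95):
    # y = r + gamma * max_a' Q_target(s', a') for non-terminal transitions, y = r otherwise.
    next_q = target_network.predict(next_states, verbose=0)
    return rewards + (1.0 - dones.astype(np.float32)) * gamma * next_q.max(axis=1)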

Action Space: The action shape is (1,) in the range {0, 1} , indicating whether the bird should
flap or not.

0: Do nothing (bird falls)

1: Flap (bird jumps up)

Observation Space:
The observation consists of pixel frames processed using convolutional neural networks
(CNNs). The input state includes:

Stacked frames for temporal information (see the sketch after this list)

Bird’s vertical position

Bird’s velocity

Distance to next pipe

Height of next pipe
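
The "stacked frames" item refers to concatenating the last few frames so the network can infer velocity from pixels. The helper below is only an illustrative sketch of that idea, assuming grayscale frames (e.g. 84×84); the program in this report instead feeds the four hand-crafted features listed above directly to the network.

import numpy as np
from collections import deque

class FrameStack:
    # Keep the last k frames and expose them as a single (H, W, k) observation.
    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # Fill the stack with the first frame so the shape is valid from step one.
        for _ in range(self.k):
            self.frames.append(frame)
        return self.observation()

    def push(self, frame):
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        return np.stack(self.frames, axis=-1)  # e.g. (84, 84, k) for 84x84 grayscale frames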

Rewards: Reward schedule:

Successfully passing a pipe: +1

Collision with ground or pipe: -1 (Game Over)

Starting State:

The episode starts with the bird in an initial position with a downward velocity.
Episode End: The episode ends if any of the following happens:

The bird collides with a pipe.

The bird collides with the ground.

The bird flies too high (in some implementations).

Arguments
The render_mode argument enables visualization, and sutton_barto_reward modifies the
reward structure to match the original implementation.
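
As a usage sketch (assuming the FlappyBird-v0 id is registered by a third-party package such as flappy-bird-gymnasium, and that the keyword arguments named above are accepted by the chosen implementation):

import gymnasium as gym
# import flappy_bird_gymnasium  # uncomment if using that package; importing it registers FlappyBird-v0

env = gym.make("FlappyBird-v0", render_mode="human")  # render_mode enables the pygame window
obs, info = env.reset(seed=0)
done = False
while not done:
    action = env.action_space.sample()  # random policy: 0 = do nothing, 1 = flap
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()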

Program:



import os
import random
import numpy as np
import matplotlib.pyplot as plt
from collections import deque
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
import gym
from gym import spaces
from IPython.display import clear_output, display
import time

# Suppress TensorFlow warnings


os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# Force CPU usage instead of GPU to avoid compatibility issues


os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

print("TensorFlow version:", tf.__version__)


print("Running with device:", tf.config.list_physical_devices())

# Define the Flappy Bird environment


class FlappyBirdEnv(gym.Env):
def __init__(self):
super(FlappyBirdEnv, self).__init__()

# Environment parameters
self.gravity = 1
self.bird_velocity = 0
self.bird_position = 50
self.pipe_gap = 40
self.pipe_width = 10
self.pipe_velocity = 2
self.pipes = []
self.screen_width = 100
self.screen_height = 100



self.pipe_spawn_freq = 50
self.frames_since_last_pipe = 0
self.score = 0

# Define action and observation space


self.action_space = spaces.Discrete(2) # 0: do nothing, 1: flap
self.observation_space = spaces.Box(low=0, high=255, shape=(4,), dtype=np.float32)

def reset(self):
self.bird_velocity = 0
self.bird_position = 50
self.pipes = [{'x': 70, 'gap_pos': random.randint(20, 80)}]
self.frames_since_last_pipe = 0
self.score = 0
return self._get_state()

def step(self, action):


# Apply action (flap or do nothing)
if action == 1:
self.bird_velocity = -10

# Update bird position


self.bird_velocity += self.gravity
self.bird_position += self.bird_velocity

# Spawn new pipes


self.frames_since_last_pipe += 1
if self.frames_since_last_pipe >= self.pipe_spawn_freq:
self.pipes.append({'x': self.screen_width, 'gap_pos': random.randint(20, 80)})
self.frames_since_last_pipe = 0

# Move pipes
for pipe in self.pipes:
pipe['x'] -= self.pipe_velocity

# Remove pipes that are off-screen


self.pipes = [pipe for pipe in self.pipes if pipe['x'] + self.pipe_width > 0]



# Check if bird has passed a pipe
for pipe in self.pipes:
if self.screen_width // 5 == pipe['x'] + self.pipe_width:
self.score += 1

# Check for collisions


done = False
reward = 0.1 # Default small positive reward

# Bird hits the ground or ceiling


if self.bird_position <= 0 or self.bird_position >= self.screen_height:
done = True
reward = -10
else:
# Check for pipe collisions
for pipe in self.pipes:
if (self.screen_width // 5 >= pipe['x'] and
self.screen_width // 5 <= pipe['x'] + self.pipe_width):
if (self.bird_position <= pipe['gap_pos'] - self.pipe_gap // 2 or
self.bird_position >= pipe['gap_pos'] + self.pipe_gap // 2):
done = True
reward = -10
break

return self._get_state(), reward, done, {'score': self.score}

def _get_state(self):
# Get the nearest pipe
nearest_pipe = None
nearest_distance = float('inf')

for pipe in self.pipes:


if pipe['x'] + self.pipe_width >= self.screen_width // 5:
distance = pipe['x'] - self.screen_width // 5
if distance < nearest_distance:
nearest_distance = distance
nearest_pipe = pipe



if nearest_pipe is None:
# If no pipe ahead, use default values
horizontal_distance = self.screen_width
gap_pos = self.screen_height // 2
else:
horizontal_distance = nearest_pipe['x'] - self.screen_width // 5
gap_pos = nearest_pipe['gap_pos']

# Normalized state:
# [bird_y, bird_velocity, distance_to_pipe, center_of_gap]
state = [
self.bird_position / self.screen_height,
self.bird_velocity / 10,
horizontal_distance / self.screen_width,
gap_pos / self.screen_height if nearest_pipe else 0.5
]

return np.array(state, dtype=np.float32)

def render(self):
# Create a simple visualization using matplotlib
plt.figure(figsize=(5, 5))
plt.xlim(0, self.screen_width)
plt.ylim(0, self.screen_height)

# Draw pipes
for pipe in self.pipes:
# Top pipe
plt.fill_between([pipe['x'], pipe['x'] + self.pipe_width],
[0, 0],
[pipe['gap_pos'] - self.pipe_gap // 2, pipe['gap_pos'] - self.pipe_gap // 2],
color='green')
# Bottom pipe
plt.fill_between([pipe['x'], pipe['x'] + self.pipe_width],
[pipe['gap_pos'] + self.pipe_gap // 2, pipe['gap_pos'] + self.pipe_gap // 2],
[self.screen_height, self.screen_height],
color='green')



# Draw bird
plt.scatter(self.screen_width // 5, self.bird_position, color='yellow', s=100)

# Add score
plt.text(5, 95, f'Score: {self.score}', fontsize=12)

plt.title('Flappy Bird')
plt.axis('off')

# Display in Colab
display(plt.gcf())
plt.close()

# Define DQN Agent


class DQNAgent:
def __init__(self, state_size, action_size):
self.state_size = state_size
self.action_size = action_size
self.memory = deque(maxlen=2000)
self.gamma = 0.95 # discount rate
self.epsilon = 1.0 # exploration rate
self.epsilon_min = 0.01
self.epsilon_decay = 0.995
self.learning_rate = 0.001
self.model = self._build_model()
self.target_model = self._build_model()
self.update_target_model()

def _build_model(self):
# Neural Net for Deep-Q learning Model - simplified for stability
model = Sequential()
model.add(Dense(24, input_dim=self.state_size, activation='relu'))
model.add(Dense(24, activation='relu'))
model.add(Dense(self.action_size, activation='linear'))
model.compile(loss='mse', optimizer=Adam(learning_rate=self.learning_rate))
return model

def update_target_model(self):



# copy weights from model to target_model
self.target_model.set_weights(self.model.get_weights())

def remember(self, state, action, reward, next_state, done):


self.memory.append((state, action, reward, next_state, done))

def act(self, state):


if np.random.rand() <= self.epsilon:
return random.randrange(self.action_size)
state_tensor = tf.convert_to_tensor(state.reshape(1, -1), dtype=tf.float32)
act_values = self.model(state_tensor, training=False)
return np.argmax(act_values[0])

def replay(self, batch_size):


if len(self.memory) < batch_size:
return

minibatch = random.sample(self.memory, batch_size)


states = np.array([transition[0] for transition in minibatch])
actions = np.array([transition[1] for transition in minibatch])
rewards = np.array([transition[2] for transition in minibatch])
next_states = np.array([transition[3] for transition in minibatch])
dones = np.array([transition[4] for transition in minibatch])

# Get current states and predict Q values


states_tensor = tf.convert_to_tensor(states, dtype=tf.float32)
with tf.GradientTape() as tape:
q_values = self.model(states_tensor, training=True)

# Select the Q values for the actions that were taken


indices = tf.range(0, tf.shape(q_values)[0]) * tf.shape(q_values)[1] + actions
selected_q_values = tf.gather(tf.reshape(q_values, [-1]), indices)

# Get Q values for next states with target model


next_states_tensor = tf.convert_to_tensor(next_states, dtype=tf.float32)
next_q_values = self.target_model(next_states_tensor, training=False)

# Calculate targets



max_next_q_values = tf.reduce_max(next_q_values, axis=1)
# Cast rewards/dones to float32 tensors so the Bellman target matches the Q-value dtype
rewards_tensor = tf.convert_to_tensor(rewards, dtype=tf.float32)
dones_tensor = tf.convert_to_tensor(dones, dtype=tf.float32)
targets = rewards_tensor + (1.0 - dones_tensor) * self.gamma * max_next_q_values

# Calculate loss
loss = tf.keras.losses.mse(selected_q_values, targets)

# Get gradients and update model


grads = tape.gradient(loss, self.model.trainable_variables)
self.model.optimizer.apply_gradients(zip(grads, self.model.trainable_variables))

# Decay epsilon
if self.epsilon > self.epsilon_min:
self.epsilon *= self.epsilon_decay

# Custom save method to avoid keras restrictions


def save_model(self, filepath):
self.model.save_weights(filepath)

# Custom load method


def load_model(self, filepath):
self.model.load_weights(filepath)

# Training function
def train_dqn(episodes=100):
env = FlappyBirdEnv()
state_size = env.observation_space.shape[0]
action_size = env.action_space.n
agent = DQNAgent(state_size, action_size)
batch_size = 32

# Keep track of scores


scores = []

print("Starting training for", episodes, "episodes")

for e in range(episodes):
state = env.reset()
total_reward = 0



done = False
step = 0

while not done:


step += 1
# Choose action
action = agent.act(state)

# Take action
next_state, reward, done, info = env.step(action)
total_reward += reward

# Remember the experience


agent.remember(state, action, reward, next_state, done)

state = next_state

# Train through replay


if len(agent.memory) > batch_size:
agent.replay(batch_size)

# Update target model occasionally


if step % 10 == 0:
agent.update_target_model()

# Visualize occasionally - reduced frequency for better performance


if e % 50 == 0 and step % 50 == 0:
clear_output(wait=True)
env.render()
time.sleep(0.01)

scores.append(info['score'])

# Print episode stats


avg_score = np.mean(scores[-100:]) if len(scores) >= 100 else np.mean(scores)
print(f"Episode: {e+1}/{episodes}, Score: {info['score']}, Epsilon: {agent.e

# Save model weights occasionally



if (e+1) % 50 == 0:
print(f"Saving model at episode {e+1}")
model_path = f"flappy_bird_model_ep{e+1}"
try:
agent.save_model(model_path)
print(f"Successfully saved model to {model_path}")
except Exception as ex:
print(f"Failed to save model: {ex}")
# Continue without saving

# Plot learning curve


plt.figure(figsize=(10, 6))
plt.plot(scores)
plt.title('Learning Curve')
plt.xlabel('Episode')
plt.ylabel('Score')
plt.show()

return agent, scores

# Function to watch trained agent play


def watch_agent_play(agent, episodes=3):
env = FlappyBirdEnv()

for e in range(episodes):
state = env.reset()
done = False
step = 0

while not done and step < 1000: # Add step limit as a safeguard
step += 1
clear_output(wait=True)
env.render()

# Agent chooses action with no exploration


state_tensor = tf.convert_to_tensor(state.reshape(1, -1), dtype=tf.float32)
action = np.argmax(agent.model(state_tensor, training=False)[0])



# Take action
state, reward, done, info = env.step(action)

time.sleep(0.05) # Slow down for better visualization

print(f"Episode {e+1}: Score = {info['score']}")

# Run the training


print("Starting Flappy Bird DQN training (Final fixed version for Colab)")

# Use try-except to handle potential errors gracefully


try:
# Train agent with fewer episodes for testing - reduce to 100 for quicker results
agent, scores = train_dqn(episodes=100)

# Watch the trained agent play


watch_agent_play(agent, episodes=3)
except Exception as e:
print(f"An error occurred during execution: {e}")

Output: (Simulation Screen Shots)



Result: Using Deep Q-Networks (DQN), we successfully train an agent to play Flappy Bird.
The neural network learns optimal Q-values to make decisions based on state observations.
The trained model enables the agent to flap at the right time to maximize its survival and
score.

