Exp No: 5 Flappy Bird
Date: 27.03.2025
Aim: To train an AI agent to play Flappy Bird using Deep Q-Networks (DQN).
Objective: Flappy Bird is a game where the player controls a bird that must navigate through
gaps between pipes without hitting them. The goal is to maximize the agent's survival time
and score using reinforcement learning with a deep Q-network (DQN).
Observation Space: Continuous (processed via CNN in DQN)
Import: gymnasium.make("FlappyBird-v0")
Description: The game starts with the bird in the air, where it continuously falls due to
gravity. The player (agent) can either flap (jump) or do nothing. The objective is to pass
through as many pipes as possible without colliding.
Algorithm:
1. Initialize the Q-network and the target network, and create an empty replay memory.
2. Set hyperparameters such as the learning rate (α), discount factor (γ), exploration rate (ε), and replay memory size.
3. At each step of an episode, select an action with an ε-greedy policy (a random action with probability ε, otherwise the action with the highest predicted Q-value).
4. Perform the action and observe the next state, reward, and done status.
5. Store the experience (state, action, reward, next state, done) in a replay buffer.
6. Sample a random minibatch from the buffer and update the Q-network towards the target r + γ · max_a′ Q_target(s′, a′) (a sketch of this computation follows the list).
7. Periodically copy the Q-network weights into the target network, decay ε, and repeat until performance converges.
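A minimal sketch of the target computation in step 6 for one sampled minibatch (the function and argument names here are illustrative, not part of the program below):

import numpy as np

def bellman_targets(rewards, next_q_values, dones, gamma):
    # r + gamma * max_a' Q_target(s', a'); terminal transitions get no future value
    return rewards + gamma * np.max(next_q_values, axis=1) * (1 - dones)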
Action Space: The action is an ndarray with shape (1,) taking values in {0, 1}: 0 means the bird
does nothing and 1 means it flaps.
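For example, with a gymnasium-style environment, a random action can be drawn directly from the space (illustrative only):

action = env.action_space.sample()  # 0 = do nothing, 1 = flap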
Observation Space:
The observation can consist of pixel frames processed using convolutional neural networks
(CNNs); the simplified environment implemented below uses a compact feature vector instead.
The input state includes:
Bird's vertical position
Bird's velocity
Horizontal distance to the nearest pipe
Vertical position of the center of the nearest pipe gap
An illustrative normalized state vector is shown after this list.
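For instance, a normalized state midway through an episode could look like this (values are illustrative only):

state = [0.50, 0.20, 0.35, 0.60]  # [bird_y, bird_velocity, distance_to_pipe, gap_center]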
Starting State :
The episode starts with the bird in an initial position with a downward velocity.
Episode End: The episode ends if any of the following happens:
The bird collides with a pipe.
The bird goes out of bounds (falls to the ground or flies above the top of the screen).
Arguments
The render_mode argument enables visualization, and sutton_barto_reward modifies the
reward structure to match the original implementation.
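For example, assuming the installed Flappy Bird environment accepts these options as described above:

import gymnasium as gym
env = gym.make("FlappyBird-v0", render_mode="human", sutton_barto_reward=True)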
Program:
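The listing uses the following imports (a minimal set covering the network, replay buffer, and in-notebook rendering):

import random
from collections import deque
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import gymnasium as gym
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from IPython.display import display, clear_output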
class FlappyBirdEnv:
    def __init__(self):
        # Environment parameters
        self.gravity = 1
        self.bird_velocity = 0
        self.bird_position = 50
        self.pipe_gap = 40
        self.pipe_width = 10
        self.pipe_velocity = 2
        self.pipes = []
        self.screen_width = 100
        self.screen_height = 100
        # Gym-style spaces: a 4-feature state vector and two actions (0 = idle, 1 = flap)
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)
    def reset(self):
        self.bird_velocity = 0
        self.bird_position = 50
        self.pipes = [{'x': 70, 'gap_pos': random.randint(20, 80)}]
        self.frames_since_last_pipe = 0
        self.score = 0
        return self._get_state()
    def step(self, action):
        # Move pipes
        for pipe in self.pipes:
            pipe['x'] -= self.pipe_velocity
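        # A minimal sketch of the remaining step() logic (flap, gravity, pipe spawning,
        # collision, reward). The flap impulse, spawn interval, reward values, and the
        # bird's fixed x-position (left edge, matching _get_state) are assumed values.
        if action == 1:
            self.bird_velocity = -5              # flap: push against gravity (assumed impulse)
        self.bird_velocity += self.gravity       # gravity accelerates the fall
        self.bird_position += self.bird_velocity

        # Spawn a new pipe periodically and retire pipes that have left the screen
        self.frames_since_last_pipe += 1
        if self.frames_since_last_pipe >= 40:
            self.pipes.append({'x': self.screen_width, 'gap_pos': random.randint(20, 80)})
            self.frames_since_last_pipe = 0
        if self.pipes and self.pipes[0]['x'] + self.pipe_width < 0:
            self.pipes.pop(0)
            self.score += 1                      # passed a pipe

        # Episode ends when the bird leaves the screen or hits a pipe
        done = self.bird_position < 0 or self.bird_position > self.screen_height
        for pipe in self.pipes:
            over_pipe = pipe['x'] <= 0 <= pipe['x'] + self.pipe_width
            outside_gap = abs(self.bird_position - pipe['gap_pos']) >= self.pipe_gap // 2
            if over_pipe and outside_gap:
                done = True
        reward = -10 if done else 0.1            # assumed reward shaping
        return self._get_state(), reward, done, {'score': self.score}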
    def _get_state(self):
        # Get the nearest pipe still ahead of the bird
        nearest_pipe = None
        nearest_distance = float('inf')
        for pipe in self.pipes:
            if 0 <= pipe['x'] < nearest_distance:
                nearest_distance = pipe['x']
                nearest_pipe = pipe
        horizontal_distance = nearest_distance if nearest_pipe else self.screen_width
        gap_pos = nearest_pipe['gap_pos'] if nearest_pipe else 0
        # Normalized state:
        # [bird_y, bird_velocity, distance_to_pipe, center_of_gap]
        state = [
            self.bird_position / self.screen_height,
            self.bird_velocity / 10,
            horizontal_distance / self.screen_width,
            gap_pos / self.screen_height if nearest_pipe else 0.5
        ]
        return np.array(state, dtype=np.float32)
    def render(self):
        # Create a simple visualization using matplotlib
        plt.figure(figsize=(5, 5))
        plt.xlim(0, self.screen_width)
        plt.ylim(0, self.screen_height)
        # Draw pipes
        for pipe in self.pipes:
            # Top pipe
            plt.fill_between([pipe['x'], pipe['x'] + self.pipe_width],
                             [0, 0],
                             [pipe['gap_pos'] - self.pipe_gap // 2,
                              pipe['gap_pos'] - self.pipe_gap // 2],
                             color='green')
            # Bottom pipe
            plt.fill_between([pipe['x'], pipe['x'] + self.pipe_width],
                             [pipe['gap_pos'] + self.pipe_gap // 2,
                              pipe['gap_pos'] + self.pipe_gap // 2],
                             [self.screen_height, self.screen_height],
                             color='green')
        # Add score
        plt.text(5, 95, f'Score: {self.score}', fontsize=12)
        plt.title('Flappy Bird')
        plt.axis('off')
        # Display in Colab
        display(plt.gcf())
        plt.close()
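# The DQN agent: replay buffer, epsilon-greedy action selection, and the two networks.
# A minimal sketch; the buffer size and hyperparameter values below are assumed, not
# taken from the original run.
class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)   # replay memory (size assumed)
        self.gamma = 0.95                  # discount factor (assumed)
        self.epsilon = 1.0                 # initial exploration rate
        self.epsilon_min = 0.01            # assumed
        self.epsilon_decay = 0.995         # assumed
        self.learning_rate = 0.001         # assumed
        self.model = self._build_model()
        self.target_model = self._build_model()
        self.update_target_model()

    def remember(self, state, action, reward, next_state, done):
        # Store the experience tuple in the replay buffer
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise act greedily
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        q_values = self.model.predict(np.array(state).reshape(1, -1), verbose=0)
        return int(np.argmax(q_values[0]))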
    def _build_model(self):
        # Neural Net for Deep-Q learning Model - simplified for stability
        model = Sequential()
        model.add(Dense(24, input_dim=self.state_size, activation='relu'))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(self.action_size, activation='linear'))
        model.compile(loss='mse', optimizer=Adam(learning_rate=self.learning_rate))
        return model
    def update_target_model(self):
        # Copy the online network weights into the target network
        self.target_model.set_weights(self.model.get_weights())

    def replay(self, batch_size):
        minibatch = random.sample(self.memory, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*minibatch))
        # Calculate targets: r + gamma * max_a' Q_target(s', a'), no future value when done
        targets = (rewards + self.gamma * np.max(self.target_model.predict(next_states, verbose=0), axis=1) * (1 - dones)).astype(np.float32)
        with tf.GradientTape() as tape:
            selected_q_values = tf.reduce_sum(self.model(states) * tf.one_hot(actions, self.action_size), axis=1)
            # Calculate loss
            loss = tf.keras.losses.mse(selected_q_values, targets)
        # Decay epsilon
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay
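        # Gradient step: update the online network weights to reduce the loss
        # (uses the Adam optimizer attached in _build_model's compile call)
        grads = tape.gradient(loss, self.model.trainable_variables)
        self.model.optimizer.apply_gradients(zip(grads, self.model.trainable_variables))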
# Training function
def train_dqn(episodes=100):
    env = FlappyBirdEnv()
    state_size = env.observation_space.shape[0]
    action_size = env.action_space.n
    agent = DQNAgent(state_size, action_size)
    batch_size = 32
    scores = []
    for e in range(episodes):
        state = env.reset()
        total_reward = 0
        done = False
        while not done:
            # Choose an action (epsilon-greedy), then take it
            action = agent.act(state)
            next_state, reward, done, info = env.step(action)
            total_reward += reward
            agent.remember(state, action, reward, next_state, done)
            state = next_state
            # Train on a minibatch once enough experience has been collected
            if len(agent.memory) > batch_size:
                agent.replay(batch_size)
        # Sync the target network after each episode and record the score
        agent.update_target_model()
        scores.append(info['score'])
    return env, agent, scores
# Train the agent, then watch it play
env, agent, scores = train_dqn(episodes=100)

episodes = 5  # number of games to render (value assumed for the demo)
for e in range(episodes):
    state = env.reset()
    done = False
    step = 0
    while not done and step < 1000:  # Add step limit as a safeguard
        step += 1
        action = agent.act(state)
        state, reward, done, info = env.step(action)
        clear_output(wait=True)
        env.render()