ML Paper - Pneumonia Model (FINAL)

The document describes a convolutional neural network model for classifying chest X-ray images as normal or infected with COVID-19, trained and validated on two datasets. The model is then modified by adjusting its depth and width, adding batch normalization, applying gradient clipping, changing activation functions, and introducing weight decay. Several of these changes improved the model's accuracy over the initial baseline.


About the dataset used:

This dataset contains two types of chest X-ray images: scans from patients infected with COVID-19 and normal (healthy) scans.

Positive Cases: https://github.com/ieee8023/covid-chestxray-dataset

Normal Cases: https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
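The Keras generators used in the code below infer the two class labels from the subdirectory names under the train/ and val/ folders. As a small sketch (the folder layout is an assumption; the actual class folder names depend on how the dataset was assembled), the structure can be checked before training:

import os

# Hypothetical inspection snippet: list the class subdirectories that
# flow_from_directory will turn into the two binary labels.
data_root = '/kaggle/input/pneumonia-xray-images'  # path assumed in the training code
for split in ('train', 'val'):
    split_dir = os.path.join(data_root, split)
    if os.path.isdir(split_dir):
        print(split, '->', sorted(os.listdir(split_dir)))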

Basic Code:

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Baseline CNN: two convolutional blocks followed by a dense classifier
model = models.Sequential()

model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.summary()

# Augment the training images; only rescale the validation images
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    '/kaggle/input/pneumonia-xray-images/train/',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    '/kaggle/input/pneumonia-xray-images/val',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=10,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_generator.batch_size)

# Evaluate the model
test_loss, test_acc = model.evaluate(validation_generator, verbose=2)
print(f'Test accuracy: {test_acc}')

About the model:

1. Input Layer:
 A Conv2D layer with 32 filters of size (3, 3) and ReLU activation.
 A MaxPooling2D layer that downsamples with a pool size of (2, 2).
2. Hidden Layer:
 A Conv2D layer with 64 filters of size (3, 3), followed by a MaxPooling2D layer with a pool size of (2, 2) for further downsampling.
3. Flatten Layer:
 Transforms the output into a 1D array, priming it for the subsequent fully connected layers.
4. Dense Layers:
 A dense layer of 512 neurons with ReLU activation to extract relevant features.
 An output layer with a single neuron and sigmoid activation, ideal for binary classification.
5. Compilation:
 The model is compiled with the Adam optimizer, binary cross-entropy as the loss function (suitable for binary classification), and accuracy as the evaluation metric.

Modification 1:

Depth and Width Adjustment:

 Alter the number of layers (depth) in the CNN architecture by increasing or decreasing it.
 Adjust the number of filters within each layer (width).
 Explore various combinations to identify the most effective balance.

New Code:

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

model = models.Sequential()

# Increase depth and adjust width
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))

# Flatten before the dense layers
model.add(layers.Flatten())

# Adjust width of dense layers
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))  # Adding dropout for regularization

# Output layer
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.summary()

train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    '/kaggle/input/pneumonia-xray-images/train/',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    '/kaggle/input/pneumonia-xray-images/val',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=10,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_generator.batch_size)

# Evaluate the model
test_loss, test_acc = model.evaluate(validation_generator, verbose=2)
print(f'Test accuracy: {test_acc}')

 Introduced another Conv2D layer with 128 filters and an additional MaxPooling2D layer to increase the network's capacity.
 Adjusted the width of the dense layers to 512 and 256 neurons.
 To enforce regularization, incorporated a dropout layer with a rate of 0.5 after the dense layers.
 Maintained a consistent number of epochs (10) for all experiments.

Expected Outcomes:

 Increasing the depth and calibrating the width may give the model greater capacity to extract essential features, which could improve its ability to separate the two classes.
 Dropout regularization is an antidote to overfitting; it should be especially beneficial if the first model had overfitting troubles.
 The model may need more time to train due to the added complexity, so training and validation metrics should be monitored to make sure the model's learning stays on track.

Actual Outcome:
In conclusion, the modifications to the Convolutional Neural Network architecture have yielded a
tangible improvement in performance. The increased depth and adjusted width enhanced the model's
capacity for feature extraction, leading to a 0.48% increase in accuracy and a reduction in loss by 0.0069.
Incorporating dropout layers effectively addressed overfitting concerns.
These findings underscore the importance of thoughtful architectural adjustments for optimizing model
performance in image classification tasks.

Final Accuracy: 95%


Graphical Representation:
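The figure itself is not reproduced in this text version. A minimal plotting sketch (an assumption about how the curves were produced, using only the history object returned by model.fit above) would be:

import matplotlib.pyplot as plt

# Plot training/validation accuracy and loss from the Keras History object
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()

plt.tight_layout()
plt.show()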

Modification #2:

Batch Normalization and Layer Normalization:

Incorporate batch normalization or layer normalization into the model.


Analyze how these normalization techniques affect training stability and convergence.

New Code:

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

model = models.Sequential()

# Batch normalization after each convolutional layer
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Flatten())

# Dense layers with batch normalization and dropout
model.add(layers.Dense(512, activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(0.5))

model.add(layers.Dense(256, activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(0.5))

model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.summary()

train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    '/kaggle/input/pneumonia-xray-images/train/',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    '/kaggle/input/pneumonia-xray-images/val',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=10,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_generator.batch_size)

# Evaluate the model
test_loss, test_acc = model.evaluate(validation_generator, verbose=2)
print(f'Test accuracy: {test_acc}')

1. Batch Normalization:
After each convolutional layer (Conv2D), I added a BatchNormalization layer. Batch Normalization helps stabilize and accelerate the training process by normalizing the inputs to each layer.
2. Dropout:
After the first dense layer (Dense(512, activation='relu')), I added a Dropout layer with a dropout rate of 0.5. Dropout is a regularization technique that helps prevent overfitting by randomly setting a fraction of input units to 0 at each update during training.
3. Additional Batch Normalization and Dropout:
After the second dense layer (Dense(256, activation='relu')), I added another set of BatchNormalization and Dropout layers.

The expected outcomes of incorporating Batch Normalization and Dropout into the model include:

1. Improved Training Stability:
Batch Normalization can help stabilize and accelerate the training process by normalizing the inputs to each layer. This may result in faster convergence and more stable training dynamics.
2. Regularization Effect:
Dropout introduces regularization by randomly setting a fraction of input units to 0 during training. This helps prevent overfitting and can improve the model's generalization to unseen data.
3. Potentially Higher Accuracy:
The combination of Batch Normalization and Dropout may lead to a more robust and accurate model, especially if the initial model was prone to overfitting or if the training process was not stable.

Actual Outcome:

The actual outcome of training with Batch Normalization shows a high training accuracy (around 91.73%) but inconsistent validation accuracy. This behavior could be attributed to several factors, and understanding the math behind the normalization technique can help provide insights.

Batch Normalization:
Batch Normalization (BN) normalizes the input of each layer by subtracting the mean and dividing by the standard deviation of the batch. It introduces learnable parameters (gamma and beta) to scale and shift the normalized values.

Mathematically, for a given feature x in a batch:

BN(x) = γ · (x − μ) / √(σ² + ε) + β

where μ is the mean and σ is the standard deviation of the batch, ε is a small constant added for numerical stability, and γ and β are learnable parameters.
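As a small numeric illustration of the formula (the batch values, gamma, and beta here are made up for the example; in the model, gamma and beta are learned during training):

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])          # one feature across a batch of four samples
mu, sigma = x.mean(), x.std()                # batch statistics
gamma, beta, eps = 1.0, 0.0, 1e-5            # scale, shift, numerical-stability constant
bn = gamma * (x - mu) / np.sqrt(sigma**2 + eps) + beta
print(bn)  # approximately [-1.34, -0.45, 0.45, 1.34]: zero mean, unit variance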

Possible Reasons for Inconsistent Validation Accuracy:


1. Training and Validation Set Distributions:
If the distribution of the training and validation datasets differs significantly, Batch Normalization statistics calculated during training might not generalize well to the validation set.
2. Batch Size:
The effectiveness of Batch Normalization can be sensitive to batch size. A batch size that is too
small might not provide accurate statistics for normalization.
3. Overfitting:
Batch Normalization, while acting as a regularizer, might not prevent overfitting
entirely. If the model overfits the training data, the validation accuracy might not
improve consistently.
4. Learning Rate and Training Duration:
The learning rate and the number of training epochs can also impact the effectiveness of
Batch Normalization. If the learning rate is too high, it might lead to overshooting, and if the
training duration is insufficient, the model might not converge.

Modification #3

Gradient Clipping:

Implement gradient clipping to prevent exploding gradients.


Most deep learning frameworks provide simple ways to apply gradient clipping during optimization.

custom_optimizer = tf.keras.optimizers.Adam(clipvalue=1.0)
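As a minimal sketch (assuming the CNN from Modification #2 is being reused), this clipped optimizer simply replaces the 'adam' string in the compile step; clipnorm is an alternative argument that clips each gradient tensor by its norm instead of clipping element-wise values:

model.compile(optimizer=custom_optimizer,   # the Adam optimizer with clipvalue=1.0 defined above
              loss='binary_crossentropy',
              metrics=['accuracy'])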

Let's experiment with different clip values to find out their effect on the model's performance; a sketch for running this comparison follows the list below.

1. Conservative Clip Value:
Clip Value: 1.0
This conservative clip value is relatively small and is a good starting point. It helps prevent gradients from becoming too large, stabilizing the training process.
2. Moderate Clip Value:
Clip Value: 5.0
A moderate clip value allows a bit more flexibility in the gradient magnitudes. Experimenting with a slightly larger clip value can be beneficial when a balance is needed between stability and allowing larger gradient updates.
3. Aggressive Clip Value:
Clip Value: 10.0
An aggressive clip value permits even larger gradients. This may be useful in situations where exploding gradients are rare and a more permissive approach is preferred.
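A sketch for running this comparison (build_model is a hypothetical helper that recreates the Modification #2 architecture, so each run starts from fresh weights; the generators are the ones defined earlier):

results = {}
for clip_value in [1.0, 5.0, 10.0]:
    model = build_model()  # fresh, untrained copy of the CNN
    model.compile(optimizer=tf.keras.optimizers.Adam(clipvalue=clip_value),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    model.fit(train_generator,
              epochs=10,
              validation_data=validation_generator,
              verbose=0)
    loss, acc = model.evaluate(validation_generator, verbose=0)
    results[clip_value] = (loss, acc)

for clip_value, (loss, acc) in results.items():
    print(f'clipvalue={clip_value}: loss={loss:.4f}, accuracy={acc:.4f}')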

Outcomes with each of the aforementioned clip values:

Conservative Clip Value: The implementation of gradient clipping with a clip value of 1.0 significantly enhanced the model's training stability and convergence. The training loss decreased to 0.1531, and the accuracy improved to 95.48%. This demonstrates the effectiveness of gradient clipping in preventing exploding gradients during optimization. The model exhibits robust performance, showcasing its ability to handle perturbations and generalize well to unseen data. Overall, the results emphasize the positive impact of gradient clipping on the model's training dynamics and predictive capabilities.

Moderate Clip Value:


The implementation of gradient clipping with a clip value of 5.0 led to a reduction in test accuracy to 95.10%, compared to the model's peak performance with a clip value of 1.0. While the training loss slightly increased to 0.1595, the model still achieved an impressive accuracy. This result suggests that the looser constraint imposed by the higher clip value allowed larger gradient updates, potentially impacting the model's ability to generalize optimally.
The experimental results indicate that, for this particular model and dataset, a clip value of 1.0 maximizes
the accuracy. The model achieved its highest test accuracy of 95.48% with this specific clip value,
demonstrating that a more conservative clipping approach was beneficial for the convergence and
stability of the training process. Adjusting the clip value to 5.0 resulted in a slight decrease in test
accuracy, suggesting that a balance was achieved with the initial clip value of 1.0 for optimal model
performance in this context.

Modification #4

Activation Functions:

Experiment with different activation functions available in standard deep learning libraries.
Switching between ReLU, Leaky ReLU, and others is often a matter of changing a single line of
code.

1. Definition: Leaky Rectified Linear Unit (Leaky ReLU) is an activation function that allows a
small, positive gradient for negative input values.
2. Mathematical Representation:
For a given input x:
f(x) = x, if x > 0
f(x) = alpha * x, if x <= 0 (where alpha is a small positive constant, often around 0.01).
A small numeric sketch of this function follows the list below.
3. Advantages:
Addressing "Dying ReLU": Leaky ReLU mitigates the issue of "dying ReLU"
neurons that cease to update during training due to always outputting zero for
negative inputs.
Enhanced Information Flow: The non-zero gradient for negative inputs allows for
continuous learning, promoting better information flow.
4. Impact on Accuracy Improvement (From 95.48% to 96.08%):
Robust Learning: The introduction of Leaky ReLU resulted in a more robust learning
process, especially for neurons dealing with negative inputs.
Reduced Vanishing Gradient Problem: The small, non-zero gradient prevents neurons
from becoming inactive, addressing the vanishing gradient problem.
5. Potential Reasons for Increased Accuracy:
Improved Model Dynamics: Leaky ReLU promotes more dynamic learning, helping the model adapt to complex patterns in the data.
Enhanced Generalization: The ability to capture nuanced information from negative inputs
might lead to improved generalization on the test set.
6. Considerations:
Alpha Value: The choice of the alpha parameter can impact the performance, and fine-
tuning it may yield further improvements.
Task-specific Impact: The effectiveness of Leaky ReLU can vary depending on the nature
of the task and dataset.
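A small numeric sketch of the piecewise definition above (alpha = 0.01 as in the definition; the input values are arbitrary):

import numpy as np

alpha = 0.01
x = np.array([-2.0, -0.5, 0.0, 1.5])
leaky = np.where(x > 0, x, alpha * x)   # x for positives, alpha * x otherwise
print(leaky)  # [-0.02, -0.005, 0.0, 1.5]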


# Note: the 'leaky_relu' activation string requires a recent Keras version;
# older versions use the LeakyReLU layer shown in the sketch below.
model.add(layers.Conv2D(32, (3, 3), activation='leaky_relu', input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3), activation='leaky_relu'))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(128, (3, 3), activation='leaky_relu'))
model.add(layers.MaxPooling2D((2, 2)))
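If the installed Keras version does not accept the 'leaky_relu' activation string, or if explicit control over the alpha parameter discussed above is wanted, the same convolutional blocks can be written with the LeakyReLU layer (a sketch; alpha=0.01 mirrors the value mentioned in the definition, and newer Keras releases call this argument negative_slope):

model.add(layers.Conv2D(32, (3, 3), input_shape=(150, 150, 3)))
model.add(layers.LeakyReLU(alpha=0.01))  # explicit negative slope
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3)))
model.add(layers.LeakyReLU(alpha=0.01))
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(128, (3, 3)))
model.add(layers.LeakyReLU(alpha=0.01))
model.add(layers.MaxPooling2D((2, 2)))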

Here are a couple more activation functions that can be used:

Sigmoid Activation:

model.add(layers.Conv2D(32, (3, 3), activation='sigmoid', input_shape=(150, 150, 3)))
model.add(layers.Dense(512, activation='sigmoid'))

Tanh Activation:

model.add(layers.Conv2D(32, (3, 3), activation='tanh', input_shape=(150, 150, 3)))
model.add(layers.Dense(512, activation='tanh'))

Softmax Activation (for multi-class classification tasks):

model.add(layers.Dense(num_classes, activation='softmax'))

Exponential Linear Unit (ELU):

model.add(layers.Conv2D(32, (3, 3), activation='elu', input_shape=(150, 150, 3)))
model.add(layers.Dense(512, activation='elu'))

Rectified Linear Unit (ReLU) with Parametric Rectification (PReLU):

# ReLU
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))

# PReLU is a layer with learnable slopes rather than an activation string
model.add(layers.Dense(512))
model.add(layers.PReLU())

Swish Activation:

model.add(layers.Conv2D(32, (3, 3), activation='swish', input_shape=(150, 150, 3)))
model.add(layers.Dense(512, activation='swish'))

Modification #5

Weight Decay:

Implement weight decay as a regularization technique during optimization.


Introduce a small weight decay term in the optimizer (e.g., Adam with weight decay) to penalize
large weights and prevent overfitting.

Weight decay, also known as L2 regularization, is a regularization technique used in machine learning
and deep learning to prevent overfitting. It involves adding a penalty term to the loss function based
on the magnitude of the weights in the model.
In mathematical terms, the weight decay term is added to the standard loss function during training. For a given model with weights w(i) and a loss function L, the regularized loss function L(reg) with weight decay can be expressed as:

L(reg) = L + λ ∑ w(i)^2

Here:

λ is the weight decay parameter, controlling the strength of the regularization.

∑ w(i)^2 represents the sum of the squared magnitudes of all weights in the model.

The regularization term penalizes large weights in the model. As a result, during training, the optimization
process aims to find a set of weights that minimizes both the original loss and the regularization term. This
encourages the model to generalize well to new, unseen data by preventing it from becoming too
specialized to the training data.

In the context of neural networks, weight decay helps to prevent overfitting by discouraging the model
from assigning too much importance to any particular input feature. It promotes a more balanced and
generalized representation of the underlying patterns in the data.

In the code used for this modification (not reproduced in full here), tf.keras.regularizers.l2(weight_decay) is used to incorporate weight decay into the dense layers of the model.
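A minimal sketch of what that could look like (the exact layers the regularizer was applied to are not shown in the report, so this assumes the dense layers of the Modification #2 architecture; weight_decay is the λ value discussed below):

weight_decay = 1e-4  # regularization strength (lambda)

model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu',
                       kernel_regularizer=tf.keras.regularizers.l2(weight_decay)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(256, activation='relu',
                       kernel_regularizer=tf.keras.regularizers.l2(weight_decay)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))

In newer TensorFlow versions, tf.keras.optimizers.AdamW(weight_decay=1e-4) offers a decoupled alternative that applies the decay in the optimizer itself.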

Actual Outcome (when weight decay = 1e-4):

The implementation of weight decay with a value of 1e-4 resulted in a decrease in accuracy. Weight decay
is a regularization technique designed to prevent overfitting by penalizing large weights. However, the
choice of the weight decay parameter (λ) is crucial, as too much regularization can lead to underfitting.

In this case, the introduced regularization term might have been too strong, causing the model to
prioritize minimizing the weight magnitudes over fitting the training data. It's essential to strike a balance
between preventing overfitting and allowing the model to capture the underlying patterns in the data.

Outcome (when weight decay = 1e-2):

The results suggest that weight decay with a coefficient of 1e-2 led to a decrease in accuracy, indicating
that it might not be an effective regularization technique for this particular model.
Weight decay is designed to penalize large weights during optimization to prevent overfitting, but its
impact can vary based on the model architecture, dataset, and other factors. In this case, it appears that
the regularization introduced by weight decay at this magnitude might be too strong, hindering the
model's ability to learn from the training data effectively.
