Name-Prajwal MJ
Class-7D
USN-ENG21CS0302
INTRODUCTION
This report outlines the step-by-step development of a Convolutional Neural
Network (CNN) to solve the MNIST handwritten digit classification problem.
The MNIST dataset, widely used in computer vision, includes 60,000 training
images and 10,000 test images of single-digit numbers (0–9). Each digit is
represented as a grayscale image of 28x28 pixels. The goal of this project is to
classify these images accurately by building a CNN model from scratch,
evaluating it thoroughly, optimizing its performance, and deploying it for
prediction.
OBJECTIVES
1. Build and train a CNN model to classify handwritten digits in the MNIST
dataset.
2. Establish a reliable model evaluation framework using techniques like k-fold
cross-validation.
3. Optimize model architecture and parameters to improve performance.
4. Finalize and save the model for real-world application and prediction tasks.
STEPS
1. Setting Up the Development Environment
The tutorial uses Python 3 with Keras running on top of TensorFlow, a
popular framework for deep learning.
Anaconda is recommended for environment setup, simplifying the
installation and management of required packages.
The development environment requires the following libraries:
TensorFlow/Keras: To build and train the CNN.
NumPy: For numerical operations and data manipulation.
Matplotlib: For visualizing data and model learning curves.
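As a quick sanity check, the environment can be verified by importing these
libraries and printing their versions; this is a minimal sketch assuming a
standard TensorFlow 2.x installation:
# sanity check: import the core libraries and print their versions
import tensorflow as tf
import numpy as np
import matplotlib
print('TensorFlow:', tf.__version__)
print('NumPy:', np.__version__)
print('Matplotlib:', matplotlib.__version__)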
2. Understanding and Preprocessing the MNIST Dataset
The MNIST dataset contains grayscale images of digits (0-9), each in a
28x28 pixel format.
Images are preprocessed as follows:
Reshape the images to include a single color channel, making them
compatible with CNN input requirements (28x28x1).
Normalize pixel values from their original range of [0, 255] to [0, 1].
This stabilizes the learning process and accelerates convergence.
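These steps map directly onto the Keras MNIST loader. The sketch below uses
the standard tensorflow.keras APIs; the one-hot encoding of the labels is an
assumption implied by the categorical cross-entropy loss used when compiling
the model:
# load MNIST and apply the preprocessing described above
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
(trainX, trainY), (testX, testY) = mnist.load_data()
# add a single channel dimension: (samples, 28, 28, 1)
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
testX = testX.reshape((testX.shape[0], 28, 28, 1))
# scale pixel values from [0, 255] to [0, 1]
trainX = trainX.astype('float32') / 255.0
testX = testX.astype('float32') / 255.0
# one-hot encode the labels for categorical cross-entropy
trainY = to_categorical(trainY)
testY = to_categorical(testY)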
3. Defining and Compiling the Baseline CNN Model
The model is built inside a definition function, which compiles the
network with stochastic gradient descent (SGD) before returning it:
# compile the model with SGD (learning rate 0.01, momentum 0.9)
opt = SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy',
              metrics=['accuracy'])
return model
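For context, a complete baseline function consistent with this compile step
might look as follows; the specific layers (a 32-filter convolutional block
and a 100-unit dense layer) are illustrative assumptions, not taken verbatim
from the original text:
# a minimal baseline CNN consistent with the compile step above
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import SGD

def define_model():
    model = Sequential()
    # feature extraction: one conv block (layer sizes are assumptions)
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    # classifier head
    model.add(Flatten())
    model.add(Dense(100, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model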
5. Improving Model Performance
Batch Normalization:
Batch normalization is applied to stabilize learning and improve
performance. It is added after each convolutional and dense layer to
normalize the output distribution, accelerating convergence (a combined
sketch of these improvements appears after this list).
Increasing Model Depth:
The depth of the CNN is increased by adding further convolutional
and pooling layers, following a VGG-like block structure.
Each convolutional layer block is followed by a pooling layer, with more
filters (e.g., 64 filters) added in deeper layers to capture more complex
patterns.
Hyperparameter Tuning:
Tuning the learning rate and exploring different optimizer configurations
help optimize model performance.
Adjusting batch size and experimenting with deeper network structures
can also further enhance accuracy.
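A sketch combining these improvements: a deeper, VGG-like model with batch
normalization after each convolutional and dense layer, 64 filters in the
deeper block, and the learning rate exposed as a tunable parameter. Exact
layer sizes are assumptions in the spirit of the description above:
# improved model sketch: deeper VGG-like blocks with batch normalization
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten, Dense,
                                     BatchNormalization)
from tensorflow.keras.optimizers import SGD

def define_improved_model(learning_rate=0.01):
    model = Sequential()
    # block 1: 32 filters
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    # block 2: more filters (64) to capture more complex patterns
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(10, activation='softmax'))
    # the learning rate is exposed for hyperparameter tuning
    opt = SGD(learning_rate=learning_rate, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model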
6. Finalizing and Saving the Model
After refining the model, the final architecture is trained on the entire
training dataset.
The model is saved in an H5 file format, enabling easy loading and
deployment in future applications.
Evaluation on Test Set:
The final model is evaluated on the reserved test dataset to obtain a
realistic estimate of its performance on unseen data.
Practical Application:
Saving the model allows it to be reused for predictions in a real-world
setting, making it possible to classify new handwritten digit images.
# fit model
model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=0)
# save model
model.save('final_model.h5')
The document includes a demonstration of how to use the saved model to
classify a new image.
Steps for prediction (a code sketch follows these steps):
Load and preprocess the image: convert it to grayscale, resize it to
28x28 pixels, and normalize the pixel values.
Predict using the model: The image is passed through the model, which
outputs a probability distribution across the 10 digit classes.
Interpret the result: The class with the highest probability is selected as
the predicted digit.
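A minimal prediction sketch, assuming the model saved above as
'final_model.h5'; 'sample_image.png' is a hypothetical input file:
# load the saved model and classify a new handwritten digit image
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# 'sample_image.png' is a hypothetical 28x28 digit image
img = load_img('sample_image.png', color_mode='grayscale', target_size=(28, 28))
x = img_to_array(img).reshape((1, 28, 28, 1)).astype('float32') / 255.0
model = load_model('final_model.h5')
probs = model.predict(x)        # probability distribution over the 10 classes
digit = int(np.argmax(probs))   # class with the highest probability
print('Predicted digit:', digit)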
CONCLUSION
The tutorial provides a thorough approach to building, evaluating, and
deploying a CNN for handwritten digit classification. By following these steps,
developers gain a robust understanding of CNNs and practical techniques for
optimizing performance. The methodology demonstrated here can be easily
adapted to other datasets and classification problems, making it a valuable
learning resource for machine learning practitioners.