Handwritten Digit Recognizer
Project report submitted in partial fulfilment of the requirement for the degree of Bachelor of Technology
This is to certify that the above statement made by the candidate is true to the best of my
knowledge.
ACKNOWLEDGEMENT
Firstly, I express my heartiest thanks and gratitude to Almighty God for his divine blessing that made it possible to complete the project work successfully.
I am really grateful and wish to express my profound indebtedness to my supervisor, Dr. Himanshu Jindal, Assistant Professor (SG), Department of CSE, Jaypee University of Information Technology, Wakhnaghat. The deep knowledge and keen interest of my supervisor in the field of “Machine Learning and Artificial Intelligence” made it possible to carry out this project. His endless patience, scholarly guidance, continual encouragement, constant and energetic supervision, constructive criticism, valuable advice, and his reading of many inferior drafts and correcting them at all stages have made it possible to complete this project.
I would like to express my heartiest gratitude to Dr. Himanshu Jindal, Department of CSE,
for his kind help to finish my project.
I would also like to thank everyone who has helped me, directly or indirectly, in making this project a success. In this context, I would like to thank the various staff members, both teaching and non-teaching, who extended their timely help and facilitated my work.
Finally, I must acknowledge with due respect the constant support and patience of my parents.
TABLE OF CONTENTS
1. Certificate
2. Acknowledgement
3. List of Abbreviations
4. List of Figures
5. Abstract
6. Chapter-1 Introduction
6.1. Introduction
6.3. Objectives
6.4. Methodology
LIST OF ABBREVIATIONS
1. AI - Artificial Intelligence
2. CNN - Convolutional Neural Network
3. CSV - Comma-Separated Values
4. HWR - Handwriting Recognition
5. ML - Machine Learning
6. OCR - Optical Character Recognition
7. RBF - Radial Basis Function
8. SVM - Support Vector Machine
9. SVC - Support Vector Classifier
LIST OF FIGURES
ABSTRACT
The recognition of handwritten characters and digits has always been a challenging task because of the many variations in handwritten characters produced by different writing styles.
Handwritten digit recognition systems are designed to transform handwritten digits into machine-readable representations. Handwritten numeral recognition plays an important role in postal automation services, mainly in countries like India where multiple languages and scripts are in use. The major objective of this work is to create effective and reliable methods for accurately recognizing numerals in order to make banking procedures more convenient and accurate.
This kind of intelligent system is applied in various fields: cheque processing, form processing, automatic processing of handwritten answers to an examination, and so on. This last application is the subject of this work.
We used three machine learning algorithms to find which one gives the best accuracy on our dataset: Support Vector Machine (SVM), Neural Network, and Convolutional Neural Network (CNN). The best result was obtained with the CNN, which reached about 98% accuracy.
CHAPTER 1
INTRODUCTION
1.1 Introduction
Handwriting recognition (HWR) is the process by which a machine reads handwritten digits and characters, interprets them as text or numbers, and converts them into digital form. Optical Character Recognition (OCR) technology is used to convert images containing written text into machine-readable text data, turning the digital document into an editable file.
Text summarization refers to the technique of shortening lengthy text data. The aim is to create a consistent and fluent summary containing only the important points outlined in the document. The summary gives insight into the whole document, leaving out the insignificant and inessential pieces of text. It improves readability, saves the reader's time and makes the text easier to use. Automatic summarization tools are therefore much needed to help users absorb relevant information quickly and efficiently.
1.3 Objective
The main objective of this project is to successfully identify handwritten digits from 0 to 9. To achieve this, we use a Support Vector Machine (SVM) to implement a model. The SVM model identifies the handwritten digits using the pixel values as features, so this is a 10-class classification problem.
1.4 Methodology
For this problem, we generated our own dataset. The dataset consists of 600 images of handwritten digits from 0 to 9. We drew each digit in Paint, captured an image of it using the pyscreenshot package, and stored the images in folders named 0 to 9, one folder per digit.
Each image in our dataset is 28x28 pixels, so there are 784 pixels in total. In these images, the area where no digit is drawn has a value of 0, and the area where the digit is drawn has a value other than 0. To generate the dataset, we assign 1 to the drawn region and 0 to the empty region where no digit is drawn.
In this project, I experimented with various SVM hyperparameters using a sub-sample of 10-20% of the training data.
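As a rough sketch of such an experiment, the snippet below runs a small grid search over SVM hyperparameters on a 15% stratified sub-sample of the training data. It assumes scikit-learn is installed and uses scikit-learn's bundled 8x8 `load_digits` images as a stand-in, because our own 28x28 dataset is not reproduced here; the grid values are illustrative, not the ones used in this project.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in data: scikit-learn's bundled 8x8 digits, not our 28x28 Paint images.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # this dataset stores intensities 0-16; scale them to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Stratified sub-sample of 15% of the training data (inside the 10-20% range).
X_sub, _, y_sub, _ = train_test_split(
    X_train, y_train, train_size=0.15, stratify=y_train, random_state=0
)

# Illustrative grid: kernel type, regularisation C and RBF width gamma
# (gamma is ignored by the linear kernel).
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.01, 0.1],
}
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X_sub, y_sub)

print("best parameters:", search.best_params_)
# GridSearchCV refits the best model on the whole sub-sample before scoring.
print("test accuracy:", search.score(X_test, y_test))
```

Searching on a sub-sample keeps each of the 18 parameter combinations cheap to evaluate; the winning settings can then be retrained on the full training set.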
Fig.1
1.4.1 Collecting Dataset:
We have to collect images of the digits 0 to 9. To collect the images, the pyscreenshot package can be used: we draw each digit, capture it with pyscreenshot, and then store the image.
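A minimal sketch of the capture step, assuming pyscreenshot and Pillow are installed and that the Paint canvas sits inside a fixed screen rectangle. `CANVAS_BBOX`, `capture_path` and `capture_digit` are hypothetical names, the bounding-box coordinates are placeholders, and converting to 28x28 grayscale at capture time is an assumption (it could equally be done later). `pyscreenshot.grab(bbox=...)` returns a Pillow image, which is saved into a folder named after the digit.

```python
from pathlib import Path

# Hypothetical screen region (left, top, right, bottom) covering the Paint canvas;
# adjust it to wherever the canvas actually sits on the screen.
CANVAS_BBOX = (60, 170, 400, 500)


def capture_path(root, digit, index):
    """Return root/<digit>/<index>.png, creating the per-digit folder if needed."""
    if not 0 <= digit <= 9:
        raise ValueError("digit must be between 0 and 9")
    folder = Path(root) / str(digit)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{index}.png"


def capture_digit(root, digit, index, bbox=CANVAS_BBOX):
    """Grab the canvas region from the screen and save it as a 28x28 grayscale PNG."""
    # Imported here so the path helper above works on machines without a display.
    import pyscreenshot

    img = pyscreenshot.grab(bbox=bbox)
    img = img.convert("L").resize((28, 28))  # grayscale, 28x28 as in our dataset
    path = capture_path(root, digit, index)
    img.save(path)
    return path
```

For example, `capture_digit("captured_images", 7, 0)` would save the current canvas as `captured_images/7/0.png`, matching the one-folder-per-digit layout.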
Fig.2
1.4.2 Generating Dataset:
We generate our dataset from the images we have collected. To do this, we assign 1 to the drawn region and 0 to the background, so the dataset contains only two values, 0 and 1. In a grayscale image, 0 typically represents black and 255 represents white. We assign 0 to pixel values from 0 to 100 and 1 to pixel values above 100 up to 255, so instead of ranging from 0 to 255, each pixel is now either 0 or 1. In this way, we generate the dataset (a CSV file).
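The thresholding and CSV generation described above can be sketched with NumPy and Python's `csv` module. This assumes each image is already a 28x28 grayscale array in which the drawn strokes are brighter than the background (for example, white strokes on a black canvas, or an inverted capture); otherwise the comparison should be flipped. The column names `label` and `pixel0` to `pixel783` are hypothetical.

```python
import csv

import numpy as np


def binarize(gray, threshold=100):
    """Map values 0..threshold to 0 and values above threshold to 1."""
    # Assumption: strokes are brighter than the background.
    return (np.asarray(gray) > threshold).astype(np.uint8)


def write_dataset(images, labels, csv_path):
    """Write one CSV row per image: the digit label, then 784 binary pixels."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["label"] + [f"pixel{i}" for i in range(784)])
        for img, label in zip(images, labels):
            pixels = binarize(img).reshape(-1)  # flatten 28x28 -> 784 values
            if pixels.size != 784:
                raise ValueError("expected a 28x28 image")
            writer.writerow([int(label)] + pixels.tolist())
```

Storing the label in the first column keeps each row self-describing, so the file can later be shuffled row by row without losing track of which digit each row represents.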
The final step is to open the dataset, shuffle it (that is, randomly change the position of each row of data) and display it.
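Shuffling must move each label together with its pixel row. A minimal NumPy sketch, assuming the features `X` and labels `y` have already been loaded into arrays; `shuffle_rows` is a hypothetical helper name. With pandas, `df.sample(frac=1, random_state=0)` achieves the same on a whole DataFrame.

```python
import numpy as np


def shuffle_rows(X, y, seed=0):
    """Shuffle feature rows and labels with one shared permutation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))  # a random reordering of row indices
    return X[order], y[order]        # same order for both keeps pairs aligned
```

Using a fixed seed makes the shuffle reproducible, which helps when comparing models trained on the same split.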
After this step, the model is trained. In the testing phase, we give the model only the pixel values (a set of 0s and 1s), and the model has to predict the digit. We count how many times the model returns the correct answer; dividing this count by the number of test images gives the accuracy.
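The accuracy computation amounts to comparing predicted and true labels element by element. A short sketch; the example labels are made up.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted digit equals the true digit."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true == y_pred))


# Hypothetical predictions for five test images: four of five are correct.
print(accuracy([9, 5, 3, 0, 7], [9, 5, 8, 0, 7]))  # 0.8
```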
CHAPTER -2
LITERATURE SURVEY
The first paper we referred to is Handwritten Digit Recognizer Using Machine learning.
fig.3
The main goal of this study is to present and illustrate the work that has been done on handwritten digit recognition. Recognition of handwritten characters and numbers is one of the most difficult and fascinating problems in pattern recognition and image processing. Handwritten digit recognition is an extremely difficult task: the numbers are not uniformly written, since they vary in shape and size, so feature extraction and segmentation of handwritten numerical script are time-consuming. For segmentation in the proposed work, vertical and horizontal projections are used. SVM is used for recognition and classification.
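The projection-based segmentation mentioned in this paper can be illustrated on a binary image: the vertical projection (ink per column) separates digits that sit side by side, and the horizontal projection (ink per row) trims each crop. This is our own minimal NumPy sketch of the general idea, not the paper's implementation; it assumes the digits do not touch or overlap horizontally.

```python
import numpy as np


def segment_by_projection(binary):
    """Split a binary image (1 = ink) into per-digit crops using projection profiles."""
    binary = np.asarray(binary)
    col_profile = binary.sum(axis=0)  # vertical projection: ink count per column
    has_ink = col_profile > 0

    crops = []
    start = None
    # A trailing False acts as a sentinel so a run touching the right edge is closed.
    for x, ink in enumerate(np.append(has_ink, False)):
        if ink and start is None:
            start = x                  # a run of inked columns begins
        elif not ink and start is not None:
            segment = binary[:, start:x]
            # Horizontal projection of this run: keep only rows that contain ink.
            rows = np.flatnonzero(segment.sum(axis=1) > 0)
            crops.append(segment[rows[0]:rows[-1] + 1, :])
            start = None
    return crops
```

Because it relies only on empty columns between digits, this approach fails on touching or slanted digits, which is one reason segmentation of handwritten script is hard.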
The next paper we referred to is “Off-line Handwritten Character Recognition System Using SVM”.
fig.4
A complete off-line handwritten character recognition system has been developed using the proposed Support Vector Machine. Experiments were carried out using a standard database obtained from CEDAR, together with four different feature extraction techniques to generate the final feature vector. Choosing the classifier is critical for achieving the best possible classification accuracy. The experimental results show that SVM outperforms the other techniques proposed in the literature in terms of overall performance.
CHAPTER -3
SYSTEM DEVELOPMENT
fig.5
ALGORITHM:
1. Identify the hyperplane that best separates the two classes.
2. Measure the margin as the distance between the hyperplane and the nearest data points of each class, and search for the hyperplane with the maximum margin on both sides. A hyperplane with a larger margin is more robust, whereas a hyperplane with a small margin is more likely to misclassify new points.
3. SVM selects the hyperplane that classifies the classes accurately while maximizing the margin.
4. SVM is a robust classifier: it can ignore outliers and still find the hyperplane with the largest margin.
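To make the margin idea concrete, the sketch below fits a linear SVM with scikit-learn on six made-up 2-D points and reads back the hyperplane. For a hyperplane w.x + b = 0, the distance between the two supporting hyperplanes through the nearest points is 2/||w||, which is the quantity SVM maximizes. A large `C` is used to approximate a hard margin; the data and the `C` value are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2-D data: two small clusters (made up for illustration).
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C leaves almost no slack for misclassification (near hard margin).
clf = SVC(kernel="linear", C=1000.0).fit(X, y)

w = clf.coef_[0]       # normal vector of the separating hyperplane w.x + b = 0
b = clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)  # width between the two supporting hyperplanes

print("support vectors:\n", clf.support_vectors_)
print("margin width:", margin)
print(clf.predict([[2.0, 2.0], [4.0, 5.0]]))  # [0 1]
```

Only the support vectors (the points closest to the boundary) determine w and b; moving any other point slightly leaves the hyperplane unchanged.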
Fig.6
MATHEMATICAL
The hyperplane in a linear SVM is learned using linear algebra, through a function known as the kernel, which measures the similarity between two inputs. To handle data that a straight hyperplane cannot separate, SVM supports several commonly used kernels; this kernel technique allows it to achieve higher accuracy.
The radial basis function (RBF) kernel is the most widely preferred, and the linear kernel is the simplest of them all. The polynomial kernel and the sigmoid kernel are other kernels used for data that is not linearly separable.
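● Kernel: $F(x) = B_0 + \sum_{i} a_i \langle x, x_i \rangle$, where $\langle x, x_i \rangle$ is the inner product of the input $x$ with each training vector $x_i$ (only the support vectors have non-zero $a_i$); the other kernels replace this inner product with a kernel function $K(x, x_i)$.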
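Replacing the inner product with a kernel gives F(x) = B0 + sum of a_i K(x, x_i). The sketch below writes out the four kernels named above in NumPy and checks, on made-up ring-shaped data, that this sum reproduces scikit-learn's `SVC.decision_function` for an RBF kernel. It relies on scikit-learn storing the coefficients a_i (with the class sign folded in) in `dual_coef_`, the x_i in `support_vectors_` and B0 in `intercept_`; the `gamma`, `coef0` and `degree` defaults are illustrative and follow scikit-learn's parameter names.

```python
import numpy as np
from sklearn.svm import SVC


def linear_kernel(x, z):
    return x @ z.T  # plain inner products <x, z>


def polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * (x @ z.T) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    # exp(-gamma * ||x - z||^2) for every pair of rows of x and z
    sq_dists = (x**2).sum(1)[:, None] + (z**2).sum(1)[None, :] - 2 * x @ z.T
    return np.exp(-gamma * sq_dists)


def sigmoid_kernel(x, z, gamma=0.1, coef0=0.0):
    return np.tanh(gamma * (x @ z.T) + coef0)


# Made-up 2-D data that no straight line can separate: class 1 is a central disc.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(80, 2))
y = (np.linalg.norm(X, axis=1) < 1.0).astype(int)

clf = SVC(kernel="rbf", gamma=0.5, C=10.0).fit(X, y)


def decision(x):
    # F(x) = B0 + sum_i a_i K(x, x_i), summed over the support vectors
    return clf.intercept_[0] + rbf_kernel(x, clf.support_vectors_) @ clf.dual_coef_[0]


print(np.allclose(decision(X), clf.decision_function(X)))  # True
```

The check shows that a trained SVM is fully described by its support vectors, their coefficients and the intercept; swapping in a different kernel function changes only how similarity to each support vector is measured.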
Steps Of Implementation : -
1. First, we import the necessary Python libraries and load the image dataset of our handwritten digits.
2. Next, we preprocess the data and split it into train and test datasets.
3. Build the Support Vector Machine model.
4. Train the model.
5. Evaluate the efficiency of the model using performance metrics.
6. Predict digits with the help of a GUI.
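Steps 1 to 5 can be sketched with scikit-learn as follows. This is not the project's code: it assumes scikit-learn is installed and substitutes the library's bundled `load_digits` dataset (1,797 images of 8x8 pixels) for our own 28x28 CSV, and the `C` value is an illustrative choice. `SVC` handles the 10 classes internally with a one-vs-one scheme. The GUI of step 6 is left out.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Step 1: load the data (stand-in: scikit-learn's bundled 8x8 digits).
X, y = load_digits(return_X_y=True)

# Step 2: preprocess (scale 0-16 intensities to 0-1) and split into train/test sets.
X = X / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Steps 3-4: build and train an RBF-kernel SVM as a 10-class classifier.
model = SVC(kernel="rbf", C=10, gamma="scale")
model.fit(X_train, y_train)

# Step 5: evaluate with performance metrics.
y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

The classification report adds per-digit precision and recall, which shows which digits the model confuses most often.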
TECHNIQUES:
3.1 SVM
The Handwritten Digit Recognizer and Summarizer have been programmed in Python using Jupyter Notebook.
Fig 7
2. Train the model using a training set of images.
3.2 NEURAL NETWORK
Neural networks can adapt to changing input, so the network generates the best possible result without the output criteria needing to be redesigned. The concept of neural networks, which has its roots in artificial intelligence, is rapidly gaining popularity in the development of trading systems.
Handwritten digit recognition using the MNIST dataset is a significant project built with the help of a neural network. It essentially recognizes scanned images of handwritten digits.
We have taken this a step further: our handwritten digit recognition system not only identifies scanned images of handwritten digits but also allows digits to be written on the screen and recognized through an integrated GUI.
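A minimal neural-network counterpart, as a sketch only: the report does not say which library its network used, so this assumes scikit-learn's `MLPClassifier` and uses the bundled `load_digits` images instead of MNIST. The hidden-layer size and iteration limit are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data: scikit-learn's bundled 8x8 digits, not MNIST.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # scaling inputs to [0, 1] helps gradient-based training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One hidden layer of 128 ReLU units; for 10 classes the output layer uses softmax.
net = MLPClassifier(hidden_layer_sizes=(128,), activation="relu",
                    max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))
```

Unlike the SVM, the network learns its own intermediate features in the hidden layer, which is what lets deeper variants scale to larger datasets such as MNIST.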
Fig 8
ALGORITHM:
1. First, we import the necessary Python libraries and load the image dataset of our handwritten digits.
2. Next, we preprocess the data and split it into train and test datasets.
3. Build the neural network model.
4. Train the model.
5. Evaluate the efficiency of the model using performance metrics.
Steps Of Implementation : -
The Handwritten Digit Recognizer and Summarizer have been programmed in Python using Jupyter Notebook.
2. Train the model using a training set of images.
Validate and test the model trained on the training set.
3. Obtain a dataset of images of handwritten digits.
4. Apply a contour analysis technique to segment the digit images into individual images.
5. Once the digits have been separated, pass them to the model for digit recognition.
6. Finally, for any input digit drawn in Paint, we predict the result through an interface.
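Step 4 can be sketched without OpenCV by labelling connected blobs of ink, which finds the same regions that contour analysis (for example OpenCV's `cv2.findContours` followed by `cv2.boundingRect`) would outline. This substitution and the function name `segment_components` are our assumptions; the input is a binary image with 1 for ink.

```python
from collections import deque

import numpy as np


def segment_components(binary):
    """Return per-digit crops of a binary image (1 = ink), ordered left to right.

    Each 8-connected blob of ink pixels is treated as one digit and cropped to
    its bounding box, the same box a contour-based approach would produce.
    """
    binary = np.asarray(binary)
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for r in range(h):
        for c in range(w):
            if binary[r, c] and not seen[r, c]:
                # Breadth-first flood fill over the 8 neighbours of each pixel.
                queue = deque([(r, c)])
                seen[r, c] = True
                rmin, rmax, cmin, cmax = r, r, c, c
                while queue:
                    y, x = queue.popleft()
                    rmin, rmax = min(rmin, y), max(rmax, y)
                    cmin, cmax = min(cmin, x), max(cmax, x)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny, nx] and not seen[ny, nx]):
                                seen[ny, nx] = True
                                queue.append((ny, nx))
                boxes.append((cmin, rmin, rmax, cmax))
    boxes.sort()  # left to right by leftmost column
    return [binary[r0:r1 + 1, c0:c1 + 1] for c0, r0, r1, c1 in boxes]
```

Each crop would then be resized to the model's input size before being passed on for recognition in step 5.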
3.3 CONVOLUTIONAL NEURAL NETWORK:
A Convolutional Neural Network, or CNN, is a deep learning algorithm that is extremely successful at image classification tasks. With the help of filters, or kernels, it is capable of capturing the spatial and temporal dependencies in an image.
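To show what a filter does, the NumPy sketch below slides a hand-made 3x3 vertical-edge filter over a tiny image, applies ReLU, and then applies 2x2 max pooling, which are the basic operations of a convolutional layer. In a real CNN the filter weights are learned during training rather than written by hand; the image and the filter here are illustrative.

```python
import numpy as np


def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in CNN layers (technically cross-correlation)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    return np.maximum(x, 0)


def max_pool(x, size=2):
    """Non-overlapping max pooling; rows/columns that don't fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))


# A 6x6 image with a bright vertical stroke in columns 2-3, and a filter that
# responds where brightness increases from left to right.
image = np.zeros((6, 6))
image[:, 2:4] = 1.0
edge_filter = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = relu(conv2d(image, edge_filter))
print(feature_map)
print(max_pool(feature_map))
```

The feature map is large only along the stroke's left edge; pooling then halves its size while keeping that response, so later layers see where the edge is without needing every pixel.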
ALGORITHM:
1. First, we import the necessary Python libraries and load the image dataset of our handwritten digits.
2. Next, we preprocess the data and split it into train and test datasets.
3. Build the convolutional neural network model.
4. Train the model.
5. Evaluate the efficiency of the model using performance metrics.
Steps Of Implementation : -
1. Obtain the dataset of images of handwritten characters. My dataset contains the digits 0 to 9.
2. Model building.
3. Preprocess the data and split it into train and test datasets.
4. Plot the results in a histogram.
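The report does not specify exactly what its histogram shows; one plausible reading is the number of test images assigned to each digit, alongside per-digit accuracy. A NumPy sketch of those counts with made-up labels; the function names are hypothetical, and the resulting array could be drawn with matplotlib's `plt.bar(range(10), counts)`.

```python
import numpy as np


def prediction_histogram(y_pred, n_classes=10):
    """Count how many test images were assigned to each digit class 0..9."""
    return np.bincount(np.asarray(y_pred), minlength=n_classes)


def per_class_accuracy(y_true, y_pred, n_classes=10):
    """Fraction of each true digit's images predicted correctly (NaN if a digit is absent)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    totals = np.bincount(y_true, minlength=n_classes)
    correct = np.bincount(y_true[y_true == y_pred], minlength=n_classes)
    with np.errstate(divide="ignore", invalid="ignore"):
        return correct / totals


# Hypothetical labels and predictions for eight test images.
y_true = [0, 1, 1, 2, 3, 3, 3, 9]
y_pred = [0, 1, 7, 2, 3, 3, 5, 9]
print(prediction_histogram(y_pred))
print(per_class_accuracy(y_true, y_pred))
```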
CHAPTER -4
PERFORMANCE ANALYSIS
4.1 SVM:
Fig 10
The accuracy of our model is 83.33%.
Fig 11
4.2 NEURAL NETWORK:
The accuracy of our model is 97.35%.
4.3 CONVOLUTIONAL NEURAL NETWORK:
The accuracy of our model is 98.68%.
The following snippets show how we can analyse our results using the digits we provided as input.
The input digit is 9 and the predicted result is 9.
Fig 12
Fig 13
The input digit is 5 and the predicted result is 5.
Fig 14
Fig 15
CHAPTER -5
CONCLUSION
REFERENCES
1. https://www.researchgate.net/publication/221710787_Handwritten_digit_Recognition_using_Support_Vector_Machine
2. U. Markowska-Kaczmar and P. Kubacki, “Support vector machines in handwritten digits classification”, IEEE.
3. Gauri Katiyar and Shabana Mehfuz, “SVM based off-line handwritten digit recognition”, IEEE.
4. J. Gu, G. Wang, J. Cai and T. Chen, “An empirical study of language CNN for image captioning”, in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1222-1231. https://ieeexplore.ieee.org/document/8237400
5. https://techvidvan.com/tutorials/handwritten-digit-recognition-with-python-cnn
APPENDIX
1. Linear kernel
2. RBF kernel
3. Sigmoid kernel
4. Kernel
5. SVM classifying two classes