
SUMMARY: HANDWRITTEN MARATHI TEXT RECOGNITION

1.INTRO:
Among the several Devanagari scripts used in India, Marathi compositions are also a rich source of information. A comparatively large amount of work has been done on OCR for scripts such as English, Arabic, Sanskrit, and Kannada; however, very little work has been reported on the recognition of Marathi content. Over recent years, handwritten Devanagari text recognition has attracted numerous researchers, and various techniques have been proposed to perform the recognition. Deep neural networks are gaining popularity in the fields of Computer Vision (CV) and Machine Learning (ML). Although recognition of handwritten Devanagari characters is a difficult task, deep learning can be effectively applied as a solution to such problems.
In this paper, the authors propose a CNN-based OCR framework which accurately recognizes handwritten Marathi words and produces good-quality printed Marathi text. Because of the limited availability of Marathi training data, they prepared their own training dataset, with the assistance of individuals in the 8-to-45-year age group. The dataset contains 9,360 words (104 words with 90 images each). The training accuracy of the CNN model is 94.76%.

2.CHALLENGES
1. Letters that appear similar.
Recognizing individual Marathi characters is a troublesome task because some characters in the Devanagari lipi are very similar to one another, such as "ma" and "bha", "va" and "ba", or "sa" and "ra".
2. Variation in writing styles.
In a handwritten Marathi text document, writing styles vary from person to person. This variation in individual writing style is the key challenge in developing a handwritten OCR system.

3.DEEP LEARNING AND CNN

Deep learning is a branch of the wider field of Machine Learning, based on artificial neural networks that learn to perform tasks from experience over a period of time. The technique is particularly useful for recognizing images and shapes. A Convolutional Neural Network (ConvNet or CNN) is a popular deep learning algorithm used especially for image classification. CNNs learn from pictures (a dataset) for classification purposes, which in the future may even eliminate the need for manual classification. Like other neural networks, a CNN is made up of an input layer, convolutional layers, pooling layers, a softmax layer, and fully connected layers. CNNs have been successful in image classification: they have enabled identification of faces, objects, and traffic signs, have literally given vision to robots, and are used in self-driving cars.

Use of CNN[1]

There are four fundamental operations in a CNN:


1. Convolution
2. Non Linearity (ReLU)
3. Pooling or Sub Sampling
4. Classification (Fully Connected Layer)

1.Convolution
The process of convolution is derived from the "convolution operator". Convolution is used to extract features from the image. It is the main feature that distinguishes a CNN from an ANN, because it maintains the spatial relationship between pixels, whereas an ANN converts the image into a 1D vector before classifying it. This is why CNNs are more successful than ANNs in the field of image classification.

Working Of Convolution
Convolution is performed by sliding the filter (or kernel) over the image matrix as shown in the figure above, multiplying the image pixels with the corresponding filter values, and summing the products into the feature map (convolved image). Before training a CNN we need to specify the number of filters, the size of the filters, and the number and types of layers. At each layer, each image is convolved with a number of filters, producing feature maps. The more filters we use, the more feature maps, and the more features we extract at each step; but this does not mean that adding more filters always improves classification accuracy. In fact, it may lead to overfitting and slow down training. Through backpropagation, these filters, the trainable parameters here, learn to extract the features needed for classification. For a multichannel image, the number of channels in the input image must match the number of channels in the filter; here, however, we have grayscale images (where each pixel ranges from 0 for black to 255 for white), which have only a single channel. Several aspects are involved in convolution (a code sketch follows the list below):
1) Depth
The number of filters defines the depth of a convolutional layer. A single image convolved with multiple filters produces multiple feature maps (a volume).
2) Stride
The number of units by which the filter moves rightward or downward is called the stride. This determines the size of the resulting feature maps.
3) Padding
Padding is a very important step during convolution: we pad the images to control the size of the resulting feature map. Using "padding=same" in a convolutional layer avoids shrinking the convolved image; usually 0 is used as the padded value. If the CNN developer is not careful with padding, the feature map size can shrink drastically, losing input information for the later layers of the CNN.
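Here is a minimal NumPy sketch of these ideas: a single filter slid over a toy grayscale image with a configurable stride and zero padding (the image and filter values are made up for illustration):

```python
import numpy as np

def conv2d(image, kernel, stride=1, pad=0):
    """Slide one filter over a grayscale image and return the feature map."""
    if pad > 0:
        image = np.pad(image, pad, constant_values=0)   # zero padding
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(window * kernel)         # multiply and add
    return out

image = np.random.randint(0, 256, (6, 6)).astype(float)  # toy single-channel image
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])                        # a vertical-edge filter
fmap = conv2d(image, kernel, stride=1, pad=1)             # "same"-style padding
print(fmap.shape)                                         # (6, 6): size preserved
```

With pad=0 the output would shrink to 4×4, illustrating why padding matters for the later layers.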
2. ReLU (Rectified Linear Unit)
ReLU is the activation function applied after the convolution step; it introduces non-linearity into the convolved image.

Graph of ReLU[1]
Since convolution is a linear operation (element-wise matrix multiplication and addition) and most real-world data is non-linear, we apply the ReLU activation function after it.

Visualization Of ReLU
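Element-wise, ReLU simply replaces every negative value in the feature map with zero; a quick sketch:

```python
import numpy as np

def relu(feature_map):
    """max(0, x) applied element-wise: negatives become 0, positives pass through."""
    return np.maximum(0, feature_map)

print(relu(np.array([[-2., 3.], [5., -1.]])))  # [[0. 3.] [5. 0.]]
```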

3. Pooling

Pooling (also called subsampling) reduces the dimensionality of the convolved image while still preserving the most important information in the feature maps. Pooling can be of different types, for example Max, Average, or Sum.

In max pooling, we define a neighborhood window and take the maximum of the pixels in that window into the pooled image. We could instead take the sum or the average of those pixels (sum and average pooling respectively). Historically, max pooling has performed better than other types of pooling.

The following figure illustrates pooling with a 2×2 filter (stride = 2).


Working Of Max Pooling

We slide our 2×2 window by 2 cells (the 'stride') and take the maximum value in each region. This reduces the dimensionality of our feature map.

Pooling does not reduce the volume (depth): if there are 4 feature maps after the convolution + ReLU operations, there will still be 4 (pooled) feature maps after pooling.
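A minimal NumPy sketch of max pooling with a 2×2 window and stride 2; swapping `.max()` for `.sum()` or `.mean()` would give sum or average pooling:

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """Keep only the maximum value from each window of the feature map."""
    out_h = (fmap.shape[0] - size) // stride + 1
    out_w = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = fmap[i*stride:i*stride+size,
                             j*stride:j*stride+size].max()
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 1., 2.],
                 [7., 2., 9., 0.],
                 [1., 8., 3., 4.]])
print(max_pool(fmap))  # [[6. 4.]
                       #  [8. 9.]]
```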

The figure below compares the results of max and sum pooling.


Visualization Of Max Pooling and Sum Pooling

Pooling serves some other functions as well:

Pooling reduces the dimensionality of the images and saves computation, thus controlling overfitting.

Pooling makes the CNN insensitive to small changes and distortions in the input image (a small distortion cannot change the max or average value to a large extent, which eases the classification process).

4. Fully Connected Layer

The Fully Connected Layer is nothing but a Multi-Layer Perceptron. The term "fully connected" means that every neuron in the previous layer is connected to every neuron in the next layer. In this paper, the last layer uses the softmax activation function and the other two layers use the ReLU activation function. In general, the last (output) layer of a CNN uses the softmax activation function, as it presents the output in the form of a probability distribution.

The output obtained after the convolution and pooling operations represents the features of the input image. The Fully Connected Layer (output layer) classifies the image on the basis of these features, or combinations of them, after learning from the training dataset.
Demonstration of a fully connected layer

The role of bias
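To illustrate this classification stage, here is a minimal NumPy sketch of a fully connected layer (weights plus bias) followed by softmax; the sizes and the random weights are placeholders, not trained values:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

features = np.random.rand(256)           # flattened output of conv + pooling stages
W = np.random.randn(10, 256) * 0.01      # 10 classes; every input feeds every output
b = np.zeros(10)                         # the bias shifts each neuron's activation

probs = softmax(W @ features + b)        # probability distribution over the classes
print(probs.sum())                       # 1.0
```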


Overall Working And Architecture

Compared with an ANN, the only difference in a CNN is the convolution and pooling stages used for feature extraction; the extracted features act as the input to the fully connected layers, which classify the image into its proper category.

Figure 15: Training the CNN

The training process of a CNN goes like this:

First, the trainable parameters such as filters and other weights are initialized randomly. The network then takes an input image for training and runs forward propagation: convolution and pooling, followed by forward propagation through the fully connected layers, producing a probability distribution at the output.

Since the weights and other trainable parameters are randomly assigned (the network is untrained), the output probabilities are also random. The total error is calculated at the output layer.

Total Error = Σ ½ (target probability − output probability)²

Backpropagation is then used to calculate the gradients of the error with respect to all the weights in the network, and gradient descent updates all filter values, weights, and parameter values to minimize the output error. Each weight is updated in proportion to its contribution to the error. If the same image is input again, the results will be closer to the expected ones, which shows that the neural network is learning as the error is reduced. All images are then input one by one, and the network is eventually trained (meaning the weights and other trainable parameters have been updated so that a new image is classified correctly).

If the training set is large enough, the network will (hopefully) generalize
well to new images and classify them into correct categories.
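A minimal sketch of this training loop, using the paper's squared-error formulation; `network.forward`, `network.backward`, and `network.weights` are hypothetical helpers standing in for the CNN internals:

```python
import numpy as np

def train(network, images, targets, lr=0.1, epochs=10):
    """Forward pass, squared-error total, backprop, gradient-descent update."""
    for epoch in range(epochs):
        total_error = 0.0
        for x, t in zip(images, targets):
            probs = network.forward(x)                  # convolution, pooling, FC, softmax
            total_error += 0.5 * np.sum((t - probs) ** 2)
            grads = network.backward(t - probs)         # gradients via backpropagation
            for w, g in zip(network.weights, grads):
                w -= lr * g                             # update in proportion to its share
        print(f"epoch {epoch}: total error = {total_error:.4f}")
```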

4.METHODOLOGY
A CNN comprises numerous convolution layers followed by fully connected layers that connect each neuron in one layer with every neuron in the next. There are various choices a CNN developer must make before training: the number of convolution layers, the number and size of the filters, the number of pooling layers and their step sizes, the number of hidden neurons in the dense layers, the optimization algorithm to be used, and so on.
A. Data Acquisition
A mobile-phone document scanner was used to scan the handwritten Marathi text documents. To obtain pictures of reasonable quality, high-resolution cameras were used; the authors used a 16 MP mobile camera for scanning the document images. The acquired pictures were stored in one folder for the pre-processing operations.
B. Preprocessing
Document-image pre-processing steps are generally used to improve the appearance of the acquired images before extracting features. A mobile camera or optical scanner may introduce noise while capturing the document images, such as unwanted shadows, extra dark spots, scattered lines, and variations at the edges. Hence, before the actual word recognition process starts, a clean image must be obtained from the scanned one. Image binarization is generally used to reduce the amount of pixel data in a grayscale image. In the binarization stage, each pixel in an image is replaced with a black pixel if its intensity I(i,j) is less than a certain threshold T (pixel = 0 if I(i,j) < T), or with a white pixel if its intensity is greater than T (pixel = 1 if I(i,j) > T).
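A minimal NumPy sketch of this thresholding rule; the threshold T is an assumption here (in practice a method such as Otsu's can choose it automatically):

```python
import numpy as np

def binarize(gray, T=128):
    """pixel = 0 (black) if I(i,j) < T, else pixel = 1 (white)."""
    return (gray >= T).astype(np.uint8)

gray = np.random.randint(0, 256, (36, 96), dtype=np.uint8)  # toy grayscale scan
binary = binarize(gray)                                     # array of 0s and 1s
```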
C. Skew Correction
D. Feature Extraction

C1 -> BN -> RE -> P1 -> C2 -> P2 -> 4C -> DROP -> FC(104)
Ambadas Shinde and Yogesh Dandawate built a 20-layer CNN model for this purpose. The first (input) layer takes 36×96 grayscale images. The first convolution layer has 96 filters of size 3×3. After this convolution layer they applied batch normalization and a ReLU activation layer, followed by a pooling layer to reduce the convolved image dimensions. The next convolution layer has 128 filters of size 3×3; they did not fully connect the previous layer (P1) with this layer, so as to trim the trainable parameters. They then used four more convolution layers and finally a dropout layer with probability 0.5 before the fully connected layer. The output layer has 104 output classes.
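The layer summary above leaves some hyperparameters unstated (the filter counts of the four later convolutions, among others), so this Keras sketch fills them with assumed values; only the details the paper states (36×96 grayscale input, 96 and 128 filters of 3×3, batch normalization, ReLU, pooling, dropout 0.5, 104 output classes, learning rate 0.1) are taken from the text:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(36, 96, 1)),                    # grayscale word images
    layers.Conv2D(96, (3, 3), padding="same"),          # C1: 96 filters of 3x3
    layers.BatchNormalization(),                        # BN
    layers.Activation("relu"),                          # RE
    layers.MaxPooling2D((2, 2)),                        # P1
    layers.Conv2D(128, (3, 3), padding="same", activation="relu"),  # C2
    layers.MaxPooling2D((2, 2)),                        # P2
    # The four further convolution layers; filter counts assumed here.
    layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
    layers.Dropout(0.5),                                # DROP
    layers.Flatten(),
    layers.Dense(104, activation="softmax"),            # FC(104): one unit per word
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1),
              loss="categorical_crossentropy", metrics=["accuracy"])
```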

5.EXPERIMENTATION AND RESULTS


A) DATASET
Due to the unavailability of a standard Marathi word dataset, the authors prepared their own training dataset. The sample training words were collected from several persons in the 8-to-45-year age group, including primary and secondary school students as well as college students. As handwriting may change depending on the mood of the writer and the quality of the writing materials and instruments, some sample training words were collected in the morning, a few in the afternoon, and most in the evening. For experimentation, 104 words with 90 samples of each were used. Each word image was resized to 36×96 pixels for uniformity. Out of the 9,360 sample images, 80% were used for training the CNN model and 20% for validation.
The authors ran a number of experiments, varying the number and size of filters in the convolution layers, the activation functions, the initial learning rate, the number of epochs, and so on. The best validation accuracy obtained was 94.76% with a learning rate of 0.1.
B) TESTING
A scanned handwritten Marathi text document image with 13 lines and 55 words (67 including punctuation marks) was used for testing. The document was pre-processed to improve its quality by removing the shadows and black spots introduced at the time of scanning, and the pre-processed document was then used for line segmentation.
Blank-row information was used to segment the text lines: the scanned image was split into separate text lines, which were stored in a "Lines" folder. After that, wherever the algorithm found a blank column, words were segmented and stored in a "words" folder. The segmented word images were resized to a uniform 36×96.
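A minimal sketch of this blank-row/blank-column projection idea (NumPy, operating on a binary page where background pixels are 1 and text pixels are 0, as produced by the binarization step; a real system would also threshold the gap width so letter gaps inside a word are not treated as word boundaries):

```python
import numpy as np

def segment_runs(is_blank):
    """Return (start, end) index pairs of consecutive non-blank runs."""
    runs, start = [], None
    for i, blank in enumerate(is_blank):
        if not blank and start is None:
            start = i
        elif blank and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(is_blank)))
    return runs

def segment_lines(binary):
    """Split a page into text lines wherever a row is entirely background."""
    blank_rows = (binary == 1).all(axis=1)
    return [binary[top:bottom] for top, bottom in segment_runs(blank_rows)]

def segment_words(line):
    """Split a line into words wherever a column is entirely background."""
    blank_cols = (line == 1).all(axis=0)
    return [line[:, left:right] for left, right in segment_runs(blank_cols)]
```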
These segmented words were passed through the designed CNN model. The features extracted by the CNN model were analysed to make the recognition decision, and the output was produced in a text file. Out of 55 words, 54 were correctly recognised and reproduced. The handwritten-to-printed Marathi text conversion accuracy was 98.19% (according to the paper).
Handwritten Text Recognition using Deep Learning with TensorFlow

I. INTRODUCTION
Character recognition is one of the emerging fields within computer vision. Handwritten transcriptions can be easily identified by humans: different languages have different patterns to spot, and humans can identify the text accurately. Machines, however, cannot easily identify handwritten transcriptions; it is difficult for a system to spot the text. Handwriting recognition is the ability of a machine to receive and interpret handwritten input from an external source, such as an image, and convert it into digital text.
Character recognition involves several steps: acquisition, feature extraction, classification, and recognition. In this approach, the system is trained to find the similarities, as well as the differences, among various handwritten samples.

II. PROPOSED WORK


A. Handwritten text recognition:
Handwritten Text Recognition (HTR) systems take handwritten text in the form of scanned images. We build a Neural Network (NN) trained on word images from the IAM dataset. For the implementation of HTR, the minimum requirement is TensorFlow (TF).

Fig.: Image of word taken from IAM Dataset

B. Model Overview:
We use a NN for our task. It consists of convolutional neural network (CNN) layers, recurrent neural network (RNN) layers, and a final Connectionist Temporal Classification (CTC) layer.

Fig. 2: Overview of HTR

In this project, we use 5 CNN layers (feature extraction), a pair of RNN layers, and a CTC layer (to calculate the loss).
C. Operations:
CNN: The input image is fed to the CNN layers, which are trained to extract relevant features from it. Each layer consists of three operations: first, the convolution operation, using a 5×5 filter in the first two layers and a 3×3 filter in the last three; then the non-linear ReLU function; and finally a pooling layer that summarizes image regions and outputs a downsized (smaller) version of the input.
RNN: The feature sequence contains 256 features per time-step, and the RNN propagates the relevant information through this sequence. The popular Long Short-Term Memory (LSTM) implementation of RNNs is employed, because it is able to propagate information over longer distances and provides more robust training characteristics than a vanilla RNN. The RNN output sequence is mapped to a matrix of 32×80.
CTC: While training the NN, the CTC layer is given the RNN output matrix and the ground-truth text, and it computes the loss value. At inference time, the CTC layer is given only the matrix and decodes it into the final text.

Data:
Input: It is a gray-value image of size 128×32. The pictures in the dataset usually do not have exactly this size, so we resize each one (without distortion) until it either has a width of 128 or a height of 32. Then we place the image in a (white) target image of size 128×32.
Finally, we normalize the gray values of the image to simplify the task for the NN.

Fig. 4: Left: a picture from the dataset with an arbitrary size. It is scaled to fit the target image of size 128×32; the empty part of the target image is filled with white.
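A minimal sketch of this preprocessing, using OpenCV for the resize; the [0, 1] normalization here is one plausible choice, since the exact scheme is not specified:

```python
import cv2
import numpy as np

def preprocess(img, target_w=128, target_h=32):
    """Scale without distortion, paste onto a white 128x32 canvas, normalize."""
    h, w = img.shape
    scale = min(target_w / w, target_h / h)                # preserve the aspect ratio
    new_w, new_h = max(1, int(w * scale)), max(1, int(h * scale))
    img = cv2.resize(img, (new_w, new_h))                  # dsize is (width, height)
    target = np.full((target_h, target_w), 255, np.uint8)  # white target image
    target[:new_h, :new_w] = img                           # place the scaled image
    return target.astype(np.float32) / 255.0               # gray values in [0, 1]
```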

CNN output: Fig. 5 displays the output of the CNN layers, which is a sequence of length 32; each entry contains 256 features. These features are further processed by the RNN layers.
RNN output: The figure below visualizes the RNN output matrix for an image containing the text "little". The matrix shown in the top-most graph contains the scores for the characters, with the Connectionist Temporal Classification blank label as its last entry.
Only the last character "e" is not aligned, but this is fine because the CTC operation is segmentation-free and does not care about absolute positions. From the bottom-most graph, showing the scores for the characters "l", "i", "t", "e" and the CTC blank label, the text can easily be decoded: we take the most probable character from each time-step (this forms the so-called best path), then we throw away repeated characters, and finally all blanks: "l---ii--t-t--l-...-e" → "l---i--t-t--l-...-e" → "little".

Fig. 6: Top: output matrix of the RNN layers. Middle: input image. Bottom: probabilities for the characters "l", "i", "t", "e" and the CTC blank label.
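The best-path decoding just described, taking the most probable character per time-step, merging repeats, then dropping blanks, can be sketched as follows (the character set and blank index are assumptions):

```python
import numpy as np

def best_path_decode(rnn_output, charset, blank_index):
    """rnn_output: (time_steps, num_labels) matrix of character scores."""
    best = np.argmax(rnn_output, axis=1)        # most probable label per time-step
    decoded, prev = [], None
    for idx in best:
        if idx != prev and idx != blank_index:  # merge repeats, then drop blanks
            decoded.append(charset[idx])
        prev = idx
    return "".join(decoded)
```

Applied to the matrix for the example above, this collapses "l---ii--t-t--l-...-e" to "little".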

D. Implementation using TF:


We only look at Model.py, because the other source files deal with basic file IO (DataLoader.py) and image processing (SamplePreprocessor.py).

CNN: For each CNN layer, create a kernel of size k×k to be used in the convolution operation. Then apply the ReLU operation and pass the result of the convolution to the pooling layer.

RNN: Create and stack the two RNN layers, then create a bidirectional RNN from them, so that the input sequence is traversed from front to back and the other way round. As a result, we get two output sequences, forward and backward. Finally, the result is mapped to the output sequence (or matrix), which is fed into the CTC layer.

CTC: For loss calculation, we feed both the ground-truth text and the matrix to the operation. The lengths of the input sequences must also be given. We now have all the inputs needed to build the loss operation and the decoding operation.
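A minimal TensorFlow 2 sketch of this wiring, assuming the shapes given earlier (32 time-steps, 80 character classes); the LSTM sizes and the toy label indices are assumptions, not values from the paper:

```python
import tensorflow as tf

batch_size, time_steps, num_features, num_classes = 10, 32, 256, 80

# Feature sequence from the CNN: 32 time-steps of 256 features each.
cnn_out = tf.random.normal((batch_size, time_steps, num_features))

# Two stacked LSTM layers, made bidirectional (forward + backward traversal).
rnn = tf.keras.Sequential([
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256, return_sequences=True)),
    tf.keras.layers.Dense(num_classes),          # map to the 32x80 output matrix
])
logits = rnn(cnn_out)                            # (batch, 32, 80)

# CTC loss: ground-truth label indices (toy values) plus the sequence lengths.
labels = tf.sparse.from_dense(tf.constant([[1, 2, 3, 4]] * batch_size))
loss = tf.nn.ctc_loss(labels=labels,
                      logits=tf.transpose(logits, [1, 0, 2]),  # time-major
                      label_length=None,
                      logit_length=tf.fill([batch_size], time_steps),
                      logits_time_major=True, blank_index=-1)
batch_loss = tf.reduce_mean(loss)                # mean over the batch trains the NN
```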

Training:
The mean of the loss values of the batch elements is used to train the NN.

E. Improving the model:


If you want to feed complete text lines, as shown in the figure below, instead of word images, you have to increase the input size of the NN.
Fig.: A complete text line can be fed into the NN if its input size is increased (image taken from IAM).

If you want to improve the recognition accuracy, you can follow one of these hints:
 Remove the cursive writing style from the input images.
 Increase the input size (if the input of the NN is large enough, complete text lines can be used).

F. Spell checker

The Python package pyspellchecker provides a way to find words that may have been misspelled, and it suggests possible corrections.
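Typical usage looks like this (pyspellchecker's documented API; the sample words are made up):

```python
from spellchecker import SpellChecker

spell = SpellChecker()                     # defaults to an English dictionary
words = ["littel", "recognitoin", "text"]  # raw recognizer output (toy example)

for word in spell.unknown(words):          # words not found in the dictionary
    print(word, "->", spell.correction(word))     # most likely correction
    print("candidates:", spell.candidates(word))  # other possible corrections
```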

III. THE ARCHITECTURE OF THE PROPOSED NETWORK

The proposed network has different layers which are as shown in the following figure:

Fig. : Architecture of proposed network

A. CNN layers

Fig. : CNN layer


A CNN is meant to imitate human visual processing, and it has highly optimized structures for processing 2D images. Further, it can effectively learn the extraction and abstraction of 2D features. In particular, the max-pooling layer of a CNN is extremely effective at absorbing shape variations.

Most significantly, a CNN is trainable and can produce highly optimized weights and good generalization performance.

B. RNN layer

An RNN has a "memory" which remembers the information about what has been calculated so far. It uses the same parameters for each input, as it performs the same task on all the inputs or hidden states to produce the output. This reduces the number of parameters, unlike other neural networks.

 An RNN applies the same weights and biases to all the layers, thus reducing the parameter count, and it memorizes each previous output by feeding it as input to the next hidden layer.
 Hence these layers can be joined together, with the weights and biases of all the hidden layers kept the same, into a single recurrent layer.
 The formula for calculating the current state: h_t = f(h_{t-1}, x_t), where h_t is the current state, h_{t-1} the previous state, and x_t the input at the current step.
 The formula for applying the activation function (tanh): h_t = tanh(W_hh · h_{t-1} + W_xh · x_t)
 The formula for calculating the output: y_t = W_hy · h_t
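A single recurrent step under these formulas, as a NumPy sketch with toy sizes; note that the same weight matrices are reused at every time-step:

```python
import numpy as np

hidden, n_in, n_out = 64, 32, 10
W_hh = np.random.randn(hidden, hidden) * 0.01  # recurrent (state-to-state) weights
W_xh = np.random.randn(hidden, n_in) * 0.01    # input-to-state weights
W_hy = np.random.randn(n_out, hidden) * 0.01   # state-to-output weights

def rnn_step(h_prev, x_t):
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)  # current state
    y_t = W_hy @ h_t                           # output
    return h_t, y_t

h = np.zeros(hidden)
for x_t in np.random.randn(5, n_in):           # five time-steps, same parameters
    h, y = rnn_step(h, x_t)
```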

C. CTC layer

CTC is a loss function that is used to train neural networks. There is no need to align the data, because CTC assigns a probability to any label. CTC is alignment-free: it works by summing the probabilities of all possible alignments between the input and the label.
Blank token:

CTC removes any repeated characters it finds. However, some words have letters that repeat, like the letter l in "hello", and these would inevitably end up being removed. The way around this is the blank token: it does not mean anything, and it is simply removed before the final word output is produced.
1. The CTC network assigns the character with the highest probability to each step of the input sequence.
2. Repeats without a blank token in between get merged.
3. Lastly, the blank tokens are removed.

The CTC network can then assign a probability to a label for the given input, by summing the probabilities of the characters at each time step. The CTC algorithm is alignment-free: it does not require alignment between the input and the output. However, to get the probability of an output given an input, CTC works by summing over the probabilities of all possible alignments between the two.

D. Summary of Dataset

The IAM Handwriting Database contains forms of handwritten English text which can be used to
train and test handwritten text recognizers and to perform writer identification and verification
experiments.

Characteristics of IAM Dataset

 657 writers contributed samples of their writing.

 1532 pages of scanned text.

 5685 isolated and labeled sentences.

 13353 isolated and labeled text lines.

 115320 isolated and labeled words.

IV. RESULTS

In this project, an image is given as input, and the system predicts the output by loading the model that was previously created and saved.

The first image is the input given to the neural network; the second image shows the output for that input.

V. CONCLUSION AND FUTURE SCOPE


In this project, classification of characters takes place; it is achieved through a convolutional neural network.
The algorithm provides both efficient and effective results for recognition. The project gives its best accuracy on text with little noise. The accuracy depends heavily on the dataset: if we increase the data, we can get higher accuracy, and avoiding cursive writing also yields the best results.

Future Work:

In the future we plan to extend this study to a larger variety of datasets. The future is completely based on technology; people will no longer use paper and pen for writing. In that scenario they will write on touch pads, so built-in software can automatically detect the text they write and convert it into digital text, making searching and understanding simpler.
Deep, Big, Simple Neural Nets for Handwritten Digit Recognition

Introduction: Automatic handwriting recognition is of academic and commercial interest.


Post offices use it to sort letters, and banks use it to read personal checks. The dataset used in this research paper is the MNIST dataset.
Before the year 2000, artificial neural networks called multilayer perceptrons (MLPs) were among the first classifiers tested on MNIST. An MLP with a single hidden layer of 800 units achieved a 0.70% error rate on MNIST. Convolutional neural networks (CNNs) achieved a record-breaking 0.40% error rate using novel elastic training-image deformations. These experiments required deep MLPs to be pre-trained in an unsupervised fashion to obtain the desired results with low error. Such complexifications were used because one cannot simply train really big plain MLPs on MNIST. One reason is that, at first glance, deep MLPs do not seem to work better than shallow networks: training them is hard, as backpropagated gradients quickly vanish exponentially in the number of layers, just as in the first recurrent neural networks. Training large MLPs with backpropagation is also very slow, taking a very long time (even weeks or months) on standard serial computers. Parallelizing MLPs can help solve this problem, but only with the help of GPUs.
Details about the Dataset: MNIST consists of two data sets: one for training (60,000 images) and one for testing (10,000 images). More training instances were created by deforming images.

Architecture: The authors train five MLPs with two to nine hidden layers and varying numbers of hidden units. Each neuron's activation function is a scaled hyperbolic tangent: y(a) = A·tanh(B·a), where A = 1.7159 and B = 0.6666.
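As a quick sketch, this activation is simply:

```python
import numpy as np

def scaled_tanh(a, A=1.7159, B=0.6666):
    """y(a) = A * tanh(B * a), the activation used for every neuron."""
    return A * np.tanh(B * a)
```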
GPU Implementations:
1. Deformations: Using a GPU instead of a CPU, generating the elastic displacement field takes only 3 seconds. Deforming the whole training set is more than 10 times faster, taking 9 seconds instead of the original 93.
2. Training Algorithm: The standard BP algorithm is closely followed, except that the backpropagation of deltas and the weight updates are disentangled and performed sequentially. The training algorithm involves:
a. Forward propagation
b. Backward propagation
c. Weight updating: the associated algorithm starts by reading the appropriate delta and precomputes all repetitive expressions.
Results:
1. The GPU accelerates the deformation routine by a factor of 10 (only the elastic deformations are GPU-optimized); the forward propagation (FP) and BP routines are sped up by a factor of 40.
2. Most remarkably, the best network has an error rate of only 0.35% (35 out of 10,000 digits). This is significantly better than the best previously published results: 0.39% by Ranzato et al. (2006) and 0.40% by Simard et al. (2003), both obtained by more complex methods.
3. The best test error of this MLP is even lower (0.32%) and may be viewed as the maximum capacity of the network.

Conclusions:
1. Ongoing hardware progress may be more important than advances in algorithms and software (although the future will belong to methods combining the best of both worlds).
2. Current graphics cards (GPUs) are already more than 40 times faster than standard microprocessors when it comes to training big and deep neural networks with the ancient online backpropagation algorithm (weight update rates of up to 5 × 10⁹/s, and more than 10¹⁵ weight updates per trained network).
Recognition of Handwritten Characters using Deep Convolutional
Neural Network

I. INTRODUCTION
The most difficult part of Indian handwritten recognition is the overlap between characters. These overlapping character shapes are hard to recognize and may lead to a low recognition rate; such factors also increase the complexity of handwritten character recognition. This paper proposes a new approach to identifying handwritten characters of the Telugu language using Deep Learning (DL). The proposed work can enhance the recognition rate of individual characters.
The objective of this work is to effectively extract the topological features for handwritten numeral recognition using a Convolutional Neural Network (CNN) with a deep network. The rest of the paper is organized as follows: Section II describes the present work and its methodology; the experimental setup is discussed in Section III; Section IV concludes with remarks on the proposed work.

II. PROPOSED FRAMEWORK TO RECOGNIZE HANDWRITTEN NUMERALS FOR TELUGU
In this section, we discuss the proposed method to recognize handwritten numerals. The framework involves several stages to accurately identify digits.
o Step i: Telugu handwritten samples were collected from 280 individuals, in different styles and shapes.
o Step ii: The collected samples were pre-processed to scale the images to 32×32 in grayscale format.
o Step iii: The dataset was divided into two groups, for a training and a testing phase. For the training phase, 200 individual samples, i.e. 2,000 numerals, were considered; for the testing phase, 80 individual samples, i.e. 800 numerals.
o Step iv: A Convolutional Neural Network (CNN) model was implemented in Python. The training samples were input to the CNN, which adjusted its weights to accurately recognize the digits.
The work utilized the LeNet-5 architecture to recognize the handwritten digits. It has several stages, viz. convolution, sub-sampling, and a fully connected layer. In this architecture, the image is scaled to 32×32 and forwarded to the convolution layer, where 6 kernels of size 5×5 are applied; these kernels can detect various edges in the image.
The next layer is a sub-sampling layer that reduces the dimensions of the image with 2×2 pooling and a stride of 2 applied to the output of every kernel.
The proposed CNN model is implemented in Python and can accurately recognize the digits. The procedure requires both the training and testing patterns to be transformed into grayscale, resized, and normalized. The training function is then called to build the CNN model, and the test samples are supplied to measure the recognition rate. In the CNN, the loss function and the weight optimization are computed at every epoch.
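A Keras sketch of the LeNet-5-style model described above (32×32 grayscale input, six 5×5 kernels, 2×2 sub-sampling with stride 2, ten numeral classes); the second convolution block, the pooling type, and the dense-layer sizes are not pinned down in this summary, so they follow the classic LeNet-5 layout and should be read as assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(32, 32, 1)),              # scaled grayscale numeral
    layers.Conv2D(6, (5, 5), activation="tanh"),  # 6 kernels of size 5x5
    layers.AveragePooling2D((2, 2), strides=2),   # 2x2 sub-sampling, stride 2
    layers.Conv2D(16, (5, 5), activation="tanh"), # second stage (classic LeNet-5)
    layers.AveragePooling2D((2, 2), strides=2),
    layers.Flatten(),
    layers.Dense(120, activation="tanh"),         # classic LeNet-5 dense sizes
    layers.Dense(84, activation="tanh"),          # the 84 features the paper mentions
    layers.Dense(10, activation="softmax"),       # ten Telugu numeral classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```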
The Telugu language is one of the popular languages of south India. Typical handwritten samples are shown in Fig. 3. These samples were pre-processed and segmented into grayscale images.

Fig. 3. Diversity in handwritten numerals of Telugu language

The dataset collection process was carefully designed and maintained diversity across gender. The samples were collected manually from 280 individuals with various handwriting styles, then digitized and segmented to create the numeral database. During segmentation the images were scaled to 32×32 without any loss of information. Given the diversity of writing styles, handwritten recognition aims to process and extract different features that can distinguish between numerals.

III. EXPERIMENTAL RESULTS AND DISCUSSIONS


In this section, we describe the experimental setup used to examine the performance of the proposed method in recognizing handwritten Telugu numerals, and we also measure the classification rate of each numeral with the CNN-based classifier.
The network involves convolution and sampling operations. Any given 32×32 image is reduced to 84 features that can distinguish each digit. The deep network extracts the features based on the weights of each kernel, derived from a portion of the image.
The proposed method was then evaluated on the test dataset to measure the overall accuracy and the recognition rate of each digit. Different runs with varying neural-network parameters showed a lot of variation in accuracy. In total, 2,000 samples were used to build the model, and 800 samples were used to recognize new handwritten samples.

The overall accuracy across different epochs is shown in Fig. 4. At the 50th epoch, the highest accuracy of 94% was measured. The experiments also captured the recognition rate of each digit, which gives insight into how accurately each digit can be identified. Most digits are recognized at more than 95%, except digits 4, 7, and 9, whose accuracy is very low due to ambiguity in their patterns: these digits closely resemble one another, which makes it hard to recognize the individual patterns. The highest recognition rate, 98%, is for the digit "3". The detailed accuracy of each digit is shown in Fig. 5.

Fig. 4. Recognition rate

Fig. 5. Recognition rate of individual numeral

IV. CONCLUSION
An efficient handwritten character recognition approach is proposed in this work, employing a deep network model to recognize handwritten numerals. The experimental results show that convolutional neural networks perform well at recognition for the Telugu language. The low recognition rate captured for three digits, viz. 4, 7, and 9, can be improved by introducing new kernel methods; the probable reason for it is the overlapping patterns among them.

***********
