Handwritten Bengali Alphabets, Compound Characters and Numerals Recognition Using CNN-based Approach
Research Article
Received: 13th May 2023; Accepted: 19th June 2023; Published: 1st July 2023
Abstract: Accurately classifying user-independent handwritten Bengali characters and numerals is a formidable recognition challenge. The task is further complicated by the many complex-shaped compound characters and by the diverse writing styles of different authors. Researchers have recently conducted significant research using separate approaches to recognize handwritten Bangla digits, alphabets, and a small number of compound characters. To address this, we propose a straightforward and lightweight convolutional neural network (CNN) framework that accurately categorizes handwritten Bangla simple characters, compound characters, and numerals. The suggested approach outperforms many previously developed procedures, with faster execution times and fewer training epochs, and the model can be applied to multiple datasets. Our proposed CNN-based model achieved impressive validation accuracies on three datasets: 92.48% on the BanglaLekha-Isolated dataset (84 character classes), 97.24% on the Ekush dataset (60 character classes), and 97.03% on a customized dataset (50 character classes). Our model has demonstrated high accuracy and outperformed several prominent existing frameworks.
Keywords: Bangla Handwritten Recognition; Convolutional Neural Network; Deep Learning; Image Classification;
Pattern Recognition
1. Introduction
Bengali is an official language in Bangladesh and in the Republic of India, and approximately 300 million people worldwide speak Bengali as their primary language. The
identification of handwritten Bangla characters and numerals has gained tremendous popularity as a
research field in recent years, primarily due to its numerous practical applications [1]. The vast application
areas of Bengali characters and numeral identification encompass a wide range of domains, including but
not limited to bank cheques, postal addresses, optical image recognition, product expiration dates, post
office automation, number plate reading, image-to-speech conversion with text recognition, Bengali identity
card authentication, and real-time vehicle tracking [2]. Additionally, Bengali character and numeral
identification also find their use in document analysis and recognition, particularly in digitizing
handwritten documents such as historical records, manuscripts, and old texts in the Bengali language. In
the healthcare industry, people also use it for handwritten text transcription, signature verification and
authentication, and processing of handwritten medical records, prescriptions, and patient forms.
Furthermore, with the increasing trend of e-commerce and online transactions, Bengali characters and
numeral identification can help verify customer information, addresses, and payment details. We can use
it to recognize handwritten answers and exam sheets for assessment and grading purposes. The
entertainment industry also benefits from this technology by enabling the creation of Bengali language
content, such as comics and novels, in digital form. It also has applications in linguistics research for
analysing handwritten samples for research purposes. Moreover, we can achieve personalization by
recognizing an individual handwriting style for personalized communication and authentication. Lastly,
integrating Bengali character and numeral identification for text recognition and image processing features
in mobile and web applications is becoming increasingly popular. Identifying Bengali characters and numerals is therefore a critical concern that requires considerable attention for these applications to progress. The size and shape of handwriting vary from individual to individual, making character recognition challenging due to the numerous variations in each character's writing style [3]. The challenges of
identifying Bengali characters far outweigh those of their English counterparts due to the inclusion of
conjunct consonants in the Bengali alphabet, which are more complex to classify because they consist of two
distinctive Bengali characters. The identification of such letters by a learning algorithm can prove
problematic, as they may be identified as either one or the other, leading to inaccuracies.
Furthermore, since many Bengali characters closely resemble one another, differentiating between them can be daunting. For instance, the first vowel "অ" and the second vowel "আ" of the Bengali alphabet differ by only a single vertical line, which presents a significant obstacle to character identification [4]. Although we have
progressed, accurately recognizing complexly formed compound handwritten characters remains a
significant challenge. The deep learning methodology of CNN has made remarkable progress in
distinguishing Bangla handwritten characters by training on vast amounts of raw data to identify
discriminative features [5]. To overcome these challenges, our study addresses the accurate identification of handwritten Bengali characters and numerals using a CNN-based model. We compared our proposed model with existing models commonly used in the
literature, including CNN, MQDF, and ResNet-18, and found that our model outperforms these previous
models with an accuracy rate of 92.48%, 97.24%, and 97.03% for respective datasets. Our proposed model
also showed faster execution times and required fewer epochs than previous models. The main contribution
of our research is introducing a comparatively simple and lightweight CNN-based model composed of 12
sequential layers, including five convolution layers, three pooling layers, and four dense layers. Despite its
uncomplicated structure, our model accurately identified handwritten Bengali characters and numerals
across 84, 60, and 50 classes. This approach provides a valuable alternative to previous, more complex
models and offers improved performance for this task.
2. Related Works
In this section, we describe previous work in the Bangla handwritten recognition research area. Prior research in Bangla handwritten character classification has primarily focused on recognizing handwritten Bangla simple characters and numerals, with fewer studies addressing handwritten compound characters in Bengali. Moreover, most of this research has treated each of these tasks independently.
Rahman et al. [3] suggested a CNN model that recognizes only simple Bangla characters (50 classes) and achieves a testing accuracy of 85.36%. The model performs better at higher iteration counts, closer to 300. Its stack of convolutional and other layers achieves comparatively low accuracy, and a significant limitation is its lower testing accuracy (85.36%) compared to its training accuracy (93.93%), which indicates overfitting to the training data. Our suggested method outperforms this well-known existing method, recognizing a greater number of classes (84 and 60) with fewer iterations (50 epochs). Our method achieved a training accuracy of 97% and a testing accuracy of 96%, a testing accuracy barely distinguishable from the training accuracy.
Pal et al. [6] used the Modified Quadratic Discriminant Function (MQDF) to recognize Bangla compound characters and achieved 85.90 percent accuracy. MQDF is a cutting-edge classifier for handwriting recognition. Despite fitting the training data well, MQDF's generalization performance is poor, as shown by a significant gap between training and test accuracy. We propose a CNN-based model that addresses the limitations of that approach and delivers better performance with higher accuracy and robustness. Our proposed model also handles combined character sets, covering simple characters, compound characters, and numerals.
Purkaystha et al. [7] designed a deep convolutional neural network (DCNN) approach for recognizing
Bengali characters. The model achieves an accuracy of 91.23% for alphabet recognition (50-character
categories) and 89.93% for recognition of almost all Bengali characters (80-character categories). They used
a comparatively complex approach with more layers in their method. In contrast, we propose a comparatively simple and lightweight CNN approach that achieves accuracy rates of 97.03% for alphabets (50 character classes) and 92.48% for almost all Bengali characters (84 character classes). Our method surpasses the previously suggested approach.
Khandokar et al. [5] developed a CNN model trained on 1000 Bengali handwritten characters and tested on 200 handwritten characters, achieving an accuracy of 92.91 percent. Working with such a small amount of data limits the variety of characters covered, and the model overfits as the number of iterations increases, which is a significant limitation for any good deep learning model. Our proposed model overcomes all of these limitations, performs well on a comparatively large dataset, and achieves good accuracy.
Hossain et al. [8] proposed a model covering 60 classes of Bangla handwritten characters that achieves an accuracy of 93.2 percent. This level of accuracy is proportionately good compared to the other existing models mentioned above, and since it recognizes both numerals and simple Bangla handwritten characters, its execution time is a prominent consideration for model fit. However, the model alternates between overfitting and underfitting, which is a significant limitation, and it is therefore insufficient for better performance. Our deep learning model overcomes the previously mentioned limitations and achieves an accuracy of 97.24%. Our training and validation accuracy learning curves indicate that our model is better fitted than the previously mentioned model.
Alif et al. [9] suggested a modified ResNet-18 architecture for recognizing isolated handwritten Bangla
characters belonging to 84 different classes. The model achieves an accuracy of 95.99 percent. ResNet-18 is
a pre-trained deep learning model that uses 72 layers and 18 deep layers, designed for a higher proportion
of feature extraction layers to operate efficiently. However, integrating multiple deep layers into a network
can lead to deterioration of output quality, and the model complexity is high with a large number of
weights. Our proposed approach overcomes these shortcomings and achieves higher accuracy than ResNet-
18 for this research domain.
Rabby et al. [10] investigated a lightweight CNN prototype for identifying handwritten Bengali
numerals. The prototype surpasses all previously used methods with a higher accuracy (99.74%) and a faster
execution time achieved in fewer epochs. Such deep learning methods have shown impressive results in recognizing the Bengali handwritten numeral character set of 10 classes. However, our technique goes further by accurately recognizing almost the entire character set, evaluated separately on 84 classes, 60 classes, and 50 classes.
The models evaluated by Reddy and Raju [11] work on numerals and achieve accuracies of 99.74%, 97.07%, and 99.9%. This is strong performance for numeral recognition, but all of the provided models handle only 10 classes, whereas our model handles far more than 10 classes. Chowdhury et al. [12] proposed a CNN-based method that covers a 50-class character set and achieves an accuracy of 95.25% after 70 epochs. In this scenario, our suggested model requires a shorter execution time with fewer epochs and achieves 97.03% accuracy, which represents a significant achievement of the deep learning approach in this area. Saha et al. [13] stated that a deep learning approach for recognizing Bangla handwritten simple characters of 50 classes achieved an accuracy of 96.40%. The recommended approach consists of six convolution layers, six pooling layers, and two dense layers, for a total of 14 layers. However, our proposed model has only 12 layers, yet it achieves higher accuracy than the model mentioned above.
Roy [4] constructed a deep learning model that works on 84 classes of Bangla handwritten characters, numerals, and compound characters, with an accuracy of 96.40%. This is outstanding performance on isolated character sets and among the best reported for 84 classes of isolated characters. Nevertheless, the current implementation of the model entails twelve convolutional, four pooling, and five fully connected layers. This framework is comparatively more complex and uses more hyperparameters, thus requiring
high-capacity devices and longer processing time. We have designed our proposed CNN model to achieve faster execution by reducing the number of layers to 12, making it less complex and shallower. We expect that our model will attain even greater accuracy with more epochs. Consequently, our model improves on the existing deep learning models and contributes significantly to this research area.
Table 1 summarizes the research gap and critical findings for each study analyzed in this paper. The
table includes the name of the study, its focus, approach, number of classes, achieved accuracy, and the
research gaps addressed. The research gap column indicates the limitations or gaps addressed by each
study. The accuracy column reports the level of accuracy achieved by each model for recognizing handwritten Bengali character sets. The approach column describes the proposed deep learning
approach, while the number of classes column indicates the number of distinct character sets recognized by
the model. This table offers an overview of the contributions of each study and the gap in the literature they
addressed.
Table 1. Summary of existing Handwritten Bengali Character Recognition models and research gap
Study | Focus | Approach | Classes | Accuracy | Research Gap
Rahman et al. [3] | Simple characters | CNN | 50 | 85.36% | Recognizes only simple characters
Pal et al. [6] | Compound characters | MQDF | 20,543 characters | 85.90% | Limited generalization performance
Purkaystha et al. [7] | Bengali character sets | Deep CNN | 80 | 89.93% | Comparatively complex approach
Khandokar et al. [5] | Bengali handwritten characters | CNN | 1000 characters (training) and 200 characters (testing) | 92.91% | Overfitting when the model iteration increases
Hossain et al. [8] | Simple and numeral characters | CNN | 60 | 93.2% | Overfitting and underfitting
Al Rabbani Alif et al. [9] | Isolated handwritten characters | ResNet-18 | 84 | 95.99% | Complexity and limited output quality
Rabby et al. [10] | Handwritten Bengali numeral recognition | Lightweight CNN prototype | 84, 60, 50, 10 | 99.74% | Outperforms existing methods with more classes
Reddy and Raju [11] | Handwritten numeral recognition | LR, SVM, KNN, CNN | 10, 50, 60, 84 | 97.07-99.9% | Performs well in more than 10 classes
Chowdhury et al. [12] | Handwritten character recognition | CNN-based method | 50 | 97.03% | Faster execution time and higher accuracy than the existing method
Saha et al. [13] | Handwritten Bangla simple character recognition | Deep learning approach with 14 layers | 50 | 96.40% | Proposed model achieves higher accuracy with fewer layers
Roy [4] | Handwritten Bangla character, numeral, and compound recognition | Deep learning model with 12 convolutional, 4 pooling, and 5 fully connected layers | 84 | 96.40% | The proposed model is less complex with faster execution time and can achieve greater accuracy with more epochs
3. Background Study
3.1. Convolutional Neural Network
Convolutional Neural Networks (CNNs) are extensively used and have come to dominate machine learning, especially in classifying images and patterns. These networks process images as objects in three-dimensional space, incorporating the notions of width, height, and depth. CNNs have had a revolutionary influence on computer vision [14]. Handwriting recognition is an essential machine learning task with many applications, and various techniques exist to transform text written on physical documents into a machine-readable format. Character recognition technologies can support the move toward a paperless world by facilitating the digitalization and processing of paper-based documents. CNNs are noteworthy as they
mimic neural networks in the human brain and can detect and map spatial relationships in an image to
recognize and classify its content [5]. CNN uses a powerful fitting mechanism to identify unique features
in an image. During training, the network adjusts parameters, costs, and biases to convert input images to
feature vectors, leading to a deeper understanding of image characteristics. García-Ordás et al. [15]
suggested that the CNN process, shown in Figure 1, includes components such as convolution, pooling,
fully connected neural networks, and layers to tackle overfitting/underfitting.
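To make this pipeline concrete, the following minimal Keras sketch (an illustration only, not the model proposed in this paper) passes a dummy 64x64 RGB image through a convolution, pooling, and flattening stage to obtain a feature vector:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Minimal illustration only: one convolution + pooling stage followed by flattening.
demo = models.Sequential([
    layers.Input(shape=(64, 64, 3)),                      # a 64x64 RGB image treated as a 3-D volume
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D(pool_size=(2, 2)),                # halves height and width
    layers.Flatten(),                                     # 32x32x32 volume -> feature vector
])

dummy_image = np.random.rand(1, 64, 64, 3).astype("float32")
features = demo(dummy_image)
print(features.shape)  # (1, 32768)
```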
reshaping of the features using this technique leads to a decrease in dimensionality compared to the original
input.
3.5. Dropout
Connecting all the features to the FC layer often leads to overfitting on the training dataset. Overfitting occurs when a model performs exceptionally well on the training data but degrades when applied to new, previously unseen data. To tackle this problem, the dropout layer disables a fraction of neurons during training, leading to a smaller and more efficient effective model. With a dropout value of 0.25, 25% of the nodes in the neural network are disabled at random. Dropout is an effective technique for improving a deep learning model by mitigating overfitting through simplification of the network; during training, the network temporarily removes neurons via the dropout layer [18].
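As a rough illustration of the mechanism described above (not the paper's implementation), the following NumPy sketch randomly disables 25% of the activations and, in the common inverted-dropout formulation, rescales the survivors:

```python
import numpy as np

def dropout(activations: np.ndarray, rate: float = 0.25, training: bool = True) -> np.ndarray:
    """Inverted dropout: zero out `rate` of the units at random during training."""
    if not training:
        return activations                       # dropout is disabled at inference time
    keep_prob = 1.0 - rate
    mask = np.random.rand(*activations.shape) < keep_prob
    # Scale the surviving activations so their expected sum stays the same.
    return activations * mask / keep_prob

layer_output = np.random.rand(4, 8)              # a toy batch of activations
print(dropout(layer_output, rate=0.25))          # roughly 25% of entries become zero
```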
known for their performance on ImageNet have found similar results: batch normalization is necessary for training very deep networks [19]. We have two alternatives for normalizing our data. The first method is

$$\text{Normpoint}_{normalized} = \frac{\text{Normpoint} - \text{Mean}}{\text{Normpoint}_{max} - \text{Normpoint}_{min}} \tag{5}$$

In Equation 5, Normpoint is the data point being normalized, Mean is the dataset's mean, Normpoint_max is the maximum value, and Normpoint_min is the minimum value. Data inputs often employ this strategy; for large-scale data, the standard deviation is used instead. The second method is

$$\text{Normpoint}_{normalized} = \frac{\text{Normpoint} - \text{Mean}}{\text{std}} \tag{6}$$

In Equation 6, Normpoint is the data point being normalized, Mean is the dataset's mean, and std is the dataset's standard deviation. Each data point then resembles a draw from a standard normal distribution. None of the features on this scale will be biased, so our models will learn better.
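The two normalization schemes of Equations 5 and 6 can be sketched in a few lines of NumPy (the variable names are illustrative assumptions, not the authors' code):

```python
import numpy as np

data = np.array([12.0, 45.0, 7.0, 33.0, 21.0])   # toy feature values

# Equation 5: mean normalization with the feature's range as the scale.
range_normalized = (data - data.mean()) / (data.max() - data.min())

# Equation 6: standardization, so each value resembles a standard normal variate.
std_normalized = (data - data.mean()) / data.std()

print(range_normalized)
print(std_normalized)
```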
▪ Featurewise_center=False: when enabled, sets the input mean to zero over the entire dataset (feature-wise); disabled here.
▪ Samplewise_center=False: when enabled, sets each sample's mean to zero; disabled here.
▪ Featurewise_std_normalization=False: when enabled, divides the inputs by the standard deviation of the dataset (feature-wise); disabled here.
▪ Samplewise_std_normalization=False: when enabled, divides each input by its own standard deviation (sample-wise); disabled here.
▪ Zca_whitening=False: when enabled, applies ZCA whitening; disabled here.
▪ Rotation_range: we randomly rotate training images by up to 15 degrees.
▪ Zoom_range: we randomly zoom training images by up to 1%.
▪ Height_shift_range and width_shift_range: we introduce variation in the dataset by randomly shifting the images in both height and width by up to 10%.
▪ Horizontal_flip and vertical_flip: random flipping of images, both horizontal and vertical, is set to false.
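Assuming the standard Keras ImageDataGenerator API, the augmentation settings listed above correspond roughly to the following sketch; the exact values in the authors' code may differ:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation configuration mirroring the parameter list above.
datagen = ImageDataGenerator(
    featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    zca_whitening=False,
    rotation_range=15,        # rotate images by up to 15 degrees
    zoom_range=0.01,          # zoom by up to 1%
    width_shift_range=0.1,    # shift horizontally by up to 10%
    height_shift_range=0.1,   # shift vertically by up to 10%
    horizontal_flip=False,
    vertical_flip=False,
)

# datagen.fit(x_train) is only needed when feature-wise statistics are enabled;
# training batches are then drawn with datagen.flow(x_train, y_train, batch_size=...).
```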
layer. Additionally, we implemented a dropout layer in the model. The final dense layer serves as the output of the model, providing the results of the classification process; its unit size equals the total number of classes to be classified, and softmax is its activation function. In our research, the suggested deep learning framework comprises a total of 9,121,300 parameters, of which 9,120,404 are trainable and the remaining 896 are non-trainable. Table 3 enumerates all the parameters required to establish our proposed model for classifying handwritten Bengali alphabets, numerals, and compound characters.
Table 3. Internal parameters of the proposed CNN model
Type of layer | Output Shape | Parameters
Convolution 2D layer | (None, height=64, width=64, filters=64) | 1792
Convolution 2D layer | (None, height=64, width=64, filters=64) | 36928
MaxPooling 2D layer | (None, height=32, width=32, filters=64) | 0
Batch normalization layer | (None, height=32, width=32, filters=64) | 256
Dropout layer | (None, height=32, width=32, filters=64) | 0
Convolution 2D layer | (None, height=32, width=32, filters=128) | 73856
Convolution 2D layer | (None, height=32, width=32, filters=128) | 147584
MaxPooling 2D layer | (None, height=16, width=16, filters=128) | 0
Batch normalization layer | (None, height=16, width=16, filters=128) | 512
Dropout layer | (None, height=16, width=16, filters=128) | 0
Convolution 2D layer | (None, height=16, width=16, filters=256) | 295168
MaxPooling 2D layer | (None, height=8, width=8, filters=256) | 0
Batch normalization layer | (None, height=8, width=8, filters=256) | 1024
Dropout layer | (None, height=8, width=8, filters=256) | 0
Flatten layer | (None, 16384) | 0
Fully connected layer | (None, units=512) | 8389120
Fully connected layer | (None, units=256) | 131328
Fully connected layer | (None, units=128) | 32896
Dropout layer | (None, units=128) | 0
Fully connected layer | (None, units=84) | 10836
Figure 2. CNN architecture for Bangla handwritten character recognition. Here conv stands for Convolutional layer
Figure 2 depicts the architecture of our stated CNN model. It takes in an RGB image with dimensions
of 64x64x3 as input. The initial two layers employ 64 filters of a 3x3 kernel size with the same padding,
resulting in a volume of 64x64x64. Then we employed a pooling layer with a max-pooling technique using
a pool size of 2x2. This operation resulted in a reduction of the height and width of the volume from
64x64x64 to 32x32x64. Subsequently, two additional convolutional layers with 128 filters each were applied,
resulting in a new volume dimension of 32x32x128. Following the utilization of the pooling layer, the
volume underwent a reduction, resulting in a new dimension of 16x16x128. An additional convolutional
layer was incorporated, consisting of 256 filters, resulting in a new dimension of 16x16x256. After this, a
final pooling layer was applied, resulting in a new feature map dimension of 8x8x256. We flattened the
output, resulting in a feature vector with dimensions of (1x1x16384). The model utilized four fully connected
layers. The first layer received input from the final feature vector and produced an output vector with
dimensions of (1x1x512). The second and third layers generated output vectors with dimensions of
(1x1x256) and (1x1x128), respectively. Finally, the fourth layer produced an output of 84 channels to
correspond to the 84 classes. The model utilized a fourth fully connected layer, which implemented the
softmax function to classify the 84 classes. The activation function used for all the hidden layers was relu.
This architecture is commonly used for image classification tasks and has achieved high accuracy on various
datasets.
4.3.2. Proposed Model Algorithm
Algorithm 1. Bangla handwritten character recognition for CNN algorithm
Initialize model Model = Sequential ()
1 Convolution2D → Conv2D (filter, kernel dimensions, activation, padding, input shape)
2 Convolution2D → Conv2D (filter, kernel dimension, activation, padding, input shape)
3 Max pooling → MaxPooling2D (pool dimensions)
4 Batch Normalization ()
5 Dropout (rate)
6 Convolution2D → Conv2D (filter, kernel dimensions, activation, padding)
7 Convolution2D → Conv2D (filter, kernel dimensions, activation, padding)
8 Max pooling → MaxPooling2D (pool dimensions)
9 Batch Normalization ()
10 Dropout (rate)
11 Convolution2D → Conv2D (filter, kernel dimensions, activation, padding)
12 Max pooling → MaxPooling2D (pool dimensions)
13 Batch Normalization ()
14 Dropout (rate)
15 Flatten ()
16 Dense → Dense (Units, activation, regularizer=regularizers.l2(regularization rate))
17 Dense → Dense (Units, activation, regularizer=regularizers.l2(regularization rate))
18 Dense → Dense (Units, activation, regularizer=regularizers.l2(regularization rate))
19 Dropout (rate)
20 Dense → Dense (total classification classes, activation="softmax")
End
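For reference, Algorithm 1 can be realized with the Keras Sequential API as in the sketch below. It uses the filter counts, pool sizes, dense units, and dropout rate reported in Table 3 and Section 3.5, together with the L2 regularization rate and learning rate from Section 4.4.3. This is an illustrative reconstruction rather than the authors' released code (the per-layer dropout rates and the categorical cross-entropy loss are assumptions), although it reproduces the parameter counts of Table 3 (9,121,300 total, 896 non-trainable).

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers, optimizers

NUM_CLASSES = 84          # 60 or 50 for the other datasets
L2_RATE = 0.001           # kernel regularization rate stated in Section 4.4.3
LEARNING_RATE = 0.00004   # learning rate stated in Section 4.4.3

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    # Block 1: two 3x3 convolutions with 64 filters, then pooling, batch norm, dropout.
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.BatchNormalization(),
    layers.Dropout(0.25),
    # Block 2: two 3x3 convolutions with 128 filters.
    layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.BatchNormalization(),
    layers.Dropout(0.25),
    # Block 3: one 3x3 convolution with 256 filters.
    layers.Conv2D(256, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.BatchNormalization(),
    layers.Dropout(0.25),
    # Classifier head: flatten to 8x8x256 = 16384 features, then four dense layers.
    layers.Flatten(),
    layers.Dense(512, activation="relu", kernel_regularizer=regularizers.l2(L2_RATE)),
    layers.Dense(256, activation="relu", kernel_regularizer=regularizers.l2(L2_RATE)),
    layers.Dense(128, activation="relu", kernel_regularizer=regularizers.l2(L2_RATE)),
    layers.Dropout(0.25),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=optimizers.Adam(learning_rate=LEARNING_RATE),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()   # total parameters: 9,121,300 (9,120,404 trainable, 896 non-trainable)
```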
the loss function. In training deep CNN models, the optimizer plays a vital role in performing iterative
updates on the parameters of all layers required to optimize the neural network’s performance. This
research’s crucial emphasis is evaluating the impact of Adam, RMSProp, and stochastic gradient descent
optimizers on our dataset [24].
4.4.3. Learning Rate
The learning rate is a significant hyperparameter in a neural network; it controls the size of weight adjustments based on the loss gradient and determines how quickly the network updates what it has learned. Selecting an appropriate value is difficult, as too small a value can lead to prolonged training, while too large a value can cause unstable training or convergence to poor weights. It is an essential hyperparameter to consider during configuration, and optimizers such as Adam and RMSprop can adapt the learning rate schedule during each training session [21, 25]. We set the learning rate to 0.00004 and the kernel regularization rate to 0.001 in this model.
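A minimal sketch of how the three optimizers under comparison could be instantiated with this learning rate in Keras (illustrative; the authors' training script may differ):

```python
from tensorflow.keras import optimizers

LEARNING_RATE = 0.00004   # value stated above

# The three optimizers whose impact is compared in this work.
candidate_optimizers = {
    "adam": optimizers.Adam(learning_rate=LEARNING_RATE),
    "rmsprop": optimizers.RMSprop(learning_rate=LEARNING_RATE),
    "sgd": optimizers.SGD(learning_rate=LEARNING_RATE),
}

# Each candidate would be passed to model.compile(optimizer=..., ...) in turn
# and the resulting training/validation curves compared.
```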
6.1. Precision
Precision is the ratio of true positives (TP) to the total number of predicted positives (TP + false
positives (FP)). In other words, precision is the proportion of true positive predictions out of all positive
predictions made by the model. A high precision indicates that the model is good at identifying positive
cases and has a low false positive rate. Precision is determined by the following formula:

$$\text{Precision} = \frac{\sum_{k=1}^{m} |G_k \cap E_k|}{\sum_{k=1}^{m} |E_k|} \tag{7}$$
6.2. Recall
Recall, also known as sensitivity or true positive rate, is the ratio of true positives (TP) to the total
number of actual positives (TP + false negatives (FN)). In other words, recall is the proportion of positive
cases the model correctly identified as positive. A high recall indicates that the model is good at identifying
all positive cases, including those that are hard to detect. We use the following mathematical expressions to
compute recall for a classification model.
$$\text{Recall} = \frac{\sum_{k=1}^{m} |G_k \cap E_k|}{\sum_{k=1}^{m} |G_k|} \tag{8}$$
where G_k and E_k denote the gold-standard targets and the model's predicted labels for the k-th word W_k ∈ W. The intersection of the gold-standard targets and the model's predicted labels for a given word M_k ∈ M is defined as

$$G_k \cap E_k := \{E \in E_k \mid \exists G \in G_k,\ \text{Match}(G, E)\} \tag{9}$$
6.3. Fα Score
The Fα score is a weighted harmonic mean of precision and recall, where α is a positive constant that controls the trade-off between precision and recall. When α is set to 1, the Fα score becomes the harmonic mean of precision and recall, commonly known as the F1 score. The Fα score is calculated using the following formula:

$$F_{\alpha} = \frac{(1+\alpha^2) \times P \times R}{(\alpha^2 \times P) + R} \tag{10}$$

where P denotes the precision and R denotes the recall. The Fα score is a useful metric for evaluating a classification model's overall performance, as it considers both precision and recall and allows for a customizable balance between the two.
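For illustration, the metrics in Equations 7, 8, and 10 can be computed for a multi-class classifier with scikit-learn as sketched below (the labels shown are hypothetical):

```python
from sklearn.metrics import precision_score, recall_score, fbeta_score

# Toy labels for a 4-class problem (hypothetical values for illustration).
y_true = [0, 1, 2, 3, 2, 1, 0, 3]
y_pred = [0, 1, 2, 2, 2, 1, 0, 3]

# Macro-averaging treats every character class equally.
precision = precision_score(y_true, y_pred, average="macro")
recall = recall_score(y_true, y_pred, average="macro")
f1 = fbeta_score(y_true, y_pred, beta=1, average="macro")   # alpha = 1 gives the F1 score

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```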
effectiveness over a corpus. We consider a prediction positive if top-L predictions match any outcome in
the desired corpus. To determine whether a prediction is positive, use the following formula.
$$G(x) = \begin{cases} 1, & \text{if } Est(y) = p(x) = L \in W \\ 0, & \text{otherwise} \end{cases} \tag{13}$$
Similar to the Match Value metric, the Update Accuracy measures the ratio of total positive predictions
to the instances in the corpus. A higher Update Accuracy score indicates better model performance. To
compute the Update Accuracy score, use the following formula.
$$\text{Update Score} = \frac{\sum_{1}^{PPN} G(x)}{PPN} \tag{14}$$
The sum of G(x) from 1 to PPN represents the count of accurate positive predictions, where PPN is the
number of instances in the corpus.
Here, P_j^(i) are usually the activations of a fully connected layer, expressed as

$$P_j^{(i)} = \text{Weight}_j^{N} C^{(i)} + Q_j \tag{16}$$
The softmax function turns raw prediction scores into a probability distribution over classes by making
them non-negative and normalizing them. We then use this distribution to compute the softmax loss, which
measures the difference between predicted and actual label distributions. The goal is to minimize this
difference and improve the classifier's accuracy:
$$LOSS = -\frac{1}{PPN}\left[\sum_{k=1}^{PPN}\sum_{l=1}^{M} \mathbf{1}\{y^{(k)} = l\}\,\log M_l^{(k)}\right] \tag{17}$$
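A small NumPy sketch of this computation, turning raw scores into softmax probabilities and averaging the cross-entropy over PPN instances as in Equation 17 (illustrative only):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn raw scores into a probability distribution per row (numerically stable)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def softmax_loss(logits: np.ndarray, labels: np.ndarray) -> float:
    """Average cross-entropy between predicted distributions and true class labels."""
    probs = softmax(logits)
    ppn = logits.shape[0]                        # number of instances
    # Pick the predicted probability of the true class for each instance.
    true_class_probs = probs[np.arange(ppn), labels]
    return float(-np.mean(np.log(true_class_probs)))

logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])   # toy scores for 3 classes
labels = np.array([0, 1])                                # true class indices
print(softmax_loss(logits, labels))
```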
state-of-the-art models and examining any potential limitations or challenges encountered during the
model creation and training process.
Figure 3. (a) Training and validation accuracy for the BanglaLekha dataset; (b) training and validation loss for the BanglaLekha dataset
Figure 4. (a) Training and validation accuracy for the Ekush dataset; (b) training and validation loss for the Ekush dataset
Figure 5. (a) Training and validation accuracy for the custom dataset; (b) training and validation loss for the custom dataset
In the following comparison, Table 6 presents an evaluation of various deep learning models utilized
for Bangla handwritten character recognition. The study assesses models including CNN, MQDF, and
ResNet-18 that other researchers have previously proposed. In addition, the proposed model is also
evaluated and compared to these models in terms of accuracy and limitations.
Our proposed model outperforms previous models with accuracies of 92.48%, 97.24%, and 97.03%,
respectively, and shows faster execution times and fewer required epochs. Limitations of previous models
include low accuracy, restrictions to certain types of characters, overfitting, underfitting, high complexity,
and large numbers of weights. The comparison table comprehensively evaluates various deep-learning
models for Bangla handwritten character recognition.
Table 6. Comparison of the proposed model with other existing different deep learning models
Research paper | Proposed model | Classes | Validation Accuracy | Necessary comments
Rahman et al. [3] | CNN | 50 | 85.36% | Low accuracy, requires high iterations
Pal et al. [6] | MQDF | 20,543 characters | 85.90% | Limited to compound characters; generalization performance not encouraging
Khandokar et al. [5] | CNN | 1200 characters | 92.91% | Overfitting when iterations increase
Hossain et al. [8] | CNN | 60 | 93.2% | Overfitting and underfitting
Al Rabbani Alif et al. [9] | ResNet-18 | 84 | 95.99% | High complexity, a large number of weights, not suitable for this specific research domain
Saha et al. [13] | Deep learning | 50 | 96.40% | Six layers of convolution, six layers of pooling, and two dense layers; more experimental results needed
Roy [4] | Deep learning | 84 | 96.40% | Twelve convolutional layers, four pooling layers, and five fully connected layers; requires high-capacity devices and longer processing time
Our Proposed Model | CNN | 84, 60, 50 | 92.48%, 97.24%, 97.03% | A lightweight model with 12 layers; outperforms previous approaches with faster execution times and fewer epochs required
Figure 6. (a) Graphical user interface built in Python; (b) the model reading an image from the interface
Acknowledgment
We thank the Department of Information and Communication Engineering, Pabna University of
Science and Technology for assisting with this research.
References
[1] Nishatul Majid and Elisa H. Barney Smith, “Introducing the Boise State Bangla Handwriting Dataset and an
Efficient Offline Recognizer of Isolated Bangla Characters”, in Proceedings of the 16th IEEE International Conference
on Frontiers in Handwriting Recognition 2018 (ICFHR '18), 05-08 August 2018, Niagara Falls, NY, USA, E-ISBN:978-
1-5386-5875-8, Print on Demand(PoD) ISBN: 978-1-5386-5876-5, DOI: 10.1109/ICFHR-2018.2018.00073, pp. 380-
385, Published by IEEE, Available: https://ieeexplore.ieee.org/abstract/document/8583791.
[2] Asfi Fardous and Shyla Afroge, “Handwritten Isolated Bangla Compound Character Recognition”, in Proceedings
of the IEEE International Conference on Electrical, Computer and Communication Engineering 2019 (ECCE '19), 07-09
February 2019, Cox's Bazar, Bangladesh, E-ISBN:978-1-5386-9111-3, Print on Demand(PoD) ISBN: 978-1-5386-
9112-0, DOI: 10.1109/ECACE.2019.8679258, pp. 01-05, Published by IEEE, Available:
https://ieeexplore.ieee.org/abstract/document/8679258.
[3] Md. Mahbubar Rahman, M. A. H. Akhand, Shahidul Islam, Pintu Chandra Shill and M. M. Hafizur Rahman,
“Bangla Handwritten Character Recognition using Convolutional Neural Network”, International Journal of Image,
Graphics and Signal Processing, Print ISSN: 2074-9074, Online ISSN: 2074-9082, pp. 42–49, Vol. 7, No. 8, 8th July 2015,
Published by MECS Press, DOI: 10.5815/ijigsp.2015.08.05, Available: https://www.scinapse.io/papers/755956977.
[4] Akash Roy, “AKHCRNet: Bengali handwritten character recognition using deep learning”, Computing Research
Repository, Online ISSN: 2331-8422, Vol. 2008.12995, 23rd January 2021, DOI: 10.48550/arXiv.2008.12995, Available:
https://dblp.org/rec/journals/corr/abs-2008-12995.
[5] I Khandokar, Md M Hasan, F Ernawan, Md S Islam and M N Kabir, “Handwritten character recognition using
convolutional neural network”, in Proceedings of the 7th International Conference on Mathematics, Science, and
Education 2020 (ICMSE '20), 06 October 2020, Semarang, Indonesia, vol. 1918, no. 4, p. 042152, DOI: 10.1088/1742-
6596/1918/4/042152, Published by IOP, Available: https://iopscience.iop.org/article/10.1088/1742-
6596/1918/4/042152/meta.
[6] U. Pal, T. Wakabayashi and F. Kimura, “Handwritten Bangla Compound Character Recognition Using Gradient
Feature”, in Proceedings of the 10th IEEE International Conference on Information Technology 2007 (ICIT '07), 17-20
December 2007, Rourkela, India, Print ISBN: 0-7695-3068-0, DOI: 10.1109/ICIT.2007.62, pp. 208-213, Published by
IEEE, Available: https://ieeexplore.ieee.org/abstract/document/4418297.
[7] Bishwajit Purkaystha, Tapos Datta and Md Saiful Islam, “Bengali handwritten character recognition using deep
convolutional neural network”, in Proceedings of the 20th IEEE International Conference of Computer and Information
Technology 2017 (ICCIT '17), 22-24 December 2017, Dhaka, Bangladesh, E-ISBN:978-1-5386-1150-0, Print on
Demand(PoD) ISBN: 978-1-5386-1151-7, USB ISBN: 978-1-5386-1149-4, DOI:10.1109/ICCITECHN.2017.8281853,
pp. 01-05, Published by IEEE, Available: https://ieeexplore.ieee.org/abstract/document/8281853.
[8] Md. Anwar Hossain, Mirza A. F. M. Rashidul Hasan, A. F. M. Zainul Abadin and Nafiul Fatta, “Bangla
Handwritten Characters Recognition Using Convolutional Neural Network”, Australian Journal of Engineering and
Innovative Technology, Print ISSN: 2663-7790, Online ISSN: 2663-7804, pp. 27–31, Vol. 4, No. 2, 31st March 2022,
Published by UniversePG, DOI:10.34104/ajeit.022.027031, Available: https://universepg.com/journal-details/317.
[9] Mujadded Al Rabbani Alif, Sabbir Ahmed and Muhammad Abul Hasan, “Isolated Bangla handwritten character
recognition with convolutional neural network”, in Proceedings of the 20th IEEE International Conference of Computer
and Information Technology 2017 (ICCIT '17), 22-24 December 2017, Dhaka, Bangladesh, E-ISBN:978-1-5386-1150-0,
Print on Demand(PoD) ISBN: 978-1-5386-1151-7, DOI:10.1109/ICCITECHN.2017.8281823, pp. 01-06, Published by
IEEE, Available: https://ieeexplore.ieee.org/abstract/document/8281823.
[10] AKM Shahariar Azad Rabby, Sheikh Abujar, Sadeka Haque and Syed Akhter Hossain, "Bangla Handwritten Digit
Recognition Using Convolutional Neural Network", In Advances in Intelligent Systems and Computing: Emerging
Technologies in Data Mining and Information Security, Singapore: Springer Nature, 2018, Vol. 755, pp 111–122, Print
ISBN: 978-981-13-1950-1, Online ISBN: 978-981-13-1951-8, DOI: 10.1007/978-981-13-1951-8_11, Available:
https://link.springer.com/chapter/10.1007/978-981-13-1951-8_11.
[11] R. Pradeep Kumar Reddy and C. Naga Raju, “Comparative Analysis of Handwritten Digit Recognition Using Logistic Regression, SVM, KNN and CNN Algorithms”, Journal of Science and Technology, ISSN: 2456-5660, pp. 94-102, Vol. 6, No. 6, November-December 2021, Published by Longman Publishers, DOI: 10.46243/jst.2021.v6.i06.pp94-102, Available: https://jst.org.in/previous-issue.php?id=40.
[12] Rumman Rashid Chowdhury, Mohammad Shahadat Hossain, Raihan ul Islam, Karl Andersson and Sazzad
Hossain, “Bangla Handwritten Character Recognition using Convolutional Neural Network with Data
Augmentation”, in Proceedings of the 8th IEEE International Conference on Informatics, Electronics & Vision 2019 (ICIEV
'19) and 3rd International Conference on Imaging, Vision & Pattern Recognition 2019 (icIVPR '19), 30 May 2019 - 02
June 2019, Spokane, WA, USA, E-ISBN:978-1-7281-0788-2, Print on Demand(PoD) ISBN: 978-1-7281-0789-9, Print
ISBN: 978-1-7281-0786-8, DOI: 10.1109/ICIEV.2019.8858545, pp. 318-323, Published by IEEE, Available:
https://ieeexplore.ieee.org/abstract/document/8858545.
[13] Chandrika Saha, Rahat Hossain Faisal and Md. Mostafijur Rahman, “Bangla Handwritten Basic Character
Recognition Using Deep Convolutional Neural Network”, in Proceedings of the 8th IEEE International Conference on
Informatics, Electronics & Vision 2019 (ICIEV '19) and 3rd International Conference on Imaging, Vision & Pattern
Recognition 2019 (icIVPR '19), 30 May 2019 - 02 June 2019, Spokane, WA, USA, E-ISBN:978-1-7281-0788-2, Print on
Demand(PoD) ISBN: 978-1-7281-0789-9, Print ISBN: 978-1-7281-0786-8, DOI: 10.1109/ICIEV.2019.8858575, pp. 190-
195, Published by IEEE, Available: https://ieeexplore.ieee.org/abstract/document/8858575.
[14] Tanuja Kumari, Yatharth Vardan, Prashant Giridhar Shambharkar and Yash Gandhi, “Comparative Study on
Handwritten Digit Recognition Classifier Using CNN and Machine Learning Algorithms”, in Proceedings of the
6th IEEE International Conference on Computing Methodologies and Communication 2022 (ICCMC '22), 29-31 March
2022, Erode, India, E-ISBN:978-1-6654-1028-1, Print on Demand(PoD) ISBN: 978-1-6654-1029-8, DVD ISBN: 978-1-
6654-1027-4, DOI: 10.1109/ICCMC53470.2022.9753756, pp. 882-888, Published by IEEE, Available:
https://ieeexplore.ieee.org/abstract/document/9753756.
[15] María Teresa García-Ordás, José Alberto Benítez-Andrades, Isaías García-Rodríguez, Carmen Benavides and
Héctor Alaiz-Moretón, “Detecting Respiratory Pathologies Using Convolutional Neural Networks and
Variational Autoencoders for Unbalancing Data”, Sensors, ISSN: 1424-8220, p. 1214, Vol. 20, No. 4, 22nd February
2020, Published by MDPI, DOI: 10.3390/s20041214, Available: https://www.mdpi.com/1424-8220/20/4/1214.
[16] Md Zahangir Alom, Paheding Sidike, Tarek M. Taha and Vijayan K. Asari, “Handwritten Bangla Character
Recognition Using the State-of-the-Art Deep Convolutional Neural Networks”, Computational Intelligence and
Neuroscience, Print ISSN: 1687-5265, Online ISSN: 1687-5273, pp. 1-13, Vol. 2018, 27th August 2018, Published by
Hindawi, DOI: 10.1155/2018/6747098, Available: https://doi.org/10.1155/2018/6747098.
[17] Dominik Scherer, Andreas Müller and Sven Behnke, “Evaluation of Pooling Operations in Convolutional
Architectures for Object Recognition”, in Lecture Notes in Computer Science (LNTCS), vol. 6354, Online ISBN: 978-
3-642-15825-4, Print ISBN: 978-3-642-15824-7, Series Print ISSN: 0302-9743, Series Online ISSN: 1611-3349, DOI:
10.1007/978-3-642-15825-4_10, pp. 92–101, 2010, Published by Springer-Verlag, Available:
https://link.springer.com/chapter/10.1007/978-3-642-15825-4_10.
[18] Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy et al., “Recent advances in convolutional
neural networks”, Pattern Recognition, pp. 354–377, Vol. 77, 1st May 2018, DOI: 10.1016/j.patcog.2017.10.013,
Available: https://www.sciencedirect.com/science/article/abs/pii/S0031320317304120.
[19] Vignesh Thakkar, Suman Tewary and Chandan Chakraborty, “Batch Normalization in Convolutional Neural
Networks — A comparative study with CIFAR-10 data”, in Proceedings of the 5th IEEE International Conference on
Emerging Applications of Information Technology 2018 (EAIT '18), 12-13 January 2018, Kolkata, India, E-ISBN:978-1-
5386-3719-7, Print on Demand(PoD) ISBN: 978-1-5386-3720-3, DOI: 10.1109/EAIT.2018.8470438, pp. 01-05,
Published by IEEE, Available: https://ieeexplore.ieee.org/abstract/document/8470438.
[20] AKM Shahariar Azad Rabby, Sadeka Haque, Md. Sanzidul Islam, Sheikh Abujar and Syed Akhter Hossain,
“Ekush: A Multipurpose and Multitype Comprehensive Database for Online Off-Line Bangla Handwritten
Characters”, In Communications in Computer and Information Science: Recent Trends in Image Processing and Pattern
Recognition, Singapore: Springer Nature, 17th July 2019, Vol. 1037, ch. 14, pp 149–158, Print ISBN: 978-981-13-9186-
6, Online ISBN: 978-981-13-9187-3, DOI: 10.1007/978-981-13-9187-3_14, Available:
https://link.springer.com/chapter/10.1007/978-981-13-9187-3_14.
[21] Mithun Biswas, Rafiqul Islam, Gautam Kumar Shom, Md Shopon, Nabeel Mohammed et al., “Banglalekha-
isolated: A Multi-purpose comprehensive dataset of Handwritten Bangla Isolated Characters”, Data in Brief, ISSN:
2352-3409, pp. 103-107, Vol. 12, 29th March 2017, Published by Elsevier, DOI: 10.1016/j.dib.2017.03.035, Available:
https://doi.org/10.1016/j.dib.2017.03.035.
[22] Luis Perez and Jason Wang, “The effectiveness of data augmentation in image classification using deep learning”,
Computing Research Repository, Online ISSN: 2331-8422, Vol. 1712.04621, 13th August 2018, DOI:
10.48550/arXiv.1712.04621, Available: https://dblp.org/rec/journals/corr/abs-1712-04621.
[23] Savita Ahlawat, Amit Choudhary, Anand Nayyar, Saurabh Singh and Byungun Yoon, “Improved Handwritten
Digit Recognition Using Convolutional Neural Networks (CNN)”, Sensors, ISSN: 1424-8220, p. 3344, Vol. 20, No.
12, 12th June 2020, Published by MDPI, DOI: 10.3390/s20123344, Available: https://www.mdpi.com/1424-
8220/20/12/3344.
[24] Md. Rajibul Islam, Md. Asif Mahmod Tusher Siddique, Md Amiruzzaman, M. AbdullahAl-Wadud, Shah Murtaza
Rashid Al Masud et al., “An Efficient Technique for Recognizing Tomato Leaf Disease Based on the Most Effective
Deep CNN Hyperparameters”, Annals of Emerging Technologies in Computing (AETiC), Print ISSN: 2516-0281,
Online ISSN: 2516-029X, pp. 1–14, Vol. 7, No. 1, 1st January 2023, Published by International Association for
Educators and Researchers (IAER), DOI: 10.33166/aetic.2023.01.001, Available:
http://aetic.theiaer.org/archive/v7/v7n1/p1.html.
[25] S M Azizul Hakim and Asaduzzaman, “Handwritten Bangla Numeral and Basic Character Recognition Using
Deep Convolutional Neural Network”, in Proceedings of the IEEE International Conference on Electrical, Computer
and Communication Engineering 2019 (ECCE '19), 07-09 February 2019, Cox's Bazar, Bangladesh, E-ISBN: 978-1-
5386-9111-3, Print on Demand(PoD) ISBN: 978-1-5386-9112-0, DOI: 10.1109/ECACE.2019.8679243, pp. 01-06,
Published by IEEE, Available: https://ieeexplore.ieee.org/abstract/document/8679243.