


2019 International Conference on Sustainable Technologies for
Industry 4.0 (STI), 24-25 December, Dhaka

A New Benchmark on American Sign Language Recognition using Convolutional Neural Network

Md. Moklesur Rahman∗, Md. Shafiqul Islam†, Md. Hafizur Rahman‡, Roberto Sassi§, Massimo W. Rivolta¶ and Md. Aktaruzzaman∥

∗† Dept. of Computer Science and Eng., The People's University of Bangladesh, Dhaka, Bangladesh.
‡ Dept. of Electrical and Electronic Eng., Islamic University, Kushtia, Bangladesh.
§¶ Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milano, Italy.
∥ Dept. of Computer Science and Eng., Islamic University, Kushtia, Bangladesh.

{∗moklesur.ai, †msislam.iu, ‡hafizur.iueee}@gmail.com, {§roberto.sassi, ¶massimo.rivolta}@unimi.it and ∥aktaruzzaman@iu.ac.bd

Abstract—People with hearing or speech impairments communicate among themselves using a set of signs, called sign language, instead of speech. However, it is very challenging for non-signers to communicate with this community using signs. It is therefore necessary to develop applications that recognize the gestures or actions of sign languages, to ease communication between the hearing and the deaf communities. American Sign Language (ASL) is one of the most widely used sign languages in the world and, considering its importance, methods for ASL recognition with limited accuracy already exist. The objective of this study is to propose a novel model that improves on the accuracy of these existing methods. The study has been performed on the alphabet and numerals of four publicly available ASL datasets. After preprocessing, the images of the alphabet and numerals were fed to a newly proposed convolutional neural network (CNN) model, and the performance of this model was evaluated for recognizing the numerals and alphabet of these datasets. The proposed CNN model significantly (by about 9%) improves the recognition accuracy of ASL reported by some existing prominent methods.

Index Terms—Hand gesture, American Sign Language, Convolutional neural network, Recognition, ASL.

I. INTRODUCTION

According to the World Health Organization (WHO) [1], the number of people with a hearing disability increased from 278 million in 2005 to 466 million in early 2018, and it is projected to exceed 900 million by 2050 [1]. The deaf community uses a set of signs to express their language (called sign language), which differs between nations. In other words, a sign language (SL) is a nonverbal communication language that utilizes visual sign patterns made with the hands or other parts of the body, used primarily by people with a hearing and/or speech disability. Sign languages (SLs) are full-fledged natural languages with their own lexicon and grammar. Different SLs, such as American Sign Language (ASL), Australian Sign Language, British Sign Language (BSL), Danish Sign Language, French Sign Language, and many others, have been developed for deaf communities. Although there are some striking similarities among SLs, they are neither mutually intelligible nor universal. For example, ASL and BSL are different, even though both communities share the same verbal language. Hearing people find it extremely difficult to understand even the sign language of their own nation. Hence, trained SL interpreters are needed during medical and legal appointments, educational and training sessions, etc. The automatic recognition of an SL and its translation into a natural language can establish a proper communication interface between hearing-impaired and hearing people.

ASL also predominates as a second language in the deaf communities of the United States and Canada [2]. According to the National Association of the Deaf (NAD) [3] in the United States of America, ASL is accepted by many high schools, colleges, and universities in fulfillment of modern and foreign language academic degree requirements. Beyond North America, ASL is also used in many countries across the world, including parts of Southeast Asia and much of West Africa.

Some works [4]–[6] have already been reported in the literature for the automatic recognition of ASL. Some of these methods were studied on small datasets of only a few samples, and some used traditional shallow neural networks for classification. Shallow neural networks require manual identification and selection of relevant features. The use of deep learning (DL) techniques has significantly improved on the performance of traditional shallow neural networks, especially for image recognition and computer vision problems. DL is a subfield of machine learning in artificial intelligence (AI): a set of algorithms and models that build high-level abstractions through architectures composed of multiple nonlinear transformations. DL algorithms utilize large amounts of data to extract features automatically, aiming to emulate the human brain's ability to learn, analyze, observe, and make inferences, especially for extremely difficult problems. DL architectures create relationships beyond immediate neighbors in the data, generate learning patterns, and extract representations directly from data without human intervention.



There are different deep learning architectures, such as deep belief networks [7], stacked autoencoders [8], convolutional neural networks [9], and so on. Among them, CNNs employ multi-layered artificial neural networks (ANNs) to provide state-of-the-art accuracy in computer vision, medical image analysis, speech recognition, bioinformatics, and other fields.

A convolutional neural network (CNN), one of the most popular deep learning algorithms, comprising convolutional layers followed by one or more fully connected layers, was proposed by Vaillant et al. [10]. From a computer science perspective, a CNN is a set of digital filters whose weights are estimated during the learning phase. Naturally, far more complex processes occur in the human brain; following this analogy, each convolutional layer extracts features from the training data. A CNN convolves learned features with the input data, and this use of convolutional layers makes the architecture well suited to processing images. CNNs learn to detect different features of the data using multiple hidden layers, and every hidden layer increases the complexity of the learned features. The problems with the existing methods for ASL recognition are that most report results on one specific dataset, and the methods are rarely compared on a common dataset. Consequently, we cannot determine to what degree one method is better or worse than another, or which dataset possesses enough variation in its samples to sufficiently train a classification model.
To address these problems, we have considered four publicly available ASL datasets on which a number of good works have already been reported. We have studied the performance of the proposed model on each dataset, trained on the training set and tested on the test set where separate train and test sets are available, and using 10-fold cross-validation otherwise. The performance of the proposed method has also been assessed in a cross-dataset setting, i.e., trained on one dataset and tested on a different one. In addition, the performance has been compared with previous methods on the same datasets.
The rest of this paper is organized as follows: a brief description of related works reported in the literature is given in Section II, and the datasets considered in this study are described in Section III. The proposed method is described in Section IV, and the training details in Section V. The results of this study are presented in Section VI, and finally Section VII summarizes the study and its main findings.

II. RELATED WORKS


Researchers are paying more and more attention to the recognition of sign language due to its numerous potential applications in areas such as communication systems for deaf people, human-machine interaction, machine control, etc. Research on SL recognition can be divided into two broad categories on the basis of the type of signs: i. static-sign recognition and ii. dynamic-sign recognition. The majority of the studies conducted so far address the recognition of static signs. Research on SL recognition began around the end of the 1990s.

Many researchers [4], [5], [11]–[13] have proposed techniques to recognize sign languages since then. Every work has its own limitations and is still unable to be used commercially. A brief description of some prominent works on ASL recognition is given here.

Bheda et al. [4] presented a method for the classification of alphanumeric characters of ASL. Two datasets (one self-generated, and MU HandImages ASL [6]) were used in their study. They reported average recognition rates of 67% and 82.5% for the alphabet and the digits of ASL, respectively.

Huang et al. [11] proposed a 3D Hopfield neural network for hand tracking, feature extraction, and gesture recognition. Their model was tested on a set of 15 different hand gestures and reported an average recognition rate of 91%.

Starner et al. [14] presented real-time systems for recognizing sentence-level continuous ASL. They described two extensible word-recognition systems: the first achieved 92% accuracy with images taken from a desk-mounted camera, and the second obtained a 98% recognition rate from a camera mounted on a cap worn by the user. In both tests, they used a 40-word lexicon.

Suk et al. [12] proposed a dynamic Bayesian network (DBN) model for classifying hand signs from video streams. Skin extraction, modeling, and motion tracking are handled by the DBN model. The model was evaluated on the recognition of 10 isolated ASL gestures and achieved over a 99% recognition rate. In that work, all gestures were noticeably different from each other; however, the motion tracking features are relevant to classifying the dynamic letters of ASL (J and Z).
III. DATASET

In this paper, four separate datasets of ASL have been utilized to analyze the performance of the proposed method. The Massey University Gesture dataset [6], called MU HandImages ASL, contains standard ASL hand gestures and consists of 2425 PNG images from 5 individuals. The Sign Language Digits dataset [15] was collected from 218 students of the Ankara Ayranci Anadolu High School, Turkey, with 10 samples of each digit collected from each subject. The third dataset considered in this study is the ASL Finger Spelling dataset [5], collected by the Centre for Vision, Speech and Signal Processing at the University of Surrey, UK. The samples of this dataset are divided into color images and depth images. In our work, we consider only the color images, comprising 24 static signs of the ASL alphabet (excluding the letters J and Z), which were acquired from 5 individuals in different sessions with similar lighting and backgrounds; the dataset contains over 65,000 images of the ASL alphabet. The fourth and last dataset is the ASL Alphabet dataset [16], which consists of 87,000 samples of 29 classes (26 for the letters A-Z and three special classes: delete, nothing, space). A few samples from these datasets are shown in Fig. 1.

Fig. 1: Some samples of the ASL Alphabet and Sign Language Digits datasets.

IV. PROPOSED METHOD

The methodological steps of the ASL recognition system are depicted in Fig. 2.

Fig. 2: Methodological steps of the proposed ASL recognition system.

A. Data Pre-processing

In this study, the raw images are transformed into grayscale images. The gray levels of the input images are normalized by the maximum value of the gray-level range. The use of low-resolution images provides faster training without much impact on the recognition rate, so the images are resized to 64×64 pixels.
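The pre-processing just described reduces to three array operations. The following sketch shows one way to implement it with OpenCV; the library choice and the helper name preprocess are our own illustrative assumptions, not taken from the paper.

import cv2
import numpy as np

def preprocess(path: str) -> np.ndarray:
    # Read directly as grayscale, as described in Sec. IV-A.
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Downsample to the 64x64 resolution used by the network.
    img = cv2.resize(img, (64, 64))
    # Normalize gray levels by the maximum of the gray-level range.
    return img.astype(np.float32) / 255.0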

B. CNN Model Description

The architectural design of a CNN contributes to optimal performance through a proper selection of convolutional layers and numbers of neurons. There are no universally accepted guidelines for selecting the number of neurons and convolutional layers. Here, we propose a CNN architecture (which we call SLRNet-8) that maximizes the recognition accuracy. The proposed SLRNet-8 consists of six convolutional layers, three pooling layers, and a fully connected layer, besides the input and output layers. The architecture of the proposed model is depicted in Fig. 3.

Fig. 3: Architecture of the proposed SLRNet-8 CNN. In this diagram, Conv and BN refer to convolutional layer and batch normalization, respectively.
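Since the paper specifies the layer inventory (six convolutional layers with 3×3 and 5×5 kernels, three pooling layers, batch normalization, global average pooling, one fully connected layer, dropout 0.5, and a softmax output) but not the per-layer filter counts, the following Keras sketch should be read as one plausible SLRNet-8-like configuration; the filter counts 32-128 and the FC width of 128 are assumptions only.

from tensorflow.keras import layers, models

def build_slrnet8(num_classes: int) -> models.Model:
    # Six Conv layers, three MaxPool layers, BN, GAP, FC and dropout,
    # per the description above; filter counts are illustrative only.
    return models.Sequential([
        layers.Input(shape=(64, 64, 1)),                    # 64x64 grayscale
        layers.Conv2D(32, 3, padding='same', activation='relu'),
        layers.Conv2D(32, 3, padding='same', activation='relu'),
        layers.BatchNormalization(),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Conv2D(64, 3, padding='same', activation='relu'),
        layers.Conv2D(64, 3, padding='same', activation='relu'),
        layers.BatchNormalization(),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Conv2D(128, 5, padding='same', activation='relu'),
        layers.Conv2D(128, 5, padding='same', activation='relu'),
        layers.BatchNormalization(),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.GlobalAveragePooling2D(),                    # HxWxD -> D
        layers.Dense(128, activation='relu'),               # FC layer
        layers.Dropout(0.5),                                # p = 0.5
        layers.Dense(num_classes, activation='softmax'),
    ])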

Input Layer
The pre-processed images are applied directly to the network through its input layer. There are 4096 nodes in the input layer, one for each pixel of an image at resolution 64×64.
Convolution Layer
Convolution is the first layer to extract features from the input data and serves as the basic building block of a CNN. In the convolutional layers, kernels extract the salient features from the input data through forward and backward propagation. In our study, this operation is performed by shifting filters of dimension 3×3 and 5×5 over the input data matrix. At every shift, it executes element-wise matrix multiplications and then aggregates the results into a feature map.
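To make the convolution operation concrete, here is a toy NumPy version of the sliding-window computation described above (stride 1, no padding); real CNN frameworks implement this far more efficiently.

import numpy as np

def conv2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    # Slide the kernel over the input; at each shift, take the
    # element-wise product and aggregate it into the feature map.
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    fmap = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            fmap[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return fmap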
The number of kernels used in the convolutional layers may affect the performance of a CNN model, and there are no standard guidelines for choosing it. In this work, we experimented with different numbers of kernels, from 32 to 512, at different step sizes, and finally the combination that maximized the accuracy was selected. Some convolutional layers are followed by a batch normalization (BN) [17] layer, which accelerates the training process and reduces the internal covariate shift.
Activation Function
In a CNN architecture, activation functions decide which nodes fire at a given time. We applied the ReLU [18] activation function, which substitutes all negative values with 0 and leaves positive values unchanged. The selection of ReLU was motivated by the learning time of the model: in training, ReLUs tend to be several times faster [19] than their equivalents (softplus, tanh, sigmoid), and ReLU can diminish the vanishing gradient problem. The ReLU function is expressed as

    ReLU(y) = max(0, y),    (1)

where y refers to the input to a neuron.
Pooling Layer
Pooling is a significant concept in deep learning. It makes the training of a CNN faster and reduces the memory size of the network by reducing the connections between the convolutional layers. Here, we used the max-pooling operation for this purpose. Max-pooling slides a window across the input space and outputs the largest value within that window. We chose a 2×2 window for the max-pooling operation and, to avoid overlapping windows, set the stride to 2. The output dimension of the max-pooling operation can be calculated by

    N_out = floor((N_in − F) / S) + 1,    (2)

where N_in, F, and S refer to the size of the input, kernel, and stride, respectively. For example, a 64×64 input pooled with F = 2 and S = 2 yields floor((64 − 2)/2) + 1 = 32, i.e., a 32×32 output.
Global Average Pooling Layer
The global average pooling (GAP) layer is very similar to the max-pooling layer; the only difference is that the entire area is replaced by its average value instead of its maximum. GAP drastically reduces dimensionality: a tensor of size height×width×depth is reduced to 1×1×depth.
Fully Connected Layer
The feature map generated by the GAP layer is fed into the fully connected (FC) layer. In an FC layer, every neuron in one layer is connected to every neuron in the next layer. The FC layer also behaves like a convolutional layer with a filter of size 1×1.

Dropout Layer
Dropout is a regularization technique that sets input elements to zero with a given probability in a random manner. The over-fitting problem occurs when a model's training accuracy is much higher than its testing accuracy. In CNN models, a dropout layer following the FC layer helps prevent over-fitting and enhances performance [20], [21] by randomly setting activations to zero during training. The dropout probability used in this study was 0.5.

Output Layer
The output of the classification model, i.e., the prediction of a class with a certain probability, is obtained at this layer; the target class should have the highest probability. We set the number of neurons in the output layer equal to the number of categories. In a multiclass classification problem, the Softmax function returns the probability of each class, with the target class having the highest probability. The mathematical expression for the Softmax function is

    σ(x)_i = exp(x_i) / Σ_{k=1}^{K} exp(x_k),  for i = 1, 2, 3, ..., K,    (3)

where x_i are the inputs from the previous FC layer to each Softmax node and K is the number of classes.
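Equation (3) can be checked with a few lines of NumPy. Subtracting the maximum before exponentiation is a standard numerical-stability trick and is not something stated in the paper.

import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - np.max(x))   # stable: the shift does not change the ratios
    return e / e.sum()          # probabilities that sum to 1

# Example: softmax(np.array([2.0, 1.0, 0.1])) ≈ [0.659, 0.242, 0.099];
# the largest input receives the highest probability.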
V. TRAINING DETAILS

A. Data Augmentation

Data augmentation increases the number of samples and adds variation to the samples for better training. Traditional data augmentation techniques [22], [23] include rotation, scaling, shifting, and flipping. To keep the computational burden small, data augmentation is performed here by randomly rotating images within −10° to 10°, zooming by up to 10%, and shifting by up to 10% in height and width. These parameters were chosen on a trial-and-error basis to provide optimum accuracy.
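The stated augmentation (rotations within ±10°, 10% zoom, 10% shifts) maps directly onto Keras' ImageDataGenerator; the paper does not say which tool was actually used, so this is an assumed realization.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=10,        # random rotation in [-10°, 10°]
    zoom_range=0.10,          # zoom by up to 10%
    width_shift_range=0.10,   # horizontal shift by up to 10% of width
    height_shift_range=0.10,  # vertical shift by up to 10% of height
)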

B. CNN Training

There were no distinct train and test sets for any of the datasets considered in this work. Therefore, every dataset was randomly partitioned into K = 10 folds; K − 1 of them were used for training the model, and the remaining one was used for testing its performance. This process was repeated 10 times, and the average of all accuracies was reported as the accuracy of the model. We chose the cross-entropy cost function [24] and a gradient-descent-based Adam optimizer [25] with a learning rate of 0.001. The SLRNet-8 model was trained for up to 200 epochs with 64 steps per epoch and a batch size of 128. If the validation accuracy did not improve in six consecutive epochs, the learning rate was reduced to 75% of its previous value. We allowed early stopping: training was halted if the validation loss did not improve for thirty consecutive epochs. The network weights were initialized with very small real numbers drawn from a normal distribution, with a weight decay rate of 1 × 10−6. The model was trained on a desktop computer under a 64-bit Windows 10 environment with an NVIDIA Titan Xp 12 GB GPU, a 3.98 GHz CPU, 8 GB of RAM, and a 1 TB HDD. The training of the model completed within 150 epochs.
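Read as a sketch, the training recipe above translates into the following Keras configuration. The optimizer, learning rate, loss, batch size, epoch budget, the 0.75 learning-rate factor after 6 stagnant epochs, and the 30-epoch early-stopping patience come from the text; build_slrnet8 is the speculative model sketch from Sec. IV-B, and the data pipeline is omitted.

import tensorflow as tf

model = build_slrnet8(num_classes=26)   # e.g., the 26 letters
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='categorical_crossentropy',    # cross-entropy cost [24]
    metrics=['accuracy'],
)
callbacks = [
    # Reduce the learning rate to 75% if val. accuracy stalls for 6 epochs.
    tf.keras.callbacks.ReduceLROnPlateau(monitor='val_accuracy',
                                         factor=0.75, patience=6),
    # Stop early if the validation loss stalls for 30 epochs.
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=30),
]
# history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
#                     epochs=200, batch_size=128, callbacks=callbacks)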
The performance of the proposed model was evaluated for the recognition of the digits and the alphabet of each dataset separately, using 10-fold cross-validation. Besides this, its performance was also evaluated for the case of mixing the digits and the alphabet of each dataset.
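The 10-fold protocol just described can be expressed with scikit-learn's KFold; the paper does not name a splitting library, and the fit details elided below are those of Sec. V-B.

import numpy as np
from sklearn.model_selection import KFold

def cross_validate(x: np.ndarray, y: np.ndarray) -> float:
    accs = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True).split(x):
        model = build_slrnet8(num_classes=y.shape[1])  # fresh model per fold
        model.compile(optimizer='adam', loss='categorical_crossentropy',
                      metrics=['accuracy'])
        # ... fit on x[train_idx], y[train_idx] as in Sec. V-B ...
        _, acc = model.evaluate(x[test_idx], y[test_idx], verbose=0)
        accs.append(acc)
    return float(np.mean(accs))  # average accuracy over the 10 folds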
VI. RESULTS
The training and validation accuracy of the model on the MU HandImages ASL digit dataset is depicted in Fig. 4, and the average accuracy of the proposed model for every dataset is presented in Table I. It is observed that the model recognized both the digit and the alphabet signs of every dataset with close to 100% accuracy. The lowest accuracy (99.90%) was obtained for the digits of the Sign Language Digits dataset; on the other hand, the digits of the MU HandImages ASL dataset were recognized with 100% accuracy. These very good results may be due to the application of sufficient data augmentation, which adds enough variation to the training samples for the model to capture all possible changes.

Fig. 4: Performance curve of the proposed model for recognition of the alphabet of the MU HandImages ASL dataset. Both the training and the validation curves converge to 100% by epoch 50; hence, the model converges very rapidly.

TABLE I: PERFORMANCE OF THE PROPOSED MODEL

  Dataset                Category    Accuracy (%)
  MU HandImages ASL      Digit       100
  MU HandImages ASL      Alphabet    99.95
  Sign Language Digits   Digit       99.90
  ASL Alphabet           Alphabet    100
  Finger Spelling        Alphabet    99.99

The performance of the proposed model was also evaluated for the case in which the digits and the alphabet are mixed together; the results are presented in Table II. Since all datasets except MU HandImages ASL contain either alphabet or numeral signs of ASL, and the Sign Language Digits dataset contains only the digit signs, we combined the Sign Language Digits dataset with the ASL Alphabet and Finger Spelling datasets separately to evaluate the performance on alphanumeric data. It is observed from Tables I and II that the recognition accuracy of the model is slightly lower for alphanumeric recognition than for the recognition of digits or alphabet individually. The recognition rate on MU HandImages ASL drops from 100% (for digits) to 99.92%, which is still approximately 100%. Similarly, the recognition rate on the alphabet of the Finger Spelling dataset drops from 99.99% to an average of 99.90%. Thus, we can conclude that the model's performance is not affected significantly by mixing the digit and alphabet datasets.

TABLE II: PERFORMANCE OF THE PROPOSED MODEL WHEN DIGIT AND ALPHABET DATASETS ARE COMBINED

  Dataset                                    Category      Accuracy (%)
  MU HandImages ASL                          Alphanumeric  99.92
  Sign Language Digits and ASL Alphabet      Alphanumeric  99.90
  Sign Language Digits and Finger Spelling   Alphanumeric  99.90

It is very difficult, and not entirely fair, to strictly compare methods when they are not evaluated on the same dataset, because the performance of a recognition method depends on the dataset used for training and on the quality of the test samples used to evaluate it. To compare the performance of the proposed model with some existing methods, we have chosen the MU HandImages ASL and Finger Spelling datasets, on which previous CNN-based works [4], [5] have been reported. The comparison between the proposed SLRNet-8 model and the models proposed by Bheda et al. [4] and Garcia et al. [5] is given in Table III. It is observed that our model significantly (≥ 9%) improves on the recognition accuracy reported by the previous CNN models [4], [5] on the same datasets.

TABLE III: COMPARISON WITH PREVIOUS RESEARCH

  Model            Dataset                                 Accuracy (%)
  CNN [4]          MU HandImages ASL                       91.70
  CNN [4]          Self-generated                          89.75
  CNN [5]          MU HandImages ASL and Finger Spelling   91.63
  CNN (Proposed)   MU HandImages ASL                       99.92
  CNN (Proposed)   Finger Spelling                         99.99

VII. CONCLUSION

American Sign Language is one of the most popular sign languages in the world. In this study, we proposed a convolutional neural network model (SLRNet-8) for the automatic recognition of ASL and evaluated its performance on four ASL datasets of digits and alphabet.
The performance of the proposed model has been compared with some prominent works already reported on the same datasets, and it significantly improves on the recognition accuracy of those works, predicting the signs with close to 100% accuracy. In this study, we considered only isolated digits and letters of ASL from static images. In the future, the model could be applied to the recognition of sentence-level continuous ASL, or to the recognition of ASL from video.

ACKNOWLEDGMENT

We are thankful to the NVIDIA Corporation for supporting this study by providing a GPU card.

REFERENCES

[1] World Health Organization, WHO. [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss
[2] J. Cummins, “Bilingualism and second language learning,” Annual
Review of Applied Linguistics, vol. 13, pp. 50–70, 1992.
[3] National Association of Deaf. [Online]. Available: https://www.nad.org/
[4] V. Bheda and D. Radpour, “Using deep convolutional networks for
gesture recognition in american sign language,” arXiv:1710.06836, 2017.
[5] B. Garcia and S. A. Viesca, “Real-time american sign language recog-
nition with convolutional neural networks,” Convolutional Neural Net-
works for Visual Recognition, vol. 2, 2016.
[6] A. Barczak, N. Reyes, M. Abastillas, A. Piccio, and T. Susnjak, “A new
2d static hand gesture colour image dataset for asl gestures,” 2011.
[7] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for
deep belief nets,” Neural computation, vol. 18, no. 7, pp. 1527–1554,
2006.
[8] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extract-
ing and composing robust features with denoising autoencoders,” in
Proceedings of the 25th international conference on Machine learning.
ACM, 2008, pp. 1096–1103.
[9] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning
applied to document recognition,” Proceedings of the IEEE, 1998.
[10] R. Vaillant, C. Monrocq, and Y. Le Cun, “Original approach for the
localisation of objects in images,” IEE Proceedings-Vision, Image and
Signal Processing, vol. 141, no. 4, pp. 245–250, 1994.
[11] C.-L. Huang and W.-Y. Huang, “Sign language recognition using model-
based tracking and a 3d hopfield neural network,” Machine vision and
applications, vol. 10, no. 5-6, pp. 292–307, 1998.
[12] H.-I. Suk, B.-K. Sin, and S.-W. Lee, “Hand gesture recognition based
on dynamic bayesian network framework,” Pattern recognition, vol. 43,
no. 9, pp. 3059–3072, 2010.
[13] M. S. Islalm, M. M. Rahman, M. H. Rahman, M. Arifuzzaman,
R. Sassi, and M. Aktaruzzaman, “Recognition bangla sign language
using convolutional neural network,” in 2019 International Conference
on Innovation and Intelligence for Informatics, Computing, and Tech-
nologies (3ICT), Sep. 2019, pp. 1–6.
[14] T. Starner, J. Weaver, and A. Pentland, “Real-time american sign
language recognition using desk and wearable computer based video,”
IEEE Transactions on pattern analysis and machine intelligence, vol. 20,
no. 12, pp. 1371–1375, 1998.
[15] F. Beşer, M. A. Kizrak, B. Bolat, and T. Yildirim, “Recognition of sign
language using capsule networks,” in 2018 26th Signal Processing and
Communications Applications Conference (SIU). IEEE, 2018, pp. 1–4.
[16] A. Deza and D. Hasan, “Mie324 final report: Sign language recognition.”
[17] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep
network training by reducing internal covariate shift,” arXiv:1502.03167,
2015.
[18] A. F. Agarap, “Deep learning using rectified linear units (relu),”
arXiv:1803.08375, 2018.
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification
with deep convolutional neural networks,” in Advances in neural infor-
mation processing systems, 2012, pp. 1097–1105.
[20] S. Park and N. Kwak, “Analysis on the dropout effect in convolutional neural networks,” in Asian Conference on Computer Vision. Springer, 2016, pp. 189–204.
[21] B. Ko, H.-G. Kim, K.-J. Oh, and H.-J. Choi, “Controlled dropout: A different approach to using dropout on deep neural network,” in Big Data and Smart Computing (BigComp), 2017 IEEE International Conference on. IEEE, 2017, pp. 358–362.
[22] X. Cui, V. Goel, and B. Kingsbury, “Data augmentation for deep convolutional neural network acoustic modeling,” in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4545–4549.
[23] M. M. Rahman, M. S. Islam, R. Sassi, and M. Aktaruzzaman, “Convolutional neural networks performance comparison for handwritten bengali numerals recognition,” SN Applied Sciences, vol. 1, no. 12, p. 1660, Nov 2019. [Online]. Available: https://doi.org/10.1007/s42452-019-1682-y
[24] S. Mannor, D. Peleg, and R. Rubinstein, “The cross entropy method for classification,” in Proceedings of the 22nd international conference on Machine learning. ACM, 2005, pp. 561–568.
[25] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980, 2014.

