Lip Reading Using Deep Learning in Turkish Language
Corresponding Author:
Hadi Pourmousa
Department of Management Information Systems, Faculty of Economics and Administrative Science
Atatürk University
Erzurum, Türkiye
Email: hadi.pourmousa14@ogr.atauni.edu.tr
1. INTRODUCTION
The lip-reading process, which tries to understand what the speaker is saying from lip movements [1],
is one of the most important fields of human action recognition [2]. Lip reading, which is performed using only
visual information, is an impressive skill in noisy environments or when there is no audio stream [3]-[5].
Accordingly, the main purpose of lip-reading studies is to detect and understand spoken expressions from images
alone, without an audio stream [6]-[8].
In recent years, lip-reading studies have started to increase with the spread of deep learning [6].
The lip-reading process is applied at the letter, number, syllable, word, and sentence level [6], but most studies
have worked at the alphabet, word, and sentence level [9].
The most important step in lip-reading studies is to acquire the mouth images of the speaker with
some predefined point coordinates. The movements of these points are then analyzed by classification
methods, such as the k-nearest neighbor classifier, hidden Markov models, and artificial neural networks, to
understand what the speaker is saying [10] (a minimal sketch of this extraction step is given at the end of this
section). Automatic lip reading is used in different fields. However, there are many challenges to be overcome
for the Turkish language: guttural sounds (K, G, Ğ); multiple dialects (kardeş (brother) is pronounced gardaş);
words and verbs made unrecognizable by shortening (gidiyorum (I am going) becomes gidiyom); and the
absence of lip movement in some words (iyi (good), 40, and 2). Different datasets in different languages have
been created for lip reading. Some of the important ones are presented in Table 1.
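The paper does not include code for the mouth-region extraction step described above; the following is only a minimal sketch, assuming dlib's 68-point facial landmark model (in which points 48-67 outline the lips) and OpenCV for frame handling:

```python
# Minimal sketch of mouth-region extraction (not the authors' code).
# Assumes dlib's 68-point landmark model file has been downloaded separately.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def crop_mouth(frame, margin=10):
    """Return the mouth region of the first detected face, or None if no face is found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    # Landmarks 48-67 outline the outer and inner lips.
    points = np.array([(shape.part(i).x, shape.part(i).y) for i in range(48, 68)],
                      dtype=np.int32)
    x, y, w, h = cv2.boundingRect(points)
    return frame[y - margin:y + h + margin, x - margin:x + w + margin]
```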
2. RELATED WORK
In recent years, lip-reading studies have been carried out using deep learning techniques, datasets
in different languages have been created, and the number of studies is increasing. Until today, most of the
studies have been carried out in English, and several English datasets have been created, unlike other languages.
However, datasets for other languages have now begun to appear, even if they are small. In Turkish, a dataset
of 70 speakers covering 20 numbers was created by Pourmousa and Özen [29] and was trained and tested
with a convolutional neural network. In that study, 56.25% success was achieved, and it was reported that the
absence of lip movement in some Turkish numbers and the similarity of lip movements between some numbers
significantly affected the result.
Atila and Sabaz [6] created two new Turkish datasets, one with 111 words and the other with 113
sentences. In this study, pre-trained models and the bidirectional long short-term memory (Bi-LSTM) method
were used to perform the classification. GoogleNet, ResNet-101, ResNet-50, ResNet-18, NASNet-Large,
Xception, DarkNet53, DarkNet19, AlexNet, SqueezeNet, and DenseNet201 were used as pre-trained models,
and their success was investigated. The highest success was achieved when ResNet-18 and Bi-LSTM were used
together: 84.5% at the word level and 88.55% at the sentence level.
Nambeesan et al. [30] developed a lip-reading system on the MIRACL-VC1 dataset by using the long
short-term memory method. In this study, lip regions were extracted from all frames of a word video and
recorded sequentially in a single photograph. As a result of the study, 85.5% accuracy was obtained.
Sarhan et al. [31] developed a hybrid lip-reading model based on deep convolutional neural networks for
lip reading from videos. The proposed model consists of preprocessing, encoder, and decoder stages. As a result
of the study, 92% success was achieved for unseen speakers and 99% for overlapped speakers.
Ma et al. [32] developed a lip-reading system using convolutional neural network on lip reading in
the wild (LRW) and LRW-1000 dataset. In this study, 88.5% accuracy was obtained on the LRW dataset and
46.6% on the LRW-1000 dataset. Elrefaei et al. [28] proposed a lip-reading method for the Arabic language. In
that study, 22 people participated and recorded videos of 10 words with their smartphones. As a result,
79% accuracy was obtained. Noda et al. [5] used a convolutional neural network as the feature-extraction
mechanism for visual speech recognition. The network was trained using images of the speaker's mouth
region. The proposed system was evaluated on an audio-visual speech dataset containing 300 Japanese words
from six different speakers, and 58% success was achieved.
3. DATASET
In this section, the details of the dataset are explained. First, the characteristics of the words in the
dataset and how they were collected are described; then, how the dataset was created and the pre-processing
applied to it are explained. Finally, an example of the input given to the system is presented.
Figure 3. Recording of lip movement as frames
Figure 4. Sequential recording of lip frames in one image
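The paper does not state the exact packing layout behind Figure 4; the sketch below illustrates the idea under the assumption of a 4×4 grid of 56×56 lip crops, chosen only so that one word video yields a single 224×224 image (both numbers are assumptions, not values taken from the paper):

```python
# Sketch of packing the lip crops of one word video into a single image
# (the idea behind Figure 4). The 4x4 grid and 56x56 cell size are assumptions.
import cv2
import numpy as np

def frames_to_grid(lip_frames, cell=(56, 56), grid=(4, 4)):
    """Tile the lip crops of one video into a single grid image (assumes at least one crop)."""
    n = grid[0] * grid[1]
    frames = list(lip_frames)[:n]
    frames += [frames[-1]] * (n - len(frames))   # repeat the last crop if the clip is short
    cells = [cv2.resize(f, cell) for f in frames]
    rows = [np.hstack(cells[r * grid[1]:(r + 1) * grid[1]]) for r in range(grid[0])]
    return np.vstack(rows)
```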
The dataset prepared as described above is divided into training and validation sets. The test
set is completely separate and consists of data not found in the training and validation sets. The number
of samples in each class is not equal, because some words were said incorrectly or the video contained an
error while the word was being said. Therefore, the training set contains at least 295 samples per class, with
25 per class for validation and 20 per class for testing.
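Assuming the three splits are stored in separate class-labelled folders (the directory names below are hypothetical), they could be loaded as 224×224 grayscale images with Keras as follows; the test split is left unshuffled so that later predictions line up with their labels:

```python
# Loading the training/validation/test splits; directory names are hypothetical.
import tensorflow as tf

def load_split(path, shuffle):
    return tf.keras.utils.image_dataset_from_directory(
        path,
        color_mode="grayscale",   # single-channel input, as described in the method section
        image_size=(224, 224),
        batch_size=32,            # batch size is an assumption
        shuffle=shuffle,
    )

train_ds = load_split("dataset/adjectives/train", shuffle=True)
val_ds = load_split("dataset/adjectives/validation", shuffle=False)
test_ds = load_split("dataset/adjectives/test", shuffle=False)
```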
features can be obtained from each location [35]. The pooling layer, which is usually added after a convolution
layer, is an important step in convolution-based systems that reduces the dimensionality of the feature maps [48].
The two main purposes of this layer are, first, to reduce the number of parameters or weights and thereby the
computational cost, and second, to control overfitting by reducing the spatial size of the data [49],
[50]. Generally, two types of pooling are used: maximum pooling and average pooling (Figure 6).
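As a small numerical illustration of these two operations (not tied to the study's data), applying 2×2 max and average pooling to a 4×4 feature map gives:

```python
# Tiny demonstration of the two common pooling operations on a 4x4 feature map.
import numpy as np
import tensorflow as tf

fmap = np.array([[1, 3, 2, 1],
                 [4, 6, 5, 2],
                 [7, 8, 3, 0],
                 [2, 1, 9, 4]], dtype=np.float32).reshape(1, 4, 4, 1)

max_pool = tf.keras.layers.MaxPooling2D(pool_size=2)(fmap)      # keeps the largest value in each 2x2 window
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=2)(fmap)  # averages each 2x2 window

print(max_pool.numpy().squeeze())  # [[6. 5.] [8. 9.]]
print(avg_pool.numpy().squeeze())  # [[3.5 2.5] [4.5 4. ]]
```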
5. RESEARCH METHOD
In this study, convolutional neural networks, which are highly successful in image processing and
classification, were used, because all videos are divided into frames and recorded on a single image. Since the
images do not contain many features, they are given to the system in grayscale to increase the speed of the study.
As seen in the model presented in Figure 7, the input was first changed to 224×224×1: all images were resized
and converted to grayscale to increase the speed of the system.
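The exact layer configuration of Figure 7 is not reproduced in this excerpt; the block below is therefore only a generic small CNN with the stated 224×224×1 grayscale input, a stand-in for the proposed model rather than a replication of it:

```python
# Generic small CNN with the stated 224x224x1 grayscale input;
# this is a stand-in, not the layer configuration of Figure 7.
import tensorflow as tf

num_classes = 30  # number of word classes in one sub-dataset (assumed placeholder)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 1)),
    tf.keras.layers.Rescaling(1.0 / 255),                 # scale pixel values to [0, 1]
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(128, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
```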
Figure 8. Training and validation accuracy and training and validation loss for adjectives dataset
A large dataset plays an important role in making a deep learning system more successful. The
dataset of this study is very small, and most of the data were derived from other data. The confusion
matrix of the adjectives dataset is presented in Figure 10. 100% success was achieved for most adjectives,
such as “Other”, “White”, “Correct”, “Economic”, “Easy”, “Small”, “Possible”, “Important”, “Black”, and
“International”, which the system predicted completely correctly. The system performed worst on the adjectives
“Slow” (0% correct), “continually” (20%), and “OK” (25%). In addition, for adjectives such as “Great”,
“Beautiful”, “Necessary”, “Clean”, and “High”, the system made mostly correct predictions, with 50% or
more success.
The system made more errors on words with similar lip movements (Figure 11). The lip
movements of the adjectives “Beautiful” (Figure 11(a)) and “Easy” (Figure 11(b)) are similar; therefore,
the system predicted “Easy” for 33% of the “Beautiful” samples. Also, the lip movement of “continually” is
similar to those of “Beautiful” and “High”, and thus the system nearly failed on it, with only 20% correct
guesses. To eliminate these errors and enable the system to learn better, the dataset and the number of epochs
must be increased.
The model was run again with 50 epochs and the Adam optimizer for the nouns dataset. Training and
validation accuracy and loss for the nouns dataset are shown in Figure 12. Training accuracy was 78.67% and
validation accuracy was 81.21%. The system showed lower training and validation accuracy than on the
adjectives dataset.
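Given the reported settings (Adam optimizer, 50 epochs), a minimal training and evaluation loop might look like the sketch below; it reuses the `model`, `train_ds`, `val_ds`, and `test_ds` objects from the earlier sketches rather than the authors' actual pipeline:

```python
# Minimal training/evaluation sketch consistent with the reported settings
# (Adam optimizer, 50 epochs). `model`, `train_ds`, `val_ds`, and `test_ds`
# are the hypothetical objects defined in the earlier sketches.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(train_ds, validation_data=val_ds, epochs=50)

test_loss, test_acc = model.evaluate(test_ds)   # held-out recordings not seen in training
print(f"test accuracy: {test_acc:.2%}")
```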
Figure 12. Training and validation accuracy and training and validation loss for nouns dataset
The success rate of the nouns dataset was 71.88% on the test set (Figure 13). As with the
adjectives dataset, the training and validation sets are likely to be similar to each other, whereas the test set
consists of completely different data and is given to the system only at test time. For comparison, Faisal and
Manzoor [52] achieved 62% success on a word dataset in the Urdu language.
The confusion matrix of the nouns dataset is presented in Figure 14. The system showed 100% success
for 17 nouns. However, it failed completely (0%) on some words such as “Car”, “Section”, “Technology”,
and “Country”, and partially failed on the nouns “Money” and “Part” with less than 50% correct guesses. For
the remaining nouns, 50% or more success was achieved.
The model was run again with 50 epochs and the Adam optimizer for the verbs dataset. Training and
validation accuracy and loss for the verbs dataset are shown in Figure 15. Training accuracy was 81.16% and
validation accuracy was 78.74%. The system showed better training and validation performance than on the
adjectives and nouns datasets.
Figure 15. Training and validation accuracy and training and validation loss for verbs dataset
The success rate of the verbs dataset was 79.69% on the test set (Figure 16). As with the
adjectives dataset, the training and validation sets are likely to be similar to each other, whereas the test set
consists of completely different data and is given to the system only at test time. For comparison, Ma et al. [32]
achieved 88.5% success at the word level on the LRW dataset, and Noda et al. [5] achieved 58% success on a
Japanese word dataset. The confusion matrix of the verbs dataset is presented in Figure 17. As seen in the
confusion matrix, the model showed 100% success for 9 verbs and 50% or more success for all other verbs. The
verbs dataset performed better than the other datasets in the testing phase.
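The per-class accuracies quoted for Figures 10, 14, and 17 come from confusion matrices; a standard way to compute such a matrix from the test predictions, again assuming the `model` and unshuffled `test_ds` from the sketches above, is:

```python
# Normalized confusion matrix from test-set predictions; `model` and `test_ds`
# are the hypothetical objects defined in the earlier sketches.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_true = np.concatenate([y.numpy() for _, y in test_ds])   # integer labels, in dataset order
y_pred = np.argmax(model.predict(test_ds), axis=1)         # predicted class per sample

cm = confusion_matrix(y_true, y_pred, normalize="true")    # diagonal = per-class accuracy
ConfusionMatrixDisplay(cm, display_labels=test_ds.class_names).plot(xticks_rotation="vertical")
plt.show()
```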
There is lip movement in all of the verbs, the lip movements of the verbs are not similar to each other, and
the number of frames is high, so the system learned better than on the other datasets. In addition to the developed
convolutional neural network model, pre-trained models were also trained and tested on the datasets. In this
study, the pre-trained VGG and Inception-V3 models were trained on the datasets and their success rates were
examined. However, the VGG model showed only 10% success on the adjectives dataset, while the Inception-V3
model showed 64.06% success on the verbs dataset. Table 3 summarizes the results, the number of repetitions,
and the success rates.
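The excerpt does not detail how VGG and Inception-V3 were adapted to the single-channel inputs; one plausible setup, sketched below under the assumption of a frozen ImageNet backbone with a small classification head (and the grayscale channel simply repeated three times), is:

```python
# Hedged sketch of the pre-trained-model experiments; the exact setup
# (classification head, freezing, preprocessing) is not given in the paper.
import tensorflow as tf

BACKBONES = {
    "vgg16": tf.keras.applications.VGG16,
    "inception_v3": tf.keras.applications.InceptionV3,
}

def transfer_model(backbone_name, num_classes, input_shape=(224, 224, 1)):
    """Frozen ImageNet backbone with a small classification head on top."""
    inputs = tf.keras.Input(shape=input_shape)
    rgb = tf.keras.layers.Concatenate()([inputs] * 3)      # repeat the grayscale channel to mimic RGB
    base = BACKBONES[backbone_name](include_top=False, weights="imagenet",
                                    input_shape=input_shape[:2] + (3,))
    base.trainable = False                                 # keep the pre-trained features fixed
    x = tf.keras.layers.GlobalAveragePooling2D()(base(rgb))
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = transfer_model("inception_v3", num_classes=30)     # class count is a placeholder
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```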
7. CONCLUSION
In this study, a visual lip-reading system was proposed for the Turkish language. A convolutional neural
network model was proposed and trained, and pre-trained models were also used to increase success. The proposed
model was trained and tested on the adjectives, nouns, and verbs datasets and achieved 75%, 71.88%, and
79.69% success, respectively. In the adjectives dataset, the lip movements of adjectives such as “beautiful”, “easy”,
and “continually” are similar to each other; in the nouns dataset, the lip movements of words such as “money”
and “piece” are similar, as are those of “animal” and “car”. These similarities reduced the success rate. In addition,
one of the biggest problems of the study was that the dataset was small, was not recorded in the required
environment, and most of the participants did not look at the camera correctly. Most of the studies conducted at
the word level in this area have achieved a success rate of less than 75%; therefore, the result can be considered
good when the success achieved at the word level is compared with other studies. Most people did not send videos
because they did not trust the process, and thus the very small dataset was one of the major limitations of the study.
In addition, some videos were not recorded in the required environment, or the speakers were not looking toward
the camera; therefore, their lip movements could not be understood by the system. For this reason, future studies
can examine how much the success rate is affected by keeping the camera fixed in one place and having speakers
say words while looking at the camera from different angles. In addition, a Turkish sentence dataset can be
collected and its success rate compared with the number and word datasets. Other pre-trained models can also
be run on these datasets.
REFERENCES
[1] Y. Li, Y. Takashima, T. Takiguchi, and Y. Ariki, “Lip reading using a dynamic feature of lip images and convolutional neural
networks,” in 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), Jun. 2016, pp. 1–6,
doi: 10.1109/ICIS.2016.7550888.
[2] S. Agrawal, V. R. Omprakash, and Ranvijay, “Lip reading techniques: A survey,” in 2016 2nd International Conference on Applied
and Theoretical Computing and Communication Technology (iCATccT), 2016, pp. 753–757, doi:
10.1109/ICATCCT.2016.7912100.
[3] J. S. Chung, A. Senior, O. Vinyals, and A. Zisserman, “Lip reading sentences in the wild,” in 2017 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), Jul. 2017, pp. 3444–3453, doi: 10.1109/CVPR.2017.367.
[4] X. Chen, J. Du, and H. Zhang, “Lipreading with DenseNet and resBi-LSTM,” Signal, Image and Video Processing, vol. 14, no. 5,
pp. 981–989, Jul. 2020, doi: 10.1007/s11760-019-01630-1.
[5] K. Noda, Y. Yamaguchi, K. Nakadai, H. G. Okuno, and T. Ogata, “Lipreading using convolutional neural network,” in Interspeech
2014, Sep. 2014, pp. 1149–1153, doi: 10.21437/Interspeech.2014-293.
[6] Ü. Atila and F. Sabaz, “Turkish lip-reading using Bi-LSTM and deep learning models,” Engineering Science and Technology, an
International Journal, vol. 35, p. 101206, Nov. 2022, doi: 10.1016/j.jestch.2022.101206.
[7] A. Garg, J. Noyola, and S. Bagadia, “Lip reading using CNN and LSTM,” Technical Report, Stanford University, 2016. [Online].
Available: https://cs231n.stanford.edu/reports/2016/pdfs/217_Report.pdf.
[8] B. Martinez, P. Ma, S. Petridis, and M. Pantic, “Lipreading using temporal convolutional networks,” in 2020 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 6319–6323, doi:
10.1109/ICASSP40776.2020.9053841.
[9] T. Ozcan and A. Basturk, “Lip reading using convolutional neural networks with and without pre-trained models,” Balkan Journal
of Electrical and Computer Engineering, vol. 7, no. 2, pp. 195–201, Apr. 2019, doi: 10.17694/bajece.479891.
[10] A. Yargic and M. Dogan, “A lip reading application on MS Kinect camera,” in 2013 IEEE INISTA, Jun. 2013, pp. 1–5, doi:
10.1109/INISTA.2013.6577656.
[11] J. R. Movellan, “Visual speech recognition with stochastic networks,” in Advances in Neural Information Processing Systems, 1994,
pp. 851–858.
[12] O. Vanegas, K. Tokuda, and T. Kitamura, “Location normalization of HMM-based lip-reading: experiments for the M2VTS
database,” in Proceedings 1999 International Conference on Image Processing, 1999, pp. 343–347 vol.2, doi:
10.1109/ICIP.1999.822914.
[13] I. Matthews, T. F. Cootes, J. A. Bangham, S. Cox, and R. Harvey, “Extraction of visual features for lipreading,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 24, no. 2, pp. 198–213, 2002, doi: 10.1109/34.982900.
[14] A. Vorwerk, X. Wang, D. Kolossa, S. Zeiler, and R. Orglmeister, “WAPUSK20 - a database for robust audiovisual speech
recognition,” in Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), 2010,
pp. 3016–3019.
[15] C. Neti et al., “Audio visual speech recognition,” Workshop Final Report, pp. 1-84, 2000.
[16] A. Ortega et al., “AV@CAR: a Spanish multichannel multimodal corpus for in-vehicle automatic audio-visual speech recognition,”
in Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04), 2004.
[17] E. K. Patterson, S. Gurbuz, Z. Tufekci, and J. N. Gowdy, “CUAVE: A new audio-visual database for multimodal human-computer
interface research,” in IEEE International Conference on Acoustics Speech and Signal Processing, 2002, pp. 2017-2020, doi:
10.1109/ICASSP.2002.5745028.
[18] C. Petr, Ž. Miloš, K. Zdeněk, K. Jakub, Z. Jan, and M. Luděk, “Design and recording of Czech speech corpus for audio-visual
continuous speech recognition,” in Auditory-Visual Speech Processing Workshop 2005, 2005, pp. 1–4.
[19] S. Cox, R. Harvey, Y. Lan, J. Newman, and B.-J. Theobald, “The challenge of multispeaker lip-reading,” in Audio Visual Speech
Processing AVSP, Brisbane, 2008, 2008, pp. 179–184.
[20] S. Tamura et al., “CENSREC-1-AV: an audio-visual corpus for noisy bimodal speech recognition,” International Conference on
Audio-Visual Speech Processing, pp. 1-4, 2010.
[21] A. G. Chitu, K. Driel, and L. J. M. Rothkrantz, “Automatic lip reading in the Dutch language using active appearance models on
high speed recordings,” in Text, Speech and Dialogue, Springer Berlin Heidelberg, 2010, pp. 259–266.
[22] I. Anina, Z. Zhou, G. Zhao, and M. Pietikainen, “OuluVS2: A multi-view audiovisual database for non-rigid mouth
motion analysis,” in 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG),
May 2015, pp. 1–5, doi: 10.1109/FG.2015.7163155.
[23] V. Estellers and J.-P. Thiran, “Multi-pose lipreading and audio-visual speech recognition,” EURASIP Journal on Advances in Signal
Processing, vol. 2012, no. 1, p. 51, Dec. 2012, doi: 10.1186/1687-6180-2012-51.
[24] A. Rekik, A. B. -Hamadou, and W. Mahdi, “A new visual speech recognition approach for RGB-D cameras,” in Lecture Notes in
Computer Science, Springer International Publishing, 2014, pp. 21–28.
[25] V. Verkhodanova, A. Ronzhin, I. Kipyatkova, D. Ivanko, A. Karpov, and M. Železný, “HAVRUS corpus: high-speed recordings
of audio-visual Russian speech,” in Speech and Computer, Springer International Publishing, 2016, pp. 338–345.
[26] S. Petridis, J. Shen, D. Cetin, and M. Pantic, “Visual-only recognition of normal, whispered and silent speech,” in 2018 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2018, pp. 6219–6223, doi:
10.1109/ICASSP.2018.8461596.
[27] S. Yang et al., “LRW-1000: a naturally-distributed large-scale benchmark for lip reading in the wild,” in 2019 14th IEEE
International Conference on Automatic Face & Gesture Recognition (FG 2019), 2019, pp. 1–8, doi: 10.1109/FG.2019.8756582.
[28] L. A. Elrefaei, T. Q. Alhassan, and S. S. Omar, “An Arabic visual dataset for visual speech recognition,” Procedia Computer
Science, vol. 163, pp. 400–409, 2019, doi: 10.1016/j.procs.2019.12.122.
[29] H. Pourmousa and Ü. Özen, “Lip reading using CNN for Turkish numbers,” Journal of Business in The Digital Age, vol. 5, no. 2,
pp. 155-160, Sep. 2022, doi: 10.46238/jobda.1100903.
[30] A. S. Nambeesan, C. Payyappilly, E. J.C, J. J. P, and M. S. Alex, “Lip reading using facial feature extraction and deep learning,”
International Journal of Innovative Science and Research Technology, vol. 6, no. 7, pp. 92–96, 2021.
[31] A. M. Sarhan, N. M. Elshennawy, and D. M. Ibrahim, “HLR-Net: a hybrid lip-reading model based on deep convolutional neural
networks,” Computers, Materials & Continua, vol. 68, no. 2, pp. 1531–1549, 2021, doi: 10.32604/cmc.2021.016509.
[32] P. Ma, B. Martinez, S. Petridis, and M. Pantic, “Towards practical lipreading with distilled and efficient models,” in ICASSP 2021
- 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun. 2021, pp. 7608–7612, doi:
10.1109/ICASSP39728.2021.9415063.
[33] K. Sarıgül, “En çok kullanılan 1000 türkçe kelime,” Türkçe Öğretimi. Accessed: Oct. 21, 2021. [Online]. Available:
https://www.turkceogretimi.com/tavsiyeler/en-cok-kullanilan-1000-turkce-kelime
[34] K. Sarıgül, “Türkçede en çok kullanılan 200 fiil,” Türkçe Öğretimi. Accessed: Oct. 21, 2021. [Online]. Available:
https://www.turkceogretimi.com/tavsiyeler/turkcede-en-cok-kullanilan-200-fiil
[35] Y. LeCun et al., “Backpropagation applied to handwritten zip code recognition,” Neural Computation, vol. 1, no. 4, pp. 541–551,
Dec. 1989, doi: 10.1162/neco.1989.1.4.541.
[36] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, “A survey of convolutional neural networks: analysis, applications, and prospects,”
IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 12, pp. 6999–7019, Dec. 2022, doi:
10.1109/TNNLS.2021.3084827.
[37] R. Yamashita, M. Nishio, R. K. G. Do, and K. Togashi, “Convolutional neural networks: an overview and application in radiology,”
Insights into Imaging, vol. 9, no. 4, pp. 611–629, Aug. 2018, doi: 10.1007/s13244-018-0639-9.
[38] A. Kamilaris and F. X. P. -Boldú, “A review of the use of convolutional neural networks in agriculture,” Journal of Agricultural
Science, vol. 156, no. 3, pp. 312–322, 2018, doi: 10.1017/S0021859618000436.
[39] K. Ryczko, K. Mills, I. Luchak, C. Homenick, and I. Tamblyn, “Convolutional neural networks for atomistic systems,”
Computational Materials Science, vol. 149, pp. 134–142, Jun. 2018, doi: 10.1016/j.commatsci.2018.03.005.
[40] T. Guo, J. Dong, H. Li, and Y. Gao, “Simple convolutional neural network on image classification,” in 2017 IEEE 2nd International
Conference on Big Data Analysis (ICBDA), Mar. 2017, pp. 721–724, doi: 10.1109/ICBDA.2017.8078730.
[41] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. F.-Fei, “Large-scale video classification with convolutional
neural networks,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1725–1732, doi:
10.1109/CVPR.2014.223.
[42] Y. Kim, “Convolutional neural networks for sentence classification,” in Proceedings of the 2014 Conference on Empirical Methods
in Natural Language Processing (EMNLP), 2014, pp. 1746–1751, doi: 10.3115/v1/D14-1181.
[43] S. Hershey et al., “CNN architectures for large-scale audio classification,” in 2017 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), Mar. 2017, pp. 131–135, doi: 10.1109/ICASSP.2017.7952132.
[44] O. Abdel-Hamid, A. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, “Convolutional neural networks for speech recognition,”
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, pp. 1533–1545, Oct. 2014, doi:
10.1109/TASLP.2014.2339736.
[45] K. Simonyan and A. Zisserman, “Very deep convolutional networks for Large-Scale image recognition,” Computer Vision and
Pattern Recognition, Sep. 2014.
[46] C. Affonso, A. L. D. Rossi, F. H. A. Vieira, and A. C. P. de L. F. de Carvalho, “Deep learning for biological image classification,”
Expert Systems with Applications, vol. 85, pp. 114–122, Nov. 2017, doi: 10.1016/j.eswa.2017.05.039.
[47] F. Güven, “Using text representation and deep learning methods for Turkish text classification,” Master Thesis, Çukurova
University, Adana, Turkey, 2019.
[48] H. Gholamalinezhad and H. Khosravi, “Pooling methods in deep neural networks, a review,” Computer Vision and Pattern
Recognition, Sep. 2020.
[49] D. T. Tran, A. Iosifidis, and M. Gabbouj, “Improving efficiency in convolutional neural networks with multilinear filters,” Neural
Networks, vol. 105, pp. 328–339, Sep. 2018, doi: 10.1016/j.neunet.2018.05.017.
[50] H. Wu and J. Zhao, “Deep convolutional neural network model based chemical process fault diagnosis,” Computers & Chemical
Engineering, vol. 115, pp. 185–197, Jul. 2018, doi: 10.1016/j.compchemeng.2018.04.009.
[51] A. M. Karim, “A new framework by using deep learning techniques for data processing,” Ph.D. Thesis, Department of Computer
Engineering, Ankara Yıldırım Beyazıt University, Ankara, Turkey, 2018.
[52] M. Faisal and S. Manzoor, “Deep learning for lip reading using audio-visual information for Urdu language,” Computer Vision and
Pattern Recognition, Feb. 2018.
BIOGRAPHIES OF AUTHORS
Dr. Üstün Özen is a full Professor and Senior Lecturer in the Management Information
Systems Department at Atatürk University, Erzurum, Türkiye. He also serves as department
chair. His research interests focus on social media analysis, data analytics, and health
informatics. He has published several journal and conference papers, and books. He can be contacted at
email: uozen@atauni.edu.tr.