Automotive applications use these systems for hands-free control of navigation, media,
and climate settings, enhancing driver safety. Furthermore, assistive technologies for
people with disabilities rely on gesture and voice control to provide accessible interaction
with computers, smartphones, and other devices, promoting independence and
inclusion [5], [6].

Real-time gesture and voice command control systems offer several
advantages over traditional input methods. One key benefit is the ability to interact with
devices without physical contact, which enhances user convenience and hygiene. These
systems are particularly beneficial in situations where hands-free interaction is necessary,
such as in healthcare, driving, or while handling other tasks. Gesture recognition, which
allows for natural human movements, offers an intuitive and immersive experience, while
voice control provides ease of use, especially for users with mobility limitations.
Additionally, integrating both modalities into a single system enhances flexibility, as
users can switch between gestures and voice commands depending on the situation. This
multi-modal approach leads to a more inclusive and accessible interface for a broader
range of users [7], [8].

Despite these benefits, several challenges remain. Real-time processing demands high computational power, and
latency issues may occur, which can cause delays in command execution. Furthermore,
current systems may not seamlessly integrate both gesture and voice inputs, limiting the
overall user experience. Existing technologies often require extensive training data, and
there can be limitations in the robustness of models when faced with diverse user
behaviors and environmental factors [9], [10].

To overcome these challenges,
advancements in machine learning, deep learning, and sensor-based technologies are
critical. Implementing more robust algorithms for gesture recognition, such as
convolutional neural networks (CNNs) or recurrent neural networks (RNNs), can improve
accuracy and reduce errors caused by environmental variations.
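As a minimal illustration of this CNN-based direction, the Keras sketch below builds a small classifier for static hand-gesture images. The 64x64 grayscale input size, the ten-class output, and the training arrays x_train and y_train are assumptions made for illustration, not details taken from the cited works.

# Minimal CNN sketch for static hand-gesture classification.
# Assumed for illustration: 64x64 grayscale inputs, ten gesture classes,
# and training data prepared elsewhere as NumPy arrays x_train, y_train.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_gesture_cnn(num_classes=10):
    model = models.Sequential([
        layers.Input(shape=(64, 64, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_gesture_cnn()
# model.fit(x_train, y_train, epochs=10, validation_split=0.1)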
For voice control, employing noise cancellation techniques, adaptive speech recognition
models, and real-time speech-to-text algorithms can help mitigate issues with background
noise and varying speech patterns. Data augmentation and multi-modal fusion, where
both gesture and voice inputs are processed simultaneously, can enhance system
robustness. Additionally, optimizing hardware and processing algorithms to minimize
latency and improve response times will be key to delivering a seamless user
experience [11], [12].
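As one hedged illustration of the multi-modal fusion idea mentioned above (a sketch, not a method taken from the cited studies), the fragment below combines per-command confidence vectors from a gesture recognizer and a voice recognizer using a weighted average. The command vocabulary, both recognizers, and the 0.6/0.4 weighting are illustrative assumptions.

# Late-fusion sketch: combine class probabilities from a gesture
# recognizer and a voice-command recognizer. The command vocabulary,
# both upstream models, and the 0.6/0.4 weighting are illustrative
# assumptions, not values taken from the cited works.
import numpy as np

COMMANDS = ["play", "pause", "next", "previous", "volume_up"]

def fuse_predictions(gesture_probs, voice_probs, gesture_weight=0.6):
    """Weighted average of the two probability vectors; returns the
    command with the highest fused score."""
    fused = gesture_weight * gesture_probs + (1.0 - gesture_weight) * voice_probs
    return COMMANDS[int(np.argmax(fused))]

# Example with made-up confidences:
gesture_probs = np.array([0.10, 0.70, 0.10, 0.05, 0.05])
voice_probs = np.array([0.05, 0.60, 0.20, 0.10, 0.05])
print(fuse_predictions(gesture_probs, voice_probs))  # -> "pause"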
Several recent studies illustrate these directions. Miao et al. explored gesture recognition as an innovative HCI technique with applications in smart driving, smart homes, and drone control. Their study
highlighted its advantages over touch and voice interactions, emphasizing natural,
intuitive control. By leveraging advancements in computer vision and machine learning,
gesture recognition enhances safety, flexibility, and accessibility in HCI. Practical
applications in smart home automation and drone control showcase its potential to shape
future HCI trends, fostering a more natural and seamless human-computer relationship
[13].

Sharma and Singh designed a CNN-based deep learning model for gesture-
based sign language recognition, focusing on Indian Sign Language (ISL) and American
Sign Language (ASL). Their model, trained on a custom ISL dataset (2150 images) and a
public ASL dataset, outperformed VGG-11, VGG-16, and state-of-the-art methods. The
compact CNN, enhanced with data augmentation, achieved 99.96% accuracy for ISL and
100% for ASL, demonstrating robustness, efficiency, and minimal errors with fewer
parameters [14].
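For context, an image-augmentation pipeline of the kind such compact gesture models are often trained with might look like the Keras sketch below; the specific transforms and ranges are illustrative assumptions and are not the augmentation settings reported in [14].

# Generic augmentation sketch for hand-gesture image batches.
# Transform choices and ranges are illustrative assumptions, not the
# configuration used in [14]. Horizontal flips are deliberately omitted,
# since mirroring can change the meaning of a sign.
import tensorflow as tf
from tensorflow.keras import layers

augmenter = tf.keras.Sequential([
    layers.RandomRotation(0.05),
    layers.RandomTranslation(0.1, 0.1),
    layers.RandomZoom(0.1),
    layers.RandomContrast(0.2),
])

# Applied on the fly in a tf.data pipeline, e.g.:
# train_ds = train_ds.map(lambda x, y: (augmenter(x, training=True), y))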
Zahra et al. developed a camera-based interactive wall display system using bare-hand gestures for human-computer interaction, enabling cursor control and
document operations without external hardware. Their approach involved preprocessing
video frames, detecting hand regions, and applying algorithms like Otsu thresholding and
genetic algorithms for gesture validation. The system achieved 93.35% recognition
accuracy, allowing for reliable gesture-based cursor control and document interactions [15].
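As a rough sketch of the kind of Otsu-based hand segmentation step described above (not Zahra et al.'s actual pipeline, which additionally validates gestures with genetic algorithms), the OpenCV fragment below thresholds a frame and keeps the largest contour as a hand-region candidate.

# Otsu-threshold sketch for isolating a hand-region candidate in a
# video frame. Illustrative only; assumes a controlled scene in which
# the hand is the largest bright region after thresholding.
import cv2
import numpy as np

def largest_contour_mask(frame_bgr):
    """Return a binary mask of the largest region found after Otsu
    thresholding of the grayscale, blurred frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    _, binary = cv2.threshold(blurred, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    mask = np.zeros_like(binary)
    if contours:
        hand = max(contours, key=cv2.contourArea)
        cv2.drawContours(mask, [hand], -1, 255, thickness=cv2.FILLED)
    return mask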
2. Methodology
4. Conclusions
References
[1] S. N. D. Chua, K. Y. R. Chin, S. F. Lim, and P. Jain, “Hand Gesture Control for Human–
Computer Interaction with Deep Learning,” J. Electr. Eng. Technol., vol. 17, no. 3, pp.
1961–1970, May 2022, doi: 10.1007/s42835-021-00972-6.
[2] H.-A. Rusan and B. Mocanu, “Human-Computer Interaction Through Voice Commands
Recognition,” in 2022 International Symposium on Electronics and Telecommunications
(ISETC), Timisoara, Romania: IEEE, Nov. 2022, pp. 1–4. doi:
10.1109/ISETC56213.2022.10010253.
[3] D. A. Oladele, E. D. Markus, and A. M. Abu-Mahfouz, “Adaptability of Assistive
Mobility Devices and the Role of the Internet of Medical Things: Comprehensive
Review,” JMIR Rehabil. Assist. Technol., vol. 8, no. 4, p. e29610, Nov. 2021, doi:
10.2196/29610.
[4] K. Małecki, A. Nowosielski, and M. Kowalicki, “Gesture-Based User Interface for
Vehicle On-Board System: A Questionnaire and Research Approach,” Appl. Sci., vol.
10, no. 18, p. 6620, Sep. 2020, doi: 10.3390/app10186620.
[5] A. Brunete, E. Gambao, M. Hernando, and R. Cedazo, “Smart Assistive Architecture for
the Integration of IoT Devices, Robotic Systems, and Multimodal Interfaces in
Healthcare Environments,” Sensors, vol. 21, no. 6, p. 2212, Mar. 2021, doi:
10.3390/s21062212.
[6] P. M., R. K., and V. K., “Voice and Gesture Based Home Automation System,” Int. J. Electron. Eng. Appl., vol. IX, no. I, p. 08, Apr. 2021, doi: 10.30696/IJEEA.IX.I.2021.08-18.
[7] B. Latha, S. Sowndarya, S. K, A. Raghuwanshi, R. Feruza, and G. S. Kumar, “Hand
Gesture and Voice Assistants,” E3S Web Conf., vol. 399, p. 04050, 2023, doi:
10.1051/e3sconf/202339904050.
[8] H. Zhou, D. Wang, Y. Yu, and Z. Zhang, “Research Progress of Human–Computer
Interaction Technology Based on Gesture Recognition,” Electronics, vol. 12, no. 13, p.
2805, Jun. 2023, doi: 10.3390/electronics12132805.
[9] A. Tellaeche Iglesias, I. Fidalgo Astorquia, J. I. Vázquez Gómez, and S. Saikia,
“Gesture-Based Human Machine Interaction Using RCNNs in Limited Computation
Power Devices,” Sensors, vol. 21, no. 24, p. 8202, Dec. 2021, doi: 10.3390/s21248202.
[10] A. S. Mohamed, N. F. Hassan, and A. S. Jamil, “Real-Time Hand Gesture Recognition:
A Comprehensive Review of Techniques, Applications, and Challenges,” Cybern. Inf.
Technol., vol. 24, no. 3, pp. 163–181, Sep. 2024, doi: 10.2478/cait-2024-0031.
[11] K. Bousbai and M. Merah, “A Comparative Study of Hand Gestures Recognition Based
on MobileNetV2 and ConvNet Models,” in 2019 6th International Conference on
Image and Signal Processing and their Applications (ISPA), Mostaganem, Algeria:
IEEE, Nov. 2019, pp. 1–6. doi: 10.1109/ISPA48434.2019.8966918.
[12] R. J and P. Kumar, “Gesture Recognition using CNN and RNN,” Int. J. Recent Technol. Eng. (IJRTE), vol. 9, no. 2, pp. 230–233, Jul. 2020, doi: 10.35940/ijrte.B3417.079220.
[13] Q. Miao, Y. Li, X. Liu, and R. Liu, “Gesture recognition-based human–computer
interaction cases,” in Gesture Recognition, Elsevier, 2024, pp. 173–194. doi:
10.1016/B978-0-443-28959-0.00006-6.
[14] S. Sharma and S. Singh, “Vision-based hand gesture recognition using deep learning for
the interpretation of sign language,” Expert Syst. Appl., vol. 182, p. 115657, Nov. 2021,
doi: 10.1016/j.eswa.2021.115657.
[15] R. Zahra et al., “Camera-based interactive wall display using hand gesture recognition,”
Intell. Syst. Appl., vol. 19, p. 200262, Sep. 2023, doi: 10.1016/j.iswa.2023.200262.
[16] A. B., K. Krishi, M. M., M. Daaniyaal, and A. H. S., “Hand gesture recognition using
machine learning algorithms,” Comput. Sci. Inf. Technol., vol. 1, no. 3, pp. 116–120,
Nov. 2020, doi: 10.11591/csit.v1i3.p116-120.
[17] T. L. Dang, S. D. Tran, T. H. Nguyen, S. Kim, and N. Monet, “An improved hand
gesture recognition system using keypoints and hand bounding boxes,” Array, vol. 16,
p. 100251, Dec. 2022, doi: 10.1016/j.array.2022.100251.