Final Report
(4NM18IS005) (4NM18IS061)
the Degree of
from
June 2022
ISO 9001:2015 Certified Accredited with ‘A’ Grade by NAAC
CERTIFICATE
Certified that the project work entitled
1. __________________________ __________________________
2. __________________________ __________________________
ACKNOWLEDGEMENT
It is with great satisfaction and delight that we are submitting the Project Report on
“Hand Gesture Controlled Video Application”. We have completed it as a part of
the curriculum of Visvesvaraya Technological University, Belagavi for the award of
Bachelor of Engineering in Information Science and Engineering.
We sincerely thank Dr. Niranjan N Chiplunkar, Principal, NMAM Institute of
Technology, Nitte and Dr. I Ramesh Mithanthaya, Vice Principal & Dean
(Academics), NMAM Institute of Technology, Nitte, who have always been a great
source of inspiration.
We are profoundly indebted to our guides, Ms. Nikitha Saurabh, (Assistant
Professor Gd II) and Ms. Pratheeksha Hegde N, (Assistant Professor Grade I),
Department of Information Science and Engineering for innumerable acts of timely
advice, encouragement and we sincerely express our gratitude.
We also thank Mr. Vasudeva Pai, Project Coordinator & Assistant Professor Gd II,
Mr. Devidas, Project Coordinator & Assistant Professor Gd III, Department of
Information Science & Engineering for their constant encouragement and support
extended throughout.
We express our sincere gratitude to Dr. Karthik Pai B.H, Head and Associate
Professor, Department of Information Science and Engineering for his invaluable
support and guidance.
Finally, yet importantly, we express our heartfelt thanks to our family and friends for
their wishes and encouragement throughout the work.
ABSTRACT
Gestures made with the hands and fingers, popularly called hand gestures, are a common way for humans to interact with robots. Hand gestures are a type of non-verbal communication that can be utilized in several fields, for example communication between deaf and mute people, robot control, human-computer interaction, home automation and clinical applications. Research articles on hand and finger gestures have adopted various strategies, including those based on instrumented sensor technology and computer vision. Hand gesture recognition provides an intelligent, natural and convenient method of human-computer interaction (HCI), and it has numerous applications in medical, engineering and even military research areas. As the reliance of our society on technology grows day by day, the use of devices such as mobile phones and computers also increases. In our regular routine we use various modes of communication, which include speaking, writing and also some form of body movement; in the case of machines, however, we are still limited to typing or speaking, so some advancement is needed so that we can communicate with machines through body movement as well. This mode of communication in which some kind of body movement is involved is called a gesture. In other words, gestures are non-vocal modes of communication which use hand movements, various postures of the body, and facial expressions. Thus, to make machines smarter, we enable them to take commands by recognizing different hand gestures. Hand gestures are utilized as the input to our system. Hand gesture recognition based man-machine interfaces have been developed intensively in recent years. Because of the effects of lighting and complex backgrounds, many hand gesture recognition systems do not work well. An adaptive skin color model based on face detection is used to identify skin color regions such as the hands. To classify the dynamic hand gestures, we developed a simple and fast motion history image based method, and four groups of Haar-like directional patterns were trained as classifiers for the up, down, left and right hand gestures.
LIST OF FIGURES
Sl no. Figure no. Description Page no.
1 Fig. 1.1.1 Instrumental gloves gestures 5
2 Fig. 1.2.1 Computer vision gestures 6
3 Fig. 5.1 System design flowchart 13
4 Fig. 7.1 Flowchart of the system 16
5 Fig. 7.2 Hand landmarks using mediapipe 17
6 Fig. 7.3 Finger values of an array 18
7 Fig. 7.4 Different hand gestures 19
8 Fig. 7.5 Code snippet for hand detection 19
9 Fig. 7.6 Code snippet for volume control 20
10 Fig. 7.7 Code snippet to capture screen 21
11 Fig. 7.8 Code snippet for play/pause the media 22
12 Fig. 9.1 Camera window pop up 26
13 Fig. 9.2 Hand detection with landmarks 26
14 Fig. 9.3 Fingers count 27
15 Fig. 9.4 Detecting multiple hands 28
16 Fig. 9.5(a) Gesture to increase the system volume 28
17 Fig. 9.5(b) Gesture to decrease the system volume 29
18 Fig. 9.6(a) Gestures to move video forward 29
19 Fig. 9.6(b) Gestures to move video backward 30
20 Fig. 9.7 Controlling the mouse with gesture 30
21 Fig. 9.8 Screenshot through gesture 31
LIST OF TABLES
TABLE OF CONTENTS
CONTENTS PAGE NO.
Title Page i
Certificate ii
Acknowledgements iii
Abstract iv
List of Figures v
List of Tables vi
REFERENCES
1. INTRODUCTION
Hand gestures, which are conveyed by focal points on the palm and the positions of the fingers, are a part of non-verbal communication. Static and dynamic are the two types of hand gestures [3]. As the name implies, a static gesture means a steady shape of the hand, while a dynamic gesture involves a sequence of hand movements such as waving. There is a variety of hand movements within a gesture; for instance, a handshake varies from one individual to another and changes according to the general setting. The primary distinction between posture and gesture is that a posture focuses more on the shape of the hand, whereas a gesture focuses on the movement of the hand.
Previously, hand gesture recognition was accomplished with wearable sensored gloves. These sensored gloves detected the response according to the movements of the hand or fingers. The collected information was later processed by a computer connected to the wired gloves.
Although the strategies mentioned above have given good results, they have various restrictions that make them unsuitable for the elderly, who might experience uneasiness and confusion because of wire connection issues. Hand gestures offer a motivating field of research since they can facilitate communication and provide a natural means of interaction that can be utilized across a variety of applications.
Algorithms based on computer vision techniques have been developed which help to identify hands using various cameras. These algorithms help us to recognize as well as segment features of the hand such as skin color, appearance, movement, skeleton, depth, 3D models, and deep-learning based detection.
With the tremendous development of computing technology, the current style of user interaction with pointing and positioning devices such as mice, keyboards and pens is no longer sufficient for ubiquitous computing. Since these devices are limited, so is the instruction set; using body parts for interaction, such as the hands, is a better option. The hand can be used as an input device to provide natural interactions. There are basically two ways to achieve hand gesture recognition: one is instrument based and the other is vision based. The instrumented-glove approach fixes sensors to a glove; the user wears the glove, and depending on the movement of the hand or fingers, the hand gesture input is generated. The vision-based approach receives input from webcams, but this method is more complicated than the glove method because everything depends on the number and placement of the cameras. Visibility is an important factor in this approach: an image is captured from the stream and the skin is recognized to distinguish it from the background. All of this makes the vision-based approach complex, as the background can also be the same color as the skin. However, it is still very popular due to its low probability of error and high efficiency.
Gesture detection systems have been considered for various research uses, from the detection of facial gestures to body movements. Many other applications have evolved and hence created a need for such recognition methods. Static gesture recognition is a pattern recognition problem; as such, an inevitable part of the preprocessing stage, namely feature extraction, should be conducted prior to any other standard pattern recognition technique. Human-robot interaction is another application of our project, where the underlying motivation for such systems is that the communication should resemble natural human communication as much as possible. There are two main approaches to hand gestures: the hardware approach, using wearable glove-based sensors, and the software approach, using camera view-based methods. These approaches are commonly used to detect gestures in HCI systems. One approach is based on wired gloves (wearable or direct contact); the other uses the computer vision method, where sensored gloves do not play any role.
Hand gestures are features of body language that are expressed through the center of the palm, the positions of the fingers, and the shape formed by the hand. There are static hand gestures and dynamic hand gestures. As the name implies, static gestures refer to the hand being in a stable state without any movement, while dynamic gestures are a sequence of hand movements, as in a video.
There exist various hand movements within a gesture. For example, handshakes differ from person to person and change according to the context. The main variation between posture and gesture is that posture focuses more on the shape of the hand whereas gesture focuses on the movements of the hand.
Fig. 1.1.1 shows gloves with wearable sensors that can be utilized to detect the movement of the hand and the positions of the fingers; with the help of such sensored gloves, the directions specified by the fingers and the palm area can be read. However, this approach requires the user to be physically connected to the computer, which hinders the ease of interaction between user and computer, and in addition some of these devices are not affordable. Nonetheless, the modern glove-based approach utilizes touch technology, which is a promising technology and is viewed as industry-grade haptic technology.
The second approach is the computer vision method of gesture detection, which recognizes both static and dynamic hand gestures directly from the camera. Fig. 1.2.1 shows that when a user performs a particular gesture in front of the camera with one or both hands, the system captures a frame, extracts features from it, and classifies the hand gesture using one of various possible strategies, so that the recognized gesture can be connected to and control the target application.
Such systems can be used for the control of home devices and appliances by individuals with physical disabilities or by elderly users with impaired mobility. Exploration of big data and manipulation of high-definition images through intuitive actions can also make use of 3D interaction methods rather than constrained traditional 2D techniques.
There are various use cases of our application, which are mentioned below:
Using gestures as a way to control user functions, wired devices such as the keyboard and mouse can be eliminated; hence the complexity of the system can be reduced, and this could also reduce the overall system expenses.
A hand gesture controlled video application makes it easy for the user to operate the media just by using simple gestures, without the need for wired hardware devices (keyboard or mouse).
Since users can control the media just by gesture, people who are specially abled may find this application very useful.
Touchless systems became very useful during the pandemic, as they can help prevent the transmission of the virus from person to person.
2. LITERATURE SURVEY
The system presented by Haitham Badi et al. [2] developed a vision-based hand gesture recognition method in which, after an image is captured, it is scaled to the required size and then adjusted to the required lighting conditions. An Artificial Neural Network (ANN) model is used, and a comparison is made between hand contour-based feature extraction and complex moments-based feature extraction. The system proposed by Muhammad Inayat Ullah Khan et al. [3] considers the idea of skin color. It takes into account various details such as variations in the image plane and pose, skin color, and other structural components; the presence of additional features such as hair on the human hand adds further variability. Background and lighting conditions were also considered.
The system presented by Zhi-hua Chen, Jung-Tae Kim, Jianning Liang, Jing Zhang and Yu-Bo Yuan [8] works on the idea of finger segmentation. After the hand is detected, the fingers and palm are segmented; the fingers are then recognized and thus the hand gesture is recognized. The palm point is defined as the center of the palm and is found by the distance transform. The distance transform, also called a distance map, is a representation of an image in which every pixel records its distance to the nearest boundary pixel. Once the palm point is detected, a circle is drawn with the palm point as its center; this circle is known as the inner circle since it is contained inside the palm. The radius of the circle gradually grows until it reaches the edge of the palm, that is, the radius stops increasing when background pixels are included in the circle. With the assistance of the resulting palm mask, the fingers and the palm can be segmented without any problem: the part of the hand that is covered by the palm mask is the palm, while the other parts of the hand are the fingers. Next, the palm point and the wrist points are obtained. An arrow can be drawn pointing from the palm point to the midpoint of the wrist line at the bottom of the palm, and the hand image is then rotated so that this arrow points north.
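To illustrate the distance-transform step described above, the following is a minimal sketch, not the authors' original code, of how the palm point and the maximal inscribed (inner) circle could be located with OpenCV, assuming a binary hand mask (here read from a hypothetical file hand_mask.png) is already available:

import cv2

# Binary hand segmentation: uint8 image, 255 where the hand is present
mask = cv2.imread("hand_mask.png", cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)

# Distance transform: every pixel stores its distance to the nearest background pixel
dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)

# The palm point is the pixel farthest from the boundary; its distance value is the
# radius of the maximal inscribed ("inner") circle
_, max_val, _, max_loc = cv2.minMaxLoc(dist)
palm_center, palm_radius = max_loc, max_val

output = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
cv2.circle(output, palm_center, int(palm_radius), (0, 255, 0), 2)
cv2.imwrite("palm_circle.png", output)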
3. PROBLEM DEFINITION
Controlling the system functionalities using just gestures, without actually touching wired controllers, is very useful.
It also becomes very difficult to use wired controllers during a pandemic; hence, by using gestures as a way to control user functions, wired devices such as the keyboard, mouse and touchpad can be eliminated, thereby reducing the complexity of the system as well as the overall system expenses.
3.1 Objectives:
A collection of all the requirements that are to be imposed on the design and validation of the product is called a requirement specification. The specification also contains other relevant information necessary for the designing, validation and maintenance of the product. There are two types of requirement specification: hardware requirements and software requirements. The most frequent set of requirements defined by any operating system or software application is the physical computer resources, and the hardware requirements list is normally accompanied by a hardware compatibility list. The product requirements are descriptions of the features and functionalities of the target system; requirements also convey the expectations of the users from the software product.
5. SYSTEM DESIGN
System design is the process of defining the components, modules, interfaces and data of a system in order to satisfy specified requirements. System development is the process of creating or altering systems, along with the processes, practices, models and methodologies used to develop them.
First, we play a video by selecting it in the VLC player, either manually or by using our fingers as a virtual mouse. After that we open the OpenCV frame, from which the gestures are sensed continuously so that the corresponding functionalities can be performed. Then, based on the activity we have assigned to each gesture, that particular action takes place. Fig. 5.1 shows the basic view of the system that we are going to implement.
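The overall flow can be sketched roughly as follows; this is only an outline under our own naming, where detect_gesture and perform_action are hypothetical placeholders for the detection and control logic described in the implementation chapter:

import cv2

def detect_gesture(frame):
    # Placeholder: in the real system this runs the mediapipe hand detection
    # and returns a gesture label such as "volume_up" or "seek_forward".
    return None

def perform_action(gesture):
    # Placeholder: maps a recognized gesture to a media control action.
    actions = {"volume_up": "raise volume", "seek_forward": "seek forward"}
    if gesture in actions:
        print("Performing:", actions[gesture])

cap = cv2.VideoCapture(0)            # open the web camera
while cap.isOpened():
    ok, frame = cap.read()           # grab the next frame
    if not ok:
        break
    gesture = detect_gesture(frame)  # sense the gesture continuously
    if gesture is not None:
        perform_action(gesture)      # run the activity mapped to the gesture
    cv2.imshow("Gesture Control", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()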
6. METHODOLOGY
The Hand Gesture Controlled Video Application project utilizes several machine learning techniques commonly used in computer vision. We have made use of several Python libraries such as mediapipe, pycaw and pyautogui, along with the OpenCV package. Based on the literature survey, these are the tasks to be performed:
7. IMPLEMENTATION
In the implementation of live video streaming, the streaming application window pops up and, with the help of a web camera, a raw image is taken as input. This is done with the help of mediapipe along with the OpenCV package. The obtained image is processed to extract twenty-one hand landmarks, and each landmark position is marked with a circle using the cv2.circle function to get the specific points on the palm. Fig. 7.1 shows the flowchart of the course of action.
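A minimal sketch of this step is given below, assuming the opencv-python and mediapipe packages are installed; the parameter values are illustrative and not necessarily those used in the project:

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
hands = mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.7)

cap = cv2.VideoCapture(0)                         # web camera as input
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # mediapipe expects RGB images
    results = hands.process(rgb)                  # detect the 21 hand landmarks
    if results.multi_hand_landmarks:
        h, w, _ = frame.shape
        for hand_lms in results.multi_hand_landmarks:
            for lm in hand_lms.landmark:
                cx, cy = int(lm.x * w), int(lm.y * h)
                cv2.circle(frame, (cx, cy), 5, (255, 0, 255), cv2.FILLED)  # mark each landmark
    cv2.imshow("Hand Gesture Controlled Video Application", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()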
We can use the double-click gesture to open the application by moving the cursor with our fingers, and then select the video through the file browser. After the video is played, the OpenCV instance keeps waiting for input from the user and performs the action assigned to each gesture.
We use a Python library called mediapipe for hand detection; to calculate the frame rate, we take the current time and subtract the previous time from it. Mediapipe provides ML solutions such as Iris Detection, Face Mesh Detection, Face Detection, Pose Detection, Hands Detection, Holistic Detection, Hair Segmentation, Object Detection, Instant Motion Tracking and Box Tracking.
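The frame-rate calculation can be sketched as follows; this is a minimal stand-alone illustration and the variable names are our own:

import time
import cv2

cap = cv2.VideoCapture(0)
prev_time = time.time()
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    cur_time = time.time()
    fps = 1 / (cur_time - prev_time)   # frames per second = 1 / time taken per frame
    prev_time = cur_time
    cv2.putText(frame, "FPS: " + str(int(fps)), (10, 70),
                cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
    cv2.imshow("FPS", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()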
Fig. 7.2 displays the various landmarks of the hand that can be detected; we calculate the coordinate values of each landmark. We locate the coordinates of the index finger tip and of the thumb tip and find the distance between them. Using this distance together with the pycaw library, we can increase or decrease the system volume.
As seen in the sample output in Fig. 7.2, the landmark positions of the hand are detected first, and then a circle is drawn on each of those landmarks using the cv2.circle function.
The hand is described by the landmarks shown in Fig. 7.3, from which the state of each finger of each hand is set, with the fingers numbered from 0 to 4. The left and right hands are stored as hand1[] and hand2[] together with the finger states. For example, the index finger of the right hand is denoted as hand1[1] and that of the left hand as hand2[1], and if the finger is straight this is denoted as hand1[1] == 1.
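A minimal sketch of how such a finger-state array can be filled from the mediapipe landmarks is shown below; the landmark indices follow the standard mediapipe hand model, the helper name fingers_up is our own, and the simple coordinate test assumes an upright right hand facing the camera:

# Tip landmark ids in the mediapipe hand model: thumb, index, middle, ring, pinky
TIP_IDS = [4, 8, 12, 16, 20]

def fingers_up(lm_list):
    """lm_list: list of 21 (x, y) pixel coordinates for one hand.
    Returns five 0/1 values, e.g. [0, 1, 0, 0, 0] when only the index finger is raised."""
    fingers = []
    # Thumb: compare the x coordinate of the tip (4) with the joint next to it (3)
    fingers.append(1 if lm_list[TIP_IDS[0]][0] > lm_list[TIP_IDS[0] - 1][0] else 0)
    # Other fingers: the tip (e.g. 8) should lie above the middle joint (e.g. 6)
    for tip in TIP_IDS[1:]:
        fingers.append(1 if lm_list[tip][1] < lm_list[tip - 2][1] else 0)
    return fingers

# Example: hand1 = fingers_up(lm_list_right); hand1[1] == 1 means the index finger is straight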
Fig. 7.4 shows the various hand gestures defined in our application.
Fig. 7.5 shows the code snippet for hand detection used for increasing and decreasing the volume: we find the landmarks of the index finger tip and the thumb tip, find the distance between them, and map it to a volume level using the pycaw package.
Fig. 7.6 shows the code snippet for volume control. To control the volume of the system we first identify the landmark at the tip of the index finger and the landmark at the tip of the thumb, and then find the distance between these coordinates. This distance is mapped onto the volume scale, so as we increase or decrease the distance, the volume value changes accordingly. To move the video forward or backward, play and pause the video, and take screenshots, we first find all the coordinates and then put the state of the tip of each finger into an array called fingers. If the value of an element in this array is one, we can say that the corresponding finger is raised. Based on this concept, we can define various gestures for various functions.
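The volume-mapping step can be sketched as follows; this is a simplified illustration using the standard pycaw initialization for Windows, and the pixel range of 30-250 assumed for the thumb-index distance is our own choice, not necessarily the one used in the project:

import math
import numpy as np
from ctypes import cast, POINTER
from comtypes import CLSCTX_ALL
from pycaw.pycaw import AudioUtilities, IAudioEndpointVolume

# Standard pycaw setup for the default speaker device (Windows only)
devices = AudioUtilities.GetSpeakers()
interface = devices.Activate(IAudioEndpointVolume._iid_, CLSCTX_ALL, None)
volume = cast(interface, POINTER(IAudioEndpointVolume))
min_vol, max_vol, _ = volume.GetVolumeRange()      # system volume range in dB

def set_volume_from_distance(thumb_tip, index_tip):
    """thumb_tip, index_tip: (x, y) pixel coordinates of landmarks 4 and 8."""
    length = math.hypot(index_tip[0] - thumb_tip[0], index_tip[1] - thumb_tip[1])
    # Map the finger distance (assumed range 30-250 px) onto the dB volume range
    vol = np.interp(length, [30, 250], [min_vol, max_vol])
    volume.SetMasterVolumeLevel(vol, None)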
As shown in Fig. 7.7, we use additional packages such as pynput, pyautogui and pycaw to send keyboard input, take screenshots and control the volume functionality. To control the mouse, we draw a rectangle on the output frame of OpenCV and limit the movement of the fingers to this region; otherwise scaling problems arise when mapping to the screen. Using pynput, we control mouse functionality such as moving the mouse cursor and single clicking.
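A minimal sketch of this mouse-control mapping is shown below; the frame size, the margin of the control rectangle and the helper name are assumptions made for illustration, while Controller and Button are the standard pynput classes:

import numpy as np
import pyautogui
from pynput.mouse import Button, Controller

mouse = Controller()
screen_w, screen_h = pyautogui.size()     # full screen resolution
frame_w, frame_h = 640, 480               # assumed OpenCV frame size
margin = 100                              # control rectangle drawn inside the frame

def move_cursor(index_tip, click=False):
    """index_tip: (x, y) pixel position of the index finger tip in the frame."""
    # Map the rectangle inside the camera frame onto the whole screen
    x = np.interp(index_tip[0], [margin, frame_w - margin], [0, screen_w])
    y = np.interp(index_tip[1], [margin, frame_h - margin], [0, screen_h])
    mouse.position = (int(x), int(y))      # move the mouse cursor
    if click:
        mouse.click(Button.left, 1)        # single click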
Fig. 7.8 shows the code snippet for seeking the media forward and backward. The media is seeked forward by showing the gesture in which the finger1 and finger2 values of a specific hand are set to one, and it is seeked backward by showing the same gesture on the other hand.
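A sketch of this seek action is given below; it assumes that the media player (for example VLC) is the active window and responds to the Right and Left arrow keys for seeking, and it uses the finger-state arrays described earlier:

import pyautogui

def seek_media(hand1, hand2):
    """hand1, hand2: five-element 0/1 finger-state arrays for the two hands."""
    # Index and middle fingers of one hand raised -> seek forward
    if hand1[1] == 1 and hand1[2] == 1:
        pyautogui.press('right')   # arrow key seeks forward in most players
    # The same gesture on the other hand -> seek backward
    elif hand2[1] == 1 and hand2[2] == 1:
        pyautogui.press('left')    # arrow key seeks backward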
Fig. 7.9 shows the code snippet to play and pause the media by showing the gesture in which the finger4 value of either hand is set to 1. cv2.waitKey makes the program wait for a short time so that the transition is smooth.
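The play/pause and screenshot actions can be sketched in the same style; the space-bar shortcut for play/pause and the screenshot gesture chosen here are assumptions for illustration, while pyautogui.screenshot and cv2.waitKey are standard calls:

import cv2
import pyautogui

def handle_gesture(hand1, hand2):
    """hand1, hand2: five-element 0/1 finger-state arrays for the two hands."""
    # Little finger (finger4) of either hand raised -> toggle play/pause
    if hand1[4] == 1 or hand2[4] == 1:
        pyautogui.press('space')               # most media players toggle play/pause with space
        cv2.waitKey(500)                       # short wait for a smooth transition
    # Example screenshot gesture (assumed): thumbs of both hands raised
    elif hand1[0] == 1 and hand2[0] == 1:
        pyautogui.screenshot('screenshot.png') # save the current screen to a file
        cv2.waitKey(500)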
8. TESTING
Software testing is the act of examining the artifacts and the behavior of the software under test by validation and verification. Software testing can also provide an objective, independent view of the software to allow the business to appreciate and understand the risks of software implementation. There are various types of testing, and some of these are mentioned below:
Unit testing centers on the smallest unit of the software design. Here we test an individual unit or a group of interrelated units. It is usually done by the developer by supplying test inputs and observing the corresponding outputs.
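As an example, a unit test for the distance-to-volume mapping used in the implementation could look like the following; distance_to_volume is a hypothetical helper extracted from the volume logic sketched earlier, and the test only checks that the mapping clamps correctly at both ends of the assumed 30-250 px range:

import math
import numpy as np

def distance_to_volume(thumb_tip, index_tip, min_vol=-65.25, max_vol=0.0):
    """Map the thumb-index pixel distance (assumed 30-250 px) to a dB volume level."""
    length = math.hypot(index_tip[0] - thumb_tip[0], index_tip[1] - thumb_tip[1])
    return float(np.interp(length, [30, 250], [min_vol, max_vol]))

def test_distance_to_volume():
    # Fingers touching -> minimum volume; fingers far apart -> maximum volume
    assert distance_to_volume((100, 100), (100, 100)) == -65.25
    assert distance_to_volume((0, 0), (0, 300)) == 0.0

test_distance_to_volume()
print("unit test passed")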
Module testing is a process in which each unit of these modules is tested to ensure that it adheres to the best coding standards. Unless a module passes the testing phase, it cannot go for the application testing process. Module testing, also known as component testing, helps in the early detection of errors in application testing.
The goal of integration testing is to take unit-tested components and build a program structure that has been dictated by the design. In integration testing, a group of components is combined to produce output.
Black Box testing is a method of testing without having any knowledge of the
application's internal workings. The tester has no understanding of the system
architecture and no access to the source code. A tester will often engage with the
system's user interface by providing inputs and examining results without
understanding how and where the inputs are processed when performing a black
box test. Testing that focuses on the behavior of the software is known as black box testing; it entails testing from the outside, from the point of view of the end user. Almost any level of software testing can benefit from black box testing.
White box testing entails a thorough examination of the code's fundamental logic
and structure. Glass testing or open box testing are other names for white box
testing. In order to perform white box testing on an application, the tester must be
familiar with the code's internal workings. The tester must examine the source code
to determine which unit/chunk of code is acting abnormally. Testing is based on the
coverage of code statements, branches, paths, or conditions in this approach.
Low-level testing is referred to as white-box testing. Glass box, transparent box,
clear box, and code base testing are all terms used to describe this type of testing.
Grey box testing is a technique for testing an application with only a limited understanding of its internal workings. When it comes to software testing, the phrase "the more you know, the better" carries a lot of weight. A tester with extensive domain expertise always has an advantage over someone with little domain knowledge.
In the first test, the hand is shown to the system camera; the expected output is that the captured image is processed and landmarks are added using libraries such as mediapipe and OpenCV. The output obtained works in accordance with the expected output.
When the gesture-control module is executed, the camera window pops up and the gesture defined for it is shown; it is expected that, when a video is played in parallel, the media play/pause along with seeking forward and backward is controlled using the gesture. The output obtained works in accordance with the expected output.
9. RESULTS
Fig. 9.1 depicts the live streaming window of the application. This can be done using OpenCV's imshow function, with which we can display the image captured by the camera in an application window.
Fig. 9.2 depicts the capture of hand images with computer vision and a camera using the OpenCV library. OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was designed to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products.
Fig. 9.3 demonstrates the processing of images from the video input and the tracing of the landmarks using the Mediapipe library. MediaPipe is a framework for building ML pipelines for handling time-series data such as video and audio. This cross-platform framework works on desktop/server, Android, iOS and embedded devices such as the Raspberry Pi and Jetson Nano. Here FPS stands for frames per second.
Fig. 9.4 shows the detection of multiple hands on a single screen. Though gestures can be defined for both hands, only a single hand gesture will be executed at a time.
Fig. 9.5(a) demonstrates the gesture to increase the system volume and Fig. 9.5(b) demonstrates the gesture to decrease the system volume using the pycaw library, with which we can control the system volume accordingly.
Fig. 9.6(a) shows the gesture used to seek the media forward by 5 seconds, and Fig. 9.6(b) shows the gesture used to seek the media backward by 5 seconds.
Fig. 9.7 depicts controlling the mouse and performing the double-click action using pyautogui. Fig. 9.8 shows the gesture for capturing a screenshot.
We have added gestures to detect the number of fingers, move the mouse cursor, perform a single click, take screenshots, increase and decrease the volume, move the video forward and backward by n seconds based on the requirement, and pause and play the video in any media player. The results also showed that the gesture controlled application was quite robust for webcam-captured images. The application was very sensitive to live noise in the video stream: slight movements of the palm could affect recognition of the gesture. However, when the hand is kept steady for long enough, the program runs efficiently.
Based on our results, computer vision applications can recognize basic hand gestures, for example in robot navigation, using heuristic rules. The primary goal of this project was to simplify the way the existing system is used, through gestures rather than touch-based devices.
Additionally, future work includes embedding our system into various media applications to control their media functionalities through gestures, and representing numbers as commands in real time. Enhancing the recognition capability under various lighting conditions, which was encountered as a challenge in this project, can also be worked upon in the future.
REFERENCES
[1] A. J. Yashas and G. Shivakumar, "Hand Gesture Recognition: A Survey," 2019 International Conference on Applied Machine Learning (ICAML), 2019, pp. 3-8.
[4] Oudah, M.; Al-Naji, A.; Chahl, J. Hand Gesture Recognition Based on
Computer Vision: A Review of Techniques. J. Imaging 2020, 6, 73.
https://doi.org/10.3390/jimaging6080073
[5] N. Alnaim, "Hand Gesture Recognition using Deep Learning Neural Networks," 2020.
[8] Zhi-hua Chen, Jung-Tae Kim, Jianning Liang, Jing Zhang and Yu-Bo Yuan, "Real-Time Hand Gesture Recognition Using Finger Segmentation," The Scientific World Journal, vol. 2014, Article ID 267872, 9 pages, 2014.
[9] Haria A., Subramanian A., Asokkumar N., Poddar S. and Nayak J. S., "Hand Gesture Recognition for Human Computer Interaction," Procedia Computer Science, 2017.