HUMAN HAND GESTURES CAPTURING AND RECOGNITION VIA CAMERA
A PROJECT REPORT
Submitted by
ARAVINTHAN.V(620816104009)
JEEVANANDHAN.S(620816104037)
PALPANDI.K(620816104068)
POOVARASAN.P.M(620816104072)
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
APRIL 2020
BONAFIDE CERTIFICATE

SIGNATURE                                                   SIGNATURE
Dr.R.UMAMAHESWARI, Ph.D.                                    Ms.J.JAYARANJANI
ACKNOWLEDGEMENT
We are grateful to the Almighty for the grace and sustained blessings throughout the project
and for giving us immense strength in executing the work successfully. We would like to express
our deep sense of heartiest thanks to our beloved Chairman Dr.T.ARANGANNAL and Chairperson
Mrs.P.MALALEENA of Gnanamani Educational Institutions, Namakkal, for giving us the
opportunity to do and complete this project.
ABSTRACT
Deaf and mute people generally use sign language for communication, but they find it
difficult to communicate with others who do not understand sign language. Sign language plays a
major role in enabling mute people to communicate with normal people, yet it is very difficult for
mute people to convey their message, since normal people are not trained in hand sign language,
and in an emergency conveying a message becomes even harder. Because of this, communication
between deaf-mute people and normal people has always been a challenging task. We propose to
develop a device which can convert the hand gestures of a deaf-mute person into speech, so that
the sign language is converted into an audible voice. We propose a multimodal deep learning
architecture for sign language recognition which effectively combines RGB-D input and
two-stream spatiotemporal networks. Depth videos, as an effective complement to the RGB input,
can supply additional distance information about the signer's hands. A novel sampling method
called ARSS (Aligned Random Sampling in Segments) is put forward to select and align optimal
RGB-D video frames, which improves the capacity utilization of the multimodal data and reduces
redundancy. We obtain the hand ROI from the joint information of the RGB data for local focus
in the spatial stream. D-shift Net is proposed for depth motion feature extraction in the temporal
stream, which fully utilizes the three-dimensional motion information of the sign language.
Finally, the recognized output is converted into text and speech. This system eliminates the
communication barrier between hearing-impaired or mute people and normal people.
TABLE OF CONTENTS

CHAPTER NO    TITLE

              ABSTRACT
              LIST OF FIGURES
              LIST OF ABBREVIATIONS
1             INTRODUCTION
              1.1 HUMAN COMPUTER INTERACTION
              1.2 GESTURE RECOGNITION
              1.3 GESTURE RECOGNITION ALGORITHMS
                  1.3.1 3D model based algorithms
                  1.3.2 Skeletal-based algorithms
                  1.3.3 Appearance-based models
              1.4 APPLICATION BASED ON THE HANDS-FREE INTERFACE
                  1.4.1 Interactive exposition
                  1.4.2 Non-verbal communication
              1.5 COLOR MODELS
              1.6 HAND MODELING FOR GESTURE RECOGNITION
LIST OF ABBREVIATIONS
ABBREVIATIONS EXPANSIONS
RGB Red Green Blue
HSV Hue Saturation Value
CIE International Commission on Illumination
CHAPTER 1
INTRODUCTION
1.1 Human Computer Interaction
Human–computer interaction (HCI) involves the study, planning, design and use of the
interaction between people (users) and computers. It is often regarded as the intersection of
computer science, behavioral sciences, design, media studies, and several other fields of study.
Humans interact with computers in many ways, and the interface between humans and the
computers they use is crucial to facilitating this interaction. Desktop applications, internet
browsers, handheld computers, and computer kiosks make use of the prevalent graphical user
interfaces (GUI) of today. Voice user interfaces (VUI) are used for speech recognition and
synthesizing systems, and the emerging multi-modal and gestalt User Interfaces (GUI) allow
humans to engage with embodied character agents in a way that cannot be achieved with other
interface paradigms.
HCI (Human Computer Interaction) aims to improve the interactions between users and
computers by making computers more usable and receptive to users' needs. Specifically, HCI is
interested in:
• methodologies and processes for designing interfaces (i.e., given a task and a class of users,
  design the best possible interface within given constraints, optimizing for a desired property
  such as learnability or efficiency of use)
• methods for implementing interfaces (e.g. software toolkits and libraries)
• techniques for evaluating and comparing interfaces
• developing new interfaces and interaction techniques
• developing descriptive and predictive models and theories of interaction
A long-term goal of HCI is to design systems that minimize the barrier between the human's
mental model of what they want to accomplish and the computer's support of the user's task.
Professional practitioners in HCI are usually designers concerned with the
practical application of design methodologies to problems in the world. Their work often
revolves around designing graphical user interfaces and web interfaces. Researchers in HCI are
interested in developing new design methodologies, experimenting with new devices,
prototyping new software systems, exploring new interaction paradigms, and developing models
and theories of interaction.
1.2 Gesture Recognition
Gesture recognition is a topic in computer science and language technology with the goal of
interpreting human gestures via mathematical algorithms. Gestures can originate from any bodily
motion or state but commonly originate from the face or hand. Current focuses in the field
include emotion recognition from the face and hand gesture recognition. Many approaches have
been made using cameras and computer vision algorithms to interpret sign language. However,
the identification and recognition of posture, gait, proxemics, and human behaviors is also the
subject of gesture recognition techniques. Gesture recognition can be seen as a way for
computers to begin to understand human body language, thus building a richer bridge between
machines and humans than primitive text user interfaces or even GUIs (graphical user
interfaces), which still limit the majority of input to keyboard and mouse.
Gesture recognition enables humans to communicate with the machine (HMI) and interact
naturally without any mechanical devices. Using the concept of gesture recognition, it is possible
to point a finger at the computer screen so that the cursor will move accordingly. This could
potentially make conventional input devices such as mouse, keyboards and even touch-screens
redundant. Gesture recognition can be conducted with techniques from computer vision and
image processing. The literature includes ongoing work in the computer vision field on capturing
gestures or more general human pose and movements by cameras connected to a computer.
Gesture recognition and pen computing: Pen computing not only reduces the hardware
footprint of a system, it also extends interaction beyond purely digital objects such as keyboards
and mice to objects in the physical world. Taken further, this idea suggests hardware that requires
no monitor at all and may eventually lead to holographic displays. The term gesture recognition has
also been used to refer more narrowly to non-text-input handwriting symbols, such as inking on a
graphics tablet, multi-touch gestures, and mouse gesture recognition; that is, computer interaction
through the drawing of symbols with a pointing device cursor.
Offline gestures: Those gestures that are processed after the user interaction with the object.
An example is the gesture to activate a menu.
Online gestures: Direct manipulation gestures. They are used to scale or rotate a tangible
object.
The ability to track a person's movements and determine what gestures they may be
performing can be achieved through various tools. Although there is a large amount of research
done in image/video based gesture recognition, there is some variation within the tools and
environments used between implementations.
Wired gloves: These can provide input to the computer about the position and rotation of the
hands using magnetic or inertial tracking devices. Furthermore, some gloves can detect finger
bending with a high degree of accuracy (5-10 degrees), or even provide haptic feedback to the
user, which is a simulation of the sense of touch. The first commercially available hand-tracking
glove-type device was the Data Glove, a glove-type device which could detect hand position,
movement and finger bending. This uses fiber optic cables running down the back of the hand.
Light pulses are created and when the fingers are bent, light leaks through small cracks and the
loss is registered, giving an approximation of the hand pose.
Stereo cameras: Using two cameras whose relations to one another are known, a 3d
representation can be approximated by the output of the cameras. To get the cameras' relations,
one can use a positioning reference such as a lexian-stripe or infrared emitters. In combination
with direct motion measurement (6D-Vision) gestures can directly be detected.
Controller-based gestures: These controllers act as an extension of the body so that when
gestures are performed, some of their motion can be conveniently captured by software. Mouse
gestures are one such example, where the motion of the mouse is correlated to a symbol being
drawn by a person's hand, as is the Wii Remote or the Myo, which can study changes in
acceleration over time to represent gestures. Devices such as the LG Electronics Magic Wand,
the Loop and the Scoop use Hillcrest Labs' Free space technology, which uses MEMS
accelerometers, gyroscopes and other sensors to translate gestures into cursor movement. The
software also compensates for human tremor and inadvertent movement. Audio Cubes are
another example. The sensors of these smart light emitting cubes can be used to sense hands and
fingers as well as other objects nearby, and can be used to process data. Most applications are in
music and sound synthesis, but can be applied to other fields.
Single camera: A standard 2D camera can be used for gesture recognition where the
resources/environment would not be convenient for other forms of image-based recognition.
Earlier it was thought that single camera may not be as effective as stereo or depth aware
cameras, but some companies are challenging this theory. Software-based gesture recognition
technology using a standard 2D camera that can detect robust hand gestures, hand signs, as well
as track hands or fingertip at high accuracy has already been embedded in Lenovo’s Yoga
ultrabooks, Pantech’s Vega LTE smartphones, Hisense’s Smart TV models, among other
devices.
1.3 Gesture Recognition Algorithms
Depending on the type of input data, a gesture can be interpreted in different ways. However,
most of the techniques rely on key pointers represented in a 3D coordinate system. Based on the
relative motion of these, the gesture can be detected with high accuracy, depending on the quality
of the input and the algorithm's approach.
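As an illustration of this idea, the following minimal Java sketch (class names, thresholds and the
sample trajectory are hypothetical, not taken from this report) classifies a coarse swipe gesture
from the relative motion of a single tracked key point:

// Minimal sketch: classify a coarse swipe gesture from the relative motion of one
// tracked key point (e.g., a hand centroid). Names and thresholds are illustrative.
public class SwipeClassifier {

    /** A 2D key point position in image coordinates. */
    public static class Point2D {
        public final double x, y;
        public Point2D(double x, double y) { this.x = x; this.y = y; }
    }

    /** Classifies the dominant motion between the first and last tracked positions. */
    public static String classify(Point2D[] trajectory, double minDisplacementPx) {
        if (trajectory == null || trajectory.length < 2) return "NONE";
        double dx = trajectory[trajectory.length - 1].x - trajectory[0].x;
        double dy = trajectory[trajectory.length - 1].y - trajectory[0].y;
        if (Math.abs(dx) < minDisplacementPx && Math.abs(dy) < minDisplacementPx) return "NONE";
        if (Math.abs(dx) >= Math.abs(dy)) {
            return dx > 0 ? "SWIPE_RIGHT" : "SWIPE_LEFT";
        }
        return dy > 0 ? "SWIPE_DOWN" : "SWIPE_UP";
    }

    public static void main(String[] args) {
        Point2D[] track = { new Point2D(100, 200), new Point2D(160, 205), new Point2D(260, 210) };
        System.out.println(classify(track, 50));   // prints SWIPE_RIGHT
    }
}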
In order to interpret movements of the body, one has to classify them according to common
properties and the message the movements may express. For example, in sign language each
gesture represents a word or phrase. A taxonomy that seems very appropriate for Human-
Computer Interaction has been proposed by Quek in "Toward a Vision-Based Hand Gesture
Interface". He presents several interactive gesture systems in order to capture the whole space of
gestures: 1. Manipulative; 2. Semaphoric; 3. Conversational.
1.3.1 3D model based algorithms
In the 3D model approach, a real hand is interpreted as a collection of vertices and lines in a 3D
mesh, and the software uses their relative positions and interactions in order to infer the gesture.
The 3D model approach can use volumetric or skeletal models, or even a combination of the two.
Volumetric approaches have been heavily used in the computer animation industry and for
computer vision purposes. The models are generally built from complicated 3D surfaces, like
NURBS or polygon meshes.
The drawback of this method is that it is very computationally intensive, and systems for live
analysis are still to be developed. For the moment, a more interesting approach would be to map simple
primitive objects to the person’s most important body parts (for example cylinders for the arms
and neck, sphere for the head) and analyze the way these interact with each other. Furthermore,
some abstract structures like super-quadrics and generalized cylinders may be even more suitable
for approximating the body parts. The exciting thing about this approach is that the parameters
for these objects are quite simple. In order to better model the relation between these, we make
use of constraints and hierarchies between our objects.
1.3.2 Skeletal-based algorithms
A skeletal version of the hand effectively models it with fewer parameters than the volumetric
version and is easier to compute, making it suitable for real-time gesture analysis systems.
Instead of using intensive processing of the 3D models and dealing with a lot of
parameters, one can just use a simplified version of joint angle parameters along with segment
lengths. This is known as a skeletal representation of the body, where a virtual skeleton of the
person is computed and parts of the body are mapped to certain segments. The analysis here is
done using the position and orientation of these segments and the relation between each one of
them (for example, the angle between the joints and the relative position or orientation).
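A minimal Java sketch of this idea follows; the planar simplification, segment lengths and joint
angles are illustrative assumptions, not the report's actual model. One finger is described only by
joint angles and segment lengths, and the fingertip position is recovered by forward kinematics:

// Minimal sketch of a skeletal (joint-angle) representation: one finger is described by its
// segment lengths and joint flexion angles, and the fingertip position is recovered by simple
// planar forward kinematics. Values and dimensions are illustrative only.
public class SkeletalFinger {
    private final double[] segmentLengths;  // proximal, middle, distal phalanx lengths (e.g., mm)

    public SkeletalFinger(double[] segmentLengths) {
        this.segmentLengths = segmentLengths.clone();
    }

    /** Fingertip (x, y) relative to the knuckle, given one flexion angle per joint (radians). */
    public double[] fingertip(double[] jointAngles) {
        double x = 0, y = 0, heading = 0;
        for (int i = 0; i < segmentLengths.length; i++) {
            heading += jointAngles[i];                  // angles accumulate along the chain
            x += segmentLengths[i] * Math.cos(heading);
            y += segmentLengths[i] * Math.sin(heading);
        }
        return new double[] { x, y };
    }

    public static void main(String[] args) {
        SkeletalFinger index = new SkeletalFinger(new double[] { 40, 25, 20 });
        double[] tip = index.fingertip(
                new double[] { Math.toRadians(20), Math.toRadians(30), Math.toRadians(10) });
        System.out.printf("fingertip at (%.1f, %.1f)%n", tip[0], tip[1]);
    }
}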
1.3.3 Appearance-based models
Binary silhouette or contour images are typical input for appearance-based algorithms. They are
compared with different hand templates and, if they match, the corresponding gesture is inferred.
These models don’t use a spatial representation of the body anymore, because they derive
the parameters directly from the images or videos using a template database. Some are based on
the deformable 2D templates of the human parts of the body, particularly hands. Deformable
templates are sets of points on the outline of an object, used as interpolation nodes for the
object’s outline approximation. One of the simplest interpolation functions is linear, which
performs an average shape from point sets, point variability parameters and external deformators.
These template-based models are mostly used for hand-tracking, but could also be of use for
simple gesture classification.
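The following short Java sketch illustrates the averaging idea behind such templates (the data
layout and function names are assumptions made for the example): several aligned outlines are
averaged into a mean template, and a new outline is scored by its mean point-to-point distance
to that template.

// Minimal sketch of the "average shape" idea behind deformable templates: aligned point sets
// sampled on hand outlines are averaged to form a mean template, and a new outline is scored
// by its mean point-to-point distance to that template. Purely illustrative.
public class MeanShapeTemplate {

    /** Averages K aligned outlines, each given as N (x, y) points: shapes[k][n] = {x, y}. */
    public static double[][] meanShape(double[][][] shapes) {
        int n = shapes[0].length;
        double[][] mean = new double[n][2];
        for (double[][] shape : shapes) {
            for (int i = 0; i < n; i++) {
                mean[i][0] += shape[i][0] / shapes.length;
                mean[i][1] += shape[i][1] / shapes.length;
            }
        }
        return mean;
    }

    /** Mean Euclidean distance between corresponding points of an outline and the template. */
    public static double distance(double[][] outline, double[][] template) {
        double sum = 0;
        for (int i = 0; i < template.length; i++) {
            double dx = outline[i][0] - template[i][0];
            double dy = outline[i][1] - template[i][1];
            sum += Math.hypot(dx, dy);
        }
        return sum / template.length;
    }
}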
1.4 Application based on the hands-free interface
1.4.1 Interactive exposition
Nowadays, expositions based on new ways of interaction seek contact with the visitors, who play
an important role in the exhibition contents. Museums and expositions are open to all kinds of
visitors; therefore, these ‘‘sensing expositions’’ aim to reach the maximum number of people. This
is the case of ‘‘Galicia dixital’’, an exposition. Visitors go through all the
phases of the exposition sensing, touching and receiving multimodal feedback such as audio,
video, haptics, interactive images or virtual reality. In one phase there is a slider-puzzle with
images of Galicia to be solved. There are four computers connected enabling four users to
compete to complete the six puzzles included in the application.
Visitors use a touch-screen to interact with the slider-puzzle, but the characteristics of this
application make it possible to interact by means of the hands-free interface in a very easy
manner. Consequently, the application has been adapted to it and therefore, disabled people can
also play this game and participate in a more active way in the exposition.
1.4.2 Non-verbal communication
Some speech therapists use Blissymbolics in their sessions with children with speech
disorders and to help prevent linguistic and cognitive delays in crucial stages of a
child's life. The Blissymbolics language is currently composed of over 2000 graphic symbols
that can be combined and re-combined to create new symbols. The number of symbols is
adaptable to the capabilities and necessities of the user, for example, BlissSpeaker has 92
symbols that correspond to the first set of Bliss symbols for preschool children. BlissSpeaker is
an application that verbally reproduces statements built using Bliss symbols, which allows a
more ‘‘natural’’ communication between a child using Bliss and a person that does not
understand or use these symbols, for example, the children’s relatives. The application can work
with any language, as long as there is an available compatible SAPI (Speech Application
Programming Interface). The potential users of BlissSpeaker are children with speech disorders;
therefore, its operation has to be very simple and intuitive.
Moreover, audio, vision and traditional graphical user interfaces combined together configure a
very appealing multimodal interface that can help attract and involve the user in its use.
Furthermore, the use of the hands-free interface with BlissSpeaker will help to fulfil the third
requirement of a Bliss user, which is the possibility of indicating the desired symbol. It will offer
children with upper-body physical disabilities and speech difficulties a way to communicate
through an easy interface, and their teachers or relatives will understand them better thanks to the
symbols' vocal reproduction. Furthermore, the use of the new interface can make
learning of Bliss language more enjoyable and entertaining, and it also promotes the children’s
coordination, because the interface works with head motion. This system was evaluated in a
children’s scientific fair. The system was tested by more than 60 disabled and non-disabled
children from 6 to 14 years of age. A short explanation on how it works was given. They
operated the application with surprising ease and even if they had never seen Bliss symbols
before, they created statements that made sense and reproduced them for their classmates.
Children enjoyed interacting with the computer through the functionalities that the face-based
interface offered. Moreover, children with upper-body physical disabilities were grateful for the
opportunity of accessing a computer.
1.5 Color models
The aim of the proposed project is to overcome the challenge of skin color detection for a
natural interface between user and machine. To detect skin color under a dynamic background, a
study of various color models was carried out for pixel-based skin detection. Three color spaces
that are commonly used in computer vision applications have been chosen.
RGB: Three primary colors red(R), green(G), and blue(B) are used. The main advantage
of this color space is simplicity. However, it is not perceptually uniform. It does not separate
luminance and chrominance, and the R, G, and B components are highly correlated.
HSV (Hue, Saturation, Value): Hue expresses the dominant color (such as red, green,
purple or yellow) of an area. Saturation measures the colorfulness of an area in proportion to its
brightness. The “intensity”, “lightness” or “value” is related to the color luminance. This
model separates luminance from chrominance. It is a more intuitive way of describing colors, and
because the intensity is independent of the color information it is a very useful model for
computer vision. The model gives poor results where the brightness is very low. Other similar
color spaces are HSI and HSL (HLS).
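As a rough illustration of pixel-based skin detection in the HSV space, the sketch below uses
OpenCV's Java bindings (assumed to be available in the development environment); the hue,
saturation and value bounds are illustrative guesses and would have to be tuned for the actual
camera, lighting and skin tones.

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Scalar;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

// Minimal sketch of pixel-based skin detection in the HSV color space with OpenCV's Java
// bindings. The inRange bounds below are illustrative only and usually need tuning.
public class HsvSkinDetector {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        Mat bgr = Imgcodecs.imread("hand.jpg");            // OpenCV loads images as BGR
        Mat hsv = new Mat();
        Imgproc.cvtColor(bgr, hsv, Imgproc.COLOR_BGR2HSV); // separate chrominance from luminance

        // Keep pixels whose hue/saturation/value fall inside the (assumed) skin range.
        Mat skinMask = new Mat();
        Core.inRange(hsv, new Scalar(0, 40, 60), new Scalar(25, 170, 255), skinMask);

        // Light morphological opening to remove isolated noise pixels.
        Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE,
                new org.opencv.core.Size(5, 5));
        Imgproc.morphologyEx(skinMask, skinMask, Imgproc.MORPH_OPEN, kernel);

        Imgcodecs.imwrite("skin_mask.png", skinMask);
    }
}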
1.6 Hand modeling for gesture recognition
The human hand is an articulated object with 27 bones and 5 fingers. Each of these fingers
consists of three joints. The four fingers (little, ring, middle and index) are aligned together and
connected to the wrist bones in one tie, and at a distance there is the thumb. The thumb always
stands on the other side of the four fingers for any operation, like capturing, grasping or holding.
Human hand joints can be classified as flexion, twist, directive or spherical depending upon the
type of movement or possible rotation axes. In total, the human hand has approximately 27
degrees of freedom. As a result, a large number of gestures can be generated. Therefore, for proper
recognition of the hand, it should be modeled in a manner understandable as an interface in
Human Computer Interaction (HCI). There are two types of gestures, Temporal (dynamic) and
Spatial (shape). Temporal models use Hidden Markov Models (HMM), Kalman filters, Finite
State Machines and Neural Networks (NN). Hand modeling in the spatial domain can be further divided
into two categories, 2D (appearance based or view based) model and 3D based model. 2D hand
modeling can be represented by deformable templates, shape representation features, motion and
coloured markers. Shape representation features are classified as geometric features (i.e. live
features) and non-geometric features. Geometric features deal with the location and position of
the fingertips and the location of the palm, and can be processed separately. Non-geometric
features include colour, silhouette, texture, contour, edges, image moments and eigenvectors.
Non-geometric features cannot be seen individually (blind features) and require collective
processing. Deformable templates are flexible in nature and allow changes in the shape of the
object, up to a certain limit, to accommodate small variations in the hand shape. An image
motion based model can be obtained with respect to colour cues to track the hand. Coloured
markers are also used for
tracking the hand and detecting the fingers/ fingertips to model the hand shape. Hand shape can
also be represented using 3D modeling. The hand shape in 3D can be volumetric, skeletal and
geometric models. Volumetric models are complex in nature and difficult for computation in
real-time applications, and they use a lot of parameters to represent the hand shape. Instead,
other geometric models, such as cylinders, ellipsoids and spheres, are considered as alternatives
for hand shape approximation. The skeletal model represents the hand as a 3D structure with a
reduced set of parameters. Geometric models are used for hand animation and real-time
applications; polygon meshes and cardboard models are examples of geometric models.
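For example, a simple geometric feature extractor might take the largest contour of a binary hand
mask as the silhouette and use its convex hull points as rough fingertip candidates. The sketch
below assumes OpenCV's Java bindings and an input mask file; it is an illustration, not this
report's implementation:

import java.util.ArrayList;
import java.util.List;

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfInt;
import org.opencv.core.MatOfPoint;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

// Minimal sketch of extracting simple geometric hand features from a binary hand mask:
// the largest contour is taken as the hand silhouette and its convex hull points serve as
// rough fingertip candidates. File names and thresholds are illustrative.
public class HandContourFeatures {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        Mat mask = Imgcodecs.imread("skin_mask.png", Imgcodecs.IMREAD_GRAYSCALE);

        List<MatOfPoint> contours = new ArrayList<>();
        Imgproc.findContours(mask, contours, new Mat(),
                Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

        // Pick the largest contour, assumed to be the hand region.
        MatOfPoint hand = null;
        double bestArea = 0;
        for (MatOfPoint c : contours) {
            double area = Imgproc.contourArea(c);
            if (area > bestArea) { bestArea = area; hand = c; }
        }
        if (hand == null) { System.out.println("no hand found"); return; }

        // Convex hull indices of the hand contour; hull vertices approximate fingertip locations.
        MatOfInt hull = new MatOfInt();
        Imgproc.convexHull(hand, hull);
        System.out.println("hand area: " + bestArea + ", hull points: " + hull.rows());
    }
}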
CHAPTER 2
Disadvantages
There are many challenges associated with the accuracy and usefulness of gesture
recognition software. For image-based gesture recognition there are limitations on the equipment
used and image noise. Images or video may not be under consistent lighting, or in the same
location. Items in the background or distinct features of the users may make recognition more
difficult.
The variety of implementations for image-based gesture recognition may also cause issues
for the viability of the technology for general usage. For example, an algorithm calibrated for one
camera may not work for a different camera. The amount of background noise also causes
tracking and recognition difficulties, especially when occlusions (partial and full) occur.
Furthermore, the distance from the camera, and the camera's resolution and quality, also cause
variations in recognition accuracy.
In order to capture human gestures by visual sensors, robust computer vision methods are
also required, for example for hand tracking and hand posture recognition or for capturing
movements of the head, facial expressions or gaze direction.
Gesture was the first mode of communication for primitive cave men. Later on, human
civilization developed verbal communication very well, but non-verbal communication has still
not lost its weightage. Such non-verbal communication is used not only by physically challenged
people, but also for different applications in diversified areas, such as aviation, surveying and
music direction. It is the best method to interact with the computer without using other peripheral
devices, such as keyboard and mouse. Researchers around
the world are actively engaged in development of robust and efficient gesture recognition
systems, more specifically hand gesture recognition systems, for various applications. The major
steps associated with a hand gesture recognition system are: data acquisition, gesture modeling,
feature extraction and hand gesture recognition. The importance of gesture recognition lies in
building efficient human–machine interaction. Its applications range from sign language
recognition through medical rehabilitation to virtual reality. Given the amount of literature on the
problem of gesture recognition and the promising recognition rates reported, one would be led to
believe that the problem is nearly solved. Sadly this is not so. A main problem hampering most
approaches is that they rely on several underlying assumptions that may be suitable in a
controlled lab setting but do not generalize to arbitrary settings. Several common assumptions
include: assuming high contrast stationary backgrounds and ambient lighting conditions.
Feasibility Study
A feasibility study is carried out to select the best system that meets performance requirements.
The main aim of the feasibility study activity is to determine whether it would be financially and
technically feasible to develop the product. The feasibility study activity involves the analysis of
the problem and collection of all relevant information relating to the product such as the different
data items which would be input to the system, the processing required to be carried out on these
data, the output data required to be produced by the system as well as various constraints on the
behavior of the system.
Technical Feasibility
This is concerned with specifying equipment and software that will successfully satisfy the user
requirements. The technical needs of the system may vary considerably.
In examining technical feasibility, the configuration of the system is given more importance than
the actual make of the hardware. The configuration should give a complete picture of the system's
requirements: how many workstations are required, how these units are interconnected so that
they can operate and communicate smoothly, and what speeds of input and output should be
achieved at a particular quality of printing.
Economic Feasibility
Economic analysis is the most frequently used technique for evaluating the effectiveness of a
proposed system. More commonly known as Cost / Benefit analysis, the procedure is to
determine the benefits and savings that are expected from a proposed system and compare them
with costs. If benefits outweigh costs, a decision is taken to design and implement the system.
Otherwise, further justification or alterations to the proposed system will have to be made if it is
to have a chance of being approved. This is an ongoing effort that improves in accuracy at each
phase of the system life cycle.
Operational Feasibility
This is mainly related to human organizational and political aspects. The points to be considered
are:
• What new skills will be required? Do the existing staff members have these skills? If not, can
they be trained in due course of time?
This feasibility study is carried out by a small group of people who are familiar with information
system techniques and are skilled in the system analysis and design process. Proposed projects are
beneficial only if they can be turned into information systems that will meet the operating
requirements of the organization. This test of feasibility asks if the system will work when it is
developed and installed.
CHAPTER 3
Development Environment
3.3 Java
This chapter is about the software language and the tools used in the development of the project.
The platform used here is Java. The primary languages are Java, J2EE and J2ME; in this
project J2EE is chosen for implementation. Java is a programming language originally
developed by James Gosling at Sun Microsystems and released in 1995 as a core component of
Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a
simpler object model and fewer low-level facilities. Java applications are typically compiled to
byte code that can run on any Java Virtual Machine (JVM) regardless of computer architecture.
Java is general-purpose, concurrent, class-based, and object-oriented, and is specifically designed
to have as few implementation dependencies as possible. It is intended to let application
developers "write once, run anywhere".
Java is considered by many as one of the most influential programming languages of the 20th
century, and it is widely used, from application software to web applications. The Java framework
is a platform-independent framework that simplifies the development of Internet applications.
Java technology's versatility, efficiency, platform portability, and security make it the ideal
technology for network computing. From laptops to datacenters, game consoles to scientific
supercomputers, cell phones to the Internet, Java is everywhere. Java is a small, simple, safe,
object-oriented, interpreted or dynamically optimized, byte-coded, architecture-neutral,
garbage-collected, multithreaded programming language with strongly typed exception handling
for writing distributed and dynamically extensible programs.
Java is an object-oriented programming language. Java is a high-level, third-generation language
like C, FORTRAN, Smalltalk, Perl and many others. You can use Java to write computer
applications that crunch numbers, process words, play games, store data or do any of the
thousands of other things computer software can do.
Special programs called applets can be downloaded from the Internet and run safely
within a web browser. Java supports such applications, and the following features make it one of
the best programming languages.
The original and reference implementation Java compilers, virtual machines, and class libraries
were developed by Sun from 1995. As of May 2007, in compliance with the specifications of the
Java Community Process, Sun made available most of their Java technologies as free software
under the GNU General Public License. Others have also developed alternative implementations
of these Sun technologies, such as the GNU Compiler for Java and GNU Classpath.
The Java language was created by James Gosling in June 1991 for use in a set top box
project. The language was initially called Oak, after an oak tree that stood outside Gosling's
office - and also went by the name Green - and ended up later being renamed to Java, from a list
of random words. Gosling's goals were to implement a virtual machine and a language that had a
familiar C/C++ style of notation.
OBJECTIVES OF JAVA
Java has been tested, refined, extended, and proven by a dedicated community, which,
numbering more than 6.5 million developers, is the largest and most active on the planet. With
its versatility, efficiency, and portability, Java has become invaluable to developers by enabling
them to:
• Write software on one platform and run it on virtually any other platform
• Create programs to run within a web browser and web services
• Develop server-side applications for online forums, stores, polls, HTML forms processing, and more
• Combine applications or services using the Java language to create highly customized
  applications or services
• Write powerful and efficient applications for mobile phones, remote processors, low-cost
  consumer products, and practically any other device with a digital heartbeat
Today, many colleges and universities offer courses in programming for the Java
platform. In addition, developers can also enhance their Java programming skills by reading
Sun's java.sun.com Web site, subscribing to Java technology-focused newsletters, using the Java
Tutorial and the New to Java Programming Center, and signing up for Web, virtual, or
instructor-led courses.
Java Server Pages - An Overview
Java Server Pages, or JSP for short, is Sun's solution for developing dynamic web sites.
JSP provides excellent server-side scripting support for creating database-driven web applications.
JSP enables developers to insert Java code directly into a JSP file, which makes the development
process very simple and its maintenance very easy. JSP pages are efficient: a page is loaded into
the web server's memory on receiving the very first request, and subsequent calls are served
within a very short period of time.
In today's environment most web sites serve dynamic pages based on user requests. A
database is a very convenient way to store the data of users and other things. JDBC provides
excellent database connectivity in a heterogeneous database environment. Using JSP and JDBC it
is very easy to develop database-driven web applications. Java is known for its characteristic of
"write once, run anywhere," and JSP pages are platform independent.
Java Server Pages (JSP) technology is the Java platform technology for delivering dynamic
content to web clients in a portable, secure and well-defined way. The JavaServer Pages
specification extends the Java Servlet API to provide web application developers with a robust
framework for creating dynamic web content on the server using HTML, and XML templates,
and Java code, which is secure, fast, and independent of server platforms.
JSP has been built on top of the Servlet API and utilizes Servlet semantics. JSP has become the
preferred request handler and response mechanism. Although JSP technology is going to be a
powerful successor to basic Servlets, they have an evolutionary relationship and can be used in a
cooperative and complementary manner.
Servlets are powerful but sometimes a bit cumbersome when it comes to generating
complex HTML. Most servlets contain a little code that handles application logic and a lot more
code that handles output formatting. This can make it difficult to separate and reuse portions of
the code when a different output format is needed. For these reasons, web application developers
turn towards JSP as their preferred servlet environment.
Evolution of Web Applications
Over the last few years, web server applications have evolved from static to dynamic
applications. This evolution became necessary due to some deficiencies in earlier web site
design. For example, to put more business processes on the web, whether in business-to-
consumer (B2C) or business-to-business (B2B) markets, conventional web site design
technologies are not enough. The main issues every developer faces when developing web
applications are:
1. Scalability - a successful site will have more users, and as the number of users increases
rapidly, the web applications have to scale correspondingly.
2. Integration of data and business logic - the web is just another way to conduct business, and so
it should be able to use the same middle-tier and data-access code.
3. Manageability - web sites just keep getting bigger and we need some viable mechanism to
manage the ever-increasing content and its interaction with business systems.
4. Personalization - adding a personal touch to the web page becomes an essential factor to keep
our customer coming back again. Knowing their preferences, allowing them to configure the
information they view, remembering their past transactions or frequent search keywords are all
important in providing feedback and interaction from what is otherwise a fairly one-sided
conversation.
Apart from these general needs for a business-oriented web site, the necessity for new
technologies to create robust, dynamic and compact server-side web applications has been
realized. The main characteristics of today's dynamic web server applications are as follows:
1. Serve HTML and XML, and stream data to the web client
3. Interface to databases, other Java applications, CORBA, directory and mail services
4. Make use of application server middleware to provide transactional support.
Benefits of JSP
One of the main reasons why the Java Server Pages technology has evolved into what it is today
and it is still evolving is the overwhelming technical need to simplify application design by
separating dynamic content from static template display data. Another benefit of utilizing JSP is
that it allows the roles of web application/HTML designer and software developer to be separated
more cleanly. The JSP technology is blessed with a number of exciting benefits, which are
chronicled as follows:
1. The JSP technology is platform independent, in its dynamic web pages, its web servers, and its
underlying server components. That is, JSP pages perform perfectly without any hassle on any
platform, run on any web server, and work with any web-enabled application server. The JSP pages can be
accessed from any web server.
2. The JSP technology emphasizes the use of reusable components. These components can be
combined or manipulated towards developing more purposeful components and page designs.
This definitely reduces development time. At development time, JSPs are very different from
Servlets; however, they are translated into Servlets at run time and executed by a JSP engine
which is installed on a Web-enabled application server such as BEA WebLogic or IBM
WebSphere.
Servlets
Earlier, in client-server computing, each application had its own client program, which worked as
a user interface and needed to be installed on each user's personal computer. Most web applications
use HTML/XHTML, which is supported by almost all browsers, and web pages are displayed
to the client as static documents.
A web page merely displays static content and lets the user navigate through that
content, whereas a web application provides a more interactive experience.
Any computer running Servlets or JSP needs to have a container. A container is nothing but a
piece of software responsible for loading, executing and unloading the Servlets and JSP. While
servlets can be used to extend the functionality of any Java-enabled server, they are mostly used
to extend web servers, and they are an efficient replacement for CGI scripts. CGI
was one of the earliest and most prominent server side dynamic content solutions, so before
going forward it is very important to know the difference between CGI and the Servlets.
Java Servlets
Java Servlet is a generic server extension that means a java class can be loaded dynamically to
expand the functionality of a server. Servlets are used with web servers and run inside a Java
Virtual Machine (JVM) on the server so these are safe and portable.
Unlike applets, they do not require support for Java in the web browser. Unlike CGI, servlets
do not use multiple processes to handle separate requests; servlets are handled by separate
threads within the same process. Servlets are also portable and platform independent.
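A minimal servlet, sketched below with an illustrative class name and output, shows the basic
pattern: the container loads the class once and dispatches each incoming request to it on a
separate thread.

import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal servlet sketch: one class per request handler, loaded by the container and
// invoked on a worker thread for every request. Class name and parameters are illustrative.
public class RecognitionResultServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException {
        String gesture = request.getParameter("gesture");   // e.g., supplied by the recognizer
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body>");
        out.println("<p>Recognized gesture: " + (gesture == null ? "none" : gesture) + "</p>");
        out.println("</body></html>");
    }
}

In a J2EE deployment this class would additionally be mapped to a URL pattern in the web
application's web.xml deployment descriptor.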
A web server is the combination of computer and the program installed on it. Web server
interacts with the client through a web browser. It delivers the web pages to the client and to an
application by using the web browser and the HTTP protocols respectively.
A web server can be defined as a package of a large number of programs installed on a computer
connected to the Internet or an intranet, for downloading requested files using the File Transfer
Protocol, serving e-mail, and building and publishing web pages. A web server works on a client-server
model. JSP and Servlet are gaining rapid acceptance as means to provide dynamic content on the
Internet. With full access to the Java platform, running from the server in a secure manner, the
application possibilities are almost limitless. When JSPs are used with Enterprise JavaBeans
technology, e-commerce and database resources can be further enhanced to meet an enterprise's
needs for web applications providing secure transactions in an open platform. J2EE technology
as a whole makes it easy to develop, deploy and use web server applications instead of mingling
with other technologies such as CGI and ASP. There are many tools for facilitating quick web
software development and to easily convert existing server-side technologies to JSP and Servlets.
CHAPTER 4
Application access
CHAPTER 5
Modules Description
• Image Acquisition
• Foreground segmentation
• Evaluation criteria
For efficient hand gesture recognition, data acquisition should be as perfect as
possible, and a suitable input device should be selected for the data acquisition. There are a number of
input devices for data acquisition. Some of them are data gloves, marker, hand images (from
webcam/ stereo camera/ Kinect 3D sensor) and drawings. Data gloves are the devices for perfect
data input with high accuracy and high speed. It can provide accurate data of joint angle,
rotation, location etc. for application in different virtual reality environments. At present,
wireless data gloves are available commercially so as to remove the hindrance due to the cable.
Colored markers attached to the human skin are also used as an input technique, and hand
localization is done by color localization. Input can also be fed to the system without any
external costly hardware, except a low-cost web camera. Bare hand (either single or double) is
used to generate the hand gesture and the camera captures the data easily and naturally (without
any contact). Sometimes drawing models are used to input commands to the system. The latest
addition to this list is Microsoft Kinect 3D depth sensor. Kinect is a 3D motion sensing input
device widely used for gaming. In this module, we take input images from a web camera and
capture hand and face images, acquiring both depth and color images. In 3D computer
graphics, a depth map is an image or image channel that contains information relating to the
distance of the surfaces of scene objects from a viewpoint. The term is related to, and may be
analogous to, depth buffer, Z-buffer, Z-buffering and Z-depth. The "Z" in these latter terms
relates to a convention that the central axis of view of a camera is in the direction of the camera's
Z axis, and not to the absolute Z axis of a scene. We also use color map techniques to implement
a function that maps (transforms) the colors of one (source) image to the colors of another
(target) image. A color mapping may refer to the algorithm that results in the mapping
function or the algorithm that transforms the image colors.
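A possible sketch of this acquisition step, using OpenCV's Java bindings (camera index, file
names and the JET color map are assumptions made for the example), grabs a frame from the
default web camera and applies a pseudo-color map to a grayscale version, the way depth maps
are commonly visualized:

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;
import org.opencv.videoio.VideoCapture;

// Minimal sketch of image acquisition with a low-cost web camera using OpenCV's Java bindings.
// Device index, file names and the color map choice are illustrative assumptions.
public class ImageAcquisition {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        VideoCapture camera = new VideoCapture(0);          // default web camera
        if (!camera.isOpened()) { System.err.println("camera not available"); return; }

        Mat frame = new Mat();
        if (camera.read(frame)) {
            Imgcodecs.imwrite("color_frame.png", frame);    // color image

            Mat gray = new Mat();
            Imgproc.cvtColor(frame, gray, Imgproc.COLOR_BGR2GRAY);

            // Apply a color map (here JET) to visualize intensity the way depth maps are shown.
            Mat colorMapped = new Mat();
            Imgproc.applyColorMap(gray, colorMapped, Imgproc.COLORMAP_JET);
            Imgcodecs.imwrite("color_mapped.png", colorMapped);
        }
        camera.release();
    }
}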
Separating foreground objects from natural images and video plays an important role in
image and video editing tasks. Despite extensive study in the last two decades, this problem still
remains challenging. In particular, extracting a foreground object from the background in a static
image involves determining both full and partial pixel coverage, also known as extracting a
matte, which is a severely under-constrained problem. Segmenting spatio-temporal video objects
from a video sequence is even harder since extracted foregrounds on adjacent frames must be
both spatially and temporally coherent. Previous approaches for foreground extraction usually
require a large amount of user input and still suffer from inaccurate results and low
computational efficiency.
In the foreground segmentation section, the background was ruled out from the captured
frames and the whole human body was kept as the foreground. In this module, we implement a
thresholding approach. In computer vision, image segmentation is the process of partitioning a
digital image into multiple segments (sets of pixels, also known as super pixels). The goal of
segmentation is to simplify and/or change the representation of an image into something that is
more meaningful and easier to analyze. Image segmentation is typically used to locate objects
and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process
of assigning a label to every pixel in an image such that pixels with the same label share certain
characteristics. Thresholding is the simplest segmentation method. The pixels are partitioned
depending on their intensity value.
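A minimal sketch of such intensity thresholding, assuming OpenCV's Java bindings and using
Otsu's method to pick the threshold automatically, could look as follows (file names are
placeholders):

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

// Minimal sketch of threshold-based foreground segmentation: pixels are partitioned by
// intensity, with the threshold chosen automatically by Otsu's method. Illustrative only.
public class ForegroundSegmentation {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        Mat frame = Imgcodecs.imread("color_frame.png");
        Mat gray = new Mat();
        Imgproc.cvtColor(frame, gray, Imgproc.COLOR_BGR2GRAY);

        // Smooth slightly so isolated noise does not distort the automatic threshold.
        Imgproc.GaussianBlur(gray, gray, new org.opencv.core.Size(5, 5), 0);

        // Otsu's method selects the intensity threshold that best separates the two classes.
        Mat foreground = new Mat();
        Imgproc.threshold(gray, foreground, 0, 255, Imgproc.THRESH_BINARY + Imgproc.THRESH_OTSU);

        Imgcodecs.imwrite("foreground_mask.png", foreground);
    }
}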
Face and hand detection was used to initialize the position of the face and hands for the
tracking phase. After initialization, both the face and hands were tracked through the video
sequences by an HMM-based method.
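One common way to obtain such an initial face position is a pre-trained Haar cascade detector;
the hedged sketch below (OpenCV Java bindings assumed, cascade file path illustrative) shows one
possible initialization, not necessarily the method used here:

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Rect;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.objdetect.CascadeClassifier;

// Illustrative initialization step: a pre-trained Haar cascade locates the face in the first
// frame, and the resulting rectangle can seed the tracker.
public class FaceInitializer {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        CascadeClassifier faceDetector =
                new CascadeClassifier("haarcascade_frontalface_default.xml");  // assumed path
        Mat frame = Imgcodecs.imread("color_frame.png");

        MatOfRect faces = new MatOfRect();
        faceDetector.detectMultiScale(frame, faces);

        for (Rect face : faces.toArray()) {
            System.out.println("face at " + face.x + "," + face.y
                    + " size " + face.width + "x" + face.height);
        }
    }
}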
5.4 Hand trajectory classification
Hand tracking results were segmented as trajectories, compared with motion models, and
decoded as commands for robotic control.
Neural networks are composed of simple elements operating in parallel. These elements
are inspired by biological nervous systems. As in nature, the network function is determined
largely by the connections between elements. We can train a neural network to perform a
particular function by adjusting the values of the connections (weights) between elements.
Commonly, neural networks are adjusted, or trained, so that a particular input leads to a specific
target output. The network is adjusted, based on a comparison of the output and the target,
until the network output matches the target. Typically, many such input/target pairs are used in
this supervised learning (a training method studied in more detail in the following chapter) to
train a network.
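The compare-and-adjust loop described above can be illustrated with a toy example: a single
sigmoid neuron trained by gradient descent on a small input/target set. All values are illustrative
and unrelated to the report's actual network.

// Minimal sketch of supervised training: a single neuron's weights are nudged after every
// example in proportion to the difference between the target and the network output.
// Data, learning rate and epoch count are toy values.
public class TinyNeuronTrainer {
    public static void main(String[] args) {
        double[][] inputs = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
        double[] targets = { 0, 0, 0, 1 };                  // learn a simple AND-like mapping
        double[] weights = { 0.1, -0.1 };
        double bias = 0.0, learningRate = 0.5;

        for (int epoch = 0; epoch < 1000; epoch++) {
            for (int i = 0; i < inputs.length; i++) {
                double z = bias + weights[0] * inputs[i][0] + weights[1] * inputs[i][1];
                double output = 1.0 / (1.0 + Math.exp(-z));  // sigmoid activation
                double error = targets[i] - output;          // compare output with target
                double grad = error * output * (1 - output); // sigmoid gradient term
                // Adjust the connection weights toward reducing the error.
                weights[0] += learningRate * grad * inputs[i][0];
                weights[1] += learningRate * grad * inputs[i][1];
                bias      += learningRate * grad;
            }
        }
        System.out.printf("w = (%.2f, %.2f), b = %.2f%n", weights[0], weights[1], bias);
    }
}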
The proposed system was able to detect fingertips even when they were in front of the palm, and
it reconstructed a 3D image of the hand that was visually comparable. The system claimed results
90-95% accurate for open fingers, which is quite acceptable, while for closed fingers the accuracy
was only 10-20%, because a closed or bent finger lies in front of the palm, so skin color detection
cannot distinguish the palm from the finger. According to the authors, image quality and the
operator were the main reasons for low detection rates, and about 90% accuracy is claimed if the
lighting conditions are good. Six different parameters were then used to control the performance
of the system; when too much noise was found, it could be controlled using two parameters
called α and β respectively. Finally, about 90.45% accuracy is claimed, though hidden fingers
were not detected in this approach.
CHAPTER 6
System Testing
Procedure-level testing is carried out first. By giving improper inputs, the errors that occur
are noted and eliminated. Then web-form-level testing is carried out.
Each and every module is checked in this unit testing phase. The controls used execute the
code successfully without any execution errors or run-time errors. The unit tester checks each
module's output.
Integration Testing
Testing is done for each module. After testing all the modules, the modules are integrated
and testing of the final system is done with test data specially designed to show that the
system will operate successfully under all conditions. Thus system testing is a
confirmation that all is correct and an opportunity to show the user that the system works.
In this testing, the flow of each module is checked. Image acquisition, preprocessing, text
detection and recognition are integrated into the project. The functional flow is successfully
executed from the first module to the final module. Integration testing provides a proof of
concept by listing the successfully executed loop conditions.
Validation Testing
The final step involves validation testing, which determines whether the software functions
as the user expects. The end user rather than the system developer conducts this test; most
software developers use a process called "alpha and beta testing" to uncover defects that only the
end user seems able to find. The completion of the entire project is based on the full satisfaction
of the end users.
CHAPTER 7
Implementation Results
Implementation is the process that actually yields the lowest-level system elements in the
system hierarchy (system breakdown structure). System elements are made, bought, or reused.
Production involves the hardware fabrication processes of forming, removing, joining, and
finishing, the software realization processes of coding and testing, or the operational procedures
development processes for operators' roles. If implementation involves a production process, a
manufacturing system which uses the established technical and management processes may be
required. The purpose of the implementation process is to design and create (or fabricate) a
system element conforming to that element’s design properties and/or requirements. The element
is constructed employing appropriate technologies and industry practices. This process bridges
the system definition processes and the integration process.
• implementation tools
• implementation procedures
• verification reports
CHAPTER 8
Conclusion
The design of more natural and multimodal forms of interaction with computers or
systems is an aim to achieve. Vision-based interfaces can offer appealing solutions to introduce
non-intrusive systems with interaction by means of gestures. In order to build reliable and robust
perceptual user interfaces based on computer vision, certain practical constraints must be taken
into account: the application must be capable of working well in any environment and should make
use of low-cost devices. This work has proposed a new mixture of several computer vision
techniques for facial and hand feature detection and tracking and for face gesture recognition;
some of them have been improved and enhanced to reach greater stability and robustness. A hands-free
interface able to replace the standard mouse motions and events has been developed using these
techniques. Hand gesture recognition is finding application in non-verbal communication
between human and computer, between able-bodied and physically challenged people, and in 3D
gaming, virtual reality, etc. With the increase in applications, the gesture recognition system
demands a lot of research in different directions. Finally, we implemented effective and robust
algorithms to
solve false merge and false labeling problems of hand tracking through interaction and occlusion.
CHAPTER 9
Future enhancements
In the future, we plan to improve the accuracy of the hand gesture recognition system and
also to include eye-blink detection for accessing the system. A vision-based system for the
detection of voluntary eye-blinks will be developed, together with its implementation as a
human–computer interface for people with disabilities. The algorithm will allow for eye-blink
detection, estimation of the eye-blink duration and interpretation of a sequence of blinks in real
time to control a non-intrusive human–computer interface. The detected eye-blinks are classified
as short blinks (shorter than 200 ms) or long blinks (longer than 200 ms). Separate short
eye-blinks are assumed to be spontaneous and are not included in the designed eye-blink code.
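A tiny Java sketch of the planned blink interpretation, using the 200 ms boundary mentioned
above (everything else is illustrative):

// Minimal sketch: each detected blink is classified by its duration, with blinks shorter than
// 200 ms treated as spontaneous and ignored, and longer ones kept as intentional input.
public class BlinkClassifier {
    public static final long SHORT_BLINK_MS = 200;

    public enum BlinkType { SHORT_SPONTANEOUS, LONG_INTENTIONAL }

    public static BlinkType classify(long eyeClosedMillis) {
        return eyeClosedMillis < SHORT_BLINK_MS ? BlinkType.SHORT_SPONTANEOUS
                                                : BlinkType.LONG_INTENTIONAL;
    }

    public static void main(String[] args) {
        System.out.println(classify(120));   // SHORT_SPONTANEOUS, excluded from the blink code
        System.out.println(classify(450));   // LONG_INTENTIONAL, used as a command
    }
}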
CHAPTER 10
Output Screenshots
References
[1] M. R. Ahsan, “EMG signal classification for human computer interaction: A review,” Eur. J.
Sci. Res., vol. 33, no. 3, pp. 480–501, 2009.
[2] J. A. Jacko, “Human–computer interaction design and development approaches,” in Proc.
14th HCI Int. Conf., 2011, pp. 169–180.
[3] I. H. Moon, M. Lee, J. C. Ryu, and M. Mun, “Intelligent robotic wheelchair with EMG-,
gesture-, and voice-based interface,” Intell. Robots Syst., vol. 4, pp. 3453–3458, 2003.
[4] M. Walters, S. Marcos, D. S. Syrdal, and K. Dautenhahn, “An interactive game with a robot:
People’s perceptions of robot faces and a gesture based user interface,” in Proc. 6th Int. Conf.
Adv. Computer–Human Interactions, 2013, pp. 123–128.
[5] O. Brdiczka, M. Langet, J. Maisonnasse, and J. L. Crowley, “Detecting human behavior
models from multimodal observation in a smart home,” IEEE Trans. Autom. Sci. Eng., vol. 6,
no. 4, pp. 588–597, Oct. 2009.
[6] M. A. Cook and J. M. Polgar, Cook & Hussey’s Assistive Technologies: Principles and
Practice, 3rd ed. Maryland Heights, MO, USA: Mosby Elsevier, 2008, pp. 3–33.
[7] G. R. S. Murthy and R. S. Jadon, “A review of vision based hand gesture recognition,” Int. J.
Inform. Technol. Knowl. Manage., vol. 2, no. 2, pp. 405–410, 2009.
[8] D. Debuse, C. Gibb, and C. Chandler, “Effects of hippotherapy on people with cerebral palsy
from the users’ perspective: A qualitative study,” Physiotherapy Theory Practice, vol. 25, no. 3,
pp. 174–192, 2009.
[9] J. A. Sterba, B. T. Rogers, A. P. France, and D. A. Vokes, “Horseback riding in children with
cerebral palsy: Effect on gross motor function,” Develop. Med. Child Neurology, vol. 44, no. 5,
pp. 301–308, 2002.
[10] K. L. Kitto, “Development of a low-cost sip and puff mouse,” in Proc. 16th Annu. Conf.
RESNA, 1993, pp. 452–454.