The PC Interfaced Voice Recognition System Is To Implement A Password For Authentication

Abstract, introduction, working of voice recognition system

ABSTRACT

The PC-interfaced voice recognition system is designed to implement a spoken password for user authentication.


During training, a person spells out the password into a microphone for 10 to 15 minutes. The audio signal sampling module reads the audio signal from the microphone and samples it at a rate of 16,000 samples per second. Sampling frequency is defined as the number of samples taken per second from a continuous audio signal to produce a discrete signal. The amplitude of each sample is calculated and stored in a MySQL database against the corresponding user. This process is repeated for all the samples collected from the person. Once the samples are collected, the average amplitude of each similar block in the signal is computed and stored. During real-time authentication, when the person speaks the password, the audio signal is sampled and divided into the same number of blocks as the trained signal. The amplitude of each real-time block is then compared with the amplitude of the corresponding trained block. A tolerance of 10-15% on both sides is required, because a person's voice deviates slightly each time the password is spoken.
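The block-averaging and tolerance check described above can be sketched as follows. This is a minimal illustration, not the project's actual code; the function names `block_averages` and `matches` and the use of plain Python lists are assumptions for clarity.

```python
def block_averages(samples, num_blocks):
    """Split a sampled signal into equal blocks and return the
    average absolute amplitude of each block."""
    block_len = len(samples) // num_blocks
    return [
        sum(abs(s) for s in samples[i * block_len:(i + 1) * block_len]) / block_len
        for i in range(num_blocks)
    ]

def matches(trained, live, tolerance=0.15):
    """Compare live block averages against the trained averages,
    allowing a 10-15% deviation on both sides of each trained value."""
    for t, l in zip(trained, live):
        if not (t * (1 - tolerance) <= l <= t * (1 + tolerance)):
            return False
    return True
```

For example, with trained block averages `[10.0, 8.0]`, a live utterance yielding `[10.5, 7.5]` falls within the 15% band and is accepted, while `[12.0, 8.0]` exceeds the upper bound on the first block and is rejected.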
INTRODUCTION

Voice recognition is an interdisciplinary sub-field of computational linguistics that develops methodologies and technologies enabling the recognition and translation of spoken language into text by computers. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or simply "speech to text" (STT). It incorporates knowledge and research from the fields of linguistics, computer science, and electrical engineering.

Some SR systems use "training" (also called "enrollment"), where an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker independent" [1] systems; systems that use training are called "speaker dependent".

Voice recognition applications include voice user interfaces such as voice dialing (e.g. "Call
home"), call routing (e.g. "I would like to make a collect call"), domotic appliance control,
search (e.g. find a podcast where particular words were spoken), simple data entry (e.g., entering
a credit card number), preparation of structured documents (e.g. a radiology report), speech-to-
text processing (e.g., word processors or emails), and aircraft (usually termed Direct Voice
Input).

The term voice recognition or speaker identification refers to identifying the speaker, rather than
what they are saying. Recognizing the speaker can simplify the task of translating speech in
systems that have been trained on a specific person's voice or it can be used to authenticate or
verify the identity of a speaker as part of a security process.
From the technology perspective, speech recognition has a long history with several waves of
major innovations. Most recently, the field has benefited from advances in deep learning and big
data. The advances are evidenced not only by the surge of academic papers published in the
field, but more importantly by the worldwide industry adoption of a variety of deep learning
methods in designing and deploying speech recognition systems. These speech industry players
include Google, Microsoft, IBM, Baidu, Apple, Amazon, Nuance, SoundHound, IflyTek, and CDAC, many of which have publicized the core technology in their speech recognition systems as being based on deep learning.
WORKING OF VOICE RECOGNITION SYSTEM
When we speak, our voices generate little sound packets
called phones (which correspond to the sounds of letters or groups of letters in words); so
speaking the word cat produces phones that correspond to the sounds "c," "a," and "t." Although
you've probably never heard of these kinds of phones before, you might well be familiar with the
related concept of phonemes: simply speaking, phonemes are the basic LEGO blocks of sound
that all words are built from. Although the difference between phones and phonemes is complex
and can be very confusing, this is one "quick-and-dirty" way to remember it: phones
are actual bits of sound that we speak (real, concrete things), whereas phonemes are ideal bits of
sound we store (in some sense) in our minds (abstract, theoretical sound fragments that are never
actually spoken).

Computers and computer models can juggle phonemes around, but the real bits of speech they analyze always involve processing phones. When we listen to speech, our ears catch phones flying through the air and our brains flip them back into words, sentences, thoughts, and ideas so quickly that we often know what people are going to say before the words have fully left their mouths. Instant, easy, and quite dazzling, our amazing brains make this seem like a magic trick. And it is perhaps because listening seems so easy to us that we expect computers (in many ways even more amazing than brains) to be able to hear, recognize, and decode spoken words as well.

Broadly speaking, there are four different approaches a computer can take if it wants to turn
spoken sounds into written words:

1. Simple pattern matching (where each spoken word is recognized in its entirety, the way you instantly recognize a tree or a table without consciously analyzing what you're looking at)

2. Pattern and feature analysis (where each word is broken into bits and recognized from
key features, such as the vowels it contains)

3. Language modeling and statistical analysis (in which a knowledge of grammar and the
probability of certain words or sounds following on from one another is used to speed up
recognition and improve accuracy)

4. Artificial neural networks (brain-like computer models that can reliably recognize
patterns, such as word sounds, after exhaustive training).
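The first approach, simple pattern matching, can be sketched as a nearest-template search: each candidate word's stored pattern is compared against the spoken input as a whole, and the closest template wins. The sketch below is illustrative only; the toy amplitude sequences, and the names `distance` and `recognize`, are assumptions, and a real system would compare spectral features rather than raw amplitudes.

```python
def distance(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def recognize(utterance, templates):
    """Return the word whose stored template is closest to the utterance."""
    return min(templates, key=lambda word: distance(utterance, templates[word]))
```

For instance, with templates `{"yes": [1, 3, 1], "no": [2, 0, 2]}`, the utterance `[1, 2, 1]` is closer to the "yes" template (distance 1) than to "no" (distance 6), so it is recognized as "yes".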
DATA FLOW DIAGRAM
ANALYSIS REPORT
