
MINI PROJECT REPORT

(PCA20P02L)

ON

SPEECH AND TEXT RECOGNITION

(CONVERSION OF SPEECH INTO TEXT)

By

RANJITH S

(RA2132241020088)

Submitted to the

DEPARTMENT OF COMPUTER SCIENCE AND APPLICATIONS (MCA)

Under the guidance of

Dr. S. UMA SHANKARI MCA, M.Phil., Ph.D., NET, SET

Assistant Professor, Department of Computer Applications

MASTER OF COMPUTER APPLICATIONS

SRM INSTITUTE OF SCIENCE AND TECHNOLOGY

Ramapuram, Chennai.
NOVEMBER 2022
COLLEGE OF SCIENCE & HUMANITIES

Ramapuram, Chennai.

Department of Computer Science and Applications (MCA)

BONAFIDE CERTIFICATE

Certified that this project report titled “SPEECH AND TEXT RECOGNITION”

is the bonafide work of RANJITH S (Reg No: RA2132241020088), who carried out

the mini project work under my supervision.

Signature of Internal Guide

Signature of Head of the Department

Signature of External Examiner


ABSTRACT

In the developed world, smartphones have overtaken earlier mobile systems, and
mobile apps play an ever-increasing role in our day-to-day life. This app serves
you with three different features. "Text to Speech" helps people who cannot
speak to express their words through the mobile phone's speakers. "Speech to
Text" allows deaf people to understand the feelings and words of others through
their smartphone screen. The "Translator" feature renders the message and
feelings of a person from one language to another; this makes life easier and
more comfortable. Voice is the basic, common and efficient form of communication
for people to interact with each other. Today, speech technologies are commonly
available for a limited but interesting range of tasks. These technologies
enable machines to respond correctly and reliably to human voices and provide
useful and valuable services. Since communicating with a computer is faster by
voice than by keyboard, people will prefer such a system. Communication among
human beings is dominated by spoken language; therefore, it is natural for
people to expect voice interfaces to computers.
ACKNOWLEDGEMENT

I extend my sincere gratitude to the Chancellor Dr. T.R. PACHAMUTHU


and to Chairman Dr. R. SHIVAKUMAR of SRM Institute of Science and
Technology, Ramapuram and Trichy campuses for providing me the opportunity to
pursue the MCA degree at this University.

I express my sincere gratitude to Maj. Dr. M. VENKATARAMANAN


Dean(S&H), SRM IST, Ramapuram for his support and encouragement for the
successful completion of the project.

I record my sincere thanks to Dr. J. DHILIPAN M.Sc., MBA., M.Phil.,


Ph.D., Vice Principal-Academic(S&H) and Head of the Department of Computer
Applications, SRM IST, Ramapuram for his continuous support and keen interest to
make this project a successful one.

I find no word to express profound gratitude to my guide Dr. S. UMA


SHANKARI MCA, M.Phil., Ph.D., NET, SET, Department of Computer Science
and Applications (MCA), SRM IST Ramapuram.

I thank the almighty who has made this possible. Finally, I thank my beloved
family members and friends for their motivation, encouragement and cooperation in
all aspects, which led me to the completion of this project.

RANJITH S
TABLE OF CONTENTS

CHAPTER TITLE PAGE NO.

ABSTRACT
ACKNOWLEDGEMENT
LIST OF TABLES
LIST OF FIGURES

1 INTRODUCTION
  1.1 PROJECT INTRODUCTION 2

2 WORKING ENVIRONMENT
  2.1 HARDWARE REQUIREMENT 5
  2.2 SOFTWARE REQUIREMENT 6
  2.3 SYSTEM SOFTWARE 7

3 SYSTEM ANALYSIS
  3.1 FEASIBILITY STUDY 9
  3.2 EXISTING SYSTEM 10
  3.3 DRAWBACKS OF EXISTING SYSTEM 11
  3.4 PROPOSED SYSTEM 11
  3.5 BENEFITS OF PROPOSED SYSTEM 12
  3.6 SCOPE OF THE PROJECT 13

4 SYSTEM DESIGN
  4.1 DATA FLOW DIAGRAM 16
  4.2 USE CASE DIAGRAM 17
  4.3 ARCHITECTURE DIAGRAM 19

5 PROJECT DESCRIPTION
  5.1 OBJECTIVE 22
  5.2 MODULE DESCRIPTION 23
  5.3 IMPLEMENTATION 24

6 SYSTEM TESTING
  6.1 TESTING DEFINITION 26
  6.2 TESTING OBJECTIVE 27
  6.3 TYPES OF TESTING 28

7 CONCLUSION
  7.1 SUMMARY 31
  7.2 FUTURE ENHANCEMENTS 32

8 APPENDIX
  8.1 SCREENSHOTS 34
  8.2 CODING 38
  8.3 DATA DICTIONARY 63

9 BIBLIOGRAPHY AND REFERENCES 66


LIST OF FIGURES

FIG.NO TITLE PAGE.NO

1.1.1 SPEECH AND TEXT CONVERSION 2

1.1.2 TEXT TO SPEECH CONVERSION 3

3.4 PROPOSED SYSTEM 12

4.1.1 TEXT TO SPEECH DATAFLOW DIAGRAM 16

4.1.2 SPEECH TO TEXT DATAFLOW DIAGRAM 16

4.2.1 TEXT TO SPEECH USECASE DIAGRAM 17

4.2.2 SPEECH TO TEXT USECASE DIAGRAM 18

4.3.1 ARCHITECTURE DIAGRAM 1 19

4.3.2 ARCHITECTURE DIAGRAM 2 20

8.1.1 HOMEPAGE 34

8.1.2 TEXT TO SPEECH CONVERSION PAGE 35

8.1.3 SPEECH TO TEXT CONVERSION PAGE 36

8.1.4 TRANSLATOR PAGE 37


CHAPTER 1

INTRODUCTION

1.1 PROJECT INTRODUCTION:


Speech and Text Recognition has three features: text to speech, speech to text and a translator,
which help deaf and mute people to read or hear the content that is typed or spoken into it.
"Text to Speech" helps people who cannot speak to express their words through the mobile
phone's speakers. "Speech to Text" allows deaf people to understand the feelings and words of
others through their smartphone screen. The "Translator" feature renders the message and
feelings of a person from one language to another; this makes life easier and more comfortable.
Statistics show that there are currently over 325,000 health-related mobile apps presented on
app marketplaces. As per these statistics, healthcare app developers are keen on developing
projects such as fitness apps, calorie-burning trackers, online pharmacy apps and online doctor
consulting apps. There are only very few developers who build these health apps for the benefit
of the people in need, so these apps are in huge demand in the market. Speech recognition is an
interdisciplinary subfield of computer science and computational linguistics that develops
methodologies and technologies enabling the recognition and translation of spoken language
into text by computers, with the main benefit of searchability. It is also known as automatic
speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates
knowledge and research in the computer science, linguistics and computer engineering fields.
The reverse process is speech synthesis.

1.1.1 SPEECH AND TEXT CONVERSION

Some speech recognition systems require "training" (also called "enrollment"), where an
individual speaker reads text or isolated vocabulary into the system. The system analyzes the
person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting
in increased accuracy. Systems that do not use training are called "speaker-independent"
systems; systems that use training are called "speaker-dependent". Speech recognition
applications include voice user interfaces such as voice dialing (e.g. "call home"), call routing
(e.g. "I would like to make a collect call"), domestic appliance control, keyword search (e.g.
finding a podcast where particular words were spoken), simple data entry (e.g. entering a credit
card number), preparation of structured documents (e.g. a radiology report), determining
speaker characteristics, speech-to-text processing (e.g. word processors or email), and aircraft
control (usually termed direct voice input).

1.1.2 TEXT TO SPEECH CONVERSION

CHAPTER 2

WORKING ENVIRONMENT

2.1 HARDWARE REQUIREMENTS


• Operating system: Windows 10

• Processor: Intel Core i5 (8th Gen) Processor

• Hard Drive: 1TB.

• Memory (RAM): 8 GB.

2.2 SOFTWARE REQUIREMENTS

• Java 19

• Frontend: XML

• Android Studio

2.3 SYSTEM SOFTWARE

REQUIREMENT ANALYSIS

Requirements are features of a system, or descriptions of something the system is capable of
doing in order to fulfil the system's purpose. Requirement analysis provides the appropriate
mechanism for understanding what the customer wants, analyzing the needs, assessing
feasibility, negotiating a reasonable solution, specifying the solution unambiguously, validating
the specification and managing the requirements as they are translated into an operational system.

JAVA

Java is a class-based, object-oriented programming language that is designed to have as few


implementation dependencies as possible. It is a general-purpose programming language intended
to let application developers write once, run anywhere (WORA), meaning that compiled Java code
can run on all platforms that support Java without the need for recompilation. Java applications are
typically compiled to bytecode that can run on any Java virtual machine (JVM) regardless of the
underlying computer architecture. The syntax of Java is similar to C and C++, but has fewer low-
level facilities than either of them. The Java runtime provides dynamic capabilities (such as
reflection and runtime code modification) that are typically not available in traditional compiled
languages. As of 2019, Java was one of the most popular programming languages in use according
to GitHub, particularly for client-server web applications, with a reported 9 million developers. Java was
originally developed by James Gosling at Sun Microsystems (which has since been acquired by
Oracle) and released in 1995 as a core component of Sun Microsystems' Java platform. The original
and reference implementation Java compilers, virtual machines, and class libraries were originally
released by Sun under proprietary licenses. As of May 2007, in compliance with the specifications
of the Java Community Process, Sun had relicensed most of its Java technologies under the GNU
General Public License. Oracle offers its own HotSpot Java Virtual Machine; however, the official
reference implementation is the OpenJDK JVM which is free open-source software and used by
most developers and is the default JVM for almost all Linux distributions.
Features in Java

One of the biggest reasons why Java is so popular is the platform independence. Programs can run
on several different types of computers; as long as the computer has a Java Runtime Environment
(JRE) installed, a Java program can run on it. Most types of computers will be compatible with a
JRE including PCs running on Windows, Macintosh computers, Unix or Linux computers, and large
mainframe computers, as well as mobile phones. Since it has been around for so long, some of the
biggest organizations in the world are built using the language. Many banks, retailers, insurance
companies, utilities, and manufacturers all use Java.

ANDROID STUDIO

Android Studio is the official integrated development environment (IDE) for Google's Android
operating system, built on JetBrains' IntelliJ IDEA software and designed specifically for
Android development. It is available for download on Windows, macOS and Linux based
operating systems or as a subscription-based service in 2020. It is a replacement for the Eclipse
Android Development Tools (E-ADT) as the primary IDE for native Android application
development. Android Studio was announced on May 16, 2013 at the Google I/O conference. It
was in early access preview stage starting from version 0.1 in May 2013, then entered beta stage
starting from version 0.8 which was released in June 2014. The first stable build was released in
December 2014, starting from version 1.0.
Features in Android Studio
Android Studio supports all the same programming languages as IntelliJ (and CLion), e.g. Java,
C++ and more, with extensions such as Go; Android Studio 3.0 or later supports Kotlin and
"all Java 7 language features and a subset of Java 8 language features that vary by platform
version." External projects backport some Java 9 features. While IntelliJ states that Android
Studio supports all released Java versions, and Java 12, it is not clear to what level Android
Studio supports Java versions up to Java 12 (the documentation mentions partial Java 8
support). At least some new language features up to Java 12 are usable in Android. Once an app
has been compiled with Android Studio, it can be published on the Google Play Store; the
application has to be in line with the Google Play Store developer content policy.

CHAPTER 3

SYSTEM ANALYSIS
3.1 FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a
very general plan for the project and some cost estimates. During system analysis, the feasibility
study of the proposed system is carried out. This is to ensure that the proposed system is not a
burden to the person. For feasibility analysis, some understanding of the major requirements for
the system is essential. Three key considerations involved in the feasibility analysis are:

● Economic Feasibility
● Technical Feasibility
● Social Feasibility

3.1.1. Economic Feasibility


This study is carried out to check the economic impact that the system will have on the
organization. The amount of funds that the company can pour into the research and development
of the system is limited, so the expenditures must be justified. Thus, the developed system is well
within the budget, and this was achieved because most of the technologies used are freely
available; only the customized products had to be purchased.

3.1.2. Technical Feasibility


This study is carried out to check the technical feasibility, that is, the technical requirements of the
system. Any system developed must not place a high demand on the available technical resources,
as this would in turn place high demands on the client. The developed system must therefore have
modest requirements, as only minimal or no changes are required for implementing this system.

3.1.3. Social Feasibility
This aspect of the study checks the level of acceptance of the system by the user. This includes the
process of training the user to use the system efficiently. The user must not feel threatened by the
system, but must accept it as a necessity. The level of acceptance by the users depends solely on
the methods employed to educate users about the system and to make them familiar with it. Their
confidence must be raised so that they are also able to offer constructive criticism, which is
welcomed, as they are the final users of the system.

3.2 EXISTING SYSTEM

“Write by Voice”, “Speech to Text”, “Tell Me” and “Speechify” are some of the well-rated apps
currently available in the app market.

3.3 DRAWBACKS OF EXISTING SYSTEM

• These applications do not actually help this community in any meaningful way; they are
merely named with the prefix “deaf and dumb”, but in practice they just increase the volume,
which is not going to help them in any way.

• Most of the existing systems carry the title of deaf and dumb assistance, but they do not
deliver the relevant services that such a title implies.

3.4 PROPOSED SYSTEM

All the existing systems till now support only a single form of assistance, whereas our system
serves both forms of assistance within a single application. Most of the existing systems carry
the title of deaf and dumb assistance but do not deliver the relevant services that such a title
implies; our system provides the services that the title promises. In addition, our system provides
a translator, which further enhances your experience with the application.

Phase-1: Train the Convolutional Neural Network (CNN) with the training dataset.

Phase-2: Store the obtained weights and parameters in the file system.

Phase-3: Pre-process the input image.

Phase-4: Load the stored weights into the CNN.

Phase-5: Classify the image.
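Phases 2 and 4 above can be sketched in plain Java. This is a hypothetical illustration only: the class name and the simple binary file format are invented for this sketch and are not taken from the project's code.

```java
import java.io.*;

// Hypothetical sketch of Phase-2 and Phase-4: storing trained weights in the
// file system and loading them back before classification.
public class WeightStore {

    // Phase-2: write the weight array to a binary file.
    public static void save(double[] weights, File file) {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
            out.writeInt(weights.length);               // header: number of weights
            for (double w : weights) out.writeDouble(w);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Phase-4: read the weights back into memory.
    public static double[] load(File file) {
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            double[] weights = new double[in.readInt()];
            for (int i = 0; i < weights.length; i++) weights[i] = in.readDouble();
            return weights;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```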

3.4 PROPOSED SYSTEM

3.5 BENEFITS OF PROPOSED SYSTEM

We achieved a final accuracy of 95.0% on our data set. We improved our prediction after
implementing two layers of algorithms, wherein we verify and predict symbols that are more
similar to each other.

This gives us the ability to detect almost all the symbols, provided that they are shown properly,
there is no noise in the background and the lighting is adequate.

3.6 SCOPE OF THE PROJECT:
We are planning to achieve higher accuracy even in the case of complex backgrounds by trying
out various background subtraction algorithms. This system provides three features: text to
speech, speech to text and a translator. It helps deaf people by displaying the message they want
to hear on the mobile phone screen, and it helps mute people by speaking the message they want
to say through the mobile phone speakers. It also helps both deaf and mute people to translate
the message into other languages. Many more works can be carried out as an extension of this
project. This system predicts the need of the mute person, but future systems may be developed
that communicate with the mute person's mobile device, allowing the system to learn the needs
of the user and thereby enabling the development of recommendatory systems, as they have the
relevant data related to the mute person that can easily be learned through the neural network
model.

CHAPTER 4

SYSTEM DESIGN

● Systems design is the process of defining elements of a system like modules, architecture,
components and their interfaces and data for a system based on the specified requirements.
It is the process of defining, developing and designing systems which satisfies the specific
needs and requirements of a business or organization.
● A systemic approach is required for a coherent and well-running system. Bottom-Up or
Top-Down approach is required to take into account all related variables of the system. A
designer uses the modelling languages to express the information and knowledge in a
structure of system that is defined by a consistent set of rules and definitions. The designs
can be defined in graphical or textual modelling languages.

DESIGN METHODS:

● Architectural design: Describes the views, models, behavior, and structure of the system.
● Logical design: To represent the data flow, inputs and outputs of the system. Example: ER
Diagrams (Entity Relationship Diagrams).
● Physical design: Defined as a) How users add information to the system and how the
system represents information back to the user. b) How the data is modelled and stored
within the system. c) How data moves through the system, how data is validated, secured
and/or transformed as it flows through and out of the system.

4.1 DATA FLOW DIAGRAM

DFDs make it easy to depict the business requirements of applications by representing the
sequence of process steps and flow of information using a graphical representation or visual
representation rather than a textual description. When used through an entire development
process, they first document the results of business analysis. Then, they refine the
representation to show how information moves through, and is changed by, application flows.
Both automated and manual processes are represented.

DATAFLOW DIAGRAM

4.1.1 TEXT TO SPEECH

4.1.2 SPEECH TO TEXT

4.2 USE CASE DIAGRAM

Use-case diagrams describe the high-level functions and scope of a system. These diagrams also
identify the interactions between the system and its actors. The use cases and actors in use-case
diagrams describe what the system does and how the actors use it, but not how the system operates
internally. Use-case diagrams illustrate and define the context and requirements of either an entire
system or the important parts of the system. You can model a complex system with a single use-
case diagram, or create many use-case diagrams to model the components of the system. You
would typically develop use-case diagrams in the early phases of a project and refer to them
throughout the development process.

USECASE DIAGRAM

4.2.1 TEXT TO SPEECH

4.2.2 SPEECH TO TEXT

4.3 ARCHITECTURE DIAGRAM

An architecture diagram is a visual representation of all the elements that make up part, or all, of
a system. Above all, it helps the engineers, designers, stakeholders and anyone else involved in the
project understand a system or app’s layout. This diagram gives a top-level view of a software’s
structure. To elaborate, it generally includes various components that interact with each other and
how the software interacts with external databases and servers. It’s useful for explaining software
to clients and stakeholders; and assessing the impact of adding new features or upgrading,
replacing, or merging existing applications.

4.3.1 ARCHITECTURE DIAGRAM 1

4.3.2 ARCHITECTURE DIAGRAM 2

CHAPTER 5

PROJECT DESCRIPTION:

5.1 OBJECTIVE:

● Statistics show that there are currently over 325,000 health-related mobile apps presented
on app marketplaces.
● As per these statistics, healthcare app developers are keen on developing projects such as
fitness apps, calorie-burning trackers, online pharmacy apps and online doctor consulting
apps.
● There are only very few developers who build these health apps for the benefit of the people
in need, so these apps are in huge demand in the market.

5.2 MODULE DESCRIPTION:

This application has three modules through which all the operations take place.

● Module 1 – Text-to-Speech

● Module 2 – Speech-to-Text

● Module 3 – Translator

MODULE 1 – TEXT-TO-SPEECH:
 Text-To-Speech is the first module. It directs you to the next page, where a text box is given;
the text that has to be pronounced as output is entered here.
 The entered text, or the text from an attached file, is read by the text recognizer, and the
words from the text are matched and stored.
 The speech synthesizer takes the words detected by the text recognizer; the collected data is
manipulated and arranged according to grammatical rules.
 The synthesizer then transforms the collected and arranged data into a waveform, which is
given as output through the phone's speakers.
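The text-processing step of this module can be sketched in plain Java. This is a hypothetical illustration (the class and method names are invented here, not taken from the app's code); on a real Android device the final waveform step would be handed to the platform's speech synthesis engine rather than implemented by hand.

```java
import java.util.*;

// Hypothetical sketch of the text front end: the entered text is broken into
// the word sequence that the speech synthesizer would then turn into audio.
public class TextFrontEnd {

    // Split the entered text into the ordered word list handed to the synthesizer.
    public static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.trim().split("\\s+")) {
            // Strip surrounding punctuation; keep only the spoken word itself.
            String w = tok.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!w.isEmpty()) out.add(w.toLowerCase());
        }
        return out;
    }
}
```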

MODULE 2 – SPEECH-TO-TEXT:
 Speech-To-Text is the second module. It directs you to the next page, where a listen button is
given; you can hold the button and speak the data that has to be displayed as output.
 The recorded audio is analyzed and broken down into segments, and the start and end of the
speech are detected.
 The noise is removed from the recorded audio, and the remaining audio is matched with the
correct corresponding words.
 The resulting waveform is converted into text and displayed on the mobile screen.
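The "find the start and end of the speech" step above can be sketched with a simple energy threshold over sample frames. This is a hypothetical illustration (the class name and thresholding scheme are chosen for the sketch), not the app's actual audio-analysis code.

```java
// Hypothetical sketch of speech endpointing: frames whose mean energy exceeds
// a threshold are treated as speech; everything else as background noise.
public class Endpointer {

    // Returns {startFrame, endFrame} of the region whose frame energy exceeds
    // the threshold, or {-1, -1} if the recording contains no such frame.
    public static int[] findSpeech(double[] samples, int frameSize, double threshold) {
        int frames = samples.length / frameSize;
        int start = -1, end = -1;
        for (int f = 0; f < frames; f++) {
            double energy = 0;
            for (int i = 0; i < frameSize; i++) {
                double s = samples[f * frameSize + i];
                energy += s * s;                  // frame energy = sum of squares
            }
            energy /= frameSize;                  // mean energy per sample
            if (energy > threshold) {
                if (start == -1) start = f;       // first loud frame
                end = f;                          // last loud frame seen so far
            }
        }
        return new int[] { start, end };
    }
}
```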

MODULE 3 – TRANSLATOR:

• Translator is the third module; this module helps us convert the output from the other
modules.

• This module uses the Google Translate API, through which the output from the other modules
can be translated from one language to another.

5.3 IMPLEMENTATION

Project implementation is the process of putting a project plan into action to produce the
deliverables, otherwise known as the products or services, for clients or stakeholders. It takes place
after the planning phase, during which a team determines the key objectives for the project, as well
as the timeline and budget. Implementation involves coordinating resources and measuring
performance to ensure the project remains within its expected scope and budget. It also involves
handling any unforeseen issues in a way that keeps a project running smoothly.
To implement a project effectively, project managers must consistently communicate with a team
to set and adjust priorities as needed while maintaining transparency about the project's status with
the clients or any key stakeholders. Implementation is the stage in the project where the theoretical
design is turned into a working system, giving users confidence that the new system will work
efficiently and effectively. It involves careful planning, investigation of the current system and its
constraints on implementation, design of methods to achieve the changeover, and an evaluation
of changeover methods. Apart from planning, the major tasks of preparing for implementation
are education and training of users. The implementation process begins with preparing a plan for
the implementation of the system. According to this plan, the activities are carried out, discussions
are made regarding the equipment and resources, and any additional equipment is acquired to
implement the new system. For this system, no additional resources are needed. Implementation
is the final and most important phase. The most critical stage in achieving a successful new system
is giving the users confidence that the new system will work and be effective. The system can be
implemented only after thorough testing is done and it is found to be working according to the
specification.

CHAPTER 6

SYSTEM TESTING:

6.1 TESTING DEFINITION:

● Testing is a process of executing a program with the intent of finding an error. A good test
case is one that has a high probability of finding an as-yet-undiscovered error, and a
successful test is one that uncovers such an error. System testing is the stage of
implementation aimed at ensuring that the system works accurately and efficiently as
expected before live operation commences. It verifies that the whole set of programs hangs
together. System testing consists of several key activities and steps at the program, string
and system levels, and is important in adopting a successful new system. This is the last
chance to detect and correct errors before the system is installed for user acceptance
testing.
● The software testing process commences once the program is created and the
documentation and related data structures are designed. Software testing is essential for
correcting errors; otherwise, the program or the project is not said to be complete. Software
testing is the critical element of software quality assurance and represents the ultimate
review of specification, design and coding. A good test case design is one that has a high
probability of finding a yet-undiscovered error.
● The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies or the finished product; it is the process of
exercising software with the intent of ensuring that the software does not fail in an
unacceptable manner. Testing also checks for missing operations and provides a complete
verification to determine whether the objectives are met and the user requirements are
satisfied. The ultimate aim is quality assurance.

6.2 TESTING OBJECTIVE:

● To find errors in the developed software; to check that the functions work according to the
specification and that the required behavior and performance are fulfilled; and to check the
reliability and quality of the software.
● We feed the input images after pre-processing to our model for training and testing after
applying all the operations mentioned above.
● The prediction layer estimates how likely the image is to fall under each of the classes. The
output is normalized so that each value lies between 0 and 1 and the values across the
classes sum to 1. We achieved this using the softmax function.
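The normalization described above can be sketched directly. This is a minimal plain-Java illustration of the softmax function (the class is invented for this sketch; in the project itself this runs inside the TensorFlow model).

```java
// Minimal sketch of softmax: outputs are mapped to values in (0, 1)
// that sum to 1 across the classes.
public class Softmax {

    public static double[] softmax(double[] logits) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logits) max = Math.max(max, v);     // for numerical stability
        double sum = 0;
        double[] out = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            out[i] = Math.exp(logits[i] - max);             // shift by max, exponentiate
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum; // normalize to sum to 1
        return out;
    }
}
```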
● At first, the output of the prediction layer will be somewhat far from the actual value. To
make it better, we trained the network using labelled data. Cross-entropy is a performance
measurement used in classification. It is a continuous function which is positive at values
that are not the same as the labelled value and is zero exactly when the output equals the
labelled value. Therefore, we optimized the cross-entropy by minimizing it as close to zero
as possible; to do this, we adjust the weights of our neural network layers. TensorFlow has
an inbuilt function to calculate the cross-entropy.
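The property described above (zero exactly at the labelled value, positive elsewhere) can be checked with a one-line sketch. This is a hypothetical plain-Java illustration for a one-hot label; the project itself relies on TensorFlow's inbuilt cross-entropy rather than code like this.

```java
// Minimal sketch of cross-entropy for a one-hot label: it is ~zero when the
// predicted probability of the true class is 1, and grows as it shrinks.
public class CrossEntropy {

    // predicted: softmax probabilities; labelIndex: index of the true class.
    public static double loss(double[] predicted, int labelIndex) {
        double eps = 1e-12;                       // guard against log(0)
        return -Math.log(predicted[labelIndex] + eps);
    }
}
```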
● Having found the cross-entropy function, we optimized it using gradient descent; in fact,
we used one of the best gradient descent optimizers, the Adam optimizer.
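The Adam update mentioned above can be sketched on a one-parameter example. This is a hypothetical illustration: the quadratic objective and all constants are chosen for the sketch, not taken from the project; it only shows the bias-corrected moment estimates that distinguish Adam from plain gradient descent.

```java
// Hypothetical one-parameter sketch of the Adam optimizer: gradient descent
// with bias-corrected first and second moment estimates.
public class AdamDemo {

    // Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
    public static double minimize(double w, int steps) {
        double lr = 0.1, beta1 = 0.9, beta2 = 0.999, eps = 1e-8;
        double m = 0, v = 0;                         // first and second moment estimates
        for (int t = 1; t <= steps; t++) {
            double g = 2 * (w - 3);                  // gradient at the current point
            m = beta1 * m + (1 - beta1) * g;         // biased first moment update
            v = beta2 * v + (1 - beta2) * g * g;     // biased second moment update
            double mHat = m / (1 - Math.pow(beta1, t)); // bias correction
            double vHat = v / (1 - Math.pow(beta2, t));
            w -= lr * mHat / (Math.sqrt(vHat) + eps);   // Adam parameter update
        }
        return w;
    }
}
```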

6.3 TYPES OF TESTING:

● Unit testing
● Integration testing
● Functional testing
● System testing
● White box testing
● Black box testing

UNIT TESTING:
● Unit testing is conducted to verify the functional performance of each modular component
of the software. Unit testing focuses on the smallest unit of the software design, i.e., the
module. The white-box testing techniques were heavily employed for unit testing.

● Unit tests perform basic tests at component level and test a specific business process,
application, and/or system configuration. Unit tests ensure that each unique path of a
business process performs accurately to the documented specifications and contains clearly
defined inputs and expected results.

INTEGRATION TESTING:
● Integration testing is a systematic technique for constructing the program structure while
at the same time conducting tests to uncover errors associated with interfacing; i.e.,
integration testing is the complete testing of the set of modules which makes up the product.
The objective is to take unit-tested modules and build a program structure; the tester should
identify critical modules, and critical modules should be tested as early as possible. One
approach is to wait until all the units have passed testing, then combine them and test them
together; this approach evolved from unstructured testing of small programs. Another
strategy is to construct the product in increments of tested units: a small set of modules is
integrated together and tested, to which another module is added and tested in combination,
and so on. The advantage of this approach is that interface discrepancies can be easily found
and corrected.

FUNCTIONAL TESTS:
● Functional test cases involve exercising the code with nominal input values for which the
expected results are known, as well as with boundary values and special values, such as logically
related inputs, files of identical elements, and empty files.
● Functional testing includes three types of tests:
i. Performance Test
ii. Stress Test
iii. Structure Test
SYSTEM TEST:

System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system test is the
configuration-oriented system integration test. System testing is based on process descriptions and
flows, emphasizing pre-driven process links and integration points.

WHITE BOX TESTING:


This testing is also called glass-box testing. Knowing the specific functions
that a product has been designed to perform, tests can be conducted to demonstrate that each function is
fully operational while at the same time searching for errors in each function. It is a test-case design
method that uses the control structure of the procedural design to derive test cases. Basis path
testing is a white-box technique; it involves:
● Flow graph notation
● Cyclomatic complexity
● Deriving test cases
● Graph matrices

BLACK BOX TESTING:


In black-box testing, test cases are designed without knowledge of the internal operation of the
product; the tests verify that each function behaves according to its specification. Black-box testing
fundamentally focuses on the functional
requirements of the software. The steps involved in black-box test case design are:
● Graph based testing methods
● Equivalence partitioning
● Boundary value analysis
● Comparison testing
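Equivalence partitioning and boundary value analysis can be sketched against a hypothetical input validator; `isValidInput` and the 1–200 character limit are assumptions for illustration, not taken from the project:

```java
// Sketch of black-box equivalence partitioning and boundary value analysis
// for a hypothetical validator accepting 1..200 characters of input text.
public class BoundaryValueSketch {

    // Unit under test (hypothetical): valid input is 1 to 200 characters.
    static boolean isValidInput(String text) {
        return text != null && text.length() >= 1 && text.length() <= 200;
    }

    // Builds a string of n 'x' characters for boundary cases.
    static String makeString(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append('x');
        return sb.toString();
    }

    static void check(boolean ok, String name) {
        if (!ok) throw new AssertionError("failed: " + name);
    }

    public static void main(String[] args) {
        // Equivalence classes: empty (invalid), 1..200 (valid), >200 (invalid);
        // boundary values sit at and just beyond each partition edge.
        check(!isValidInput(""), "below lower bound");
        check(isValidInput("a"), "lower boundary");
        check(isValidInput(makeString(200)), "upper boundary");
        check(!isValidInput(makeString(201)), "just above upper bound");
        System.out.println("boundary tests passed");
    }
}
```

Picking one value per partition plus the values at each boundary is the core of both techniques: defects cluster at partition edges.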

CHAPTER 7

7.1 SUMMARY:

● To conclude the description of the project: developed using Java and
Android Studio, it is based on the requirement specification of the user and the analysis of
the existing system, with flexibility for future enhancement.

● We achieved a final accuracy of 95.0% on our data set. We improved our prediction
by implementing two layers of algorithms, in which we verify and predict
symbols that are very similar to each other.

● This gives us the ability to detect almost all the symbols, provided that they are shown
properly, there is no noise in the background, and the lighting is adequate.

● We achieved an accuracy of 95.8% in our model using only layer 1 of our algorithm,
and using the combination of layers 1 and 2 we achieve an accuracy of 98.0%,
which is better than most of the current research papers on speech recognition,
many of which also used CNNs for their recognition systems. It should be noted that our
model does not use any background-subtraction algorithm, while some of the models
cited above do.

● So, once we implement background subtraction in our project, the accuracies may
vary. On the other hand, most of the above projects use Kinect devices, but our main aim
was to create a project that can be used with readily available resources.

7.2 FUTURE ENHANCEMENTS:

● As per the statistics, users are only interested in applications that have a good and creative
graphical user interface, so we plan to enhance the graphical interface.
● As our application has a translator feature, we want to include a greater number of
languages so that a large group of people who speak different languages can use it.
● In the future, a quick-suggestion feature can be added to this application.
● We are also considering improving the pre-processing to predict voice in noisy conditions
with higher accuracy.
● This project can be enhanced by being built as a web/mobile application for users to
conveniently access it. The existing project can also be extended
to other native languages with the right amount of data and training.
This project implements a finger-spelling translator; however, language is also
used on a contextual basis, where each gesture could represent an object or a verb.
● Speech recognition systems have developed from classifying only static signs and
alphabets to systems that can successfully recognize dynamic movements that come in
continuous sequences of images. Researchers nowadays are paying more attention to building
a large vocabulary for speech recognition systems.

CHAPTER 8

8.1 SCREENSHOTS:

HOMEPAGE:

8.1.1 HOMEPAGE

This is the first page of our application.

● The three modules of our application are displayed here.


● Clicking on each module navigates to that particular module.

MODULE 1-TEXT TO SPEECH:

8.1.2 TEXT TO SPEECH CONVERSION PAGE

 On clicking the text-to-speech module, it navigates to the next activity.

 Enter the text you want spoken in the text box and click the Speak button.
 Clicking the Clear button clears the text entered in the text box. The output is played
through the mobile speakers.

MODULE 2-SPEECH TO TEXT:

8.1.3 SPEECH TO TEXT CONVERSION PAGE

On clicking speech to text, it navigates to the next activity.

● If microphone permission is not enabled, the app requests it
the first time.
● Clicking the Start button lets you record the audio.
● Clicking the Stop button stops recording and displays the recorded audio as text.

MODULE 3-TRANSLATOR:

8.1.4 TRANSLATOR PAGE

 On clicking TRANSLATOR, it navigates to the next activity.

 In the given text box, enter or paste the text that you
want to translate.
 Click the Translate button and the translation of the text given in the text box will be displayed below.

8.2 CODING:

XML CODING FOR HOMEPAGE:

<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".SpeechToTextActivity">

    <TextView
        android:id="@+id/textview"
        android:layout_width="253dp"
        android:layout_height="60dp"
        android:layout_marginTop="84dp"
        android:gravity="center"
        android:text="@string/app_name"
        android:textSize="24sp"
        android:textColor="@color/black"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />

    <Button
        android:id="@+id/button"
        android:layout_width="317dp"
        android:layout_height="74dp"
        android:layout_marginTop="96dp"
        android:gravity="center"
        android:text="TEXT-TO-SPEECH"
        android:textSize="18sp"
        android:onClick="textToSpeechOnclick"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.553"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/textview" />

    <Button
        android:id="@+id/button2"
        android:layout_width="317dp"
        android:layout_height="74dp"
        android:layout_marginTop="52dp"
        android:gravity="center"
        android:text="SPEECH-TO-TEXT"
        android:textSize="18sp"
        android:onClick="speechToTextOnclick"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.553"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/button" />

    <!-- onClick added so the TRANSLATOR button opens its activity
         (handler translatorOnclick is defined in FirstPageActivity) -->
    <Button
        android:id="@+id/button3"
        android:layout_width="317dp"
        android:layout_height="74dp"
        android:layout_marginTop="60dp"
        android:gravity="center"
        android:text="TRANSLATOR"
        android:textSize="18sp"
        android:onClick="translatorOnclick"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/button2" />

</androidx.constraintlayout.widget.ConstraintLayout>
CONNECTIVITY CODE FROM ONE MODULE TO ANOTHER:

// Importing the required libraries

import androidx.appcompat.app.AppCompatActivity;

import android.content.Intent;
import android.os.Bundle;
import android.view.View;

public class FirstPageActivity extends AppCompatActivity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.first_page);
    }

    // Navigates to the speech-to-text module
    public void speechToTextOnclick(View view) {
        Intent i = new Intent(getApplicationContext(), SpeechToTextActivity.class);
        startActivity(i);
    }

    // Navigates to the text-to-speech module
    public void textToSpeechOnclick(View view) {
        Intent i = new Intent(getApplicationContext(), TextToSpeechActivity.class);
        startActivity(i);
    }

    // Navigates to the translator module
    public void translatorOnclick(View view) {
        Intent i = new Intent(getApplicationContext(), Translator.class);
        startActivity(i);
    }
}
XML CODE FOR ANDROID MANIFEST PERMISSION:

<?xml version="1.0" encoding="utf-8"?>

<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.speechtotext">

    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.INTERNET" />

    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:roundIcon="@mipmap/ic_launcher_round"
        android:supportsRtl="true"
        android:theme="@style/Theme.Speechtotext">

        <activity android:name=".SpeechToTextActivity" />

        <activity android:name=".TextToSpeechActivity" />

        <!-- Only the first page is the launcher activity; the other activities
             are started through Intents, so they need no intent-filter -->
        <activity android:name=".FirstPageActivity">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>

    </application>

</manifest>
TEXT-TO-SPEECH XML CODING:

<?xml version="1.0" encoding="utf-8"?>

<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical"
    android:padding="20dp"
    android:gravity="center"
    tools:context=".TextToSpeechActivity">

    <EditText
        android:id="@+id/et_input"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:hint="Enter The Text"
        android:textAlignment="center"
        android:gravity="center_horizontal"
        android:lines="5"
        android:background="@drawable/bg_round" />

    <LinearLayout
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_marginTop="10dp">

        <Button
            android:id="@+id/bt_convert"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_weight="1"
            android:text="speak" />

        <androidx.appcompat.widget.AppCompatSpinner
            android:layout_width="10dp"
            android:layout_height="wrap_content" />

        <Button
            android:id="@+id/bt_clear"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_weight="1"
            android:text="clear" />

    </LinearLayout>

</LinearLayout>
TEXT-TO-SPEECH JAVA CODING:

import android.os.Bundle;
import android.speech.tts.TextToSpeech;
import android.view.View;
import android.widget.Button;
import android.widget.EditText;

import androidx.appcompat.app.AppCompatActivity;

import java.util.Locale;

public class TextToSpeechActivity extends AppCompatActivity {

    EditText edtext;
    Button btconvert, btclear;
    TextToSpeech textToSpeech;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.text_to_speech);

        edtext = findViewById(R.id.et_input);
        btconvert = findViewById(R.id.bt_convert);
        btclear = findViewById(R.id.bt_clear);

        // Initialise the text-to-speech engine and set its language once it is ready
        textToSpeech = new TextToSpeech(getApplicationContext(), new TextToSpeech.OnInitListener() {
            @Override
            public void onInit(int status) {
                if (status == TextToSpeech.SUCCESS) {
                    textToSpeech.setLanguage(Locale.ENGLISH);
                }
            }
        });

        // Speak the text entered in the EditText
        btconvert.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                String s = edtext.getText().toString();
                textToSpeech.speak(s, TextToSpeech.QUEUE_FLUSH, null);
            }
        });

        // Clear the EditText
        btclear.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                edtext.setText("");
            }
        });
    }
}
SPEECH-TO-TEXT XML CODING:

<?xml version="1.0" encoding="utf-8"?>

<androidx.constraintlayout.widget.ConstraintLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".SpeechToTextActivity">

    <TextView
        android:id="@+id/output"
        android:layout_width="300dp"
        android:layout_height="80dp"
        android:layout_marginTop="144dp"
        android:gravity="center"
        android:textColor="#0C0C0C"
        android:textSize="22sp"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.495"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />

    <Button
        android:id="@+id/rec"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginTop="60dp"
        android:onClick="startRec"
        android:text="startRec"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/output"
        tools:ignore="OnClick" />

    <Button
        android:id="@+id/stop"
        android:layout_width="108dp"
        android:layout_height="48dp"
        android:layout_marginTop="44dp"
        android:onClick="stopRec"
        android:text="stopRec"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.512"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/rec"
        tools:ignore="OnClick" />

</androidx.constraintlayout.widget.ConstraintLayout>
SPEECH-TO-TEXT JAVA CODING:

import androidx.appcompat.app.AppCompatActivity;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;

import android.Manifest;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.os.Bundle;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import android.view.View;
import android.widget.TextView;

import java.util.ArrayList;

public class SpeechToTextActivity extends AppCompatActivity {

    TextView txt;
    SpeechRecognizer recognizer;
    Intent intent;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.speech_to_text);
        checkpermission();
        convert();
        txt = findViewById(R.id.output);
        System.out.println("Inside on create");
    }

    // Request the microphone and internet permissions at runtime if not already granted
    public void checkpermission() {
        if (ContextCompat.checkSelfPermission(SpeechToTextActivity.this,
                Manifest.permission.RECORD_AUDIO) != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(SpeechToTextActivity.this,
                    new String[]{Manifest.permission.RECORD_AUDIO}, 1);
        }
        if (ContextCompat.checkSelfPermission(SpeechToTextActivity.this,
                Manifest.permission.INTERNET) != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(SpeechToTextActivity.this,
                    new String[]{Manifest.permission.INTERNET}, 1);
        }
    }

    // Create the recognizer and attach a listener that shows the first result in the TextView
    public void convert() {
        recognizer = SpeechRecognizer.createSpeechRecognizer(SpeechToTextActivity.this);
        intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);

        recognizer.setRecognitionListener(new RecognitionListener() {
            @Override
            public void onReadyForSpeech(Bundle params) {
                System.out.println("Inside on ready for speech");
            }

            @Override
            public void onBeginningOfSpeech() {
                System.out.println("Inside on Beginning of speech");
            }

            @Override
            public void onRmsChanged(float rmsdB) {
                System.out.println("Inside on rms changed");
            }

            @Override
            public void onBufferReceived(byte[] buffer) {
                System.out.println("Inside on Buffer received");
            }

            @Override
            public void onEndOfSpeech() {
                System.out.println("Inside on End of speech");
            }

            @Override
            public void onError(int error) {
                System.out.println("Inside on Error " + error);
            }

            @Override
            public void onResults(Bundle results) {
                System.out.println("Inside on Results");
                ArrayList<String> words =
                        results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
                if (words != null) {
                    for (String word : words) {
                        System.out.println(word);
                    }
                    // Show the most likely transcription
                    txt.setText(words.get(0));
                }
            }

            @Override
            public void onPartialResults(Bundle partialResults) {
                System.out.println("Inside on Partial Results");
            }

            @Override
            public void onEvent(int eventType, Bundle params) {
                System.out.println("Inside on event");
            }
        });
    }

    // Start recording when the start button is clicked
    public void startRec(View view) {
        System.out.println("Inside start rec");
        txt.setText("");
        recognizer.startListening(intent);
    }

    // Stop recording; the recognised text is delivered to onResults()
    public void stopRec(View view) {
        recognizer.stopListening();
    }
}
TRANSLATOR XML CODING:

<?xml version="1.0" encoding="utf-8"?>

<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical"
    tools:context=".MainActivity">

    <EditText
        android:id="@+id/inputToTranslate"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_gravity="center"
        android:layout_marginTop="48dp"
        android:layout_marginBottom="16dp"
        android:ems="10"
        android:hint="Enter text"
        android:inputType="text" />

    <Button
        android:id="@+id/translateButton"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_gravity="center"
        android:layout_marginBottom="32dp"
        android:text="Translate" />

    <TextView
        android:id="@+id/translatedTv"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_gravity="center"
        android:textSize="16sp" />

</LinearLayout>
MANIFEST PERMISSION:

<?xml version="1.0" encoding="utf-8"?>

<manifest xmlns:android="http://schemas.android.com/apk/res/android">

<uses-permission android:name="android.permission.INTERNET"/>

<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"/>

<!--Activities below as usual-->

</manifest>

TRANSLATOR JAVA CODING:

import android.content.Context;
import android.net.ConnectivityManager;
import android.net.NetworkInfo;
import android.os.Bundle;
import android.os.StrictMode;
import android.view.View;
import android.widget.Button;
import android.widget.EditText;
import android.widget.TextView;

import androidx.appcompat.app.AppCompatActivity;

import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.translate.Translate;
import com.google.cloud.translate.TranslateOptions;
import com.google.cloud.translate.Translation;

import java.io.IOException;
import java.io.InputStream;

public class MainActivity extends AppCompatActivity {

    private EditText inputToTranslate;
    private TextView translatedTv;
    private String originalText;
    private String translatedText;
    private boolean connected;
    Translate translate;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        inputToTranslate = findViewById(R.id.inputToTranslate);
        translatedTv = findViewById(R.id.translatedTv);
        Button translateButton = findViewById(R.id.translateButton);

        translateButton.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                if (checkInternetConnection()) {
                    // If there is an internet connection, get the translate service and translate
                    getTranslateService();
                    translate();
                } else {
                    // If not, display a "no connection" warning
                    translatedTv.setText(getResources().getString(R.string.no_connection));
                }
            }
        });
    }

    public void getTranslateService() {
        StrictMode.ThreadPolicy policy = new StrictMode.ThreadPolicy.Builder().permitAll().build();
        StrictMode.setThreadPolicy(policy);

        try (InputStream is = getResources().openRawResource(R.raw.credentials)) {
            // Get credentials:
            final GoogleCredentials myCredentials = GoogleCredentials.fromStream(is);

            // Set credentials and get the translate service:
            TranslateOptions translateOptions =
                    TranslateOptions.newBuilder().setCredentials(myCredentials).build();
            translate = translateOptions.getService();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }

    public void translate() {
        // Get the input text to be translated:
        originalText = inputToTranslate.getText().toString();
        Translation translation = translate.translate(originalText,
                Translate.TranslateOption.targetLanguage("tr"),
                Translate.TranslateOption.model("base"));
        translatedText = translation.getTranslatedText();

        // The translated text is set on the TextView:
        translatedTv.setText(translatedText);
    }

    public boolean checkInternetConnection() {
        // Check the internet connection:
        ConnectivityManager connectivityManager =
                (ConnectivityManager) getSystemService(Context.CONNECTIVITY_SERVICE);

        // connected means we are on a network (mobile or Wi-Fi)
        connected = connectivityManager.getNetworkInfo(ConnectivityManager.TYPE_MOBILE).getState()
                == NetworkInfo.State.CONNECTED
                || connectivityManager.getNetworkInfo(ConnectivityManager.TYPE_WIFI).getState()
                == NetworkInfo.State.CONNECTED;
        return connected;
    }
}
8.3 DATA DICTIONARY

Global vocabulary:
Support your global user base with Speech-to-Text’s extensive language support in over 125
languages and variants.
Streaming speech recognition:
Receive real-time speech recognition results as the API processes the audio input streamed from
your application’s microphone or sent from a prerecorded audio file (inline or through Cloud
Storage).
Speech adaptation:
Customize speech recognition to transcribe domain-specific terms and rare words by providing hints,
boosting your transcription accuracy for specific words or phrases. Automatically convert spoken
numbers into addresses, years, currencies, and more using classes.
Speech-to-Text On-Prem:
Have full control over your infrastructure and protected speech data while leveraging Google’s
speech recognition technology on-premises, right in your own private data centers. Contact sales to
get started.
Multichannel recognition:
Speech-to-Text can recognize distinct channels in multichannel situations (e.g., video conference)
and annotate the transcripts to preserve the order.
Noise robustness:
Speech-to-Text can handle noisy audio from many environments without requiring additional noise
cancellation.
Domain-specific models:
Choose from a selection of trained models for voice control and for phone-call and video transcription,
optimized for domain-specific quality requirements. For example, our enhanced phone-call model
is tuned for audio originating from telephony, such as phone calls recorded at an 8 kHz sampling rate.

Content filtering:
Profanity filter helps you detect inappropriate or unprofessional content in your audio data and filter
out profane words in text results.
Transcription evaluation:
Upload your own voice data and have it transcribed with no code. Evaluate quality by iterating on
your configuration.
Automatic punctuation (beta):
Speech-to-Text accurately punctuates transcriptions (e.g., commas, question marks, and periods).
Speaker diarization (beta):
Know who said what by receiving automatic predictions about which of the speakers in a
conversation spoke each utterance.

CHAPTER 9

BIBLIOGRAPHY AND REFERENCES:

1. Oviatt S. Predicting Spoken Disfluencies During Human-Computer Interaction. Computer Speech
and Language 9(1):19-35, January 1995.
2. http://www.igntu.ac.in/eContent/IGNTU-eContent-815947141046-MA-Linguistics-4-HarjitSingh-ComputationalLinguistics-pdf
3. Marsh E, Wauchope K, Gurney JO. Human-Machine Dialogue for Multi-Modal Decision Support
Systems. Technical Report AIC-94-032, NCARAI, US Naval Research Laboratory, Washington,
DC.
4. https://www.researchgate.net/publication/304651244_VOICE_RECOGNITION_SYSTEM_SPEECH-TO-TEXT
5. Cohen, PR. The Role of Natural Language in a Multimodal Interface. Proceedings of the ACM
Symposium on User Interface Software and Technology, Monterey California, ACM Press,
November 15-18, 1992.
6. https://redirect.cs.umbc.edu/~mgrass2/dissert/annbib.html#speech
7. https://en.wikipedia.org/wiki/Speech_recognition

8. https://github.com/topics/speech-to-text
9. http://www.ling.helsinki.fi/~gwilcock/Tartu-2003/L7-Speech/JSAPI/Recognition.html#:~:text=A%20speech%20recognizer%20is%20a,of%20supporting%20classes%20and%20interfaces.
