Ranjith S - Mini Project
(PCA20P02L)
ON
SPEECH AND TEXT RECOGNITION
By
RANJITH S
(RA2132241020088)
Submitted to the
COLLEGE OF SCIENCE & HUMANITIES
Ramapuram, Chennai.
NOVEMBER 2022
BONAFIDE CERTIFICATE
Certified that this project report titled “SPEECH AND TEXT RECOGNITION”
ABSTRACT
In the developed world, smartphones have overtaken earlier mobile systems, and mobile apps play an ever-increasing role in our day-to-day life. This app serves you with three different features. "Text to Speech" helps people who are unable to speak to express their words through the mobile phone speakers. "Speech to Text" allows deaf people to understand the feelings and words of others on their smartphone screen. "Translator" renders the message and feeling of a person from one language to another, making life easier and more comfortable. Voice is the most basic, common and efficient form of communication for people to interact with each other. Today, speech technologies are commonly available for a limited but interesting range of tasks. These technologies enable machines to respond correctly and reliably to human voices and to provide useful and valuable services. Because communicating with a computer by voice is faster than using a keyboard, people tend to prefer such systems. Communication among human beings is dominated by spoken language, so it is natural for people to expect voice interfaces to computers.
ACKNOWLEDGEMENT
I thank the almighty who has made this possible. Finally, I thank my beloved family members and friends for their motivation, encouragement and cooperation in all aspects, which led me to the completion of this project.
RANJITH S
TABLE OF CONTENTS
ACKNOWLEDGEMENT
LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
   1.1 PROJECT INTRODUCTION
2. WORKING ENVIRONMENT
3. SYSTEM ANALYSIS
   3.2 EXISTING SYSTEM
   3.3 DRAWBACKS OF EXISTING SYSTEM
   3.4 PROPOSED SYSTEM
   3.5 BENEFITS OF PROPOSED SYSTEM
   3.6 SCOPE OF THE PROJECT
4. SYSTEM DESIGN
5. PROJECT DESCRIPTION
   5.1 OBJECTIVE
   5.2 MODULE DESCRIPTION
   5.3 IMPLEMENTATION
6. SYSTEM TESTING
   6.3 TYPES OF TESTING
7. CONCLUSION
   7.1 SUMMARY
   7.2 FUTURE ENHANCEMENTS
8. APPENDIX
   8.1 SCREENSHOTS
      8.1.1 HOMEPAGE
   8.2 CODING
CHAPTER 1
INTRODUCTION
Some speech recognition systems require "training" (also called "enrollment"), where an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker-independent"[1] systems; systems that use training are called "speaker-dependent". Speech recognition applications include voice user interfaces such as voice dialing (e.g. "call home"), call routing (e.g. "I would like to make a collect call"), domotic (home) appliance control, keyword search (e.g. finding a podcast where particular words were spoken), simple data entry (e.g. entering a credit card number), preparation of structured documents (e.g. a radiology report), determining speaker characteristics, speech-to-text processing (e.g. word processors or emails), and aircraft control (usually termed direct voice input).
CHAPTER 2
WORKING ENVIRONMENT
• Java 19
• Frontend: XML
• Android Studio
REQUIREMENT ANALYSIS
A requirement is a feature of a system, or a description of something the system must be capable of doing, in order to fulfil the system's purpose. Requirement analysis provides the appropriate mechanism for understanding what the customer wants, analyzing the needs, assessing feasibility, negotiating a reasonable solution, specifying the solution unambiguously, validating the specification, and managing the requirements as they are translated into an operational system.
JAVA
One of the biggest reasons why Java is so popular is the platform independence. Programs can run
on several different types of computers; as long as the computer has a Java Runtime Environment
(JRE) installed, a Java program can run on it. Most types of computers will be compatible with a
JRE including PCs running on Windows, Macintosh computers, Unix or Linux computers, and large
mainframe computers, as well as mobile phones. Since it has been around for so long, some of the
biggest organizations in the world are built using the language. Many banks, retailers, insurance
companies, utilities, and manufacturers all use Java.
ANDROID STUDIO
Android Studio is the official integrated development environment (IDE) for Google's Android
operating system, built on JetBrains' IntelliJ IDEA software and designed specifically for
Android development. It is available for download on Windows, macOS and Linux based
operating systems and, since 2020, also as a subscription-based service. It is a replacement for the Eclipse
Android Development Tools (E-ADT) as the primary IDE for native Android application
development. Android Studio was announced on May 16, 2013 at the Google I/O conference. It
was in early access preview stage starting from version 0.1 in May 2013, then entered beta stage
starting from version 0.8 which was released in June 2014. The first stable build was released in
December 2014, starting from version 1.0.
Features in Android Studio
Android Studio supports the same programming languages as IntelliJ (and CLion), e.g. Java and C++, and more with extensions, such as Go. Android Studio 3.0 and later supports Kotlin and "all Java 7 language features and a subset of Java 8 language features that vary by platform version." External projects backport some Java 9 features. While IntelliJ states that Android Studio supports all released Java versions, including Java 12, it is not clear to what level Android Studio supports Java versions up to Java 12 (the documentation mentions partial Java 8 support). At least some new language features up to Java 12 are usable in Android. Once an app has been compiled with Android Studio, it can be published on the Google Play Store; the application has to comply with the Google Play Store developer content policy.
CHAPTER 3
SYSTEM ANALYSIS
3.1 FEASIBILITY STUDY
The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out. This is to ensure that the proposed system is not a burden to the person. For feasibility analysis, some understanding of the major requirements for the system is essential. The three key considerations involved in the feasibility analysis are:
● Economic Feasibility
● Technical Feasibility
● Social Feasibility
3.1.3 Social Feasibility
This aspect of the study checks the level of acceptance of the system by the user. It includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users depends solely on the methods employed to educate users about the system and to make them familiar with it. Their confidence must be raised so that they are also able to offer constructive criticism, which is welcomed, as they are the final users of the system.
3.3 DRAWBACKS OF EXISTING SYSTEM
These observations apply to some of the top-rated apps currently available in the app market.
● These applications do not actually help this community in any way; they are merely titled with the prefix "deaf and dumb" but in practice only increase the volume, which does not help these users at all.
● Most existing systems carry the title of deaf and dumb assistance but do not provide the services that such a title implies, whereas our system provides the services that should accompany this title.
● All existing systems so far support only a single form of assistance, whereas our system provides both forms of assistance within a single application. Our system also provides a translator, which further enhances the user's experience with the application.
3.4 PROPOSED SYSTEM
Phase-1: Train the Convolutional Neural Network (CNN) with the training dataset.
We achieved a final accuracy of 95.0% on our data set. We improved our prediction after implementing two layers of algorithms, in which we verify and predict symbols that are more similar to each other. This gives us the ability to detect almost all the symbols, provided they are shown properly, there is no noise in the background, and lighting is adequate.
3.6 SCOPE OF THE PROJECT:
We are planning to achieve higher accuracy even in the case of complex backgrounds by trying out various background subtraction algorithms. The system provides three features: text to speech, speech to text, and a translator. It helps deaf people by displaying the message they want to hear on the mobile phone screen, and it helps mute people by speaking the message they want to say through the mobile phone speakers. It also helps both deaf and mute people translate messages into other languages. Many more works can be carried out as extensions of this project. The present system predicts the needs of the mute person, but future systems could communicate with the mute person's mobile device and learn the needs of the user, enabling the development of recommender systems, since the relevant data related to the mute person can easily be learned through the neural network model.
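The report does not specify which background subtraction algorithm will be tried, so the following is only an illustrative sketch of the simplest variant, frame differencing: pixels whose intensity differs from a reference background frame by more than a threshold are marked as foreground. The class name `FrameDiff` and the grayscale-array representation are our assumptions, not project code.

```java
// Illustrative sketch of frame-differencing background subtraction.
// Images are modeled as 2D arrays of grayscale intensities (0-255).
public class FrameDiff {

    // Mark a pixel as foreground when it differs from the stored
    // background frame by more than the given threshold.
    static boolean[][] subtract(int[][] background, int[][] frame, int threshold) {
        int h = frame.length, w = frame[0].length;
        boolean[][] foreground = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                foreground[y][x] = Math.abs(frame[y][x] - background[y][x]) > threshold;
            }
        }
        return foreground;
    }

    public static void main(String[] args) {
        int[][] bg = {{10, 10}, {10, 10}};   // reference background frame
        int[][] fr = {{10, 200}, {12, 10}};  // current frame with one bright pixel
        boolean[][] fg = subtract(bg, fr, 20);
        System.out.println(fg[0][1] + " " + fg[0][0]); // only the changed pixel is foreground
    }
}
```

Real systems would add noise filtering and an adaptive background model, but the thresholded difference above is the core idea.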
CHAPTER 4
SYSTEM DESIGN
● Systems design is the process of defining the elements of a system, such as its modules, architecture, components and their interfaces, and its data, based on the specified requirements. It is the process of defining, developing and designing systems that satisfy the specific needs and requirements of a business or organization.
● A systematic approach is required for a coherent and well-running system; a bottom-up or top-down approach is required to take into account all related variables of the system. A designer uses modelling languages to express the information and knowledge in a structure of the system defined by a consistent set of rules and definitions. The designs can be expressed in graphical or textual modelling languages.
DESIGN METHODS:
● Architectural design: Describes the views, models, behavior, and structure of the system.
● Logical design: To represent the data flow, inputs and outputs of the system. Example: ER
Diagrams (Entity Relationship Diagrams).
● Physical design: Defined as a) How users add information to the system and how the
system represents information back to the user. b) How the data is modelled and stored
within the system. c) How data moves through the system, how data is validated, secured
and/or transformed as it flows through and out of the system.
4.1 DATA FLOW DIAGRAM
DFDs make it easy to depict the business requirements of applications by representing the
sequence of process steps and flow of information using a graphical representation or visual
representation rather than a textual description. When used through an entire development
process, they first document the results of business analysis. Then, they refine the
representation to show how information moves through, and is changed by, application flows.
Both automated and manual processes are represented.
DATA FLOW DIAGRAM
4.2 USE CASE DIAGRAM
Use-case diagrams describe the high-level functions and scope of a system. These diagrams also
identify the interactions between the system and its actors. The use cases and actors in use-case
diagrams describe what the system does and how the actors use it, but not how the system operates
internally. Use-case diagrams illustrate and define the context and requirements of either an entire
system or the important parts of the system. You can model a complex system with a single use-
case diagram, or create many use-case diagrams to model the components of the system. You
would typically develop use-case diagrams in the early phases of a project and refer to them
throughout the development process.
USE CASE DIAGRAM
4.2.2 SPEECH TO TEXT
4.3 ARCHITECTURE DIAGRAM
An architecture diagram is a visual representation of all the elements that make up part, or all, of
a system. Above all, it helps the engineers, designers, stakeholders and anyone else involved in the
project understand a system or app’s layout. This diagram gives a top-level view of a software’s
structure. To elaborate, it generally includes various components that interact with each other and
how the software interacts with external databases and servers. It’s useful for explaining software
to clients and stakeholders; and assessing the impact of adding new features or upgrading,
replacing, or merging existing applications.
4.3.2 ARCHITECTURE DIAGRAM 2
CHAPTER 5
PROJECT DESCRIPTION:
5.1 OBJECTIVE:
● Statistics show that there are currently over 325,000 health-related mobile apps on app marketplaces.
● As per the statistics, healthcare app developers are keen on developing projects such as fitness apps, calorie-burning trackers, online pharmacy apps and online doctor consultation apps.
● Only very few developers build these health apps for the benefit of the people in need, so such apps are in huge demand in the market.
5.2 MODULE DESCRIPTION:
This application has three modules through which all the operations take place:
● Module 1 – Text-to-Speech
● Module 2 – Speech-to-Text
● Module 3 – Translator
MODULE 1 – TEXT-TO-SPEECH:
Text-to-Speech is the first module. It directs you to a page containing a text box, where the text to be pronounced as output is entered. The entered text, or the text from an attached file, is read by the text recognizer, and the words from the text are matched and stored. The speech synthesizer takes the words from the text recognizer; the collected data is manipulated and arranged according to grammatical rules. The synthesizer then transforms the arranged data into a waveform, which is played as output through the phone speakers.
MODULE 2 – SPEECH-TO-TEXT:
Speech-to-Text is the second module. It directs you to a page with a listen button; you hold the button and speak the words to be displayed as output. The recorded audio is analyzed and broken down into segments, and the start and end of the audio are detected. Noise is removed from the recording, and the audio is matched with the corresponding words. The resulting waveform is converted to text and displayed on the mobile screen.
MODULE 3 – TRANSLATOR:
• Translator is the third module; this module will help us to convert the output form the other
modules.
• This module uses the google translate API, through which the output from the other modules
can be translated from one language to other.
5.3 IMPLEMENTATION
Project implementation is the process of putting a project plan into action to produce the deliverables, otherwise known as the products or services, for clients or stakeholders. It takes place after the planning phase, during which a team determines the key objectives for the project, as well as the timeline and budget. Implementation involves coordinating resources and measuring performance to ensure the project remains within its expected scope and budget. It also involves handling any unforeseen issues in a way that keeps the project running smoothly.

To implement a project effectively, project managers must consistently communicate with the team to set and adjust priorities as needed, while maintaining transparency about the project's status with clients and other key stakeholders. Implementation is the stage of the project where the theoretical design is turned into a working system, giving users confidence that the new system will work efficiently and effectively. It involves careful planning, investigation of the current system and its constraints on implementation, design of methods to achieve the changeover, and an evaluation of changeover methods. Apart from planning, the major tasks of preparing for implementation are the education and training of users.

The implementation process begins with preparing a plan for the implementation of the system. According to this plan, the activities are carried out, discussions are held regarding the equipment and resources, and any additional equipment is acquired to implement the new system. Implementation is the final and most important phase. The most critical step in achieving a successful new system is giving users confidence that the new system will work and be effective. The system can be implemented only after thorough testing is done and it is found to work according to the specification.
CHAPTER 6
SYSTEM TESTING:
● Testing is a process of executing a program with the intent of finding errors. A good test case is one that has a high probability of finding an as-yet undiscovered error; a successful test is one that uncovers such an error. System testing is the stage of implementation aimed at ensuring that the system works accurately and efficiently, as expected, before live operation commences. It verifies that the whole set of programs works together. System testing consists of several key activities and steps for running the program, strings and the system, and is important in adopting a successful new system. This is the last chance to detect and correct errors before the system is installed for user acceptance testing.
● The software testing process commences once the program is created and the documentation and related data structures are designed. Software testing is essential for correcting errors; otherwise, the program or project cannot be said to be complete. Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design and coding.
● The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies and the finished product; it is the process of exercising software with the intent of ensuring that the software does not fail in an unacceptable manner. Testing also provides a complete verification to determine whether the objectives are met and the user requirements are satisfied. The ultimate aim is quality assurance.
6.2 TESTING OBJECTIVE:
● To find errors in the developed software; to check that the functions work according to the specification and that the required behavior and performance are fulfilled; to check the reliability and quality of the software.
● We feed the input images, after pre-processing, to our model for training and testing, after applying all the operations mentioned above.
● The prediction layer estimates how likely the image is to fall under each of the classes. The output is normalized between 0 and 1 so that the values across the classes sum to 1. We achieved this using the SoftMax function.
● At first, the output of the prediction layer will be somewhat far from the actual value. To improve it, we trained the networks using labelled data. Cross-entropy is a performance measurement used in classification. It is a continuous function that is positive at values that differ from the labelled value and is zero exactly when it equals the labelled value. Therefore, we optimized the cross-entropy by minimizing it as close to zero as possible; to do this, we adjust the weights of our neural network layers. TensorFlow has a built-in function to calculate the cross-entropy.
● Having obtained the cross-entropy function, we optimized it using gradient descent, in fact with one of the best gradient descent optimizers, the Adam optimizer.
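The SoftMax normalization and cross-entropy measurement described above can be sketched in plain Java. This is only an illustrative sketch, not the project's actual TensorFlow training code; the class name `LossSketch` and the example scores are our assumptions.

```java
import java.util.Arrays;

// Sketch of the prediction layer's SoftMax normalization and the
// cross-entropy loss against a one-hot label, as described in the text.
public class LossSketch {

    // SoftMax: exponentiate each score and divide by the sum, so outputs
    // lie in (0, 1) and sum to 1 across all classes.
    static double[] softmax(double[] scores) {
        double max = Arrays.stream(scores).max().orElse(0.0); // subtract max for numerical stability
        double sum = 0.0;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    // Cross-entropy against a one-hot label: -log of the probability
    // assigned to the true class. It is zero exactly when that probability is 1.
    static double crossEntropy(double[] probs, int trueClass) {
        return -Math.log(probs[trueClass]);
    }

    public static void main(String[] args) {
        double[] scores = {2.0, 1.0, 0.1}; // hypothetical prediction-layer outputs
        double[] probs = softmax(scores);
        System.out.println(Arrays.toString(probs));
        System.out.println(crossEntropy(probs, 0));
    }
}
```

Minimizing this loss by adjusting the network weights (e.g. with the Adam optimizer, as the report states) drives the predicted probability of the true class toward 1.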
6.3 TYPES OF TESTING:
● Unit testing
● Integration testing
● Functional testing
● System testing
● White box testing
● Black box testing
UNIT TESTING:
● Unit testing is conducted to verify the functional performance of each modular component of the software. It focuses on the smallest unit of the software design, i.e. the module. White-box testing techniques were heavily employed for unit testing.
● Unit tests perform basic tests at the component level and test a specific business process, application and/or system configuration. They ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.
INTEGRATION TESTING:
● Integration testing is a systematic technique for constructing the program structure while at the same time conducting tests to uncover errors associated with interfacing; i.e., integration testing is the complete testing of the set of modules that makes up the product. The objective is to take untested modules and build a program structure; the tester should identify critical modules, and critical modules should be tested as early as possible. One approach is to wait until all the units have passed testing, and then combine and test them together; this approach evolved from the unstructured testing of small programs. Another strategy is to construct the product in increments of tested units: a small set of modules is integrated and tested, then another module is added and tested in combination, and so on. The advantage of this approach is that interface discrepancies can be easily found and corrected.
FUNCTIONAL TESTS:
● Functional test cases involved exercising the code with nominal input values for which the
expected results are known, as well as boundary values and special values, such as logically
related inputs, files of identical elements, and empty files.
● Three types of tests in Functional test:
i. Performance Test
ii. Stress Test
iii. Structure Test
SYSTEM TEST:
System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system test is the
configuration-oriented system integration test. System testing is based on process descriptions and
flows, emphasizing pre-driven process links and integration points.
CHAPTER 7
7.1 SUMMARY:
● To conclude the description of the project: the project, developed using Java and Android Studio, is based on the requirement specification of the user and the analysis of the existing system, with flexibility for future enhancement.
● We achieved a final accuracy of 95.0% on our data set. We improved our prediction after implementing two layers of algorithms, in which we verify and predict symbols that are more similar to each other.
● This gives us the ability to detect almost all the symbols, provided they are shown properly, there is no noise in the background, and lighting is adequate.
● We achieved an accuracy of 95.8% in our model using only layer 1 of our algorithm; using the combination of layer 1 and layer 2, we achieve an accuracy of 98.0%, which is better than most of the current research papers on speech recognition.
● They also used CNNs for their recognition systems. It should be noted that our model does not use any background subtraction algorithm, while some of the models mentioned above do.
● So, once we implement background subtraction in our project, the accuracies may vary. On the other hand, most of the above projects use Kinect devices, but our main aim was to create a project that can be used with readily available resources.
7.2 FUTURE ENHANCEMENTS:
● As per the statistics, users are only interested in applications that have a good and creative graphical user interface, so we plan to enhance the graphical interface.
● As our application has a translator feature, we want to include a greater number of languages so that a larger group of people who speak different languages can use it.
● In future, a quick-suggestion feature can be added to this application.
● We are also thinking of improving the pre-processing to predict voice in noisy conditions with higher accuracy.
● This project can be enhanced by being built as a web/mobile application for users to access it conveniently. Also, the existing project can be extended to work for other native speech recognition with the right amount of data and training. This project implements a finger-spelling translator; however, speech is also recognized on a contextual basis, where each gesture could represent an object or a verb.
● Speech recognition systems have developed from classifying only static signs and alphabets to systems that can successfully recognize dynamic movements that come in continuous sequences of images. Researchers nowadays are paying more attention to building a large vocabulary for speech recognition systems.
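As a small illustration of the multi-language goal above (not code from the project), the standard `java.util.Locale` API can enumerate the language tags available to the Java runtime, which an extended Translator module could present as selectable target languages. The class name `LanguageList` is hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: count the distinct languages the JRE knows about,
// a plausible starting point for a wider Translator language menu.
public class LanguageList {

    static long supportedLanguageCount() {
        return Arrays.stream(Locale.getAvailableLocales())
                     .map(Locale::getLanguage)   // ISO 639 language code, e.g. "ta"
                     .filter(l -> !l.isEmpty())  // skip the root locale
                     .distinct()
                     .count();
    }

    public static void main(String[] args) {
        System.out.println(supportedLanguageCount() + " distinct languages available");
        // Look up a language's English display name from its tag.
        System.out.println(Locale.forLanguageTag("ta").getDisplayLanguage(Locale.ENGLISH));
    }
}
```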
CHAPTER 8
8.1 SCREENSHOTS:
HOMEPAGE:
8.1.1 HOMEPAGE
MODULE 1-TEXT TO SPEECH:
MODULE 2-SPEECH TO TEXT:
MODULE 3-TRANSLATOR:
8.2 CODING:
<androidx.constraintlayout.widget.ConstraintLayout
xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:app="http://schemas.android.com/apk/res-auto"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="match_parent"
android:layout_height="match_parent"
tools:context=".SpeechToTextActivity">
<TextView
android:id="@+id/textview"
android:layout_width="253dp"
android:layout_height="60dp"
android:layout_marginTop="84dp"
android:gravity="center"
android:text="@string/app_name"
android:textSize="24sp"
android:textColor="@color/black"
app:layout_constraintEnd_toEndOf="parent"
app:layout_constraintStart_toStartOf="parent"
app:layout_constraintTop_toTopOf="parent" />
<Button
android:id="@+id/button"
android:layout_width="317dp"
android:layout_height="74dp"
android:layout_marginTop="96dp"
android:gravity="center"
android:text="TEXT-TO-SPEECH"
android:textSize="18sp"
android:onClick="textToSpeechOnclick"
app:layout_constraintEnd_toEndOf="parent"
app:layout_constraintHorizontal_bias="0.553"
app:layout_constraintStart_toStartOf="parent"
app:layout_constraintTop_toBottomOf="@+id/textview" />
<Button
android:id="@+id/button2"
android:layout_width="317dp"
android:layout_height="74dp"
android:layout_marginTop="52dp"
android:gravity="center"
android:text="SPEECH-TO-TEXT"
android:textSize="18sp"
android:onClick="speechToTextOnclick"
app:layout_constraintEnd_toEndOf="parent"
app:layout_constraintHorizontal_bias="0.553"
app:layout_constraintStart_toStartOf="parent"
app:layout_constraintTop_toBottomOf="@+id/button" />
<Button
android:id="@+id/button3"
android:layout_width="317dp"
android:layout_height="74dp"
android:layout_marginTop="60dp"
android:gravity="center"
android:text="TRANSLATOR"
android:textSize="18sp"
app:layout_constraintEnd_toEndOf="parent"
app:layout_constraintStart_toStartOf="parent"
app:layout_constraintTop_toBottomOf="@+id/button2" />
</androidx.constraintlayout.widget.ConstraintLayout>
CONNECTIVITY CODE FROM ONE MODULE TO ANOTHER:
import androidx.appcompat.app.AppCompatActivity;
import android.content.Intent;
import android.os.Bundle;
import android.view.View;

public class FirstPageActivity extends AppCompatActivity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.first_page);
    }

    // onClick handlers wired from the layout: each starts the chosen module.
    public void textToSpeechOnclick(View view) {
        Intent i = new Intent(this, TextToSpeechActivity.class);
        startActivity(i);
    }

    public void speechToTextOnclick(View view) {
        Intent i = new Intent(this, SpeechToTextActivity.class);
        startActivity(i);
    }
}
XML CODE FOR ANDROID MANIFEST PERMISSION:
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.speechtotext">

    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.INTERNET" />

    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:roundIcon="@mipmap/ic_launcher_round"
        android:supportsRtl="true"
        android:theme="@style/Theme.Speechtotext">

        <activity android:name=".SpeechToTextActivity" />
        <activity android:name=".TextToSpeechActivity" />

        <!-- Launcher activity: the first page from which the modules open. -->
        <activity android:name=".FirstPageActivity">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>
</manifest>
TEXT-TO-SPEECH XML CODING:
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:app="http://schemas.android.com/apk/res-auto"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="match_parent"
android:layout_height="match_parent"
android:orientation="vertical"
android:padding="20dp"
android:gravity="center"
tools:context=".TextToSpeechActivity">
<EditText
android:layout_width="match_parent"
android:layout_height="wrap_content"
android:id="@+id/et_input"
android:textAlignment="center"
android:gravity="center_horizontal"
android:lines="5"
android:background="@drawable/bg_round"/>
<LinearLayout
android:layout_width="match_parent"
android:layout_height="wrap_content"
android:layout_marginTop="10dp">
<Button
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_weight="1"
android:id="@+id/bt_convert"
android:text="speak"
/>
<androidx.appcompat.widget.AppCompatSpinner
android:layout_width="10dp"
android:layout_height="wrap_content"/>
<Button
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_weight="1"
android:id="@+id/bt_clear"
android:text="clear"
/>
    </LinearLayout>
</LinearLayout>
TEXT-TO-SPEECH JAVA CODING:
import android.os.Bundle;
import android.speech.tts.TextToSpeech;
import android.view.View;
import android.widget.Button;
import android.widget.EditText;
import androidx.appcompat.app.AppCompatActivity;
import java.util.Locale;

public class TextToSpeechActivity extends AppCompatActivity {

    EditText edtext;
    Button btconvert, btclear;
    TextToSpeech textToSpeech;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.text_to_speech);

        edtext = findViewById(R.id.et_input);
        btconvert = findViewById(R.id.bt_convert);
        btclear = findViewById(R.id.bt_clear);

        // Initialize the engine; set the language once initialization succeeds.
        textToSpeech = new TextToSpeech(this, new TextToSpeech.OnInitListener() {
            @Override
            public void onInit(int status) {
                if (status == TextToSpeech.SUCCESS) {
                    textToSpeech.setLanguage(Locale.ENGLISH);
                }
            }
        });

        // Speak the text currently in the input box.
        btconvert.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                String s = edtext.getText().toString();
                textToSpeech.speak(s, TextToSpeech.QUEUE_FLUSH, null);
            }
        });

        // Clear the input box.
        btclear.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                edtext.setText("");
            }
        });
    }
}
SPEECH-TO-TEXT XML CODING:
<androidx.constraintlayout.widget.ConstraintLayout
xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:app="http://schemas.android.com/apk/res-auto"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="match_parent"
android:layout_height="match_parent"
tools:context=".SpeechToTextActivity">
<TextView
android:id="@+id/output"
android:layout_width="300dp"
android:layout_height="80dp"
android:layout_marginTop="144dp"
android:gravity="center" android:textColor="#0C0C0C"
android:textSize="22sp"
app:layout_constraintEnd_toEndOf="parent"
app:layout_constraintHorizontal_bias="0.495"
app:layout_constraintStart_toStartOf="parent"
app:layout_constraintTop_toTopOf="parent" />
49
<Button
android:id="@+id/rec"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_marginTop="60dp"
android:onClick="startRec" android:text="startRec"
app:layout_constraintEnd_toEndOf="parent"
app:layout_constraintStart_toStartOf="parent"
app:layout_constraintTop_toBottomOf="@+id/output" tools:ignore="OnClick"
/>
<Button
android:id="@+id/stop" android:layout_width="108dp"
android:layout_height="48dp"
android:layout_marginTop="44dp"
android:onClick="stopRec" android:text="stopRec"
app:layout_constraintEnd_toEndOf="parent"
app:layout_constraintHorizontal_bias="0.512"
app:layout_constraintStart_toStartOf="parent"
app:layout_constraintTop_toBottomOf="@+id/rec"
tools:ignore="OnClick" />
</androidx.constraintlayout.widget.ConstraintLayout>
SPEECH-TO-TEXT JAVA CODING:
import androidx.appcompat.app.AppCompatActivity;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;
import android.Manifest;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.os.Bundle;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import android.view.View;
import android.widget.TextView;
import java.util.ArrayList;

public class SpeechToTextActivity extends AppCompatActivity {
    TextView txt;
    SpeechRecognizer recognizer;
    Intent intent;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.speech_to_text);
        checkpermission();
        convert();
        txt = findViewById(R.id.output);
        System.out.println("Inside on create");
    }

    // Ask at runtime for the permissions the recognizer needs.
    private void checkpermission() {
        if (!(ContextCompat.checkSelfPermission(SpeechToTextActivity.this,
                Manifest.permission.RECORD_AUDIO) == PackageManager.PERMISSION_GRANTED)) {
            ActivityCompat.requestPermissions(SpeechToTextActivity.this,
                    new String[]{Manifest.permission.RECORD_AUDIO}, 1);
        }
        if (!(ContextCompat.checkSelfPermission(SpeechToTextActivity.this,
                Manifest.permission.INTERNET) == PackageManager.PERMISSION_GRANTED)) {
            ActivityCompat.requestPermissions(SpeechToTextActivity.this,
                    new String[]{Manifest.permission.INTERNET}, 1);
        }
    }

    // Create the recognizer and show the top recognition result on screen.
    private void convert() {
        recognizer = SpeechRecognizer.createSpeechRecognizer(SpeechToTextActivity.this);
        intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        recognizer.setRecognitionListener(new RecognitionListener() {
            @Override
            public void onReadyForSpeech(Bundle params) { }
            @Override
            public void onBeginningOfSpeech() { }
            @Override
            public void onRmsChanged(float rmsdB) { }
            @Override
            public void onBufferReceived(byte[] buffer) { }
            @Override
            public void onEndOfSpeech() { }
            @Override
            public void onError(int error) { }
            @Override
            public void onResults(Bundle results) {
                System.out.println("Inside on Results");
                ArrayList<String> words =
                        results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
                System.out.println(words);
                if (words != null)
                    txt.setText(words.get(0));
            }
            @Override
            public void onPartialResults(Bundle partialResults) { }
            @Override
            public void onEvent(int eventType, Bundle params) {
                System.out.println("Inside on event");
            }
        });
    }

    // Wired to the startRec button via android:onClick in the layout.
    public void startRec(View v) {
        txt.setText("");
        recognizer.startListening(intent);
    }

    // Wired to the stopRec button via android:onClick in the layout.
    public void stopRec(View v) {
        recognizer.stopListening();
    }
}
TRANSLATOR XML CODING:
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="match_parent"
android:layout_height="match_parent"
android:orientation="vertical"
tools:context=".MainActivity">
<EditText
android:id="@+id/inputToTranslate"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_gravity="center"
android:layout_marginTop="48dp"
android:layout_marginBottom="16dp"
android:ems="10"
android:hint="Enter text"
android:inputType="text" />
<Button
android:id="@+id/translateButton"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_gravity="center"
android:layout_marginBottom="32dp"
android:text="Translate" />
<TextView
android:id="@+id/translatedTv"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_gravity="center" android:textSize="16sp"
/>
</LinearLayout>
MANIFEST PERMISSION:
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
<uses-permission android:name="android.permission.INTERNET"/>
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"/>
</manifest>
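The manifest above declares only the permissions used by the translator module. The SpeechRecognizer used in the speech-to-text module additionally requires the record-audio permission to be declared before the runtime permission request can succeed (a sketch; this line belongs inside the same <manifest> element):

```xml
<uses-permission android:name="android.permission.RECORD_AUDIO"/>
```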
TRANSLATOR JAVA CODING:
import android.content.Context;
import android.net.ConnectivityManager;
import android.net.NetworkInfo;
import android.os.Bundle;
import android.os.StrictMode;
import android.view.View;
import android.widget.Button;
import android.widget.EditText;
import android.widget.TextView;
import androidx.appcompat.app.AppCompatActivity;
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.translate.Translate;
import com.google.cloud.translate.TranslateOptions;
import com.google.cloud.translate.Translation;
import java.io.IOException;
import java.io.InputStream;

public class MainActivity extends AppCompatActivity {
    EditText inputToTranslate;
    TextView translatedTv;
    Button translateButton;
    Translate translate;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        inputToTranslate = findViewById(R.id.inputToTranslate);
        translatedTv = findViewById(R.id.translatedTv);
        translateButton = findViewById(R.id.translateButton);
        translateButton.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                if (checkInternetConnection()) {
                    // If there is an internet connection, get the translate service
                    // and start the translation:
                    getTranslateService();
                    translate();
                } else {
                    translatedTv.setText(getResources().getString(R.string.no_connection));
                }
            }
        });
    }

    private void getTranslateService() {
        StrictMode.ThreadPolicy policy =
                new StrictMode.ThreadPolicy.Builder().permitAll().build();
        StrictMode.setThreadPolicy(policy);
        // Get credentials (the service-account JSON file name in res/raw is assumed):
        try (InputStream is = getResources().openRawResource(R.raw.credentials)) {
            GoogleCredentials myCredentials = GoogleCredentials.fromStream(is);
            // Set credentials and get translate service:
            TranslateOptions translateOptions =
                    TranslateOptions.newBuilder().setCredentials(myCredentials).build();
            translate = translateOptions.getService();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }

    private void translate() {
        String originalText = inputToTranslate.getText().toString();
        Translation translation = translate.translate(originalText,
                Translate.TranslateOption.targetLanguage("tr"));
        translatedTv.setText(translation.getTranslatedText());
    }

    // Check internet connection:
    private boolean checkInternetConnection() {
        ConnectivityManager connectivityManager =
                (ConnectivityManager) getSystemService(Context.CONNECTIVITY_SERVICE);
        boolean connected =
                connectivityManager.getNetworkInfo(ConnectivityManager.TYPE_MOBILE).getState()
                        == NetworkInfo.State.CONNECTED
                || connectivityManager.getNetworkInfo(ConnectivityManager.TYPE_WIFI).getState()
                        == NetworkInfo.State.CONNECTED;
        return connected;
    }
}
8.3 DATA DICTIONARY
Global vocabulary:
Support your global user base with Speech-to-Text’s extensive language support in over 125
languages and variants.
Streaming speech recognition:
Receive real-time speech recognition results as the API processes the audio input streamed from
your application’s microphone or sent from a prerecorded audio file (inline or through Cloud
Storage).
Speech adaptation:
Customize speech recognition to transcribe domain-specific terms and rare words by providing hints,
boosting the transcription accuracy of specific words or phrases. Automatically convert spoken
numbers into addresses, years, currencies, and more using classes.
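As a sketch of how such hints are supplied (the field names follow the Cloud Speech-to-Text v1 REST API; the phrases and boost value here are only illustrative), a recognition request's config can carry speech contexts:

```json
{
  "config": {
    "languageCode": "en-US",
    "speechContexts": [
      {
        "phrases": ["text to speech", "speech to text"],
        "boost": 20.0
      }
    ]
  }
}
```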
Speech-to-Text On-Prem:
Have full control over your infrastructure and protected speech data while leveraging Google’s
speech recognition technology on-premises, right in your own private data centers. Contact sales to
get started.
Multichannel recognition:
Speech-to-Text can recognize distinct channels in multichannel situations (e.g., video conference)
and annotate the transcripts to preserve the order.
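A minimal request config for this case (field names from the Cloud Speech-to-Text v1 REST API; the channel count is illustrative) declares the channels and asks for per-channel results:

```json
{
  "config": {
    "languageCode": "en-US",
    "audioChannelCount": 2,
    "enableSeparateRecognitionPerChannel": true
  }
}
```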
Noise robustness:
Speech-to-Text can handle noisy audio from many environments without requiring additional noise
cancellation.
Domain-specific models:
Choose from a selection of trained models for voice control, phone call transcription, and video
transcription, each optimized for domain-specific quality requirements. For example, the enhanced
phone call model is tuned for audio originating from telephony, such as phone calls recorded at an
8 kHz sampling rate.
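Selecting such a model is a matter of request configuration (field names from the Cloud Speech-to-Text v1 REST API; shown here for the phone call scenario described above):

```json
{
  "config": {
    "languageCode": "en-US",
    "sampleRateHertz": 8000,
    "model": "phone_call",
    "useEnhanced": true
  }
}
```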
Content filtering:
The profanity filter helps you detect inappropriate or unprofessional content in your audio data and
filters out profane words in text results.
Transcription evaluation:
Upload your own voice data and have it transcribed with no code. Evaluate quality by iterating on
your configuration.
Automatic punctuation (beta):
Speech-to-Text accurately punctuates transcriptions (e.g., commas, question marks, and periods).
Speaker diarization (beta):
Know who said what by receiving automatic predictions about which of the speakers in a
conversation spoke each utterance.
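Both of these beta features are enabled through the request config (field names from the Cloud Speech-to-Text v1 REST API; the speaker counts below are placeholders for a two-person conversation):

```json
{
  "config": {
    "languageCode": "en-US",
    "enableAutomaticPunctuation": true,
    "diarizationConfig": {
      "enableSpeakerDiarization": true,
      "minSpeakerCount": 2,
      "maxSpeakerCount": 2
    }
  }
}
```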
CHAPTER 9
BIBLIOGRAPHY AND REFERENCES:
8. https://github.com/topics/speech-to-text
9. http://www.ling.helsinki.fi/~gwilcock/Tartu-2003/L7-Speech/JSAPI/Recognition.html