Mini Project Report 3.00000000

Uploaded by

as4848284
Copyright
© © All Rights Reserved
Mini Project Report (KCS-554)

on
Convert text into Speech
Submitted in partial fulfillment for award of
BACHELOR OF TECHNOLOGY
Degree
In
COMPUTER SCIENCE & ENGINEERING

2023-24
Under the Guidance of:
Mr. Pawan Pandey
Assistant Professor

Submitted By:
Abhinav Dikshit
Aman Sharma
Harshit Shukala

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


RAJ KUMAR GOEL INSTITUTE OF TECHNOLOGY
DELHI-MEERUT ROAD, GHAZIABAD

Affiliated to Dr. A.P.J. Abdul Kalam Technical University, Lucknow


CERTIFICATE
This is to certify that the Project Report entitled “Convert Text
into Speech”, which is submitted by ABHINAV DIKSHIT,
AMAN SHARMA and HARSHIT SHUKALA in partial
fulfillment of the requirement for the award of the degree of
B. Tech. in the Department of Computer Science &
Engineering of Dr. A.P.J. Abdul Kalam Technical University,
Lucknow, is a record of the candidates’ own work carried
out by them under my supervision. The matter embodied in
this report is original and has not been submitted for the
award of any other degree.

DATE: 17/02/2024

MR. PAWAN PANDEY
(Assistant Professor)
DECLARATION

This is to certify that the Synopsis Report entitled “Convert Text into
Speech” is submitted in partial fulfillment of the
requirement for the award of the degree of B.Tech. in Computer Science
and Engineering to R.K.G.I.T., Ghaziabad, Dr. A.P.J. Abdul Kalam
Technical University, Lucknow. It comprises only original work
and studies carried out by the students themselves. The matter
embodied in this synopsis has not been submitted for the award
of any other degree.

Date: 17-Feb-24

Abhinav Dikshit
Aman Sharma
Harshit Shukala
ABSTRACT

A text-to-speech synthesizer is an application that
converts text into spoken words. It analyzes and
processes the text using Natural Language Processing
(NLP) and then uses Digital Signal Processing (DSP)
technology to convert the processed text into a synthesized
speech representation of the text. Here, we developed a
useful text-to-speech synthesizer in the form of a simple
application that converts input text into synthesized
speech, reads it out to the user, and allows it to be saved
as an MP3 file. The development of a text-to-speech
synthesizer will be of great help to people with visual
impairment and will make working through large volumes of
text easier.
TABLE OF CONTENTS

CHAPTER NO. TITLE

COVER PAGE
CERTIFICATE
DECLARATION
ABSTRACT

1. INTRODUCTION
1.1 INTRODUCTION
1.2 ABOUT LANGUAGE
1.3 PURPOSE
1.4 SCOPE

2. HARDWARE AND SOFTWARE REQUIREMENTS
2.1 HARDWARE REQUIREMENTS
2.2 SOFTWARE REQUIREMENTS

3. SOFTWARE REQUIREMENTS SPECIFICATION
3.1 SPECIFIC REQUIREMENT
3.2 FUNCTIONAL REQUIREMENT
3.3 NON-FUNCTIONAL REQUIREMENT
3.4 METHODOLOGY OF WORK

4. PROJECT SNAPSHOT

5. LIMITATION
5.1 LIMITATION OF PROJECT

6. CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
6.2 FUTURE SCOPE

7. REFERENCES
CHAPTER 1
INTRODUCTION

1.1 INTRODUCTION

Text-to-speech synthesis (TTS) is the automatic conversion of a text
into speech that resembles, as closely as possible, a native speaker
of the language reading that text. A text-to-speech synthesizer is
the technology that lets a computer speak to you. The TTS system
takes text as its input; a computer algorithm called the
TTS engine then analyses the text, pre-processes it, and synthesizes
the speech using mathematical models. The TTS engine usually
generates sound data in an audio format as its output.

The text-to-speech converter is a software project that allows
even visually challenged users to access and understand various
documents. A blind person cannot read a document, so this software can
act as an assistant that reads the document out to them.
It can also be a great help for those who cannot speak. The
person can simply type what he or she wants to say, and the software
gives them a voice by speaking the typed text. The
user just has to select the Interactive mode, write the
message in the text area, and click the Convert button
to have it spoken aloud. This software
is therefore not just a step towards future development but also
a boon for those who cannot speak or see.
This technology can also be utilized for various other purposes, e.g. car
navigation, announcements in railway stations, response services in
telecommunications, and e-mail reading. With some
innovative thinking, many more applications can be derived from it.
TTS works with nearly every personal digital device, including computers,
smartphones and tablets. All kinds of text files can be read aloud,
including Word and Pages documents. Even online web pages can be
read aloud. The voice in TTS is computer-generated, and reading
speed can usually be sped up or slowed down. Voice
quality varies, but some voices sound
human, which gives a more realistic feel to the
speech. There are even computer-generated voices that sound like
children speaking. The software developed here uses a computerized
female voice. Many TTS tools highlight words as they are read aloud.
This allows kids to see text and hear it at the same time. Some TTS
tools also have a technology called optical character recognition
(OCR). OCR allows TTS tools to read text aloud from images.

A text-to-speech system (or "engine") is composed of two parts:
a front-end and a back-end. The front-end has two major tasks. First, it
converts raw text containing symbols like numbers and abbreviations into
the equivalent written-out words. This process is often called text
normalization, pre-processing, or tokenization. The front-end then assigns
phonetic transcriptions to each word, and divides and marks the text into
prosodic units, like phrases, clauses, and sentences. The process of
assigning phonetic transcriptions to words is called text-to-phoneme
or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody
information together make up the symbolic linguistic representation that is
output by the front-end. The back-end, often referred to as the synthesizer,
then converts the symbolic linguistic representation into sound. In certain
systems, this part includes the computation of the target prosody
(pitch contour, phoneme durations), which is then imposed on the output
speech.
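The text normalization step of the front-end can be illustrated with a small sketch. This is not the code used in the project (the libraries used here perform their own pre-processing); the abbreviation table and the function names `number_to_words` and `normalize` are assumptions made only for illustration, and the sketch spells out whole numbers from 0 to 999 only.

```python
import re

# Hypothetical abbreviation table; a real front-end would use a far larger lexicon.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister", "etc.": "et cetera"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text: str) -> str:
    """Expand known abbreviations and spell out standalone 1-3 digit numbers."""
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    # Longer digit runs (e.g. years) are left untouched by this simple pattern.
    return re.sub(r"\b\d{1,3}\b",
                  lambda match: number_to_words(int(match.group())),
                  text)
```

For example, `normalize("Dr. Rao read 21 pages.")` yields "Doctor Rao read twenty-one pages.", which is the written-out form that a later grapheme-to-phoneme step would receive.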
1.2 About Language

The language used for the text-to-speech conversion project is Python.
Python is a high-level, interpreted, interactive and object-oriented scripting
language. Python is designed to be highly readable. It frequently uses
English keywords where other languages use punctuation,
and it has fewer syntactical constructions than other languages.

• Python is Interpreted: Python is processed at runtime by the
interpreter. You do not need to compile your program before
executing it. This is similar to Perl and PHP.
• Python is Interactive: You can sit at a Python prompt and
interact with the interpreter directly to write your programs.
• Python is Object-Oriented: Python supports the object-oriented style
or technique of programming that encapsulates code within objects.
• Python is a Beginner's Language: Python is a great language for
beginner-level programmers and supports the development of a wide
range of applications, from simple text processing to WWW
browsers to games.
1.3 PURPOSE

The purpose of this project is text-to-speech conversion. The system is
helpful for persons with learning difficulties or visual impairment. It
prevents eye strain, since the user can sit and listen comfortably, and it
helps avoid the need for an external human translator, which in turn avoids
sharing trade secrets with translators. It can help in widening the
trade market, and travelling to foreign countries and speaking their
native language will be made easier. Attempts at mechanical speech
synthesis date back to the eighteenth century.

1.4 SCOPE

• This application is designed to overcome language barriers and
language differences.
• The application is designed to help visually impaired people.
CHAPTER 2
HARDWARE AND SOFTWARE REQUIREMENTS

2.1 HARDWARE REQUIREMENTS

• Processor: A modern processor capable of handling
computational tasks efficiently is required. Most modern
processors, including those found in smartphones, tablets,
laptops, and desktop computers, are suitable for running
TTS software.

• Memory (RAM): Sufficient RAM is needed to store and
process text, linguistic data, and audio buffers efficiently.
The memory requirement may vary depending on the size
of the text being processed and the complexity of the TTS
algorithms.

• Audio Output Device: A device capable of playing audio
is necessary for listening to the synthesized speech. This
can include built-in speakers, headphones, or external
speakers connected to the device.

2.2 SOFTWARE REQUIREMENTS

• Python: latest version (3.12.2).
• Visual Studio Code: Visual Studio Code is a free source-code
editor developed by Microsoft for Windows, Linux, and macOS.
It supports various programming languages.
• Pygame library: Pygame is a set of Python modules
designed for writing video games.
• gTTS (Google Text-to-Speech): gTTS is a Python library
and CLI tool that allows you to interface with Google
Translate’s text-to-speech API.
• Text-to-speech conversion: The convert text method
converts the text entered by the user into speech using
the gTTS library. It generates audio data and stores it in
memory as a BytesIO object.
• Audio download: The download audio method saves the
generated audio as an MP3 file on the local file system.
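The conversion and download steps described above can be sketched as follows. This is a minimal illustration and not the project's actual source: the function names `text_to_mp3_buffer` and `save_buffer_as_mp3` and the `tts_factory` parameter are assumptions introduced here. The sketch relies on the gTTS calls `gTTS(text=..., lang=...)` and `write_to_fp`, and gTTS needs an internet connection at the time of conversion.

```python
import io
from pathlib import Path


def text_to_mp3_buffer(text, lang="en", tts_factory=None):
    """Synthesize `text` and return the MP3 data in an in-memory BytesIO buffer."""
    if not text.strip():
        raise ValueError("no text to convert")
    if tts_factory is None:
        # gTTS is a third-party package (pip install gTTS) and needs internet access.
        from gtts import gTTS
        tts_factory = gTTS
    buffer = io.BytesIO()
    tts_factory(text=text, lang=lang).write_to_fp(buffer)
    buffer.seek(0)  # rewind so the audio can be read or played from the start
    return buffer


def save_buffer_as_mp3(buffer, path):
    """Write the buffered audio to `path` on the local file system."""
    target = Path(path)
    target.write_bytes(buffer.getvalue())
    return target
```

A typical call would be `save_buffer_as_mp3(text_to_mp3_buffer("Hello"), "output.mp3")`. Keeping the audio in memory first means the same buffer can be played back (for instance through Pygame's mixer) and saved only if the user asks for a download.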
CHAPTER 3

SOFTWARE REQUIREMENT SPECIFICATION

3.1 Specific Requirement

• User Interfaces: Describe the user interface elements, including input
methods, controls, and feedback mechanisms.
• Hardware Interfaces: Specify any hardware interfaces required for
the application to function (e.g., audio output device).
• Software Interfaces: Identify any software interfaces or APIs the
application will interact with (e.g., text processing libraries).

3.2 Functional Requirement

• Input Processing: Describe how the application will process input text,
including text normalization, tokenization, and linguistic analysis.
• Speech Synthesis: Specify the algorithms and methods for converting
text into speech, including voice selection, speech rate, and
pronunciation.
• Output Rendering: Define how the synthesized speech will be
rendered and output to the user (e.g., audio playback).
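As an illustration of the input processing requirement, the sketch below splits input text into sentence-based chunks before synthesis. It is one possible approach under stated assumptions, not the project's implementation: the function name `split_into_chunks` and the 200-character default limit are hypothetical, and the sentence boundary rule (a full stop, question mark, or exclamation mark followed by whitespace) is deliberately simple.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group sentences into chunks of at most `max_chars` characters where possible.

    A single sentence longer than `max_chars` is kept whole rather than cut mid-word.
    """
    cleaned = " ".join(text.split())  # collapse newlines and repeated spaces
    if not cleaned:
        return []
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", cleaned)
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    chunks.append(current)
    return chunks
```

Each chunk can then be passed to the speech synthesis step in turn, which keeps sentence boundaries intact when long documents are read aloud.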

3.3 Non-Functional Requirement

• Performance: Specify performance requirements such as response
time, throughput, and scalability.
• Reliability: Define reliability requirements such as availability, error
handling, and fault tolerance.
• Usability: Describe usability requirements such as accessibility, user
interaction, and ease of use.
• Security: Identify security requirements such as data privacy,
authentication, and authorization.
• Portability: Specify portability requirements for running the
application on different platforms and environments.
3.4 METHODOLOGY OF WORK

Different libraries used in the project are:

Tkinter - Tkinter is a Python binding to the Tk GUI toolkit. It is the
standard Python interface to the Tk GUI toolkit, and is Python's de facto
standard GUI. Tkinter is included with standard Linux, Microsoft Windows
and macOS installs of Python. The name Tkinter comes from "Tk
interface". Tkinter was written by Steen Lumholt and Guido van Rossum and
later revised by Fredrik Lundh. Tkinter is free software
released under a Python license. As with most other modern Tk bindings,
Tkinter is implemented as a Python wrapper around a complete Tcl
interpreter embedded in the Python interpreter. Tkinter calls are
translated into Tcl commands, which are fed to this embedded interpreter,
thus making it possible to mix Python and Tcl in a single application.
Python 2.7 and Python 3.1 incorporate the "themed Tk" ("ttk") functionality
of Tk 8.5. This allows Tk widgets to be easily themed to look like the native
desktop environment in which the application is running, thereby
addressing a long-standing criticism of Tk (and hence of Tkinter). There are
several popular GUI library alternatives available, such as wxPython, PyQt
(PySide), Pygame, Pyglet, and PyGTK.
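A minimal sketch of a Tkinter window with a text area and a Convert button is given below. It is an illustration only, not the project's GUI code: the names `handle_convert` and `build_window`, the widget sizes, and the status messages are assumptions. The `speak` argument stands for whatever function performs the actual speech synthesis.

```python
def handle_convert(text, speak):
    """Validate the typed text, pass it to `speak`, and return a status message."""
    cleaned = text.strip()
    if not cleaned:
        return "Please enter some text first."
    speak(cleaned)
    return f"Spoke {len(cleaned)} characters."


def build_window(speak):
    """Create a window with a text area, a Convert button, and a status label."""
    # Imported here so that handle_convert can be used without a display.
    import tkinter as tk

    root = tk.Tk()
    root.title("Text to Speech")
    text_area = tk.Text(root, height=8, width=50)
    text_area.pack(padx=10, pady=10)
    status = tk.Label(root, text="")

    def on_click():
        # "1.0" is line 1, column 0; "end-1c" drops the newline Tk appends.
        typed = text_area.get("1.0", "end-1c")
        status.config(text=handle_convert(typed, speak))

    tk.Button(root, text="Convert", command=on_click).pack()
    status.pack(pady=5)
    return root
```

Calling `build_window(print).mainloop()` opens the window with `print` as a stand-in for the speech function. Keeping the button logic in `handle_convert` separates input validation from the widgets, so it can be checked without opening a window.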

pyttsx3 - pyttsx is a good text-to-speech conversion library for Python, but it
was originally written only for Python 2, and a TTS library
compatible with Python 3 was hard to find.
There is, however, one library, gTTS, which works well in Python 3, but it
needs an internet connection since it relies on Google to get the
audio data. pyttsx, by contrast, is completely offline, works seamlessly, and
supports multiple TTS engines. The code in pyttsx3 is a slightly modified
version of the pyttsx module for Python 2.x and is a clone of
westonpace’s repository. The purpose of pyttsx3 is to help those who
want an offline TTS library for Python 3 and do not want to port it from
Python 2 to Python 3 themselves.
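Offline speech with pyttsx3 can be sketched as follows. The wrapper name `speak_offline` and its `engine` parameter are assumptions made for this illustration; the calls `pyttsx3.init()`, `setProperty("rate", ...)`, `say()` and `runAndWait()` belong to the pyttsx3 library, and the voices available depend on the speech engine installed on the machine.

```python
def speak_offline(text, rate=150, engine=None):
    """Speak `text` through an offline engine at about `rate` words per minute."""
    if engine is None:
        # pyttsx3 is a third-party package (pip install pyttsx3); no internet needed.
        import pyttsx3
        engine = pyttsx3.init()
    engine.setProperty("rate", rate)  # speech rate in words per minute
    engine.say(text)                  # queue the text for speaking
    engine.runAndWait()               # block until the queued speech has finished
    return engine
```

Returning the engine lets a caller reuse it for later sentences, for example `engine = speak_offline("Hello")` followed by `speak_offline("Goodbye", engine=engine)`.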
CHAPTER 4

PROJECT SNAPSHOT
Output :
CHAPTER 5

LIMITATIONS

5.1 Limitation of Project

• Naturalness: Making TTS sound natural is challenging.
Human speech is complex, with nuances in tone, pitch, and
rhythm.

• Emotion and Emphasis: These are key aspects of realistic
TTS systems because they make the speech sound more
natural, and they are difficult to reproduce in synthesized speech.

• Variability and Context: Variability refers to how TTS handles
different voices and tones. Context refers to understanding the
situation or the text. Both are needed for TTS to sound natural.

• Accents and Languages: Each language has its own sounds, and
accents add further variety. It is hard for TTS to capture every accent.

• Computational Limitations: Realistic TTS systems have
improved a lot, but they still face some computational
limitations.

• Data Limitations: TTS systems learn from large datasets of
human speech, so their quality depends on the data available.
CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 Conclusion

In line with the goal of this project, an attempt has been made to show how
the app speaks out text in any supported language. The user is given a
provision to input text and listen to it. The “naturalness” of the
synthetic speech needs to be improved in order to convey human
expression. By developing such systems, the relationship between
human and computer becomes much closer, which helps in overcoming
the problem of the DIGITAL DIVIDE.

6.2 Future Scope

• Improvement of the smoothness of the sound.
• Inclusion of prosody and naturalization of the voices to convey human
expression.
• Reading of special cases such as dates and numbers.
• Inclusion of different kinds of voices and graphical faces.
• Import and export of documents.
• Control of the reading speed.
• Extension of the system to include more languages.
