
CMP6200/DIG6200

INDIVIDUAL UNDERGRADUATE PROJECT


2024–2025

A1: Proposal

REAL-TIME MULTILINGUAL SPEECH TRANSLATION SYSTEM WITH PERSONAL VOICE SYNTHESIS

Course: Smart Library Base: Ebook System with Enhanced AI Summarization


Student Name: Le Thanh Trung
Student Number: 22560023
Supervisor Name: Ph.D. Nguyen Thanh Binh
Contents
1 Introduction
 1.1 Background and Rationale
 1.2 Key Themes/Topics
2 Aim and Objectives
 2.1 Project Aim
 2.2 Project Objectives
3 Project Planning
 3.1 Initial Project Plan
 3.2 Resources
 3.3 Risk Assessments
4 Project Review and Methodology
 4.1 Critique of Past Similar Projects
 4.2 Literature Search Methodology
 4.3 Initial Literature Search Results
5 Bibliography
1 Introduction

1.1 Background and Rationale


Environmental protection is a constant topic of conversation today. There are countless ways to preserve the environment, depending on factors such as individual behaviour, education, and awareness-raising. With the advancement of information technology, digitizing printed materials (documents, books, textbooks, and so on) is one solution that not only helps the environment but also cuts production costs, potentially saving a country or an industry millions, or even billions, of dollars.
The book-digitization industry has already gone through several waves of change. My study, however, focuses not only on digitizing books but also on the UI/UX, aiming to give users an experience close to reading a physical book. What sets this project apart is the combination of AI summarization, text-to-speech technology, and a user-driven platform.
Why is AI summarization a step forward? People value speed and convenience. "Film reviewing" is one example: instead of spending hours reading or watching something, people increasingly want a quick summary of the essential points so they can make informed decisions without a large time commitment.
That is why "Smart Library Base: Ebook System with Enhanced AI Summarization" was created.

1.2 Key Themes/Topics


 AI Summarization: using AI to summarize information; in my case, the content of books (a minimal sketch is shown after this list).

 Text-to-Speech (audio): technology that converts written content into spoken words.

 User-driven platform: allows users to comment on, upload, and modify their own books.

 Enhanced Reading Experience: the biggest difference from other systems is a page-flipping effect that makes reading feel like a physical book.
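
The AI summarization theme could, for example, be prototyped with an off-the-shelf summarization model. The snippet below is only a minimal sketch, assuming the Hugging Face transformers library and the publicly available facebook/bart-large-cnn checkpoint; it illustrates how chapter text might be condensed, not the final design.

# Minimal sketch: summarizing a passage of an ebook with a pretrained model.
# Assumes the Hugging Face "transformers" library and the public
# "facebook/bart-large-cnn" checkpoint; both are illustrative choices.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

chapter_text = (
    "Digitizing books reduces paper consumption and printing costs. "
    "An ebook platform with AI summarization lets readers grasp the key "
    "points of a chapter before deciding whether to read it in full."
)

# max_length / min_length bound the length of the generated summary.
summary = summarizer(chapter_text, max_length=60, min_length=15, do_sample=False)
print(summary[0]["summary_text"])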


2 Aim and Objectives

2.1 Project Aim


The aim of this project is to create a website that allows users to publish their books (as PDF files or posted online). Users can read books, comment on them, and rate the books they have just read.
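
Since the resources section lists React.js and FastAPI for the web layer, this aim could be served by a small set of HTTP endpoints. The sketch below is only an assumption about how uploading and rating might look; the route names, the in-memory store, and the Rating model are hypothetical, not an agreed design.

# Hypothetical sketch of the book-publishing API, assuming FastAPI.
# Route names and the in-memory store are illustrative only.
from fastapi import FastAPI, File, UploadFile
from pydantic import BaseModel

app = FastAPI()
books: dict[int, dict] = {}          # in-memory store, just for the sketch

class Rating(BaseModel):
    stars: int                       # 1-5 rating given by a reader
    comment: str = ""

@app.post("/books")
async def upload_book(title: str, pdf: UploadFile = File(...)):
    """Accept a PDF upload and register it as a new book."""
    book_id = len(books) + 1
    books[book_id] = {"title": title, "filename": pdf.filename, "ratings": []}
    return {"book_id": book_id}

@app.post("/books/{book_id}/ratings")
async def rate_book(book_id: int, rating: Rating):
    """Attach a reader's rating and comment to an existing book."""
    books[book_id]["ratings"].append(rating.dict())
    return {"count": len(books[book_id]["ratings"])}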

2.2 Project Objectives


Guidance: Objectives set out how you are going to achieve your aim. Objectives should be
Specific, Measurable, Achievable, Resourced, and Time-limited (SMART). Objectives should be
presented using bullet points and numbering to enhance readability. Information on SMART
objectives is available on Moodle.

Tips:

 Use action-oriented language for your objectives to make it clear what steps will be taken.
Make sure that each objective aligns directly with your overarching project aim.
 Try to limit yourself to a manageable number of objectives to keep your project focused.
 Use bullet points and numbering to list your objectives clearly.
 Technology Research: complete a review of existing speech recognition (ASR), neural machine translation (NMT), and voice synthesis models.
 Create and Develop a Multilingual ASR System: build a multilingual speech recognition (ASR) system that recognizes and transcribes real-time speech in multiple languages.
 Integrate Real-Time Translation: incorporate a neural machine translation (NMT) system that converts the spoken source language into the target language, adjusting to structural differences between languages.
 Create Personalized Voice Synthesis: develop a text-to-speech (TTS) system that generates speech in the translated language while keeping the speaker's original voice characteristics, such as accent, tone, and emotion.
 Optimize for Real-Time Operation: make the system work seamlessly in real time, providing translation and voice output that allow smooth communication without stops or delays (a pipeline sketch follows this list).
 Test and Gather Feedback: test the system in real-time situations to evaluate its performance, gather user feedback, and improve it until it meets accuracy and speed requirements.
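
To show how the ASR, NMT, and TTS objectives are meant to connect, here is a minimal, purely illustrative pipeline skeleton. The three stage functions are hypothetical placeholders with no real models behind them; the point is the data flow audio -> text -> translated text -> synthesized audio, not a working implementation.

# Illustrative pipeline skeleton for the speech translation objectives.
# transcribe(), translate() and synthesize() are hypothetical placeholders;
# real ASR, NMT and TTS models would be plugged in at these points.
from dataclasses import dataclass

@dataclass
class Utterance:
    audio: bytes          # raw audio captured from the microphone
    source_lang: str      # e.g. "vi"
    target_lang: str      # e.g. "en"

def transcribe(audio: bytes, lang: str) -> str:
    """Placeholder for the multilingual ASR stage."""
    raise NotImplementedError

def translate(text: str, source: str, target: str) -> str:
    """Placeholder for the real-time NMT stage."""
    raise NotImplementedError

def synthesize(text: str, lang: str, speaker_audio: bytes) -> bytes:
    """Placeholder for personalized TTS that reuses the speaker's voice."""
    raise NotImplementedError

def translate_utterance(u: Utterance) -> bytes:
    """Run one utterance through ASR -> NMT -> TTS."""
    text = transcribe(u.audio, u.source_lang)
    translated = translate(text, u.source_lang, u.target_lang)
    return synthesize(translated, u.target_lang, speaker_audio=u.audio)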

3 Project Planning

3.1 Initial Project Plan


Guidance: Identify the tasks and subtasks that are necessary to meet your project objectives.
Provide a brief description for each, outlining its role in achieving the goals you've set. Also, be aware
of the time each task and subtask will require.

Tips:

 While a Gantt chart is a useful tool, it's not mandatory. You may use any table or format
that effectively captures your planning. The key is to clearly display the timeline, tasks, and
their interdependencies.
 Your descriptions should be concise but informative, helping the reader understand the role
and importance of each task and subtask in the context of your project plan.

Each task below lists its description, subtasks, and time estimate.

1. Technology Research (4 weeks)
Description: research ASR, NMT, and TTS models.
Subtasks: identify ASR models (Whisper ASR or DeepSpeech); evaluate NMT models (OpenNMT or Fairseq); explore voice synthesis models (Tacotron 2).

2. Develop ASR System (4 weeks)
Description: build and train an ASR system to recognize real-time speech.
Subtasks: choose ASR models and train on datasets such as LibriSpeech and Common Voice; fine-tune the ASR model for real-time performance across multiple languages.

3. Integrate Real-Time NMT (4 weeks)
Description: incorporate neural machine translation (NMT) for real-time translation.
Subtasks: train or fine-tune NMT models (such as Fairseq) on parallel corpora (OpenSubtitles datasets); implement real-time processing pipelines.

4. Create Personalized Voice Synthesis (4 weeks)
Description: develop the TTS system to generate speech in the translated language while keeping the speaker's characteristics.
Subtasks: select TTS models (Tacotron 2); train on voice datasets (VCTK) for personalized voice output.

5. Optimize for Real-Time Functionality (4 weeks)
Description: ensure smooth real-time operation, balancing performance and quality.
Subtasks: integrate ASR, NMT, and TTS into a seamless pipeline; conduct real-time performance tests and optimize latency.

6. Testing and Feedback Collection (3 weeks)
Description: test the system in various real-time scenarios and collect feedback for refinement.
Subtasks: conduct user testing with bilingual speakers; simulate high-demand scenarios (customer service, healthcare) to evaluate system robustness.

7. Iterative Improvements and Refinement (3 weeks)
Description: make improvements based on feedback, ensuring translation accuracy and voice quality.
Subtasks: optimize translation quality for structural differences between languages; fine-tune the voice synthesis to reflect speaker characteristics more precisely.

8. Final Testing and Documentation (2 weeks)
Description: perform final tests and document the system's capabilities, preparing the project for submission.

Voice synthesis details (Task 4):
o Model: Tacotron 2, FastSpeech 2 for voice generation.
o Datasets: VCTK, LibriTTS (for keeping the speaker's voice characteristics).

(An illustrative transcription snippet using one of the candidate ASR models follows this plan.)
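
As an illustration of Task 2, the snippet below sketches how one of the candidate ASR models could be exercised offline before any real-time work begins. It is a minimal sketch assuming the open-source openai-whisper package and its "base" multilingual checkpoint; the audio file name is hypothetical.

# Minimal sketch of multilingual transcription with Whisper (Task 2).
# Assumes the open-source "openai-whisper" package; "sample.wav" is a
# hypothetical test recording, not a project asset.
import whisper

model = whisper.load_model("base")            # small multilingual checkpoint

# transcribe() detects the spoken language automatically unless one is given.
result = model.transcribe("sample.wav")
print(result["language"])                     # detected language code
print(result["text"])                         # transcribed text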

3.2 Resources
Guidance: Specify the resources required for the successful execution of your project. Resources
can include lab equipment, IT hardware and software, as well as research materials like databases or
library resources. For example, you might need high-computing servers for machine learning projects,
specialised editing software for digital media work, or network simulation tools for networking tasks.

Tips:

 Bear in mind that the university does not provide additional funding for student projects, so you
should account for any costs that are not covered by existing resources. It's advisable to consult
your supervisor regarding these costs, as they might be able to provide or recommend
equipment or software.
 Align your resource requirements closely with your project aim and objectives to ensure that
available resources are sufficient for achieving your goals.

Hardware
- Microphone and speakers: testing and simulating speech input/output.

Software
- Speech recognition (Google Speech-to-Text API or Python SpeechRecognition): building the multilingual speech recognition (ASR) system.
- Neural machine translation tools: integrating multilingual translation models.
- Text-to-speech synthesis system: developing personalized voice synthesis.
- Python (TensorFlow, PyTorch): building and training models.
- Version control (Git, GitHub): managing project code.

Research Resources
- IEEE Xplore, Google Scholar, Birmingham City University Library: access to online academic databases.

Datasets
- LibriSpeech (for ASR), Common Voice, and VCTK (for voice synthesis).

Data Storage for Training Datasets
- Cloud storage services (Google Drive, OneDrive): storing and accessing datasets remotely.

Display System
- Web-based application (React.js and FastAPI): developing the web application.

(A short microphone-capture sketch using one of the listed speech-recognition options follows this list.)
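
To give a concrete sense of the listed speech-recognition software, here is a minimal sketch assuming the Python SpeechRecognition package with its bundled Google Web Speech backend. It only captures one utterance from the microphone and prints the transcription; the language code is an example, and a working microphone (PyAudio installed) is assumed.

# Minimal sketch: capturing one utterance with the SpeechRecognition package.
# Uses the free Google Web Speech backend bundled with the library; the
# language code "vi-VN" is just an example. Requires PyAudio for the mic.
import speech_recognition as sr

recognizer = sr.Recognizer()

with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)   # calibrate for background noise
    print("Speak now...")
    audio = recognizer.listen(source)

try:
    text = recognizer.recognize_google(audio, language="vi-VN")
    print("Transcription:", text)
except sr.UnknownValueError:
    print("Speech was not understood.")
except sr.RequestError as err:
    print("Recognition service unavailable:", err)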

3.3 Risk Assessments


Guidance: Contemplate the potential risks that could derail your project timeline. This could
encompass a variety of factors, from unavailability of specific resources like software or equipment to
logistical constraints such as limited access to specialists or test subjects.

Tips:

 Prioritise the identification of risks that directly impact your project aim and objectives. For more
substantial risks, consider implementing a contingency plan or alternative approaches that
could mitigate these challenges.
 Consult your supervisor for expertise and advice on managing identified risks effectively.

Each project has its own set of challenges, and identifying these risks in advance makes it much easier to plan for success.
One major risk is real-time performance. With a large volume of audio to process, the translation and synthesis stages may not keep pace, which would introduce noticeable delays or lag and undermine the goal of smooth communication. To mitigate this, the algorithms should be optimized and tested early against clear performance targets; lighter models can be examined, and cloud-based computing can be used when extra processing power is needed (a small latency-measurement sketch is given after this section).
Another limitation arises from differences in sentence structure between languages. Translating between languages that differ greatly from one another can cause delays or errors. In such cases, an advanced neural machine translation technique that adapts to these differences in real time can be adopted; this flexibility is essential for keeping the system reliable.
By keeping these risks in mind and taking appropriate steps to mitigate them, it will be possible to work much more effectively toward a successful real-time multilingual speech translation system with personalized voice synthesis.
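
Because latency is the main risk identified above, early testing could start with something as simple as timing each stage of the pipeline. The sketch below is an assumption-heavy illustration: the three stage functions are hypothetical stand-ins, and the one-second budget is an example threshold, not a project requirement.

# Illustrative latency check for the risk assessment: time each pipeline stage
# and flag utterances that exceed an example end-to-end budget.
# asr_stage, nmt_stage and tts_stage are hypothetical placeholders.
import time

LATENCY_BUDGET_S = 1.0    # example end-to-end target, not a fixed requirement

def timed(stage_name, fn, *args):
    """Run one stage, returning its output and the elapsed wall-clock time."""
    start = time.perf_counter()
    output = fn(*args)
    elapsed = time.perf_counter() - start
    print(f"{stage_name}: {elapsed:.3f} s")
    return output, elapsed

def measure_pipeline(audio, asr_stage, nmt_stage, tts_stage):
    """Time the ASR -> NMT -> TTS chain for one utterance."""
    text, t1 = timed("ASR", asr_stage, audio)
    translated, t2 = timed("NMT", nmt_stage, text)
    speech, t3 = timed("TTS", tts_stage, translated)
    total = t1 + t2 + t3
    if total > LATENCY_BUDGET_S:
        print(f"WARNING: total latency {total:.3f} s exceeds budget")
    return speech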

4 Project Review and Methodology

4.1 Critique of Past Similar Projects


Guidance: Examine past similar projects or final year projects to enhance your understanding of how
to approach your own project. Firstly, discuss the strengths and weaknesses of these past projects.
Secondly, identify what aspects, such as background, methodologies, techniques, or technologies,
are particularly useful for your own project. Finally, explain how you plan to apply or adapt these
useful aspects in your project. The aim is to understand best practices and potential pitfalls,
regardless of whether the projects are directly related to your own subject matter.

Tips:

 If you can't find projects that directly align with your focus, select the closest available options.
Concentrate on what you can learn from them in terms of project planning, methodologies, or
specific techniques, and how you can apply this knowledge to enhance your own project's
robustness.
 "Critique" in this context means a detailed analysis and assessment of something, in this case,
past projects. Look beyond just listing good and bad points; instead, discuss the reasoning
behind these points and their implications for your own work.

Project 1: “LibriS2S: A German-English Speech-to-Speech Translation Corpus” by Pedro Jeuris and Jan Niehues

The LibriS2S project closes a gap in speech-to-speech translation by providing the first publicly available German-English speech-to-speech corpus. Unlike text-based translation datasets, it uses independently created audio for both languages, which avoids biased pronunciation. A model inspired by FastSpeech 2 is trained on this corpus to translate spoken language directly into another language without going through an intermediate text representation; it embeds important linguistic features, such as pitch, energy, and transcripts of the source language, to generate good-quality translations. While this works reasonably well, it lacks real-time processing ability and flexibility for other languages. In my project, I will adapt these techniques to multiple languages and focus on real-time translation, with the goal of enriching personal voice synthesis; I will work to ensure that the system carries the accent, tone, and emotion of a speaker into other languages. (A small pitch-extraction sketch related to these prosodic features follows.)
https://paperswithcode.com/paper/libris2s-a-german-english-speech-to-speech
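
Since LibriS2S-style models condition on prosodic features such as pitch and energy, the snippet below sketches how those two features could be extracted from a recording. It assumes the librosa library; the file name is hypothetical and the frame parameters are library defaults, not values taken from the paper.

# Illustrative extraction of pitch (F0) and energy, the prosodic features
# mentioned for LibriS2S-style models. Assumes the "librosa" library;
# "speech.wav" is a hypothetical recording.
import librosa
import numpy as np

y, sr = librosa.load("speech.wav", sr=16000)

# Frame-level fundamental frequency via the pYIN algorithm.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Frame-level energy as the root-mean-square of each analysis frame.
energy = librosa.feature.rms(y=y)[0]

print("voiced frames:", int(np.sum(voiced_flag)))
print("mean F0 (Hz):", float(np.nanmean(f0)))
print("mean RMS energy:", float(energy.mean()))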
Project 2: “Direct speech-to-speech translation with a sequence-to-sequence model” by Ye Jia*, Ron J. Weiss*, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, and Yonghui Wu

This project presents an attention-based sequence-to-sequence neural network that directly translates speech in one language into speech in another language, without relying on an intermediate text representation. The network is trained end to end, learning to map speech spectrograms into target spectrograms in another language corresponding to the translated content (in a different canonical voice). The authors further demonstrate the ability to synthesize translated speech using the voice of the source speaker. They conduct experiments on two Spanish-to-English speech translation datasets and find that the proposed model slightly underperforms a baseline cascade of a direct speech-to-text translation model and a text-to-speech synthesis model, demonstrating the feasibility of the approach on this very challenging task. Although the model slightly underperforms the traditional method built from separate speech-to-text and text-to-speech stages, it establishes a feasible framework for direct speech-to-speech translation. This approach can inform my project, particularly in keeping speaker voice characteristics during translation and improving real-time performance. (A short spectrogram-extraction sketch follows.)
https://paperswithcode.com/paper/direct-speech-to-speech-translation-with-a
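
Because this model works on speech spectrograms rather than text, a natural first step in reproducing any part of it would be computing a mel spectrogram from a waveform. The sketch below is a minimal illustration assuming torchaudio; the file name and the mel parameters are example values, not those used in the paper.

# Minimal sketch: computing a log-mel spectrogram, the input/output
# representation used by direct speech-to-speech models. Assumes torchaudio;
# "speech.wav" and the mel parameters are illustrative, not from the paper.
import torch
import torchaudio

waveform, sample_rate = torchaudio.load("speech.wav")

mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=1024,
    hop_length=256,
    n_mels=80,               # 80 mel bins, a common choice for TTS/S2ST models
)

mel = mel_transform(waveform)             # shape: (channels, n_mels, frames)
log_mel = torch.log(mel + 1e-6)           # log compression for stable training

print("log-mel shape:", tuple(log_mel.shape))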

4.2 Literature Search Methodology


Guidance: In this section, articulate your approach for conducting a literature search relevant to your
project. Specify the search terms you intend to use and explain the rationale for choosing them.
Identify the databases you will utilise for your search, including IEEE Xplore, Elsevier, ACM Digital
Library, Web of Science, or Google Scholar. Discuss your strategy for assessing the relevance and
quality of the resources you find. Finally, describe your method for recording your findings, ensuring
they can be easily referenced later. It's advisable to consult with your supervisor at the outset to
ensure you're on the right track.

Tips:

 Be precise with your search terms, as this will significantly affect the quality of resources
you discover.
 Maintain consistency when grading the significance of each resource; you may consider
employing a scoring system or rubric.
 Keep a well-organised record of your findings using citation management software, such
as Mendeley, to facilitate the writing process.

Search Terms:
• Speech-to-Speech Translation
• Neural Machine Translation
• Text-to-Speech Synthesis
• Real-Time Speech Recognition
• Voice Synthesis Models

Methods of Research:
• IEEE Xplore
• Google Scholar
• BCU Online Library
• British Online Library
• Papers with Code

(A small record-keeping sketch illustrating a consistent scoring rubric follows this list.)
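
To keep the grading of sources consistent, as the guidance above suggests, the findings could be logged in a small structured record with a simple scoring rubric. The sketch below is purely hypothetical: the fields and the 1-5 scales are example choices, not a prescribed rubric.

# Hypothetical record-keeping sketch for the literature search: each source is
# scored on example 1-5 scales and sorted by total relevance.
from dataclasses import dataclass, field

@dataclass
class SourceRecord:
    citation: str
    database: str              # e.g. "IEEE Xplore", "Google Scholar"
    relevance: int             # 1-5: closeness to the project topic
    quality: int               # 1-5: venue and peer-review quality
    notes: str = ""
    total: int = field(init=False)

    def __post_init__(self):
        self.total = self.relevance + self.quality

records = [
    SourceRecord("Jia et al., Direct speech-to-speech translation ...",
                 "Papers with Code", relevance=5, quality=5),
    SourceRecord("Jeuris and Niehues, LibriS2S ...",
                 "Papers with Code", relevance=4, quality=4),
]

for r in sorted(records, key=lambda r: r.total, reverse=True):
    print(f"{r.total:>2}  {r.citation}")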

4.3 Initial Literature Search Results


Guidance: Present a few examples of key resources you have identified during your initial literature
search. This doesn't need to be exhaustive but should be sufficient to validate that your search
methodology is effective.

Tips:

 When reporting your initial findings, aim to be clear and concise.


 Include the citation, a brief summary, and your own evaluation of each resource's relevance
or importance to your project.
 This serves as a preliminary validation of your literature search strategy and also prepares you
for a more in-depth literature review later in the project.

“Direct speech-to-speech translation with a sequence-to-sequence model” by Ye Jia*, Ron J. Weiss*, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, and Yonghui Wu
Summary: This project presents a neural network model that translates speech directly from
one language to another without using intermediate text. The model is trained end-to-end,
mapping speech spectrograms into target language spectrograms, maintaining the source
speaker's voice characteristics. The experiments indicate that while it slightly
underperforms compared to traditional cascade models, it lays the groundwork for future
developments in direct speech translation.
Resource relevance: This resource emphasizes direct speech translation and preservation of the speaker's voice, aligning with my goal of developing a multilingual speech translation system.
“LibriS2S: A German-English Speech-to-Speech Translation Corpus” by Pedro Jeuris and Jan Niehues
Summary: This project introduces the LibriS2S corpus, designed to facilitate research in
speech-to-speech translation between German and English. The corpus is unique in that it
uses independently recorded audio to ensure unbiased pronunciation, enabling the
development of models that can directly generate speech signals from source language
inputs.
Resource relevance: This resource will aid in understanding the significance of training
data in speech translation systems and how to curate high-quality datasets for improved
model performance.

5 Bibliography
Guidance: Compile a list of all the references you have used in your proposal, adhering to the
Harvard referencing style. Each citation should be complete, accurate, and in the prescribed format.

Tips:

 For more guidance, consult the BCU learning services.


 Using citation management software like Mendeley can also assist in organising and formatting
your references correctly.
Note: The Bibliography does not count towards the overall word count of the project proposal.

1. A. Lavie, A. Waibel, L. Levin, M. Finke, D. Gates, M. Gavalda, T. Zeppenfeld, and P. Zhan, “JANUS-III: Speech-to-speech translation in multiple languages,” in Proc. ICASSP, 1997.

2. W. Wahlster, Verbmobil: Foundations of Speech-to-Speech Translation. Springer, 2000.

3. S. Nakamura, K. Markov, H. Nakaiwa, G.-i. Kikui, H. Kawai, T. Jitsuhiro, J.-S. Zhang, H. Yamamoto, E. Sumita, and S. Yamamoto, “The ATR multilingual speech-to-speech translation system,” IEEE Transactions on Audio, Speech, and Language Processing, 2006.
