Proposal PhamThaiNguyen 22560053
A1: Proposal
Text-to-speech (audio): This technology converts written content into spoken words.
User-driven platform: allows users to comment on, upload, and modify their own books.
Enhanced Reading Experience: the main difference from other platforms is a page-flipping
effect that makes the reading experience feel like a physical book.
2 Aim and Objectives
Tips:
Use action-oriented language for your objectives to make it clear what steps will be taken.
Make sure that each objective aligns directly with your overarching project aim.
Try to limit yourself to a manageable number of objectives to keep your project focused.
Use bullet points and numbering to list your objectives clearly.
Technology Research: Complete a review of existing speech recognition (ASR),
neural machine translation (NMT), and voice synthesis technologies.
Create and Develop a Multilingual ASR System: Build a multilingual speech
recognition (ASR) system that recognizes and transcribes real-time speech in multiple
languages.
Integrate Real-Time Translation: Incorporate a neural machine translation (NMT)
system that can convert spoken language into the target language while adjusting to
structural differences between languages.
Create Personalized Voice Synthesis: Develop a text-to-speech (TTS) system that
can generate speech in the translated language while keeping the speaker’s original voice
characteristics, such as accent, tone, and emotion.
Optimize for Real-Time: Ensure the system works seamlessly in real time, providing
translation and voice output so that communication flows without stops or delays.
Test and gather feedback: test the system in real-time situations to evaluate its
performance, gather user feedback, and improve to ensure it meets accuracy and speed
requirements.
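The objectives above describe a three-stage pipeline: speech recognition, then translation, then voice synthesis. The following is a minimal sketch of how those stages could be chained, not the project's actual implementation. The names `Utterance`, `translate_speech`, and the three `fake_*` stage functions are hypothetical placeholders; in the real system each stage would be replaced by an ASR, NMT, or TTS engine chosen during the technology research.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Utterance:
    """One chunk of source speech plus the language pair (hypothetical type)."""
    audio: bytes
    source_lang: str
    target_lang: str


# Hypothetical stage signatures; real ASR/NMT/TTS engines would be plugged in here.
Recognizer = Callable[[bytes, str], str]          # audio, language -> transcript
Translator = Callable[[str, str, str], str]       # text, source, target -> translation
Synthesizer = Callable[[str, str, bytes], bytes]  # text, language, voice reference -> audio


def translate_speech(utt: Utterance, recognize: Recognizer,
                     translate: Translator, synthesize: Synthesizer) -> bytes:
    """Run the three stages in order: ASR, then NMT, then TTS."""
    transcript = recognize(utt.audio, utt.source_lang)
    translated = translate(transcript, utt.source_lang, utt.target_lang)
    # The original audio is passed along as a voice reference so that a
    # voice-preserving synthesizer could imitate the speaker.
    return synthesize(translated, utt.target_lang, utt.audio)


# Stand-in stages used only to show the data flow; they do no real processing.
def fake_recognize(audio: bytes, lang: str) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, source: str, target: str) -> str:
    return f"[{source}->{target}] {text}"


def fake_synthesize(text: str, lang: str, voice_reference: bytes) -> bytes:
    return text.encode("utf-8")
```

Keeping each stage behind a plain function signature means a stage can be swapped (for example, a lighter translation model) without changing the rest of the pipeline.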
3 Project Planning
Tips:
While a Gantt chart is a useful tool, it's not mandatory. You may use any table or format
that effectively captures your planning. The key is to clearly display the timeline, tasks, and
their interdependencies.
Your descriptions should be concise but informative, helping the reader understand the role
and importance of each task and subtask in the context of your project plan.
6. Testing and Feedback Collection (3 weeks)
Test the system in various real-time scenarios and collect feedback for refinement.
- Conduct user testing with bilingual speakers.
- Simulate high-demand scenarios (customer service, healthcare) to evaluate system robustness.
7. Iterative Improvements and Refinement (3 weeks)
Make improvements based on feedback, ensuring translation accuracy and voice quality.
- Optimize translation quality for structural differences between languages.
- Fine-tune the voice synthesis to reflect speaker characteristics more precisely.
8. Final Testing and Documentation (2 weeks)
Perform final tests and document the system’s capabilities, preparing the project for submission.
3.2 Resources
Guidance: Specify the resources required for the successful execution of your project. Resources
can include lab equipment, IT hardware and software, as well as research materials like databases or
library resources. For example, you might need high-computing servers for machine learning projects,
specialised editing software for digital media work, or network simulation tools for networking tasks.
Tips:
Bear in mind that the university does not provide additional funding for student projects, so you
should account for any costs that are not covered by existing resources. It's advisable to consult
your supervisor regarding these costs, as they might be able to provide or recommend
equipment or software.
Align your resource requirements closely with your project aim and objectives to ensure that
available resources are sufficient for achieving your goals.
Resource: Reason
Hardware
- Microphone and speakers: Testing and simulating speech input/output.
Software
- Speech recognition (Google Speech-to-Text API or Python SpeechRecognition): Building a multilingual speech recognition (ASR) system.
- Neural machine translation tools: Integrating multilingual translation models.
- Text-to-speech synthesis system: Developing personalized voice synthesis.
Research Resources
- IEEE Xplore, Google Scholar, Birmingham City University Library: Accessing online academic databases.
- Datasets: LibriSpeech (for ASR), Common Voice, and VCTK (for voice synthesis).
Data Storage for Training Datasets
- Cloud storage services (Google Drive, OneDrive): Storing and accessing datasets remotely.
Display System
- Web-based application (Reactjs and FastAPI): Developing a web application.
Tips:
Prioritise the identification of risks that directly impact your project aim and objectives. For more
substantial risks, consider implementing a contingency plan or alternative approaches that
could mitigate these challenges.
Consult your supervisor for expertise and advice on managing identified risks effectively.
Each project has its own set of challenges, and noticing these risks beforehand really helps in
making a successful plan.
One of the major difficulties we expect to face is real-time performance. When the system
processes a large amount of data, translation and synthesis may not keep pace, causing
delays or lag that undermine our goal of smooth communication. To mitigate this, we will
optimize our algorithms and test early to check whether the system meets its performance
targets. We will also evaluate lighter models and use cloud-based computing when extra
processing power is needed.
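The early performance testing described above could start with a simple per-stage timing check. The sketch below is one possible approach under stated assumptions, not a measured result: the helper names `time_stage` and `over_budget` are hypothetical, and the budget figures in the usage example are illustrative placeholders rather than targets set by the project.

```python
import time
from typing import Callable, Dict, Tuple


def time_stage(stage: Callable[[], object]) -> Tuple[object, float]:
    """Run one pipeline stage and return its result with the elapsed seconds."""
    start = time.perf_counter()
    result = stage()
    return result, time.perf_counter() - start


def over_budget(latencies: Dict[str, float],
                budgets: Dict[str, float]) -> Dict[str, float]:
    """Return the stages that exceeded their budget, with the overshoot in seconds."""
    return {
        name: seconds - budgets[name]
        for name, seconds in latencies.items()
        if name in budgets and seconds > budgets[name]
    }


if __name__ == "__main__":
    # Illustrative numbers only; real budgets would come from the project's
    # performance targets and real latencies from time_stage measurements.
    budgets = {"asr": 0.25, "nmt": 0.20, "tts": 0.30}
    latencies = {"asr": 0.30, "nmt": 0.10, "tts": 0.28}
    for name, overshoot in over_budget(latencies, budgets).items():
        print(f"{name} is over budget by {overshoot:.3f} s")
```

Stages that repeatedly appear in the over-budget report are the natural candidates for lighter models or cloud-based computing.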
Another challenge is the difference in sentence structure between languages. For example,
translating between languages that differ greatly in structure could result in delays or
errors. In such cases, we can adopt advanced neural machine translation techniques that
adapt to these differences in real time. This flexibility is important for maintaining the
reliability of the system.
By keeping these risks in mind and taking appropriate steps to mitigate them, we will be
able to work more effectively toward developing a successful real-time multilingual speech
translation system with personalized voice synthesis.
4 Project Review and Methodology
Tips:
If you can't find projects that directly align with your focus, select the closest available options.
Concentrate on what you can learn from them in terms of project planning, methodologies, or
specific techniques, and how you can apply this knowledge to enhance your own project's
robustness.
"Critique" in this context means a detailed analysis and assessment of something, in this case,
past projects. Look beyond just listing good and bad points; instead, discuss the reasoning
behind these points and their implications for your own work.
Tips:
Be precise with your search terms, as this will significantly affect the quality of resources
you discover.
Maintain consistency when grading the significance of each resource; you may consider
employing a scoring system or rubric.
Keep a well-organised record of your findings using citation management software, such
as Mendeley, to facilitate the writing process.
Search Terms:
• Speech-to-Speech Translation
• Neural Machine Translation
• Text-to-Speech Synthesis
• Real-time Speech Recognition
• Voice Synthesis Models
Methods of Research:
• IEEE Xplore
• Google Scholar
• BCU Online Library
• British Online Library
• Papers with Code
5 Bibliography
Guidance: Compile a list of all the references you have used in your proposal, adhering to the
Harvard referencing style. Each citation should be complete, accurate, and in the prescribed format.
3. S. Nakamura, K. Markov, H. Nakaiwa, G.-i. Kikui, H. Kawai, T. Jitsuhiro, J.-S. Zhang, H.
Yamamoto, E. Sumita, and S. Yamamoto, “The ATR multilingual speech-to-speech
translation system,” IEEE Transactions on Audio, Speech, and Language Processing, 200