
Slides On Speech To Text Model

The project aims to enhance automatic speech recognition (ASR) for the Pashto language by fine-tuning the Whisper model with domain-specific vocabulary to achieve a Word Error Rate (WER) below 10%. It involves collecting and annotating over 20 hours of speech data and is aligned with various Sustainable Development Goals. The methodology includes diverse data collection, model training, evaluation, and the potential for scalability to other regional languages.

Uploaded by

Afaq Ali Nagra
Copyright
© All Rights Reserved

Name of Project

Enhancing Automatic Speech Recognition for Low-Resource Languages
Using SOTA Models (Whisper/Wav2Vec)

Supervised by: Dr. Shibli Nisar

Syndicate leader: Zarnab Hassan Malik


Problem Definition

This project aims to improve speech-to-text (STT) accuracy for the Pashto language by fine-tuning
the Whisper ASR model with additional vocabulary from the agriculture, health, banking, food,
and services domains. The project focuses on reducing the Word Error Rate (WER) to below 10%
through data collection, annotation, and model optimization. The expected outcome is a more
robust ASR system tailored to regional dialects and specialized vocabulary.
Relevant Sustainable Development Goals (SDGs)

• Industry, innovation and infrastructure.
• Linguistic marginalization and digital exclusion.
• Inclusive digital transformation.
• Reducing inequalities.
Objective of Project

Objective
• Collect and annotate 20+ hours of Pashto speech data.
• Fine-tune Whisper ASR with domain-specific vocabulary.
• Reduce the Pashto ASR Word Error Rate (WER) to below 10%.
• Build a scalable model for other regional languages.
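
Progress toward the 20+ hour collection target can be tracked with a small script. The sketch below is illustrative only: it assumes the recordings are stored as uncompressed PCM WAV files under one directory, which the slides do not specify, and the function names and directory layout are hypothetical.

```python
import wave
from pathlib import Path

TARGET_HOURS = 20.0  # collection target stated in the objectives


def wav_duration_seconds(path):
    """Return the duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_hours(corpus_dir):
    """Sum the durations of all .wav files under corpus_dir, in hours."""
    seconds = sum(wav_duration_seconds(p) for p in Path(corpus_dir).rglob("*.wav"))
    return seconds / 3600.0


def target_met(corpus_dir, target_hours=TARGET_HOURS):
    """True once the collected audio reaches the target number of hours."""
    return total_hours(corpus_dir) >= target_hours
```

Calling `total_hours("recordings/")` on a hypothetical recordings folder gives the hours collected so far. Other formats such as MP3 or FLAC would need a different reader.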
Scope of Project

Scope
• Focus on Pashto STT improvement using Whisper ASR.
• Include diverse dialects and domain-specific terms.
• Perform model training, evaluation, and benchmarking.
• Extendable to other regional languages based on results.
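
For benchmarking across dialects, one common practice is to hold out whole speakers from every dialect, so that the test set measures generalization to unseen voices. The sketch below is an assumption about how that could be done, not a procedure taken from the slides; the record fields `speaker` and `dialect` and the dialect labels are hypothetical.

```python
import random
from collections import defaultdict


def speaker_disjoint_split(records, test_fraction=0.2, seed=0):
    """Split utterance records into train/test with no shared speakers.

    Each record is a dict with hypothetical keys "speaker" and "dialect".
    At least one speaker per dialect goes to the test set, so a dialect
    with a single speaker ends up in the test set only.
    """
    speakers_by_dialect = defaultdict(set)
    for rec in records:
        speakers_by_dialect[rec["dialect"]].add(rec["speaker"])

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    test_speakers = set()
    for dialect in sorted(speakers_by_dialect):
        speakers = sorted(speakers_by_dialect[dialect])
        rng.shuffle(speakers)
        n_test = max(1, round(len(speakers) * test_fraction))
        test_speakers.update(speakers[:n_test])

    train = [r for r in records if r["speaker"] not in test_speakers]
    test = [r for r in records if r["speaker"] in test_speakers]
    return train, test
```

The split assumes each speaker belongs to exactly one dialect. If a speaker is recorded in several dialects, all of that speaker's utterances move to the test set together.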
Proposed Methodology
• Collect Pashto audio from diverse speakers and dialects.
• Transcribe and annotate the speech data accurately.
• Fine-tune Whisper ASR using the annotated dataset.
• Evaluate performance using WER and optimize results.
• Build a scalable pipeline.
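
The WER used for evaluation is the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal pure-Python version, assuming whitespace tokenization and no text normalization; the function names are illustrative, and a real evaluation might use an established library instead.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Total word errors divided by total reference words over a corpus."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors += edit_distance(ref_words, hyp.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


def meets_target(references, hypotheses, target=0.10):
    """True if the corpus WER is below the project's 10% target."""
    return corpus_wer(references, hypotheses) < target
```

For example, a reference of four words whose hypothesis has one substitution and one deletion scores 2 / 4 = 0.5. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.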
Skill Set Involved

Resources Involved/Skill Set

Hardware: High-performance GPUs for training, audio recording equipment.
Software: Python, PyTorch, Hugging Face Transformers, Whisper ASR framework.
Project Timeline

• Gantt Chart for 1 Year with Deliverables

Task                             Duration      Deliverable
Literature Review                Month 1       Report on existing ASR models
Data Collection                  Months 2–4    20+ hours of recorded audio
Data Annotation                  Months 5–6    Fully annotated dataset
Model Fine-Tuning                Months 7–9    Fine-tuned Whisper ASR model
Model Evaluation                 Month 10      WER analysis report
Deployment & Testing             Month 11      Finalized STT system
Documentation & Report Writing   Month 12      Final project
