Slides On Speech To Text Model
This project aims to improve speech-to-text (STT) accuracy for the Pashto language by fine-tuning
the Whisper automatic speech recognition (ASR) model with additional vocabulary from the
agriculture, health, banking, food, and services domains. The project focuses on reducing the
Word Error Rate (WER) to below 10% through data collection, annotation, and model optimization.
The expected outcome is a more robust ASR system tailored to regional dialects and specialized
vocabulary.
Relevant Sustainable Development Goals (SDGs)
Objective
• Collect and annotate 20+ hours of Pashto speech data.
• Fine-tune Whisper ASR with domain-specific vocabulary.
• Reduce the Pashto ASR Word Error Rate (WER) to below 10%.
• Build a model and workflow that can scale to other regional languages.
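For reference, the WER target can be written out using the standard definition of the metric. The symbols S, D, I, and N are introduced here for illustration and do not come from the slides: S, D, and I count the substituted, deleted, and inserted words in the model output, and N is the number of words in the reference transcript.

```latex
\mathrm{WER} = \frac{S + D + I}{N}, \qquad \text{target: } \mathrm{WER} < 0.10
```

As a worked example under these assumptions, a 200-word reference transcript meets the target only if the model makes fewer than 20 word errors, since 19/200 = 9.5% while 20/200 is exactly 10%.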
Scope of Project
• Focus on Pashto STT improvement using Whisper ASR.
• Include diverse dialects and domain-specific terms.
• Perform model training, evaluation, and benchmarking.
• Extend the approach to other regional languages, depending on the results.
Proposed Methodology
• Collect Pashto audio from diverse speakers and dialects.
• Transcribe and annotate the speech data accurately.
• Fine-tune Whisper ASR using the annotated dataset.
• Evaluate performance using WER and optimize results.
• Build a scalable pipeline.
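The evaluation step can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual evaluation code: it assumes WER is computed as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words, and that transcripts are tokenized by whitespace with no text normalization. The function name `word_error_rate` and the placeholder tokens are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace; no normalization
    (punctuation, diacritics, spelling variants) is applied here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens stand in for real Pashto words.
score = word_error_rate("w1 w2 w3 w4", "w1 x w3")
print(f"WER = {score:.0%}")  # one substitution + one deletion over 4 words
```

In practice the same normalization would need to be applied to both the reference and the model output before scoring, and a corpus-level score is usually obtained by summing errors and reference words across all utterances rather than averaging per-utterance values.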
Skill Set Involved