
CS 224S / LINGUIST 285

Spoken Language Processing

Andrew Maas
Stanford University
Spring 2022

Lecture 1: Course Introduction


Original slides by Dan Jurafsky
Week 1
— Course introduction
— Course Logistics
— Course topics overview
— Dialog / conversational agents
— Speech recognition (Speech to text)
— Speech synthesis (Text to speech)
— Applications
— Brief history
— Articulatory Phonetics
— ARPAbet transcription
Exciting recent developments have
disrupted this field

— Apple Siri (2011), Google Assistant (2016), Microsoft Cortana (2014)
— Amazon Alexa (2014) and the Alexa Prize (2017)
— End-to-end neural models become state of the art (2015-present)
— Neural TTS voice cloning
— Realtime speech-to-speech translation (2020)
Entering a new era of spoken
language applications and impact

— EuroNews article 🇺🇦
— YouTube video
Some basic ethics when working
on speech technologies
— Don’t record someone without their consent
— In California, all parties to any confidential conversation must give
their consent to be recorded. For calls occurring over cellular or
cordless phones, all parties must consent before a person can
record, regardless of confidentiality.

— Don’t create a speech synthesizer / voice clone of someone without their consent
— It might be fun, but it’s a little creepy. People get upset
— Okay to use existing speech datasets (we’ll provide some)

— Consider subgroup and language bias when building real applications
— Poor performance on subgroups, e.g. non-native speakers
— Many languages are under-served relative to English/Mandarin
Course Logistics
— Course goal: Build something you are proud of
— Course project: Research paper? Compelling demo/story for job
interviews? Applied system you can use at home/work?

— Homeworks (2 weeks each):
— Introduction to audio analysis and spoken language tools
— Building a complete dialog system using the Amazon Alexa Skills Kit
— Implementing end-to-end deep neural network approaches to speech recognition using PyTorch
— Working with advanced deep learning toolkits for speech recognition (SpeechBrain) and voice cloning

— Homeworks use Colab and PyTorch (AWS for Alexa)


Course Logistics

— http://www.stanford.edu/class/cs224s

— Homeworks out on Tuesdays and due 11:59pm Monday

— Gradescope for homework submission

— Ed for questions. Use a private post for personal/confidential questions

— Final project poster session in person!


Admin: Requirements and
Grading
— Readings:
— Jurafsky & Martin. Speech and Language Processing.
— 3rd edition pre-prints available online
— A few conference and journal papers
— Grading
— Homework: 45%
— Course Project: 50%
— Participation: 5%
— Attend 3 guest lectures (3%)
— Ed participation (2%)
Course Projects
— Build something you are proud of

— Full systems / demos, research papers on individual components, applying spoken language analysis to interesting datasets, etc. are all great projects

— Combining projects with other courses is great!
— CS236G (GANs), CS224N, CS329S, CS229 all relevant
— Need instructor permission to combine

— Project handout + intro lecture / discussion soon. Ideally groups of 2-3
Necessary Background
— Foundations of machine learning and natural language
processing
— CS 124, CS 224N, CS 229, or equivalent experience
— Mathematical foundations of neural networks
— Understand forward and back propagation in terms of
equations
— Deep learning intro lecture will adjust to class needs.
— Proficiency in Python
— Programming-heavy homeworks will use Python, Colab notebooks, and PyTorch
Office hours and CAs

— Andrew: In person after class on Thursdays (projects + other)

— CAs: Zoom with Calendly (homework + projects)

— Meet your teaching staff!


— Gaurab Banerjee

— Shreya Gupta

— Alex Ke

— Questions on logistics?
Week 1
— Course introduction
— Course Logistics
— Course topics overview
— Dialog / conversational agents
— Speech recognition (Speech to text)
— Speech synthesis (Text to speech)
— Applications
— Brief history
— Articulatory Phonetics
— ARPAbet transcription
Dialogue (= Conversational Agents)
— Task-oriented conversations
— Personal Assistants (Alexa, Siri, etc.)
— Design considerations
— Synchronous or asynchronous tasks
— Pure speech, pure text, UI hybrids
— Functionality versus personality
Paradigms for Dialogue
— POMDP
— Partially-Observed Markov Decision Processes
— Reinforcement Learning to learn what action to take
— Asking a question or answering one are just actions
— “Speech acts”

— Simple slot filling (ML or regular expressions; toy sketch below)
— Pre-built frames
— Calendar: Who / When / Where
— Filled by hand-built rules
— e.g. (“on (Mon|Tue|Wed…)”)
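To make the slot-filling idea concrete, here is a minimal, self-contained sketch in Python (the course's homework language). The calendar frame, slot names, and patterns are invented for this illustration and are not from the course materials:

```python
import re

# Toy regex slot filling. The frame, slot names, and patterns are
# invented for this sketch, not taken from any real dialog system.
SLOT_PATTERNS = {
    "when": re.compile(r"\bon ((?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)\w*)\b", re.I),
    "who": re.compile(r"\bwith ([A-Z][a-z]+)\b"),
    "where": re.compile(r"\bin ([A-Z][a-z]+)\b"),
}

def fill_slots(utterance):
    """Fill calendar slots by matching each hand-built rule."""
    slots = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            slots[slot] = match.group(1)
    return slots

print(fill_slots("Book lunch with Alice on Tuesday in Boston"))
# -> {'when': 'Tuesday', 'who': 'Alice', 'where': 'Boston'}
```

Real systems replace these hand-built patterns with learned intent and slot classifiers, but the frame-and-slots data structure is the same.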
Paradigms for Dialogue
— POMDP
— Active research area: deep learning RL (toy sketch after this slide)
— Not quite industry-strength
— Simple slot filling (ML or regex)
— State of the art, used in most systems
— Reusing new search engine technology
— Intent recognition / semantic parsing
— Neural network chatbots
— Replacing major pieces of dialog systems
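For intuition on the RL framing, here is a toy, fully-observed simplification (a real POMDP would also track a belief over hidden states): tabular Q-learning over an invented two-state dialog in which asking a question and giving an answer are just actions. All states, actions, and rewards are made up for illustration:

```python
import random

# Toy dialog MDP: the agent should learn to ask for the missing
# slot before attempting to answer. Everything here is invented.
STATES = ["slot_missing", "slot_filled"]
ACTIONS = ["ask_question", "give_answer"]

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    if state == "slot_missing":
        if action == "ask_question":
            return "slot_filled", 0.0, False   # got the info
        return "slot_missing", -1.0, True      # answered too early
    if action == "give_answer":
        return "slot_filled", 1.0, True        # task success
    return "slot_filled", -0.1, False          # redundant question

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):
    state, done = "slot_missing", False
    while not done:
        if random.random() < eps:                          # explore
            action = random.choice(ACTIONS)
        else:                                              # exploit
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        target = reward if done else reward + gamma * max(
            Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = nxt

print({k: round(v, 2) for k, v in Q.items()})
# The learned policy asks first, then answers.
```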
Speech Recognition
— Large Vocabulary Continuous Speech
Recognition (LVCSR)
— ~64,000 words
— Speaker independent (vs. speaker-dependent)
— Continuous speech (vs. isolated-word)
Current error rates
Why is conversational speech
harder?
— A piece of an utterance without context

— The same utterance with more context


HSR versus ASR (Saon et al., 2017)


Why accents are hard
— A word by itself

— The word in context


So is speech recognition solved?
Why study it vs. using some API?
— In the last ~10 years
— Dramatic reduction in LVCSR error rates (16% to 6%)
— Human level LVCSR performance on Switchboard
— New class of recognizers (end to end neural network)
— Understanding how ASR works enables better ASR-enabled systems
— What types of errors are easy to correct?
— How can a downstream system make use of uncertain
outputs?
— How much would building our own improve on an API?
— Next generation of ASR challenges as systems go live
on phones and in homes
Speech Recognition Design
Intuition
— Build a statistical model of the speech-to-words
process
— Collect lots and lots of speech, and transcribe all the
words.
— Train the model on the labeled speech
— Paradigm: Supervised Machine Learning + Search (sketched below)
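A minimal sketch of this paradigm in PyTorch, with random tensors as stand-ins for real acoustic features and transcripts (this is not the actual homework code; all sizes are placeholders): a tiny acoustic model trained with CTC loss, followed by the simplest possible "search", greedy decoding.

```python
import torch
import torch.nn as nn

FEAT_DIM = 80           # e.g. log-mel filterbank features per frame
NUM_PHONES = 40         # toy label inventory (+1 below for CTC blank)

model = nn.Sequential(  # tiny stand-in acoustic model
    nn.Linear(FEAT_DIM, 128),
    nn.ReLU(),
    nn.Linear(128, NUM_PHONES + 1),
)
ctc = nn.CTCLoss(blank=NUM_PHONES)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One supervised training step on a fake (speech, transcript) pair.
feats = torch.randn(200, 1, FEAT_DIM)            # (time, batch, feats)
labels = torch.randint(0, NUM_PHONES, (1, 30))   # fake transcript
log_probs = model(feats).log_softmax(-1)         # (time, batch, classes)
loss = ctc(log_probs, labels,
           input_lengths=torch.tensor([200]),
           target_lengths=torch.tensor([30]))
optimizer.zero_grad()
loss.backward()
optimizer.step()

# "Search", in its simplest form: greedy argmax per frame, collapsing
# repeats and dropping blanks (real systems use beam search).
best = log_probs.argmax(-1).squeeze(1).tolist()
decoded = [p for i, p in enumerate(best)
           if p != NUM_PHONES and (i == 0 or p != best[i - 1])]
print(round(loss.item(), 3), decoded[:10])
```

The homeworks build this out properly: real features, real transcribed speech, and beam search instead of greedy decoding.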
TTS (= Text-to-Speech) (= Speech
Synthesis)
— Produce speech from a text input
— Applications:
— Personal Assistants
— Apple Siri
— Microsoft Cortana
— Google Assistant
— Games
— Announcements / voice-overs
TTS Overview
— Collect lots of speech (5-50 hours) from one
speaker, transcribe very carefully, all the
syllables and phones and whatnot
— Rapid recent progress in neural approaches
— Modern systems are DNN-based,
understandable, but not yet emotive
TTS Overview: End-to-end neural

Tacotron (Wang et al., 2017)


Applications
— Machine learning applications
— Extract information from speech using
supervised learning
— Emotion, speaker ID, flirtation, deception,
depression, intoxication
— Dialog system / SLU applications
— Building systems to solve a problem
— Medical transcription, reservations via chat
— New area: Self-supervised foundation
models
Extraction of Social Meaning from
Speech
— Detection of student uncertainty in tutoring
— Forbes-Riley et al. (2008)
— Emotion detection (annoyance)
— Ang et al. (2002)
— Detection of deception
— Newman et al. (2003)
— Detection of charisma
— Rosenberg and Hirschberg (2005)
— Speaker stress, trauma
— Rude et al. (2004), Pennebaker and Lay (2002)
Conversational style
— Given speech and text from a conversation
— Can we tell if a speaker is
— Awkward?
— Flirtatious?
— Friendly?
— Dataset:
— 1000 4-minute “speed-dates”
— Each subject rated their partner for these styles
— The following segment has been lightly signal-processed:
Week 1
— Course introduction
— Course Logistics
— Course topics overview
— Dialog / conversational agents
— Speech recognition (Speech to text)
— Speech synthesis (Text to speech)
— Applications
— Brief history
— Articulatory Phonetics
— ARPAbet transcription
