
Multimodal Machine Learning

Lecture 2.2: Unimodal Representations (Part 2)


Louis-Philippe Morency

* Co-lecturer: Paul Liang. Original course co-developed with Tadas Baltrusaitis.
Spring 2021 and 2022 editions taught by Yonatan Bisk.

1
Administrative Stuff
Lecture Highlight Form

▪ Deadline: Today, Thursday at 9pm ET
▪ Use your Andrew CMU email
  ▪ You will need to log in using this address
▪ New form for each lecture
  ▪ Posted on Piazza’s Resources section
▪ You should start taking notes as soon as the administrative stuff is over!
▪ Contact us if you have any problems

3
Reading Assignments – Weekly Schedule

Four main steps for the reading assignments:

1. Monday 8pm: Official start of the assignment
2. Wednesday 8pm: Select your paper
3. Friday 8pm: Post your summary
4. Monday 8pm: Post your extra comments (3 posts)

4
Team Matching Event – Today!

▪ Today around 10:30am ET (later part of the lecture)
▪ Detailed instructions will be shared during the lecture
▪ The event is optional for students who already have a full team

5
AWS Credits

New procedure this semester!


▪ We need your AWS account info (deadline: Tuesday 9/12)
▪ Max $150 credit for the whole semester. No exceptions.
▪ More details will be sent on Piazza

Alternative: Amazon SageMaker Studio Lab


▪ Similar to Google Colab (link)
▪ No cost, easy access to a JupyterLab-based user interface
▪ Access to G4dn.xlarge instances

6
Multimodal Machine Learning
Lecture 2.2: Unimodal Representations (Part 2)
Louis-Philippe Morency

* Co-lecturer: Paul Liang. Original course co-developed with Tadas Baltrusaitis.
Spring 2021 and 2022 editions taught by Yonatan Bisk.

7
Lecture Objectives
▪ Word representations
▪ Distributional hypothesis
▪ Learning neural representations
▪ Sentence representations and sequence modeling
▪ Recurrent neural networks
▪ Language models
▪ Syntax and language structure
▪ Phrase-structure and dependency grammars
▪ Recursive neural network
▪ Tree-based RNN

8
Word
Representations
9
Simple Word Representation

Written language

Input observation 𝒙𝒊 is a “one-hot” vector:
a single 1 at the position of the word, 0 everywhere else
(e.g., 𝒙𝒊 = [0 0 0 0 0 1 0 … 0 0]ᵀ)

dim(𝒙𝒊) = number of words in the dictionary

10
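To make the one-hot encoding concrete, here is a minimal NumPy sketch; the tiny vocabulary and the example word are made up for illustration, not taken from the slides:

```python
import numpy as np

# Toy vocabulary; a real dictionary would have on the order of 100,000 entries.
vocab = ["he", "was", "walking", "running", "away", "because"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a |V|-dimensional vector with a single 1 at the word's index."""
    x = np.zeros(len(vocab))
    x[word_to_index[word]] = 1.0
    return x

print(one_hot("walking"))  # [0. 0. 1. 0. 0. 0.]
```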
What is the meaning of “bardiwac”?

▪ He handed her her glass of bardiwac.


▪ Beef dishes are made to complement the bardiwacs.
▪ Nigel staggered to his feet, face flushed from too much bardiwac.
▪ Malbec, one of the lesser-known bardiwac grapes, responds well to
Australia’s sunshine.
▪ I dined off bread and cheese and this excellent bardiwac.
▪ The drinks were delicious: blood-red bardiwac as well as light, sweet
Rhenish.
⇒ bardiwac is a heavy red alcoholic beverage made from grapes

11
How to learn (word) features/representations?

Distributional hypothesis: approximate the
word meaning by its surrounding words

Words used in a similar context will lie close together

He was walking away because …


He was running away because …

12
Geometric interpretation

▪ row vector x_dog describes the usage of the word dog in the corpus
▪ can be seen as the coordinates of a point in n-dimensional Euclidean space Rⁿ

Stefan Evert 2010 13


Distance and similarity

▪ illustrated for two dimensions, get and use: x_dog = (115, 10)
▪ similarity = spatial proximity (Euclidean distance)
▪ location depends on the frequency of the noun (f_dog ≈ 2.7 · f_cat)

Stefan Evert 2010 14


Angle and similarity

▪ direction more important than location
▪ normalise the “length” ||x_dog|| of the vector
▪ or use the angle α as a distance measure

Stefan Evert 2010 15
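A small sketch of the two similarity notions above, using made-up co-occurrence counts for dog and cat along the get/use dimensions (the cat counts are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical co-occurrence counts along the "get" and "use" dimensions.
x_dog = np.array([115.0, 10.0])
x_cat = np.array([40.0, 3.0])

# Euclidean distance: sensitive to word frequency (vector length).
euclidean = np.linalg.norm(x_dog - x_cat)

# Cosine similarity: compares direction only, ignoring length.
cosine = x_dog @ x_cat / (np.linalg.norm(x_dog) * np.linalg.norm(x_cat))

print(euclidean, cosine)
```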


How to learn (word) features/representations?

Distributional hypothesis: approximate the
word meaning by its surrounding words

Words used in a similar context will lie close together

He was walking away because …


He was running away because …

Instead of capturing co-occurrence counts directly,
predict the surrounding words of every word

16
How to learn (word) features/representations?

[Diagram: two-layer network predicting context words]
▪ Input x: 100,000d one-hot vector of the center word (e.g., “walking” or “running”)
▪ W1 maps the 100,000d input to a 300d hidden vector; W2 maps the 300d hidden
  vector back to a 100,000d output y that scores the surrounding words
  (“He”, “was”, “away”, “because”, …), each represented as a one-hot vector

He was walking away because …
He was running away because …

Word2vec algorithm: https://code.google.com/p/word2vec/

17
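As a sketch of how such vectors are trained in practice, the gensim library provides a word2vec implementation; the corpus and hyperparameters below are illustrative stand-ins (sg=1 selects the skip-gram variant that predicts surrounding words):

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus; real training uses millions of sentences.
sentences = [
    ["he", "was", "walking", "away", "because", "it", "was", "late"],
    ["he", "was", "running", "away", "because", "it", "was", "late"],
]

# Skip-gram (sg=1): predict the surrounding words of each center word.
model = Word2Vec(sentences, vector_size=300, window=2, sg=1, min_count=1)

vec_walking = model.wv["walking"]                  # 300d dense vector
print(model.wv.similarity("walking", "running"))   # cosine similarity of the two vectors
```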
How to use these word representations

If we had a vocabulary of 100,000 words:

Classic NLP: 100,000-dimensional one-hot vectors
  Walking: [0; 0; 0; 0; …; 0; 0; 1; 0; …; 0; 0]
  Running: [0; 0; 0; 0; …; 0; 0; 0; 0; …; 1; 0]
  Similarity = 0.0

Transform: x' = x · W1 (100,000d → 300d)
Goal: 300-dimensional dense vectors
  Walking: [0.1; 0.0003; 0; …; 0.02; 0.08; 0.05]
  Running: [0.1; 0.0004; 0; …; 0.01; 0.09; 0.05]
  Similarity = 0.9

18
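A minimal sketch of the transform above, with random stand-in values for W1: multiplying a one-hot row vector by W1 simply selects one row of the matrix, i.e. the word's 300d embedding.

```python
import numpy as np

vocab_size, embed_dim = 100_000, 300
rng = np.random.default_rng(0)
W1 = rng.normal(size=(vocab_size, embed_dim))   # learned during training; random here

def one_hot(index):
    x = np.zeros(vocab_size)
    x[index] = 1.0
    return x

# x' = x @ W1 picks out row `index` of W1: the word's dense embedding.
x_walking = one_hot(1234) @ W1
assert np.allclose(x_walking, W1[1234])

def cosine(a, b):
    """Cosine similarity between two dense word vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```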
Vector space models of words

While learning these word representations, we are


actually building a vector space in which all words
reside with certain relationships between them

Encodes both syntactic and semantic relationships

This vector space allows for algebraic operations:

vec(king) – vec(man) + vec(woman) ≈ vec(queen)

19
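The analogy can be checked directly with pretrained vectors, e.g. via gensim's downloader; the dataset name below is one of gensim's bundled pretrained models, and downloading it (a large file) is an assumption of this sketch:

```python
import gensim.downloader as api

# Pretrained word2vec vectors trained on the Google News corpus.
wv = api.load("word2vec-google-news-300")

# vec(king) - vec(man) + vec(woman) ≈ vec(queen)
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```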
Vector space models of words: semantic relationships

Do these work? There are issues of bias here:

vec(programmer) – vec(man) + vec(woman) ≈ vec(homemaker)
(trained on the Google News corpus with over 300 billion words)

e.g., https://arxiv.org/abs/1607.06520
      https://aclanthology.org/W14-1618.pdf

20
Sentence Modeling
and Recurrent Networks
21
Sentence Modeling: Sequence Prediction

Prediction targets for each word:
▪ Part-of-speech (noun, verb, …)?
▪ Sentiment (positive or negative)?

POS? POS? POS? POS? POS? POS? POS? POS?
Ideal for anyone with an interest in disguises

22
RNN for Sequence Prediction

P(word is positive) is predicted at every time step:

Ideal   for   anyone   …   disguises

What is the loss?
L = (1/N) Σ_t L^(t) = (1/N) Σ_t −log P(Y = y^(t) | z^(t))

23
Recurrent Neural Network

Feedforward Neural Network (applied independently at each time step t):

L^(t) = −log P(Y = y^(t) | z^(t))
z^(t) = matmult(h^(t), V)
h^(t) = tanh(U x^(t))

x^(t) → h^(t) (via U) → z^(t) (via V) → y^(t), L^(t)

24
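A minimal NumPy sketch of the feedforward step above; the dimensions and the example label are made up, and softmax turns z^(t) into P(Y | z^(t)):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

input_dim, hidden_dim, num_classes = 50, 64, 2
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden
V = rng.normal(scale=0.1, size=(num_classes, hidden_dim)) # hidden-to-output

x_t = rng.normal(size=input_dim)          # input at time step t
h_t = np.tanh(U @ x_t)                    # h(t) = tanh(U x(t))
z_t = V @ h_t                             # z(t) = matmult(h(t), V)
loss_t = -np.log(softmax(z_t)[1])         # L(t) = -log P(Y = y(t) | z(t)), with y(t) = 1
```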
Recurrent Neural Networks

L = Σ_t L^(t)
L^(t) = −log P(Y = y^(t) | z^(t))
z^(t) = matmult(h^(t), V)
h^(t) = tanh(U x^(t) + W h^(t−1))

Parameters: U (input-to-hidden), W (hidden-to-hidden recurrence), V (hidden-to-output)

25
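The only change from the feedforward step is the extra recurrent term W h^(t−1); a sketch with made-up dimensions and random stand-in parameters:

```python
import numpy as np

hidden_dim, input_dim = 64, 50
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden

def rnn_step(x_t, h_prev):
    """h(t) = tanh(U x(t) + W h(t-1))"""
    return np.tanh(U @ x_t + W @ h_prev)

h = np.zeros(hidden_dim)                      # h(0)
h = rnn_step(rng.normal(size=input_dim), h)   # one recurrence step
```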
Recurrent Neural Networks - Unrolling

Unrolled over time steps t = 1, 2, 3, …:

L = Σ_t L^(t),  with  L^(t) = −log P(Y = y^(t) | z^(t))
z^(t) = matmult(h^(t), V)
h^(t) = tanh(U x^(t) + W h^(t−1))

x^(1) → h^(1) → z^(1) → L^(1)
x^(2) → h^(2) → z^(2) → L^(2)
x^(3) → h^(3) → z^(3) → L^(3)
…

The same model parameters (U, W, V) are used at every time step.

26
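Unrolling is just a loop over time steps that reuses the same U, W, V; a sketch computing the summed loss for a toy sequence (dimensions, inputs, and labels are all made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

input_dim, hidden_dim, num_classes, T = 50, 64, 2, 8
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
V = rng.normal(scale=0.1, size=(num_classes, hidden_dim))

xs = rng.normal(size=(T, input_dim))        # toy input sequence
ys = rng.integers(0, num_classes, size=T)   # toy per-step labels

h, total_loss = np.zeros(hidden_dim), 0.0
for t in range(T):
    h = np.tanh(U @ xs[t] + W @ h)          # same parameters at every time step
    z = V @ h
    total_loss += -np.log(softmax(z)[ys[t]])   # L = sum_t L(t)
```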
Sentence Modeling: Sequence Label Prediction

Prediction target: the sentiment label of the whole sequence (positive or negative)?

Ideal for anyone with an interest in disguises

27
RNN for Sequence Prediction

P(sequence is positive) is predicted once, at the final time step:

Ideal   for   anyone   …   disguises

What is the loss?
L = L^(N) = −log P(Y = y^(N) | z^(N))

28
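For a sequence-level label, the same unrolled RNN is used but only the final step contributes to the loss; a sketch with stand-in parameters and an assumed "positive" label:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

input_dim, hidden_dim, T = 50, 64, 8
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
V = rng.normal(scale=0.1, size=(2, hidden_dim))   # positive / negative

xs = rng.normal(size=(T, input_dim))
h = np.zeros(hidden_dim)
for t in range(T):
    h = np.tanh(U @ xs[t] + W @ h)

# Loss only at the last time step: L = L(N) = -log P(Y = y(N) | z(N))
loss = -np.log(softmax(V @ h)[1])   # assuming the true label is "positive" (index 1)
```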
Language Models
29
Sentence Modeling: Language Model

Prediction target: the next word

Ideal for anyone with an interest in disguises   → next word?

30
Language Model Application: Speech Recognition

argmax_{word sequence} P(word sequence | acoustics)
  = argmax_{word sequence} P(acoustics | word sequence) · P(word sequence) / P(acoustics)
  = argmax_{word sequence} P(acoustics | word sequence) · P(word sequence)

The second factor, P(word sequence), is the language model.

31
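In practice this argmax is often used to rescore candidate transcriptions: each hypothesis gets an acoustic log-probability plus a language-model log-probability, and P(acoustics) is dropped because it is the same for every hypothesis. A toy sketch with made-up hypotheses and scores:

```python
# Hypothetical candidate transcriptions with made-up log-probabilities.
hypotheses = {
    "recognize speech":   {"log_p_acoustic": -12.0, "log_p_lm": -8.0},
    "wreck a nice beach": {"log_p_acoustic": -11.5, "log_p_lm": -15.0},
}

# argmax_w P(acoustics | w) * P(w)  ==  argmax_w [log P(acoustics | w) + log P(w)]
best = max(hypotheses, key=lambda w: hypotheses[w]["log_p_acoustic"]
                                     + hypotheses[w]["log_p_lm"])
print(best)   # "recognize speech"
```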
RNN for Language Model

Outputs: P(next word is “dog”), P(next word is “on”), P(next word is “the”), P(next word is “beach”)

Inputs:  1-of-N encodings of “START”, “dog”, “on”, “nice”

32
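A sketch of an RNN language model: at each step the output z^(t) scores the whole vocabulary, and the loss is the negative log-probability assigned to the actual next word. The tiny vocabulary, the token sequence, and all parameters are illustrative stand-ins:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

vocab = ["START", "the", "dog", "on", "nice", "beach"]
V_size, hidden_dim = len(vocab), 32
rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(V_size, hidden_dim))      # input word embeddings (U x(t) for one-hot x)
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden
V_out = rng.normal(scale=0.1, size=(V_size, hidden_dim))  # hidden-to-vocabulary

tokens = ["START", "dog", "on", "the", "beach"]           # illustrative sequence
h, loss = np.zeros(hidden_dim), 0.0
for t in range(len(tokens) - 1):
    h = np.tanh(E[vocab.index(tokens[t])] + W @ h)          # consume the current word
    p_next = softmax(V_out @ h)                             # distribution over the next word
    loss += -np.log(p_next[vocab.index(tokens[t + 1])])     # -log P(next word)
```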
RNN for Sequence Representation (Encoder)

Output: a sequence representation

Inputs: 1-of-N encodings of “START”, “dog”, “on”, “nice”

33
Bi-Directional RNN

Sequence representation: combine Z_A and Z_B

Z_A: representation from the RNN reading the sequence in one direction
Z_B: representation from the RNN reading it in the other direction

34
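A sketch of a bi-directional encoder: one RNN reads the sequence left-to-right, another reads it right-to-left, and the two final states are concatenated into the sequence representation. All shapes and parameters are illustrative:

```python
import numpy as np

input_dim, hidden_dim, T = 50, 64, 8
rng = np.random.default_rng(0)
U_f = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # forward RNN parameters
W_f = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
U_b = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # backward RNN parameters
W_b = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))

xs = rng.normal(size=(T, input_dim))

h_f = np.zeros(hidden_dim)                 # forward pass: t = 1 .. T
for t in range(T):
    h_f = np.tanh(U_f @ xs[t] + W_f @ h_f)

h_b = np.zeros(hidden_dim)                 # backward pass: t = T .. 1
for t in reversed(range(T)):
    h_b = np.tanh(U_b @ xs[t] + W_b @ h_b)

sequence_representation = np.concatenate([h_f, h_b])   # combine the two directions
```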
Pre-training and “Masking”

P(masked word is “on”)

“The”  “dog”  MASKED  “the”  “beach”

(short-lived) ELMo was a bi-directional pretrained language model

35
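A quick way to see masked-word prediction in action is the Hugging Face transformers fill-mask pipeline; this is not part of the slides and assumes the transformers package and the bert-base-uncased checkpoint are available:

```python
from transformers import pipeline

# BERT-style masked language model: predict the token hidden behind [MASK].
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The dog [MASK] the beach."):
    print(prediction["token_str"], round(prediction["score"], 3))
```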
RNN-based Machine Translation

Source: Le chien sur la plage   →   Target: The dog on the beach

Inputs: 1-of-N encodings of “le”, “chien”, “sur”, “la”, “plage”
36
Encoder-Decoder Architecture

Context: the encoder’s summary of the source sentence, passed to the decoder.

Encoder inputs: 1-of-N encodings of “le”, “chien”, “sur”, “la”, “plage”

What is the loss function? (See the sketch below.)
37
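One common answer to the question above: with teacher forcing, the loss is the summed negative log-probability of each target word given the context and the previous target words, L = Σ_t −log P(y^(t) | y^(<t), context). A toy sketch of that decoder loss; the vocabulary, target sequence, and all parameters are stand-ins:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

target_vocab = ["START", "the", "dog", "on", "beach", "END"]
V_size, hidden_dim = len(target_vocab), 32
rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(V_size, hidden_dim))      # target word embeddings
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # decoder recurrence
V_out = rng.normal(scale=0.1, size=(V_size, hidden_dim))  # hidden-to-vocabulary

context = rng.normal(size=hidden_dim)      # encoder summary of "le chien sur la plage"
target = ["START", "the", "dog", "on", "the", "beach", "END"]

h, loss = context, 0.0                     # decoder starts from the context vector
for t in range(len(target) - 1):
    h = np.tanh(E[target_vocab.index(target[t])] + W @ h)
    p = softmax(V_out @ h)
    loss += -np.log(p[target_vocab.index(target[t + 1])])  # -log P(y(t+1) | y(<=t), context)
```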
And There Are More Ways To Model Sequences…

Self-attention Models
(e.g., BERT, RoBERTa)

38
Syntax and
Language Structure
39
Syntax and Language Structure

What can you tell about this sentence?   “Alice ate yellow squash”

Phrase-structure grammar:
1  Part-of-speech tags:  Noun (“Alice”), Verb (“ate”), Adjective (“yellow”), Noun (“squash”)
2  Syntactic parse tree: Sentence → Noun phrase (“Alice”) + Verb phrase;
   Verb phrase → Verb (“ate”) + Noun phrase (“yellow squash”)

40
Syntax and Language Structure

What can you tell about this sentence?   “Alice ate yellow squash”

1  Part-of-speech tags:  Noun (“Alice”), Verb (“ate”), Adjective (“yellow”), Noun (“squash”)
2  Syntactic parse tree (phrase-structure grammar):
   Sentence → Noun phrase (“Alice”) + Verb phrase;
   Verb phrase → Verb (“ate”) + Noun phrase (“yellow squash”)
3  Dependency grammar:
   “Alice” is the subject of “ate”, “squash” is its object,
   and “yellow” is an attribute of “squash”

41
Ambiguity in Syntactic Parsing

“Like” can be a verb or a preposition


▪ I like/VBP candy.
▪ Time flies like/IN an arrow.
“Around” can be a preposition, particle, or adverb
▪ I bought it at the shop around/IN the corner.
▪ I never got around/RP to getting a car.
▪ A new Prius costs around/RB $25K.

42
Language Ambiguity

Two possible parse trees for “Salesmen sold the dog biscuits”:
▪ Salesmen sold [NP the dog biscuits]        (one object: dog biscuits were sold)
▪ Salesmen sold [NP the dog] [NP biscuits]   (two objects: the dog received biscuits)

43
Language Syntax – Examples

Part-of-speech tagging:
  The/Det boy/Noun saw/Verb the/Det dog/Noun in/Prep the/Det park/Noun

Constituency parsing of “The boy saw the dog”:
  S → NP (Det N: “the boy”) + VP (V “saw” + NP (Det N: “the dog”))

Dependency parsing of “The boy saw the dog”:
  ROOT → “saw”; “boy” is the subject, “dog” is the object,
  and each “the” is a determiner (Det.) of its noun

How to take advantage of syntax when modeling language with neural networks?
44
Tree-based RNNs (or Recursive Neural Network)

Constituency parse of “The boy likes the cars”:
  S → NP (Det N: “the boy”) + VP (V “likes” + NP (Det N: “the cars”))
A tree-based RNN composes word vectors bottom-up following this parse tree.

45
Recursive Neural Unit

Pair-wise combination of two input features:

The two 300d child vectors x1 and x2 (e.g., “The” and “boy”) are concatenated
into a 600d vector, multiplied by W, and passed through a tanh activation
function to produce the 300d parent vector:

h = tanh(W [x1 ; x2])

46
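A minimal sketch of the recursive unit above, h = tanh(W [x1 ; x2]), with random stand-in parameters and word vectors:

```python
import numpy as np

embed_dim = 300
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(embed_dim, 2 * embed_dim))   # maps 600d -> 300d

def compose(x1, x2):
    """Combine two 300d child vectors into one 300d parent vector."""
    return np.tanh(W @ np.concatenate([x1, x2]))

x_the, x_boy = rng.normal(size=embed_dim), rng.normal(size=embed_dim)
h_np = compose(x_the, x_boy)   # representation of the noun phrase "The boy"
```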
Resources
47
Resources

▪ spaCy (https://spacy.io/)
▪ POS tagger, dependency parser, etc.
▪ Berkeley Neural Parser (Constituency Parser)
▪ Software: https://github.com/nikitakit/self-attentive-parser
▪ Demo: https://parser.kitaev.io/
▪ Stanford NLP software
▪ Stanza: https://stanfordnlp.github.io/stanza/index.html
▪ Others (some are outdated):
https://nlp.stanford.edu/software/

48
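A short spaCy example of the POS tagging and dependency parsing mentioned above; it assumes the en_core_web_sm model has been installed with `python -m spacy download en_core_web_sm`:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The boy saw the dog in the park")

for token in doc:
    # Part-of-speech tag, dependency label, and the head the word attaches to.
    print(token.text, token.pos_, token.dep_, token.head.text)
```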
Word Representation Resources

Word-level representations:
▪ Word2Vec (Google, 2013): https://code.google.com/archive/p/word2vec/
▪ GloVe (Stanford, 2014): https://nlp.stanford.edu/projects/glove/
  (factorizes the co-occurrence matrix)
▪ FastText (Facebook, 2017): https://fasttext.cc/
  (uses sub-word information, e.g. walk and walking are similar)

Contextual representations (word representations are contextualized using all
the words in the sentence; also provide sentence representations):
▪ ELMo (Allen Institute for AI, 2018): https://allennlp.org/elmo
▪ BERT (Google, 2018): https://github.com/google-research/bert
▪ RoBERTa (Facebook, 2019): https://github.com/pytorch/fairseq
Lexicon-based Word Representation

LIWC: Linguistic Inquiry and Word Count


Manually created dictionaries for different topics and categories:
▪ Function words: pronouns, preposition, negation…
▪ Affect words: positive, negative emotions
▪ Social words: family, friends, referents
▪ Cognitive processes: Insight, cause, …
▪ Perceptual processes: Seeing, hearing, feeling
▪ Biological processes: Body, health/illness,…
▪ Drives and needs: Affiliation, achievement, …
▪ Time orientation: past, present, future
▪ Relativity: motion, space, time
▪ Personal concerns: work, leisure, money, religion …
▪ Informal speech: swear words, fillers, assent,…
LIWC can encode individual words or full sentences.
Commercial software (https://liwc.wpengine.com/). Contact TAs in advance if you would like to use it.

50
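To illustrate how lexicon-based encoding works (without the actual LIWC dictionaries, which are commercial), here is a sketch with a tiny made-up lexicon that counts category hits per sentence:

```python
# Hypothetical mini-lexicon; the real LIWC dictionaries are much larger.
lexicon = {
    "positive_emotion": {"excellent", "delicious", "happy"},
    "negative_emotion": {"terrible", "sad", "flushed"},
    "social":           {"friend", "family", "her", "his"},
}

def encode(sentence):
    """Return per-category counts of lexicon words in the sentence."""
    words = sentence.lower().split()
    return {cat: sum(w in vocab for w in words) for cat, vocab in lexicon.items()}

print(encode("The drinks were delicious and her friend was happy"))
```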
Other Lexicon Resources

Lexicons
• General Inquirer (Stone et al., 1966)
• OpinionFinder lexicon (Wiebe & Riloff, 2005)
• SentiWordNet (Esuli & Sebastiani, 2006)
• LIWC (Pennebaker)

Other Tools
• LightSIDE
• Stanford NLP toolbox
• IBM Watson Tone Analyzer
• Google Cloud Natural Language
• Microsoft Azure Text Analytics

51
