Lecture 2.2 Unimodal Representations Part 2
3
Reading Assignments – Weekly Schedule
4
Team Matching Event – Today!
5
AWS Credits
6
Multimodal Machine Learning
Lecture 2.2: Unimodal Representations (Part 2)
Louis-Philippe Morency
8
Word
Representations
9
Simple Word Representation
Written language
Input observation 𝒙𝒊 : a “one-hot” vector
𝒙𝒊 = [0, 0, 0, 0, 0, 1, 0, 0, …, 0]ᵀ
Dimension of 𝒙𝒊 = number of words in the dictionary
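A minimal sketch of constructing such a one-hot vector in Python (the toy vocabulary below is made up for illustration; a real dictionary would have ~100,000 entries):

```python
import numpy as np

# Toy dictionary; a real one would contain ~100,000 words.
vocab = ["he", "was", "walking", "running", "away", "because"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a |vocab|-dimensional vector with a single 1 at the word's index."""
    x = np.zeros(len(vocab))
    x[word_to_index[word]] = 1.0
    return x

print(one_hot("walking"))  # [0. 0. 1. 0. 0. 0.]
```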
10
What is the meaning of “bardiwac”?
11
How to learn (word) features/representations?
12
Geometric interpretation
▪ can be seen as coordinates of points in n-dimensional Euclidean space Rⁿ
▪ direction more important than location
▪ normalise “length” ||x_dog|| of vector
▪ or use angle as distance measure
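A minimal sketch of the last two bullets, unit-length normalization and cosine (angle-based) similarity, on made-up vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; direction matters, length does not."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

x_dog = np.array([2.0, 0.5, 1.0])   # toy co-occurrence counts for "dog"
x_cat = np.array([4.0, 1.2, 1.9])   # toy co-occurrence counts for "cat"

# Normalising the "length" ||x_dog|| keeps only the direction.
x_dog_unit = x_dog / np.linalg.norm(x_dog)

print(cosine_similarity(x_dog, x_cat))  # close to 1.0: similar direction
```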
16
How to learn (word) features/representations?
[Diagram: each word (“He”, “was”, “walking”, “away”, “because”, …) is a 100,000-d one-hot vector, e.g. [0, 0, …, 1, …, 0]; weight matrices W1 and W2 map it to a 300-d representation (x → y).]
He was walking away because …
He was running away because …
Word2vec algorithm: https://code.google.com/p/word2vec/
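The link above is the original C implementation; as a hedged illustration, similar 300-d embeddings can be trained with the gensim library (the two-sentence corpus and all hyperparameters here are purely illustrative):

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus; in practice this would be billions of tokens.
sentences = [
    ["he", "was", "walking", "away", "because"],
    ["he", "was", "running", "away", "because"],
]

# vector_size=300 matches the 300-d embeddings discussed above; sg=1 = skip-gram.
model = Word2Vec(sentences, vector_size=300, window=5, min_count=1, sg=1)

vec_walking = model.wv["walking"]                 # a 300-d dense vector
print(model.wv.similarity("walking", "running"))  # meaningful only on a real corpus
```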
17
How to use these word representations
One-hot vectors (100,000-d): similarity = 0.0
Transform: x′ = x · W1 (goal: a 300-dimensional vector)
Walking: [0.1, 0.0003, 0, …, 0.02, 0.08, 0.05]
Running: [0.1, 0.0004, 0, …, 0.01, 0.09, 0.05]
Transformed vectors (300-d): similarity = 0.9
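A small sketch of the transform x′ = x·W1: because x is one-hot, the product just selects one row of W1, so W1 acts as a lookup table of 300-d embeddings (the matrix below is random; only the mechanics are shown):

```python
import numpy as np

vocab_size, embed_dim = 100_000, 300
rng = np.random.default_rng(0)
W1 = rng.normal(size=(vocab_size, embed_dim))  # learned embedding matrix (random here)

x = np.zeros(vocab_size)
x[42] = 1.0              # one-hot vector for some word, e.g. "walking"

x_prime = x @ W1         # 300-d dense vector; identical to row 42 of W1
assert np.allclose(x_prime, W1[42])
```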
18
Vector space models of words
19
Vector space models of words: semantic relationships
vec(programmer) – vec(man) + vec(woman) ≈ vec(homemaker)
(Trained on the Google news corpus with over 300 billion words.)
Do these work? There are issues of bias here, e.g.
https://arxiv.org/abs/1607.06520
https://aclanthology.org/W14-1618.pdf
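A hedged sketch of this vector arithmetic with gensim, assuming the pre-trained Google News vectors have been downloaded as GoogleNews-vectors-negative300.bin (the file name is an assumption):

```python
from gensim.models import KeyedVectors

# Assumes the pre-trained Google News word2vec binary is available locally.
wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

# vec("programmer") - vec("man") + vec("woman") ≈ ?
print(wv.most_similar(positive=["programmer", "woman"], negative=["man"], topn=3))
```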
20
Sentence Modeling
and Recurrent Networks
21
Sentence Modeling: Sequence Prediction
Part-of-speech prediction? (noun, verb, …)
Sentiment prediction? (positive or negative)
22
RNN for Sequence Prediction
23
Recurrent Neural Network
𝒉^(𝑡) = tanh(𝑼𝒙^(𝑡))
[Diagram: input 𝒙^(𝑡) is mapped to the hidden state 𝒉^(𝑡) through 𝑼; the output is computed through 𝑽.]
24
Recurrent Neural Networks
𝐿 = Σ_𝑡 𝐿^(𝑡)
𝒉^(𝑡) = tanh(𝑼𝒙^(𝑡) + 𝑾𝒉^(𝑡−1))
25
Recurrent Neural Networks - Unrolling
𝐿 = Σ_𝑡 𝐿^(𝑡)
𝒛^(𝑡) = matmult(𝒉^(𝑡), 𝑽)
𝒉^(𝑡) = tanh(𝑼𝒙^(𝑡) + 𝑾𝒉^(𝑡−1))
[Unrolled diagram: 𝒙^(1) … 𝒙^(𝑡) → 𝒉^(1) … 𝒉^(𝑡) (through 𝑼 and 𝑾) → 𝒛^(1) … 𝒛^(𝑡) (through 𝑽) → 𝑦^(1) … 𝑦^(𝑡)]
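A minimal NumPy sketch of unrolling these equations over a short sequence (dimensions and random parameters are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, T = 300, 128, 2, 4

U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden
V = rng.normal(scale=0.1, size=(output_dim, hidden_dim))  # hidden-to-output

xs = rng.normal(size=(T, input_dim))   # x^(1) ... x^(T), e.g. word embeddings
h = np.zeros(hidden_dim)               # h^(0)

for t in range(T):
    h = np.tanh(U @ xs[t] + W @ h)     # h^(t) = tanh(U x^(t) + W h^(t-1))
    z = V @ h                          # z^(t) = matmult(h^(t), V)
    y = np.exp(z) / np.exp(z).sum()    # softmax over labels at step t
    print(t, y)
```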
26
Sentence Modeling: Sequence Label Prediction
Prediction: sentiment label? (positive or negative)
27
RNN for Sequence Prediction
P(sequence is positive)
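As a hedged sketch, the same idea in PyTorch: run an RNN over the word embeddings and read P(sequence is positive) off the final hidden state (class name, vocabulary size, and dimensions are illustrative):

```python
import torch
import torch.nn as nn

class RNNSentimentClassifier(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids):                    # (batch, seq_len)
        _, h_last = self.rnn(self.embed(token_ids))
        # Use the final hidden state as the sequence representation.
        return torch.sigmoid(self.out(h_last[-1]))   # P(sequence is positive)

model = RNNSentimentClassifier()
print(model(torch.randint(0, 10_000, (1, 6))))       # one toy sentence of 6 tokens
```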
28
Language Models
29
Sentence Modeling: Language Model
Next word?
30
Language Model Application: Speech Recognition
Language model
31
RNN for Language Model
32
RNN for Sequence Representation (Encoder)
Sequence
Representation
33
Bi-Directional RNN
[Diagram: the sequence representation is formed from the two directional states 𝑍_𝐴 and 𝑍_𝐵.]
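A hedged PyTorch sketch of forming the sequence representation from the two directional final states (corresponding to 𝑍_𝐴 and 𝑍_𝐵 above; sizes are illustrative):

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 300, 128
birnn = nn.RNN(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

x = torch.randn(1, 6, embed_dim)          # one sequence of 6 word vectors
outputs, h_n = birnn(x)                   # h_n: (2, batch, hidden_dim)

# Concatenate the forward and backward final states (Z_A and Z_B on the slide).
z = torch.cat([h_n[0], h_n[1]], dim=-1)   # (batch, 2 * hidden_dim)
print(z.shape)
```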
34
Pre-training and “Masking”
P(masked word is “on”)
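Pre-trained masked language models expose exactly this probability; a hedged sketch using the Hugging Face transformers fill-mask pipeline (the bert-base-uncased checkpoint is an assumption and is downloaded on first use):

```python
from transformers import pipeline

# bert-base-uncased is used here purely for illustration.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for candidate in fill_mask("The dog is sitting [MASK] the beach."):
    print(candidate["token_str"], candidate["score"])  # e.g. P(masked word is "on")
```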
35
RNN-based Machine Translation
1-of-N encodings of “le”, “chien”, “sur”, “la”, “plage”
36
Encoder-Decoder Architecture
Context
1-of-N encodings of “le”, “chien”, “sur”, “la”, “plage”
What is the loss function?
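To answer the question above: the usual choice is a cross-entropy loss on each predicted target word. A hedged PyTorch sketch of a minimal encoder-decoder (all sizes, vocabularies, and token ids are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

src_vocab, tgt_vocab, embed_dim, hidden_dim = 8000, 8000, 300, 256

src_embed = nn.Embedding(src_vocab, embed_dim)
tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
project = nn.Linear(hidden_dim, tgt_vocab)

src = torch.randint(0, src_vocab, (1, 5))   # e.g. "le chien sur la plage"
tgt = torch.randint(0, tgt_vocab, (1, 6))   # e.g. "the dog on the beach <eos>"

_, context = encoder(src_embed(src))              # context vector from the encoder
dec_out, _ = decoder(tgt_embed(tgt[:, :-1]), context)
logits = project(dec_out)                         # (batch, tgt_len-1, tgt_vocab)

# Cross-entropy between predicted next-word distributions and the true next words.
loss = F.cross_entropy(logits.reshape(-1, tgt_vocab), tgt[:, 1:].reshape(-1))
print(loss)
```

Here the decoder is fed the gold previous word at each step (teacher forcing), and the loss compares its distribution over the target vocabulary with the true next word.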
37
And There Are More Ways To Model Sequences…
Self-attention models (e.g., BERT, RoBERTa)
38
Syntax and
Language Structure
39
Syntax and Language Structure
Noun phrase
40
Syntax and Language Structure
Noun phrase
41
Ambiguity in Syntactic Parsing
42
Language Ambiguity
Two parses of “Salesmen sold the dog biscuits”:
(S (NP (N Salesmen)) (VP (V sold) (NP (Det the) (N dog) (N biscuits))))
(S (NP (N Salesmen)) (VP (V sold) (NP (Det the) (N dog)) (NP (N biscuits))))
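A hedged sketch of the two readings with NLTK’s Tree class (assumes nltk is installed; the bracketed structures mirror the trees above):

```python
from nltk import Tree

# Reading 1: the salesmen sold dog biscuits ("the dog biscuits" is one NP).
t1 = Tree.fromstring(
    "(S (NP (N Salesmen)) (VP (V sold) (NP (Det the) (N dog) (N biscuits))))")

# Reading 2: the salesmen sold biscuits to the dog (two separate NPs).
t2 = Tree.fromstring(
    "(S (NP (N Salesmen)) (VP (V sold) (NP (Det the) (N dog)) (NP (N biscuits))))")

t1.pretty_print()
t2.pretty_print()
```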
43
Language Syntax – Examples
“The boy likes the cars”
(S (NP (Det The) (N boy)) (VP (V likes) (NP (Det the) (N cars))))
45
Recursive Neural Unit
𝒉 = tanh(𝑾 [𝒙₁; 𝒙₂])
[Diagram: the child vectors 𝒙₁ and 𝒙₂ (300-d each, e.g. “The” and “boy”) are concatenated into a 600-d vector, multiplied by 𝑾, and passed through tanh to give the 300-d parent vector 𝒉.]
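A minimal NumPy sketch of this unit, computing the parent vector of “The boy” from its two children (the weight matrix is random; dimensions follow the slide):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 300
W = rng.normal(scale=0.1, size=(d, 2 * d))  # maps the 600-d concatenation back to 300-d

x1 = rng.normal(size=d)   # vector for "The"
x2 = rng.normal(size=d)   # vector for "boy"

# h = tanh(W [x1; x2]): parent representation of the phrase "The boy"
h = np.tanh(W @ np.concatenate([x1, x2]))
print(h.shape)   # (300,)
```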
46
Resources
47
Resources
▪ spaCy (https://spacy.io/)
▪ POS tagger, dependency parser, etc.
▪ Berkeley Neural Parser (Constituency Parser)
▪ Software: https://github.com/nikitakit/self-attentive-parser
▪ Demo: https://parser.kitaev.io/
▪ Stanford NLP software
▪ Stanza: https://stanfordnlp.github.io/stanza/index.html
▪ Others (some are outdated):
https://nlp.stanford.edu/software/
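A hedged sketch of spaCy for POS tagging and dependency parsing (assumes the en_core_web_sm model has been installed via python -m spacy download en_core_web_sm):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The boy likes the cars")

for token in doc:
    # Word, part-of-speech tag, dependency label, and syntactic head.
    print(token.text, token.pos_, token.dep_, token.head.text)

# Noun phrases (noun chunks) found by the parser.
print([chunk.text for chunk in doc.noun_chunks])
```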
48
Word Representation Resources
Word-level representations:
Word2Vec (Google, 2013) – https://code.google.com/archive/p/word2vec/
GloVe (Stanford, 2014) – https://nlp.stanford.edu/projects/glove/ (factorizes the word co-occurrence matrix)
FastText (Facebook, 2017) – https://fasttext.cc/ (uses sub-word information, e.g. “walk” and “walking” are similar)
Contextual representations (word representations are contextualized using all the words in the sentence; also give sentence representations):
ELMo (Allen Institute for AI, 2018) – https://allennlp.org/elmo
BERT (Google, 2018) – https://github.com/google-research/bert
RoBERTa (Facebook, 2019) – https://github.com/pytorch/fairseq
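A hedged sketch of extracting contextualized word representations with the Hugging Face transformers library (one of several ways to use the BERT checkpoints linked above; bert-base-uncased is assumed and downloaded on first use):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("He was walking away because it was late", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One vector per (sub)word token, contextualized by the whole sentence.
print(outputs.last_hidden_state.shape)   # (1, num_tokens, 768)
```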
Lexicon-based Word Representation
50
Other Lexicon Resources
Lexicons
• General Inquirer (Stone et al., 1966)
• OpinionFinder lexicon (Wiebe & Riloff, 2005)
• SentiWordNet (Esuli & Sebastiani, 2006)
• LIWC (Pennebaker)
Other Tools
• LightSIDE
• Stanford NLP toolbox
• IBM Watson Tone Analyzer
• Google Cloud Natural Language
• Microsoft Azure Text Analytics
51