Probing Knowledge and Structure in Transformers
AAAI Tutorial: Transformers
Outline
• Attention Visualization
• Probing Classifiers & Interventions
• “Mechanistic” Analyses
Attention Visualization
What is attention visualization?
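What attention visualization tools render is the per-head matrix of softmax attention weights. A minimal sketch of that quantity (toy shapes, random matrices standing in for learned query/key projections; purely illustrative):

```python
# Minimal sketch of the quantity attention visualizations display:
# the softmax attention weights of a single head. Toy shapes; no real model.
import numpy as np

rng = np.random.default_rng(4)
n_tokens, d_k = 5, 8
Q = rng.normal(size=(n_tokens, d_k))   # stand-in query vectors
K = rng.normal(size=(n_tokens, d_k))   # stand-in key vectors

# Scaled dot-product scores, then a row-wise softmax.
scores = Q @ K.T / np.sqrt(d_k)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Each row is a distribution over attended-to tokens; this matrix is
# what visualization tools render as heatmaps or weighted lines.
assert np.allclose(weights.sum(axis=-1), 1.0)
```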
• Attention Visualization
• Probing Classifiers and Interventions
• “Mechanistic” Analyses
Probing Classifiers
What is a probing classifier?
[Diagram: the pretrained model's layers are frozen; only the probing classifier trained on top is updated.]
Designing and Interpreting Probes with Control Tasks. Hewitt and Liang (EMNLP 2019)
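Concretely, the recipe can be sketched in a few lines: the encoder's weights stay frozen, and only a small linear classifier is trained on its representations. Everything below (the random "encoder", the synthetic task, the shapes) is an illustrative stand-in, not any real pretrained model.

```python
# Hypothetical sketch of a probing classifier: the "encoder" is frozen
# (a fixed random projection here), and only a linear probe is trained.
import numpy as np

rng = np.random.default_rng(0)

# Frozen "encoder": fixed random projection standing in for a
# pretrained transformer's hidden states. Never updated below.
W_frozen = rng.normal(size=(32, 16)) * 0.1

def encode(x):
    return np.tanh(x @ W_frozen)

# Synthetic probing task: a binary property of the input.
X = rng.normal(size=(200, 32))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
H = encode(X)                               # frozen representations

# Linear probe trained with plain gradient descent on logistic loss.
w = np.zeros(16)
b = 0.0
lr = 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(H @ w + b)))  # sigmoid predictions
    w -= lr * H.T @ (p - y) / len(y)
    b -= lr * np.mean(p - y)

acc = np.mean((p > 0.5) == (y > 0.5))
print(f"probe accuracy: {acc:.2f}")
```

If the probe does well, the property is (linearly) decodable from the frozen representations; Hewitt and Liang's point is that a control task is needed to check the probe is not learning the task itself.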
Probing Classifiers
Results on Transformer Language Models
The important thing about Disney is that it is a global brand.
Probing Sentence Structure in Contextualized Word Representations. Tenney et al. (ICLR 2019)
Probing Classifiers & Interventions
Results on Transformer Language Models
[Bar chart: probing accuracy per task (POS, Consts., Deps., Entities, SRL, Coref., SPR1, SPR2, Winograd), comparing a word-prior baseline against the full model.]
Probing Sentence Structure in Contextualized Word Representations. Tenney et al. (ICLR 2019)
Probing Classifiers & Interventions
Results on Transformer Language Models
Large amounts of non-trivial linguistic information encoded.
[Bar chart: change in probing accuracy vs. the word-prior baseline per task; the full model gains up to 16 points over the baseline.]
Probing Sentence Structure in Contextualized Word Representations. Tenney et al. (ICLR 2019)
Probing Classifiers & Interventions
Results on Transformer Language Models
[Screenshot of Figure 2: layer-wise metrics on BERT-large. Solid (blue) bars are mixing weights; outlined (purple) bars are differential scores, normalized for each task. Horizontal axis: layer in the network; bar height indicates the importance of each layer in the decision.]
Probing Sentence Structure in Contextualized Word Representations. Tenney et al. (ICLR 2019)
Probing Classifiers & Interventions
Results on Transformer Language Models
Roughly: higher-level information gets encoded later in the network.
[Screenshot of Figure 3: probing classifier predictions across layers of BERT-base for "(b) china today blacked out a cnn interview that was ...". Blue is the correct label; orange is the incorrect label with the highest average score over layers. Bar heights are (normalized) probabilities. For example, the model initially tags "today" as a common noun/date/temporal modifier (ARGM-TMP), and the correct abstraction is resolved gradually over later layers.]
Probing Sentence Structure in Contextualized Word Representations. Tenney et al. (ICLR 2019)
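The layer-wise picture can be illustrated with a toy, hand-built network (hypothetical; no real transformer involved): XOR is not linearly decodable from the raw input, but becomes linearly decodable after one frozen ReLU layer, so a linear probe succeeds only at the deeper layer.

```python
# Toy illustration of layer-wise probing: a "higher-level" feature (XOR)
# only becomes linearly decodable after a frozen nonlinear layer.
import numpy as np

X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
y = np.array([0., 1., 1., 0.])                      # XOR labels

def layer1(x):
    # Frozen hand-built layer: relu(x1 - x2), relu(x2 - x1).
    return np.maximum(x @ np.array([[1., -1.], [-1., 1.]]), 0.0)

def train_linear_probe(H, y, steps=2000, lr=0.5):
    w = np.zeros(H.shape[1]); b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
        w -= lr * H.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
    return np.mean((p > 0.5) == (y > 0.5))

# Probe each "layer" of the frozen network with the same linear probe.
acc_by_layer = [train_linear_probe(X, y), train_linear_probe(layer1(X), y)]
print(acc_by_layer)   # the probe only succeeds at the deeper layer
```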
Probing Classifiers & Interventions
Results on Transformer Language Models
What Happens to BERT Embeddings During Fine-tuning? Merchant et al. (BlackboxNLP Workshop 2020)
Probing Interventions
What are probing interventions?
Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals. Elazar et al. (TACL 2021)
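A minimal sketch of the amnesic idea, under simplifying assumptions: the actual method (INLP, as used by Elazar et al.) iteratively learns probe directions, whereas here the property direction is given as an oracle. Projecting representations onto that direction's nullspace makes the property linearly unrecoverable, and one can then test how model behavior changes.

```python
# Hypothetical sketch of an amnesic-style intervention: remove a linearly
# encoded property from representations via a nullspace projection.
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=(500, 8))           # stand-in representations
prop = (H[:, 0] > 0).astype(float)      # "property" carried by direction e0

# Direction an (oracle) linear probe would use for the property.
d = np.zeros(8); d[0] = 1.0

# Projection matrix onto the nullspace of d.
P = np.eye(8) - np.outer(d, d) / (d @ d)
H_amnesic = H @ P

# After projection the property direction reads as zero, while the
# remaining coordinates are untouched.
assert np.allclose(H_amnesic[:, 0], 0.0)
```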
Probing Interventions
Results on Transformer Language Models
What if This Modified That? Syntactic Interventions with Counterfactual Embeddings. Tucker et al. (Findings of ACL 2021)
Probing Classifiers & Interventions
Summary
• Attention Visualization
• Probing Classifiers & Interventions
• “Mechanistic” Analyses
Mechanistic Analyses
What are “mechanistic” analyses?
Transformer Feed-Forward Layers Are Key-Value Memories. Geva et al. (EMNLP 2021)
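Geva et al.'s key-value reading of the feed-forward layer can be sketched as follows (toy shapes and random weights, purely illustrative): the first weight matrix holds keys matched against the input, the activation gives memory coefficients, and the output is a coefficient-weighted sum of value vectors from the second matrix.

```python
# Sketch of the key-value view of a transformer FFN:
# FFN(x) = f(x @ K.T) @ V, with rows of K as keys and rows of V as values.
import numpy as np

rng = np.random.default_rng(2)
d_model, d_ff = 4, 6
K = rng.normal(size=(d_ff, d_model))    # keys (first FFN weight matrix)
V = rng.normal(size=(d_ff, d_model))    # values (second FFN weight matrix)

def ffn(x):
    coeffs = np.maximum(x @ K.T, 0.0)   # memory coefficients (ReLU)
    return coeffs @ V                    # weighted sum of value vectors

x = rng.normal(size=d_model)
out = ffn(x)

# The output is exactly the coefficient-weighted sum of the value rows.
coeffs = np.maximum(x @ K.T, 0.0)
assert np.allclose(out, sum(c * v for c, v in zip(coeffs, V)))
```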
Mechanistic Analyses
Results on Transformers
Manipulating these MLPs can enable counterfactual interventions.
Locating and Editing Factual Associations in GPT. Meng et al. (NeurIPS 2022)
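The rank-one mechanics behind this kind of edit can be sketched as follows; this is a simplification of ROME (which uses a covariance-weighted update and causal tracing to choose the layer), not the paper's actual method. The idea: modify one MLP weight matrix so a chosen key maps to a new value, leaving orthogonal keys untouched.

```python
# Simplified rank-one "fact edit": choose W' = W + (v* - W k) k^T / (k^T k)
# so that the key k now maps to the desired value v*.
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(5, 5))       # stand-in MLP weight matrix
k = rng.normal(size=5)            # key (e.g., a subject representation)
v_star = rng.normal(size=5)       # desired value (the new association)

W_edited = W + np.outer(v_star - W @ k, k) / (k @ k)

# The edited "fact" is recalled exactly for this key...
assert np.allclose(W_edited @ k, v_star)
# ...while keys orthogonal to k are mapped as before.
```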
Mechanistic Analyses
Summary