
Probing Knowledge and Structure in Transformers

This document summarizes common methods for analyzing what information is stored in Transformer models, including attention visualization, probing classifiers, and mechanistic analyses. Probing classifiers involve training additional models to predict properties like part-of-speech tags based on representations from a Transformer. Studies show Transformers capture linguistic information like syntax but the representations are not entirely interpretable.


Probing Knowledge and Structure in Transformers
AAAI Tutorial: Transformers

Ellie Pavlick, Brown University, February 2023


Goals

• Methods for answering "What does my Transformer know? And where is that information stored?"
• Survey several common methods for analyzing Transformers
• Summarize some key findings about how Transformers work
Limitations

• My expertise is in language, so my examples will be mostly drawn from NLP. Similar trends likely hold in vision, RL, etc.
• There is a lot of very active work on analyzing large Transformer language models; I am only covering the basics
• There is much more that we do not know than that we do know :)
Outline

• Attention Visualization
• Probing Classifiers & Interventions
• “Mechanistic” Analyses
Attention Visualization
What is attention visualization?

• When trying to determine what your Transformer is doing, analyzing attention is usually the first stop!
• Works like "feature heatmaps" in vision
• Widely agreed to not be decisive evidence of anything causal, but still commonly used (you will see it in many papers)
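To make the object being visualized concrete, here is a minimal numpy sketch: attention weights are softmax-normalized query-key dot products, so each row of the attention matrix is a probability distribution over which tokens a position attends to. The queries and keys below are random stand-ins; a real visualization would extract them from a trained model.

```python
import numpy as np

def attention_weights(Q, K):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)), row-wise."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

tokens = ["<s>", "the", "car", "is", "not", "blue"]
rng = np.random.default_rng(0)
Q = rng.normal(size=(len(tokens), 8))  # stand-in queries (one per token)
K = rng.normal(size=(len(tokens), 8))  # stand-in keys

A = attention_weights(Q, K)
# Each row of A sums to 1; plotting A as a tokens x tokens heatmap is
# exactly the "attention visualization" shown in papers.
for tok, row in zip(tokens, A):
    print(f"{tok:>5}", np.round(row, 2))
```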
Outline

• Attention Visualization
• Probing Classifiers and Interventions
• “Mechanistic” Analyses
Probing Classifiers
What is a probing classifier?

[Diagram, built up over several slides: a frozen Transformer language model maps the input "<s> the car is not blue" to "the car is not blue </s>". A small classifier (e.g., an MLP) is trained on top of the frozen representations to predict properties such as:
• Noun? (e.g., linguistic knowledge)
• Modifier? (e.g., linguistic knowledge)
• Is it smaller than a breadbox? (e.g., commonsense knowledge)]
Probing Classifiers
What is a probing classifier?

• Training a probe requires auxiliary data that captures the phenomenon you are interested in
  • And requires knowing what you are looking for in advance!
• Prone to false positives, but there are methods to minimize this
  • Typically people choose a "small" classifier, but this is quite informal
  • Control Tasks (Hewitt and Liang, 2019)
  • Information-Theoretic Probing (Voita et al., 2020)
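The recipe can be sketched end-to-end in a few lines. Here the "frozen representations" are synthetic numpy vectors with a noun-like feature planted along one direction, standing in for real hidden states, and the probe is a single logistic-regression layer trained on top:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen Transformer representations: a binary feature
# (e.g., "is this token a noun?") is noisily encoded along one direction.
n, d = 500, 32
w_true = rng.normal(size=d)
H = rng.normal(size=(n, d))                       # frozen hidden states
y = (H @ w_true + 0.1 * rng.normal(size=n) > 0)   # auxiliary labels

# The probe: one logistic-regression layer trained on top of frozen H.
w = np.zeros(d)
for _ in range(200):
    p = 1 / (1 + np.exp(-(H @ w)))
    w -= 0.5 * H.T @ (p - y) / n   # gradient step on cross-entropy loss

acc = ((H @ w > 0) == y).mean()
print(f"probe accuracy: {acc:.2f}")  # high accuracy -> feature is linearly decodable
```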
Probing Classifiers
Minimizing False Positives

Designing and Interpreting Probes with Control Tasks. Hewitt and Liang (EMNLP 2019)
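The control-task idea can be sketched as follows: train the same probe twice, once on the real labels and once on labels with no connection to the representations, and report the gap ("selectivity"). This toy version uses synthetic representations and per-example random labels; Hewitt and Liang's actual control tasks assign random labels per word type.

```python
import numpy as np

rng = np.random.default_rng(1)

def probe_accuracy(H, y, steps=200, lr=0.5):
    """Fit a linear logistic probe and return its training accuracy."""
    w = np.zeros(H.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(H @ w)))
        w -= lr * H.T @ (p - y) / len(y)
    return ((H @ w > 0) == y).mean()

n, d = 500, 32
w_true = rng.normal(size=d)
H = rng.normal(size=(n, d))          # stand-in for frozen representations
y_real = (H @ w_true > 0)            # a genuinely encoded feature
y_control = rng.random(n) > 0.5      # control task: labels assigned at random

selectivity = probe_accuracy(H, y_real) - probe_accuracy(H, y_control)
print(f"selectivity: {selectivity:.2f}")
# High selectivity -> the probe reads real structure rather than memorizing.
```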
Probing Classifiers & Interventions
Results on Transformer Language Models

"The important thing about Disney is that it is a global brand."
Probe predictions: Noun / Verb / Noun; Entity / Not Ent.

Probing Sentence Structure in Contextualized Word Representations. Tenney et al. (ICLR 2019)
Probing Classifiers & Interventions
Results on Transformer Language Models

[Bar chart: probe accuracy per task for a word-prior baseline vs. the full model, across tasks such as POS, constituents, dependencies, entities, SRL, coreference, SPR1, and SPR2. The full model scores higher than the baseline across tasks.]

Probing Sentence Structure in Contextualized Word Representations. Tenney et al. (ICLR 2019)
Probing Classifiers & Interventions
Results on Transformer Language Models

Large amounts of non-trivial linguistic information encoded.

[Bar chart: change in probe accuracy relative to the word-prior baseline, per task.]

Probing Sentence Structure in Contextualized Word Representations. Tenney et al. (ICLR 2019)
Probing Classifiers & Interventions
Results on Transformer Language Models

[Figure 2 from the paper: layer-wise metrics on BERT-large. Solid (blue) bars are mixing weights; outlined (purple) bars are differential scores, normalized for each task. Horizontal axis: layer in the network; bar height indicates the importance of the layer in the decision.]

Probing Sentence Structure in Contextualized Word Representations. Tenney et al. (ICLR 2019)
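The "mixing weights" in that figure come from the paper's scalar-mix probe: the probe learns one scalar per layer, softmaxes them, and pools the layers with those weights, so the learned weights indicate which layers a task draws on. A rough numpy sketch, with random activations standing in for real per-layer BERT states:

```python
import numpy as np

def scalar_mix(layer_states, a, gamma=1.0):
    """Pool per-layer states with softmax(a); the resulting weights are
    the 'mixing weights' plotted per task in the figure."""
    w = np.exp(a - a.max())
    w /= w.sum()                         # one weight per layer, sums to 1
    return gamma * np.tensordot(w, layer_states, axes=1), w

rng = np.random.default_rng(0)
L, T, d = 12, 6, 16                      # layers, tokens, hidden size
layer_states = rng.normal(size=(L, T, d))
a = rng.normal(size=L)                   # learned scalars, one per layer

mixed, w = scalar_mix(layer_states, a)
print(mixed.shape)      # (6, 16): one pooled vector per token
print(np.round(w, 2))   # the layer-importance profile
```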
Probing Classifiers & Interventions
Results on Transformer Language Models

Roughly: higher-level information gets encoded later in the network.

[Figure 3 from the paper: probing classifier predictions across layers of BERT-base for "(b) china today blacked out a cnn interview that was ...". Blue is the correct label; orange is the incorrect label with the highest average score over layers. Bar heights are normalized probabilities; only select tasks shown. For example, the model initially tags "today" as a common noun/date/temporal modifier (ARGM-TMP) before resolving it in later layers.]

Probing Sentence Structure in Contextualized Word Representations. Tenney et al. (ICLR 2019)
Probing Classifiers & Interventions
Results on Transformer Language Models

What Happens to BERT Embeddings During Finetuning? Merchant et al. (BlackboxNLP Workshop 2020)
Probing Interventions
What are probing interventions?

• Probing classifiers show you that a feature is represented
• It is purely correlative, so doesn't speak to whether the full model uses that feature
• Newer efforts have used the probe to run interventions
• Note: the word "causal" is debated in relation to these kinds of studies
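The simplest such intervention can be sketched as one projection step in the style of amnesic probing / INLP (real uses iterate over several probe directions): take the direction a trained probe found, project the representations onto its nullspace, and re-run the model to see whether behavior changes. Here `w_probe` is a random stand-in for a learned probe direction:

```python
import numpy as np

def remove_direction(H, w):
    """Project representations onto the nullspace of direction w,
    erasing the feature that is linearly decodable along it."""
    u = w / np.linalg.norm(w)
    return H - np.outer(H @ u, u)

rng = np.random.default_rng(0)
H = rng.normal(size=(200, 16))        # hidden states (stand-ins)
w_probe = rng.normal(size=16)         # direction found by a trained probe

H_amnesic = remove_direction(H, w_probe)

# The feature is no longer linearly readable along w_probe ...
print(np.abs(H_amnesic @ w_probe).max())  # ~0 (up to float error)
# ... and feeding H_amnesic through the rest of the model tests whether
# downstream behavior actually used that feature.
```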
Probing Interventions
What are probing interventions?

Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals. Elazar et al. (TACL 2021)

Probing Interventions
Results on Transformer Language Models

What if This Modified That? Syntactic Interventions via Counterfactual Embeddings. Tucker et al. (Findings of ACL 2021)
Probing Classifiers & Interventions
Summary

• Used to detect if a feature is "represented" and where
• Primary Limitations:
  • Probes alone are purely correlative and prone to false positives (though there are controls)
  • Methods for "causal"* intervention are under development but not yet standard
  • Requires data specifying what to look for upfront
• Key Findings:
  • Transformer language models represent a significant amount of linguistic information
  • This information is often organized across layers in an interpretable way
  • Ablating the information often leads to expected changes in downstream behavior
Outline

• Attention Visualization
• Probing Classifiers & Interventions
• “Mechanistic” Analyses
Mechanistic Analyses
What are "mechanistic" analyses?

• Work by Chris Olah and others at Anthropic popularized this term
• Typically requires very manual and architecture-specific analysis of weights in the network
• But results are very low-level interpretations of what the model does and where
Mechanistic Analyses
Results on Transformers

"Induction heads," which copy the next token from previous points in the context.

A Mathematical Framework for Transformer Circuits. Elhage et al. (Transformer Circuits Thread, 2021)
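The copying behavior can be written out as a plain function. This is a toy version of the algorithm the induction-head circuit implements, where one attention head matches the previous occurrence of the current token and another copies what followed it:

```python
def induction_head(seq):
    """Toy induction-head behavior: to guess the next token, find the most
    recent earlier occurrence of the current (last) token and copy what
    followed it.  Pattern: [A][B] ... [A] -> predict [B]."""
    cur = seq[-1]
    for i in range(len(seq) - 2, -1, -1):   # scan the context backwards
        if seq[i] == cur:
            return seq[i + 1]               # copy the token that followed
    return None                             # no earlier occurrence

print(induction_head(["Harry", "Potter", "said", "Harry"]))  # -> Potter
print(induction_head(["the", "cat", "sat"]))                 # -> None
```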


Mechanistic Analyses
Results on Transformers

Individual neurons respond to specific key patterns, to produce a distribution over words.

Transformer Feed-Forward Layers Are Key-Value Memories. Geva et al. (EMNLP 2021)
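The key-value reading of a feed-forward layer can be sketched directly: the first weight matrix holds "keys" (patterns neurons detect), the second holds "values" (distributions over output words to add), and the ReLU activations act as memory coefficients. Orthogonal keys make this toy example exact:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_keys, vocab = 8, 4, 5

# Toy memory with orthogonal keys so the recall is exact.
K = np.eye(n_keys, d)                   # each row: a pattern a neuron detects
V = rng.normal(size=(n_keys, vocab))    # each row: that neuron's output-vocab shift

def ffn(x):
    """Feed-forward layer viewed as a key-value memory (Geva et al.):
    neuron activations measure how well the input matches each key;
    the output is the activation-weighted sum of the value vectors."""
    m = np.maximum(x @ K.T, 0.0)        # ReLU "memory coefficients"
    return m @ V

x = 2.0 * K[2]                          # an input that matches key 2
out = ffn(x)
print(np.allclose(out, 2.0 * V[2]))     # the layer recalls value 2
```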
Mechanistic Analyses
Results on Transformers

Manipulating these MLPs can enable counterfactual interventions.

Locating and Editing Factual Associations in GPT. Meng et al. (NeurIPS 2022)
Mechanistic Analyses
Summary

• A promising new line of work looks at the low-level algorithms in play within the Transformer
• These studies have revealed, e.g., that:
  • Attention heads can act as generic copy functions which move information around in the network
  • Feedforward layers can act as key-value stores which recall a distribution over output words for a given input concept
• Main limitation is that this type of work tends to be very manual and architecture-specific
General Discussion
Summary

• Work on analyzing neural networks (Transformers specifically) has become increasingly less "black box"
• There is a lot we don't know about how the models work, but there is also a lot we do know!
• Transformer LMs capture a lot of conceptual and linguistic knowledge and organize it in meaningful ways
• Their attention+MLP mechanisms appear to work together to copy information and perform abstract lookups from memory
• We can sometimes intervene on individual concepts or components to manipulate the Transformer's behavior
General Discussion
Future Directions

• Still a lot of work needed to better understand mechanisms, and to understand the behavior of large models "in the wild"
• Disagreement remains on when it is appropriate to attribute "causality" to some/all of the model's parameters
• Most work is in NLP, but Transformers are being applied to many domains!
• Converging (or diverging!) evidence across domains would significantly move the field(s) forward
