NLP Week9 Fine Tuning - and - IR
6. Group exercises
Problem with static embeddings (word2vec)
They are static! The embedding for a word doesn't reflect how its meaning changes in context.
The chicken didn't cross the road because it was too tired ("it" = the chicken)
The chicken didn't cross the road because it was too wide ("it" = the road)
Devlin et al. 2018 - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Sentence classification
Sequence-Pair classification
Using BERT for sequence labeling
• Do models truly learn the task?
McCoy et al. (2019) - Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
Mosbach et al. (2021) - On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Practical considerations: Fine-tuning (in)stability
Practical considerations: Parameter-efficient fine-tuning
• There are three main types of embeddings used in NLP, created using three different methods:
tf(t, d) = 1 + log10(count(t, d))   if count(t, d) > 0
tf(t, d) = 0                        otherwise
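The piecewise tf definition above translates directly into code; a minimal sketch (the helper name `log_tf` is mine, not from the slides):

```python
import math

def log_tf(count: int) -> float:
    """Log-scaled term frequency: 1 + log10(count) if the term occurs in the
    document, 0 otherwise."""
    return 1 + math.log10(count) if count > 0 else 0.0
```

Note that the log scaling means a term occurring 100 times contributes a weight of 3, not 100: raw counts overstate how much more relevant a frequent term makes a document.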
Information retrieval (recall from Week 5)
• A term that occurs in many documents is not very discriminative. In many collections the raw document frequency df_t (the number of documents containing term t) is therefore squashed with a log function. The resulting definition of inverse document frequency (idf) is thus:
idf_t = log10(N / df_t)   (6.13)
(N = total number of documents in the collection)
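The idf formula is equally direct; a sketch with illustrative parameter names (`num_docs` for N, `doc_freq` for df_t):

```python
import math

def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency: log10(N / df_t). A term occurring in every
    document gets idf 0; rarer terms get larger weights."""
    return math.log10(num_docs / doc_freq)
```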
score(q, d) = cos(q, d) = (q ⋅ d) / (|q| |d|)
tf-idf scoring: represent q and d as tf-idf vectors and rank by score(q, d) = cos(q, d) = (q ⋅ d) / (|q| |d|)
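Putting the pieces together, the cosine score can be sketched over sparse tf-idf vectors represented as term-to-weight dicts (this representation is my assumption, not from the slides):

```python
import math

def cosine(q: dict, d: dict) -> float:
    """Cosine similarity between two sparse tf-idf vectors (term -> weight):
    dot product divided by the product of the vector norms."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0
```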
Information retrieval with pre-trained LMs
• Cross-encoder: encode the query and the document jointly
z = BERT(q; [SEP]; d)[CLS]
score(z) = softmax(Uz)
• This approach is expensive: we need to re-encode every document for each query
• This quickly becomes impractical when dealing with large document collections
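The cross-encoder scoring pattern above can be sketched with NumPy, using a deterministic random stand-in for BERT(q; [SEP]; d)[CLS]; all names, shapes, and the 2-class softmax head are illustrative assumptions:

```python
import numpy as np

HIDDEN = 8  # toy hidden size; BERT-base would use 768
U = np.random.default_rng(0).standard_normal((2, HIDDEN))  # [CLS] -> 2 logits

def encode_pair(query: str, doc: str) -> np.ndarray:
    """Stand-in for BERT(q; [SEP]; d)[CLS]: a toy pseudo-embedding of the pair."""
    seed = abs(hash((query, doc))) % (2**32)
    return np.random.default_rng(seed).standard_normal(HIDDEN)

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def score(query: str, doc: str) -> float:
    """P(relevant). Note the cost: the pair encoder runs once per (query, doc),
    so every document must be re-encoded for every new query."""
    z = encode_pair(query, doc)
    return float(softmax(U @ z)[1])
```

The expense the slide mentions is visible in the signature: `encode_pair` takes both the query and the document, so nothing about a document can be precomputed independently of the query.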
Information retrieval with pre-trained LMs
• Bi-encoder: encode queries and documents independently
zq = BERTquery(q)[CLS]
zd = BERTdoc(d)[CLS]
score(q, d) = zq ⋅ zd
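The bi-encoder's efficiency comes from precomputing all document embeddings offline; a toy sketch (the `embed` stand-in replaces the two BERT encoders, and all names are illustrative):

```python
import numpy as np

HIDDEN = 8

def embed(text: str) -> np.ndarray:
    """Toy stand-in for BERT_query(q)[CLS] / BERT_doc(d)[CLS]."""
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).standard_normal(HIDDEN)

docs = ["doc one", "doc two", "doc three"]
doc_matrix = np.stack([embed(d) for d in docs])  # computed once, offline

def retrieve(query: str, k: int = 2) -> list:
    """At query time only the query is encoded; scoring the whole collection
    is a single matrix-vector product of dot-product scores."""
    scores = doc_matrix @ embed(query)
    return [docs[i] for i in np.argsort(-scores)[:k]]
```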
Information retrieval with pre-trained LMs
zq = BERTquery(q)[CLS]
zd = BERTdoc(d)[CLS]
score(q, d) = zq ⋅ zd
• Documents are encoded in advance
• When a new query comes in, we only have to encode that query
• Much more efficient, but tends to give worse results
• Question: How to combine the two approaches?
Information retrieval with pre-trained LMs
• Answer: re-ranking
• Use the efficient bi-encoder to retrieve a small set of top candidate documents, then re-score only those candidates with the more accurate cross-encoder
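The re-ranking idea can be sketched as a two-stage pipeline: a cheap dot-product shortlist followed by an expensive joint re-score (both encoders here are toy stand-ins, and all names are mine):

```python
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy stand-in for a [CLS] embedding (deterministic per string)."""
    return np.random.default_rng(abs(hash(text)) % 2**32).standard_normal(dim)

def cross_score(query: str, doc: str) -> float:
    """Toy stand-in for the expensive joint cross-encoder score."""
    return float(embed(query + " [SEP] " + doc).sum())

def rerank(query: str, docs: list, k: int = 3) -> str:
    # Stage 1: cheap bi-encoder dot products over the whole collection
    doc_matrix = np.stack([embed(d) for d in docs])
    shortlist = np.argsort(-(doc_matrix @ embed(query)))[:k]
    # Stage 2: expensive cross-encoder on only the k shortlisted documents
    return max((docs[i] for i in shortlist), key=lambda d: cross_score(query, d))
```

The design point: the cross-encoder's cost now scales with k, not with the collection size, while its accuracy still decides the final ranking.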
Information retrieval with pre-trained LMs
• Question Answering (QA) as a retrieval problem
• Find supporting documents for a query
• Extract or generate an answer based on the top-k docs
Information retrieval with pre-trained LMs
• Consider the document on the right
• There exist many queries for which this document might be relevant:
• When was HEC founded?
• Can I do a PhD at HEC?
• What's Viger Square?
• Etc.
• Ideally, all of this needs to be encoded in the document representation
How to improve dense retrieval?
• Use a collection {(qi, di+, di−)}i=1…n of queries paired with positive and negative documents
• Learn an implicit relevance definition based on this collection via contrastive learning
• Intuition: move positive documents close to the query, push negative documents away from the query
ℒ(qi, di+, {di,j−}j=1…k) = −log [ exp(qi ⋅ di+) / ( exp(qi ⋅ di+) + Σj=1…k exp(qi ⋅ di,j−) ) ]
Karpukhin et al. (2020) - Dense Passage Retrieval for Open-Domain Question Answering
Dense Passage Retrieval for Open-Domain QA
Thakur et al. (2021) - BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Practical considerations
• How to efficiently search over dense representations?
• Approximate nearest neighbour search
• Fast embedding search on GPUs with FAISS (Facebook AI Similarity Search)
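As a reference point for what FAISS accelerates, exact maximum-inner-product search over a dense index is a sketch like the following (collection size and dimension are arbitrary toy choices):

```python
import numpy as np

# Toy "index": precomputed document embeddings, one row per document
index = np.random.default_rng(3).standard_normal((10_000, 64)).astype(np.float32)

def search(query_vec: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact inner-product search: score every document, keep the top k.
    FAISS's flat index does this exactly; its approximate indexes (e.g.
    inverted files, HNSW graphs) trade exactness for sub-linear query time."""
    scores = index @ query_vec
    top = np.argpartition(-scores, k)[:k]       # unordered top-k candidates
    return top[np.argsort(-scores[top])]        # top-k ids, best first
```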
[15 minute break]