29 - Khattab - CS224U IR Part 4

NLU & IR: NEURAL IR (II)

Omar Khattab

CS224U: Natural Language Understanding

Spring 2021
Neural Ranking: Functional View

▪ All we need is a score for every query–document pair
– We’ll sort the results by decreasing score (sketched in code below)

(Figure: the Neural Ranker assigns a score to each query–document pair.)

Q: What compounds in the stomach protect against ingested pathogens?

D1 (Immune System | Wikipedia): “Chemical barriers also protect against infection. The skin and respiratory tract secrete antimicrobial peptides such as the β-defensins. […] In the stomach, gastric acid serves as a chemical defense against ingested pathogens.”
→ Neural Ranker score: 0.93

D99 (Why isn't this a syntax error in python? | Stack Overflow, https://stackoverflow.com/questions/23998026): “Noticed a line in our codebase today which I thought surely would have failed the build with syntax error. […] Whitespace is sometimes not required in the conditional expression `1if True else 0`”
→ Neural Ranker score: 0.01
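A minimal sketch of this functional view: any function that maps a (query, document) pair to a score induces a ranking. The word-overlap `score` here is a hypothetical stand-in for a trained neural ranker, included only so the sketch runs; swapping in any of the neural models below changes only `score`, while the sort stays the same.

```python
def score(query: str, document: str) -> float:
    # Hypothetical stand-in for a trained neural ranker: plain word overlap,
    # lightly normalized by document length.
    overlap = set(query.lower().split()) & set(document.lower().split())
    return len(overlap) / (len(document.split()) + 1)

def rank(query: str, documents: list[str]) -> list[tuple[float, str]]:
    # Score every query-document pair, then sort by decreasing score.
    return sorted(((score(query, d), d) for d in documents), reverse=True)
```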
Query–Document Interaction Models

1. Tokenize the query and the document
2. Embed all the tokens of each
3. Build a query–document interaction matrix
– Most commonly: store the cosine similarity of each pair of words
4. Reduce this dense matrix to a score
– Learn neural layers (e.g., convolution, linear layers); steps 2–4 are sketched below

(Figure: the query–document interaction matrix is reduced to a score s via Convolution, AvgPool, and an MLP.)

Models in this category include KNRM, Conv-KNRM, and Duet.

Chenyan Xiong, et al. End-to-end neural ad-hoc ranking with kernel pooling. SIGIR’17
Zhuyun Dai, et al. Convolutional neural networks for soft-matching n-grams in ad-hoc search. WSDM’18
Bhaskar Mitra, et al. Learning to match using local and distributed representations of text for web search. WWW’17
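A sketch of steps 2–4, assuming the query and document tokens have already been embedded as the rows of matrices `Q` and `D`. The max-then-mean reduction at the end is an untrained stand-in for the learned convolution/pooling/MLP layers that KNRM-style models actually train.

```python
import numpy as np

def interaction_score(Q: np.ndarray, D: np.ndarray) -> float:
    # Normalize rows so that dot products become cosine similarities.
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    M = Qn @ Dn.T  # interaction matrix: cosine similarity of every token pair
    # Untrained stand-in for the learned reduction layers: take each query
    # token's best document match, then average over query tokens.
    return float(M.max(axis=1).mean())
```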
Query–Document Interaction Models: MS MARCO Results

▪ Considerable gains in quality, at a reasonable increase in computational cost!

These models re-rank the top-1000 passages retrieved by BM25.

Bhaskar Mitra and Nick Craswell. An Updated Duet Model for Passage Re-ranking. arXiv:1903.07666 (2019)
Sebastian Hofstätter, et al. On the effect of low-frequency terms on neural-IR models. SIGIR’19
All-to-all Interaction with BERT

1. Feed BERT “[CLS] Query [SEP] Document [SEP]”
2. Run this through all the BERT layers
3. Extract the final [CLS] output embedding
– Reduce it to a single score through a linear layer

(Figure: Query and Document are concatenated, passed through BERT, and the [CLS] output is mapped to a score s.)

This is essentially a standard BERT classifier, used for ranking passages (sketched below).

Of course, we must fine-tune BERT for this task with positives and negatives for it to be effective.

Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage Re-ranking with BERT. arXiv:1901.04085 (2019)
Zhuyun Dai and Jamie Callan. 2019. Deeper Text Understanding for IR with Contextual Neural Language Modeling. SIGIR’19
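A sketch of this ranker using the Hugging Face `transformers` API. `num_labels=1` attaches a linear layer over the pooled [CLS] output; as the slide notes, the weights loaded here are untuned, so the scores are meaningless until the model is fine-tuned on positive and negative (query, passage) pairs.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# A single-logit classification head: linear layer over the pooled [CLS] output.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1
)

def bert_score(query: str, passage: str) -> float:
    # Builds "[CLS] query [SEP] passage [SEP]" and runs all BERT layers;
    # the head reduces the final [CLS] representation to one score.
    inputs = tokenizer(query, passage, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits.item()
```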
BERT Rankers: SOTA 2019 (in quality)

MS MARCO Ranking screenshot as of Jan 2019. From Rodrigo Nogueira’s Brief History of DL applied to IR (UoG talk).

https://blog.google/products/search/search-language-understanding-bert/
https://azure.microsoft.com/en-us/blog/bing-delivers-its-largest-improvement-in-search-experience-using-azure-gpus/
BERT Rankers: Efficiency–Effectiveness Tradeoff

▪ Dramatic gains in quality, but also a dramatic increase in computational cost! (Nogueira & Cho, 2019)

Can we achieve high MRR and low latency?

Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage Re-ranking with BERT. arXiv:1901.04085 (2019)
Toward Faster Ranking: Pre-computation

▪ BERT rankers are slow because their computations can be redundant:
– Represent the query (1000 times for 1000 documents!)
– Represent the document (once for every query!)
– Conduct matching between the query and the document

▪ We have the documents in advance.
– Can we pre-compute the document representations?
– And “cache” these representations for use across queries? (A sketch follows below.)

Is there unique value in jointly representing queries and documents?
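A sketch of the pre-computation idea. `encode_document` and `encode_query` are hypothetical stand-ins for learned encoders (here a trivial hashed bag-of-words, just so the sketch runs end to end); the point is that document vectors are computed once, offline, and cached across all queries.

```python
import numpy as np

def _encode(text: str, dim: int = 64) -> np.ndarray:
    # Hypothetical stand-in for a learned encoder: hashed bag of words.
    vec = np.zeros(dim)
    for tok in text.lower().split():
        vec[hash(tok) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

encode_document = encode_query = _encode

collection = {
    "D1": "gastric acid serves as a chemical defense against ingested pathogens",
    "D99": "whitespace is sometimes not required in the conditional expression",
}

# Offline: represent every document once; reuse ("cache") across all queries.
doc_cache = {doc_id: encode_document(text) for doc_id, text in collection.items()}

def search(query: str, k: int = 10):
    q = encode_query(query)  # online: represent the query once, not per document
    scored = [(float(q @ d), doc_id) for doc_id, d in doc_cache.items()]
    return sorted(scored, reverse=True)[:k]
```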
Neural IR Paradigms: Learning term weights

▪ BM25 decomposed a document’s score into a summation over term–document weights. Can we learn term weights with BERT?

▪ Tokenize the query/document
▪ Use BERT to produce a score for each token in the document
▪ Add the scores of the tokens that also appear in the query (sketched below)

In other words, score(q, d) = Σ over terms t in both q and d of weight(t, d), where weight(t, d) is produced by BERT offline.

(Figure: the learned term weights are saved to the inverted index; at query time, the weights of the matching terms are looked up from the inverted index and summed.)

Zhuyun Dai and Jamie Callan. “Context-aware term weighting for first stage passage retrieval.” SIGIR’20
Rodrigo Nogueira and Jimmy Lin. “From doc2query to docTTTTTquery.” Online preprint (2019).
Antonio Mallia, et al. “Learning Passage Impacts for Inverted Indexes.” SIGIR’21
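A sketch of query-time scoring under this paradigm. The index contents below are hypothetical; in DeepCT-style models, BERT produces these weights offline, one pass per document, and they are stored in an ordinary inverted index.

```python
# Hypothetical learned weights: term -> {doc_id: weight}, produced offline by BERT.
inverted_index = {
    "stomach":   {"D1": 2.7},
    "pathogens": {"D1": 3.1},
    "python":    {"D99": 2.2},
}

def score(query: str, doc_id: str) -> float:
    # Sum the stored weights of the document terms that also appear in the query.
    return sum(inverted_index.get(t, {}).get(doc_id, 0.0)
               for t in set(query.lower().split()))
```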
Learning term weights

▪ We get to learn the term weights with BERT and to re-use them!
▪ But our query is back to being a “bag of words”.

DeepCT and doc2query are two major models under this paradigm.

Can we do better?
Next: Can we achieve high MRR and low latency?

▪ Yes! We’ll discuss two rich neural IR paradigms:
– Representation Similarity
– Late Interaction
References

Omar Khattab and Matei Zaharia. “ColBERT: Efficient and effective passage search via contextualized late interaction over BERT.” SIGIR’20
Chenyan Xiong, et al. “End-to-end neural ad-hoc ranking with kernel pooling.” SIGIR’17
Zhuyun Dai, et al. “Convolutional neural networks for soft-matching n-grams in ad-hoc search.” WSDM’18
Bhaskar Mitra, et al. “Learning to match using local and distributed representations of text for web search.” WWW’17
Bhaskar Mitra and Nick Craswell. “An Updated Duet Model for Passage Re-ranking.” arXiv:1903.07666 (2019)
Sebastian Hofstätter, et al. “On the effect of low-frequency terms on neural-IR models.” SIGIR’19
Zhuyun Dai and Jamie Callan. “Deeper Text Understanding for IR with Contextual Neural Language Modeling.” SIGIR’19
Rodrigo Nogueira. “A Brief History of Deep Learning applied to Information Retrieval” (UoG talk). Retrieved from https://docs.google.com/presentation/d/1_mlvmyev0pjdG0OcfbEWManRREC0jCdjD3b1tPPvcbk
Zhuyun Dai and Jamie Callan. “Context-aware term weighting for first stage passage retrieval.” SIGIR’20
Rodrigo Nogueira and Jimmy Lin. “From doc2query to docTTTTTquery.” Online preprint (2019).
Antonio Mallia, et al. “Learning Passage Impacts for Inverted Indexes.” SIGIR’21
