29 - Khattab - CS224U IR Part 4
NEURAL IR (II)
Omar Khattab
Spring 2021
Neural Ranking: Functional View
▪ All we need is a score for every query–document pair
– We’ll sort the results by decreasing score
Example document D99 (from https://stackoverflow.com/questions/23998026):
“Noticed a line in our codebase today which I thought surely would have failed the build with syntax error. […] Whitespace is sometimes not required in the conditional expression `1if True else 0`”
[Figure: the Ranker maps each query–document pair to a score]
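A functional sketch of this view, where `score` is a hypothetical stand-in for any query–document scoring model discussed in this lecture:

```python
def rank(query, documents, score):
    # Score every query–document pair, then sort the results by decreasing score.
    return sorted(documents, key=lambda doc: score(query, doc), reverse=True)
```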
Query–Document Interaction Models
1. Tokenize the query and the document
2. Embed all the tokens of each
3. Build a query–document interaction matrix
– Most commonly: store the cos similarity of each pair of words
4. Reduce this dense matrix to a score
– Learn neural layers (e.g., convolution, linear layers); see the sketch below
[Figure: query and document embeddings → interaction matrix → Convolution → AvgPool → MLP → score s]
Bhaskar Mitra and Nick Craswell. An Updated Duet Model for Passage Re-ranking. arXiv:1903.07666 (2019)
Sebastian Hofstätter, et al. On the effect of low-frequency terms on neural-IR models. SIGIR’19
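A minimal sketch of steps 1–4 above, assuming PyTorch; the embedding table, convolution size, pooling, and output layer dimensions here are illustrative assumptions, not the exact architectures of the cited models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractionRanker(nn.Module):
    """Illustrative query–document interaction model (not a specific published system)."""
    def __init__(self, vocab_size=30000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)              # step 2: embed all tokens
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)   # step 4: convolution over the matrix
        self.mlp = nn.Linear(8, 1)                               # step 4: reduce pooled features to a score

    def forward(self, query_ids, doc_ids):
        # query_ids: (|q|,) token ids; doc_ids: (|d|,) token ids (step 1, tokenization, assumed done)
        q = F.normalize(self.embed(query_ids), dim=-1)           # (|q|, dim)
        d = F.normalize(self.embed(doc_ids), dim=-1)             # (|d|, dim)
        sim = q @ d.T                                            # step 3: cosine-similarity interaction matrix
        feats = self.conv(sim[None, None])                       # (1, 8, |q|, |d|)
        pooled = feats.mean(dim=(2, 3))                          # average pooling over the matrix
        return self.mlp(pooled).squeeze()                        # score s
```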
All-to-all Interaction with BERT
1. Feed BERT “[CLS] Query [SEP] Document [SEP]”
2. Run this through all the BERT layers, producing the score s (see the sketch below)
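A minimal sketch of this all-to-all setup, assuming the Hugging Face transformers library and a generic bert-base-uncased checkpoint with a one-output classification head; a real re-ranker would first be fine-tuned on relevance labels (e.g., MS MARCO).

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)

query = "is whitespace required in a python conditional expression"
document = "Whitespace is sometimes not required in the conditional expression, e.g. 1if True else 0"

# 1. Build "[CLS] Query [SEP] Document [SEP]" (the tokenizer inserts the special tokens).
inputs = tokenizer(query, document, return_tensors="pt", truncation=True)

# 2. Run the pair through all BERT layers; the head on the [CLS] output gives the score s.
with torch.no_grad():
    s = model(**inputs).logits.squeeze()
print(float(s))
```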
MS MARCO Ranking screenshot as of Jan 2019. From Rodrigo Nogueira’s Brief History of DL applied to IR (UoG talk).
https://blog.google/products/search/search-language-understanding-bert/
https://azure.microsoft.com/en-us/blog/bing-delivers-its-largest-improvementin-search-experience-using-azure-gpus/
BERT Rankers: Efficiency–Effectiveness Tradeoff
Rodrigo Nogueira and Kyunghyun Cho. Passage Re-ranking with BERT. arXiv:1901.04085 (2019)
Toward Faster Ranking: Pre-computation
▪ BERT rankers are slow because their computations can be redundant (see the sketch below):
– Represent the query (1000 times for 1000 documents)
– Represent the document (once for every query!)
– Conduct matching between the query and the document
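A rough sketch of what pre-computation buys, under the assumption of a simple single-vector design; `encode` is a hypothetical encoder stand-in, not a model from the lecture.

```python
import torch

def build_document_index(documents, encode):
    # Offline: represent every document once and store the vectors,
    # instead of re-encoding each document for every query.
    return torch.stack([encode(doc) for doc in documents])

def search(query, doc_vectors, encode, k=10):
    # Online: represent the query once (not once per candidate document),
    # then score all documents with cheap dot products.
    q = encode(query)
    scores = doc_vectors @ q
    return scores.topk(min(k, len(scores)))
```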
Neural IR Paradigms: Learning term weights
▪ BM25 decomposed a document’s score into a summation over
term–document weights. Can we learn term weights with BERT?
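For reference, the standard BM25 form of this decomposition (with the usual k_1 and b hyperparameters) is:

```latex
% Document score as a sum of per-term weights; w_{t,d} is the weight of term t in document d.
\[
\mathrm{score}(q, d) \;=\; \sum_{t \in q} w_{t,d},
\qquad
w_{t,d} \;=\; \mathrm{IDF}(t)\cdot
\frac{\mathrm{tf}_{t,d}\,(k_1 + 1)}{\mathrm{tf}_{t,d} + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}
\]
```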
Zhuyun Dai and Jamie Callan. Context-aware term weighting for first stage passage retrieval. SIGIR’20
Rodrigo Nogueira and Jimmy Lin. From doc2query to docTTTTTquery. Online preprint (2019)
Antonio Mallia, et al. Learning Passage Impacts for Inverted Indexes. SIGIR’21
Learning term weights
▪ We get to learn the term weights with BERT and to re-use them!
▪ But our query is back to being a “bag of words”.
Can we do better?
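As a rough sketch of this re-use: the learned term–document weights can be stored in an inverted-index-style structure and simply summed over the (bag-of-words) query at search time. The toy weights and layout below are illustrative assumptions, not the exact format of the cited systems.

```python
# Offline: a model such as BERT assigns a weight to each term in each document,
# and the weights are stored in an inverted index: term -> {doc_id: weight}.
inverted_index = {
    "whitespace": {"D99": 2.3, "D7": 0.4},
    "python":     {"D99": 1.1, "D12": 1.8},
}

def score(query_terms, index):
    # Online: the query is a bag of words; the score is a simple sum of the stored
    # term–document weights, so no BERT call is needed per query–document pair.
    scores = {}
    for term in query_terms:
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] = scores.get(doc_id, 0.0) + weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(score(["python", "whitespace"], inverted_index))
```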
Next: Can we achieve high MRR and low latency?
– Representation Similarity
– Late Interaction
References
Omar Khattab and Matei Zaharia. ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. SIGIR’20
Chenyan Xiong, et al. End-to-end neural ad-hoc ranking with kernel pooling. SIGIR’17
Zhuyun Dai, et al. Convolutional neural networks for soft-matching n-grams in ad-hoc search. WSDM’18
Bhaskar Mitra, et al. Learning to match using local and distributed representations of text for web search. WWW’17
Bhaskar Mitra and Nick Craswell. An Updated Duet Model for Passage Re-ranking. arXiv:1903.07666 (2019)
Sebastian Hofstätter, et al. On the effect of low-frequency terms on neural-IR models. SIGIR’19
Zhuyun Dai and Jamie Callan. Deeper Text Understanding for IR with Contextual Neural Language Modeling. SIGIR’19
Rodrigo Nogueira. A Brief History of Deep Learning applied to Information Retrieval (UoG talk). Retrieved from
https://docs.google.com/presentation/d/1_mlvmyev0pjdG0OcfbEWManRREC0jCdjD3b1tPPvcbk
Zhuyun Dai and Jamie Callan. Context-aware term weighting for first stage passage retrieval. SIGIR’20
Rodrigo Nogueira and Jimmy Lin. From doc2query to docTTTTTquery. Online preprint (2019)
Antonio Mallia, et al. Learning Passage Impacts for Inverted Indexes. SIGIR’21