SpaDE: Improving Sparse Representations using a Dual Document Encoder for First-stage Retrieval

Choi, Eunseong; Lee, Sunkyung; Choi, Minjin; Ko, Hyeseon; Song, Young-In; Lee, Jongwuk

doi:10.1145/3511808.3557456

arXiv:2209.05917 (cs)

[Submitted on 13 Sep 2022 (v1), last revised 5 Oct 2023 (this version, v3)]

Title: SpaDE: Improving Sparse Representations using a Dual Document Encoder for First-stage Retrieval

Authors: Eunseong Choi, Sunkyung Lee, Minjin Choi, Hyeseon Ko, Young-In Song, Jongwuk Lee

Abstract: Sparse document representations have been widely used to retrieve relevant documents via exact lexical matching. Owing to the pre-computed inverted index, they support fast ad-hoc search but suffer from the vocabulary mismatch problem. Although recent neural ranking models using pre-trained language models can address this problem, they usually incur expensive query inference costs, implying a trade-off between effectiveness and efficiency. To tackle this trade-off, we propose a novel uni-encoder ranking model, Sparse retriever using a Dual document Encoder (SpaDE), which learns document representations via a dual encoder. The two encoders respectively (i) adjust the importance of terms to improve lexical matching and (ii) expand additional terms to support semantic matching. Furthermore, our co-training strategy trains the dual encoder effectively and prevents the two encoders from interfering unnecessarily with each other during training. Experimental results on several benchmarks show that SpaDE outperforms existing uni-encoder ranking models.
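
As a rough illustration of the idea described in the abstract (not the authors' implementation), the sketch below assumes that one encoder has already produced weights for terms present in a document and the other has produced weights for expansion terms. The weights are hand-written toy values, and the function names, the mixing coefficient `alpha`, and the linear combination rule are all hypothetical. The point is only that documents are encoded offline into an inverted index, while a query needs no neural inference: it is split into terms and scored by index lookups.

```python
from collections import defaultdict


def combine(weighted_terms, expanded_terms, alpha=0.5):
    """Merge two sparse term->weight maps into one document vector.

    alpha is a hypothetical mixing coefficient chosen for this sketch;
    it is not a value or rule taken from the paper.
    """
    doc_vec = defaultdict(float)
    for term, weight in weighted_terms.items():
        doc_vec[term] += (1.0 - alpha) * weight
    for term, weight in expanded_terms.items():
        doc_vec[term] += alpha * weight
    return dict(doc_vec)


def build_index(doc_vectors):
    """Build an inverted index: term -> list of (doc_id, weight)."""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        for term, weight in vec.items():
            if weight > 0:  # zero-weight terms are not stored
                index[term].append((doc_id, weight))
    return index


def search(index, query, k=10):
    """Score documents by summing stored weights of the query's terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for doc_id, weight in index.get(term, []):
            scores[doc_id] += weight
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Toy outputs standing in for the two document encoders (assumed values).
term_weights = {
    "d1": {"car": 1.2, "repair": 0.8, "the": 0.0},
    "d2": {"bread": 1.0, "recipe": 0.9},
}
term_expansions = {
    "d1": {"automobile": 0.6, "mechanic": 0.4},
    "d2": {"baking": 0.7},
}

doc_vectors = {
    doc_id: combine(term_weights[doc_id], term_expansions[doc_id])
    for doc_id in term_weights
}
index = build_index(doc_vectors)

if __name__ == "__main__":
    # "automobile" never appears in d1's text, but the expansion makes it match.
    print(search(index, "automobile repair"))
```

In this toy setup, the query "automobile repair" matches `d1` through one expanded term and one re-weighted original term, which mirrors the lexical and semantic matching roles described above.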

Comments:	In Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM '22). 13 pages
Subjects:	Information Retrieval (cs.IR)
Cite as:	arXiv:2209.05917 [cs.IR]
	(or arXiv:2209.05917v3 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2209.05917
Related DOI:	https://doi.org/10.1145/3511808.3557456

Submission history

From: Sunkyung Lee
[v1] Tue, 13 Sep 2022 12:06:01 UTC (4,430 KB)
[v2] Thu, 13 Apr 2023 05:57:34 UTC (4,430 KB)
[v3] Thu, 5 Oct 2023 02:33:49 UTC (1,600 KB)
