
6/14/24, 4:20 PM Hypothetical Document Embeddings (HyDE) - Jupyter Notebook

Advanced RAG - Hypothetical Document Embeddings (HyDE)
This document explores HyDE, a novel information retrieval technique utilizing large
language models (LLMs) for enhanced search accuracy.

Instead of embedding the query and searching the vector database with the query vector directly, HyDE uses a large language model (LLM), such as ChatGPT, to generate a hypothetical document that answers the query.

It goes a step further by using an unsupervised encoder trained with contrastive methods. This encoder converts the hypothetical document into an embedding vector, which is then used to locate similar documents in a vector database.

Contrastive learning allows models to extract meaningful representations from unlabeled data. By leveraging similarity and dissimilarity, contrastive learning enables models to map similar instances close together in a latent space while pushing apart those that are dissimilar.

HyDE was introduced in the paper Precise Zero-Shot Dense Retrieval without Relevance Labels (https://arxiv.org/pdf/2212.10496).
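The goal of such a contrastively trained encoder can be illustrated with a toy example. The vectors below are hand-picked stand-ins, not the output of a real encoder; the point is only that a hypothetical answer's embedding should sit closer to a related document than to an unrelated one:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: imagine an encoder trained contrastively so that
# related texts land near each other in the latent space.
doc_about_revolution = [0.9, 0.1, 0.2]
doc_about_cooking = [0.1, 0.8, 0.3]
hypothetical_answer = [0.85, 0.15, 0.25]  # near the revolution doc

sim_related = cosine_similarity(hypothetical_answer, doc_about_revolution)
sim_unrelated = cosine_similarity(hypothetical_answer, doc_about_cooking)
assert sim_related > sim_unrelated
```

A real system would obtain these vectors from a trained encoder such as Contriever (used in the HyDE paper), but the nearest-neighbour comparison works the same way.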

The Process
1. Query Formulation: Begin with your question. For instance, "What triggered the French
Revolution?"

2. LLM Generates Hypothetical Doc: HyDE leverages an LLM, like GPT-3, to craft a hypothetical document based on your query. While this document may be factually inaccurate, it captures the essence and phrasing relevant to your question.

3. Embedding Creation: The system encodes this hypothetical document into a numerical
representation, known as an embedding. Think of this embedding as a unique fingerprint for
the hypothetical document.

4. Similar Document Retrieval: HyDE searches through a database of existing documents, comparing their embeddings to the embedding of the hypothetical document. Documents with the most similar embeddings are likely to hold the answers you seek.

In other words, HyDE performs an answer-to-answer embedding similarity search, as opposed to the query-to-answer similarity search used in the traditional RAG retrieval approach.

In essence, HyDE utilizes the LLM to construct a "search template" based on your question.
It then retrieves real documents that align with this template.
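The four steps above can be sketched end to end. In this illustration the LLM call and the encoder are both stubbed out: `generate_hypothetical_document` stands in for a real LLM API call, and `embed` uses a simple bag-of-words count vector instead of a trained encoder. The function names and the tiny corpus are invented for this sketch:

```python
import math
from collections import Counter

def embed(text):
    """Stand-in encoder: a bag-of-words count vector.
    A real system would use a contrastively trained encoder."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return Counter(words)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def generate_hypothetical_document(query):
    """Stub for the LLM step: returns a canned 'answer' to the query.
    In practice this would be a call to an LLM API."""
    return ("The French Revolution was triggered by financial crisis, "
            "food shortages, and resentment of the monarchy.")

corpus = [
    "The French Revolution began in 1789 amid financial crisis and food shortages.",
    "Photosynthesis converts sunlight into chemical energy in plants.",
]

def hyde_retrieve(query, corpus):
    # Steps 1-2: generate a hypothetical answer with the LLM (stubbed).
    hypo = generate_hypothetical_document(query)
    # Step 3: embed the hypothetical document, not the raw query.
    hypo_vec = embed(hypo)
    # Step 4: rank real documents by similarity to the hypothetical answer.
    return max(corpus, key=lambda doc: cosine(hypo_vec, embed(doc)))

best = hyde_retrieve("What triggered the French Revolution?", corpus)
```

Even though the stubbed "answer" is generic, its wording overlaps far more with the relevant document than the short query alone would, which is exactly the effect HyDE exploits.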

localhost:8888/notebooks/OneDrive/Hypothetical Document Embeddings (HyDE).ipynb# 1/3



Benefits of HyDE
Improved Retrieval Accuracy: HyDE can outperform traditional search methods,
particularly for intricate or nuanced questions.

Zero-Shot Learning: No pre-labeled data is required for operation, making it adaptable to new domains.

Limitations to Consider
Factual Inconsistencies: The hypothetical documents are not factual, so retrieved
documents might contain irrelevant information.

A drawback of this approach is that it may not consistently produce good results. For instance, if the subject being discussed is entirely unfamiliar to the language model, the method is ineffective and can lead to more instances of generating incorrect information.


Early Stage Technology: HyDE is a relatively new approach, and further research is needed
to refine its effectiveness.

Conclusion
HyDE, or Hypothetical Document Embeddings, leverages large language models (LLMs) like ChatGPT to generate hypothetical documents that enhance search accuracy. It employs an unsupervised encoder to convert these hypothetical documents into vectors for retrieval. This method excels in tasks like web search, question answering, and fact verification, displaying robust performance comparable to well-tuned retrievers. However, it is not infallible; if the subject is entirely unfamiliar to the LLM, the generated document can mislead retrieval and increase incorrect results.

