Hypothetical Document Embeddings (HyDE) - Jupyter Notebook
HyDE uses a large language model (LLM), such as ChatGPT, to generate a hypothetical
document in response to a query, rather than embedding the query itself and searching the
vector database directly.
It is inspired by the paper Precise Zero-Shot Dense Retrieval without Relevance Labels
(https://arxiv.org/pdf/2212.10496)
The Process
1. Query Formulation: Begin with your question. For instance, "What triggered the French
Revolution?"
2. LLM Generates Hypothetical Doc: HyDE prompts an LLM, like GPT-3, to write a
hypothetical document answering your query. This document may be factually inaccurate,
but it captures the vocabulary and phrasing relevant to your question.
3. Embedding Creation: The system encodes this hypothetical document into a numerical
representation, known as an embedding. Think of this embedding as a unique fingerprint for
the hypothetical document.
4. Similarity Search: The system compares this embedding against the embeddings of real
documents in the vector store. This is an answer-to-answer embedding similarity search, in
contrast to the query-to-answer similarity search used in traditional RAG retrieval.
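The steps above can be sketched in a few lines of Python. Note the stand-ins: `fake_llm` is a hard-coded stub in place of a real LLM call, and `embed` uses toy term-frequency vectors instead of a real embedding model; both are illustrative assumptions, not part of HyDE itself. The key point is that retrieval matches the *hypothetical answer* against the corpus, not the raw query.

```python
# Minimal HyDE sketch using only the standard library.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a lower-cased term-frequency vector.
    A real system would use a dense encoder model here."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fake_llm(query: str) -> str:
    """Stand-in for the LLM call that writes a hypothetical answer."""
    return ("The French Revolution was triggered by financial crisis, "
            "food shortages, and resentment of the monarchy and taxes.")

corpus = [
    "The storming of the Bastille marked the start of the French Revolution.",
    "Financial crisis, heavy taxes, and food shortages fueled resentment of the monarchy.",
    "Photosynthesis converts sunlight into chemical energy in plants.",
]

def hyde_retrieve(query: str, docs: list) -> str:
    # Step 2: generate a hypothetical document for the query.
    hypothetical = fake_llm(query)
    # Step 3: embed the hypothetical document (not the raw query).
    q_vec = embed(hypothetical)
    # Step 4: answer-to-answer similarity search over the corpus.
    return max(docs, key=lambda d: cosine(q_vec, embed(d)))

print(hyde_retrieve("What triggered the French Revolution?", corpus))
```

Because the hypothetical answer shares vocabulary with the relevant document, the financially themed passage scores highest even though the query itself mentions none of those terms.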
In essence, HyDE utilizes the LLM to construct a "search template" based on your question.
It then retrieves real documents that align with this template.
Benefits of HyDE
Improved Retrieval Accuracy: HyDE can outperform traditional search methods,
particularly for intricate or nuanced questions.
Limitations to Consider
Factual Inconsistencies: The hypothetical documents are not factual, so retrieved
documents might contain irrelevant information.
This approach may also fail to produce good results consistently. If the subject is entirely
unfamiliar to the language model, the method is ineffective and can increase the rate of
incorrect, hallucinated information.
Early Stage Technology: HyDE is a relatively new approach, and further research is needed
to refine its effectiveness.
Conclusion
HyDE, or Hypothetical Document Embeddings, leverages large language models (LLMs)
like ChatGPT to generate hypothetical documents that enhance search accuracy. It employs
an unsupervised encoder to convert these hypothetical documents into vectors for retrieval.
The method performs well on tasks like web search, QA, and fact verification, with
performance comparable to well-tuned retrievers. However, it is not infallible: if the subject
is unfamiliar to the model, the generated document can mislead retrieval.