
What is retrieval-augmented generation?

Explainer | 5 minute read | 22 Aug 2023
By Kim Martineau

Topics: AI, Explainable AI, Generative AI, Natural Language Processing, Trustworthy Generation

RAG is an AI framework for retrieving facts from an external knowledge base to ground large language models (LLMs) on the most accurate, up-to-date information and to give users insight into LLMs' generative process.

Large language models can be inconsistent. Sometimes they nail the answer to questions, other times they regurgitate random facts from their training data. If they occasionally sound like they have no idea what they’re saying, it’s because they don’t. LLMs know how words relate statistically, but not what they mean.

Retrieval-augmented generation (RAG) is an AI framework for improving the quality of LLM-generated responses by grounding the model on external sources of knowledge to supplement the LLM’s internal representation of information.
Implementing RAG in an LLM-based question answering system has two main benefits: it ensures that the model has access to the most current, reliable facts, and it gives users access to the model’s sources, so its claims can be checked for accuracy and ultimately trusted.

“You want to cross-reference a model’s answers with the original content so you can see what it is basing its answer on,” said Luis Lastras, director of language technologies at IBM Research.

RAG has additional benefits. By grounding an LLM on a set of external, verifiable facts, the model has fewer opportunities to pull information baked into its parameters. This reduces the chances that an LLM will leak sensitive data, or ‘hallucinate’ incorrect or misleading information.

RAG also reduces the need for users to continuously train the model on new data and update its parameters as circumstances evolve. In this way, RAG can lower the computational and financial costs of running LLM-powered chatbots in an enterprise setting. IBM unveiled its new AI and data platform, watsonx, which offers RAG, back in May.

An overview from IBM expert Marina Danilevsky


An ‘open book’ approach to answering tough questions

Underpinning all foundation models, including LLMs, is an AI architecture known as the transformer. It turns heaps of raw data into a compressed representation of its basic structure. Starting from this raw representation, a foundation model can be adapted to a variety of tasks with some additional fine-tuning on labeled, domain-specific knowledge.

But fine-tuning alone rarely gives the model the full breadth of knowledge it needs to answer highly specific questions in an ever-changing context. In a 2020 paper, Meta (then known as Facebook) came up with a framework called retrieval-augmented generation to give LLMs access to information beyond their training data. RAG allows LLMs to build on a specialized body of knowledge to answer questions in a more accurate way.

“It’s the difference between an open-book and a closed-book exam,” Lastras said. “In a RAG system, you are asking the model to respond to a question by browsing through the content in a book, as opposed to trying to remember facts from memory.”

As the name suggests, RAG has two phases: retrieval and content generation. In the retrieval phase, algorithms search for and retrieve snippets of information relevant to the user’s prompt or question. In an open-domain, consumer setting, those facts can come from indexed documents on the internet; in a closed-domain, enterprise setting, a narrower set of sources is typically used for added security and reliability.

This assortment of external knowledge is appended to the user’s prompt and passed to the language model. In the generative phase, the LLM draws from the augmented prompt and its internal representation of its training data to synthesize an engaging answer tailored to the user in that instant. The answer can then be passed to a chatbot with links to its sources.
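
To make the two phases concrete, here is a minimal sketch of the retrieve-then-generate loop in Python. The keyword-overlap retriever, the document snippets, and the generate() stub are hypothetical stand-ins, not IBM's implementation; a production system would use a neural retriever and a real LLM endpoint.

```python
# A minimal sketch of RAG's two phases: retrieval, then generation.
# The knowledge base, retriever, and generate() stub are hypothetical.

KNOWLEDGE_BASE = [
    "Employees accrue 1.5 vacation days per month of service.",
    "Vacation may be taken in half-day increments with manager approval.",
    "Parental leave policies vary by the employee's state or country.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Retrieval phase: rank snippets by naive keyword overlap with the query."""
    query_words = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate(prompt: str) -> str:
    """Generation phase: stand-in for a call to an LLM."""
    return f"<LLM response conditioned on:\n{prompt}>"

def rag_answer(question: str) -> str:
    # Append the retrieved snippets to the user's question (prompt augmentation).
    snippets = retrieve(question)
    prompt = "Answer using only the context below.\n\n"
    prompt += "\n".join(f"- {s}" for s in snippets)
    prompt += f"\n\nQuestion: {question}"
    return generate(prompt)

print(rag_answer("Can I take vacation in half-day increments?"))
```

In practice, the keyword overlap above would be replaced by a learned retriever over a document index, but the shape of the loop is the same: fetch relevant snippets first, then condition the model's answer on them.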

Toward personalized and verifiable responses

Before LLMs, digital conversation agents followed a manual dialogue flow. They confirmed the customer’s intent, fetched the requested information, and delivered an answer in a one-size-fits-all script. For straightforward queries, this manual decision-tree method worked just fine.

But it had limitations. Anticipating and scripting answers to every question a customer might conceivably ask took time; if you missed a scenario, the chatbot had no ability to improvise. Updating the scripts as policies and circumstances evolved was either impractical or impossible.

Today, LLM-powered chatbots can give customers more personalized answers without humans having to write out new scripts. And RAG allows LLMs to go one step further by greatly reducing the need to feed and retrain the model on fresh examples. Simply upload the latest documents or policies, and the model retrieves the information in open-book mode to answer the question.

IBM is currently using RAG to ground its internal customer-care chatbots on content that can be verified and trusted. This real-world scenario shows how it works: An employee, Alice, has learned that her son’s school will have early dismissal on Wednesdays for the rest of the year. She wants to know if she can take vacation in half-day increments and if she has enough vacation to finish the year.

To craft its response, the LLM first pulls data from Alice’s HR files to find out how much vacation she gets as a longtime employee, and how many days she has left for the year. It also searches the company’s policies to verify that her vacation can be taken in half-days. These facts are injected into Alice’s initial query and passed to the LLM, which generates a concise, personalized answer. A chatbot delivers the response, with links to its sources.
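
A sketch of what that fact injection might look like follows. The field names, record values, and template wording are invented for illustration; the article does not publish IBM's actual prompt format.

```python
# Hypothetical illustration of fact injection for the Alice scenario.
# The HR record, policy snippet, and template are invented examples.

hr_record = {"employee": "Alice", "annual_vacation_days": 25, "days_remaining": 14}
policy_snippet = "Vacation may be taken in half-day increments with manager approval."
user_query = "Can I take vacation in half-day increments, and do I have enough left?"

augmented_prompt = (
    "You are an HR assistant. Answer using only the facts below, and cite them.\n\n"
    f"HR record: {hr_record}\n"
    f"Policy: {policy_snippet}\n\n"
    f"Employee question: {user_query}"
)
# The augmented prompt, not the bare question, is what gets sent to the LLM.
print(augmented_prompt)
```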

Teaching the model to recognize when it doesn’t know

Customer queries aren’t always this straightforward. They can be ambiguously worded, complex, or require knowledge the model either doesn’t have or can’t easily parse. These are the conditions in which LLMs are prone to making things up.

“Think of the model as an overeager junior employee that blurts out an answer before checking the facts,” said Lastras. “Experience teaches us to stop and say when we don’t know something. But LLMs need to be explicitly trained to recognize questions they can’t answer.”

In a more challenging scenario taken from real life, Alice wants to know how many days of maternity leave she gets. A chatbot that does not use RAG responds cheerfully (and incorrectly): “Take as long as you want.”

Maternity-leave policies are complex, in part, because they vary by the state or country of the employee’s home office. When the LLM failed to find a precise answer, it should have responded, “I’m sorry, I don’t know,” said Lastras, or asked additional questions until it could land on a question it could definitively answer. Instead, it pulled a phrase from a training set stocked with empathetic, customer-pleasing language.

With enough fine-tuning, an LLM can be trained to pause and say when it’s stuck. But it may need to see thousands of examples of questions that can and can’t be answered. Only then can the model learn to identify an unanswerable question, and probe for more detail until it hits on a question that it has the information to answer.
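
One way to picture that training data is as instruction-tuning pairs in which unanswerable questions map to an explicit refusal or a clarifying question. The sketch below is an assumption about the data format; the article does not specify how IBM structures these examples.

```python
# Hypothetical fine-tuning examples for teaching abstention.
# The JSONL format and wording are assumptions, not IBM's actual data.
import json

examples = [
    {"prompt": "How many vacation days do I have left?",
     "completion": "You have 14 vacation days remaining this year."},
    {"prompt": "How many days of maternity leave do I get?",
     "completion": "I'm sorry, I don't know. Leave policies vary by location; "
                   "which state or country is your home office in?"},
]

# Serialize to JSON Lines, a common format for instruction fine-tuning.
with open("abstention_examples.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```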

RAG is currently the best-known tool for grounding LLMs on the latest, verifiable information, and lowering the costs of having to constantly retrain and update them. RAG depends on the ability to enrich prompts with relevant information contained in vectors, which are mathematical representations of data. Vector databases can efficiently index, store and retrieve information for things like recommendation engines and chatbots. But RAG is imperfect, and many interesting challenges remain in getting RAG done right.
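
To illustrate the vector idea, here is a minimal dense-retrieval sketch that ranks documents by cosine similarity between embedding vectors. The embed() function is a toy hash-based stand-in for a real embedding model, and the in-memory list stands in for a vector database.

```python
# Toy dense retrieval: embed texts as vectors, rank by cosine similarity.
# embed() is a hash-based stand-in for a real embedding model, and the
# plain list stands in for a vector database index.
import hashlib
import math

DIM = 64

def embed(text: str) -> list[float]:
    """Map a text to a fixed-size vector by hashing its words (toy only)."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

docs = [
    "Vacation may be taken in half-day increments.",
    "Parental leave varies by state or country.",
]
index = [(doc, embed(doc)) for doc in docs]  # the "vector database"

# Retrieve the document whose vector is closest to the query's vector.
query_vec = embed("Can I take a half day of vacation?")
best = max(index, key=lambda item: cosine(query_vec, item[1]))
print(best[0])
```

A real embedding model places semantically similar texts near each other in vector space, so similarity search finds relevant passages even when they share no exact keywords; the hash trick above only captures word overlap.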

At IBM Research, we are focused on innovating at both ends of the process: retrieval, how to find and fetch the most relevant information possible to feed the LLM; and generation, how to best structure that information to get the richest responses from the LLM.

