This page explains reranking and the types of rankers that are available, and it demonstrates how to use the Vertex AI ranking API to rerank your retrieved responses.
Available rerankers
| Ranker options | Description | Latency | Accuracy | Pricing |
|---|---|---|---|---|
| Vertex AI ranking API | The Vertex AI ranking API is a standalone semantic reranker designed for highly precise relevance scoring and low latency. For more information, see Improve search and RAG quality with ranking API. | Very low (less than 100 milliseconds) | State-of-the-art performance | Per Vertex AI RAG Engine request |
| LLM reranker | The LLM reranker uses a separate call to Gemini to assess the relevance of chunks to a query. | High (1 to 2 seconds) | Model dependent | LLM token pricing |
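Both rankers are configured through the same `rag.RagRetrievalConfig.ranking` field; which reranker runs depends on whether you set `rank_service` or `llm_ranker`. The following sketch contrasts the two configurations using the same SDK classes as the full samples later on this page (the model names are illustrative placeholders):

```python
from vertexai import rag

# Vertex AI ranking API: a dedicated semantic reranker with very low latency.
ranking_api_config = rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        rank_service=rag.RankService(model_name="semantic-ranker-default@latest")
    ),
)

# LLM reranker: a separate Gemini call scores each retrieved chunk.
llm_reranker_config = rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        llm_ranker=rag.LlmRanker(model_name="gemini-2.0-flash")
    ),
)
```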
Use the Vertex AI ranking API
To use the Vertex AI ranking API, you must enable the Discovery Engine API. For the list of supported models, see Improve search and RAG quality with ranking API.
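If the Discovery Engine API isn't already enabled in your project, one way to enable it is with the Google Cloud CLI (assuming the gcloud CLI is installed and authenticated; replace PROJECT_ID with your project ID):

```sh
gcloud services enable discoveryengine.googleapis.com --project=PROJECT_ID
```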
These code samples demonstrate how to enable reranking with the Vertex AI ranking API in the tool configuration.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Replace the following variables used in the sample code:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- MODEL_NAME: The LLM model for content generation. For example, `gemini-2.0-flash`.
- INPUT_PROMPT: The text sent to the LLM for content generation.
- RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: `projects/{project}/locations/{location}/ragCorpora/{rag_corpus}`.
- SIMILARITY_TOP_K: Optional: The number of top contexts to retrieve.
- RANKER_MODEL_NAME: The name of the model used for reranking. For example, `semantic-ranker-default@latest`.
```python
from vertexai import rag
from vertexai.generative_models import GenerativeModel, Tool
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = f"projects/{PROJECT_ID}/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE"

# Initialize the Vertex AI API once per session.
vertexai.init(project=PROJECT_ID, location="LOCATION")

# Configure retrieval to rerank the top contexts with the Vertex AI ranking API.
config = rag.RagRetrievalConfig(
    top_k=10,  # SIMILARITY_TOP_K
    ranking=rag.Ranking(
        rank_service=rag.RankService(
            model_name="RANKER_MODEL_NAME"
        )
    ),
)

# Wrap the RAG corpus in a retrieval tool that the model can call.
rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus=CORPUS_NAME,
                )
            ],
            rag_retrieval_config=config,
        ),
    )
)

rag_model = GenerativeModel(
    model_name="MODEL_NAME", tools=[rag_retrieval_tool]
)
response = rag_model.generate_content("INPUT_PROMPT")
print(response.text)
# Example response:
#   The sky appears blue due to a phenomenon called Rayleigh scattering.
#   Sunlight, which contains all colors of the rainbow, is scattered
#   by the tiny particles in the Earth's atmosphere....
#   ...
```
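The sample returns the complete answer in a single call. If you want to stream partial results instead, the Vertex AI SDK's `generate_content` method also accepts `stream=True`; a minimal sketch that reuses `rag_model` from the sample above:

```python
# Stream the response chunk by chunk instead of waiting for the full answer.
for chunk in rag_model.generate_content("INPUT_PROMPT", stream=True):
    print(chunk.text, end="")
```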
REST
To generate content using Gemini models, make a call to the Vertex AI `GenerateContent` API. By specifying the RAG_CORPUS_RESOURCE when you make the request, the model automatically retrieves data from the Vertex AI RAG Engine.
Replace the following variables used in the sample code:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- MODEL_NAME: The LLM model for content generation. For example, `gemini-2.0-flash`.
- GENERATION_METHOD: The LLM method for content generation. Options include `generateContent` and `streamGenerateContent`.
- INPUT_PROMPT: The text sent to the LLM for content generation.
- RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: `projects/{project}/locations/{location}/ragCorpora/{rag_corpus}`.
- SIMILARITY_TOP_K: Optional: The number of top contexts to retrieve.
- RANKER_MODEL_NAME: The name of the model used for reranking. For example, `semantic-ranker-default@latest`.
```sh
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_NAME:GENERATION_METHOD" \
  -d '{
    "contents": {
      "role": "user",
      "parts": {
        "text": "INPUT_PROMPT"
      }
    },
    "tools": {
      "retrieval": {
        "disable_attribution": false,
        "vertex_rag_store": {
          "rag_resources": {
            "rag_corpus": "RAG_CORPUS_RESOURCE"
          },
          "rag_retrieval_config": {
            "top_k": SIMILARITY_TOP_K,
            "ranking": {
              "rank_service": {
                "model_name": "RANKER_MODEL_NAME"
              }
            }
          }
        }
      }
    }
  }'
```
Use the LLM reranker in Vertex AI RAG Engine
This section presents the prerequisites and code samples for using an LLM reranker.
The LLM reranker supports only Gemini models, which are accessible when the Vertex AI RAG Engine API is enabled. To view the list of supported models, see Gemini models.
To retrieve relevant contexts using the Vertex AI RAG Engine API, do the following:
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Replace the following variables used in the code sample:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: `projects/{project}/locations/{location}/ragCorpora/{rag_corpus}`.
- TEXT: The query text to get relevant contexts.
- MODEL_NAME: The name of the model used for reranking.
```python
from vertexai import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = f"projects/{PROJECT_ID}/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE"
MODEL_NAME = "MODEL_NAME"

# Initialize the Vertex AI API once per session.
vertexai.init(project=PROJECT_ID, location="LOCATION")

# Configure retrieval to rerank the top contexts with an LLM reranker.
rag_retrieval_config = rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        llm_ranker=rag.LlmRanker(
            model_name=MODEL_NAME
        )
    ),
)

response = rag.retrieval_query(
    rag_resources=[
        rag.RagResource(
            rag_corpus=CORPUS_NAME,
        )
    ],
    text="TEXT",
    rag_retrieval_config=rag_retrieval_config,
)
print(response)
# Example response:
# contexts {
#   contexts {
#     source_uri: "gs://your-bucket-name/file.txt"
#     text: "....
# ....
```
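Rather than printing the raw response, you can iterate over the nested `contexts` field; a minimal sketch, assuming the response shape shown in the example output above:

```python
# Each reranked context carries the source URI and the chunk text.
for ctx in response.contexts.contexts:
    print(ctx.source_uri)
    print(ctx.text)
```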
REST
Replace the following variables used in the code sample:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: `projects/{project}/locations/{location}/ragCorpora/{rag_corpus}`.
- TEXT: The query text to get relevant contexts.
- MODEL_NAME: The name of the model used for reranking.
```sh
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:retrieveContexts" \
  -d '{
    "vertex_rag_store": {
      "rag_resources": {
        "rag_corpus": "RAG_CORPUS_RESOURCE"
      }
    },
    "query": {
      "text": "TEXT",
      "rag_retrieval_config": {
        "top_k": 10,
        "ranking": {
          "llm_ranker": {
            "model_name": "MODEL_NAME"
          }
        }
      }
    }
  }'
```
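If you prefer to issue the same REST request from Python rather than curl, the google-auth and requests libraries can supply the access token and send the payload. This is a sketch under the same placeholders as the curl sample, assuming Application Default Credentials are configured:

```python
import google.auth
import google.auth.transport.requests
import requests

# Obtain an access token from Application Default Credentials.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

url = (
    "https://LOCATION-aiplatform.googleapis.com/v1/"
    "projects/PROJECT_ID/locations/LOCATION:retrieveContexts"
)
payload = {
    "vertex_rag_store": {
        "rag_resources": {"rag_corpus": "RAG_CORPUS_RESOURCE"}
    },
    "query": {
        "text": "TEXT",
        "rag_retrieval_config": {
            "top_k": 10,
            "ranking": {"llm_ranker": {"model_name": "MODEL_NAME"}},
        },
    },
}
resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {credentials.token}"},
    json=payload,
)
print(resp.json())
```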
What's next
- To learn more about the responses from RAG, see Retrieval and generation output of Vertex AI RAG Engine.
- Manage your RAG knowledge base (corpus)