Reranking for Vertex AI RAG Engine

This page explains reranking and the types of rankers you can use. It also demonstrates how to use the Vertex AI ranking API to rerank your retrieved responses.

Available rerankers

• Vertex AI ranking API
  Description: A standalone semantic reranker designed for highly precise relevance scoring and low latency. For more information about the Vertex AI ranking API, see Improve search and RAG quality with ranking API.
  Latency: Very low (less than 100 milliseconds)
  Accuracy: State-of-the-art performance
  Pricing: Per Vertex AI RAG Engine request

• LLM reranker
  Description: Uses a separate call to Gemini to assess the relevance of chunks to a query.
  Latency: High (1 to 2 seconds)
  Accuracy: Model dependent
  Pricing: LLM token pricing
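
Both rankers are configured through the same RagRetrievalConfig object; the only difference is which field of rag.Ranking you set. The following minimal sketch contrasts the two variants, using the classes and example model names from the code samples later on this page:

from vertexai import rag

# Vertex AI ranking API: reranking is delegated to the ranking service.
ranking_api_config = rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        rank_service=rag.RankService(
            model_name="semantic-ranker-default@latest"
        )
    ),
)

# LLM reranker: a separate Gemini call scores chunk relevance instead.
llm_ranker_config = rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        llm_ranker=rag.LlmRanker(
            model_name="gemini-2.0-flash"
        )
    ),
)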

Use the Vertex AI ranking API

To use the Vertex AI ranking API, you must enable the Discovery Engine API (for example, by running gcloud services enable discoveryengine.googleapis.com). For the list of supported models, see Improve search and RAG quality with ranking API.

These code samples demonstrate how to enable reranking with the Vertex AI ranking API in the tool configuration.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

Replace the following variables used in the sample code:

  • PROJECT_ID: The ID of your Google Cloud project.
  • LOCATION: The region to process the request.
  • MODEL_NAME: LLM model for content generation. For example, gemini-2.0-flash.
  • INPUT_PROMPT: The text sent to the LLM for content generation.
  • RAG_CORPUS_RESOURCE: The name of the RAG corpus resource.
    Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
  • SIMILARITY_TOP_K: Optional: The number of top contexts to retrieve.
  • RANKER_MODEL_NAME: The name of the model used for reranking. For example, semantic-ranker-default@latest.
from vertexai import rag
from vertexai.generative_models import GenerativeModel, Tool
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "projects/{PROJECT_ID}/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="LOCATION")

config = rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        rank_service=rag.RankService(
            model_name="RANKER_MODEL_NAME"
        )
    )
)

rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus=CORPUS_NAME,
                )
            ],
            rag_retrieval_config=config
        ),
    )
)

rag_model = GenerativeModel(
    model_name="MODEL_NAME", tools=[rag_retrieval_tool]
)
response = rag_model.generate_content("INPUT_PROMPT")
print(response.text)
# Example response:
#   The sky appears blue due to a phenomenon called Rayleigh scattering.
#   Sunlight, which contains all colors of the rainbow, is scattered
#   by the tiny particles in the Earth's atmosphere....
#   ...
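
To inspect what the reranker returns before wiring it into generation, you can reuse the same config in a retrieval-only query. This is a minimal sketch using the rag.retrieval_query call that is demonstrated in the LLM reranker section later on this page:

# Retrieval-only query: returns the reranked contexts themselves.
retrieval_response = rag.retrieval_query(
    rag_resources=[
        rag.RagResource(
            rag_corpus=CORPUS_NAME,
        )
    ],
    text="INPUT_PROMPT",
    rag_retrieval_config=config,
)
print(retrieval_response)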

REST

To generate content using Gemini models, make a call to the Vertex AI GenerateContent API. By specifying the RAG_CORPUS_RESOURCE when you make the request, the model automatically retrieves data from the Vertex AI RAG Engine.

Replace the following variables used in the sample code:

  • PROJECT_ID: The ID of your Google Cloud project.
  • LOCATION: The region to process the request.
  • MODEL_NAME: LLM model for content generation. For example, gemini-2.0-flash.
  • GENERATION_METHOD: LLM method for content generation. Options include generateContent and streamGenerateContent.
  • INPUT_PROMPT: The text sent to the LLM for content generation.
  • RAG_CORPUS_RESOURCE: The name of the RAG corpus resource.
    Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
  • SIMILARITY_TOP_K: Optional: The number of top contexts to retrieve.
  • RANKER_MODEL_NAME: The name of the model used for reranking. For example, semantic-ranker-default@latest.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_NAME:GENERATION_METHOD" \
-d '{
  "contents": {
    "role": "user",
    "parts": {
      "text": "INPUT_PROMPT"
    }
  },
  "tools": {
    "retrieval": {
      "disable_attribution": false,
      "vertex_rag_store": {
        "rag_resources": {
            "rag_corpus": "RAG_CORPUS_RESOURCE"
          },
        "rag_retrieval_config": {
          "top_k": SIMILARITY_TOP_K,
          "ranking": {
            "rank_service": {
              "model_name": "RANKER_MODEL_NAME"
            }
          }
        }
      }
    }
  }
}'

Use the LLM reranker in Vertex AI RAG Engine

This section presents the prerequisites and code samples for using an LLM reranker.

The LLM reranker supports only Gemini models, which are accessible when the Vertex AI RAG Engine API is enabled. To view the list of supported models, see Gemini models.

To retrieve relevant contexts using the Vertex AI RAG Engine API, do the following:

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

Replace the following variables used in the code sample:

  • PROJECT_ID: The ID of your Google Cloud project.
  • LOCATION: The region to process the request.
  • RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
  • TEXT: The query text to get relevant contexts.
  • MODEL_NAME: The name of the model used for reranking.
from vertexai import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "projects/[PROJECT_ID]/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE"
MODEL_NAME= "MODEL_NAME"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="LOCATION")

rag_retrieval_config = rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        llm_ranker=rag.LlmRanker(
            model_name=MODEL_NAME
        )
    )
)

response = rag.retrieval_query(
    rag_resources=[
        rag.RagResource(
            rag_corpus=CORPUS_NAME,
        )
    ],
    text="TEXT",
    rag_retrieval_config=rag_retrieval_config,
)
print(response)
# Example response:
# contexts {
#   contexts {
#     source_uri: "gs://your-bucket-name/file.txt"
#     text: "....
#   ....
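
The query above returns the reranked contexts directly. To use the LLM reranker during content generation instead, you can attach the same config to a retrieval tool, mirroring the Tool.from_retrieval pattern from the ranking API sample earlier on this page. A minimal sketch, assuming gemini-2.0-flash as the generation model:

from vertexai.generative_models import GenerativeModel, Tool

# Attach the llm_ranker config to a retrieval tool for generation.
rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus=CORPUS_NAME,
                )
            ],
            rag_retrieval_config=rag_retrieval_config,
        ),
    )
)

rag_model = GenerativeModel(
    model_name="gemini-2.0-flash", tools=[rag_retrieval_tool]
)
response = rag_model.generate_content("TEXT")
print(response.text)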

REST

Replace the following variables used in the code sample:

  • PROJECT_ID: The ID of your Google Cloud project.
  • LOCATION: The region to process the request.
  • RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
  • TEXT: The query text to get relevant contexts.
  • MODEL_NAME: The name of the model used for reranking.
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:retrieveContexts" \
  -d '{
    "vertex_rag_store": {
      "rag_resources": {
          "rag_corpus": "RAG_CORPUS_RESOURCE"
        }
    },
    "query": {
      "text": "TEXT",
      "rag_retrieval_config": {
        "top_k": 10,
        "ranking": {
          "llm_ranker": {
            "model_name": "MODEL_NAME"
          }
        }
      }
    }
  }'