Skip to content

Problem with QA generation #44

Open
@Lucas-Fernandes-Martins

Description

Hi!

Firstly, congrats on this library, I had to implement something similar for a project and came across this amazing work.

However, I've come across what appears to be a bug: when using the Gemini API to generate QA pairs, since the Gemini API format requires a user message in the messages object, it gives out a 400 HTTP error.

I've figured out a way to solve this by adding a "dummy" user message to the prompt.

I'd be excited to work on a PR to add support for Gemini API format. I understand your main priority is to make synthetic-data-kit work well with LLama, but I'd be very happy to help make this awesome library API agnostic.

Thank you :)

Minimum reproducible example:

from synthetic_data_kit.models.llm_client import LLMClient
from synthetic_data_kit.core.context import AppContext
from synthetic_data_kit.utils.config import load_config, get_vllm_config, get_openai_config, get_llm_provider, get_path_config
from synthetic_data_kit.core.create import process_file

ctx = AppContext()

ctx.config = load_config("config.yaml")

provider = get_llm_provider(ctx.config)
print(f"L Using {provider} provider")

api_endpoint_config = get_openai_config(ctx.config)
api_base = api_endpoint_config.get("api_base")
model = api_endpoint_config.get("model")
num_pairs=1
content_type="qa"
output_dir = "output"
verbose= True
input_file = "docs/test.txt"
output_path = process_file(
                input_file,
                output_dir,
                config_path=ctx.config_path,
                api_base=api_base,
                model=model,
                content_type=content_type,
                num_pairs=num_pairs,
                verbose=verbose,
                provider=provider  # Pass the provider parameter
            )

Error message:

INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 400 Bad Request"
ERROR:synthetic_data_kit.models.llm_client:api-endpoint API error (attempt 3/3): Error code: 400 - [{'error': {'code': 400, 'message': '* GenerateContentRequest.contents: contents is not specified\n', 'status': 'INVALID_ARGUMENT'}}]

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      pFad - Phonifier reborn

      Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

      Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


      Alternative Proxies:

      Alternative Proxy

      pFad Proxy

      pFad v3 Proxy

      pFad v4 Proxy