Hi!
Firstly, congrats on this library, I had to implement something similar for a project and came across this amazing work.
However, I've come across what appears to be a bug: when generating QA pairs through the Gemini API, the request fails with an HTTP 400 error, because the Gemini API format requires a user message in the `messages` array.
I've worked around this by appending a "dummy" user message to the prompt.
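For context, here is a minimal sketch of the workaround (the helper name is illustrative, not part of the library's actual API): since Gemini's OpenAI-compatible endpoint rejects a `messages` list that contains no `user` turn, a placeholder user message can be appended before the request is sent.

```python
def ensure_user_message(messages, placeholder="Please follow the system instructions above."):
    """Return a messages list that contains at least one user turn.

    Gemini's OpenAI-compat endpoint rejects requests whose messages hold
    only a system prompt ("contents is not specified"), so a placeholder
    user message is appended when none is present.
    """
    if not any(m.get("role") == "user" for m in messages):
        # Copy rather than mutate, so the caller's list is untouched.
        messages = messages + [{"role": "user", "content": placeholder}]
    return messages
```

In a PR this check would probably live wherever the chat payload is assembled, gated on the provider being Gemini, so other backends keep receiving the original messages unchanged.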
I'd be excited to work on a PR to add support for the Gemini API format. I understand your main priority is making synthetic-data-kit work well with Llama, but I'd be very happy to help make this awesome library API-agnostic.
Thank you :)
Minimum reproducible example:
```python
from synthetic_data_kit.core.context import AppContext
from synthetic_data_kit.utils.config import load_config, get_openai_config, get_llm_provider
from synthetic_data_kit.core.create import process_file

ctx = AppContext()
ctx.config = load_config("config.yaml")

provider = get_llm_provider(ctx.config)
print(f"Using {provider} provider")

api_endpoint_config = get_openai_config(ctx.config)
api_base = api_endpoint_config.get("api_base")
model = api_endpoint_config.get("model")

num_pairs = 1
content_type = "qa"
output_dir = "output"
verbose = True
input_file = "docs/test.txt"

output_path = process_file(
    input_file,
    output_dir,
    config_path=ctx.config_path,
    api_base=api_base,
    model=model,
    content_type=content_type,
    num_pairs=num_pairs,
    verbose=verbose,
    provider=provider,  # pass the provider explicitly
)
```
Error message:

```
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 400 Bad Request"
ERROR:synthetic_data_kit.models.llm_client:api-endpoint API error (attempt 3/3): Error code: 400 - [{'error': {'code': 400, 'message': '* GenerateContentRequest.contents: contents is not specified\n', 'status': 'INVALID_ARGUMENT'}}]
```