Hi!
Firstly, congrats on this library, I had to implement something similar for a project and came across this amazing work.
However, I've come across what appears to be a bug: when generating QA pairs through the Gemini API, the request fails with an HTTP 400 error, because the Gemini API format requires a user message in the `messages` array.
I've worked around this by appending a "dummy" user message to the prompt.
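For context, here is a minimal sketch of the workaround (the helper name is illustrative, not part of the library's actual API): since Gemini's OpenAI-compatible endpoint rejects a `messages` list that contains no `user` turn, a placeholder user message can be appended before the request is sent.

```python
def ensure_user_message(messages, placeholder="Please follow the system instructions above."):
    """Return a messages list that contains at least one user turn.

    Gemini's OpenAI-compat endpoint rejects requests whose messages hold
    only a system prompt ("contents is not specified"), so a placeholder
    user message is appended when none is present.
    """
    if not any(m.get("role") == "user" for m in messages):
        # Copy rather than mutate, so the caller's list is untouched.
        messages = messages + [{"role": "user", "content": placeholder}]
    return messages
```

In a PR this check would probably live wherever the chat payload is assembled, gated on the provider being Gemini, so other backends keep receiving the original messages unchanged.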
I'd be excited to work on a PR to add support for the Gemini API format. I understand your main priority is making synthetic-data-kit work well with Llama, but I'd be very happy to help make this awesome library API-agnostic.
Thank you :)
Minimum reproducible example:
```python
from synthetic_data_kit.core.context import AppContext
from synthetic_data_kit.utils.config import load_config, get_openai_config, get_llm_provider
from synthetic_data_kit.core.create import process_file

ctx = AppContext()
ctx.config = load_config("config.yaml")

provider = get_llm_provider(ctx.config)
print(f"Using {provider} provider")

api_endpoint_config = get_openai_config(ctx.config)
api_base = api_endpoint_config.get("api_base")
model = api_endpoint_config.get("model")

num_pairs = 1
content_type = "qa"
output_dir = "output"
verbose = True
input_file = "docs/test.txt"

output_path = process_file(
    input_file,
    output_dir,
    config_path=ctx.config_path,
    api_base=api_base,
    model=model,
    content_type=content_type,
    num_pairs=num_pairs,
    verbose=verbose,
    provider=provider,  # pass the provider explicitly
)
```
Error message:

```
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 400 Bad Request"
ERROR:synthetic_data_kit.models.llm_client:api-endpoint API error (attempt 3/3): Error code: 400 - [{'error': {'code': 400, 'message': '* GenerateContentRequest.contents: contents is not specified\n', 'status': 'INVALID_ARGUMENT'}}]
```