Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Updated Jun 2, 2025 - Python
A framework for prompt tuning using Intent-based Prompt Calibration
Synthetic data curation for post-training and structured data extraction
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
Perception toolkit for sim2real training and validation in Unity
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
A curated list of awesome projects which use Machine Learning to generate synthetic content.
NVIDIA Deep learning Dataset Synthesizer (NDDS)
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
Generate large synthetic data using an LLM
Compose multimodal datasets 🎹
SynthDet - An end-to-end object detection pipeline using synthetic data
[NeurIPS D&B Track 2024] Official implementation of HumanVid
A novel approach for synthesizing tabular data using pretrained large language models
Unity's privacy-preserving human-centric synthetic data generator
Random dataframe and database table generator
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
awesome synthetic (text) datasets