[feature] Support SetFit: few-shot fine-tuning of Sentence Transformers, works with about 8 samples per class. #1633
Labels: enhancement (New feature or request)
It would be great to add SetFit to PostgresML. See:
https://pypi.org/project/setfit/
https://huggingface.co/docs/setfit
SetFit for text classification differs from other libraries in how little labeled data it needs. Usually, to train or fine-tune a model you need thousands of samples per class. In this example
https://postgresml.org/docs/open-source/pgml/guides/llms/fine-tuning
the "train" split of the IMDB dataset contains 25,000 rows. There are 2 classes, so 12,500 samples per class.
Now I'm quoting the SetFit documentation:
The code where they train a classifier (again classifying film reviews, so nothing really new here) is here:
https://huggingface.co/docs/setfit/main/quickstart#training
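To make the data-size comparison concrete, here is a minimal plain-Python sketch of cutting a large labeled training set down to a balanced few-shot subset with 8 samples per class. The helper `sample_per_class` and the toy data are hypothetical and only illustrate the idea; the setfit library ships its own `sample_dataset` helper for this step on Hugging Face datasets, which the quickstart appears to use.

```python
import random
from collections import defaultdict


def sample_per_class(rows, num_samples=8, seed=42):
    """Return a balanced few-shot subset with at most `num_samples` rows per label.

    Hypothetical illustration helper: `rows` is a list of (text, label) pairs.
    """
    # Group the rows by their label.
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_label):
        group = by_label[label]
        # Sample without replacement, capped at the size of the group.
        subset.extend(rng.sample(group, min(num_samples, len(group))))
    return subset


# Toy stand-in for the IMDB "train" split: 2 classes, 12,500 rows each.
full_train = [(f"review {i}", i % 2) for i in range(25_000)]
few_shot = sample_per_class(full_train, num_samples=8)

print(len(full_train))  # 25000
print(len(few_shot))    # 16 (8 per class)
```

The few-shot subset is about 1/1,500 of the full split, which is the gap SetFit is designed to close.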
Compare this: 12,500 samples per class vs. 8 samples per class with SetFit.
In real life, in many cases you can collect only about 50 samples per class, and with SetFit you can still train a model on that.
Situations where you have tens of thousands of samples are quite rare.
Let's support SetFit.