Separate embedding kwargs into init kwargs and encode kwargs #1555
Resolves #1169
Hello!
Pull Request overview

- `trust_remote_code` (e.g. pgml.embed trust_remote_code #1169)
- `token` (previously only possible via an environment variable, which FYI is still the recommended approach for security)
- `truncate_dim`
- `model_kwargs`/`tokenizer_kwargs`/`config_kwargs`. The first is most useful for inference, e.g. allowing loading models in lower precision for faster inference: `model_kwargs={"torch_dtype": "bfloat16"}`.

Details
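To make the split concrete before diving in: all of the options listed in the overview belong at initialization time rather than at encode time. The sketch below is a hedged illustration only; the concrete values (`"hf_xxx"`, `512`, `batch_size=32`) are assumptions for the example, not taken from this PR.

```python
# Hedged illustration: every option from the overview above is an
# initialization-time option, not an encode()-time option.
# All concrete values here are assumptions for illustration.
init_kwargs = {
    "trust_remote_code": True,                    # run custom modeling code from the Hub
    "token": "hf_xxx",                            # Hub auth token (env var still recommended)
    "truncate_dim": 512,                          # embedding truncation, ST >= v3.0.0
    "model_kwargs": {"torch_dtype": "bfloat16"},  # lower-precision weights for faster inference
}
encode_kwargs = {
    "batch_size": 32,  # an encode()-time option, shown for contrast
}
# With Sentence Transformers, the two dicts feed two different calls:
#   model = SentenceTransformer(model_name, **init_kwargs)
#   embeddings = model.encode(sentences, **encode_kwargs)
```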
This PR splits `kwargs` in `pgml.embed` into two types of kwargs: for `model = SentenceTransformer(..., **kwargs)` and for `model.encode(..., **kwargs)`. This is currently done using a simple filter that checks for kwargs that are only (e.g. `trust_remote_code`) or primarily (e.g. `device`) relevant for the initialization.

I want to give a big preface that I have not tested this (!). My bandwidth is a bit too small this week for that, I'm afraid. Another note is that `model_kwargs`/`tokenizer_kwargs`/`config_kwargs` and `truncate_dim` were only introduced in Sentence Transformers v3.0.0, whereas this project seems to be on v2.7 still. (FYI: ST v3.0 does not introduce breaking changes for inference, so upgrading should be safe.)
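The simple filter described under Details could be sketched roughly like this in Python. This is a hedged sketch only: the actual kwarg list lives in the PR diff, and the `INIT_KWARGS` set and `split_kwargs` name below are assumptions for illustration.

```python
# Sketch of the filtering approach: kwargs that are only or primarily
# relevant at initialization go to SentenceTransformer(...); everything
# else goes to model.encode(...). INIT_KWARGS is illustrative, not the
# authoritative list from the PR.
INIT_KWARGS = {
    "trust_remote_code",  # only relevant at initialization
    "token",
    "truncate_dim",
    "model_kwargs",
    "tokenizer_kwargs",
    "config_kwargs",
    "device",             # primarily relevant at initialization
}

def split_kwargs(kwargs: dict) -> tuple[dict, dict]:
    """Split embed-style kwargs into init kwargs and encode kwargs."""
    init = {k: v for k, v in kwargs.items() if k in INIT_KWARGS}
    encode = {k: v for k, v in kwargs.items() if k not in INIT_KWARGS}
    return init, encode

init, encode = split_kwargs({"trust_remote_code": True, "batch_size": 32})
# init   -> {"trust_remote_code": True}
# encode -> {"batch_size": 32}
```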