Commit 4d574bd

feat(server): Add support for pulling models from Huggingface Hub (abetlen#1222)

* Basic support for hf pull on server
* Add hf_model_repo_id setting
* Update README

1 parent b3e358d · commit 4d574bd

File tree

3 files changed: +24 −2 lines changed

README.md

Lines changed: 6 additions & 0 deletions

````diff
@@ -577,6 +577,12 @@ python3 -m llama_cpp.server --model models/7B/llama-model.gguf --chat_format cha
 That will format the prompt according to how model expects it. You can find the prompt format in the model card.
 For possible options, see [llama_cpp/llama_chat_format.py](llama_cpp/llama_chat_format.py) and look for lines starting with "@register_chat_format".
 
+If you have `huggingface-hub` installed, you can also use the `--hf_model_repo_id` flag to load a model from the Hugging Face Hub.
+
+```bash
+python3 -m llama_cpp.server --hf_model_repo_id Qwen/Qwen1.5-0.5B-Chat-GGUF --model '*q8_0.gguf'
+```
+
 ### Web Server Features
 
 - [Local Copilot replacement](https://llama-cpp-python.readthedocs.io/en/latest/server/#code-completion)
````
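Note that when pulling from the Hub, the `--model` value is a filename glob (here `'*q8_0.gguf'`) rather than a local path. A minimal stdlib sketch of that kind of glob selection, using `fnmatch` against a hypothetical repo file listing (the actual matching logic inside `Llama.from_pretrained` may differ):

```python
from fnmatch import fnmatch

# Hypothetical file listing of a GGUF repo; a real listing would come from huggingface-hub.
repo_files = [
    "qwen1_5-0_5b-chat-q4_0.gguf",
    "qwen1_5-0_5b-chat-q8_0.gguf",
    "README.md",
]

# Select the file matching the glob passed via --model.
pattern = "*q8_0.gguf"
matches = [f for f in repo_files if fnmatch(f, pattern)]
print(matches)  # → ['qwen1_5-0_5b-chat-q8_0.gguf']
```

The glob lets you pick one quantization out of a repo that ships several, without spelling out the full filename.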

llama_cpp/server/model.py

Lines changed: 13 additions & 2 deletions

```diff
@@ -120,9 +120,20 @@ def load_llama_from_model_settings(settings: ModelSettings) -> llama_cpp.Llama:
                     kv_overrides[key] = float(value)
                 else:
                     raise ValueError(f"Unknown value type {value_type}")
+
+    import functools
 
-    _model = llama_cpp.Llama(
-        model_path=settings.model,
+    kwargs = {}
+
+    if settings.hf_model_repo_id is not None:
+        create_fn = functools.partial(llama_cpp.Llama.from_pretrained, repo_id=settings.hf_model_repo_id, filename=settings.model)
+    else:
+        create_fn = llama_cpp.Llama
+        kwargs["model_path"] = settings.model
+
+
+    _model = create_fn(
+        **kwargs,
         # Model Params
         n_gpu_layers=settings.n_gpu_layers,
         main_gpu=settings.main_gpu,
```
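The pattern in this diff — selecting a constructor and pre-binding its source-specific arguments with `functools.partial`, so the long tail of shared model parameters can be passed uniformly afterwards — can be sketched in isolation. The two stub callables below are stand-ins for `llama_cpp.Llama` and `Llama.from_pretrained`, not their real signatures:

```python
import functools

def local_ctor(model_path, **params):
    """Stub for llama_cpp.Llama: loads a model from a local path."""
    return ("local", model_path, params)

def hub_ctor(repo_id, filename, **params):
    """Stub for Llama.from_pretrained: downloads from the Hub first."""
    return ("hub", repo_id, filename, params)

def build(model, hf_model_repo_id=None, **params):
    kwargs = {}
    if hf_model_repo_id is not None:
        # Pre-bind the Hub-specific arguments; shared params flow through later.
        create_fn = functools.partial(hub_ctor, repo_id=hf_model_repo_id, filename=model)
    else:
        create_fn = local_ctor
        kwargs["model_path"] = model
    # One call site regardless of where the model comes from.
    return create_fn(**kwargs, **params)

print(build("models/7B/llama-model.gguf", n_gpu_layers=1))
# → ('local', 'models/7B/llama-model.gguf', {'n_gpu_layers': 1})
print(build("*q8_0.gguf", hf_model_repo_id="Qwen/Qwen1.5-0.5B-Chat-GGUF"))
# → ('hub', 'Qwen/Qwen1.5-0.5B-Chat-GGUF', '*q8_0.gguf', {})
```

The payoff of the `partial` is a single call site: the dozens of keyword arguments that follow (`n_gpu_layers`, `main_gpu`, …) do not need to be duplicated across the two branches.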

llama_cpp/server/settings.py

Lines changed: 5 additions & 0 deletions

```diff
@@ -143,6 +143,11 @@ class ModelSettings(BaseSettings):
         default=None,
         description="The model name or path to a pretrained HuggingFace tokenizer model. Same as you would pass to AutoTokenizer.from_pretrained().",
     )
+    # Loading from HuggingFace Model Hub
+    hf_model_repo_id: Optional[str] = Field(
+        default=None,
+        description="The model repo id to use for the HuggingFace tokenizer model.",
+    )
     # Speculative Decoding
     draft_model: Optional[str] = Field(
         default=None,
```
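Because `ModelSettings` is a pydantic `BaseSettings` subclass, the new field should also be settable through the environment, with the CLI flag taking precedence. A stdlib-only sketch of that resolution order (the variable name `HF_MODEL_REPO_ID` is an assumption based on pydantic's usual field-name-to-env-var mapping, not something this diff shows):

```python
import os

def resolve_hf_repo_id(cli_value=None):
    """Return the repo id from the CLI flag if given, else from the
    environment, else None (mirroring the field's default=None)."""
    if cli_value is not None:
        return cli_value
    return os.environ.get("HF_MODEL_REPO_ID")

os.environ["HF_MODEL_REPO_ID"] = "Qwen/Qwen1.5-0.5B-Chat-GGUF"
print(resolve_hf_repo_id())         # → Qwen/Qwen1.5-0.5B-Chat-GGUF
print(resolve_hf_repo_id("other"))  # → other
```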

0 commit comments