Commit e4431a6

Merge pull request abetlen#522 from bretello/llama2-70b-support
Llama2 70b support
2 parents 4aaaec5 + 0f09f10 commit e4431a6

2 files changed: +11 -0 lines changed
README.md

Lines changed: 8 additions & 0 deletions
@@ -135,6 +135,14 @@ For instance, if you want to work with larger contexts, you can expand the conte
 llm = Llama(model_path="./models/7B/ggml-model.bin", n_ctx=2048)
 ```
 
+### Loading llama-2 70b
+
+Llama2 70b must set the `n_gqa` parameter (grouped-query attention factor) to 8 when loading:
+
+```python
+llm = Llama(model_path="./models/7B/ggml-model.bin", n_gqa=8)
+```
+
 ## Web Server
 
 `llama-cpp-python` offers a web server which aims to act as a drop-in replacement for the OpenAI API.
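
Note on the value 8: a minimal sketch, assuming the commonly cited Llama 2 70B layout of 64 attention heads sharing 8 key/value heads. Under that assumption the grouped-query attention factor is simply the ratio of the two counts. The helper `gqa_factor` and the two constants are illustrative names, not part of `llama-cpp-python`.

```python
# Assumed head counts for Llama 2 70B (not taken from this commit).
LLAMA2_70B_N_HEAD = 64      # query heads
LLAMA2_70B_N_HEAD_KV = 8    # key/value heads shared across query heads


def gqa_factor(n_head: int, n_head_kv: int) -> int:
    """Return how many query heads share each key/value head."""
    if n_head_kv <= 0 or n_head % n_head_kv != 0:
        raise ValueError("n_head must be a positive multiple of n_head_kv")
    return n_head // n_head_kv


if __name__ == "__main__":
    # Value one would pass as n_gqa under the assumptions above.
    print(gqa_factor(LLAMA2_70B_N_HEAD, LLAMA2_70B_N_HEAD_KV))
```

A model without grouped-query attention has as many key/value heads as query heads, so the same helper yields a factor of 1 for it.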

llama_cpp/llama.py

Lines changed: 3 additions & 0 deletions
@@ -216,6 +216,7 @@ def __init__(
         embedding: bool = False,
         n_threads: Optional[int] = None,
         n_batch: int = 512,
+        n_gqa: Optional[int] = None, # must be 8 for llama2 70b
         last_n_tokens_size: int = 64,
         lora_base: Optional[str] = None,
         lora_path: Optional[str] = None,
@@ -260,6 +261,8 @@ def __init__(
 
         self.params = llama_cpp.llama_context_default_params()
         self.params.n_ctx = n_ctx
+        if n_gqa is not None:
+            self.params.n_gqa = n_gqa
         self.params.n_gpu_layers = n_gpu_layers
         self.params.seed = seed
         self.params.f16_kv = f16_kv
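
Note on the `llama.py` change: `n_gqa` is written to the context parameters only when the caller supplies it, so existing callers keep whatever default the library provides. The sketch below mimics that pattern with a stand-in dataclass. `StubContextParams`, `build_params`, and the default values shown are assumptions for illustration, not the real `llama_cpp` structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubContextParams:
    """Hypothetical stand-in for the struct returned by the library."""
    n_ctx: int = 512   # assumed default, for illustration only
    n_gqa: int = 1     # assumed default, for illustration only


def build_params(n_ctx: int = 512, n_gqa: Optional[int] = None) -> StubContextParams:
    params = StubContextParams()
    params.n_ctx = n_ctx
    # Override only when explicitly requested, mirroring the diff above.
    if n_gqa is not None:
        params.n_gqa = n_gqa
    return params


if __name__ == "__main__":
    print(build_params())          # default n_gqa left untouched
    print(build_params(n_gqa=8))   # explicit override, as for Llama 2 70B
```

Using `None` as the sentinel keeps the new parameter backward compatible: code written before this commit never passes `n_gqa` and therefore never changes the default.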
