Description
When multiple requests are processed at the same time, the first request is interrupted. How can I solve this problem?
My run command is as follows:
python3 -m llama_cpp.server --model ./models/WizardLM-13B-V1.2/ggml-model-f16-Q5.gguf --n_gpu_layers 1 --n_ctx 8192
I tried setting it with the following command:
python3 -m llama_cpp.server --model ./models/WizardLM-13B-V1.2/ggml-model-f16-Q5.gguf --n_gpu_layers 1 --n_ctx 8192 --interrupt_requests False
However, --interrupt_requests False did not take effect.
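
If the CLI flag is not being honored, one possible workaround is to start the server programmatically and pass the setting directly. The sketch below is only an assumption-based example: it assumes the installed llama-cpp-python version exposes create_app in llama_cpp.server.app and a Settings class with an interrupt_requests field (module paths and field names can differ between versions, so please check against your install).

```python
# run_server.py -- minimal sketch, not verified against every llama-cpp-python version.
# Assumes: llama_cpp.server.app.create_app() and a Settings class that accepts
# model / n_gpu_layers / n_ctx / interrupt_requests. Adjust names if your version differs.
import uvicorn

from llama_cpp.server.app import create_app
from llama_cpp.server.settings import Settings

settings = Settings(
    model="./models/WizardLM-13B-V1.2/ggml-model-f16-Q5.gguf",
    n_gpu_layers=1,
    n_ctx=8192,
    interrupt_requests=False,  # keep the in-flight request running when a new one arrives
)

app = create_app(settings=settings)

if __name__ == "__main__":
    # Host and port are placeholders; use whatever you normally bind the server to.
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Then start it with `python3 run_server.py` instead of `python3 -m llama_cpp.server ...`, which sidesteps any issue with how the CLI parses the boolean flag.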