Description
When multiple requests are processed at the same time, the first request is interrupted. How can I solve this problem?
My run command is as follows:
python3 -m llama_cpp.server --model ./models/WizardLM-13B-V1.2/ggml-model-f16-Q5.gguf --n_gpu_layers 1 --n_ctx 8192
I tried setting it with the following command:
python3 -m llama_cpp.server --model ./models/WizardLM-13B-V1.2/ggml-model-f16-Q5.gguf --n_gpu_layers 1 --n_ctx 8192 --interrupt_requests False
However, --interrupt_requests False did not take effect.
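
If the CLI flag is not being honored, one possible workaround is to start the server programmatically and pass the setting directly. The sketch below is only an assumption-based example: it assumes the installed llama-cpp-python version exposes create_app in llama_cpp.server.app and a Settings class with an interrupt_requests field (module paths and field names can differ between versions, so please check against your install).

```python
# run_server.py -- minimal sketch, not verified against every llama-cpp-python version.
# Assumes: llama_cpp.server.app.create_app() and a Settings class that accepts
# model / n_gpu_layers / n_ctx / interrupt_requests. Adjust names if your version differs.
import uvicorn

from llama_cpp.server.app import create_app
from llama_cpp.server.settings import Settings

settings = Settings(
    model="./models/WizardLM-13B-V1.2/ggml-model-f16-Q5.gguf",
    n_gpu_layers=1,
    n_ctx=8192,
    interrupt_requests=False,  # keep the in-flight request running when a new one arrives
)

app = create_app(settings=settings)

if __name__ == "__main__":
    # Host and port are placeholders; use whatever you normally bind the server to.
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Then start it with `python3 run_server.py` instead of `python3 -m llama_cpp.server ...`, which sidesteps any issue with how the CLI parses the boolean flag.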