2 files changed: +7 −2 lines changed

@@ -69,7 +69,8 @@ In case you got error while loading models. Please check for the correct model path
 | `ngl`           | Integer | The number of GPU layers to use. |
 | `ctx_len`       | Integer | The context length for the model operations. |
 | `embedding`     | Boolean | Whether to use embedding in the model. |
-| `n_parallel`    | Integer | The number of parallel operations. Uses Drogon thread count if not set. |
+| `n_parallel`    | Integer | The number of parallel operations. |
+| `cpu_threads`   | Integer | The number of threads for CPU inference. |
 | `cont_batching` | Boolean | Whether to use continuous batching. |
 | `user_prompt`   | String  | The prompt to use for the user. |
 | `ai_prompt`     | String  | The prompt to use for the AI assistant. |
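Together these fields form the JSON body of the load-model request, alongside the required `llama_model_path` (see the spec below). A minimal sketch in Python; the host, port, endpoint path, and model path are assumptions about a typical local deployment, not taken from this change:

import requests  # third-party: pip install requests

# Hypothetical load-model call using the parameters documented above.
# URL and model path are assumptions; adjust to your deployment.
payload = {
    "llama_model_path": "/path/to/model.gguf",  # required by the spec
    "ngl": 32,              # GPU layers to use
    "ctx_len": 2048,        # context length for model operations
    "embedding": False,     # whether to use embedding in the model
    "cont_batching": True,  # continuous batching
    "n_parallel": 4,        # parallel operations (pairs with cont_batching)
    "cpu_threads": 4,       # new in this change: threads for CPU inference
}

resp = requests.post("http://localhost:3928/inferences/llamacpp/loadmodel", json=payload)
resp.raise_for_status()
print(resp.json())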
@@ -235,6 +235,11 @@ components:
         example: 4
         nullable: true
         description: The number of parallel operations. Only set when continuous batching is enabled.
+      cpu_threads:
+        type: integer
+        example: 4
+        nullable: true
+        description: The number of threads for CPU-based inference.
       pre_prompt:
         type: string
         default: A chat between a curious user and an artificial intelligence assistant. The assistant follows the given rules no matter what.
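A note on the schema semantics: `nullable: true` allows a JSON `null` value, and because `cpu_threads` is absent from the `required` list, the field may also be omitted. A minimal sketch of that behavior, hand-translating the fragment above into plain JSON Schema (the translation and the `jsonschema` check are illustrative assumptions, not part of this spec):

from jsonschema import ValidationError, validate  # third-party: pip install jsonschema

# Hand-written JSON Schema equivalent of the OpenAPI fragment above;
# OpenAPI's `nullable: true` becomes a type union with "null".
schema = {
    "type": "object",
    "properties": {
        "n_parallel": {"type": ["integer", "null"]},
        "cpu_threads": {"type": ["integer", "null"]},
    },
}

validate({"cpu_threads": 4}, schema)     # passes: integer
validate({"cpu_threads": None}, schema)  # passes: nullable
validate({}, schema)                     # passes: not required, may be omitted
try:
    validate({"cpu_threads": "4"}, schema)  # fails: wrong type
except ValidationError as err:
    print("rejected:", err.message)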
@@ -255,7 +260,6 @@ components:
         default: "ASSISTANT:"
         nullable: true
         description: The prefix for the assistant prompt.
-
     required:
       - llama_model_path