This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Commit 5135656

add cpu_threads api + docs

1 parent 0de3d0f

File tree

2 files changed: +7 −2 lines changed


docs/docs/features/load-unload.md

Lines changed: 2 additions & 1 deletion

```diff
@@ -69,7 +69,8 @@ In case you got error while loading models. Please check for the correct model p
 | `ngl` | Integer | The number of GPU layers to use. |
 | `ctx_len` | Integer | The context length for the model operations. |
 | `embedding` | Boolean | Whether to use embedding in the model. |
-| `n_parallel` | Integer | The number of parallel operations. Uses Drogon thread count if not set. |
+| `n_parallel` | Integer | The number of parallel operations.|
+|`cpu_threads`|Integer|The number of threads for CPU inference.|
 | `cont_batching` | Boolean | Whether to use continuous batching. |
 | `user_prompt` | String | The prompt to use for the user. |
 | `ai_prompt` | String | The prompt to use for the AI assistant. |
```
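To illustrate how the new field fits alongside the existing parameters, here is a minimal sketch of a load-model request body including `cpu_threads`. The model path, port, and endpoint URL are illustrative assumptions, not taken from this commit:

```python
import json

# Hypothetical load-model request body; field names follow the table above.
payload = {
    "llama_model_path": "/models/model.gguf",  # assumed path, adjust to your setup
    "ngl": 32,                # number of GPU layers to use
    "ctx_len": 2048,          # context length for model operations
    "embedding": False,       # whether to use embedding in the model
    "n_parallel": 2,          # number of parallel operations
    "cpu_threads": 4,         # new: number of threads for CPU inference
    "cont_batching": True,    # whether to use continuous batching
}
body = json.dumps(payload)

# The body would then be POSTed to the server's load-model endpoint, e.g.:
# requests.post("http://localhost:3928/inferences/llamacpp/loadmodel", data=body)
print(body)
```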

docs/openapi/NitroAPI.yaml

Lines changed: 5 additions & 1 deletion

```diff
@@ -235,6 +235,11 @@ components:
         example: 4
         nullable: true
         description: The number of parallel operations. Only set when enable continuous batching.
+      cpu_threads:
+        type: integer
+        example: 4
+        nullable: true
+        description: The number of threads for CPU-based inference.
       pre_prompt:
         type: string
         default: A chat between a curious user and an artificial intelligence assistant. The assistant follows the given rules no matter what.
@@ -255,7 +260,6 @@ components:
         default: "ASSISTANT:"
         nullable: true
         description: The prefix for assistant prompt.
-
       required:
         - llama_model_path
```
