
Commit 5338a78

Author: Gabrielle Ong
Merge pull request #1631 from janhq/j/engines-docs
chore: adding engines docs
2 parents 0942fc1 + 6b3795e commit 5338a78

File tree: 3 files changed (+215, -14 lines changed)

docs/docs/engines/index.mdx

Lines changed: 203 additions & 1 deletion
@@ -2,11 +2,213 @@
slug: /engines
title: Engines
---
-import DocCardList from '@theme/DocCardList';
+import DocCardList from "@theme/DocCardList";

:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

# Engines

Engines in Cortex serve as execution drivers for machine learning models, providing the runtime environment necessary for model operations. Each engine is designed to optimize performance and ensure compatibility with its corresponding model types.

## Supported Engines

Cortex currently supports three industry-standard engines:

| Engine | Source | Description |
| ------ | ------ | ----------- |
| [llama.cpp](https://github.com/ggerganov/llama.cpp) | ggerganov | Inference of Meta's LLaMA model (and others) in pure C/C++ |
| [ONNX Runtime](https://github.com/microsoft/onnxruntime) | Microsoft | ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator |
| [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) | NVIDIA | GPU-optimized inference engine for large language models |

> **Note:** Cortex also supports building your own engines.

## Features

- **Engine Retrieval**: Install engines with a single command.
- **Engine Management**: Easily manage engines by type, variant, and version.
- **User-Friendly Interface**: Access models via the Command Line Interface (CLI) or HTTP API.
- **Engine Selection**: Select the appropriate engine to run your models.

## Usage

Cortex offers comprehensive support for multiple engine types, including [llama.cpp](https://github.com/ggerganov/llama.cpp), [ONNX Runtime](https://github.com/microsoft/onnxruntime), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). Each engine loads its corresponding model types. The platform also provides flexible management of engine variants and versions, so developers and users can easily roll back changes or compare performance across engine versions.

### Installing an engine

Cortex makes it easy to install an engine. For example, to run a `GGUF` model, you need the `llama-cpp` engine. To install it, enter `cortex engines install llama-cpp` in your terminal and wait for the process to complete. Cortex automatically pulls the latest stable version suitable for your PC's specifications.

#### CLI

To install an engine using the CLI, use the following command:

```sh
cortex engines install llama-cpp
Validating download items, please wait..
Start downloading..
llama-cpp 100%[==================================================] [00m:00s] 1.24 MB/1.24 MB
Engine llama-cpp downloaded successfully!
```

#### HTTP API

To install an engine using the HTTP API, use the following command:

```sh
curl --location --request POST 'http://127.0.0.1:39281/engines/install/llama-cpp'
```

Example response:

```json
{
  "message": "Engine llama-cpp starts installing!"
}
```
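
Because the HTTP install call returns as soon as the installation starts, a client may want to wait until the engine is actually usable. The sketch below is illustrative only: it assumes `jq` is available locally and uses the install endpoint above together with the `/v1/engines` list endpoint described in the next section, polling until `llama-cpp` reports `Ready`.

```sh
# Illustrative sketch: start the installation, then poll until the engine is Ready.
curl --location --request POST 'http://127.0.0.1:39281/engines/install/llama-cpp'

until curl --silent --location 'http://127.0.0.1:39281/v1/engines' \
  | jq --exit-status '.data[] | select(.name == "llama-cpp" and .status == "Ready")' > /dev/null
do
  echo "llama-cpp is not ready yet, waiting..."
  sleep 2
done
echo "llama-cpp is ready."
```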

### Listing engines

Cortex allows clients to easily list the current engines and their statuses. Each engine type can have different variants and versions, which are crucial for debugging and performance optimization. Different variants cater to specific hardware configurations, such as CUDA for NVIDIA GPUs and Vulkan for AMD GPUs on Windows, or AVX512 support for CPUs.

#### CLI

You can list the available engines using the following command:

```sh
cortex engines list
+---+--------------+-------------------+---------+-----------+--------------+
| # | Name         | Supported Formats | Version | Variant   | Status       |
+---+--------------+-------------------+---------+-----------+--------------+
| 1 | onnxruntime  | ONNX              |         |           | Incompatible |
+---+--------------+-------------------+---------+-----------+--------------+
| 2 | llama-cpp    | GGUF              | 0.1.37  | mac-arm64 | Ready        |
+---+--------------+-------------------+---------+-----------+--------------+
| 3 | tensorrt-llm | TensorRT Engines  |         |           | Incompatible |
+---+--------------+-------------------+---------+-----------+--------------+
```

#### HTTP API

You can also retrieve the list of engines via the HTTP API:

```sh
curl --location 'http://127.0.0.1:39281/v1/engines'
```

Example response:

```json
{
  "data": [
    {
      "description": "This extension enables chat completion API calls using the Onnx engine",
      "format": "ONNX",
      "name": "onnxruntime",
      "productName": "onnxruntime",
      "status": "Incompatible",
      "variant": "",
      "version": ""
    },
    {
      "description": "This extension enables chat completion API calls using the LlamaCPP engine",
      "format": "GGUF",
      "name": "llama-cpp",
      "productName": "llama-cpp",
      "status": "Ready",
      "variant": "mac-arm64",
      "version": "0.1.37"
    },
    {
      "description": "This extension enables chat completion API calls using the TensorrtLLM engine",
      "format": "TensorRT Engines",
      "name": "tensorrt-llm",
      "productName": "tensorrt-llm",
      "status": "Incompatible",
      "variant": "",
      "version": ""
    }
  ],
  "object": "list",
  "result": "OK"
}
```
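
Because the response follows the `data`/`status` shape shown above, clients can filter it for engines that are ready to use. A minimal sketch, assuming `jq` is available:

```sh
# Illustrative sketch: print only the engines whose status is "Ready".
curl --silent --location 'http://127.0.0.1:39281/v1/engines' \
  | jq -r '.data[] | select(.status == "Ready") | "\(.name) \(.version) \(.variant)"'
# With the example response above, this prints: llama-cpp 0.1.37 mac-arm64
```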

### Getting detailed information about an engine

Cortex allows users to retrieve detailed information about a specific engine, including its supported formats, version, variant, and status. This helps users understand the capabilities and compatibility of the engine they are working with.

#### CLI

To retrieve detailed information about an engine using the CLI, use the following command:

```sh
cortex engines get llama-cpp
+-----------+-------------------+---------+-----------+--------+
| Name      | Supported Formats | Version | Variant   | Status |
+-----------+-------------------+---------+-----------+--------+
| llama-cpp | GGUF              | 0.1.37  | mac-arm64 | Ready  |
+-----------+-------------------+---------+-----------+--------+
```

This command displays information such as the engine's name, supported formats, version, variant, and status.

#### HTTP API

To retrieve detailed information about an engine using the HTTP API, send a GET request to the appropriate endpoint:

```sh
curl --location 'http://127.0.0.1:39281/engines/llama-cpp'
```

This request returns a JSON response containing detailed information about the engine, including its description, format, name, product name, status, variant, and version.

Example response:

```json
{
  "description": "This extension enables chat completion API calls using the LlamaCPP engine",
  "format": "GGUF",
  "name": "llama-cpp",
  "productName": "llama-cpp",
  "status": "Not Installed",
  "variant": "",
  "version": ""
}
```
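
Since the `status` field distinguishes values such as `Ready` and `Not Installed`, a script can inspect an engine before deciding whether to install it. A minimal sketch, assuming `jq` is available and using only the endpoints shown on this page:

```sh
# Illustrative sketch: install llama-cpp only if it is not installed yet.
status=$(curl --silent --location 'http://127.0.0.1:39281/engines/llama-cpp' | jq -r '.status')

if [ "$status" = "Not Installed" ]; then
  curl --location --request POST 'http://127.0.0.1:39281/engines/install/llama-cpp'
else
  echo "llama-cpp status: $status"
fi
```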

### Uninstalling an engine

Cortex provides an easy way to uninstall an engine. This is useful when you want to remove the current version and then install the latest stable version of a particular engine.

#### CLI

To uninstall an engine, use the following CLI command:

```sh
cortex engines uninstall llama-cpp
```

#### HTTP API

To uninstall an engine using the HTTP API, send a DELETE request to the appropriate endpoint:

```sh
curl --location --request DELETE 'http://127.0.0.1:39281/engines/llama-cpp'
```

Example response:

```json
{
  "message": "Engine llama-cpp uninstalled successfully!"
}
```
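
For example, the upgrade workflow described above (remove the current build, then reinstall so that Cortex pulls the latest stable version) reduces to two of the CLI commands already shown:

```sh
# Remove the current build, then reinstall; Cortex pulls the latest stable version.
cortex engines uninstall llama-cpp
cortex engines install llama-cpp
```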

### Upcoming Engine Features

- Enhanced engine update mechanism with automated compatibility checks
- Seamless engine switching between variants and versions
- Improved Vulkan engine support with optimized performance

<DocCardList />

docs/docs/hub/index.mdx

Lines changed: 11 additions & 12 deletions
@@ -129,23 +129,23 @@ Unlike the CLI, where users can observe the download progress directly in the te
"downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/metadata.yml",
"downloadedBytes": 0,
"id": "metadata.yml",
-"localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/metadata.yml"
+"localPath": "/Users/user_name/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/metadata.yml"
},
{
"bytes": 0,
"checksum": "N/A",
"downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/model.gguf",
"downloadedBytes": 0,
"id": "model.gguf",
-"localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.gguf"
+"localPath": "/Users/user_name/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.gguf"
},
{
"bytes": 0,
"checksum": "N/A",
"downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/model.yml",
"downloadedBytes": 0,
"id": "model.yml",
-"localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.yml"
+"localPath": "/Users/user_name/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.yml"
}
],
"type": "Model"
@@ -175,23 +175,23 @@ Unlike the CLI, where users can observe the download progress directly in the te
"downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/metadata.yml",
"downloadedBytes": 58,
"id": "metadata.yml",
-"localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/metadata.yml"
+"localPath": "/Users/user_name/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/metadata.yml"
},
{
"bytes": 432131456,
"checksum": "N/A",
"downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/model.gguf",
"downloadedBytes": 235619714,
"id": "model.gguf",
-"localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.gguf"
+"localPath": "/Users/user_name/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.gguf"
},
{
"bytes": 562,
"checksum": "N/A",
"downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/model.yml",
"downloadedBytes": 562,
"id": "model.yml",
-"localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.yml"
+"localPath": "/Users/user_name/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.yml"
}
],
"type": "Model"
@@ -215,23 +215,23 @@ The DownloadSuccess event indicates that all items in the download task have bee
"downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/metadata.yml",
"downloadedBytes": 0,
"id": "metadata.yml",
-"localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/metadata.yml"
+"localPath": "/Users/user_name/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/metadata.yml"
},
{
"bytes": 0,
"checksum": "N/A",
"downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/model.gguf",
"downloadedBytes": 0,
"id": "model.gguf",
-"localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.gguf"
+"localPath": "/Users/user_name/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.gguf"
},
{
"bytes": 0,
"checksum": "N/A",
"downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/model.yml",
"downloadedBytes": 0,
"id": "model.yml",
-"localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.yml"
+"localPath": "/Users/user_name/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.yml"
}
],
"type": "Model"
@@ -249,14 +249,13 @@ When clients have models that are not inside the Cortex data folder and wish to
Use the following command to import a local model using the CLI:

```sh
-cortex models import --model_id my-tinyllama --model_path /Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama
-/1b-gguf/model.gguf
+cortex models import --model_id my-tinyllama --model_path /Users/user_name/cortexcpp/models/cortex.so/tinyllama/1b-gguf/model.gguf
```

Response:

```sh
-Successfully import model from '/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf/model.gguf' for modeID 'my-tinyllama'.
+Successfully import model from '/Users/user_name/cortexcpp/models/cortex.so/tinyllama/1b-gguf/model.gguf' for modeID 'my-tinyllama'.
```

#### via HTTP API

docs/static/openapi/cortex.json

Lines changed: 1 addition & 1 deletion
@@ -308,7 +308,7 @@
"downloadUrl": "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q3_K_L.gguf",
"downloadedBytes": 0,
"id": "TheBloke:Mistral-7B-Instruct-v0.1-GGUF:mistral-7b-instruct-v0.1.Q3_K_L.gguf",
-"localPath": "/Users/jamesnguyen/cortexcpp/models/huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q3_K_L.gguf"
+"localPath": "/Users/user_name/cortexcpp/models/huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q3_K_L.gguf"
}
],
"type": "Model"
