Commit 8961a0d

Merge pull request #1623 from janhq/j/update-model-pull-docs
chore: model pull docs

2 parents e3acf66 + d9b0856

1 file changed: docs/docs/hub/index.mdx (+342 -2 lines)

---
slug: /model-sources
title: Model Sources
---

import DocCardList from "@theme/DocCardList";

:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

# Pulling Models in Cortex

Cortex provides a streamlined way to pull (download) machine learning models from Hugging Face and other third-party sources, as well as to import models from local storage. This functionality allows users to easily access a variety of pre-trained models to enhance their applications.

## Features

- **Model Retrieval**: Download models directly from Hugging Face or third-party repositories.
- **Local Import**: Import models stored on your local machine.
- **User-Friendly Interface**: Access models through a Command Line Interface (CLI) or an HTTP API.
- **Model Selection**: Choose your desired model from a provided selection menu in the CLI.

## Usage

### Pulling Models via CLI

1. **Open the CLI**: Launch the Cortex CLI in your terminal.
2. **Select Model**: Use the selection menu to browse available models.
   - Enter the number corresponding to your desired model quant.
3. **Provide Repository Handle**: Input the repository handle (e.g., `username/repo_name` for Hugging Face) when prompted.
4. **Download Model**: Cortex will handle the download process automatically.

To pull a model from the [Cortex model registry](https://huggingface.co/cortexso), simply run `cortex pull <model_name>` in your terminal.

```sh
cortex pull tinyllama
Downloaded models:
    tinyllama:1b-gguf

Available to download:
    1. tinyllama:1b-gguf-q2-k
    2. tinyllama:1b-gguf-q3-kl
    3. tinyllama:1b-gguf-q3-km
    4. tinyllama:1b-gguf-q3-ks
    5. tinyllama:1b-gguf-q4-km
    6. tinyllama:1b-gguf-q4-ks
    7. tinyllama:1b-gguf-q5-km
    8. tinyllama:1b-gguf-q5-ks
    9. tinyllama:1b-gguf-q6-k
    10. tinyllama:1b-gguf-q8-0
    11. tinyllama:gguf

Select a model (1-11):
```

#### Pulling models with a repository handle

When a model is not yet available in the [Cortex model registry](https://huggingface.co/cortexso), you can provide its repository handle to Cortex.

For example, you can pull a model from [QuantFactory-FinanceLlama3](https://huggingface.co/QuantFactory/finance-Llama3-8B-GGUF) by entering `cortex pull QuantFactory/finance-Llama3-8B-GGUF` in your terminal.

```sh
cortex pull QuantFactory/finance-Llama3-8B-GGUF
Select an option
    1. finance-Llama3-8B.Q2_K.gguf
    2. finance-Llama3-8B.Q3_K_L.gguf
    3. finance-Llama3-8B.Q3_K_M.gguf
    4. finance-Llama3-8B.Q3_K_S.gguf
    5. finance-Llama3-8B.Q4_0.gguf
    6. finance-Llama3-8B.Q4_1.gguf
    7. finance-Llama3-8B.Q4_K_M.gguf
    8. finance-Llama3-8B.Q4_K_S.gguf
    9. finance-Llama3-8B.Q5_0.gguf
    10. finance-Llama3-8B.Q5_1.gguf
    11. finance-Llama3-8B.Q5_K_M.gguf
    12. finance-Llama3-8B.Q5_K_S.gguf
    13. finance-Llama3-8B.Q6_K.gguf
    14. finance-Llama3-8B.Q8_0.gguf

Select an option (1-14):
```

#### Pulling models with a direct URL

Clients can pull models directly using a URL. This allows for the direct download of a model from a specified location without additional configuration.

```sh
cortex pull https://huggingface.co/QuantFactory/OpenMath2-Llama3.1-8B-GGUF/blob/main/OpenMath2-Llama3.1-8B.Q4_0.gguf
Validating download items, please wait..
Start downloading..
QuantFactory:OpenMat 0%[==================================================] [00m:00s] 3.98 MB/0.00 B
```

### Pulling Models via HTTP API

To pull a model using the HTTP API, make a `POST` request to the following endpoint:

```sh
curl --request POST \
  --url http://localhost:39281/v1/models/pull \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "tinyllama:gguf"
  }'
```
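
The same request can be made from any HTTP client. Here is a minimal sketch in Python, assuming only the third-party `requests` package (`pip install requests`) and a Cortex server on the default port shown above:

```python
# A minimal sketch, not an official client: pull a model via the HTTP API.
import requests

response = requests.post(
    "http://localhost:39281/v1/models/pull",
    json={"model": "tinyllama:gguf"},  # same body as the curl example above
)
response.raise_for_status()  # raises if the server rejected the request
```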

#### Notes

- Ensure that you have an active internet connection when pulling models from external repositories.
- For local model imports, specify the path to the model in your CLI command or API request.

### Observing download progress

Unlike the CLI, where users can observe the download progress directly in the terminal, the HTTP API is asynchronous. Clients can therefore monitor the download progress by listening to the events WebSocket API at `ws://127.0.0.1:39281/events`.
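
As an illustration, here is a minimal event-listener sketch in Python, assuming the third-party `websockets` package (`pip install websockets`); it simply prints the type and task ID of every event it receives:

```python
# A sketch of listening to Cortex download events over the WebSocket API.
import asyncio
import json

import websockets

async def watch_events() -> None:
    async with websockets.connect("ws://127.0.0.1:39281/events") as ws:
        async for message in ws:  # each message is one JSON-encoded event
            event = json.loads(message)
            print(event.get("type"), event.get("task", {}).get("id"))

asyncio.run(watch_events())
```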

#### Download started event

- A `DownloadStarted` event is emitted when the download starts. It contains the `DownloadTask` object. Each `DownloadTask` has a unique `id`, along with a download type (e.g. `Model`, `Engine`).
- The `DownloadTask`'s `id` is required when a client wants to stop a download task.

```json
{
  "task": {
    "id": "tinyllama:1b-gguf-q2-k",
    "items": [
      {
        "bytes": 0,
        "checksum": "N/A",
        "downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/metadata.yml",
        "downloadedBytes": 0,
        "id": "metadata.yml",
        "localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/metadata.yml"
      },
      {
        "bytes": 0,
        "checksum": "N/A",
        "downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/model.gguf",
        "downloadedBytes": 0,
        "id": "model.gguf",
        "localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.gguf"
      },
      {
        "bytes": 0,
        "checksum": "N/A",
        "downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/model.yml",
        "downloadedBytes": 0,
        "id": "model.yml",
        "localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.yml"
      }
    ],
    "type": "Model"
  },
  "type": "DownloadStarted"
}
```

#### Download updated event

- A `DownloadUpdated` event is emitted while the download is in progress. It contains the `DownloadTask` object, again with its unique `id` and download type.
- A `DownloadTask` has a list of `DownloadItem`s. Each `DownloadItem` has the following properties:
  - `id`: the id of the download item.
  - `bytes`: the total size of the download item.
  - `downloadedBytes`: the number of bytes that have been downloaded so far.
  - `checksum`: the checksum of the download item.
- Clients can use the `downloadedBytes` and `bytes` properties to calculate the download progress, as in the sketch after this example.

```json
{
  "task": {
    "id": "tinyllama:1b-gguf-q2-k",
    "items": [
      {
        "bytes": 58,
        "checksum": "N/A",
        "downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/metadata.yml",
        "downloadedBytes": 58,
        "id": "metadata.yml",
        "localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/metadata.yml"
      },
      {
        "bytes": 432131456,
        "checksum": "N/A",
        "downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/model.gguf",
        "downloadedBytes": 235619714,
        "id": "model.gguf",
        "localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.gguf"
      },
      {
        "bytes": 562,
        "checksum": "N/A",
        "downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/model.yml",
        "downloadedBytes": 562,
        "id": "model.yml",
        "localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.yml"
      }
    ],
    "type": "Model"
  },
  "type": "DownloadUpdated"
}
```
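
For example, the overall progress can be computed from a `DownloadUpdated` payload like the one above. Here is a small Python sketch; the field names follow the event structure shown here, while the function itself is illustrative:

```python
# Compute overall download progress (in percent) from a DownloadUpdated
# event, using only the `bytes` and `downloadedBytes` fields shown above.
def download_progress(event: dict) -> float:
    items = event["task"]["items"]
    total = sum(item["bytes"] for item in items)
    done = sum(item["downloadedBytes"] for item in items)
    return 100.0 * done / total if total else 0.0
```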

#### Download success event

The `DownloadSuccess` event indicates that all items in the download task have been successfully downloaded. This event provides details about the download task and its items, including their IDs, download URLs, local paths, and other properties. In this event, the `bytes` and `downloadedBytes` properties of each item are set to 0, signifying that the download is complete and no further bytes are pending.

```json
{
  "task": {
    "id": "tinyllama:1b-gguf-q2-k",
    "items": [
      {
        "bytes": 0,
        "checksum": "N/A",
        "downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/metadata.yml",
        "downloadedBytes": 0,
        "id": "metadata.yml",
        "localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/metadata.yml"
      },
      {
        "bytes": 0,
        "checksum": "N/A",
        "downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/model.gguf",
        "downloadedBytes": 0,
        "id": "model.gguf",
        "localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.gguf"
      },
      {
        "bytes": 0,
        "checksum": "N/A",
        "downloadUrl": "https://huggingface.co/cortexso/tinyllama/resolve/1b-gguf-q2-k/model.yml",
        "downloadedBytes": 0,
        "id": "model.yml",
        "localPath": "/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf-q2-k/model.yml"
      }
    ],
    "type": "Model"
  },
  "type": "DownloadSuccess"
}
```

### Importing local models

When clients have models that are not inside the Cortex data folder and wish to run them inside Cortex, they can import local models using either the CLI or the HTTP API.

#### via CLI

Use the following command to import a local model using the CLI:

```sh
cortex models import --model_id my-tinyllama --model_path /Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf/model.gguf
```

Response:

```sh
Successfully import model from '/Users/jamesnguyen/cortexcpp/models/cortex.so/tinyllama/1b-gguf/model.gguf' for modeID 'my-tinyllama'.
```

#### via HTTP API

Use the following `curl` command to import a local model using the HTTP API:

```sh
curl --request POST \
  --url http://127.0.0.1:39281/v1/models/import \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "model-id",
    "modelPath": "absolute/path/to/gguf",
    "name": "model display name"
  }'
```

### Aborting a Download Task

Clients can abort a downloading task using the task ID. Below is a sample `curl` command to abort a download task:

```sh
curl --location --request DELETE 'http://127.0.0.1:39281/v1/models/pull' \
  --header 'Content-Type: application/json' \
  --data '{
    "taskId": "tinyllama:1b-gguf-q2-k"
  }'
```

An event with type `DownloadStopped` will be emitted when the task is successfully aborted.

### Listing locally available models via CLI

You can list your ready-to-use models with the `cortex models list` command:

```sh
cortex models list
+---------+-------------------+
| (Index) | ID                |
+---------+-------------------+
| 1       | tinyllama:1b-gguf |
+---------+-------------------+
```

For more options, use the `cortex models list --help` command.

```sh
cortex models list -h
List all local models
Usage:
cortex models [options] [subcommand]

Positionals:
  filter TEXT                 Filter model id

Options:
  -h,--help                   Print this help message and exit
  -e,--engine                 Display engine
  -v,--version                Display version
```

### Listing locally available models via HTTP API

This section describes how to list all models that are available locally on your system using the HTTP API. By making a `GET` request to the endpoint below, you can retrieve a list of models along with their details, such as model ID, name, file paths, engine type, and version. This is useful for managing and verifying the models you have downloaded and are ready to use in your local environment.

```sh
curl --request GET \
  --url http://127.0.0.1:39281/v1/models
```

Response:

```json
{
  "data": [
    {
      "model": "tinyllama:1b-gguf",
      "name": "tinyllama",
      "files": [
        "models/cortex.so/tinyllama/1b-gguf/model.gguf"
      ],
      "engine": "llama-cpp",
      "version": "1",
      # Omit some configuration parameters
    }
  ],
  "object": "list",
  "result": "OK"
}
```

With Cortex, pulling and managing models is simplified, allowing you to focus more on building your applications!

<DocCardList />
