This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Commit 7cf62d5

Fix/gemma2 chat template (#1657)
* chore/embeddings-docs
* chore: add embedding capabilities
* chore: remove un developed file
* fix: gemma2 chat template renderer
* format code
1 parent 5ec3a59 commit 7cf62d5

File tree

3 files changed: +807 −166 lines
Lines changed: 99 additions & 3 deletions
---
title: Embeddings
---

:::info
🚧 Cortex is currently under development, and this page is a stub for future development.
:::

cortex.cpp now supports an embeddings endpoint that is fully OpenAI-compatible.

For embeddings API usage, refer to the [API references](/api-reference#tag/chat/POST/v1/embeddings). This tutorial shows how to use embeddings in Cortex with the OpenAI Python SDK.

## Embeddings with the OpenAI-compatible API

### 1. Start the server and run a model

```sh
cortex run llama3.1:8b-gguf-q4-km
```

### 2. Create a script `embeddings.py` with this content

```py
from openai import OpenAI

ENDPOINT = "http://localhost:39281/v1"
MODEL = "llama3.1:8b-gguf-q4-km"

# The local server does not check the API key, but the SDK requires one
client = OpenAI(
    base_url=ENDPOINT,
    api_key="not-needed"
)
```

### 3. Create embeddings

```py
response = client.embeddings.create(input="embedding", model=MODEL, encoding_format="base64")
print(response)
```

The response will look like this:

```
CreateEmbeddingResponse(
    data=[
        Embedding(
            embedding='hjuAPOD8TryuPU8...',
            index=0,
            object='embedding'
        )
    ],
    model='meta-llama3.1-8b-instruct',
    object='list',
    usage=Usage(
        prompt_tokens=2,
        total_tokens=2
    )
)
```
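A base64-encoded embedding has to be decoded before you can use the vector. A minimal sketch, assuming the server follows OpenAI's convention of packing the vector as little-endian float32 (the `decode_embedding` helper is illustrative, not part of any SDK):

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    """Decode a base64-encoded embedding into a list of floats.

    Assumes little-endian float32 packing, which is how the OpenAI
    API encodes base64 embeddings.
    """
    raw = base64.b64decode(b64)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))

# Round-trip demo with a known vector (no server needed)
encoded = base64.b64encode(struct.pack("<2f", 1.0, -2.0)).decode()
print(decode_embedding(encoded))  # [1.0, -2.0]
```

With a live server you would pass `response.data[0].embedding` to the helper instead of the demo string.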

The embedding above is encoded as a base64 string. By default, when `encoding_format` is omitted, the model returns the embedding as a list of floats:

```py
response = client.embeddings.create(input="embedding", model=MODEL)
print(response)
```

The result will be:

```
CreateEmbeddingResponse(
    data=[
        Embedding(
            embedding=[0.1, 0.3, 0.4, ...],
            index=0,
            object='embedding'
        )
    ],
    model='meta-llama3.1-8b-instruct',
    object='list',
    usage=Usage(
        prompt_tokens=2,
        total_tokens=2
    )
)
```
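Float embeddings are what you typically feed into similarity computations. A small self-contained sketch of cosine similarity between two vectors; with a running server you would compare real vectors such as `response.data[0].embedding` instead of the toy values used here:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy demo: the second vector is the first scaled by 2, so they are parallel
print(cosine_similarity([0.1, 0.3, 0.4], [0.2, 0.6, 0.8]))  # ≈ 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # ≈ 0.0 (orthogonal)
```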

Cortex also supports all the input types that [OpenAI](https://platform.openai.com/docs/api-reference/embeddings/create#embeddings-create-input) accepts:

```py
# input as a string
response = client.embeddings.create(input="embedding", model=MODEL)

# input as an array of strings
response = client.embeddings.create(input=["embedding"], model=MODEL)

# input as an array of tokens
response = client.embeddings.create(input=[12, 44, 123], model=MODEL)

# input as an array of arrays of tokens
response = client.embeddings.create(input=[[912, 312, 54], [12, 433, 1241]], model=MODEL)
```
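When you send an array of inputs, each entry in `response.data` carries an `index` field identifying which input it corresponds to. A hypothetical offline sketch of pairing a batch of texts with their vectors by index; the `Embedding` dataclass here only mimics the SDK's response objects, and with a live server you would use `client.embeddings.create(input=texts, model=MODEL).data` instead of the faked list:

```python
from dataclasses import dataclass

@dataclass
class Embedding:
    """Stand-in for the SDK's response object (illustrative only)."""
    embedding: list[float]
    index: int
    object: str = "embedding"

texts = ["first", "second"]
# Faked response data; a real call would return one entry per input
data = [Embedding([0.1, 0.2], 0), Embedding([0.3, 0.4], 1)]

# Match entries back to inputs by the index field rather than list position
by_index = {e.index: e.embedding for e in data}
pairs = {text: by_index[i] for i, text in enumerate(texts)}
print(pairs)  # {'first': [0.1, 0.2], 'second': [0.3, 0.4]}
```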
