
Commit 92aefc2

add(self extend): add self extend documentation
1 parent a92ec6b commit 92aefc2

File tree

2 files changed: +34 -0 lines changed

docs/docs/features/grammar.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
---
title: GBNF Grammar
description: What Nitro supports
keywords: [Nitro, Jan, fast inference, inference server, local AI, large language model, OpenAI compatible, open source, llama]
---

docs/docs/features/self-extend.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
---
title: Self extend
description: Self-Extend LLM Context Window Without Tuning
keywords: [long context, longlm, Nitro, Jan, fast inference, inference server, local AI, large language model, OpenAI compatible, open source, llama]
---

## Enhancing LLMs with Self-Extend

Self-Extend increases the context window of Large Language Models (LLMs) without the usual need for re-tuning. It adapts the attention mechanism during inference, eliminating the need for additional training or fine-tuning.

For in-depth technical details, refer to the research [paper](https://arxiv.org/pdf/2401.01325.pdf).
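At a high level, and only as a simplified paraphrase of the paper (not code from Nitro), Self-Extend keeps ordinary attention for nearby tokens and applies grouped attention to distant ones: relative positions beyond a neighbor window $W$ are compressed by floor division with a group factor $N$. One illustrative way to write that mapping:

```latex
% Illustrative grouped-position mapping (simplified; see the paper for the exact formulation).
% p : original relative position between query and key token
% W : neighbor window kept at full resolution (roughly grp_attn_w below)
% N : group factor applied to distant tokens   (roughly grp_attn_n below)
\tilde{p} =
\begin{cases}
  p, & p \le W \\[4pt]
  W + \left\lfloor \dfrac{p - W}{N} \right\rfloor, & p > W
\end{cases}
```

Here $W$ and $N$ loosely correspond to the `grp_attn_w` and `grp_attn_n` parameters in the load command below; the exact merge of neighbor and grouped attention is defined in the paper.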
## Activating Self-Extend for LLMs

To activate the Self-Extend feature while loading your model, use the following command:

```bash title="Enable Self-Extend" {6,7}
curl http://localhost:3928/inferences/llamacpp/loadmodel \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/path/to/your_model.gguf",
    "ctx_len": 8192,
    "grp_attn_n": 4,
    "grp_attn_w": 2048
  }'
```

**Note:**
- For optimal performance, `grp_attn_w` should be as large as possible, but smaller than the model's training context length.
- Setting `grp_attn_n` between 2 and 4 is recommended for peak efficiency; higher values can make the output noticeably less coherent.
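Once the model is loaded with these settings, longer prompts can be sent as usual. Below is a minimal sketch of a follow-up request, assuming Nitro's OpenAI-compatible chat completions route at `/v1/chat/completions` on the same port; the placeholder prompt stands in for your own long input.

```bash title="Example long-context request"
# Send a long prompt to the already-loaded model (sketch; adjust the body to your setup).
curl http://localhost:3928/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {"role": "user", "content": "Summarize the following document: <several thousand tokens of text>"}
    ]
  }'
```

Keep the total prompt within the `ctx_len` you set at load time (8192 in the example above).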
