diff --git a/playbooks/supplemental/openclaw-lemonade-server/README.md b/playbooks/supplemental/openclaw-lemonade-server/README.md
index 53a539f4..f3d2f59e 100644
--- a/playbooks/supplemental/openclaw-lemonade-server/README.md
+++ b/playbooks/supplemental/openclaw-lemonade-server/README.md
@@ -23,7 +23,7 @@ By the end of this playbook you will be able to:
- Learn about **Lemonade Server**
- **Install OpenClaw** and **point it at Lemonade Server** as its AI backend.
- **Start the OpenClaw gateway** and confirm your agent is ready to work.
-- **Connect a Discord bot** to your agent so you can chat with it from any device.
+- **Connect a communication channel** (Discord or Telegram) so you can chat with your agent from any device.
---
@@ -54,15 +54,28 @@ By the end of this playbook you will be able to:
---
-## Configuring Context Size
+## Pull and Load the Recommended Model
-For agent workloads, a larger context window lets the model keep more of the task history, tool outputs, and reasoning steps in view at once. Set this once after the server is running:
+The recommended model for this playbook is **Qwen3.6-35B-A3B-GGUF**, a mixture-of-experts (MoE) model with a 262,144-token context window that is well-suited to agent workloads. Pull it now:
```bash
-lemonade config set ctx_size=190000
+lemonade pull Qwen3.6-35B-A3B-GGUF
```
-This takes effect for newly loaded models. A context of 190000 tokens is a reasonable floor for agent use; increase it if your model and available RAM support it.
+Then load it with a large context window and save that setting for future runs:
+
+```bash
+lemonade load Qwen3.6-35B-A3B-GGUF --ctx-size 262144 --save-options
+```
+
+The model has a default context length of 262,144 tokens. If you encounter out-of-memory (OOM) errors, consider reducing the context window. However, because Qwen3.6 leverages extended context for complex tasks, we advise maintaining a context length of at least 128K tokens to preserve thinking capabilities.
+
+> **Tip: Disable thinking for faster agent responses:** Qwen3.6-35B-A3B runs in thinking mode by default, which adds latency before each response. For agent loops this overhead accumulates quickly. The [lemonade-sdk/recipes](https://github.com/lemonade-sdk/recipes/blob/main/coding-agents/Qwen3.6-35B-A3B-NoThinking.json) repo provides a ready-made config that disables thinking. To use it, download the file and import it:
+>
+> ```bash
+> curl -LO https://raw.githubusercontent.com/lemonade-sdk/recipes/main/coding-agents/Qwen3.6-35B-A3B-NoThinking.json
+> lemonade import Qwen3.6-35B-A3B-NoThinking.json
+> ```
---
@@ -160,22 +173,15 @@ curl -fsSL https://openclaw.ai/install.sh | bash -s -- --no-prompt --no-onboard
The `--no-onboard` flag skips the interactive setup wizard; you will configure the model backend manually in the next step, which gives you precise control over which model and server are used.
-After installation, confirm `openclaw` is on your `PATH`:
+Open a new terminal and confirm the installation:
```bash
-export PATH="$HOME/.npm-global/bin:$HOME/.local/bin:$PATH"
openclaw --version
```
-To persist this across terminal sessions:
-
-```bash
-echo 'export PATH="$HOME/.npm-global/bin:$HOME/.local/bin:$PATH"' >> ~/.bashrc
-```
-
### Configure OpenClaw to Use Lemonade
-Run OpenClaw's non-interactive onboarding, replacing `YOUR_MODEL_ID` with your Lemonade Model ID. Use the plain name (e.g., `Qwen3.5-35B-A3B-GGUF`) for catalog models, or the `user.` prefixed name (e.g., `user.Qwen3.6-35B-A3B-UD-Q4_K_M`) for custom imported ones:
+Run OpenClaw's non-interactive onboarding, which registers Lemonade Server as a custom OpenAI-compatible provider:
```bash
openclaw onboard \
@@ -183,7 +189,7 @@ openclaw onboard \
--mode local \
--auth-choice custom-api-key \
--custom-base-url "http://127.0.0.1:13305/api/v1" \
- --custom-model-id "YOUR_MODEL_ID" \
+ --custom-model-id "Qwen3.6-35B-A3B-GGUF" \
--custom-provider-id "lemonade" \
--custom-compatibility "openai" \
--custom-api-key "lemonade" \
@@ -203,7 +209,7 @@ openclaw onboard \
--mode local \
--auth-choice custom-api-key \
--custom-base-url "http://$WINDOWS_HOST:13305/api/v1" \
- --custom-model-id "YOUR_MODEL_ID" \
+ --custom-model-id "Qwen3.6-35B-A3B-GGUF" \
--custom-provider-id "lemonade" \
--custom-compatibility "openai" \
--custom-api-key "lemonade" \
@@ -217,6 +223,84 @@ openclaw onboard \
This command writes OpenClaw's configuration to `~/.openclaw/openclaw.json`.
+> **OpenClaw context window sizing:** OpenClaw's compaction triggers when `contextTokens > contextWindow − reserveTokens`. The default `reserveTokensFloor` is 20,000 tokens, and it overrides `reserveTokens` whenever `reserveTokens` is lower, so any model context below ~37k will trigger an infinite compaction loop. Setting a low reserve and disabling the floor once in your config applies to every model, with no per-model tuning needed:
+>
+> ```json
+> "compaction": {
+> "reserveTokens": 4096,
+> "reserveTokensFloor": 0
+> }
+> ```
+>
+> `reserveTokensFloor` is a *floor* (minimum guard), not the reserve itself; setting only the floor has no effect. `reserveTokensFloor: 0` disables the guard so that the lower `reserveTokens` value is accepted.
+>
+> **When to apply this:** Use this config if your model's effective context window is below ~37k, either because the model is small (e.g. 8k, 16k, 32k) or because you've intentionally capped it to a lower value (e.g. loading a 128k model but setting context to 16k in Lemonade). Without it, OpenClaw enters an infinite compaction loop on startup.
+>
+> **Large-context models at full context:** You can skip this entirely. The defaults work fine: compaction will kick in well before the window fills, and the model has ample room to generate long responses. If you do apply it, be aware that `reserveTokens: 4096` limits response length to ~4k tokens, which may cut off long file generation or detailed plans.
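To make the inequality above concrete, here is a minimal shell sketch of the threshold arithmetic. It assumes the effective reserve is simply the larger of `reserveTokens` and `reserveTokensFloor`, as this note describes. `compaction_threshold` is a hypothetical helper written for illustration, not an OpenClaw command, and the real implementation may differ in detail.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenClaw): prints the context size at
# which compaction would start, i.e. contextWindow - effective reserve.
# Assumption: effective reserve = max(reserveTokens, reserveTokensFloor).
compaction_threshold() {
  local context_window=$1 reserve_tokens=$2 reserve_floor=$3
  local effective_reserve=$reserve_tokens
  if (( effective_reserve < reserve_floor )); then
    effective_reserve=$reserve_floor   # the floor wins when the reserve is lower
  fi
  echo $(( context_window - effective_reserve ))
}

compaction_threshold 32768 4096 20000    # default floor, 32k window  -> 12768
compaction_threshold 32768 4096 0        # floor disabled, 32k window -> 28672
compaction_threshold 262144 4096 20000   # default floor, full window -> 242144
```

Under these assumptions, a 32k window with the default floor begins compacting at about 12.8k tokens of context, while the same window with the floor disabled and a 4,096-token reserve compacts at about 28.7k. At the full 262,144-token window the default floor leaves the threshold above 240k, which is why large-context setups can keep the defaults.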
+>
+> **Where to add this:** Place the `compaction` block inside `agents.defaults` in your `openclaw.json` (usually at `~/.openclaw/openclaw.json`):
+>
+> ```json
+> {
+> "agents": {
+> "defaults": {
+> "workspace": "/home/
+openclaw pairing approve discord
```
> Pairing codes expire after one hour.
@@ -322,12 +434,54 @@ You can now chat with your agent directly from Discord and offload tasks to your
---
+### Option B: Telegram
+
+Telegram is simpler than Discord for most users: it requires no server and no admin access.
+
+#### Create a Telegram bot
+
+1. Open Telegram and message **@BotFather**.
+2. Send `/newbot` and follow the prompts. Save the bot token it gives you.
+
+#### Configure OpenClaw for Telegram
+
+Store the token as an environment variable:
+
+```bash
+export TELEGRAM_BOT_TOKEN="YOUR_BOT_TOKEN"
+```
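Before wiring the token into OpenClaw, a quick shape check can catch copy-paste mistakes such as a leftover placeholder or a truncated value. The sketch below assumes BotFather tokens look like a numeric bot ID, a colon, and a secret made of letters, digits, underscores, and hyphens. `looks_like_bot_token` is a hypothetical helper, not part of OpenClaw or any Telegram tooling, and it checks only the shape of the string, not whether Telegram accepts it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the argument has the assumed token shape
# "<numeric id>:<secret>". It does not contact Telegram.
looks_like_bot_token() {
  [[ "$1" =~ ^[0-9]+:[A-Za-z0-9_-]+$ ]]
}

if looks_like_bot_token "${TELEGRAM_BOT_TOKEN:-}"; then
  echo "TELEGRAM_BOT_TOKEN has a plausible shape"
else
  echo "TELEGRAM_BOT_TOKEN is empty or malformed; copy it again from BotFather" >&2
fi
```

The check is deliberately loose: it rejects an empty variable or the literal `YOUR_BOT_TOKEN` placeholder but makes no assumption about the secret's length.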
+
+Add the channel configuration to `~/.openclaw/openclaw.json` (or patch it via the dashboard), replacing `YOUR_BOT_TOKEN` with the token from BotFather:
+
+```json
+{
+ "channels": {
+ "telegram": {
+ "enabled": true,
+ "botToken": "YOUR_BOT_TOKEN",
+ "dmPolicy": "pairing"
+ }
+ }
+}
+```
+
+Restart the gateway, then send your bot any message in Telegram. Approve the pairing:
+
+```bash
+openclaw pairing list telegram
+openclaw pairing approve telegram
+```
+
+Pairing codes expire after one hour. You can now chat with your agent via Telegram DM.
+
+---
+
## Next Steps
Now that your agent can receive commands from your phone and act on your local machine, here are three directions worth exploring:
-1. **Stock market summarizer**: Schedule OpenClaw to fetch data from financial APIs on a fixed interval, summarize the day's movements with your local model, and push a digest to your Discord DM each morning.
+1. **Stock market summarizer**: Schedule OpenClaw to fetch data from financial APIs on a fixed interval, summarize the day's movements with your local model, and push a digest to your phone each morning via your chosen channel.
-2. **Fine-tuning monitor**: Kick off a training job remotely via Discord, then have the agent tail the training log and report periodic loss values, GPU utilization, and disk usage back to your phone. If the run stalls or VRAM spikes, you find out immediately without needing to be at the machine.
+2. **Fine-tuning monitor**: Kick off a training job remotely via Telegram or Discord, then have the agent tail the training log and report periodic loss values, GPU utilization, and disk usage back to your phone. If the run stalls or VRAM spikes, you find out immediately without needing to be at the machine.
3. **IoT with a local VLM**: Point a camera at your front door, run a vision model on Lemonade, and have OpenClaw analyze frames on demand or on a trigger. Ask "did any packages arrive today?" from your phone and get a straight answer from your own hardware.
diff --git a/playbooks/supplemental/openclaw-lemonade-server/assets/openclaw_dashboard.png b/playbooks/supplemental/openclaw-lemonade-server/assets/openclaw_dashboard.png
index e033f03c..95cb1782 100644
Binary files a/playbooks/supplemental/openclaw-lemonade-server/assets/openclaw_dashboard.png and b/playbooks/supplemental/openclaw-lemonade-server/assets/openclaw_dashboard.png differ
diff --git a/playbooks/supplemental/openclaw-lemonade-server/playbook.json b/playbooks/supplemental/openclaw-lemonade-server/playbook.json
index 983cf54c..90612dfd 100644
--- a/playbooks/supplemental/openclaw-lemonade-server/playbook.json
+++ b/playbooks/supplemental/openclaw-lemonade-server/playbook.json
@@ -25,6 +25,38 @@
"halo": [
"windows",
"linux"
+ ],
+ "stx": [
+ "linux",
+ "windows"
+ ],
+ "krk": [
+ "linux",
+ "windows"
+ ]
+ },
+ "required_platforms": {
+ "halo": [
+ "windows",
+ "linux"
+ ],
+ "halo_box": [
+ "windows",
+ "linux"
+ ]
+ },
+ "published_platforms": {
+ "halo": [
+ "windows",
+ "linux"
+ ],
+ "stx": [
+ "linux",
+ "windows"
+ ],
+ "krk": [
+ "linux",
+ "windows"
]
},
"difficulty": "intermediate",