Merged
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,8 @@ Only write entries that are worth mentioning to users.

## Unreleased

- Core: Add `compaction_trigger_ratio` config option (default `0.85`) to control when auto-compaction triggers — compaction now fires when context usage reaches the configured ratio or when remaining space falls below `reserved_context_size`, whichever comes first
- Core: Support custom instructions in `/compact` command (e.g., `/compact keep database discussions`) to guide what the compaction preserves
- Web: Add URL action parameters (`?action=create` to open create-session dialog, `?action=create-in-dir&workDir=xxx` to create a session directly) for external integrations, and support Cmd/Ctrl+Click on new-session buttons to open session creation in a new browser tab
- Web: Add todo list display in prompt toolbar — shows task progress with expandable panel when the `SetTodoList` tool is active
- ACP: Add authentication check for session operations with `AUTH_REQUIRED` error responses for terminal-based login flow
2 changes: 2 additions & 0 deletions docs/en/configuration/config-files.md
@@ -57,6 +57,7 @@ max_steps_per_turn = 100
max_retries_per_step = 3
max_ralph_iterations = 0
reserved_context_size = 50000
compaction_trigger_ratio = 0.85

[services.moonshot_search]
base_url = "https://api.kimi.com/coding/v1/search"
@@ -123,6 +124,7 @@ capabilities = ["thinking", "image_in"]
| `max_retries_per_step` | `integer` | `3` | Maximum retries per step |
| `max_ralph_iterations` | `integer` | `0` | Extra iterations after each user message; `0` disables; `-1` is unlimited |
| `reserved_context_size` | `integer` | `50000` | Reserved token count for LLM response generation; auto-compaction triggers when `context_tokens + reserved_context_size >= max_context_size` |
| `compaction_trigger_ratio` | `float` | `0.85` | Context usage ratio threshold for auto-compaction (0.5–0.99); auto-compaction triggers when `context_tokens >= max_context_size * compaction_trigger_ratio`, whichever condition is met first with `reserved_context_size` |
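
As a rough illustration of how the two conditions in this table interact, the sketch below computes the token count at which each one would fire. The helper name `compaction_thresholds` and the window sizes 200,000 and 1,000,000 are assumptions chosen for the example, not values taken from Kimi Code CLI. The ratio is passed as a string so that it can be parsed exactly.

```python
import math
from fractions import Fraction


def compaction_thresholds(
    max_context_size: int,
    *,
    trigger_ratio: str = "0.85",
    reserved_context_size: int = 50_000,
) -> dict[str, int]:
    """Return the smallest token count that satisfies each documented condition.

    Hypothetical helper for illustration only.
    """
    # Smallest integer t with t >= max_context_size * trigger_ratio.
    ratio_based = math.ceil(max_context_size * Fraction(trigger_ratio))
    # Smallest integer t with t + reserved_context_size >= max_context_size.
    reserved_based = max(max_context_size - reserved_context_size, 0)
    return {
        "ratio": ratio_based,
        "reserved": reserved_based,
        "effective": min(ratio_based, reserved_based),
    }


small = compaction_thresholds(200_000)    # assumed 200k-token window
large = compaction_thresholds(1_000_000)  # assumed 1M-token window
print(small)
print(large)
```

With the defaults, the assumed 200,000-token window reaches the reserved-space condition first (150,000 < 170,000), while the assumed 1,000,000-token window reaches the ratio condition first (850,000 < 950,000).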

### `services`

6 changes: 3 additions & 3 deletions docs/en/customization/wire-mode.md
@@ -73,7 +73,7 @@ interface JSONRPCError {
Added in Wire 1.1. Legacy clients can skip this request and send `prompt` directly.
:::

-- **Direction**: client → agent
+- **Direction**: Client → Agent
- **Type**: Request (requires response)

Optional handshake request for negotiating protocol version, submitting external tool definitions, and retrieving the slash command list.
@@ -330,7 +330,7 @@ If no turn is in progress:

### `event`

-- **Direction**: agent → client
+- **Direction**: Agent → Client
- **Type**: Notification (no response needed)

Events emitted by the agent during a turn. No `id` field, client doesn't need to respond.
@@ -351,7 +351,7 @@ interface EventParams {

### `request`

-- **Direction**: agent → client
+- **Direction**: Agent → Client
- **Type**: Request (requires response)

Requests from the agent to the client, used for approval confirmation or external tool calls. The client must respond before the agent can continue execution.
2 changes: 1 addition & 1 deletion docs/en/guides/interaction.md
@@ -87,7 +87,7 @@ Each question supports 2–4 predefined options, and the AI will set appropriate
The AI only uses this tool when your choice genuinely affects subsequent actions. For decisions that can be inferred from context, the AI will decide on its own and continue execution.
:::

-## Approvals
+## Approvals and confirmations

When the AI needs to perform operations that may have an impact (such as modifying files or running commands), Kimi Code CLI will request your confirmation.

6 changes: 6 additions & 0 deletions docs/en/guides/sessions.md
@@ -72,6 +72,12 @@ Enter `/compact` to have the AI summarize the current conversation and replace t
/compact
```

You can also append custom instructions after the command to tell the AI what content to prioritize preserving during compaction:

```
/compact keep the database-related discussion
```

Compacting preserves key information while reducing token consumption. This is useful when the conversation is long but you still want to retain some context.

::: tip
1 change: 1 addition & 0 deletions docs/en/reference/kimi-web.md
@@ -186,6 +186,7 @@ Web UI provides a unified prompt toolbar above the input box, displaying various
- **Activity status**: Shows the current agent state (processing, waiting for approval, etc.)
- **Message queue**: Queue follow-up messages while the AI is processing; queued messages are sent automatically when the current response completes
- **File changes**: Detects Git repository status, showing the number of new, modified, and deleted files (including untracked files). Click to view a detailed list of changes
- **Todo list**: When the `SetTodoList` tool is active, shows task progress with support for expanding to view the detailed list

::: info Changed
Git diff status bar added in version 1.5. Activity status indicator added in version 1.9. Version 1.10 unified it into the prompt toolbar. Version 1.11 moved the context usage indicator to the prompt toolbar.
6 changes: 3 additions & 3 deletions docs/en/reference/slash-commands.md
@@ -63,7 +63,7 @@ This command is only available when using the default configuration file. If a c

### `/editor`

-Set the default external editor. When called without arguments, displays an interactive selection interface; you can also specify the editor command directly, e.g., `/editor vim`. After configuration, pressing `Ctrl-O` will open this editor to edit the current input content. See [Keyboard shortcuts](./keyboard.md#external-editor) for details.
+Set the external editor. When called without arguments, displays an interactive selection interface; you can also specify the editor command directly, e.g., `/editor vim`. After configuration, pressing `Ctrl-O` will open this editor to edit the current input content. See [Keyboard shortcuts](./keyboard.md#external-editor) for details.

### `/reload`

@@ -82,7 +82,7 @@ Debug information is displayed in a pager, press `q` to exit.

Display API usage and quota information, showing quota usage with progress bars and remaining percentages.

-Aliases: `/status`
+Alias: `/status`

::: tip
This command only works with the Kimi Code platform.
@@ -118,7 +118,7 @@ Alias: `/reset`

### `/compact`

-Manually compact the context to reduce token usage.
+Manually compact the context to reduce token usage. You can append custom instructions after the command to tell the AI which information to prioritize preserving during compaction, e.g., `/compact preserve database-related discussions`.

When the context is too long, Kimi Code CLI will automatically trigger compaction. This command allows manually triggering the compaction process.

2 changes: 2 additions & 0 deletions docs/en/release-notes/changelog.md
@@ -4,6 +4,8 @@ This page documents the changes in each Kimi Code CLI release.

## Unreleased

- Core: Add `compaction_trigger_ratio` config option (default `0.85`) to control when auto-compaction triggers — compaction now fires when context usage reaches the configured ratio or when remaining space falls below `reserved_context_size`, whichever comes first
- Core: Support custom instructions in `/compact` command (e.g., `/compact keep database discussions`) to guide what the compaction preserves
- Web: Add URL action parameters (`?action=create` to open create-session dialog, `?action=create-in-dir&workDir=xxx` to create a session directly) for external integrations, and support Cmd/Ctrl+Click on new-session buttons to open session creation in a new browser tab
- Web: Add todo list display in prompt toolbar — shows task progress with expandable panel when the `SetTodoList` tool is active
- ACP: Add authentication check for session operations with `AUTH_REQUIRED` error responses for terminal-based login flow
2 changes: 2 additions & 0 deletions docs/zh/configuration/config-files.md
@@ -57,6 +57,7 @@ max_steps_per_turn = 100
max_retries_per_step = 3
max_ralph_iterations = 0
reserved_context_size = 50000
compaction_trigger_ratio = 0.85

[services.moonshot_search]
base_url = "https://api.kimi.com/coding/v1/search"
@@ -123,6 +124,7 @@ capabilities = ["thinking", "image_in"]
| `max_retries_per_step` | `integer` | `3` | Maximum retries per step |
| `max_ralph_iterations` | `integer` | `0` | Extra automatic iterations after each User message; `0` disables; `-1` is unlimited |
| `reserved_context_size` | `integer` | `50000` | Number of tokens reserved for LLM response generation; compaction triggers automatically when `context_tokens + reserved_context_size >= max_context_size` |
| `compaction_trigger_ratio` | `float` | `0.85` | Context usage ratio threshold that triggers auto-compaction (0.5–0.99); compaction triggers automatically when `context_tokens >= max_context_size * compaction_trigger_ratio`; of this and the `reserved_context_size` condition, whichever is met first applies |

### `services`

6 changes: 6 additions & 0 deletions docs/zh/guides/sessions.md
@@ -72,6 +72,12 @@ kimi --session abc123
/compact
```

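You can also append custom guidance after the command to tell the AI which content to prioritize preserving during compaction:

```
/compact keep the database-related discussion
```

Compaction preserves key information while reducing token consumption. This is useful when the conversation is long but you still want to keep some context.

::: tip Tip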
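1 change: 1 addition & 0 deletions docs/zh/reference/kimi-web.md
@@ -186,6 +186,7 @@ Web UI provides a unified prompt toolbar above the input box, as collapsible tabs
- **Activity status**: Shows the agent's current state (processing, waiting for approval, etc.)
- **Message queue**: Lets you queue follow-up messages while the AI is processing; they are sent automatically once the current reply completes
- **File changes**: Detects Git repository status and shows the number of added, modified, and deleted files (including untracked files); click to view the detailed list of changes
- **Todo list**: When the `SetTodoList` tool is active, shows task progress, with support for expanding to view the detailed list

::: info Changed
The Git diff status bar was added in version 1.5. Version 1.9 added the activity status indicator. Version 1.10 unified it into the prompt toolbar. Version 1.11 moved the context usage indicator to the prompt toolbar.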
2 changes: 1 addition & 1 deletion docs/zh/reference/slash-commands.md
@@ -118,7 +118,7 @@

### `/compact`

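-Manually compact the context to reduce token usage.
+Manually compact the context to reduce token usage. You can append custom guidance after the command to tell the AI which information to prioritize preserving during compaction, for example `/compact keep the database-related discussion`.

When the context is too long, Kimi Code CLI automatically triggers compaction. This command triggers the compaction process manually.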

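2 changes: 2 additions & 0 deletions docs/zh/release-notes/changelog.md
@@ -4,6 +4,8 @@

## Unreleased

- Core: Add the `compaction_trigger_ratio` config option (default `0.85`) to control when auto-compaction triggers: compaction fires when context usage reaches the configured ratio or when remaining space falls below `reserved_context_size`, whichever condition is met first
- Core: The `/compact` command supports custom instructions (e.g., `/compact keep database discussions`) to guide what the compaction focuses on preserving
- Web: Add URL action parameters (`?action=create` opens the create-session dialog, `?action=create-in-dir&workDir=xxx` creates a session directly) for external integrations, and support Cmd/Ctrl+Click on new-session buttons to open session creation in a new tab
- Web: Add a todo list display to the prompt toolbar: when the `SetTodoList` tool is active, it shows task progress and supports an expandable panel for details
- ACP: Add an authentication check for session operations, returning an `AUTH_REQUIRED` error response when unauthenticated, to support the terminal login flow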
7 changes: 6 additions & 1 deletion src/kimi_cli/config.py
@@ -80,7 +80,12 @@ class LoopControl(BaseModel):
     """Extra iterations after the first turn in Ralph mode. Use -1 for unlimited."""
     reserved_context_size: int = Field(default=50_000, ge=1000)
     """Reserved token count for LLM response generation. Auto-compaction triggers when
-    context_tokens + reserved_context_size >= max_context_size. Default is 50000."""
+    either context_tokens + reserved_context_size >= max_context_size or
+    context_tokens >= max_context_size * compaction_trigger_ratio. Default is 50000."""
+    compaction_trigger_ratio: float = Field(default=0.85, ge=0.5, le=0.99)
+    """Context usage ratio threshold for auto-compaction. Default is 0.85 (85%).
+    Auto-compaction triggers when context_tokens >= max_context_size * compaction_trigger_ratio
+    or when context_tokens + reserved_context_size >= max_context_size."""


class MoonshotSearchConfig(BaseModel):
44 changes: 39 additions & 5 deletions src/kimi_cli/soul/compaction.py
@@ -53,15 +53,37 @@ def _estimate_text_tokens(messages: Sequence[Message]) -> int:
     return total_chars // 4


+def should_auto_compact(
+    token_count: int,
+    max_context_size: int,
+    *,
+    trigger_ratio: float,
+    reserved_context_size: int,
+) -> bool:
+    """Determine whether auto-compaction should be triggered.
+
+    Returns True when either condition is met (whichever fires first):
+    - Ratio-based: token_count >= max_context_size * trigger_ratio
+    - Reserved-based: token_count + reserved_context_size >= max_context_size
+    """
+    return (
+        token_count >= max_context_size * trigger_ratio
+        or token_count + reserved_context_size >= max_context_size
+    )
Comment on lines +69 to +72

Copilot AI Mar 2, 2026

should_auto_compact compares token_count to max_context_size * trigger_ratio using floating-point math, which can produce thresholds like 170000.00000000003 and cause an unexpected false negative at the boundary. To make the trigger deterministic, compute an integer threshold (e.g., math.ceil(max_context_size * trigger_ratio) or equivalent integer arithmetic) before comparing.
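
One way to address the boundary concern raised in this comment is to keep binary floating-point out of the comparison altogether. The sketch below is not part of the PR: the name `should_auto_compact_exact` is hypothetical, and it assumes that the shortest decimal form of the ratio (`str(trigger_ratio)`) is the value the user intended. For an integer `token_count`, wrapping the float product in `math.ceil` yields the same result as the plain `>=` comparison, so this sketch uses `decimal.Decimal` instead.

```python
from decimal import Decimal


def should_auto_compact_exact(
    token_count: int,
    max_context_size: int,
    *,
    trigger_ratio: float,
    reserved_context_size: int,
) -> bool:
    """Same two conditions as should_auto_compact, using exact decimal math.

    Hypothetical variant for illustration only.
    """
    # Decimal(str(0.85)) is exactly 0.85, whereas the float 0.85 is only the
    # nearest binary approximation of that value.
    ratio_threshold = max_context_size * Decimal(str(trigger_ratio))
    return (
        token_count >= ratio_threshold
        or token_count + reserved_context_size >= max_context_size
    )


# Example: an assumed 200,000-token window with the default ratio of 0.85.
print(should_auto_compact_exact(
    170_000, 200_000, trigger_ratio=0.85, reserved_context_size=1_000
))
```

The keyword-only signature mirrors `should_auto_compact`, so a variant like this could be swapped in without touching the call site.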
+
+
 @runtime_checkable
 class Compaction(Protocol):
-    async def compact(self, messages: Sequence[Message], llm: LLM) -> CompactionResult:
+    async def compact(
+        self, messages: Sequence[Message], llm: LLM, *, custom_instruction: str = ""
+    ) -> CompactionResult:
         """
         Compact a sequence of messages into a new sequence of messages.

         Args:
             messages (Sequence[Message]): The messages to compact.
             llm (LLM): The LLM to use for compaction.
+            custom_instruction: Optional user instruction to guide compaction focus.

         Returns:
             CompactionResult: The compacted messages and token usage from the compaction LLM call.
@@ -82,8 +104,10 @@ class SimpleCompaction:
     def __init__(self, max_preserved_messages: int = 2) -> None:
         self.max_preserved_messages = max_preserved_messages

-    async def compact(self, messages: Sequence[Message], llm: LLM) -> CompactionResult:
-        compact_message, to_preserve = self.prepare(messages)
+    async def compact(
+        self, messages: Sequence[Message], llm: LLM, *, custom_instruction: str = ""
+    ) -> CompactionResult:
+        compact_message, to_preserve = self.prepare(messages, custom_instruction=custom_instruction)
         if compact_message is None:
             return CompactionResult(messages=to_preserve, usage=None)

@@ -118,7 +142,9 @@ class PrepareResult(NamedTuple):
         compact_message: Message | None
         to_preserve: Sequence[Message]

-    def prepare(self, messages: Sequence[Message]) -> PrepareResult:
+    def prepare(
+        self, messages: Sequence[Message], *, custom_instruction: str = ""
+    ) -> PrepareResult:
         if not messages or self.max_preserved_messages <= 0:
             return self.PrepareResult(compact_message=None, to_preserve=messages)

@@ -151,5 +177,13 @@ def prepare(self, messages: Sequence[Message]) -> PrepareResult:
             compact_message.content.extend(
                 part for part in msg.content if not isinstance(part, ThinkPart)
             )
-        compact_message.content.append(TextPart(text="\n" + prompts.COMPACT))
+        prompt_text = "\n" + prompts.COMPACT
+        if custom_instruction:
+            prompt_text += (
+                "\n\n**User's Custom Compaction Instruction:**\n"
+                "The user has specifically requested the following focus during compaction. "
+                "You MUST prioritize this instruction above the default compression priorities:\n"
+                f"{custom_instruction}"
+            )
+        compact_message.content.append(TextPart(text=prompt_text))
         return self.PrepareResult(compact_message=compact_message, to_preserve=to_preserve)
16 changes: 11 additions & 5 deletions src/kimi_cli/soul/kimisoul.py
@@ -32,7 +32,7 @@
     wire_send,
 )
 from kimi_cli.soul.agent import Agent, Runtime
-from kimi_cli.soul.compaction import CompactionResult, SimpleCompaction
+from kimi_cli.soul.compaction import CompactionResult, SimpleCompaction, should_auto_compact
 from kimi_cli.soul.context import Context
 from kimi_cli.soul.message import check_message, system, tool_result_to_message
 from kimi_cli.soul.slash import registry as soul_slash_registry
@@ -392,8 +392,12 @@ async def _pipe_approval_to_wire():
             step_outcome: StepOutcome | None = None
             try:
                 # compact the context if needed
-                reserved = self._loop_control.reserved_context_size
-                if self._context.token_count + reserved >= self._runtime.llm.max_context_size:
+                if should_auto_compact(
+                    self._context.token_count,
+                    self._runtime.llm.max_context_size,
+                    trigger_ratio=self._loop_control.compaction_trigger_ratio,
+                    reserved_context_size=self._loop_control.reserved_context_size,
+                ):
                     logger.info("Context too long, compacting...")
                     await self.compact_context()

@@ -544,7 +548,7 @@ async def _grow_context(self, result: StepResult, tool_results: list[ToolResult]
         await self._context.append_message(tool_messages)
         # token count of tool results are not available yet

-    async def compact_context(self) -> None:
+    async def compact_context(self, custom_instruction: str = "") -> None:
         """
         Compact the context.

@@ -558,7 +562,9 @@ async def compact_context(self) -> None:
         async def _run_compaction_once() -> CompactionResult:
             if self._runtime.llm is None:
                 raise LLMNotSet()
-            return await self._compaction.compact(self._context.history, self._runtime.llm)
+            return await self._compaction.compact(
+                self._context.history, self._runtime.llm, custom_instruction=custom_instruction
+            )

         @tenacity.retry(
             retry=retry_if_exception(self._is_retryable_error),
4 changes: 2 additions & 2 deletions src/kimi_cli/soul/slash.py
@@ -51,13 +51,13 @@ async def init(soul: KimiSoul, args: str):

 @registry.command
 async def compact(soul: KimiSoul, args: str):
-    """Compact the context"""
+    """Compact the context (optionally with a custom focus, e.g. /compact keep db discussions)"""
     if soul.context.n_checkpoints == 0:
         wire_send(TextPart(text="The context is empty."))
         return

     logger.info("Running `/compact`")
-    await soul.compact_context()
+    await soul.compact_context(custom_instruction=args.strip())
     wire_send(TextPart(text="The context has been compacted."))
Comment on lines 59 to 61

Copilot AI Mar 2, 2026

/compact forwards the raw argument string into the compaction prompt. Because there’s no length/size guard, a very long custom instruction can push the compaction request over the model’s context limit and cause a hard failure (non-retryable 4xx). Consider truncating to a reasonable max length (and/or warning the user) before passing it into compact_context.
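
A minimal sketch of the guard suggested in this comment, assuming a character-based cap is acceptable. `MAX_CUSTOM_INSTRUCTION_CHARS`, its value of 2000, and `clamp_custom_instruction` are hypothetical names and numbers for illustration; the PR itself defines no such limit.

```python
MAX_CUSTOM_INSTRUCTION_CHARS = 2_000  # hypothetical cap, not taken from the PR


def clamp_custom_instruction(
    args: str, max_chars: int = MAX_CUSTOM_INSTRUCTION_CHARS
) -> tuple[str, bool]:
    """Strip the raw `/compact` argument and cap its length.

    Returns the instruction to forward and a flag saying whether it was
    truncated, so the caller can warn the user.
    """
    instruction = args.strip()
    if len(instruction) <= max_chars:
        return instruction, False
    # Cut at the cap and drop any whitespace left dangling at the cut point.
    return instruction[:max_chars].rstrip(), True


instruction, truncated = clamp_custom_instruction("  keep database discussions  ")
print(repr(instruction), truncated)
```

In the slash handler, the returned flag could drive a warning sent to the user before the clamped string is passed to `compact_context`. A token-based cap would track the model limit more closely, at the cost of needing a tokenizer.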
     wire_send(StatusUpdate(context_usage=soul.status.context_usage))

21 changes: 21 additions & 0 deletions tests/core/test_config.py
@@ -32,6 +32,7 @@ def test_default_config_dump():
                 "max_retries_per_step": 3,
                 "max_ralph_iterations": 0,
                 "reserved_context_size": 50000,
+                "compaction_trigger_ratio": 0.85,
             },
             "services": {"moonshot_search": None, "moonshot_fetch": None},
             "mcp": {"client": {"tool_call_timeout_ms": 60000}},
@@ -92,3 +93,23 @@ def test_load_config_max_steps_per_run():
 def test_load_config_reserved_context_size_too_low():
     with pytest.raises(ConfigError, match="reserved_context_size"):
         load_config_from_string('{"loop_control": {"reserved_context_size": 500}}')
+
+
+def test_load_config_compaction_trigger_ratio():
+    config = load_config_from_string('{"loop_control": {"compaction_trigger_ratio": 0.8}}')
+    assert config.loop_control.compaction_trigger_ratio == 0.8
+
+
Comment on lines +98 to +102

Copilot AI Mar 2, 2026

Config validation tests cover an in-range value and out-of-range values, but they don't exercise the documented boundaries (0.5 and 0.99). Add tests asserting that compaction_trigger_ratio=0.5 and =0.99 are accepted, and that values just outside the bounds are rejected, to prevent future regressions in the ge/le constraints.
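
The boundary cases this comment asks for could be sketched as follows. Because `load_config_from_string` and `ConfigError` are only available inside the repository, the sketch substitutes a hypothetical stand-in, `check_ratio`, that mirrors the inclusive `ge=0.5, le=0.99` constraint; the real tests would call the config loader instead.

```python
def check_ratio(value: float, *, lower: float = 0.5, upper: float = 0.99) -> float:
    """Hypothetical stand-in for the Field(ge=0.5, le=0.99) constraint."""
    if not (lower <= value <= upper):
        raise ValueError(
            f"compaction_trigger_ratio must be in [{lower}, {upper}], got {value}"
        )
    return value


def is_accepted(value: float) -> bool:
    """Report whether the stand-in validator accepts the value."""
    try:
        check_ratio(value)
    except ValueError:
        return False
    return True


# Exercise both inclusive bounds and one value just outside each of them.
boundary_results = {value: is_accepted(value) for value in (0.49, 0.5, 0.99, 1.0)}
print(boundary_results)
```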
+def test_load_config_compaction_trigger_ratio_default():
+    config = load_config_from_string("{}")
+    assert config.loop_control.compaction_trigger_ratio == 0.85
+
+
+def test_load_config_compaction_trigger_ratio_too_low():
+    with pytest.raises(ConfigError, match="compaction_trigger_ratio"):
+        load_config_from_string('{"loop_control": {"compaction_trigger_ratio": 0.3}}')
+
+
+def test_load_config_compaction_trigger_ratio_too_high():
+    with pytest.raises(ConfigError, match="compaction_trigger_ratio"):
+        load_config_from_string('{"loop_control": {"compaction_trigger_ratio": 1.0}}')