Skip to content

fix(hooks): remove interim truncation caps — reducer is the size gate#99

Merged
aaronsb merged 1 commit into
mainfrom
fix/remove-truncation-caps
May 21, 2026
Merged

fix(hooks): remove interim truncation caps — reducer is the size gate#99
aaronsb merged 1 commit into
mainfrom
fix/remove-truncation-caps

Conversation

@aaronsb
Copy link
Copy Markdown
Owner

@aaronsb aaronsb commented May 21, 2026

Summary

Removes the character-based truncation blocks from check-bash-pre.sh (PR #95) and check-prompt.sh (PR #96). They were misframed in their own commit messages as "belt and suspenders" — in practice they ran strictly upstream of the ADR-130 reducer (PR #98), starving it by chopping the input before it could score sentence salience.

The custom-agent discriminator in check-task-pre.sh (PR #94) is not a cap — it stops ways scan task from running at all for dispatches to custom agents. Untouched.

Why now, not after an observation window

The caps make every signal we'd want to observe worse:

  • They pre-clamp inputs to a safe range, so a reducer-bounds bug would never surface in practice
  • They pre-cut prose, so any match-quality regression after removal would be conflated with reducer behavior
  • They add small shell-side overhead that masks reducer perf

Keeping them was actively delaying the observation we wanted to do.

What's still guaranteed

ADR-130's reduce_for_embed enforces a per-hook token budget (110 for prompt/task, 75 for command, 30 for file) and bounds output regardless of input size. The CJK-aware approximate tokenizer (added in the PR #98 review fixes) prevents the Japanese/Chinese under-counting that would have been a regression here.

Test plan

  • bash -n clean on both scripts
  • 6.4KB heredoc-bodied gh pr create through bash hook → 0 crashes
  • 6KB task-notification through prompt hook → 0 crashes
  • Hook context still emitted normally (no scan path regression)

PRs #95 (bash 256-char cap) and #96 (prompt 1024-char cap) were
character-based pre-clamps that ran strictly upstream of the ADR-130
sentence-salience reducer. They weren't redundant — they were
*starving* the reducer by chopping off the back of the input before
the reducer could score it.

Concrete shape of the bug, on a 4000-char `gh pr create --body
"$(cat <<EOF…)"`:

  hook receives 4000-char CMD
    ↓
  check-bash-pre.sh truncates to 256 chars    ← cap fires here
    ↓
  ways scan command --command "<256 chars>"
    ↓
  reduce_for_embed("<256 chars>", 75)         ← input fits, passthrough
    ↓
  batch_embed_score("<256 chars>")

The reducer's whole pitch — preserve prose distributed across the
whole document — was defeated as long as the caps ran first. The
caps also masked any reducer-bounds bug by clamping inputs to a
safe range before the reducer could exercise its full path.

ADR-130's reducer provides the size guarantee these caps were meant
to provide. With the caps gone, the reducer sees the full input,
scores sentence salience across it, and hands the embedder a
bounded query.

Custom-agent discriminator in check-task-pre.sh (PR #94) is
*not* a cap — it's a "don't invoke ways scan task at all for
dispatches to custom agents" gate. Unchanged.

Tested live: 6.4KB bash command, 6KB task-notification prompt
both run cleanly through the production hook chain, 0 SIGABRTs.
@aaronsb aaronsb merged commit 15f0952 into main May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant