Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 8 additions & 20 deletions hooks/ways/check-bash-pre.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,26 +4,16 @@
# The ways binary handles: command pattern matching, semantic scoring,
# check curve scoring, session state, and content output.
#
# The command is truncated to its semantic prefix before being passed to
# `ways scan command`. Heredoc bodies (`gh pr create --body "$(cat <<EOF…)"`),
# JSON payloads (`curl -d '{…}'`), and other large argument bodies carry
# no signal for "what kind of command is this" — the program name and
# first few args do. The MiniLM embedding models cap at ~128 tokens of
# position embeddings; queries past that abort the embedder (ggml
# get_rows out-of-range). 256 chars ≈ 60 tokens, safely under the limit
# with headroom for the description that gets appended downstream.
#
# This truncation feeds both the embed query *and* the regex matcher in
# the ways binary. Every existing `commands:` pattern under hooks/ways/
# matches on the program name + first arg (≤106 chars), so cropping at
# 256 changes no current behavior. Future patterns that need to look
# past char 256 of a bash command would be misusing this trigger
# anyway — that signal belongs in `pattern:` against the description.
# Size bounding for the embed query is the ways binary's responsibility
# (ADR-130 sentence-salience reducer in scan/reduce.rs). This script
# passes the full command through so the reducer can score the prose
# distribution itself — pre-truncating here would starve the reducer
# of the back half of any long input. The regex `commands:` matcher
# also gets the full command, which is what ways with patterns like
# `^(npm|cargo|gh) ` expect.

source "$(dirname "$0")/require-ways.sh"

readonly CMD_QUERY_MAX=256

INPUT=$(cat)
CMD=$(echo "$INPUT" | jq -r '.tool_input.command // empty')
DESC=$(echo "$INPUT" | jq -r '.tool_input.description // empty' | tr '[:upper:]' '[:lower:]')
Expand All @@ -32,11 +22,9 @@ AGENT_ID=$(echo "$INPUT" | jq -r '.agent_id // empty')
[[ -n "$AGENT_ID" ]] && export CLAUDE_AGENT_ID="$AGENT_ID"
PROJECT_DIR="${CLAUDE_PROJECT_DIR:-$(echo "$INPUT" | jq -r '.cwd // empty')}"

CMD_QUERY="${CMD:0:$CMD_QUERY_MAX}"

export CLAUDE_PROJECT_DIR="${PROJECT_DIR}"
"${HOME}/.claude/bin/ways" scan command \
--command "$CMD_QUERY" \
--command "$CMD" \
--description "$DESC" \
--session "$SESSION_ID" \
--project "$PROJECT_DIR"
18 changes: 5 additions & 13 deletions hooks/ways/check-prompt.sh
Original file line number Diff line number Diff line change
Expand Up @@ -9,18 +9,13 @@
# harness also injects structured content here: <task-notification>
# blobs from completed background agents, <persisted-output> pointers
# for tool results that exceed inline budget, and other system-reminder
# envelopes. Any of those can run multiple KB, and embedding that
# overruns the MiniLM model's position-embedding table (SIGABRT in
# ggml_compute_forward_get_rows). Cap the embed query at 1024 chars
# (~240 tokens — generous because real user prompts can legitimately
# be paragraphs of context, unlike bash commands). Anything past 1024
# in a prompt is system-injected envelope content that carries no
# additional signal for matching the user's *intent* against ways.
# envelopes. Size bounding for the embed query is the ways binary's
# responsibility (ADR-130 sentence-salience reducer in scan/reduce.rs).
# This script passes the full combined prompt+topics through so the
# reducer can score sentence salience across the whole input.

source "$(dirname "$0")/require-ways.sh"

readonly PROMPT_QUERY_MAX=1024

INPUT=$(cat)
PROMPT=$(echo "$INPUT" | jq -r '.prompt // empty' | tr '[:upper:]' '[:lower:]')
SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // empty')
Expand All @@ -37,11 +32,8 @@ if [[ -f "$RESPONSE_STATE" ]]; then
RESPONSE_TOPICS=$(jq -r '.topics // empty' "$RESPONSE_STATE" 2>/dev/null)
fi

# Combined context: user prompt + Claude's recent topics, capped.
# RESPONSE_TOPICS is bounded by check-response.sh's extraction (~50 chars
# of keywords) so the cap is effectively a guard on PROMPT itself.
# Combined context: user prompt + Claude's recent topics.
COMBINED="${PROMPT} ${RESPONSE_TOPICS}"
COMBINED="${COMBINED:0:$PROMPT_QUERY_MAX}"

export CLAUDE_PROJECT_DIR="${PROJECT_DIR}"
"${HOME}/.claude/bin/ways" scan prompt \
Expand Down