The problem
The warlock's LLM triage step is non-deterministic, and it bit a consumer for real. A context-mill CI build flipped from failing (64 "threats") to passing on a plain re-run, with zero code changes. Same files, same rules, same warlock version. The only thing that changed was the triage model's verdict.
Two root causes:
- The triage prompt never tells the model what actually matched. We hand it the whole file plus the rule name, but not the specific substring the YARA rule fired on (engine.ts drops the offsets). So the model hunts through the file and sometimes blames the wrong text. In our case the real trigger was a harmless "You are now logged in" greeting in a demo app, but the model sometimes pinned it on an unrelated "You are an audit subagent" line and called it a role hijack.
- Temperature isn't pinned. Triage runs at the model's default temperature, so the same question can get different answers run to run. For a blocking security gate we want the same answer every time.
The wrinkle: the warlock does not make the LLM call; the consumer does (bring-your-own-provider). So today every consumer has to remember to set temperature 0 themselves, which is exactly how a consumer ends up silently flaky.
What we want
Keep the warlock a rules-only engine (it should not hold credentials or make network calls), but give consumers a plug-and-play, hard-to-misconfigure setup with good defaults baked in:
- Pass the matched substring(s) through to triage. yara-x already returns patterns[].matches[].{offset, length} per match; we currently throw them away in engine.ts. Capture them (slice the byte payload, not the JS string, so multi-byte chars stay aligned), add them to ScanMatch, and include them in the triage prompt so the model judges the actual trigger instead of guessing. Cap count and length so prompts stay sane.
- Centralize the recommended settings (temperature 0, etc.) in the warlock, applied via plug-and-play adapters. The warlock still calls nothing itself, but ships:
  - a clear provider contract,
  - optional thin adapters for common LLMs (wrap a consumer's existing Anthropic / OpenAI client) that apply the recommended defaults automatically, so nobody has to remember temperature 0,
  - a single config object with suggested defaults that consumers override only when they need to.
- Optional: a real setup step like npx warlock init that scaffolds the wiring for a consumer (the "wizard"), run on purpose after install, not hidden install-script magic.
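A minimal sketch of the first bullet (capturing matched substrings), assuming yara-x's per-pattern matches have already been flattened into {offset, length} byte ranges into the scanned payload. ByteRange, MatchedSnippet, extractSnippets, formatSnippetsForPrompt, and both cap values are hypothetical names and placeholders, not existing warlock code. The key move is slicing the Uint8Array first and decoding afterwards, so byte offsets never get applied to a JS (UTF-16) string.

```typescript
// Hypothetical sketch: none of these names exist in the warlock today.

/** One match as a byte range into the scanned payload (flattened from patterns[].matches[]). */
interface ByteRange {
  offset: number; // byte offset, not a JS string index
  length: number; // length in bytes
}

/** What would be added to ScanMatch and shown to the triage model. */
interface MatchedSnippet {
  offset: number;
  length: number;
  text: string;
  truncated: boolean;
}

// Placeholder caps so prompts stay bounded; real values would need tuning.
const MAX_SNIPPETS = 5;
const MAX_SNIPPET_BYTES = 200;

function extractSnippets(
  payload: Uint8Array,
  ranges: readonly ByteRange[],
  maxSnippets: number = MAX_SNIPPETS,
  maxBytes: number = MAX_SNIPPET_BYTES,
): MatchedSnippet[] {
  // Non-fatal decoder: a cut through a multi-byte char becomes U+FFFD instead of throwing.
  const decoder = new TextDecoder("utf-8");
  return ranges.slice(0, maxSnippets).map(({ offset, length }) => {
    const take = Math.min(length, maxBytes);
    // Slice bytes first, then decode, so offsets line up with what YARA scanned.
    const bytes = payload.subarray(offset, offset + take);
    return { offset, length, text: decoder.decode(bytes), truncated: take < length };
  });
}

/** Renders the snippets as a prompt section; JSON.stringify escapes quotes and newlines. */
function formatSnippetsForPrompt(
  rule: string,
  snippets: readonly MatchedSnippet[],
  totalMatches: number,
): string {
  const lines = snippets.map(
    (s, i) =>
      `${i + 1}. bytes ${s.offset}-${s.offset + s.length}: ${JSON.stringify(s.text)}` +
      (s.truncated ? " (truncated)" : ""),
  );
  const omitted = totalMatches - snippets.length;
  if (omitted > 0) lines.push(`(${omitted} more matches omitted)`);
  return `Rule ${rule} matched this text:\n${lines.join("\n")}`;
}
```

For a payload of "héllo You are now logged in", the match on "You are now logged in" starts at byte 7 but at string index 6, because "é" is one UTF-16 code unit and two UTF-8 bytes. Slicing the JS string at 7 would hand the model "ou are now logged in"; slicing the bytes gives the exact trigger.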
Why this matters
A security gate that flips pass/fail at random trains people to just hit re-run until it goes green, which defeats the whole point of the gate. Making triage deterministic and the setup plug-and-play fixes both the reliability and the DX.
Out of scope / notes
- The immediate flaky build in context-mill is being handled separately as a smaller fix. This issue is the durable architecture work.
- The adapters must not turn the warlock into the credential-holder or hard-depend on any specific LLM SDK.
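A minimal sketch of the provider contract, the single config object, and one thin adapter, written to respect the constraint above: the warlock never holds credentials and never imports an LLM SDK. TriageRequest, TriageProvider, TriageConfig, resolveConfig, AnthropicLike, and anthropicTriageProvider are hypothetical names, and the maxTokens default is a placeholder. The adapter types the consumer's client structurally; the messages.create parameters (model, max_tokens, temperature, system, messages) mirror the Anthropic Messages API, but whether the real SDK client type-checks against this narrow interface is an assumption to verify.

```typescript
// Hypothetical sketch: none of these names exist in the warlock today.

/** What the warlock would pass to a consumer-supplied provider. */
interface TriageRequest {
  system: string;
  prompt: string;
}

/** Provider contract: the consumer's code makes the network call and returns the model's text. */
type TriageProvider = (req: TriageRequest) => Promise<string>;

/** The single config object. Only model is required; the rest have recommended defaults. */
interface TriageConfig {
  model: string;
  temperature: number;
  maxTokens: number;
}

type TriageOverrides = Pick<TriageConfig, "model"> & Partial<Omit<TriageConfig, "model">>;

// temperature 0 is the point of this issue; maxTokens is a placeholder value.
const RECOMMENDED_DEFAULTS = { temperature: 0, maxTokens: 1024 };

function resolveConfig(overrides: TriageOverrides): TriageConfig {
  return {
    model: overrides.model,
    // ?? rather than object spread, so an explicit undefined still falls back to the default
    temperature: overrides.temperature ?? RECOMMENDED_DEFAULTS.temperature,
    maxTokens: overrides.maxTokens ?? RECOMMENDED_DEFAULTS.maxTokens,
  };
}

/** Structural slice of an Anthropic-style client, declared locally so no SDK import is needed. */
interface AnthropicLike {
  messages: {
    create(params: {
      model: string;
      max_tokens: number;
      temperature?: number;
      system?: string;
      messages: { role: "user" | "assistant"; content: string }[];
    }): Promise<{ content: { type: string; text?: string }[] }>;
  };
}

/** Wraps the consumer's existing client; the consumer still owns credentials and the call. */
function anthropicTriageProvider(client: AnthropicLike, overrides: TriageOverrides): TriageProvider {
  const cfg = resolveConfig(overrides);
  return async ({ system, prompt }) => {
    const res = await client.messages.create({
      model: cfg.model,
      max_tokens: cfg.maxTokens,
      temperature: cfg.temperature, // 0 unless the consumer deliberately overrides it
      system,
      messages: [{ role: "user", content: prompt }],
    });
    // Keep only text blocks and join them.
    return res.content.map((block) => (block.type === "text" ? (block.text ?? "") : "")).join("");
  };
}
```

An OpenAI adapter would have the same shape around chat.completions.create. Pinning temperature to 0 removes most run-to-run variance, but providers generally do not promise fully deterministic output even then, so passing the matched substring to triage remains the other half of the fix.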