
Make LLM triage deterministic + plug-and-play (rules-only engine, provider adapters, sane defaults) #30

Description

@sarahxsanders

The problem

The warlock's LLM triage step is non-deterministic, and it bit a consumer for real. A context-mill CI build flipped from failing (64 "threats") to passing on a plain re-run, with zero code changes. Same files, same rules, same warlock version. The only thing that changed was the triage model's verdict.

Two root causes:

  1. The triage prompt never tells the model what actually matched. We hand it the whole file plus the rule name, but not the specific substring the YARA rule fired on (engine.ts drops the offsets). So the model hunts through the file and sometimes blames the wrong text. In our case the real trigger was a harmless "You are now logged in" greeting in a demo app, but the model sometimes pinned it on an unrelated "You are an audit subagent" line and called it a role hijack.
  2. Temperature isn't pinned. Triage runs at the model's default temperature, so the same question can get different answers run to run. For a blocking security gate we want the same answer every time.

The wrinkle: the warlock does not make the LLM call; the consumer does (bring-your-own-provider). So today every consumer has to remember to set temperature 0 themselves, which is exactly how one consumer ends up silently flaky.

What we want

Keep the warlock a rules-only engine (it should not hold credentials or make network calls), but give consumers a plug-and-play, hard-to-misconfigure setup with good defaults baked in:

  1. Pass the matched substring(s) through to triage. yara-x already returns patterns[].matches[].{offset, length} per match; we currently throw them away in engine.ts. Capture them, add them to ScanMatch, and include them in the triage prompt so the model judges the actual trigger instead of guessing. Slice the byte payload, not the JS string: the offsets count bytes, while JS string indices count UTF-16 code units, so slicing the string misaligns around multi-byte characters. Cap count and length so prompts stay sane.
  2. Centralize the recommended settings (temperature 0, etc.) in the warlock, applied via plug-and-play adapters. The warlock still calls nothing itself, but ships:
    • a clear provider contract,
    • optional thin adapters for common LLM providers (wrapping a consumer's existing Anthropic / OpenAI client) that apply the recommended defaults automatically, so nobody has to remember temperature 0,
    • a single config object with suggested defaults that consumers override only when they need to.
  3. Optional: a real setup step like npx warlock init that scaffolds the wiring for a consumer (the "wizard"), run deliberately after install rather than as hidden install-script magic.
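A minimal sketch of item 1, assuming the match offsets and lengths are byte positions into the scanned buffer (as YARA reports them) and that the scanned content is available as a Uint8Array. PatternMatch, MatchedSnippet, extractSnippets, renderSnippetsForPrompt and the cap values are hypothetical names chosen for illustration, not existing warlock or yara-x APIs:

```typescript
// Hypothetical shape of one yara-x pattern match; the real binding may differ.
interface PatternMatch {
  offset: number; // byte offset into the scanned payload
  length: number; // byte length of the match
}

// Hypothetical addition to ScanMatch: the exact text the rule fired on.
interface MatchedSnippet {
  offset: number;
  length: number;
  text: string;
  truncated: boolean; // true when the snippet was cut to the byte cap
}

// Illustrative caps so prompts stay bounded; real values are a tuning call.
const MAX_SNIPPETS_PER_RULE = 5;
const MAX_SNIPPET_BYTES = 200;

// Default (non-fatal) decoder: invalid or cut-off sequences become U+FFFD.
const decoder = new TextDecoder("utf-8");

function extractSnippets(
  payload: Uint8Array,
  matches: readonly PatternMatch[],
  maxSnippets: number = MAX_SNIPPETS_PER_RULE,
  maxBytes: number = MAX_SNIPPET_BYTES,
): MatchedSnippet[] {
  return matches.slice(0, maxSnippets).map(({ offset, length }) => {
    const take = Math.min(length, maxBytes);
    // Slice the bytes, not a JS string: offsets count bytes, while string
    // indices count UTF-16 code units.
    const bytes = payload.subarray(offset, offset + take);
    return { offset, length, text: decoder.decode(bytes), truncated: take < length };
  });
}

// Renders the snippets as a prompt section so the model judges the trigger.
function renderSnippetsForPrompt(ruleName: string, snippets: readonly MatchedSnippet[]): string {
  const lines = snippets.map(
    (s) => `- byte ${s.offset}: ${JSON.stringify(s.text)}${s.truncated ? " (truncated)" : ""}`,
  );
  return [`Rule "${ruleName}" matched exactly these substrings:`, ...lines].join("\n");
}
```

For the payload "café: You are now logged in", the greeting starts at byte 7 but at JS string index 6, because "é" is two bytes in UTF-8; slicing the string at offset 7 would drop the leading "Y". If a byte cap cuts through a multi-byte character, the default TextDecoder emits U+FFFD instead of throwing.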

Why this matters

A security gate that flips pass/fail at random trains people to just hit re-run until it goes green, which defeats the whole point of the gate. Making triage deterministic and the setup plug-and-play fixes both the reliability and the DX.

Out of scope / notes

  • The immediate flaky build in context-mill is being handled separately as a smaller fix. This issue is the durable architecture work.
  • The adapters must not turn the warlock into the credential-holder or hard-depend on any specific LLM SDK.
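One way to meet both constraints, sketched under assumptions: the provider contract is a plain function type, the suggested defaults live in one object, and the adapter accepts any client structurally shaped like the Anthropic TypeScript SDK's messages.create (model, max_tokens, temperature, system, messages). The warlock therefore never imports the SDK and never sees credentials. TriageRequest, TriageProvider, TriageSettings, DEFAULT_TRIAGE_SETTINGS, AnthropicLikeClient and anthropicAdapter are hypothetical names, the 512-token default is illustrative, and the real SDK client may need a cast to fit the narrowed AnthropicLikeClient type:

```typescript
// Hypothetical provider contract: the warlock builds the prompt, the consumer's
// provider sends it. The warlock holds no credentials and makes no calls.
interface TriageRequest {
  system: string;
  prompt: string;
}

type TriageProvider = (req: TriageRequest) => Promise<string>;

interface TriageSettings {
  model: string;
  temperature: number;
  maxTokens: number;
}

// Suggested defaults in one place; values are illustrative.
const DEFAULT_TRIAGE_SETTINGS = {
  temperature: 0, // pinned for a blocking gate
  maxTokens: 512,
} as const;

// Minimal structural view of an Anthropic-style client; no SDK import needed.
interface AnthropicLikeClient {
  messages: {
    create(params: {
      model: string;
      max_tokens: number;
      temperature?: number;
      system?: string;
      messages: Array<{ role: "user" | "assistant"; content: string }>;
    }): Promise<{ content: Array<{ type: string; text?: string }> }>;
  };
}

// Wraps the consumer's existing client and applies the defaults automatically;
// callers supply a model and override other settings only when they need to.
function anthropicAdapter(
  client: AnthropicLikeClient,
  options: Partial<TriageSettings> & { model: string },
): TriageProvider {
  const settings: TriageSettings = {
    model: options.model,
    temperature: options.temperature ?? DEFAULT_TRIAGE_SETTINGS.temperature,
    maxTokens: options.maxTokens ?? DEFAULT_TRIAGE_SETTINGS.maxTokens,
  };
  return async ({ system, prompt }) => {
    const res = await client.messages.create({
      model: settings.model,
      max_tokens: settings.maxTokens,
      temperature: settings.temperature,
      system,
      messages: [{ role: "user", content: prompt }],
    });
    // Concatenate the text blocks; other block types are ignored.
    return res.content
      .filter((block) => block.type === "text")
      .map((block) => block.text ?? "")
      .join("");
  };
}
```

An OpenAI adapter would follow the same pattern against that client's own request shape. Pinning temperature 0 removes sampling temperature as a source of variance, although hosted models may still not return identical output on every run.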
