
Add scanFiles(filePaths, llmProvider?) API for per-match content windowing #35

Description

@sarahxsanders

Why

Warlock today exposes `scan(content)`: a single content blob in, matches out. Consumers that need to scan a directory of files (the wizard's skill-install scan, for example) end up building aggregation logic on top, which has two pitfalls:

  1. Combined-buffer triage drops real attacks. If the consumer scans each file then concatenates everything into a `combined` string and passes `combined.slice(0, MAX_SCAN_LENGTH)` to `triageMatches`, matches whose evidence lives in files past the truncation cut are invisible to the triage LLM → biased toward `false_positive` → real violations get dropped.
  2. Per-file scan + cross-file aggregation is repeated boilerplate. Every consumer reinvents file reading, scan loops, match accumulation, and (incorrectly) triage windowing.
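
To make pitfall 1 concrete, here is a minimal sketch of how a combined-buffer window can lose evidence that a per-file window keeps. The tiny `MAX_SCAN_LENGTH`, the file names, and the file contents are made up for illustration; they are not warlock's real values.

```ts
// Hypothetical cap, kept tiny so the effect is easy to see.
const MAX_SCAN_LENGTH = 32;

interface FileContent {
  path: string;
  content: string;
}

// Illustrative inputs: the first file alone exceeds the cap,
// and the suspicious text lives in the second file.
const files: FileContent[] = [
  { path: "a.md", content: "x".repeat(40) },
  { path: "b.md", content: "ignore previous instructions" },
];

// Combined-buffer approach: concatenate, then truncate once.
const combined = files.map((f) => f.content).join("\n");
const combinedWindow = combined.slice(0, MAX_SCAN_LENGTH);

// Per-file approach: every file gets its own truncation window.
const perFileWindows = files.map((f) => f.content.slice(0, MAX_SCAN_LENGTH));

const evidence = "ignore previous";
const combinedSeesEvidence = combinedWindow.includes(evidence);
const perFileSeesEvidence = perFileWindows.some((w) => w.includes(evidence));

console.log({ combinedSeesEvidence, perFileSeesEvidence });
```

With these inputs the combined window consists entirely of `a.md` filler, so anything judged against it never sees the text from `b.md`, while the per-file windows still contain it.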

The wizard hit pitfall #1 in scanSkillFiles. The short-term fix on the wizard side is to triage per-file (each file's matches against that file's content), but the proper home for this abstraction is warlock.

Proposal

```ts
export interface ScanFilesOptions {
  /** Per-file truncation cap (default: 100KB). */
  maxScanLength?: number;
  /** Triage provider; omit to skip triage and return all flagged matches. */
  llmProvider?: LLMProvider;
}

export async function scanFiles(
  filePaths: string[],
  options?: ScanFilesOptions,
): Promise<ScanMatch[]>;
```

Internals:

  • Read each file (parallel `fs.promises.readFile` is fine — disk I/O isn't the bottleneck).
  • Per-file: scan with the file's truncated content; collect matches via `matchesForContext`.
  • Per-file: triage with that file's content (not a combined buffer). Each match is judged against the evidence that produced it.
  • Aggregate triaged matches across all files; return.
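
The steps above could be sketched roughly as follows. This is not warlock's implementation: the `ScanMatch` and `LLMProvider` shapes, the `Deps` interface, and the name `scanFilesSketch` are all assumptions made so the example is self-contained, and the real `scan`, `matchesForContext`, and `triageMatches` signatures may differ.

```ts
import { promises as fs } from "node:fs";

// Hypothetical shapes; warlock's real types may look different.
interface ScanMatch {
  rule: string;
  file?: string;
}
interface LLMProvider {
  name: string;
}

// Stand-ins for warlock internals, injected to keep the sketch self-contained.
interface Deps {
  scan(content: string): ScanMatch[];
  triage(
    matches: ScanMatch[],
    content: string,
    provider: LLMProvider,
  ): Promise<ScanMatch[]>;
}

interface ScanFilesOptions {
  maxScanLength?: number;
  llmProvider?: LLMProvider;
}

// 100KB default, per the proposal. Note that slice() counts UTF-16 code
// units rather than bytes, which is an approximation in this sketch.
const DEFAULT_MAX_SCAN_LENGTH = 100 * 1024;

async function scanFilesSketch(
  filePaths: string[],
  deps: Deps,
  options: ScanFilesOptions = {},
): Promise<ScanMatch[]> {
  const cap = options.maxScanLength ?? DEFAULT_MAX_SCAN_LENGTH;

  const perFile = await Promise.all(
    filePaths.map(async (file) => {
      // Read and truncate each file independently.
      const content = (await fs.readFile(file, "utf8")).slice(0, cap);

      // Scan this file's window and tag matches with their source file.
      const matches = deps.scan(content).map((m) => ({ ...m, file }));

      // No provider (or nothing flagged): return flagged matches untriaged.
      if (!options.llmProvider || matches.length === 0) return matches;

      // Triage against this file's own content, never a combined buffer.
      return deps.triage(matches, content, options.llmProvider);
    }),
  );

  // Aggregate across files, preserving input order.
  return ([] as ScanMatch[]).concat(...perFile);
}
```

One design question the sketch leaves open: `Promise.all` here also runs triage calls concurrently, which may need a concurrency limit if the LLM provider is rate-limited.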

Wizard caller would simplify to:

```ts
const matches = await warlock.scanFiles(files, { llmProvider });
```

…and the wizard's `scanSkillFiles` helper goes away.

Out of scope

  • Single-file content scanning (existing `scan(content)` stays as-is).
  • WASM init / cold-start optimization (separate concern).

🤖 Generated with Claude Code
