
Scenario validation: run probes inside a longer conversation (context depth) #496

Description

@kcarnold

Background

scripts/scenario_design/probe.ts (added in the colleague-eval merge) runs single-turn probes at depth 0, i.e. immediately after the scenario's opening messages. This catches base-case behavior but misses regressions that only appear once the conversation has accumulated context.

Ask

Add an optional context-depth mode to probe.ts: load an existing archetype conversation log from outputs/<scenario>_<archetype>.json (produced by simulate.ts), use it as the conversation prefix, then append the probe and judge as usual. This would verify that accumulated context does not cause the colleague to, for example, start volunteering information or drafting.
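A minimal sketch of the loading and prefixing step, for discussion only. It assumes the log written by simulate.ts is either a bare array of `{ role, content }` messages or an object with a `messages` array; the real format may differ. The helper names (`archetypeLogPath`, `parseConversationLog`, `buildProbeConversation`, `loadArchetypePrefix`) and the optional `depth` parameter are hypothetical, not existing code.

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

// ASSUMPTION: hypothetical message shape; replace with the type that
// simulate.ts actually writes to its logs.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

const ROLES: ReadonlySet<string> = new Set(["system", "user", "assistant"]);

// Path convention from the issue: outputs/<scenario>_<archetype>.json
function archetypeLogPath(outputsDir: string, scenario: string, archetype: string): string {
  return join(outputsDir, `${scenario}_${archetype}.json`);
}

// ASSUMPTION: the log is a bare Message[] or an object with a `messages`
// array. Throws if neither shape is present or a message is malformed.
function parseConversationLog(raw: string): Message[] {
  const data: unknown = JSON.parse(raw);
  const messages: unknown = Array.isArray(data)
    ? data
    : typeof data === "object" && data !== null
      ? (data as { messages?: unknown }).messages
      : undefined;
  if (!Array.isArray(messages)) {
    throw new Error("conversation log has no message array");
  }
  for (const m of messages) {
    if (typeof m !== "object" || m === null || !ROLES.has(m.role) || typeof m.content !== "string") {
      throw new Error(`malformed message in log: ${JSON.stringify(m)}`);
    }
  }
  return messages as Message[];
}

// Hypothetical `depth`: keep only the first `depth` logged messages (all of
// them if omitted), then drop trailing user turns so the probe follows a
// colleague reply instead of an unanswered user message.
function buildProbeConversation(prefix: Message[], probe: string, depth?: number): Message[] {
  if (depth !== undefined && (!Number.isInteger(depth) || depth < 0)) {
    throw new RangeError(`depth must be a non-negative integer, got ${depth}`);
  }
  const kept = prefix.slice(0, depth ?? prefix.length); // copy; prefix is not mutated
  while (kept.length > 0 && kept[kept.length - 1].role === "user") {
    kept.pop();
  }
  return [...kept, { role: "user", content: probe }];
}

// Convenience loader combining the path convention and the parser.
function loadArchetypePrefix(outputsDir: string, scenario: string, archetype: string): Message[] {
  return parseConversationLog(readFileSync(archetypeLogPath(outputsDir, scenario, archetype), "utf8"));
}
```

Trimming trailing user turns is a design choice, not something the issue specifies: it avoids sending two consecutive user messages when a truncated log happens to end mid-exchange.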

Notes

  • Deferred from the colleague-eval merge (branch merge-colleague-evals) per review.
  • Reuse the already-exported loadScenario / callColleague (simulate.ts) and judgeConversation (judge.ts).
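The reuse described in the notes could be wired roughly as below. This is a sketch under stated assumptions: the issue does not give the signatures of loadScenario, callColleague or judgeConversation, so `ProbeDeps` is a hypothetical, simplified interface, and `runContextDepthProbe` is an illustrative name. Passing the functions in as dependencies lets probe.ts supply the real exports while a test supplies stubs.

```typescript
// Hypothetical message shape, matching the rest of this sketch.
type Message = { role: "system" | "user" | "assistant"; content: string };

// ASSUMPTION: simplified stand-ins for the real exports; the actual
// signatures in simulate.ts and judge.ts may differ.
interface ProbeDeps<S, V> {
  loadScenario(name: string): Promise<S>;
  callColleague(scenario: S, messages: Message[]): Promise<string>;
  judgeConversation(scenario: S, transcript: Message[]): Promise<V>;
}

interface ProbeResult<V> {
  transcript: Message[];
  verdict: V;
}

// Append the probe to an existing conversation prefix, get the colleague's
// reply, then judge the full resulting transcript.
async function runContextDepthProbe<S, V>(
  deps: ProbeDeps<S, V>,
  scenarioName: string,
  prefix: readonly Message[],
  probe: string,
): Promise<ProbeResult<V>> {
  const scenario = await deps.loadScenario(scenarioName);
  const messages: Message[] = [...prefix, { role: "user", content: probe }];
  const reply = await deps.callColleague(scenario, messages);
  const transcript: Message[] = [...messages, { role: "assistant", content: reply }];
  const verdict = await deps.judgeConversation(scenario, transcript);
  return { transcript, verdict };
}
```

Here the judge sees the whole transcript, including the prefix. Whether judgeConversation should score only the colleague's reply to the probe is an open question for this mode.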

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests