Scenario validation: run probes inside a longer conversation (context depth)

## Background
`scripts/scenario_design/probe.ts` (added in the colleague-eval merge) runs single-turn probes at **depth 0**, i.e. right after the scenario's opening messages. This catches base-case behavior, but it misses regressions that only appear once the conversation has accumulated context.

## Ask
Add an optional context-depth mode to `probe.ts`: load an existing archetype conversation log from `outputs/<scenario>_<archetype>.json` (produced by `simulate.ts`), use it as the conversation prefix, then append the probe and judge as usual. This verifies that accumulated context doesn't cause the colleague to, for example, start volunteering information or drafting.
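
A minimal sketch of the prefix-then-probe step, to make the ask concrete. Everything here is an assumption, not the existing code: the `Message` shape, the log layout (a bare array or an object with a `messages` array), the helper names `parseConversationLog` and `buildProbeConversation`, and the meaning of `depth` as "number of logged messages to keep". The real format written by `simulate.ts` may differ.

```typescript
// Hypothetical message shape; adjust to whatever simulate.ts actually writes.
type Role = "user" | "assistant" | "system";
interface Message {
  role: Role;
  content: string;
}

// Parse the text of a saved conversation log.
// Assumption: the file is either a bare array of messages or an object
// with a `messages` array. Anything else is rejected.
function parseConversationLog(jsonText: string): Message[] {
  const parsed: unknown = JSON.parse(jsonText);
  let raw: unknown = parsed;
  if (!Array.isArray(parsed) && parsed !== null && typeof parsed === "object") {
    raw = (parsed as { messages?: unknown }).messages;
  }
  if (!Array.isArray(raw)) {
    throw new Error("conversation log contains no message array");
  }
  return raw.map((m: unknown, i: number) => {
    const candidate = (m ?? {}) as Partial<Message>;
    if (typeof candidate.role !== "string" || typeof candidate.content !== "string") {
      throw new Error(`message ${i} is missing role or content`);
    }
    return { role: candidate.role, content: candidate.content };
  });
}

// Build the conversation to send: the first `depth` logged messages,
// followed by the probe as a new user turn. Omitting `depth` keeps the
// whole log. The input array is not modified.
function buildProbeConversation(
  prefix: Message[],
  probe: string,
  depth?: number,
): Message[] {
  const take =
    depth === undefined
      ? prefix.length
      : Math.max(0, Math.min(depth, prefix.length));
  return [...prefix.slice(0, take), { role: "user", content: probe }];
}
```

The resulting array would then be passed to the colleague call and the judge in the same way the depth-0 path does today. One open design point: if the truncated prefix ends on a user turn, the probe produces two consecutive user messages, so the real implementation may want to cut at an assistant turn instead.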

## Notes
- Deferred from the colleague-eval merge (branch `merge-colleague-evals`) per review.
- Reuse the already-exported `loadScenario` / `callColleague` (`simulate.ts`) and `judgeConversation` (`judge.ts`).

Scenario validation: run probes inside a longer conversation (context depth) #496

Background

Ask

Notes

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Scenario validation: run probes inside a longer conversation (context depth) #496

Description

Background

Ask

Notes

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions