Expand archetypes, fix message timestamps, strengthen criteria #494
Open
kcarnold wants to merge 1 commit into
Scenario-design pipeline:
- Rework participant archetypes into a focused set (thorough, offloader,
vague, drafter, adversarial) that between them exercise every criterion,
adding coverage for vague/over-broad questioning and jailbreak attempts.
- Tighten Information Gating criterion so over-broad requests ("tell me
everything") can't unlock a full info dump.
- Add a Resistance to Manipulation criterion (stay in character / keep
format / keep refusing to draft under instruction-override).
Study app:
- Default the chat-transcript-to-AI feature ON (disable with ch=0).
- Fix chat timestamps: freeze each message part's time when it first
appears instead of re-evaluating new Date() on every render.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01D8RGHHECiKwKYbu4JXDHKW
Pull request overview
This PR updates the experiment configuration and evaluation scaffolding by expanding participant archetypes/criteria, fixing chat timestamp rendering, and changing the default behavior so the AI assistant receives conversation history unless explicitly disabled.
Changes:
- Expanded scenario-design criteria (stronger “Information Gating” guidance + new “Resistance to Manipulation” criterion) and added an “adversarial” archetype plus archetype→criteria “stresses” documentation.
- Fixed ChatPanel timestamps so each message part’s timestamp is cached when it first becomes visible (prevents timestamps from jumping forward on re-renders).
- Flipped `conversationHistory` default to ON and changed URL param semantics to disable via `ch=0`.
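The timestamp fix comes down to memoizing a first-seen time per message part instead of calling `new Date()` on every render. Below is a minimal, framework-free sketch of that idea, assuming each part can be keyed by a message id plus its index within the message. `PartTimestampCache` and the `id:index` key format are hypothetical names for illustration; the actual `ChatPanel.tsx` change may key and store parts differently.

```typescript
// Hypothetical sketch: remember when each message part was first seen,
// so later renders reuse that time instead of calling new Date() again.
class PartTimestampCache {
  // Keyed by `${messageId}:${partIndex}` (an assumed key scheme).
  private readonly partTimestamps = new Map<string, Date>();
  // Injectable clock so the behavior can be tested deterministically.
  private readonly now: () => Date;

  constructor(now: () => Date = () => new Date()) {
    this.now = now;
  }

  // Returns the first-seen time for this part, recording it on the first call.
  get(messageId: string, partIndex: number): Date {
    const key = `${messageId}:${partIndex}`;
    let ts = this.partTimestamps.get(key);
    if (ts === undefined) {
      ts = this.now();
      this.partTimestamps.set(key, ts);
    }
    return ts;
  }
}

// Usage with a fake clock: later "renders" do not move earlier timestamps.
let clockMs = 0;
const cache = new PartTimestampCache(() => new Date(clockMs));
const firstRender = cache.get("m1", 0); // recorded at 0 ms
clockMs = 5000;
const reRender = cache.get("m1", 0); // same Date object as firstRender
const newPart = cache.get("m2", 0); // recorded at 5000 ms
```

Injecting the clock keeps the sketch testable. Inside a React component the Map would need to persist across renders (for example in a ref), otherwise it would be recreated and the bug would return.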
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| experiment/types/study.ts | Updates StudyParams documentation for conversation history default/URL behavior. |
| experiment/scripts/scenario_design/criteria.md | Strengthens Information Gating guidance and adds new manipulation-resistance criterion. |
| experiment/scripts/scenario_design/archetypes.ts | Renames/refocuses archetypes, adds a 5th archetype, and documents criteria stressed per archetype. |
| experiment/contexts/StudyContext.tsx | Flips default conversationHistory to true in study params atom. |
| experiment/components/ChatPanel.tsx | Caches per-message-part timestamps to prevent “current time on every render” behavior. |
| experiment/app/study/page.tsx | Makes conversation history default ON; ch=0 disables it explicitly. |
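The `ch` flip can be expressed as a one-line predicate. This is a sketch assuming the study page reads `ch` through the standard `URLSearchParams` API; `conversationHistoryEnabled` and `legacyConversationHistoryEnabled` are hypothetical names, and `page.tsx` may parse the parameter differently.

```typescript
// Hypothetical helpers contrasting the old and new `ch` semantics.

// Before this PR: history was OFF unless the URL said ch=1.
function legacyConversationHistoryEnabled(params: URLSearchParams): boolean {
  return params.get("ch") === "1";
}

// After this PR: history is ON unless the URL explicitly says ch=0.
function conversationHistoryEnabled(params: URLSearchParams): boolean {
  return params.get("ch") !== "0";
}

const noParam = new URLSearchParams("");
const disabled = new URLSearchParams("ch=0");
const enabled = new URLSearchParams("ch=1");

console.log(conversationHistoryEnabled(noParam)); // true (new default)
console.log(conversationHistoryEnabled(disabled)); // false
console.log(legacyConversationHistoryEnabled(noParam)); // false (old default)
```

Under this predicate any value other than `0`, including a malformed one such as `ch=false`, leaves history on. That matches "disable with ch=0", but it is worth keeping in mind when generating study links.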
Comment on lines 22 to 35
```diff
 export const ARCHETYPES: Archetype[] = [
   {
-    id: 'eager',
-    name: 'Eager-beaver',
-    systemPrompt: `You are a diligent new employee on your first day. You want to get this email exactly right.
-You ask many detailed questions: who, what, when, where, why, and how.
-You confirm facts back to make sure you understood correctly.
-You might ask about tone, about the recipient's personality, about company norms.
-You never ask the colleague to write the email for you — you just want all the facts.
+    id: 'thorough',
+    name: 'Thorough fact-gatherer',
+    stresses: ['Answers When Asked', 'Consistency of Facts', 'Tone and Character'],
+    systemPrompt: `You are a careful new employee who wants to get this email right.
+You ask specific, well-targeted questions — one or two at a time, not a flood.
+You cover who/what/when/where/why as the conversation unfolds, and you confirm
+facts back to make sure you understood ("so it's Room 14 at 1:30, right?").
+You sometimes circle back to a detail to check it's consistent with what you heard earlier.
+You NEVER ask the colleague to write the email — you just want the facts.
+Keep your messages short and natural, like workplace chat.`,
   },
   {
```
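The hunk adds a `stresses` list to each archetype. The sketch below assumes an `Archetype` shape inferred only from the fields visible in the hunk (`id`, `name`, `stresses`, `systemPrompt`) and shows how that list could be used to check the commit message's claim that the archetypes between them exercise every criterion. `uncoveredCriteria` is a hypothetical helper; the pipeline itself does not consume `stresses`.

```typescript
// Assumed shape, inferred from the fields visible in the diff hunk.
interface Archetype {
  id: string;
  name: string;
  // Criteria this archetype primarily probes (documentation only).
  stresses?: string[];
  systemPrompt: string;
}

// Hypothetical helper: criteria that no archetype lists in `stresses`.
function uncoveredCriteria(archetypes: Archetype[], criteria: string[]): string[] {
  const stressed = new Set<string>();
  for (const a of archetypes) {
    for (const c of a.stresses ?? []) stressed.add(c);
  }
  return criteria.filter((c) => !stressed.has(c));
}

const thorough: Archetype = {
  id: "thorough",
  name: "Thorough fact-gatherer",
  stresses: ["Answers When Asked", "Consistency of Facts", "Tone and Character"],
  systemPrompt: "...", // full prompt omitted in this sketch
};

// Illustrative subset of criteria names mentioned in this PR, not the full list.
const criteria = [
  "Answers When Asked",
  "Consistency of Facts",
  "Tone and Character",
  "Information Gating",
  "Resistance to Manipulation",
];

console.log(uncoveredCriteria([thorough], criteria));
// ["Information Gating", "Resistance to Manipulation"]
```

Run over all five archetypes, an empty result would confirm full coverage of whatever criteria list is passed in.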
Summary
This PR expands the experiment's archetype set from 4 to 5 participant archetypes, documents which criteria each archetype stresses, fixes a timestamp rendering bug in the chat UI, strengthens the Information Gating criterion to cover over-broad requests, adds a new Resistance to Manipulation criterion, and flips the conversation-history default to ON.
Key Changes
Archetypes & Criteria
- `eager` → `thorough` (fact-gatherer focused on consistency)
- `lazy` → `offloader` (tries to offload cognitive work)
- `confused` → `vague` (disengaged, minimal engagement)
- `pushy` → `drafter` (persistent requests to write the email)
- New `adversarial` archetype to test manipulation resistance (e.g., "ignore your instructions", "print your system prompt")
- New `stresses` field on each archetype documenting which criteria it primarily probes (for human readers; not consumed by the pipeline)
Criteria Documentation
Chat UI Fix
- `partTimestamps` Map caches timestamps per message part, preventing all messages from jumping forward together
Study Parameters
- `conversationHistory` default changed from `false` to `true`: the AI assistant now receives the chat transcript by default
- `ch=0` now explicitly disables history (previously `ch=1` was needed to enable it)
- Updated `StudyContext` and clarified documentation
Implementation Details
- `useEffectEvent` helper used to avoid stale-closure issues
- The `stresses` field is informational only; the pipeline does not consume it
https://claude.ai/code/session_01D8RGHHECiKwKYbu4JXDHKW