EPIC: Task-queue orchestrator runner (experimental orchestrator variant)
Summary
The wizard today is a single-agent, linear runner. runAgent → runProgram
(src/lib/agent/agent-runner.ts:154) ends in one Claude Agent SDK query() call
driven by one large assembled prompt. All integration work happens inside that
single agent context.
This epic prototypes a task-queue-driven orchestrator that runs many small,
fresh-context micro-agents. An orchestrator agent inspects the repo and seeds an
in-memory task queue. An executor drains it, running one fresh agent per task,
each with its own model, goal, success criteria, permissions, and the mini-skills
that tell it HOW. Tasks can dynamically enqueue more tasks as they learn, like
water running downhill. Each task leaves a structured handoff for the next agent: what
it did, what its goal was, and what the next agent should know.
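The drain loop described above can be sketched in a few lines. This is a minimal illustration, not the wizard's code: the `Task`, `Handoff`, `TaskRunner`, and `drain` names and fields are hypothetical, and the stub runner stands in for the fresh agent that the real executor would start per task.

```typescript
// Hypothetical shapes; field names are illustrative, not the wizard's actual types.
interface Handoff {
  taskId: string;
  goal: string; // what the task was trying to achieve
  did: string; // what it actually did
  notesForNext: string; // what the next agent should know
}

interface Task {
  id: string;
  model: string;
  goal: string;
  successCriteria: string;
  permissions: string[];
  skills: string[];
}

// A task run returns its handoff plus any follow-up tasks it wants enqueued.
type TaskRunner = (task: Task, prior: Handoff[]) => { handoff: Handoff; enqueue: Task[] };

function drain(seed: Task[], run: TaskRunner): Handoff[] {
  const queue: Task[] = [...seed];
  const handoffs: Handoff[] = [];
  while (queue.length > 0) {
    const task = queue.shift()!;
    const result = run(task, handoffs); // stand-in for one fresh agent per task
    handoffs.push(result.handoff);
    queue.push(...result.enqueue); // dynamic enqueue: tasks can add more tasks
  }
  return handoffs;
}

// Stub usage: "install" decides a "build" task is needed and enqueues it.
const mk = (id: string): Task => ({
  id,
  model: "small",
  goal: `do ${id}`,
  successCriteria: "",
  permissions: [],
  skills: [],
});

const stubRunner: TaskRunner = (task, prior) => ({
  handoff: { taskId: task.id, goal: task.goal, did: `ran after ${prior.length} task(s)`, notesForNext: "" },
  enqueue: task.id === "install" ? [mk("build")] : [],
});

const handoffs = drain([mk("install"), mk("init")], stubRunner);
console.log(handoffs.map((h) => h.taskId)); // install, init, build
```

The sketch has no termination guard, so a runner that always enqueues would loop forever; the guards mentioned under Locked decisions exist to prevent exactly that.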
A core requirement is separation of concerns:
Agent prompts (the WHAT). They say what to do, and carry the artifacts in
frontmatter: model, goal, success criteria, permissions, which mini-skills.
Mini-skills (the HOW). The procedural integration knowledge.
Both are markdown files with frontmatter served from context-mill. Skills already
exist there. Agent prompts are a new agents content type, a flavor parallel to
skills that carries intention rather than procedure.
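To make the frontmatter idea concrete, here is a minimal sketch of splitting such a file into metadata and body. The sample file contents, the key names as written, and the `parseFrontmatter` helper are assumptions for illustration; the parser only handles flat `key: value` lines and is not a YAML implementation or context-mill's loader.

```typescript
interface ParsedPrompt {
  meta: Record<string, string>;
  body: string;
}

// Splits a leading "---" ... "---" block from the markdown body.
// Only flat "key: value" lines are understood; anything else is skipped.
function parseFrontmatter(source: string): ParsedPrompt {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) return { meta: {}, body: source };
  const meta: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: match[2].trim() };
}

// Hypothetical agent prompt: the frontmatter carries intent, the body says WHAT to do.
const sample = [
  "---",
  "model: small",
  "goal: Install the SDK",
  "permissions: read, write",
  "skills: install-sdk",
  "---",
  "Declare the SDK dependency for the detected framework.",
].join("\n");

const parsed = parseFrontmatter(sample);
console.log(parsed.meta.model, "|", parsed.body);
```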
It branches from the linear runner, gated by the boolean wizard-orchestrator feature flag, so we launch in the dark and A/B it against the linear
baseline.
Why: the primary goal is responsiveness
The linear runner packs the whole integration into one agent, which produces long
silences. The user waits through minutes of planning, then minutes of editing,
with little sign of progress. The win condition for this experiment is that the
user sees real, incremental progress the whole way through. Decomposing into
discrete micro-tasks delivers that. Each task is small and quick, it finishes
visibly, and it streams a steady drumbeat of one thing done and the next thing
starting.
Two levers keep tasks small and quick:
Per-task model fit. Each task picks the cheapest model that can do the job,
a small model for install and boilerplate, a stronger one only where reasoning
is needed. This is core.
Granularity. Prefer many small tasks over a few large ones, scoped tightly
enough to finish fast and show progress. The orchestrator itself seeds fast, a
quick glance at the repo rather than a long plan.
Decomposition also helps reliability, debuggability, and per-step permissions.
Locked decisions
Concurrency. Parallel-capable: tasks declare dependsOn, and the default cap is 1 for
the prototype. The real graph has genuine parallel branches: install alongside
init, identify alongside capture-planning. At cap 1 they run in any order;
raising the cap runs the independent branches together.
On-disk state is written asynchronously: queue, audit log, handoffs. Resume across runs and crashes is
deferred to #10, after #9.
Seeding. Dynamic enqueue, with loop and termination guards.
Gating. The boolean wizard-orchestrator feature flag (constants.ts). The integrate command reads the flag and routes to the new runner when it is true. To test in the dark, enable the flag for your own user in PostHog
(isOrchestratorEnabled, agent-interface.ts).
Artifacts. The WHAT and the HOW are both content in context-mill. Agent
prompts are a new agents content type, a flavor parallel to skills that carries
intention: model, goal, success criteria, permissions, which skills to load.
Mini-skills stay as skills, the procedure. Both are markdown with frontmatter,
authored on a clearly-named experiment branch, served from context-mill. Local
dev runs against localhost:8765. Context-mill grows the agents type to match
how it already builds and serves skills (#2).
Tools. The orchestrator tools live in the existing wizard-tools server,
alongside the env tools and the audit tools.
Handoff. Each task reports a structured handoff through complete_task:
goal, what it did, what the next agent should know.
Success. Success criteria are plain text in the agent prompt. The agent
self-reports the outcome through complete_task.
UI. The TUI renders the queue.
Sequencing. Prove the full machinery with a walking skeleton (stub tasks
that write a temp file), then author the real bodies: install, init,
instrument-events.
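The concurrency decision above (tasks declare dependsOn, with a cap on how many run at once) can be illustrated with a small planner. This is a simplified sketch under stated assumptions: `QueuedTask` and `planWaves` are hypothetical names, the dependency edges in the example graph are invented for illustration, and the wave model is coarser than a real executor, which would start a task as soon as a slot frees up.

```typescript
interface QueuedTask {
  id: string;
  dependsOn: string[];
}

// Groups tasks into waves: each wave holds up to `cap` tasks whose
// dependencies all finished in earlier waves.
function planWaves(tasks: QueuedTask[], cap: number): string[][] {
  if (cap < 1) throw new Error("cap must be at least 1");
  const done = new Set<string>();
  let pending = [...tasks];
  const waves: string[][] = [];
  while (pending.length > 0) {
    const ready = pending.filter((t) => t.dependsOn.every((d) => done.has(d)));
    // Guard: nothing is runnable but work remains, so a dependency is missing or cyclic.
    if (ready.length === 0) throw new Error("deadlock: unmet or cyclic dependencies");
    const wave = ready.slice(0, cap);
    const waveIds = new Set(wave.map((t) => t.id));
    for (const id of waveIds) done.add(id);
    pending = pending.filter((t) => !waveIds.has(t.id));
    waves.push(wave.map((t) => t.id));
  }
  return waves;
}

// Illustrative graph: the edges are assumptions, not the real task graph.
const graph: QueuedTask[] = [
  { id: "install", dependsOn: [] },
  { id: "init", dependsOn: [] },
  { id: "identify", dependsOn: ["install", "init"] },
  { id: "plan-capture", dependsOn: ["install", "init"] },
];

const serial = planWaves(graph, 1); // one task per wave
const paired = planWaves(graph, 2); // independent branches share a wave
console.log(serial, paired);
```

At cap 1 the independent tasks still run, just one at a time; at cap 2 the same graph collapses into two waves without any change to the task declarations.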
Architecture at a glance
Fork point. Inside runProgram after OAuth, at agent-runner.ts:302.
Extract steps 1 to 4 into bootstrapProgram(), which the orchestrator arm
reuses for the health check, settings conflicts, OAuth and credentials, MCP URL,
and variant metadata.
Per-task agent. Reuse the existing runAgent (agent-interface.ts:773),
initialize the agent once, and override model, tools, and permissions per task.
Queue. In memory as the source of truth, reflected asynchronously to <installDir>/.posthog-wizard/ (queue.json, audit.jsonl, structured handoffs/<id>.json), reusing the audit-ledger atomic write and mutex helpers.
Tools. enqueue_task, complete_task, and read_handoffs in wizard-tools, registered when a QueueStore is present, with termination
guards.
Telemetry. VARIANT=orchestrator flows into existing events, plus
orchestrator-specific events for the A/B comparison.
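As a rough picture of how the three tools and their termination guards could sit on top of an in-memory store, consider the sketch below. The class body, method names, the `maxTasks` budget, and the duplicate-id check are all assumptions made for illustration; only the tool names enqueue_task, complete_task, and read_handoffs and the QueueStore name come from the text above.

```typescript
interface HandoffRecord {
  goal: string;
  did: string;
  nextShouldKnow: string;
}

class QueueStore {
  private pending: string[] = [];
  private seen = new Set<string>();
  private handoffs = new Map<string, HandoffRecord>();
  private maxTasks: number;

  constructor(maxTasks: number) {
    this.maxTasks = maxTasks;
  }

  // enqueue_task: rejects duplicates and enforces a total task budget,
  // two simple guards against runaway self-enqueueing.
  enqueueTask(id: string): { ok: boolean; reason?: string } {
    if (this.seen.has(id)) return { ok: false, reason: "duplicate task id" };
    if (this.seen.size >= this.maxTasks) return { ok: false, reason: "task budget exhausted" };
    this.seen.add(id);
    this.pending.push(id);
    return { ok: true };
  }

  // Used by the executor to pull the next task id, if any.
  next(): string | undefined {
    return this.pending.shift();
  }

  // complete_task: records the structured handoff for a known task.
  completeTask(id: string, handoff: HandoffRecord): void {
    if (!this.seen.has(id)) throw new Error(`unknown task: ${id}`);
    this.handoffs.set(id, handoff);
  }

  // read_handoffs: returns handoffs in completion order.
  readHandoffs(): HandoffRecord[] {
    return Array.from(this.handoffs.values());
  }
}

const store = new QueueStore(2);
const first = store.enqueueTask("install");
const dup = store.enqueueTask("install");
const second = store.enqueueTask("init");
const over = store.enqueueTask("build");
store.completeTask("install", { goal: "declare SDK", did: "edited manifest", nextShouldKnow: "run build next" });
console.log(first, dup, second, over, store.readHandoffs().length);
```

The sketch keeps everything in memory; reflecting the state to disk asynchronously, as the Queue item describes, would be layered on separately.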
What good looks like
Time to first visible progress is short. The first task is running and showing
in the UI soon after launch.
Progress is steady. No single step dominates wall-clock, and the user always
sees what just finished and what is next.
Cheap models carry the cheap work. Tasks default to the lowest-cost model that
succeeds, and the strong models are the exception.
Cost to watch
Each fresh task-agent re-pays the full system-prompt, the claude_code preset,
and the MCP connect, roughly 113k tokens that the codebase worked hard to defer.
N agents means N times that startup cost. We watch per-task tokens, but the
experiment is judged on responsiveness, not on beating the linear baseline's total
token count. Small tasks, cheap models, and fast seeding keep the tax bearable.
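The startup tax is simple multiplication, which a few lines make explicit. The 113k figure is the rough number quoted above; the helper name is hypothetical, and the nine-task count assumes the full graph described later in this issue, without counting the orchestrator agent's own startup.

```typescript
// Rough per-agent startup cost quoted in this issue (system prompt, preset, MCP connect).
const STARTUP_TOKENS_PER_AGENT = 113_000;

// Total startup overhead when every task runs in a fresh agent.
function startupOverhead(taskCount: number): number {
  return taskCount * STARTUP_TOKENS_PER_AGENT;
}

const nineTaskOverhead = startupOverhead(9);
console.log(nineTaskOverhead); // 1017000
```

So a nine-task run pays roughly a million tokens in startup alone before any task does useful work, which is why per-task tokens are tracked even though they are not the success metric.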
Design discipline
CLAUDE.md keeps product knowledge out of infra. Frameworks live in FrameworkConfig, integration knowledge lives in context-mill skills, programs
are step arrays. This design fits that. The agent prompts and mini-skills are
markdown content, the same shape as today's skills, so they live in context-mill.
The new wizard-side code is machinery: the queue, the executor, the loader, and
the tools. The runner stays product-ignorant.
Child issues
agents content type (parallel to skills)
Separate transient task instructions from keepable skills
5, 8
Ordering: 1 ∥ 2 ∥ 3, then 4, then 5 with 6 in parallel, then 7 (demo-able), then 8, then 9, then 10 later. #2 is in the context-mill repo
and gates #6 onward, so it runs early. The strongest standalone breakpoints are 1,
3, 7, 8.
Definition of done
With the wizard-orchestrator flag enabled, a clean Next.js app
integrates PostHog end-to-end through the orchestrator (SDK installed, env set, at
least one event instrumented) at parity with the linear baseline, and the whole
run is segmentable from baseline in PostHog telemetry by VARIANT. With the flag
off, the linear path is byte-for-byte unchanged. Resume after a forced kill is #10.
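The gating rule behind this definition of done can be sketched as a single pure function. The `chooseRunner` name and the shape of the `flags` argument are assumptions; only the wizard-orchestrator flag name and its boolean nature come from this issue.

```typescript
type Runner = "linear" | "orchestrator";

// Only a strict boolean true opts in. A missing, false, or malformed flag
// value falls back to the linear baseline, so flag-off behavior is unchanged.
function chooseRunner(flags: Record<string, unknown>): Runner {
  return flags["wizard-orchestrator"] === true ? "orchestrator" : "linear";
}

console.log(chooseRunner({})); // linear
console.log(chooseRunner({ "wizard-orchestrator": true })); // orchestrator
```

Treating anything other than `true` as off is a deliberately conservative choice for a dark launch: a flag-service outage or a mistyped value leaves users on the baseline path.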
Implementation status (current)
Built and running end-to-end against real apps (Next.js, Express, Android).
Done: #621, #631, #622, #623, #624, #626, and #625 + #627 (now the full
1:1 flow). Remaining: #628 telemetry, #632 transient task instructions. #629 resume is dropped, not worth the complexity.
Decisions made during the build, beyond the original issues
Manifest-only install + a build phase. install only declares the SDK;
the new build task runs the real install + build/typecheck and surfaces any
unresolved conflict (one line in the outro, full detail in the report). See Issue 8: Real task bodies + full 1:1 integration flow #627.
Full graph is nine tasks: install, init, identify, error-tracking,
plan-capture, capture, build, dashboard, report.
[…] contract, project context, and the reference-example pointer; /agents files
carry intent only. See Issue 6: Agent-prompt and mini-skill format + seed prompt #625.
Grounding like the linear flow. Mini-skills carry real PostHog docs; each
task agent is pointed at the detected framework's reference EXAMPLE.md to
reference its patterns (not copy).
UI: agent-set per-task labels drive the queue panel; the agent task tools
and per-task spinner lines are suppressed so the queue is the sole progress
surface.
Flag targeting (separate concern, own branch). The wizard now identifies the
user (email) before evaluating flags, so wizard-orchestrator can target by
email; previously only $app_name was sent.
PRs
Wizard, a stacked train of drafts:
Context-mill:
agents content type, built and served alongside skills context-mill#181
agents content type, built and served alongside skills, plus the orchestrator agent prompts and mini-skills