API Arbitration

How we resolve API design disputes with data instead of opinion. This is the operational process for choosing between API shapes when the right answer isn't obvious.

When to Use This

Not every API decision needs arbitration. Use this process when:

You're choosing between 2–3 viable API shapes and can argue for any of them
Naming isn't obvious (it rarely is)
The "right" answer depends on how someone will actually reach for it in practice
Two people disagree and both have reasonable arguments
A spec review surfaces uncertainty that can't be resolved by checking conventions alone

Don't use this for:

Decisions already covered by API Conventions (just follow the convention)
Bug fixes or implementation details that don't change the public API
Cases where only one option actually works (no arbitration needed — ship it)

The Process

Four phases, in order. Don't skip phases — each one feeds the next.

Phase 1: Explore API Options

Enumerate the realistic design space. Don't anchor on two options prematurely — consider the full range:

Dimension	Options to explore
Abstraction level	Hook vs prop vs wrapper component vs context provider
Naming	What would a naive developer search for? What mental models does each name trigger?
Configuration	Plain boolean vs `boolean | config` object vs separate props vs children-based
Ownership	State on parent (hoisted) vs state on child (local) vs shared context
Composition	Slot on parent vs standalone child vs render prop vs compound components
Granularity	One component with modes vs separate components per mode

For each viable option, write out what consumer code looks like — not the implementation, the usage. Write at least 3 realistic usage snippets. If you can't write 3, the option probably isn't viable.

Output: 2–4 candidate API shapes, each with consumer code examples.
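
One way to keep the Phase 1 output honest is to record each candidate as data and apply the three-snippet rule mechanically. The sketch below is only an illustration: `Candidate`, `isViable`, and `MIN_SNIPPETS` are hypothetical names, and the snippets are stored as plain strings rather than compiled code.

```typescript
// Hypothetical record for a Phase 1 candidate. These names are illustrative only.
interface Candidate {
  label: string;
  // Consumer-side usage snippets (usage, not implementation), kept as plain strings.
  usageSnippets: string[];
}

// Encodes the "at least 3 realistic usage snippets" rule from this phase.
const MIN_SNIPPETS = 3;

// A candidate stays in the running only if it has enough distinct, non-empty snippets.
function isViable(candidate: Candidate): boolean {
  const distinct = new Set(
    candidate.usageSnippets.map((s) => s.trim()).filter((s) => s.length > 0)
  );
  return distinct.size >= MIN_SNIPPETS;
}

// Illustrative data: the second entry is simply unfinished, not a verdict on wrappers.
const candidates: Candidate[] = [
  {
    label: "prop",
    usageSnippets: [
      "<SideNav resizable={{ defaultWidth: 260 }} />",
      "<SideNav resizable={{ defaultWidth: 260, onWidthChange }} />",
      "<SideNav resizable={{ defaultWidth: width, onWidthChange: setWidth }} />",
    ],
  },
  {
    label: "wrapper",
    usageSnippets: ["<Resizable defaultWidth={260}><SideNav /></Resizable>"],
  },
];

const viableLabels = candidates.filter(isViable).map((c) => c.label);
console.log(viableLabels);
```

A candidate that fails the check is not necessarily wrong; it may just need more usage written out before it can be compared fairly.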

Phase 2: Enumerate Use Cases

Before testing, exhaustively list the scenarios the API must handle. This is the step that prevents you from only testing the happy path and declaring victory.

Every API arbitration should cover at minimum:

Case	What it tests
Simple/default	The 80% use case. Zero config. Does it just work?
Configured	Custom options beyond defaults. Does configuration feel natural?
Controlled	External state management. Can the consumer own the state?
Composed	Inside Dialog, Table, AppShell, Card. Does it play well with siblings?
Edge/mixed	The scenario that reveals friction — mixed modes, dynamic switching, responsive changes
Migration	What does adopting this look like for someone with nothing today?

Add domain-specific cases as needed. For resize: "two resizable panels side by side." For a selector: "1000 items with search." For navigation: "mobile drawer vs desktop sidebar from the same source."

Output: A numbered list of 5–10 specific scenarios to test against.
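
A small coverage check can catch a scenario list that skips one of the baseline cases. This is a sketch under assumptions: the `kind` tags and the `missingCases` helper are hypothetical, and tagging each scenario is still a manual judgment.

```typescript
// The six baseline cases from the table above, as short tags (tag names are hypothetical).
const REQUIRED_CASES = [
  "simple",
  "configured",
  "controlled",
  "composed",
  "edge",
  "migration",
] as const;

type RequiredCase = (typeof REQUIRED_CASES)[number];
// "domain" marks the extra, domain-specific scenarios.
type CaseKind = RequiredCase | "domain";

interface Scenario {
  kind: CaseKind;
  description: string;
}

// Returns the baseline cases that no scenario covers yet, in table order.
function missingCases(scenarios: Scenario[]): RequiredCase[] {
  const covered = new Set<CaseKind>(scenarios.map((s) => s.kind));
  return REQUIRED_CASES.filter((kind) => !covered.has(kind));
}

const draft: Scenario[] = [
  { kind: "simple", description: "Drag the sidebar edge, default constraints" },
  { kind: "configured", description: "Min/max bounds and width persistence" },
  { kind: "domain", description: "Two resizable panels side by side" },
];

console.log(missingCases(draft));
```

An empty result only says every category is represented; it says nothing about whether the scenarios are realistic.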

Phase 3: Craft Naive Prompts and Test

This is where the vibe test methodology applies. The core constraint: prompts describe desired UX, never name components or props.

Writing Prompts

❌ Bad (names the solution)	✅ Good (describes the experience)
"Make the sidebar resizable using useResizable"	"Build a layout where the user can drag the sidebar edge to make it wider or narrower"
"Add a collapsible prop to the card"	"Build a FAQ page where each question can be expanded to show its answer"
"Use Selector with async loading"	"Build a people picker that searches a remote API as the user types"

Each prompt should map to one of your Phase 2 use cases. Write at least one prompt per case.
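
The "never name components or props" constraint can be checked with a simple lint before any prompt is run. The helper below is a hypothetical sketch: `leakedIdentifiers` is not an existing tool, and the identifier list is something you would assemble per arbitration from the candidate skill docs.

```typescript
// Returns every identifier that appears in the prompt as a whole word.
// Matching is case-sensitive so that "resizable" (plain English) does not
// trip on a component named "Resizable".
function leakedIdentifiers(promptText: string, identifiers: string[]): string[] {
  return identifiers.filter((id) => {
    // Escape regex metacharacters so identifiers are matched literally.
    const escaped = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp(`\\b${escaped}\\b`).test(promptText);
  });
}

// Identifiers taken from the examples in the table above.
const apiNames = ["useResizable", "Resizable", "Selector"];

const badPrompt = "Make the sidebar resizable using useResizable";
const goodPrompt =
  "Build a layout where the user can drag the sidebar edge to make it wider or narrower";

console.log(leakedIdentifiers(badPrompt, apiNames)); // ["useResizable"]
console.log(leakedIdentifiers(goodPrompt, apiNames)); // []
```

Prop names that are also everyday words (for example `collapsible`) will still need a human read, since a prompt can legitimately describe something as collapsible.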

Running the Test

For each (prompt × API option):

Write a minimal skill doc for that option (200–400 words). Document ONLY that API shape — component name, props with types, 2 usage examples, one anti-pattern. Keep all docs the same length and structure. The only variable is the API itself.
Generate code as if the skill doc is the only reference. The generator must not know about the other options. See Vibe Evaluation#Sub-Agent Isolation for how to enforce this with sub-agents. If running single-model (acknowledged contamination), note where cross-knowledge may have influenced results.
Collect the output — the generated code, what the agent reached for, where it hesitated, what it hallucinated.
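
Because the skill docs are supposed to differ only in the API they describe, it can help to verify their size before running anything. The following is a minimal sketch, assuming "same length" means the longest doc is at most 10% longer than the shortest; that ratio, and the names `SkillDoc` and `checkSkillDocs`, are assumptions rather than part of the process.

```typescript
interface SkillDoc {
  option: string;
  body: string;
}

const MIN_WORDS = 200;
const MAX_WORDS = 400;
// Assumption: "same length" is read as longest / shortest <= 1.1.
const MAX_LENGTH_RATIO = 1.1;

function wordCount(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Returns human-readable problems; an empty array means the docs are comparable in size.
function checkSkillDocs(docs: SkillDoc[]): string[] {
  const problems: string[] = [];
  const sized = docs.map((d) => ({ option: d.option, words: wordCount(d.body) }));
  for (const s of sized) {
    if (s.words < MIN_WORDS || s.words > MAX_WORDS) {
      problems.push(`${s.option}: ${s.words} words (expected ${MIN_WORDS}-${MAX_WORDS})`);
    }
  }
  if (sized.length > 1) {
    const counts = sized.map((s) => s.words);
    const shortest = Math.min(...counts);
    const longest = Math.max(...counts);
    if (shortest === 0 || longest / shortest > MAX_LENGTH_RATIO) {
      problems.push(`length spread too wide: ${shortest} to ${longest} words`);
    }
  }
  return problems;
}

// Placeholder bodies of a given word count, standing in for real skill docs.
const filler = (n: number): string => "word ".repeat(n);

const problems = checkSkillDocs([
  { option: "A", body: filler(250) },
  { option: "B", body: filler(260) },
  { option: "C", body: filler(150) },
]);
console.log(problems);
```

Word count is only a proxy for "same structure"; the section layout of each doc still needs to be compared by eye.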

What to Look For

Signal	What it means
Agent finds the API immediately	Good discoverability — the name matches the mental model
Agent hallucinates props that don't exist	The name triggered associations with another library's API
Agent transforms option A into option B	Option B is how people actually think about the problem
Agent adds wrapper divs or extra state	The API has a composition gap
Agent uses the API correctly but code is verbose	The abstraction level might be wrong
Agent ignores a feature entirely	The feature isn't discoverable from the docs
All options produce identical code	The difference doesn't matter — pick the simpler one

Output: Raw results for each (prompt × option) — generated code, observations, escape hatches used.
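
The hallucination signal is the easiest one to make objective: any prop the generated code uses that the skill doc never mentions. A minimal sketch, assuming the used props have already been extracted from the generated code (by hand or with a parser); `findHallucinatedProps` and the prop names in the example are hypothetical.

```typescript
// Returns props that appear in generated code but not in the skill doc,
// each reported once, in first-seen order.
function findHallucinatedProps(documented: string[], used: string[]): string[] {
  const known = new Set(documented);
  return Array.from(new Set(used)).filter((prop) => !known.has(prop));
}

// Illustrative data: what one skill doc documents vs. what one agent reached for.
const documentedProps = ["resizable", "className"];
const usedProps = ["resizable", "minWidth", "onResizeEnd", "minWidth"];

const hallucinated = findHallucinatedProps(documentedProps, usedProps);
console.log(hallucinated); // ["minWidth", "onResizeEnd"]
```

The invented names are worth recording verbatim, since they often show which other library's API the agent was pattern-matching against.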

Phase 4: Problem-Solve Until Resolved

The first round rarely produces a clear winner. This phase is iterative.

Decision Patterns

Observation	Action
One option wins on 4/5 prompts, ties on 1	Ship the winner. The tie doesn't matter.
Options split — each wins on different cases	The abstraction is wrong. Neither is the answer. Revisit Phase 1.
One option triggers consistent hallucinations	The naming conflicts with prior art (Radix, shadcn, MUI). Rename or restructure.
LLM transforms one option into the other	The "losing" option is how people think. The "winning" option is how the system works. Consider shipping the mental-model version.
Neither option handles the edge case	The edge case reveals a gap. Add it to the spec. May need a new approach entirely.
Results are inconclusive / too close to call	Tie-break on: fewer props > fewer concepts > fewer characters > matches existing Astryx patterns.
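
The first two rows and the tie-break row of the table can be sketched as code. This is a simplification under stated assumptions: `classify` only distinguishes "one option wins everything that isn't a tie", "wins are split", and "all ties"; how props, concepts, and characters are counted is left to the team; and all type and function names are hypothetical.

```typescript
type Outcome =
  | { kind: "ship"; option: string }
  | { kind: "revisit-phase-1" }
  | { kind: "tie-break" };

// winners[i] is the option that won prompt i, or null for a tie.
function classify(winners: (string | null)[]): Outcome {
  const nonTies = winners.filter((w): w is string => w !== null);
  const [first, ...rest] = Array.from(new Set(nonTies));
  if (first === undefined) return { kind: "tie-break" }; // every prompt tied
  if (rest.length === 0) return { kind: "ship", option: first }; // a single winner
  return { kind: "revisit-phase-1" }; // wins split across options
}

interface OptionMetrics {
  label: string;
  propCount: number;
  conceptCount: number;
  characterCount: number; // consumer code length for the same use case
  matchesExistingPatterns: boolean;
}

// Lexicographic comparator: fewer props > fewer concepts > fewer characters >
// matches existing patterns. A negative result means `a` is preferred.
function tieBreak(a: OptionMetrics, b: OptionMetrics): number {
  return (
    a.propCount - b.propCount ||
    a.conceptCount - b.conceptCount ||
    a.characterCount - b.characterCount ||
    Number(b.matchesExistingPatterns) - Number(a.matchesExistingPatterns)
  );
}

// Illustrative numbers only.
const metrics: OptionMetrics[] = [
  { label: "X", propCount: 2, conceptCount: 2, characterCount: 60, matchesExistingPatterns: true },
  { label: "Y", propCount: 2, conceptCount: 1, characterCount: 80, matchesExistingPatterns: false },
  { label: "Z", propCount: 1, conceptCount: 3, characterCount: 90, matchesExistingPatterns: false },
];

const ranked = metrics.slice().sort(tieBreak).map((m) => m.label);
console.log(classify(["X", "X", null, "X", "X"])); // { kind: "ship", option: "X" }
console.log(ranked); // ["Z", "Y", "X"]
```

Note that prop count dominates everything else in this ordering: option Z ranks first with the most concepts and characters simply because it has one fewer prop.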

Iteration

If no clear winner emerges:

Narrow the question. Maybe the difference only matters for one specific case. Test that case in isolation with tighter prompts.
Blend approaches. The winner on simple cases + the winner on advanced cases might be combinable (e.g., prop for simple, hook for advanced).
Challenge the premise. If three rounds of testing don't resolve it, the component might be trying to do too much. Consider splitting.

Output: A decision with evidence. Document which option won, on which prompts, and why. This becomes part of the spec.

Sample Prompt

Copy this template into your AI assistant to run an API arbitration. Fill in the bracketed sections.

# API Arbitration: [Component/Feature Name]

## Context

I'm designing the API for [brief description — what the feature does and
why it exists]. We need to choose between [N] candidate approaches.

## Candidate APIs

### Option A: [Short name — e.g. "wrapper component"]
```tsx
// Simple case
[consumer code]

// Configured case
[consumer code]

// Controlled case
[consumer code]
```

### Option B: [Short name — e.g. "hook with spread"]
```tsx
// Simple case
[consumer code]

// Configured case
[consumer code]

// Controlled case
[consumer code]
```

### Option C: [Short name — e.g. "boolean-or-config prop"] (if applicable)
```tsx
[same structure]
```

## Use Cases

1. [Simple/default — describe the 80% scenario]
2. [Configured — describe customization beyond defaults]
3. [Controlled — describe external state ownership]
4. [Composed — describe usage inside a parent like AppShell or Dialog]
5. [Edge case — describe the scenario that reveals friction]

## Instructions

For each candidate API, do the following:

**Step 1: Write a skill doc** (200-400 words) that documents ONLY that
option. Include:
- Component/hook name and import path
- Props/parameters with types and defaults
- 2 usage examples (simple + configured)
- 1 anti-pattern ("don't do this because...")

Keep all docs the same length. The only variable is the API shape.

**Step 2: For each use case, write a naive prompt** that describes the
desired UX without naming any components or props. These prompts must be
identical across all options.

**Step 3: Generate code** for each (prompt × option) as if the skill doc
is your only reference. For each generation:
- Note what you reached for first
- Note where you hesitated or re-read the doc
- Note any props/components you wanted to use but couldn't find
- Flag if prior knowledge of another option influenced you

**Step 4: Evaluate** each result:
- Hallucinations (props/components that don't exist in the skill doc)
- Lines of code and boilerplate ratio
- Escape hatches (dropping out of the system to raw CSS/HTML)
- Would the code survive a new use case without rewriting?
- Does someone unfamiliar with the API understand the code on first read?

**Step 5: Synthesize**
- Which option had the lowest friction across all prompts?
- Which triggered the most hallucinations?
- Did any prompt reveal a fundamental limitation?
- Recommendation with specific evidence from the results.

## Contamination Note

If running this as a single agent (not isolated sub-agents), results are
biased by cross-knowledge between approaches. Relative signal is still
useful — but for high-stakes decisions, re-run with isolated agents per
[[Vibe Evaluation#Sub-Agent Isolation]].

Worked Example: Resize API

This is a real decision from Astryx development that followed this process.

Phase 1: Options

Option	Shape	Consumer code
A: Wrapper	Component wraps target	`<Resizable defaultWidth={260}><SideNav /></Resizable>`
B: Hook	Returns props to spread	`<SideNav {...useResizable({ defaultWidth: 260 })} />`
C: Prop	Config on the target	`<SideNav resizable={{ defaultWidth: 260, onWidthChange }} />`

Phase 2: Use Cases

Simple sidebar resize (drag to widen/narrow, default constraints)
Resize with min/max bounds and width persistence
Resize with collapse — drag past minimum triggers collapse animation
Resize in a constrained layout (AppShell header + sidebar)
Two resizable panels sharing available space

Phase 3: Results

Prompts were written for each case. Key findings:

Option A (Wrapper):

Added an extra DOM node in every case — unavoidable structural cost
Two-panel case required nested wrappers with confusing ordering
LLMs understood the API immediately (familiar pattern from libraries like react-resizable)
But generated code had wrapper-ordering bugs in 2/5 cases

Option B (Hook):

Composed cleanly with no extra DOM
Required understanding "spread props" — some LLMs produced {...resize} in the wrong position
Handle placement was ambiguous: does the hook add the handle, or does the consumer?
Lowest boilerplate for the simple case

Option C (Prop):

Most discoverable — LLMs found resizable as a prop without any hesitation
Config object felt natural for the configured/controlled cases
Limitation: only works on components that explicitly accept the prop
Cleanest generated code across all 5 prompts

Phase 4: Resolution

No single option won all cases. The resolution:

Ship C (prop) for SideNav — the most common resize target, and the prop covers all real scenarios that came up
Ship B (hook) as the general primitive — any element can be made resizable
Drop A (wrapper) — DOM bloat for zero benefit in any tested scenario

The prop version calls the hook internally. One implementation, two API surfaces matching different use cases. Documented in API Conventions#Behaviors: Hooks Over Wrappers.

Key insight: The prop was validated by stress-testing all use cases against it. If any case had required the hook's flexibility, we would have shipped hook-only. The prop exists because the common cases didn't need that flexibility — not because props are inherently better than hooks.
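
The "one implementation, two API surfaces" arrangement can be illustrated without React. The sketch below is not the Astryx implementation: `createResizable` and `createSideNav` are framework-free stand-ins for the hook and the component, and `minWidth` / `maxWidth` are hypothetical names for the min/max bounds. Only `defaultWidth` and `onWidthChange` come from the options table above.

```typescript
interface ResizableConfig {
  defaultWidth: number;
  minWidth?: number; // hypothetical name
  maxWidth?: number; // hypothetical name
  onWidthChange?: (width: number) => void;
}

interface ResizeController {
  getWidth(): number;
  // Clamps the requested width, notifies on change, and returns the applied width.
  resizeTo(requested: number): number;
}

// Stand-in for the general primitive (option B, the hook).
function createResizable(config: ResizableConfig): ResizeController {
  const min = config.minWidth ?? 0;
  const max = config.maxWidth ?? Number.POSITIVE_INFINITY;
  const clamp = (w: number): number => Math.min(max, Math.max(min, w));
  let width = clamp(config.defaultWidth);
  return {
    getWidth: () => width,
    resizeTo(requested: number): number {
      const next = clamp(requested);
      if (next !== width) {
        width = next;
        if (config.onWidthChange) config.onWidthChange(width);
      }
      return width;
    },
  };
}

// Stand-in for option C: the target accepts a `resizable` config and delegates
// to the same primitive, so there is only one resize implementation.
function createSideNav(options: { resizable?: ResizableConfig }): {
  resize: ResizeController | null;
} {
  return { resize: options.resizable ? createResizable(options.resizable) : null };
}

const seenWidths: number[] = [];
const nav = createSideNav({
  resizable: {
    defaultWidth: 260,
    minWidth: 200,
    maxWidth: 400,
    onWidthChange: (w) => {
      seenWidths.push(w);
    },
  },
});

if (nav.resize) {
  nav.resize.resizeTo(500); // clamped to 400
  nav.resize.resizeTo(400); // unchanged, so no callback
  nav.resize.resizeTo(120); // clamped to 200
}
console.log(seenWidths); // [400, 200]
```

The design point is that the prop surface adds no behavior of its own; anything the prop can do, a consumer calling the primitive directly can do as well.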

Running with Isolated Sub-Agents

For high-confidence results (new components, contentious decisions), use isolated sub-agents instead of single-model evaluation. See Vibe Evaluation#Sub-Agent Isolation for the full methodology and contamination risks.

The short version:

One agent per (prompt × option) — 5 prompts × 3 options = 15 agents
Each agent gets ONLY its option's skill doc. No knowledge of alternatives.
Include the blank-slate constraint: "You have NO prior knowledge of any design system."
After all agents complete, spawn a judge agent that sees all outputs side-by-side and scores comparatively.
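
Assembling the agent briefs is a plain cross product, and building them programmatically makes it easy to confirm that each brief carries exactly one option's skill doc. This is a sketch only: `AgentBrief`, `buildBriefs`, and the brief layout are assumptions, and how the briefs are handed to agents depends on your tooling.

```typescript
interface SkillDoc {
  option: string;
  body: string;
}

interface AgentBrief {
  id: string;
  option: string;
  system: string; // blank-slate constraint plus this option's skill doc only
  task: string; // the naive prompt, identical across options
}

const BLANK_SLATE = "You have NO prior knowledge of any design system.";

// One brief per (prompt x option).
function buildBriefs(prompts: string[], docs: SkillDoc[]): AgentBrief[] {
  const briefs: AgentBrief[] = [];
  prompts.forEach((promptText, index) => {
    for (const doc of docs) {
      briefs.push({
        id: `prompt-${index + 1}/option-${doc.option}`,
        option: doc.option,
        system: `${BLANK_SLATE}\n\n${doc.body}`,
        task: promptText,
      });
    }
  });
  return briefs;
}

// Placeholder docs and prompts standing in for the real ones.
const skillDocs: SkillDoc[] = [
  { option: "A", body: "[skill doc for option A]" },
  { option: "B", body: "[skill doc for option B]" },
  { option: "C", body: "[skill doc for option C]" },
];
const naivePrompts = ["[prompt 1]", "[prompt 2]", "[prompt 3]", "[prompt 4]", "[prompt 5]"];

const briefs = buildBriefs(naivePrompts, skillDocs);
console.log(briefs.length); // 15
```

The judge agent is deliberately outside this loop: it is the only participant that sees more than one option.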

Single-model evaluation (one agent doing all options sequentially) is fine for quick directional signal, early exploration, or cases where the decision isn't high-stakes. Just acknowledge the contamination.

Recording Decisions

Every API arbitration that ships a result should be documented:

In the spec issue — link to the vibe test results, note which option won and why
In the API Conventions wiki — if the decision establishes a new convention (like "hooks over wrappers"), add it
In the component's doc — the {Name}.doc.mjs should reference the decision if consumers might wonder "why is it this way?"

This creates a trail. When someone proposes changing a decided API, they can see the evidence behind the current choice — and either accept it or propose a re-test with new evidence.

Related

API Conventions — The documented conventions that API review checks against
Vibe Tests — Full evaluation methodology (comparative harness, scoring, reports)
Vibe Evaluation#Sub-Agent Isolation — How to prevent contamination in testing
Vibe Evaluation#Judge Agent Evaluation — Comparative scoring by a dedicated judge
Component Specification Protocol — Where arbitration fits in the spec process (Phase 8)
Contributing with AI Assistants — How contributors encounter this process
Agent Init Prompt Vibe Testing — Related but different: testing the CLI init prompt, not component APIs
