API Arbitration

Cindy Zhang edited this page Jun 23, 2026 · 1 revision

How we resolve API design disputes with data instead of opinion. This is the operational process for choosing between API shapes when the right answer isn't obvious.


When to Use This

Not every API decision needs arbitration. Use this process when:

  • You're choosing between 2–3 viable API shapes and can argue for any of them
  • Naming isn't obvious (it rarely is)
  • The "right" answer depends on how someone will actually reach for it in practice
  • Two people disagree and both have reasonable arguments
  • A spec review surfaces uncertainty that can't be resolved by checking conventions alone

Don't use this for:

  • Decisions already covered by API Conventions (just follow the convention)
  • Bug fixes or implementation details that don't change the public API
  • Cases where only one option actually works (no arbitration needed — ship it)

The Process

Four phases, in order. Don't skip phases — each one feeds the next.

Phase 1: Explore API Options

Enumerate the realistic design space. Don't anchor on two options prematurely — consider the full range:

| Dimension | Options to explore |
| --- | --- |
| Abstraction level | Hook vs prop vs wrapper component vs context provider |
| Naming | What would a naive developer search for? What mental models does each name trigger? |
| Configuration | Plain boolean vs boolean \| config object vs separate props vs children-based |
| Ownership | State on parent (hoisted) vs state on child (local) vs shared context |
| Composition | Slot on parent vs standalone child vs render prop vs compound components |
| Granularity | One component with modes vs separate components per mode |

For each viable option, write out what consumer code looks like — not the implementation, the usage. Write at least 3 realistic usage snippets. If you can't write 3, the option probably isn't viable.

Output: 2–4 candidate API shapes, each with consumer code examples.
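
One way to keep the Phase 1 output honest is to record each candidate as data and apply the three-snippet rule mechanically. The sketch below is illustrative only: the `Candidate` type, the `isViable` helper, and the sample snippets are hypothetical and not part of any Astryx tooling.

```typescript
// Hypothetical record for one candidate API shape from Phase 1.
interface Candidate {
  name: string;
  // Consumer-side usage snippets (usage, not implementation).
  usageSnippets: string[];
}

// The "at least 3 realistic usage snippets" rule from the text above.
const MIN_SNIPPETS = 3;

function isViable(candidate: Candidate): boolean {
  const nonEmpty = candidate.usageSnippets.filter((s) => s.trim().length > 0);
  return nonEmpty.length >= MIN_SNIPPETS;
}

// Placeholder candidates; the snippet strings are illustrative only.
const candidates: Candidate[] = [
  {
    name: "prop",
    usageSnippets: [
      "<SideNav resizable />",
      "<SideNav resizable={{ defaultWidth: 260 }} />",
      "<SideNav resizable={{ defaultWidth: 260, onWidthChange }} />",
    ],
  },
  {
    name: "wrapper",
    usageSnippets: ["<Resizable defaultWidth={260}><SideNav /></Resizable>"],
  },
];

// Names of the candidates that pass the snippet rule.
const viable = candidates.filter(isViable).map((c) => c.name);
```

A candidate that fails this check is not necessarily wrong; it just has not been written out far enough to test.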

Phase 2: Enumerate Use Cases

Before testing, exhaustively list the scenarios the API must handle. This is the step that prevents you from only testing the happy path and declaring victory.

Every API arbitration should cover at minimum:

| Case | What it tests |
| --- | --- |
| Simple/default | The 80% use case. Zero config. Does it just work? |
| Configured | Custom options beyond defaults. Does configuration feel natural? |
| Controlled | External state management. Can the consumer own the state? |
| Composed | Inside Dialog, Table, AppShell, Card. Does it play well with siblings? |
| Edge/mixed | The scenario that reveals friction: mixed modes, dynamic switching, responsive changes |
| Migration | What does adopting this look like for someone with nothing today? |

Add domain-specific cases as needed. For resize: "two resizable panels side by side." For a selector: "1000 items with search." For navigation: "mobile drawer vs desktop sidebar from the same source."

Output: A numbered list of 5–10 specific scenarios to test against.
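
A small coverage check can catch a use-case list that skips one of the minimum case kinds. This is a sketch under assumptions: the `UseCase` shape, the `"domain"` kind for domain-specific cases, and the `missingKinds` helper are hypothetical names chosen for illustration.

```typescript
// The six minimum case kinds from the list above.
const REQUIRED_KINDS = [
  "simple",
  "configured",
  "controlled",
  "composed",
  "edge",
  "migration",
] as const;

// "domain" is a hypothetical label for domain-specific additions.
type CaseKind = (typeof REQUIRED_KINDS)[number] | "domain";

interface UseCase {
  id: number;
  kind: CaseKind;
  scenario: string;
}

// Returns the required kinds that no use case covers yet.
function missingKinds(useCases: UseCase[]): string[] {
  return REQUIRED_KINDS.filter(
    (kind) => !useCases.some((u) => u.kind === kind)
  );
}

// An incomplete draft list, for illustration.
const draft: UseCase[] = [
  { id: 1, kind: "simple", scenario: "Drag the sidebar edge to resize" },
  { id: 2, kind: "configured", scenario: "Resize with min/max bounds" },
  { id: 3, kind: "domain", scenario: "Two resizable panels side by side" },
];

const gaps = missingKinds(draft);
// gaps is ["controlled", "composed", "edge", "migration"]
```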

Phase 3: Craft Naive Prompts and Test

This is where the vibe test methodology applies. The core constraint: prompts describe desired UX, never name components or props.

Writing Prompts

| ❌ Bad (names the solution) | ✅ Good (describes the experience) |
| --- | --- |
| "Make the sidebar resizable using useResizable" | "Build a layout where the user can drag the sidebar edge to make it wider or narrower" |
| "Add a collapsible prop to the card" | "Build a FAQ page where each question can be expanded to show its answer" |
| "Use Selector with async loading" | "Build a people picker that searches a remote API as the user types" |

Each prompt should map to one of your Phase 2 use cases. Write one prompt per case minimum.
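
The "never name components or props" constraint can be linted before any test runs. The helper below is a rough sketch, not an existing tool: `leakedNames` is a hypothetical function, and the list of forbidden identifiers is something you would assemble yourself from the candidates' skill docs. Matching is deliberately case-insensitive, so an ordinary word that doubles as a prop name (such as "resizable") is also flagged.

```typescript
// Returns the forbidden identifiers that appear in a prompt.
// `forbidden` holds component, hook, and prop names from every candidate.
function leakedNames(prompt: string, forbidden: string[]): string[] {
  return forbidden.filter((name) => {
    // Escape regex metacharacters so names are matched literally.
    const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp(`\\b${escaped}\\b`, "i").test(prompt);
  });
}

const forbidden = ["useResizable", "Resizable", "SideNav"];

// Names the solution: flagged.
const bad = leakedNames(
  "Make the sidebar resizable using useResizable",
  forbidden
);

// Describes the experience: clean.
const good = leakedNames(
  "Build a layout where the user can drag the sidebar edge to make it wider or narrower",
  forbidden
);
```

A non-empty result means the prompt should be rewritten to describe the experience instead.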

Running the Test

For each (prompt × API option):

  1. Write a minimal skill doc for that option (200–400 words). Document ONLY that API shape — component name, props with types, 2 usage examples, one anti-pattern. Keep all docs the same length and structure. The only variable is the API itself.

  2. Generate code as if the skill doc is the only reference. The generator must not know about the other options. See Vibe Evaluation#Sub-Agent Isolation for how to enforce this with sub-agents. If you run a single model across all options, contamination is a known limitation: note where knowledge of one option may have influenced the results for another.

  3. Collect the output — the generated code, what the agent reached for, where it hesitated, what it hallucinated.
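
Because the skill docs are meant to differ only in the API they describe, it can help to check their lengths before generating anything. The following is a minimal sketch: `SkillDoc`, `checkSkillDocs`, and the 50-word `maxSpread` tolerance are assumptions for illustration; only the 200–400 word window comes from the process above.

```typescript
interface SkillDoc {
  option: string;
  text: string;
}

// Word window from step 1 above.
const MIN_WORDS = 200;
const MAX_WORDS = 400;

function wordCount(text: string): number {
  const words = text.trim().split(/\s+/);
  return words[0] === "" ? 0 : words.length;
}

// Flags docs outside the word window, and sets of docs whose lengths
// differ by more than `maxSpread` words (an arbitrary tolerance).
function checkSkillDocs(docs: SkillDoc[], maxSpread: number = 50): string[] {
  const problems: string[] = [];
  const counts: number[] = [];
  for (const doc of docs) {
    const count = wordCount(doc.text);
    counts.push(count);
    if (count < MIN_WORDS || count > MAX_WORDS) {
      problems.push(
        `${doc.option}: ${count} words, expected ${MIN_WORDS}-${MAX_WORDS}`
      );
    }
  }
  if (counts.length > 1) {
    const spread = Math.max(...counts) - Math.min(...counts);
    if (spread > maxSpread) {
      problems.push(`length spread is ${spread} words, docs should be comparable`);
    }
  }
  return problems;
}
```

An empty result does not prove the docs are structurally identical; it only rules out the most obvious length bias.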

What to Look For

| Signal | What it means |
| --- | --- |
| Agent finds the API immediately | Good discoverability: the name matches the mental model |
| Agent hallucinates props that don't exist | The name triggered associations with another library's API |
| Agent transforms option A into option B | Option B is how people actually think about the problem |
| Agent adds wrapper divs or extra state | The API has a composition gap |
| Agent uses the API correctly but code is verbose | The abstraction level might be wrong |
| Agent ignores a feature entirely | The feature isn't discoverable from the docs |
| All options produce identical code | The difference doesn't matter, so pick the simpler one |

Output: Raw results for each (prompt × option) — generated code, observations, escape hatches used.

Phase 4: Problem-Solve Until Resolved

The first round rarely produces a clear winner. This phase is iterative.

Decision Patterns

| Observation | Action |
| --- | --- |
| One option wins on 4/5 prompts, ties on 1 | Ship the winner. The tie doesn't matter. |
| Options split, each winning on different cases | The abstraction is wrong. Neither is the answer. Revisit Phase 1. |
| One option triggers consistent hallucinations | The naming conflicts with prior art (Radix, shadcn, MUI). Rename or restructure. |
| LLM transforms one option into the other | The "losing" option is how people think. The "winning" option is how the system works. Consider shipping the mental-model version. |
| Neither option handles the edge case | The edge case reveals a gap. Add it to the spec. May need a new approach entirely. |
| Results are inconclusive / too close to call | Tie-break on: fewer props > fewer concepts > fewer characters > matches existing Astryx patterns. |
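
The tie-break order in the last row can be expressed as a comparator. This is a sketch under assumptions: the `OptionStats` fields and the idea of counting "wins" per prompt are hypothetical bookkeeping, and treating equal win counts as "too close to call" is a simplification of what is ultimately a judgment call.

```typescript
// Hypothetical per-option summary after a round of testing.
interface OptionStats {
  name: string;
  wins: number; // prompts where this option produced the best result
  propCount: number;
  conceptCount: number;
  charCount: number; // characters in the simple-case consumer code
  matchesExistingPatterns: boolean;
}

// Tie-break order from the table:
// fewer props > fewer concepts > fewer characters > matches existing patterns.
function tieBreak(a: OptionStats, b: OptionStats): number {
  return (
    a.propCount - b.propCount ||
    a.conceptCount - b.conceptCount ||
    a.charCount - b.charCount ||
    Number(b.matchesExistingPatterns) - Number(a.matchesExistingPatterns)
  );
}

// Most wins first; the tie-break applies only when wins are equal.
function pickWinner(options: OptionStats[]): OptionStats {
  if (options.length === 0) {
    throw new Error("no options to compare");
  }
  const ranked = [...options].sort(
    (a, b) => b.wins - a.wins || tieBreak(a, b)
  );
  return ranked[0];
}
```

A comparator like this only settles ties. A split where each option wins different cases still points back to Phase 1, as the table says.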

Iteration

If no clear winner emerges:

  1. Narrow the question. Maybe the difference only matters for one specific case. Test that case in isolation with tighter prompts.
  2. Blend approaches. The winner on simple cases + the winner on advanced cases might be combinable (e.g., prop for simple, hook for advanced).
  3. Challenge the premise. If three rounds of testing don't resolve it, the component might be trying to do too much. Consider splitting.

Output: A decision with evidence. Document which option won, on which prompts, and why. This becomes part of the spec.


Sample Prompt

Copy this template into your AI assistant to run an API arbitration. Fill in the bracketed sections.

# API Arbitration: [Component/Feature Name]

## Context

I'm designing the API for [brief description — what the feature does and
why it exists]. We need to choose between [N] candidate approaches.

## Candidate APIs

### Option A: [Short name — e.g. "wrapper component"]
```tsx
// Simple case
[consumer code]

// Configured case
[consumer code]

// Controlled case
[consumer code]
```

### Option B: [Short name — e.g. "hook with spread"]
```tsx
// Simple case
[consumer code]

// Configured case
[consumer code]

// Controlled case
[consumer code]
```

### Option C: [Short name — e.g. "boolean-or-config prop"] (if applicable)
```tsx
[same structure]
```

## Use Cases

1. [Simple/default — describe the 80% scenario]
2. [Configured — describe customization beyond defaults]
3. [Controlled — describe external state ownership]
4. [Composed — describe usage inside a parent like AppShell or Dialog]
5. [Edge case — describe the scenario that reveals friction]

## Instructions

For each candidate API, do the following:

**Step 1: Write a skill doc** (200-400 words) that documents ONLY that
option. Include:
- Component/hook name and import path
- Props/parameters with types and defaults
- 2 usage examples (simple + configured)
- 1 anti-pattern ("don't do this because...")

Keep all docs the same length. The only variable is the API shape.

**Step 2: For each use case, write a naive prompt** that describes the
desired UX without naming any components or props. These prompts must be
identical across all options.

**Step 3: Generate code** for each (prompt × option) as if the skill doc
is your only reference. For each generation:
- Note what you reached for first
- Note where you hesitated or re-read the doc
- Note any props/components you wanted to use but couldn't find
- Flag if prior knowledge of another option influenced you

**Step 4: Evaluate** each result:
- Hallucinations (props/components that don't exist in the skill doc)
- Lines of code and boilerplate ratio
- Escape hatches (dropping out of the system to raw CSS/HTML)
- Would the code survive a new use case without rewriting?
- Does someone unfamiliar with the API understand the code on first read?

**Step 5: Synthesize**
- Which option had the lowest friction across all prompts?
- Which triggered the most hallucinations?
- Did any prompt reveal a fundamental limitation?
- Recommendation with specific evidence from the results.

## Contamination Note

If running this as a single agent (not isolated sub-agents), results are
biased by cross-knowledge between approaches. Relative signal is still
useful — but for high-stakes decisions, re-run with isolated agents per
[[Vibe Evaluation#Sub-Agent Isolation]].

Worked Example: Resize API

This is a real decision from Astryx development that followed this process.

Phase 1: Options

| Option | Shape | Consumer code |
| --- | --- | --- |
| A: Wrapper | Component wraps target | `<Resizable defaultWidth={260}><SideNav /></Resizable>` |
| B: Hook | Returns props to spread | `<SideNav {...useResizable({ defaultWidth: 260 })} />` |
| C: Prop | Config on the target | `<SideNav resizable={{ defaultWidth: 260, onWidthChange }} />` |

Phase 2: Use Cases

  1. Simple sidebar resize (drag to widen/narrow, default constraints)
  2. Resize with min/max bounds and width persistence
  3. Resize with collapse — drag past minimum triggers collapse animation
  4. Resize in a constrained layout (AppShell header + sidebar)
  5. Two resizable panels sharing available space

Phase 3: Results

Prompts were written for each case. Key findings:

Option A (Wrapper):

  • Added an extra DOM node in every case — unavoidable structural cost
  • Two-panel case required nested wrappers with confusing ordering
  • LLMs understood the API immediately (familiar pattern from libraries like react-resizable)
  • But generated code had wrapper-ordering bugs in 2/5 cases

Option B (Hook):

  • Composed cleanly with no extra DOM
  • Required understanding "spread props" — some LLMs produced {...resize} in the wrong position
  • Handle placement was ambiguous: does the hook add the handle, or does the consumer?
  • Lowest boilerplate for the simple case

Option C (Prop):

  • Most discoverable — LLMs found resizable as a prop without any hesitation
  • Config object felt natural for the configured/controlled cases
  • Limitation: only works on components that explicitly accept the prop
  • Cleanest generated code across all 5 prompts

Phase 4: Resolution

No single option won all cases. The resolution:

  • Ship C (prop) for SideNav — the most common resize target, and the prop covers all real scenarios that came up
  • Ship B (hook) as the general primitive — any element can be made resizable
  • Drop A (wrapper) — DOM bloat for zero benefit in any tested scenario

The prop version calls the hook internally. One implementation, two API surfaces matching different use cases. Documented in API Conventions#Behaviors: Hooks Over Wrappers.

Key insight: The prop was validated by stress-testing all use cases against it. If any case had required the hook's flexibility, we would have shipped hook-only. The prop exists because the common cases didn't need that flexibility — not because props are inherently better than hooks.
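
The "one implementation, two API surfaces" arrangement can be sketched without a UI framework. Everything below is a hypothetical stand-in, not the Astryx implementation: the return shape, the `onResize` callback, the placeholder default of 240, and support for a plain boolean are all assumptions. Only `defaultWidth` and `onWidthChange` appear in the examples above.

```typescript
// Hypothetical config; only defaultWidth and onWidthChange come from the text.
interface ResizeConfig {
  defaultWidth?: number;
  onWidthChange?: (width: number) => void;
}

// Hypothetical props the hook hands back for spreading onto a target.
interface ResizeProps {
  style: { width: number };
  onResize: (width: number) => void;
}

// Stand-in for the hook (Option B): the single implementation.
// A real hook would hold state through the framework; this only builds props.
function useResizable(config: ResizeConfig = {}): ResizeProps {
  const width = config.defaultWidth ?? 240; // 240 is an arbitrary placeholder
  return {
    style: { width },
    onResize: (next) => config.onWidthChange?.(next),
  };
}

// Stand-in for the prop surface (Option C): normalize the prop value,
// then delegate to the same hook.
function sideNavProps(resizable?: boolean | ResizeConfig): Partial<ResizeProps> {
  if (!resizable) {
    return {};
  }
  return useResizable(resizable === true ? {} : resizable);
}
```

The point of the sketch is the delegation: the prop surface adds no resize logic of its own, so the two surfaces cannot drift apart.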


Running with Isolated Sub-Agents

For high-confidence results (new components, contentious decisions), use isolated sub-agents instead of single-model evaluation. See Vibe Evaluation#Sub-Agent Isolation for the full methodology and contamination risks.

The short version:

  1. One agent per (prompt × option) — 5 prompts × 3 options = 15 agents
  2. Each agent gets ONLY its option's skill doc. No knowledge of alternatives.
  3. Include the blank-slate constraint: "You have NO prior knowledge of any design system."
  4. After all agents complete, spawn a judge agent that sees all outputs side-by-side and scores comparatively.
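
Steps 1–3 amount to building a job matrix. A minimal sketch, assuming hypothetical `AgentJob` and `SkillDocEntry` shapes and leaving the actual agent-spawning mechanism out of scope:

```typescript
interface SkillDocEntry {
  option: string;
  doc: string;
}

interface AgentJob {
  prompt: string;
  option: string;
  // What the isolated agent sees: the constraint plus ONE skill doc.
  systemPrompt: string;
}

// Blank-slate constraint quoted from step 3.
const BLANK_SLATE = "You have NO prior knowledge of any design system.";

// One job per (prompt × option); no job ever sees another option's doc.
function buildJobs(prompts: string[], skillDocs: SkillDocEntry[]): AgentJob[] {
  const jobs: AgentJob[] = [];
  for (const prompt of prompts) {
    for (const entry of skillDocs) {
      jobs.push({
        prompt,
        option: entry.option,
        systemPrompt: `${BLANK_SLATE}\n\n${entry.doc}`,
      });
    }
  }
  return jobs;
}
```

With 5 prompts and 3 skill docs, `buildJobs` returns 15 jobs, matching the count in step 1. The judge agent from step 4 would run separately, after all jobs complete.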

Single-model evaluation (one agent doing all options sequentially) is fine for quick directional signal, early exploration, or cases where the decision isn't high-stakes. Just acknowledge the contamination.


Recording Decisions

Every API arbitration that ships a result should be documented:

  1. In the spec issue — link to the vibe test results, note which option won and why
  2. In the API Conventions wiki — if the decision establishes a new convention (like "hooks over wrappers"), add it
  3. In the component's doc — the {Name}.doc.mjs should reference the decision if consumers might wonder "why is it this way?"

This creates a trail. When someone proposes changing a decided API, they can see the evidence behind the current choice — and either accept it or propose a re-test with new evidence.
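
A decision record can be kept as structured data so that the spec issue, the conventions wiki, and the component doc all draw on the same evidence. The shape below is a hypothetical sketch, not an existing Astryx format; the sample values restate the resize example from this page.

```typescript
// Hypothetical record shape for one arbitration outcome.
interface DecisionRecord {
  feature: string;
  shipped: string[];
  dropped: string[];
  evidence: { option: string; finding: string }[];
}

// Renders a plain-text summary suitable for pasting into a spec issue.
function summarize(record: DecisionRecord): string {
  const lines = [
    `Decision: ${record.feature}`,
    `Shipped: ${record.shipped.join(", ")}`,
    `Dropped: ${record.dropped.join(", ")}`,
    "Evidence:",
    ...record.evidence.map((e) => `  - ${e.option}: ${e.finding}`),
  ];
  return lines.join("\n");
}

// Values taken from the worked example above.
const resizeDecision: DecisionRecord = {
  feature: "Resize API",
  shipped: ["C (prop)", "B (hook)"],
  dropped: ["A (wrapper)"],
  evidence: [
    { option: "C (prop)", finding: "Cleanest generated code across all 5 prompts" },
    { option: "B (hook)", finding: "Composed cleanly with no extra DOM" },
    { option: "A (wrapper)", finding: "Added an extra DOM node in every case" },
  ],
};

const summary = summarize(resizeDecision);
```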

