
feat: 2.1.0 — ExecutorErrorCode taxonomy + structured error events (Phase 1)#209

Merged: CocoRoF merged 1 commit into main from feat/error-codes-2.1.0 on May 21, 2026

@CocoRoF CocoRoF commented May 21, 2026

Adds stable, fine-grained error codes to every executor exception so hosts can group errors for logging / Sentry / i18n / telemetry without parsing message strings. Fully backward compatible — every existing call site keeps working unchanged.

Why

The current error surface gives downstream consumers only:

  • An exception class (APIError, StageError, …) — too coarse for i18n.
  • A free-form English message — fragile to parse, hostile to translate.
  • ErrorCategory on APIError only — answers "should I retry?" but not "what specifically broke?".

Hosts working around this have to grep message text (Geny's frontend currently does exactly this for the red "Error: Claude Code CLI is not authenticated…" banner — verbatim English server message, no i18n hook).

What's new

1. ExecutorErrorCode enum (core/errors.py)

~30 codes in exec.<component>.<reason> format spanning the api / cli / pipeline / stage / tool / mutation / mcp components. Naming mirrors the existing ToolErrorCode precedent. Examples:

ExecutorErrorCode.EXEC_CLI_AUTH_FAILED          # = "exec.cli.auth_failed"
ExecutorErrorCode.EXEC_API_RATE_LIMITED         # = "exec.api.rate_limited"
ExecutorErrorCode.EXEC_API_RETRY_EXHAUSTED      # = "exec.api.retry_exhausted"
ExecutorErrorCode.EXEC_STAGE_GUARD_REJECTED     # = "exec.stage.guard_rejected"
ExecutorErrorCode.EXEC_TOOL_CRASHED             # = "exec.tool.crashed"

Full table + recoverability + recommended user-facing action lives in docs/error_codes.md.
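
As a rough illustration of the shape described above, a string-valued enum could look like the sketch below. Only the five members quoted in the examples are included, and the `str, Enum` base is an assumption; the real definition in core/errors.py may differ.

```python
from enum import Enum


class ExecutorErrorCode(str, Enum):
    """Illustrative subset only; the PR describes ~30 codes."""

    EXEC_CLI_AUTH_FAILED = "exec.cli.auth_failed"
    EXEC_API_RATE_LIMITED = "exec.api.rate_limited"
    EXEC_API_RETRY_EXHAUSTED = "exec.api.retry_exhausted"
    EXEC_STAGE_GUARD_REJECTED = "exec.stage.guard_rejected"
    EXEC_TOOL_CRASHED = "exec.tool.crashed"


# Because members subclass str, they compare equal to their wire value
# and can be looked up from it.
assert ExecutorErrorCode.EXEC_TOOL_CRASHED == "exec.tool.crashed"
assert ExecutorErrorCode("exec.cli.auth_failed") is ExecutorErrorCode.EXEC_CLI_AUTH_FAILED
```

With a string-backed enum, a host that only has the wire value (for example from an event payload) can recover the member without a separate lookup table.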

2. GenyExecutorError.code attribute

Every executor exception now exposes a code attribute resolved as:

explicit code= kwarg > category-derived (APIError) > subclass _DEFAULT_CODE

code is always set on the exception instance; downstream consumers can rely on it never being None.
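
The resolution order can be sketched as follows. This is a minimal stand-in, not the code in this PR: the constructor signatures, the `EXEC_UNKNOWN` member name, the `RATE_LIMITED` category, the category string values, and the contents of the mapping table are all assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class ErrorCategory(Enum):
    # CLI_AUTH_FAILED appears in the PR text; RATE_LIMITED and both
    # string values are assumed for this sketch.
    CLI_AUTH_FAILED = "cli_auth_failed"
    RATE_LIMITED = "rate_limited"


class ExecutorErrorCode(str, Enum):
    EXEC_UNKNOWN = "exec.unknown"  # member name assumed; string is the fallback named in the PR
    EXEC_CLI_AUTH_FAILED = "exec.cli.auth_failed"
    EXEC_API_RATE_LIMITED = "exec.api.rate_limited"
    EXEC_API_RETRY_EXHAUSTED = "exec.api.retry_exhausted"

    @classmethod
    def from_category(cls, category: ErrorCategory) -> "ExecutorErrorCode":
        # Assumed mapping; the real table covers every ErrorCategory.
        return _CATEGORY_DEFAULTS.get(category, cls.EXEC_UNKNOWN)


_CATEGORY_DEFAULTS = {
    ErrorCategory.CLI_AUTH_FAILED: ExecutorErrorCode.EXEC_CLI_AUTH_FAILED,
    ErrorCategory.RATE_LIMITED: ExecutorErrorCode.EXEC_API_RATE_LIMITED,
}


class GenyExecutorError(Exception):
    _DEFAULT_CODE = ExecutorErrorCode.EXEC_UNKNOWN

    def __init__(self, message: str, *, code: Optional[ExecutorErrorCode] = None):
        super().__init__(message)
        # Tier 3: fall back to the subclass default so code is never None.
        self.code = code if code is not None else self._DEFAULT_CODE


class APIError(GenyExecutorError):
    def __init__(
        self,
        message: str,
        *,
        category: Optional[ErrorCategory] = None,
        code: Optional[ExecutorErrorCode] = None,
    ):
        # Tier 1: an explicit code wins. Tier 2: derive from the category.
        if code is None and category is not None:
            code = ExecutorErrorCode.from_category(category)
        super().__init__(message, code=code)
        self.category = category


legacy = APIError("not authenticated", category=ErrorCategory.CLI_AUTH_FAILED)
explicit = APIError(
    "gave up after retries",
    category=ErrorCategory.RATE_LIMITED,
    code=ExecutorErrorCode.EXEC_API_RETRY_EXHAUSTED,
)
bare = APIError("something broke")
```

In this sketch `legacy.code` comes from the category, `explicit.code` from the kwarg, and `bare.code` from `_DEFAULT_CODE`, so all three tiers yield a non-`None` value.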

3. Structured error event payloads

pipeline.error / stage.error / api.retry events now carry:

{
    "error": "<stringified message>",      # legacy field, preserved
    "code": "exec.cli.auth_failed",         # new, stable identifier
    "exception_type": "geny_executor.core.errors.APIError"  # new
}

Hosts can switch over to data["code"] for i18n / telemetry without disturbing existing consumers that read data["error"].
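
On the host side, a consumer might prefer `data["code"]` and fall back to the legacy `data["error"]` string. The sketch below is hypothetical: `MESSAGES`, `render_error`, and the translated text are invented for illustration, and the `executor.{code}` key format is borrowed from the Phase 2 roadmap item rather than from anything shipped in this PR.

```python
from typing import Any, Dict

# Hypothetical translation table keyed by "executor.<code>".
MESSAGES = {
    "executor.exec.cli.auth_failed": "The CLI is not signed in. Please authenticate and try again.",
}


def render_error(data: Dict[str, Any]) -> str:
    """Return a user-facing message for an error event payload."""
    code = data.get("code")
    if code is not None:
        translated = MESSAGES.get(f"executor.{code}")
        if translated is not None:
            return translated
    # Older executors (no "code") or untranslated codes: keep legacy behavior.
    return f"Error: {data.get('error', 'unknown error')}"
```

Falling back to `data["error"]` keeps the host working against executors that predate 2.1.0 and against codes it has no translation for yet.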

4. Explicit codes on Stage 6's most operational raise sites

  • EXEC_API_NO_CLIENT: state.llm_client is None (a configuration bug)
  • EXEC_API_RETRY_EXHAUSTED: the category is recoverable, but max_retries was reached
  • EXEC_API_STREAM_INCOMPLETE: the stream ended without message_complete

All other Stage 6 APIError raises auto-resolve via the category default mapping — no per-site change needed.

5. docs/error_codes.md

Authoritative reference for every code: recoverability, source raise sites, description, recommended user-facing action. Includes "how to add a new code" + "how to deprecate a code" workflow so the taxonomy stays curated.

6. tests/contract/test_error_codes_stability.py

Pins every shipped code's exact string value in a frozen dict. Renaming / repurposing / accidentally deleting a code now fails CI before release. Verifies:

  • frozen string values match enum (catches renames)
  • every enum member is in the frozen pin (catches additions that bypass the docs flow)
  • all codes match the exec.<component>.<reason> format (lowercase, dot-separated, ≤4 segments)
  • every ErrorCategory has a non-fallback default code (so legacy raises never degrade to exec.unknown)
  • APIError resolution matrix (explicit > category > default) works correctly across all combinations
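
The first three checks above can be sketched as a pytest-style module. This is an illustration under assumptions, not the contents of test_error_codes_stability.py: the enum is a three-member stand-in, and `FROZEN_CODES`, `CODE_FORMAT`, and the exact regular expression are hypothetical.

```python
import re
from enum import Enum


class ExecutorErrorCode(str, Enum):
    # Stand-in subset; the real test would import the shipped enum.
    EXEC_CLI_AUTH_FAILED = "exec.cli.auth_failed"
    EXEC_API_RATE_LIMITED = "exec.api.rate_limited"
    EXEC_TOOL_CRASHED = "exec.tool.crashed"


# Frozen pin: member name -> exact string value as shipped.
FROZEN_CODES = {
    "EXEC_CLI_AUTH_FAILED": "exec.cli.auth_failed",
    "EXEC_API_RATE_LIMITED": "exec.api.rate_limited",
    "EXEC_TOOL_CRASHED": "exec.tool.crashed",
}

# Lowercase, dot-separated, starts with "exec", at most 4 segments in total.
CODE_FORMAT = re.compile(r"^exec(\.[a-z][a-z0-9_]*){1,3}$")


def test_frozen_values_match_enum():
    # A renamed or deleted member raises KeyError; a changed value fails the assert.
    for name, value in FROZEN_CODES.items():
        assert ExecutorErrorCode[name].value == value


def test_every_member_is_pinned():
    # A member added without updating the pin shows up as a set difference.
    assert {member.name for member in ExecutorErrorCode} == set(FROZEN_CODES)


def test_codes_match_format():
    for member in ExecutorErrorCode:
        assert CODE_FORMAT.match(member.value), member.value
```

Keeping the pin as a literal dict in the test file, rather than deriving it from the enum, is what makes a rename visible: the enum and the pin have to be edited separately for the test to pass again.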

Stability contract

Once a release ships, a code's string value never changes. Renaming or repurposing is a major-version change. The regression test enforces this.

Backward compatibility

  • All existing call sites work unchanged. raise APIError("...", category=ErrorCategory.CLI_AUTH_FAILED) keeps working and now carries .code = ExecutorErrorCode.EXEC_CLI_AUTH_FAILED for free.
  • All existing event consumers work unchanged. data["code"] is purely additive.
  • No exception class signatures break. The only change is a new optional code= kwarg.

Roadmap (post-merge)

  • Phase 2: Geny middleware — agent_session.py error-catch paths preserve code, session_logger metadata, WebSocket payload, frontend ErrorBanner renders via i18n key executor.{code}.
  • Phase 3: Refit remaining raise sites (StageError, MutationError, MCP) with explicit codes; drain generic RuntimeError/ValueError to typed exceptions.
  • Phase 4: Doc polish + per-code "what does the user do" advice.

Test results

3138 passed, 8 skipped, 0 failed (+11 new stability tests; previous baseline was 3127).

🤖 Generated with Claude Code

…hase 1)

Adds stable, fine-grained error codes to every executor exception so
hosts can group errors for logging / Sentry / i18n / telemetry
without parsing message strings. Backward compatible: legacy call
sites of the form ``raise APIError("...", category=...)`` keep
working unchanged — the base class derives a sensible code from
the category via ``ExecutorErrorCode.from_category``.

== What's new ==

1. ``ExecutorErrorCode`` (``core/errors.py``) — a string enum with
   ~30 codes in ``exec.<component>.<reason>`` format spanning
   the api/cli/pipeline/stage/tool/mutation/mcp components. Naming
   matches the existing ``ToolErrorCode`` precedent.

2. ``GenyExecutorError.code`` — every executor exception now exposes
   a ``code`` attribute resolved as:

     explicit kwarg (`code=`) > category-derived (`APIError`) > subclass `_DEFAULT_CODE`

   ``code`` is always set on the exception instance; downstream
   consumers can rely on it never being ``None``.

3. Structured error event payloads. ``pipeline.error`` / ``stage.error``
   / ``api.retry`` now carry:

     {
       "error": "<stringified message>",      // legacy field, kept
       "code": "exec.cli.auth_failed",         // new, stable
       "exception_type": "geny_executor.core.errors.APIError"  // new
     }

   Hosts can switch over to ``data["code"]`` without disturbing
   existing consumers reading ``data["error"]``.

4. Explicit code annotations on the most operationally important
   raise sites in Stage 6: ``EXEC_API_NO_CLIENT``,
   ``EXEC_API_RETRY_EXHAUSTED``, ``EXEC_API_STREAM_INCOMPLETE``.
   All other Stage 6 APIError raises auto-resolve via the category
   default mapping.

5. ``docs/error_codes.md`` — authoritative reference for every code:
   recoverability, source raise sites, description, recommended
   user-facing action. Includes a "how to add a new code" /
   "how to deprecate a code" workflow so the taxonomy stays
   curated rather than sprawling.

6. ``tests/contract/test_error_codes_stability.py`` — pins every
   shipped code's exact string value in a frozen dict. Renaming /
   repurposing / accidentally deleting a code now fails CI before
   release. Verifies:

     - frozen string values match enum (catches renames)
     - every enum member appears in the frozen pin (catches
       additions that didn't go through the docs flow)
     - all codes match the ``exec.<component>.<reason>`` format
     - every ``ErrorCategory`` has a non-fallback default code
     - ``APIError`` resolution matrix (explicit > category > default)

== Stability contract ==

Once published in a release, a code's string value never changes.
Renaming or repurposing is a major-version change. The regression
test enforces this.

== Backward compatibility ==

- All existing ``raise APIError("...", category=...)`` call sites
  work unchanged and now carry a sensible ``.code`` for free.
- All existing event consumers reading ``data["error"]`` work
  unchanged; ``data["code"]`` is purely additive.
- No exception class signatures changed beyond accepting an
  optional ``code=`` kwarg.

3138 passed, 8 skipped, 0 failed (+11 new stability tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@CocoRoF CocoRoF merged commit 75b8a66 into main May 21, 2026
4 of 6 checks passed
@CocoRoF CocoRoF deleted the feat/error-codes-2.1.0 branch May 21, 2026 00:53