From 6e53b0b16be94162932bc9120556fd87b6a2639c Mon Sep 17 00:00:00 2001 From: CocoRoF Date: Thu, 21 May 2026 09:52:58 +0900 Subject: [PATCH] =?UTF-8?q?feat:=202.1.0=20=E2=80=94=20ExecutorErrorCode?= =?UTF-8?q?=20taxonomy=20+=20structured=20error=20events=20(Phase=201)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds stable, fine-grained error codes to every executor exception so hosts can group errors for logging / Sentry / i18n / telemetry without parsing message strings. Backward compatible: legacy call sites of the form ``raise APIError("...", category=...)`` keep working unchanged — the base class derives a sensible code from the category via ``ExecutorErrorCode.from_category``. == What's new == 1. ``ExecutorErrorCode`` (``core/errors.py``) — a string enum with ~30 codes in ``exec..`` format spanning the api/cli/pipeline/stage/tool/mutation/mcp components. Naming matches the existing ``ToolErrorCode`` precedent. 2. ``GenyExecutorError.code`` — every executor exception now exposes a ``code`` attribute resolved as: explicit kwarg (`code=`) > category-derived (`APIError`) > subclass `_DEFAULT_CODE` ``code`` is always set on the exception instance; downstream consumers can rely on it never being ``None``. 3. Structured error event payloads. ``pipeline.error`` / ``stage.error`` / ``api.retry`` now carry: { "error": "", // legacy field, kept "code": "exec.cli.auth_failed", // new, stable "exception_type": "geny_executor.core.errors.APIError" // new } Hosts can switch over to ``data["code"]`` without disturbing existing consumers reading ``data["error"]``. 4. Explicit code annotations on the most operationally important raise sites in Stage 6: ``EXEC_API_NO_CLIENT``, ``EXEC_API_RETRY_EXHAUSTED``, ``EXEC_API_STREAM_INCOMPLETE``. All other Stage 6 APIError raises auto-resolve via the category default mapping. 5. 
``docs/error_codes.md`` — authoritative reference for every code: recoverability, source raise sites, description, recommended user-facing action. Includes a "how to add a new code" / "how to deprecate a code" workflow so the taxonomy stays curated rather than sprawling. 6. ``tests/contract/test_error_codes_stability.py`` — pins every shipped code's exact string value in a frozen dict. Renaming / repurposing / accidentally deleting a code now fails CI before release. Verifies: - frozen string values match enum (catches renames) - every enum member appears in the frozen pin (catches additions that didn't go through the docs flow) - all codes match the ``exec..`` format - every ``ErrorCategory`` has a non-fallback default code - ``APIError`` resolution matrix (explicit > category > default) == Stability contract == Once published in a release, a code's string value never changes. Renaming or repurposing is a major-version change. The regression test enforces this. == Backward compatibility == - All existing ``raise APIError("...", category=...)`` call sites work unchanged and now carry a sensible ``.code`` for free. - All existing event consumers reading ``data["error"]`` work unchanged; ``data["code"]`` is purely additive. - No exception class signatures changed beyond accepting an optional ``code=`` kwarg. 3138 passed, 8 skipped, 0 failed (+11 new stability tests). 
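Host-side illustration of the backward-compatibility notes above: a minimal sketch of a consumer that prefers the new ``code`` field and falls back to the legacy ``error`` text. It assumes only the payload shape described in item 3. The function name ``user_message``, the ``_MESSAGES`` catalog, and the message strings are hypothetical and not part of geny-executor.

```python
from typing import Any, Dict

# Hypothetical message catalog keyed by error code; a real host would load
# this from its own i18n resources.
_MESSAGES: Dict[str, str] = {
    "exec.cli.auth_failed": "The CLI is not signed in. Run its login command and retry.",
    "exec.api.rate_limited": "The provider is rate limiting requests. Retrying shortly.",
}


def user_message(data: Dict[str, Any]) -> str:
    """Return a display string for an error event payload."""
    # "code" is absent on payloads emitted by executors older than 2.1.0.
    code = data.get("code")
    if code in _MESSAGES:
        return _MESSAGES[code]
    # Unmapped or missing code: fall back to the legacy free-form text.
    return data.get("error", "Unknown error")
```

Because the lookup falls through to ``data["error"]``, a host written this way should behave the same against older executors that do not emit ``code``.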
Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/error_codes.md | 152 +++++++++++++ pyproject.toml | 2 +- src/geny_executor/__init__.py | 4 +- src/geny_executor/core/errors.py | 208 +++++++++++++++++- src/geny_executor/core/pipeline.py | 41 +++- .../stages/s06_api/artifact/default/stage.py | 18 +- tests/contract/test_error_codes_stability.py | 203 +++++++++++++++++ 7 files changed, 608 insertions(+), 20 deletions(-) create mode 100644 docs/error_codes.md create mode 100644 tests/contract/test_error_codes_stability.py diff --git a/docs/error_codes.md b/docs/error_codes.md new file mode 100644 index 0000000..fd63fa1 --- /dev/null +++ b/docs/error_codes.md @@ -0,0 +1,152 @@ +# geny-executor Error Codes + +**Since:** 2.1.0 +**Source of truth:** `src/geny_executor/core/errors.py` (`ExecutorErrorCode` enum) + +Every exception raised by geny-executor carries a stable string identifier +in the form `exec..`. Hosts use this code for: + +- **Logging / Sentry grouping** — drop the free-form `str(exception)` from + your dashboards and group on the code instead. +- **i18n** — map each code to a localized message template in your UI + layer (see [Geny's example](https://github.com/CocoRoF/Geny/blob/main/frontend/src/lib/i18n/en.ts)). +- **Telemetry routing** — alert differently for `exec.api.*` (vendor + outages) vs `exec.cli.*` (host config bugs). +- **Retry / fallback decisions** — recoverability is also exposed via + `ErrorCategory.is_recoverable`, but `code` lets you fine-tune. + +## Stability contract + +- Once published in a release, a code's string value **never changes**. +- Renaming or repurposing a code is a **breaking change** — deprecate + the old code, add a new one. +- Adding new codes is non-breaking and ships in minor versions. +- The `tests/contract/test_error_codes_stability.py` regression test pins the + string values so an accidental rename fails CI before release. 
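The telemetry-routing use listed at the top of this document can be sketched on the host side as follows. This is a minimal illustration, assuming the host receives the error event payload as a plain dict with a `code` field; the function name `route_alert` and the channel names are hypothetical and not part of geny-executor.

```python
from typing import Any, Dict

# Hypothetical channel names; a real host would substitute its own.
_PREFIX_TO_CHANNEL: Dict[str, str] = {
    "exec.api.": "vendor-outages",
    "exec.cli.": "host-config",
}


def route_alert(data: Dict[str, Any]) -> str:
    """Pick an alert channel from the stable ``code`` field of an error payload."""
    # Payloads without a code (e.g. from older executors) are treated as unknown.
    code = data.get("code", "exec.unknown")
    for prefix, channel in _PREFIX_TO_CHANNEL.items():
        if code.startswith(prefix):
            return channel
    return "default"
```

Matching on the component prefix rather than on individual codes means codes added in later minor versions are routed without a host change, which fits the rule that adding codes is non-breaking.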
+ +## Where the code surfaces + +Every `GenyExecutorError` subclass exposes the code as the `code` +attribute. The pipeline's structured events (`stage.error`, +`pipeline.error`, `api.retry`) also carry it: + +```json +{ + "type": "pipeline.error", + "data": { + "error": "Claude Code CLI is not authenticated …", + "code": "exec.cli.auth_failed", + "exception_type": "geny_executor.core.errors.APIError" + } +} +``` + +## Code table + +### `exec.api.*` — vendor API surface + +These come from the SDK-driven providers (Anthropic, OpenAI, Google, +vLLM). The companion `ErrorCategory` on the `APIError` decides retry +behavior; the code is the stable identifier consumers branch on. + +| Code | Recoverable? | Source | Description | +|------|---|---|---| +| `exec.api.auth.invalid_key` | ❌ no | `APIError(category=AUTH)` | API key missing / malformed / rejected by vendor. Action: paste a valid key in the host's LLM Backends settings. | +| `exec.api.auth.expired` | ❌ no | `APIError(category=AUTH)` *(future use)* | Vendor reports the credential is past its TTL. Action: re-issue / refresh. | +| `exec.api.rate_limited` | ✅ yes | `APIError(category=RATE_LIMITED)` | Vendor 429. The retry strategy backs off and retries automatically. Persisted-rate errors after `EXEC_API_RETRY_EXHAUSTED`. | +| `exec.api.timeout` | ✅ yes | `APIError(category=TIMEOUT)` | Request exceeded the per-call timeout. Retry with backoff. | +| `exec.api.network` | ✅ yes | `APIError(category=NETWORK)` | Connection reset / DNS / TLS / transport. Retry with backoff. | +| `exec.api.token_limit` | ❌ no | `APIError(category=TOKEN_LIMIT)` | Prompt + max_tokens exceeded the model's context window. Action: shrink context or pick a larger-window model. | +| `exec.api.bad_request` | ❌ no | `APIError(category=BAD_REQUEST)` | Vendor 4xx other than auth/rate-limit. Usually a schema bug in the host's request shape. | +| `exec.api.server_error` | ✅ yes | `APIError(category=SERVER_ERROR)` | Vendor 5xx. Retried by the executor. 
| +| `exec.api.terminal` | ❌ no | `APIError(category=TERMINAL)` | Vendor declared the request fatally unprocessable (e.g. policy block). Don't retry. | +| `exec.api.unknown` | ❌ no | `APIError(category=UNKNOWN)` | Catch-all for vendor errors the executor couldn't classify. Investigate the underlying cause. | +| `exec.api.no_client` | ❌ no | Stage 6 build error | `state.llm_client` is `None`. Host forgot to call `Pipeline.from_manifest(credentials=…)` or `attach_runtime(llm_client=…)`. | +| `exec.api.stream_incomplete` | ❌ no | Stage 6 streaming | The stream ended without a `message_complete` event. Usually a vendor SDK bug or an interrupted upstream connection. | +| `exec.api.retry_exhausted` | ❌ no | Stage 6 retry loop | Hit `max_retries` after a recoverable error category. Look at the chained cause for the original failure. | + +### `exec.cli.*` — CLI-driven backends (currently `claude_code_cli`) + +| Code | Recoverable? | Source | Description | +|------|---|---|---| +| `exec.cli.binary_not_found` | ❌ no | `APIError(category=CLI_NOT_FOUND)` | The CLI binary (e.g. `claude`) is not on `PATH` and `binary_path` was not set. Action: install the CLI or configure the binary path. | +| `exec.cli.auth_failed` | ❌ no | `APIError(category=CLI_AUTH_FAILED)` | The spawned CLI reported `authentication_failed`. Action: re-run the CLI's login command (e.g. `claude auth login`) or paste a valid `ANTHROPIC_API_KEY`. | +| `exec.cli.timeout` | ✅ yes | `APIError(category=CLI_TIMEOUT)` | The CLI did not return within the configured `timeout_s`. Retry. | +| `exec.cli.protocol_error` | ✅ yes | `APIError(category=CLI_PROTOCOL_ERROR)` | The CLI emitted malformed stream-json output or unrecognised envelope. Retry; report if it persists. | +| `exec.cli.permission_denied` | ❌ no | `APIError(category=CLI_PERMISSION_DENIED)` | The CLI's permission system blocked the call (e.g. `--dangerously-skip-permissions` was attempted as root). 
Action: configure `permissions.allow` in the spawned settings. | +| `exec.cli.exited` | ✅ yes | CLI subprocess non-zero exit | The CLI process exited with a non-zero return code outside the categorised cases above. Inspect the chained cause. | + +### `exec.pipeline.*` / `exec.stage.*` — orchestration + +| Code | Source | Description | +|------|---|---| +| `exec.pipeline.not_initialized` | `PipelineError` | The pipeline was used before `build()` / `from_manifest()` was called. | +| `exec.pipeline.invalid_manifest` | `PipelineError` *(future use)* | The manifest's schema/strict load rejected the configuration. | +| `exec.stage.failed` | `StageError` (default) | A stage raised an exception that was wrapped by the pipeline's stage runner. Inspect the chained cause for the original failure. | +| `exec.stage.guard_rejected` | `GuardRejectError` | A Stage 4 guard refused execution (budget / cost / iteration / permission). The `guard_name` field on the exception identifies which guard. | + +### `exec.tool.*` — Stage 10 tool dispatch + +These mirror the existing `ToolErrorCode` enum at the routing layer. +Host pipelines see them surface via `ToolError.code` on the +`tool_result` payload too. + +| Code | Source | Description | +|------|---|---| +| `exec.tool.unknown` | `RegistryRouter.unknown_tool()` | The LLM emitted a `tool_use` for a name that isn't registered. Usually a hallucination or a stale registry. | +| `exec.tool.invalid_input` | `RegistryRouter.invalid_input()` | The tool's input schema validation failed. `details.field_path` says where. | +| `exec.tool.access_denied` | `RegistryRouter.access_denied()` | The session's tool binding disallows this tool. | +| `exec.tool.crashed` | `RegistryRouter.tool_crashed()` | The tool's `execute()` raised an unexpected exception. `details.exception_type` carries the class. | +| `exec.tool.transport` | `RegistryRouter.transport()` | MCP adapter / RPC transport failure. `details.server` identifies the server. 
| + +### `exec.mutation.*` — runtime config mutation + +| Code | Source | Description | +|------|---|---| +| `exec.mutation.invalid` | `MutationError` | Bad stage / slot / impl in the mutation request. | +| `exec.mutation.locked` | `MutationLocked` | The target stage is currently executing; try again after stage exit. | + +### `exec.mcp.*` — MCP server lifecycle (host-attached servers) + +| Code | Source | Description | +|------|---|---| +| `exec.mcp.connect_failed` | `MCPConnectionError(phase="connect")` | Could not reach the MCP server (transport / process spawn / handshake). | +| `exec.mcp.initialize_failed` | `MCPConnectionError(phase="initialize")` | The MCP server connected but the `initialize` handshake failed. | +| `exec.mcp.list_tools_failed` | `MCPConnectionError(phase="list_tools")` | `tools/list` errored after a successful initialize. | +| `exec.mcp.sdk_missing` | `MCPConnectionError(phase="sdk_missing")` | The MCP SDK is not installed in the host's environment. | + +### `exec.unknown` — fallback + +| Code | When | Description | +|------|---|---| +| `exec.unknown` | last resort | The exception is a non-`GenyExecutorError` (e.g. raw `RuntimeError` / `ValueError`) and no code could be inferred. Indicates a raise site that hasn't been migrated to the typed exception hierarchy yet — file an issue. | + +## How to add a new code + +1. Add the enum value to `ExecutorErrorCode` in `core/errors.py`. + Lowercase, dot-separated, ≤4 segments. +2. Add a row to the table above under the right component. +3. If the code corresponds to a legacy `ErrorCategory`, extend + `_CATEGORY_TO_CODE_DEFAULT` so existing call sites pick it up. +4. Pin the new code in `_FROZEN` in `tests/contract/test_error_codes_stability.py`; + the stability test fails until every enum member is listed there. + +## How to deprecate a code + +Don't delete it. Instead: + +1. Mark the enum value with a deprecation comment. +2. Add a new code and migrate raise sites incrementally. +3. 
Keep the deprecated value in the table with a "deprecated → new code" + note for at least one minor-version cycle. +4. Only remove the enum value in a **major** version bump. + +## Migration phases + +Phase 1 (this release, 2.1.0) — critical-path raise sites in +`claude_code.py` and `s06_api/stage.py`; all `APIError(category=…)` +sites automatically inherit the right code via `from_category`. + +Phase 2+ (planned) — refit stage/guard/mutation/MCP raise sites with +explicit codes, drain generic `RuntimeError`/`ValueError` to typed +exceptions where appropriate. diff --git a/pyproject.toml b/pyproject.toml index b271c92..4bedb47 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "hatchling.build" [project] name = "geny-executor" -version = "2.0.6" +version = "2.1.0" description = "Harness-engineered agent pipeline library with 21-stage dual-abstraction architecture, built on the Anthropic API" readme = "README.md" license = "MIT" diff --git a/src/geny_executor/__init__.py b/src/geny_executor/__init__.py index e843ee9..a6ed748 100644 --- a/src/geny_executor/__init__.py +++ b/src/geny_executor/__init__.py @@ -29,6 +29,7 @@ APIError, ToolExecutionError, ErrorCategory, + ExecutorErrorCode, MutationError, MutationLocked, ) @@ -95,7 +96,7 @@ ProviderDrivenStrategy, ) -__version__ = "2.0.6" +__version__ = "2.1.0" __all__ = [ # Core @@ -165,6 +166,7 @@ "APIError", "ToolExecutionError", "ErrorCategory", + "ExecutorErrorCode", "MutationError", "MutationLocked", # Schema & Mutation diff --git a/src/geny_executor/core/errors.py b/src/geny_executor/core/errors.py index c0acc46..9ab882f 100644 --- a/src/geny_executor/core/errors.py +++ b/src/geny_executor/core/errors.py @@ -1,9 +1,130 @@ -"""Error classification and exception hierarchy.""" +"""Error classification and exception hierarchy. 
+ +Two parallel classification dimensions +-------------------------------------- + +* :class:`ErrorCategory` (existing) — coarse, retry-aware buckets used by + the executor's own retry / backoff machinery. Stable since 2.0.x; one + value per *behavioural* class (recoverable vs fatal). + +* :class:`ExecutorErrorCode` (new in 2.1.0) — fine-grained, stable string + identifier the host (Geny, CI runners, downstream consumers) uses for + logging, i18n, telemetry grouping, and alerting. Each code is a stable + string in the form ``exec..``. Once published, a + code's value MUST NOT change — see ``docs/error_codes.md``. + +The two coexist on every :class:`GenyExecutorError`: + + - ``e.code`` answers *what specifically went wrong* (``exec.cli.auth_failed``) + - ``e.category`` (on :class:`APIError`) answers *should we retry?* + (``ErrorCategory.CLI_AUTH_FAILED.is_fatal == True``) + +Backward compatibility +---------------------- +Existing call sites of the form +``raise APIError("...", category=ErrorCategory.CLI_AUTH_FAILED)`` +keep working unchanged — the base class now derives a reasonable default +``code`` from the ``category`` argument via :data:`_CATEGORY_TO_CODE_DEFAULT`. +Sites that want to be specific can pass ``code=ExecutorErrorCode.EXEC_CLI_AUTH_FAILED`` +explicitly; the explicit code wins. The ``code`` attribute is always set +(no ``None`` checks needed downstream). +""" from enum import Enum from typing import Optional +# ─────────────────────────────────────────────────────── ExecutorErrorCode ─ + + +class ExecutorErrorCode(str, Enum): + """Stable, fine-grained error identifiers (2.1.0+). + + Format: ``exec..``. Lowercase, dot-separated, ≤4 + segments deep. 
Designed for: + + - Host logging / Sentry grouping + - Frontend i18n key lookup (Geny maps ``exec.cli.auth_failed`` → + ``t("executor.exec.cli.auth_failed")``) + - Telemetry alerting (route ``exec.api.*`` errors to a different + channel than ``exec.cli.*``) + - Operator dashboards + + Stability contract: + - **NEVER rename** — once shipped, a code's string value is API surface. + - **NEVER repurpose** — if the meaning of a code drifts, deprecate + and add a new code rather than mutating the old one. + - Adding new codes is non-breaking. + + The full list with descriptions, recoverability, and example + scenarios lives in ``docs/error_codes.md`` — keep it in sync. + """ + + # ── exec.api.* — vendor API surface (Anthropic/OpenAI/Google/vLLM SDK) ── + EXEC_API_AUTH_INVALID_KEY = "exec.api.auth.invalid_key" + EXEC_API_AUTH_EXPIRED = "exec.api.auth.expired" + EXEC_API_RATE_LIMITED = "exec.api.rate_limited" + EXEC_API_TIMEOUT = "exec.api.timeout" + EXEC_API_NETWORK = "exec.api.network" + EXEC_API_TOKEN_LIMIT = "exec.api.token_limit" + EXEC_API_BAD_REQUEST = "exec.api.bad_request" + EXEC_API_SERVER_ERROR = "exec.api.server_error" + EXEC_API_TERMINAL = "exec.api.terminal" + EXEC_API_UNKNOWN = "exec.api.unknown" + EXEC_API_NO_CLIENT = "exec.api.no_client" + EXEC_API_STREAM_INCOMPLETE = "exec.api.stream_incomplete" + EXEC_API_RETRY_EXHAUSTED = "exec.api.retry_exhausted" + + # ── exec.cli.* — CLI-driven backends (claude_code_cli) ── + EXEC_CLI_BINARY_NOT_FOUND = "exec.cli.binary_not_found" + EXEC_CLI_AUTH_FAILED = "exec.cli.auth_failed" + EXEC_CLI_TIMEOUT = "exec.cli.timeout" + EXEC_CLI_PROTOCOL_ERROR = "exec.cli.protocol_error" + EXEC_CLI_PERMISSION_DENIED = "exec.cli.permission_denied" + EXEC_CLI_EXITED = "exec.cli.exited" + + # ── exec.pipeline.* / exec.stage.* — pipeline / stage orchestration ── + EXEC_PIPELINE_NOT_INITIALIZED = "exec.pipeline.not_initialized" + EXEC_PIPELINE_INVALID_MANIFEST = "exec.pipeline.invalid_manifest" + EXEC_STAGE_FAILED = "exec.stage.failed" + 
EXEC_STAGE_GUARD_REJECTED = "exec.stage.guard_rejected" + + # ── exec.tool.* — Stage 10 tool dispatch ── + EXEC_TOOL_UNKNOWN = "exec.tool.unknown" + EXEC_TOOL_INVALID_INPUT = "exec.tool.invalid_input" + EXEC_TOOL_ACCESS_DENIED = "exec.tool.access_denied" + EXEC_TOOL_CRASHED = "exec.tool.crashed" + EXEC_TOOL_TRANSPORT = "exec.tool.transport" + + # ── exec.mutation.* — runtime config mutation ── + EXEC_MUTATION_INVALID = "exec.mutation.invalid" + EXEC_MUTATION_LOCKED = "exec.mutation.locked" + + # ── exec.mcp.* — MCP server lifecycle ── + EXEC_MCP_CONNECT_FAILED = "exec.mcp.connect_failed" + EXEC_MCP_INITIALIZE_FAILED = "exec.mcp.initialize_failed" + EXEC_MCP_LIST_TOOLS_FAILED = "exec.mcp.list_tools_failed" + EXEC_MCP_SDK_MISSING = "exec.mcp.sdk_missing" + + # ── exec.unknown — fallback when no code is attached ── + EXEC_UNKNOWN = "exec.unknown" + + @classmethod + def from_category(cls, category: "ErrorCategory") -> "ExecutorErrorCode": + """Map a legacy ``ErrorCategory`` to a default code. + + Used by :class:`APIError` to keep existing call sites + (``raise APIError("...", category=ErrorCategory.CLI_AUTH_FAILED)``) + working unchanged — the resulting exception still gets a + meaningful ``code`` attribute without the caller having to + thread it explicitly. + """ + return _CATEGORY_TO_CODE_DEFAULT.get(category, cls.EXEC_UNKNOWN) + + +# ─────────────────────────────────────────────────────── ErrorCategory ─ + + class ErrorCategory(str, Enum): """API error classification for retry decisions.""" @@ -47,32 +168,77 @@ def is_fatal(self) -> bool: } +# ─────────────────────────────────────────────────────── default mapping ─ + + +# Category → default error code. Kept private — call sites should rely on +# the public ``ExecutorErrorCode.from_category(...)`` accessor so the +# mapping can evolve without churn on consumers. 
+_CATEGORY_TO_CODE_DEFAULT: dict = { + ErrorCategory.RATE_LIMITED: ExecutorErrorCode.EXEC_API_RATE_LIMITED, + ErrorCategory.TIMEOUT: ExecutorErrorCode.EXEC_API_TIMEOUT, + ErrorCategory.NETWORK: ExecutorErrorCode.EXEC_API_NETWORK, + ErrorCategory.TOKEN_LIMIT: ExecutorErrorCode.EXEC_API_TOKEN_LIMIT, + ErrorCategory.AUTH: ExecutorErrorCode.EXEC_API_AUTH_INVALID_KEY, + ErrorCategory.BAD_REQUEST: ExecutorErrorCode.EXEC_API_BAD_REQUEST, + ErrorCategory.SERVER_ERROR: ExecutorErrorCode.EXEC_API_SERVER_ERROR, + ErrorCategory.TERMINAL: ExecutorErrorCode.EXEC_API_TERMINAL, + ErrorCategory.UNKNOWN: ExecutorErrorCode.EXEC_API_UNKNOWN, + ErrorCategory.CLI_NOT_FOUND: ExecutorErrorCode.EXEC_CLI_BINARY_NOT_FOUND, + ErrorCategory.CLI_AUTH_FAILED: ExecutorErrorCode.EXEC_CLI_AUTH_FAILED, + ErrorCategory.CLI_TIMEOUT: ExecutorErrorCode.EXEC_CLI_TIMEOUT, + ErrorCategory.CLI_PROTOCOL_ERROR: ExecutorErrorCode.EXEC_CLI_PROTOCOL_ERROR, + ErrorCategory.CLI_PERMISSION_DENIED: ExecutorErrorCode.EXEC_CLI_PERMISSION_DENIED, +} + + +# ─────────────────────────────────────────────────────── exceptions ─ + + class GenyExecutorError(Exception): - """Base exception for geny-executor.""" + """Base exception for geny-executor. + + Every executor exception carries a stable :class:`ExecutorErrorCode` + accessible as ``e.code``. Subclasses set a class-level default in + :attr:`_DEFAULT_CODE` so call sites that pre-date the 2.1.0 code + field still get a meaningful code on the resulting exception. 
+ """ - def __init__(self, message: str, *, cause: Optional[Exception] = None): + _DEFAULT_CODE: ExecutorErrorCode = ExecutorErrorCode.EXEC_UNKNOWN + + def __init__( + self, + message: str, + *, + code: Optional[ExecutorErrorCode] = None, + cause: Optional[Exception] = None, + ): super().__init__(message) + self.code: ExecutorErrorCode = code or self._DEFAULT_CODE self.cause = cause class PipelineError(GenyExecutorError): """Pipeline-level error.""" - pass + _DEFAULT_CODE = ExecutorErrorCode.EXEC_PIPELINE_NOT_INITIALIZED class StageError(GenyExecutorError): """Stage execution error.""" + _DEFAULT_CODE = ExecutorErrorCode.EXEC_STAGE_FAILED + def __init__( self, message: str, *, stage_name: str = "", stage_order: int = 0, + code: Optional[ExecutorErrorCode] = None, cause: Optional[Exception] = None, ): - super().__init__(message, cause=cause) + super().__init__(message, code=code, cause=cause) self.stage_name = stage_name self.stage_order = stage_order @@ -80,13 +246,23 @@ def __init__( class GuardRejectError(StageError): """Guard rejected execution.""" + _DEFAULT_CODE = ExecutorErrorCode.EXEC_STAGE_GUARD_REJECTED + def __init__(self, message: str, *, guard_name: str = "", **kwargs): super().__init__(message, stage_name="guard", stage_order=4, **kwargs) self.guard_name = guard_name class APIError(GenyExecutorError): - """API call error with classification.""" + """API call error with classification. + + Carries both a coarse :class:`ErrorCategory` (used by the executor's + retry machinery) and a fine-grained :class:`ExecutorErrorCode` (used + by hosts for logging / i18n / telemetry). When ``code`` is omitted, + it is derived from ``category`` via + :meth:`ExecutorErrorCode.from_category` so legacy call sites keep + their existing semantics. 
+ """ def __init__( self, @@ -94,9 +270,12 @@ def __init__( *, category: ErrorCategory = ErrorCategory.UNKNOWN, status_code: Optional[int] = None, + code: Optional[ExecutorErrorCode] = None, cause: Optional[Exception] = None, ): - super().__init__(message, cause=cause) + # Resolve code: explicit > derived-from-category > UNKNOWN. + resolved_code = code or ExecutorErrorCode.from_category(category) + super().__init__(message, code=resolved_code, cause=cause) self.category = category self.status_code = status_code @@ -104,29 +283,35 @@ def __init__( class ToolExecutionError(GenyExecutorError): """Tool execution failed.""" + _DEFAULT_CODE = ExecutorErrorCode.EXEC_TOOL_CRASHED + def __init__( self, message: str, *, tool_name: str = "", + code: Optional[ExecutorErrorCode] = None, cause: Optional[Exception] = None, ): - super().__init__(message, cause=cause) + super().__init__(message, code=code, cause=cause) self.tool_name = tool_name class MutationError(GenyExecutorError): """Invalid mutation request (bad stage/slot/impl).""" + _DEFAULT_CODE = ExecutorErrorCode.EXEC_MUTATION_INVALID + def __init__( self, message: str, *, stage_order: int = 0, slot_name: str = "", + code: Optional[ExecutorErrorCode] = None, cause: Optional[Exception] = None, ): - super().__init__(message, cause=cause) + super().__init__(message, code=code, cause=cause) self.stage_order = stage_order self.slot_name = slot_name @@ -134,12 +319,15 @@ def __init__( class MutationLocked(GenyExecutorError): """Mutation blocked because the target stage is currently executing.""" + _DEFAULT_CODE = ExecutorErrorCode.EXEC_MUTATION_LOCKED + def __init__( self, message: str, *, stage_order: int = 0, + code: Optional[ExecutorErrorCode] = None, cause: Optional[Exception] = None, ): - super().__init__(message, cause=cause) + super().__init__(message, code=code, cause=cause) self.stage_order = stage_order diff --git a/src/geny_executor/core/pipeline.py b/src/geny_executor/core/pipeline.py index 788dda3..ff5878a 100644 
--- a/src/geny_executor/core/pipeline.py +++ b/src/geny_executor/core/pipeline.py @@ -8,7 +8,11 @@ from typing import TYPE_CHECKING, Any, AsyncIterator, Callable, Dict, List, Optional, Sequence from geny_executor.core.config import PipelineConfig -from geny_executor.core.errors import StageError +from geny_executor.core.errors import ( + ExecutorErrorCode, + GenyExecutorError, + StageError, +) from geny_executor.core.result import PipelineResult from geny_executor.core.stage import Stage, StageDescription from geny_executor.core.state import PipelineState @@ -31,6 +35,33 @@ logger = logging.getLogger(__name__) +def _error_event_data(exc: Exception) -> Dict[str, Any]: + """Build a structured event payload for ``pipeline.error`` / + ``stage.error`` / similar terminal-failure events. + + Carries: + - ``error``: stringified message (legacy field, preserved for + backward compat — every existing consumer reads this). + - ``code``: the stable :class:`ExecutorErrorCode` value when the + exception is a :class:`GenyExecutorError` subclass; otherwise + ``"exec.unknown"``. Hosts use this for i18n / telemetry + grouping without parsing the message text. + - ``exception_type``: fully qualified class name, useful for + ad-hoc filtering when no code is attached. + + Stable since 2.1.0 — adding fields is non-breaking, removing + fields is a major-version change. + """ + code_str = ExecutorErrorCode.EXEC_UNKNOWN.value + if isinstance(exc, GenyExecutorError) and exc.code is not None: + code_str = exc.code.value + return { + "error": str(exc), + "code": code_str, + "exception_type": f"{type(exc).__module__}.{type(exc).__name__}", + } + + def _pipeline_config_from_manifest(manifest: "EnvironmentManifest") -> PipelineConfig: """Build a :class:`PipelineConfig` from manifest pipeline+model blocks. 
@@ -989,7 +1020,7 @@ async def run(self, input: Any, state: Optional[PipelineState] = None) -> Pipeli return result except Exception as e: - await self._emit("pipeline.error", data={"error": str(e)}) + await self._emit("pipeline.error", data=_error_event_data(e)) return PipelineResult.error_result(str(e), state) async def run_stream( @@ -1048,7 +1079,7 @@ async def _run_pipeline() -> None: PipelineEvent( type="pipeline.error", data={ - "error": str(e), + **_error_event_data(e), "total_cost_usd": state.total_cost_usd, }, ) @@ -1073,7 +1104,7 @@ async def _run_pipeline() -> None: await task # propagate any unexpected errors except Exception as e: - yield PipelineEvent(type="pipeline.error", data={"error": str(e)}) + yield PipelineEvent(type="pipeline.error", data=_error_event_data(e)) finally: state._event_listener = None @@ -1343,7 +1374,7 @@ async def _run_stage(self, order: int, input: Any, state: PipelineState) -> Any: "stage.error", stage=stage.name, iteration=state.iteration, - data={"error": str(e)}, + data=_error_event_data(e), ) recovery = await stage.on_error(e, state) if recovery is not None: diff --git a/src/geny_executor/stages/s06_api/artifact/default/stage.py b/src/geny_executor/stages/s06_api/artifact/default/stage.py index 913151a..5152886 100644 --- a/src/geny_executor/stages/s06_api/artifact/default/stage.py +++ b/src/geny_executor/stages/s06_api/artifact/default/stage.py @@ -19,7 +19,7 @@ import asyncio from typing import Any, AsyncIterator, Dict, List, Optional, Union -from geny_executor.core.errors import APIError, ErrorCategory +from geny_executor.core.errors import APIError, ErrorCategory, ExecutorErrorCode from geny_executor.core.schema import ConfigField, ConfigSchema from geny_executor.core.slot import StrategySlot from geny_executor.core.stage import Stage @@ -321,6 +321,7 @@ def _resolve_client(self, state: PipelineState) -> BaseClient: "Pipeline.from_manifest(credentials=...) 
or attach a client " "explicitly with Pipeline.attach_runtime(llm_client=...).", category=ErrorCategory.BAD_REQUEST, + code=ExecutorErrorCode.EXEC_API_NO_CLIENT, ) async def execute(self, input: Any, state: PipelineState) -> APIResponse: @@ -429,6 +430,7 @@ async def _call_with_retry( { "attempt": attempt + 1, "category": e.category.value, + "code": e.code.value, "delay": delay, }, ) @@ -444,12 +446,17 @@ async def _call_with_retry( { "attempt": attempt + 1, "category": category.value, + "code": ExecutorErrorCode.from_category(category).value, "delay": delay, }, ) await asyncio.sleep(delay) - raise last_error or APIError("Max retries exceeded", category=ErrorCategory.UNKNOWN) + raise last_error or APIError( + "Max retries exceeded", + category=ErrorCategory.UNKNOWN, + code=ExecutorErrorCode.EXEC_API_RETRY_EXHAUSTED, + ) async def _call_streaming_with_retry( self, client: BaseClient, cfg: Any, state: PipelineState @@ -491,7 +498,11 @@ async def _call_streaming_with_retry( ) await asyncio.sleep(delay) - raise last_error or APIError("Max retries exceeded", category=ErrorCategory.UNKNOWN) + raise last_error or APIError( + "Max retries exceeded", + category=ErrorCategory.UNKNOWN, + code=ExecutorErrorCode.EXEC_API_RETRY_EXHAUSTED, + ) async def _call_streaming( self, client: BaseClient, cfg: Any, state: PipelineState @@ -511,6 +522,7 @@ async def _call_streaming( raise APIError( "Stream ended without message_complete", category=ErrorCategory.UNKNOWN, + code=ExecutorErrorCode.EXEC_API_STREAM_INCOMPLETE, ) return response diff --git a/tests/contract/test_error_codes_stability.py b/tests/contract/test_error_codes_stability.py new file mode 100644 index 0000000..5bdfee6 --- /dev/null +++ b/tests/contract/test_error_codes_stability.py @@ -0,0 +1,203 @@ +"""Stability regression for :class:`ExecutorErrorCode` (since 2.1.0). 
+ +Error codes are **API surface**: hosts (Geny, downstream CI runners, +log dashboards, Sentry grouping rules, frontend i18n keys) all depend +on the string values being stable across releases. + +This test pins every shipped code's exact string value. Any rename, +re-purpose, or accidental delete fails CI before a release — forcing +a deliberate deprecation step (add a new code, mark the old one +deprecated) instead of a silent breaking change. + +When you add a new code, add a row to ``_FROZEN`` below and to +``docs/error_codes.md``. When you intentionally retire a code (major +version bump), remove it from ``_FROZEN`` here AND from the enum AND +note the breaking change in CHANGELOG. +""" + +from __future__ import annotations + +from typing import Dict + +import pytest + +from geny_executor.core.errors import ( + APIError, + ErrorCategory, + ExecutorErrorCode, + GenyExecutorError, +) + + +# ──────────────────────────────────────────────────────── frozen codes ─ + + +# The canonical set of codes shipped in geny-executor ≥ 2.1.0. +# **Do not edit this dict to make a failing test pass.** If a code +# value here doesn't match the enum, fix the enum (you accidentally +# renamed a code) or — if you really mean to remove/rename a code — +# bump the major version and update both the enum and this dict. 
+_FROZEN: Dict[str, str] = {
+    # exec.api.*
+    "EXEC_API_AUTH_INVALID_KEY": "exec.api.auth.invalid_key",
+    "EXEC_API_AUTH_EXPIRED": "exec.api.auth.expired",
+    "EXEC_API_RATE_LIMITED": "exec.api.rate_limited",
+    "EXEC_API_TIMEOUT": "exec.api.timeout",
+    "EXEC_API_NETWORK": "exec.api.network",
+    "EXEC_API_TOKEN_LIMIT": "exec.api.token_limit",
+    "EXEC_API_BAD_REQUEST": "exec.api.bad_request",
+    "EXEC_API_SERVER_ERROR": "exec.api.server_error",
+    "EXEC_API_TERMINAL": "exec.api.terminal",
+    "EXEC_API_UNKNOWN": "exec.api.unknown",
+    "EXEC_API_NO_CLIENT": "exec.api.no_client",
+    "EXEC_API_STREAM_INCOMPLETE": "exec.api.stream_incomplete",
+    "EXEC_API_RETRY_EXHAUSTED": "exec.api.retry_exhausted",
+    # exec.cli.*
+    "EXEC_CLI_BINARY_NOT_FOUND": "exec.cli.binary_not_found",
+    "EXEC_CLI_AUTH_FAILED": "exec.cli.auth_failed",
+    "EXEC_CLI_TIMEOUT": "exec.cli.timeout",
+    "EXEC_CLI_PROTOCOL_ERROR": "exec.cli.protocol_error",
+    "EXEC_CLI_PERMISSION_DENIED": "exec.cli.permission_denied",
+    "EXEC_CLI_EXITED": "exec.cli.exited",
+    # exec.pipeline.* / exec.stage.*
+    "EXEC_PIPELINE_NOT_INITIALIZED": "exec.pipeline.not_initialized",
+    "EXEC_PIPELINE_INVALID_MANIFEST": "exec.pipeline.invalid_manifest",
+    "EXEC_STAGE_FAILED": "exec.stage.failed",
+    "EXEC_STAGE_GUARD_REJECTED": "exec.stage.guard_rejected",
+    # exec.tool.*
+    "EXEC_TOOL_UNKNOWN": "exec.tool.unknown",
+    "EXEC_TOOL_INVALID_INPUT": "exec.tool.invalid_input",
+    "EXEC_TOOL_ACCESS_DENIED": "exec.tool.access_denied",
+    "EXEC_TOOL_CRASHED": "exec.tool.crashed",
+    "EXEC_TOOL_TRANSPORT": "exec.tool.transport",
+    # exec.mutation.*
+    "EXEC_MUTATION_INVALID": "exec.mutation.invalid",
+    "EXEC_MUTATION_LOCKED": "exec.mutation.locked",
+    # exec.mcp.*
+    "EXEC_MCP_CONNECT_FAILED": "exec.mcp.connect_failed",
+    "EXEC_MCP_INITIALIZE_FAILED": "exec.mcp.initialize_failed",
+    "EXEC_MCP_LIST_TOOLS_FAILED": "exec.mcp.list_tools_failed",
+    "EXEC_MCP_SDK_MISSING": "exec.mcp.sdk_missing",
+    # exec.unknown — fallback
+    "EXEC_UNKNOWN": "exec.unknown",
+}
+
+
+def test_frozen_codes_match_enum_values_exactly() -> None:
+    """Every code in ``_FROZEN`` must exist on the enum with the
+    pinned string value. Catches accidental renames."""
+    for member_name, expected_value in _FROZEN.items():
+        member = getattr(ExecutorErrorCode, member_name, None)
+        assert member is not None, (
+            f"{member_name} disappeared from ExecutorErrorCode. "
+            f"If this was intentional, bump the major version and "
+            f"remove the row from _FROZEN."
+        )
+        assert member.value == expected_value, (
+            f"ExecutorErrorCode.{member_name} = {member.value!r} but "
+            f"_FROZEN pinned it to {expected_value!r}. "
+            f"Renaming code strings is a breaking change — restore "
+            f"the old value or bump the major version."
+        )
+
+
+def test_enum_has_no_codes_missing_from_frozen() -> None:
+    """Every enum member must appear in ``_FROZEN``. Catches additions
+    that weren't recorded in the stability pin — usually a forgotten
+    docstring / docs/error_codes.md update."""
+    enum_names = {m.name for m in ExecutorErrorCode}
+    frozen_names = set(_FROZEN.keys())
+    new = enum_names - frozen_names
+    assert not new, (
+        f"ExecutorErrorCode added member(s) not yet pinned in "
+        f"_FROZEN: {sorted(new)}. Add a row to _FROZEN here AND a "
+        f"row to docs/error_codes.md so the new code is documented."
+    )
+
+
+def test_all_code_values_match_canonical_format() -> None:
+    """``exec..`` — lowercase, dot-separated,
+    ≤4 segments, ASCII-only."""
+    for code in ExecutorErrorCode:
+        v = code.value
+        assert v == v.lower(), f"{code.name} value {v!r} contains uppercase"
+        assert v.startswith("exec."), f"{code.name} value {v!r} missing exec.* prefix"
+        segments = v.split(".")
+        assert 2 <= len(segments) <= 4, (
+            f"{code.name} value {v!r} has {len(segments)} segments — "
+            f"expected 2–4 (exec..[.])"
+        )
+        for seg in segments:
+            assert seg, f"{code.name} value {v!r} has an empty segment"
+            assert seg.replace("_", "").isalnum(), (
+                f"{code.name} segment {seg!r} contains non-alphanumeric "
+                f"chars beyond underscores"
+            )
+
+
+# ──────────────────────────────────── default category → code mapping ─
+
+
+def test_every_error_category_has_a_default_code() -> None:
+    """``ExecutorErrorCode.from_category()`` must return a meaningful
+    code for every ``ErrorCategory`` value — never the generic fallback
+    ``EXEC_UNKNOWN``. Otherwise legacy ``APIError(category=…)`` raises
+    would all degrade to ``exec.unknown`` and hosts couldn't tell them
+    apart."""
+    for cat in ErrorCategory:
+        code = ExecutorErrorCode.from_category(cat)
+        assert code is not ExecutorErrorCode.EXEC_UNKNOWN, (
+            f"ErrorCategory.{cat.name} has no specific default code — "
+            f"add it to _CATEGORY_TO_CODE_DEFAULT in core/errors.py."
+        )
+
+
+# ──────────────────────────────────────────────────── exception wiring ─
+
+
+def test_api_error_default_code_derives_from_category() -> None:
+    """``APIError("...", category=ErrorCategory.CLI_AUTH_FAILED)`` —
+    a legacy call site that pre-dates 2.1.0 — must still get a
+    sensible ``code`` attribute without the caller having to thread
+    one explicitly."""
+    err = APIError("nope", category=ErrorCategory.CLI_AUTH_FAILED)
+    assert err.code is ExecutorErrorCode.EXEC_CLI_AUTH_FAILED
+
+
+def test_api_error_explicit_code_wins_over_category_default() -> None:
+    """When a caller passes both ``category`` and ``code``, the
+    explicit ``code`` is preserved — important for sites that want
+    finer-grained classification than the broad category provides."""
+    err = APIError(
+        "rate limit retries exhausted",
+        category=ErrorCategory.RATE_LIMITED,
+        code=ExecutorErrorCode.EXEC_API_RETRY_EXHAUSTED,
+    )
+    assert err.code is ExecutorErrorCode.EXEC_API_RETRY_EXHAUSTED
+    assert err.category is ErrorCategory.RATE_LIMITED
+
+
+def test_base_exception_has_default_code() -> None:
+    """``GenyExecutorError("...")`` with no code argument falls back
+    to the subclass's ``_DEFAULT_CODE``, never ``None``. Downstream
+    consumers can rely on ``e.code`` being set."""
+    err = GenyExecutorError("test")
+    assert err.code is ExecutorErrorCode.EXEC_UNKNOWN
+
+
+@pytest.mark.parametrize(
+    "kwargs, expected_code",
+    [
+        ({}, ExecutorErrorCode.EXEC_API_UNKNOWN),
+        ({"category": ErrorCategory.CLI_NOT_FOUND}, ExecutorErrorCode.EXEC_CLI_BINARY_NOT_FOUND),
+        ({"category": ErrorCategory.RATE_LIMITED}, ExecutorErrorCode.EXEC_API_RATE_LIMITED),
+        ({"code": ExecutorErrorCode.EXEC_API_NO_CLIENT}, ExecutorErrorCode.EXEC_API_NO_CLIENT),
+    ],
+)
+def test_api_error_code_resolution_matrix(
+    kwargs: dict, expected_code: ExecutorErrorCode,
+) -> None:
+    """End-to-end: confirms the resolution rules
+    (explicit code wins, otherwise category-derived, otherwise default)."""
+    err = APIError("test", **kwargs)
+    assert err.code is expected_code