Skip to content

Surface LLM errors in CLI, use pi-ai credential detection, and implement model fallback#2

Open
Priyanshu-Priyam wants to merge 2 commits into
open-gitagent:mainfrom
Priyanshu-Priyam:fix/surface-llm-errors-and-model-fallback
Open

Surface LLM errors in CLI, use pi-ai credential detection, and implement model fallback#2
Priyanshu-Priyam wants to merge 2 commits into
open-gitagent:mainfrom
Priyanshu-Priyam:fix/surface-llm-errors-and-model-fallback

Conversation

@Priyanshu-Priyam
Copy link
Copy Markdown

Summary

  • Surface LLM errors in CLI: When a model returns stopReason: "error" (e.g. Bedrock rejecting a model ID), the CLI now prints the error message in red instead of showing a blank line. Also fires on_error hooks and audit logging for these failures.

  • Use pi-ai's getEnvApiKey() for credential detection: Replaces the hardcoded 6-provider API key map with pi-ai's provider-aware function, which supports all providers including Amazon Bedrock, Google Vertex, Azure OpenAI, GitHub Copilot, and others.

  • Implement model.fallback support: When the preferred model fails, automatically retry with the next model from agent.yaml's model.fallback list. Works in both CLI (REPL + single-shot) and SDK (query()) paths. This activates the already-declared but previously unused fallback field in the agent manifest.

Motivation

When using amazon-bedrock:anthropic.claude-opus-4-6-v1, Bedrock returns an error ("on-demand throughput isn't supported, use an inference profile"). The CLI showed a blank response with no error -- making it impossible to diagnose. The SDK path (sdk.ts) already handled this correctly; the CLI did not.

Test plan

  • npm run build passes
  • All 19 existing tests pass
  • Manual: run with a model that returns an error -- error now visible in red
  • Manual: set fallback in agent.yaml with a broken preferred model -- agent falls back automatically

Priyanshu-Priyam... added 2 commits March 6, 2026 21:33
…ent model fallback

Three related fixes for silent model failures:

1. CLI now prints LLM-level errors (stopReason: "error") instead of showing
   a blank line. Fires on_error hooks and audit logging for these failures.

2. Replace hardcoded 6-provider API key map with pi-ai's getEnvApiKey(),
   which supports all providers including Bedrock, Vertex, Azure, etc.

3. Implement model fallback: when the preferred model fails, automatically
   retry with models from agent.yaml's model.fallback list. Works in both
   CLI (REPL + single-shot) and SDK (query()) paths.

Made-with: Cursor
Copy link
Copy Markdown
Contributor

@shreyas-lyzr shreyas-lyzr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds error surfacing and model fallback — good goals. However the diff is against a very old base and will have massive merge conflicts with current main. The codebase has changed significantly since this was opened.

Recommend closing this and re-opening against current main if the features are still desired.

@shreyas-lyzr
Copy link
Copy Markdown
Contributor

Apologies for the long silence on this — two months is too long.

Updating you on the current main, because parts of this PR are already in:

  • Surface LLM errors when stopReason === "error" — DONE in main. src/index.ts and src/sdk.ts both handle it; src/telemetry.ts records it; the assistant message still gets emitted so callers can inspect stopReason. The CLI red-error display is in.
  • pi-ai's getEnvApiKey() for credential detection — NOT in main yet. Still relevant — this also addresses Auto-detect API keys and default to latest models for zero-config startup #14 (auto-detect API keys). Worth keeping.
  • model.fallback runtime support — schema field exists (fallback: string[] in src/loader.ts), but I couldn't find a fallback loop that actually retries the next model on failure. Still relevant.

The right path forward, if you have time: narrow this PR to just the two un-merged concerns — getEnvApiKey() + the fallback runtime loop. That's a much smaller diff (probably ~150 lines instead of 638), no conflicts with the new LLM error path, and it reads cleanly.

If a fresh PR is easier than a rebase, that's fine — link this one in the description and your authorship credit carries over.

Real apologies for the wait — the Bedrock error-surfacing piece you flagged was useful enough that it shaped how main handles errors now. That part is materially yours.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants