
[OCI] Unify graceful-degradation policy across external dependencies #306

Description

@DevanshuNEU

Why this exists

The graceful-degradation policy in `backend/services/indexer_optimized.py` is inconsistent. Pinecone init at lines 61-83 is wrapped in `try/except`: it logs "search/indexing unavailable until reconnect" and lets the app start. The OpenAI client at lines 55-56 is constructed unconditionally; if `OPENAI_API_KEY` is missing or invalid, every subsequent embed call throws, and there is no fallback embedder. Two external deps, two different failure semantics.

DDIA Ch 1 "Reliability" property: the system should continue to work correctly under fault conditions, OR fail fast and visibly. The current mixed model gives the worst of both: indexing might start, then crash on the first embed call, leaving users with confusing partial-failure traces. This becomes critical once LOCAL_MODE (#TBD) ships, because the local backend has no required external deps and must degrade cleanly when an optional one is absent.

Source: static audit 2026-05-13. Logged as dogfood finding F-015.

What ships

  • New file `backend/services/degradation.py` declaring per-dependency policy: each external dep is REQUIRED or OPTIONAL with a typed fallback path
  • REQUIRED deps: fail at `startup_checks.py` (existing pattern, no change)
  • OPTIONAL deps: wrapped client + typed fallback (e.g., Cohere reranker absent → bypass reranking; Pinecone absent → degrade to local cache; etc.)
  • Refactor of all 5 external-dep call sites in `backend/services/` to use the new degradation helpers
  • Documentation: add "External dependency policy" section to repo `CLAUDE.md`
  • Tests: unit test for each fallback path (missing key, expired key, rate-limited, network error)
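A minimal sketch of what the per-dependency policy declaration could look like, to make the REQUIRED/OPTIONAL split concrete. Every name here (`Policy`, `Dependency`, `DependencyUnavailable`, `resolve`) is hypothetical and not taken from the repo, and the sketch does not decide which of the five deps are REQUIRED; that classification is still up to this issue.

```python
import logging
import os
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Generic, Mapping, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("degradation")  # hypothetical logger name


class Policy(Enum):
    REQUIRED = "required"  # missing dep -> fail fast
    OPTIONAL = "optional"  # missing dep -> log and use the fallback


class DependencyUnavailable(RuntimeError):
    """Raised when a dep cannot be initialised and no fallback applies."""


@dataclass
class Dependency(Generic[T]):
    name: str
    env_var: str
    policy: Policy
    factory: Callable[[str], T]                # builds the real client from the key
    fallback: Optional[Callable[[], T]] = None  # builds the degraded stand-in

    def resolve(self, env: Optional[Mapping[str, str]] = None) -> T:
        env = os.environ if env is None else env
        key = env.get(self.env_var)
        if not key:
            reason = f"{self.env_var} not set"
        else:
            try:
                return self.factory(key)
            except Exception as exc:  # client construction failed
                reason = f"init failed: {exc}"
        # REQUIRED deps, and OPTIONAL deps declared without a fallback, raise.
        if self.policy is Policy.REQUIRED or self.fallback is None:
            raise DependencyUnavailable(f"{self.name} unavailable ({reason})")
        logger.warning("%s disabled (%s)", self.name, reason)
        return self.fallback()
```

As a usage illustration under the same assumptions, the "Cohere reranker absent → bypass reranking" case would be a `Dependency` whose `fallback` returns a pass-through function such as `lambda query, docs: docs`, so call sites keep one code path whether or not the key is present.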

Acceptance criteria

  • `backend/services/degradation.py` exists and is imported by all 5 external-dep call sites (Pinecone, OpenAI, Voyage, Cohere, Supabase)
  • For each OPTIONAL dep, removing the API key from env results in a clear log line of the form "<dep> disabled (<reason>)" instead of a crash
  • For each REQUIRED dep, removing the API key from env results in a clean fast-fail at startup with an actionable error
  • Backend tests pass: `cd backend && pytest tests/ -v` with 100% coverage on `degradation.py`
  • Repo `CLAUDE.md` documents the policy
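The two key-removal criteria above could be checked along these lines. This is a self-contained sketch, not the repo's test suite: `init_optional`, `init_required`, the `"degradation"` logger name, the exact log wording, and the env var names are all assumptions made for illustration.

```python
import logging
from typing import Callable, List, Mapping, Optional, Tuple, TypeVar

T = TypeVar("T")
log = logging.getLogger("degradation")  # hypothetical logger name


def init_optional(name: str, env_var: str, env: Mapping[str, str],
                  factory: Callable[[str], T], fallback: T) -> T:
    """Return the real client, or log and return the fallback if the key is absent."""
    key = env.get(env_var)
    if not key:
        log.warning("%s disabled (%s not set)", name, env_var)
        return fallback
    return factory(key)


def init_required(name: str, env_var: str, env: Mapping[str, str],
                  factory: Callable[[str], T]) -> T:
    """Return the real client, or abort startup with an actionable message."""
    key = env.get(env_var)
    if not key:
        raise SystemExit(f"{name}: set {env_var} before starting the backend")
    return factory(key)


class _Capture(logging.Handler):
    """Collects formatted log messages so a check can assert on them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def check_optional_logs_and_degrades() -> Tuple[str, List[str]]:
    capture = _Capture()
    log.addHandler(capture)
    try:
        # Empty env simulates "API key removed".
        client = init_optional("cohere", "COHERE_API_KEY", {},
                               factory=lambda k: "real", fallback="bypass")
    finally:
        log.removeHandler(capture)
    return client, capture.messages


def check_required_fails_fast() -> Optional[str]:
    try:
        init_required("some-required-dep", "SOME_API_KEY", {},
                      factory=lambda k: "real")
    except SystemExit as exc:
        return str(exc)
    return None
```

In an actual pytest suite the same checks would more likely use the `caplog` fixture and `pytest.raises`; the hand-rolled handler here only keeps the sketch free of third-party imports.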

Wave

pre-thesis

Type

refactor

Stack scope

  • backend

Priority

medium

ADR required

no - under ADR threshold (single-file refactor with established pattern). Cross-references the LOCAL_MODE ADR for context.

Dogfooding signal

yes - audit finding F-015 from 2026-05-13. Surfaced while reading `indexer_optimized.py:55-83` to map external dep failure modes for the LOCAL_MODE design.

Related

  • Wave: pre-thesis (enables LOCAL_MODE cleanliness)
  • Prior ADRs: `oci/decisions/2026-05-13-local-mode-v0.1.md`
  • Linked: #TBD (LOCAL_MODE)
  • External: DDIA Ch 1 — Reliability section on fault-tolerance vs fast-fail

Filed from OCI audit 2026-05-13. Full context: `oci/dogfood-findings.md` F-015.
