Why this exists
The graceful-degradation policy in `backend/services/indexer_optimized.py` is inconsistent. Pinecone init at lines 61-83 is wrapped in `try/except` and logs "search/indexing unavailable until reconnect" while letting the app start. The OpenAI client at lines 55-56 is constructed unconditionally; if `OPENAI_API_KEY` is missing or invalid, every subsequent embed call throws, and there is no fallback embedder. Two external deps, two different failure semantics.
DDIA Ch 1 "Reliability" property: the system should continue to work correctly under fault conditions, OR fail fast and visibly. The current mixed model gives the worst of both: indexing might start, then crash on the first embed call, leaving users with confusing partial-failure traces. This becomes critical once LOCAL_MODE (#TBD) ships: the local backend has no required external deps and must degrade cleanly when an optional one is absent.
Source: static audit 2026-05-13. Logged as dogfood finding F-015.
What ships
- New file `backend/services/degradation.py` declaring per-dependency policy: each external dep is REQUIRED or OPTIONAL with a typed fallback path
- REQUIRED deps: fail at `startup_checks.py` (existing pattern, no change)
- OPTIONAL deps: wrapped client + typed fallback (e.g., Cohere reranker absent → bypass reranking; Pinecone absent → degrade to local cache)
- Refactor of all 5 external-dep call sites in `backend/services/` to use the new degradation helpers
- Documentation: add "External dependency policy" section to repo `CLAUDE.md`
- Tests: unit test for each fallback path (missing key, expired key, rate-limited, network error)
Acceptance criteria
Wave
pre-thesis
Type
refactor
Stack scope
Priority
medium
ADR required
no - under ADR threshold (single-file refactor with established pattern). Cross-references the LOCAL_MODE ADR for context.
Dogfooding signal
yes - audit finding F-015 from 2026-05-13. Surfaced while reading `indexer_optimized.py:55-83` to map external dep failure modes for the LOCAL_MODE design.
Related
- Wave: pre-thesis (enables LOCAL_MODE cleanliness)
- Prior ADRs: `oci/decisions/2026-05-13-local-mode-v0.1.md`
- Linked: #TBD (LOCAL_MODE)
- External: DDIA Ch 1 — Reliability section on fault-tolerance vs fast-fail
Filed from OCI audit 2026-05-13. Full context: `oci/dogfood-findings.md` F-015.