docs(architecture): reconcile credential model — strict deny-by-default style#177
Conversation
The canon carried an internally-contradictory credential model: most rows said "broker issues a scoped-JWT into the guest, guest presents it" while NFR-SEC-30 already said "broker terminates outbound TLS". The guest-token framing was never the intent. Converge everything to the strict model confirmed against Anthropic's process_api / egress design: - The guest holds NO upstream credential. It sends an unauthenticated request; the Egress trust-edge attaches the upstream authorization on the outbound leg, fetched from host-side Credential custody. - "Credential broker" → "Credential custody" (host-side store, rotation, delegated STS). Injection folds into the Egress trust-edge. Custody has no guest-facing interface. - Token taxonomy: the guest holds only the Session JWT (session identity, ≤4h, CP→guest). The ≤15min lease is held by the edge (custody credential lease), never the guest. "Egress JWT" → "Session JWT" everywhere. - Injection requires the edge to originate the upstream connection (L7 / MITM mode); transparent pass-through cannot inject. Cert-pinning / client-mTLS / DPoP upstreams tracked at #176. Touches: NFR-SEC-23/25/27/29/30/31 + token taxonomy + SEC-10/11 rename; Layer 3 §2/§3/§5 diagram/§7 revoke/§8/§11; Layer 4 §2/§4; Layer 5 zone names + context-map vocab + open question; glossary entries. The five containers do not change — custody and edge stay two boxes (custody ≠ enforcement). Layer 6 (PR #173) reconciled separately on its branch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 17 minutes and 47 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (7)
WalkthroughThis PR refactors architecture and security documentation to consolidate "Credential broker" terminology into "Credential custody," restructure the trust-zone identity taxonomy (renaming "Egress JWT" to "Session JWT"), and clarify custody-lease injection semantics and fail-closed behavior constraints across the trust boundary, NFR definitions, glossary, and context documents. ChangesCredential Custody Architecture Refactoring
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Scope-consistency review (6/6 PASS, 2 nits): the §8 table claimed to match the NFR token table "verbatim", but the two now differ in column header (Consumer vs Holder) after the rewrite. Claim the three classes / scopes / TTLs match, not the literal layout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
docs/architecture/02-trust-boundaries.md (1)
131-132: 💤 Low valueConsider clarifying revoke timeline vs. credential survival.
The text states "Revoke propagation target is ≤5 min platform-wide" but also notes that already-fetched credentials at the edge survive up to their lease TTL (≤15 min). This means a revoked session's upstream access could persist for up to 15 minutes after revocation. While technically accurate (denylist propagation is ≤5 min, but credential survival is bounded separately), readers might expect "platform-wide ≤5 min" to mean all access ceases within 5 minutes.
Consider explicitly stating: "Revoke propagation (denylist delivery) is ≤5 min; already-fetched upstream credentials survive up to their lease TTL (≤15 min), so complete upstream-access termination may take up to 15 min in worst case."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/architecture/02-trust-boundaries.md` around lines 131 - 132, Update the paragraph about revoke timing to explicitly distinguish denylist propagation from credential survival: state that "Revoke propagation (denylist delivery) is ≤5 min platform-wide per NFR-SEC-04, but already-fetched upstream credentials at the edge survive up to the credential-lease TTL (≤15 min per NFR-SEC-29), so complete upstream-access termination may take up to 15 min in the worst case; the kill-switch (NFR-SEC-01) still provides a ≤30 s p99 stop on the Compute-plane denylist check." Reference the existing terms "denylist", "credential-lease TTL", "Revoke propagation", and the NFR IDs to make the clarification unambiguous.docs/architecture/manifesto/02-nfrs.md (1)
151-151: 💤 Low valueVerify revocation propagation time consistency.
NFR-SEC-29 states "revocation propagation ≤1 min" for custody credentials, while NFR-SEC-04 (line 126) specifies "revocation ≤5 min platform-wide" and 02-trust-boundaries.md line 131 references this ≤5 min target. Consider clarifying whether:
- High-value custody credentials have a tighter ≤1 min SLA (subset of the ≤5 min general target), or
- ≤1 min applies to custody-internal propagation while ≤5 min applies end-to-end
Either interpretation is reasonable, but the relationship between these two targets should be explicit to avoid confusion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/architecture/manifesto/02-nfrs.md` at line 151, Update the documentation to resolve the revocation propagation time inconsistency between NFR-SEC-29 and NFR-SEC-04: explicitly state whether NFR-SEC-29’s "revocation propagation ≤1 min" is a stricter SLA for custody credentials (a subset of the platform-wide "revocation ≤5 min" in NFR-SEC-04) or if the ≤1 min refers only to custody-internal propagation with ≤5 min as the end-to-end target; modify the NFR-SEC-29 line in 02-nfrs.md to include a clarifying phrase (e.g., "≤1 min internal/custody-only; ≤5 min end-to-end platform target" or "≤1 min stricter SLA for custody credentials (subset of ≤5 min platform-wide)"), and add a cross-reference note in 02-trust-boundaries.md and NFR-SEC-04 to reflect the chosen interpretation so all three references (NFR-SEC-29, NFR-SEC-04, 02-trust-boundaries.md) are consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/architecture/02-trust-boundaries.md`:
- Line 28: The Egress trust-edge description currently implies upstream
credential attachment is always available; update the text for the "Egress
trust-edge" entry (the line starting "Attaches the upstream authorization on the
outbound leg") to explicitly qualify that credential injection only occurs in
MITM-inspecting mode (e.g., append "(MITM-inspecting mode only; transparent
pass-through cannot inject upstream credentials; see §7)") and make the same
clarification in the later paragraph that currently spans lines referencing
transparent pass-through (the lines that state pass-through mode cannot attach
upstream credentials) so both the summary row (Egress trust-edge) and the
detailed section consistently state the mode restriction.
---
Nitpick comments:
In `@docs/architecture/02-trust-boundaries.md`:
- Around line 131-132: Update the paragraph about revoke timing to explicitly
distinguish denylist propagation from credential survival: state that "Revoke
propagation (denylist delivery) is ≤5 min platform-wide per NFR-SEC-04, but
already-fetched upstream credentials at the edge survive up to the
credential-lease TTL (≤15 min per NFR-SEC-29), so complete upstream-access
termination may take up to 15 min in the worst case; the kill-switch
(NFR-SEC-01) still provides a ≤30 s p99 stop on the Compute-plane denylist
check." Reference the existing terms "denylist", "credential-lease TTL", "Revoke
propagation", and the NFR IDs to make the clarification unambiguous.
In `@docs/architecture/manifesto/02-nfrs.md`:
- Line 151: Update the documentation to resolve the revocation propagation time
inconsistency between NFR-SEC-29 and NFR-SEC-04: explicitly state whether
NFR-SEC-29’s "revocation propagation ≤1 min" is a stricter SLA for custody
credentials (a subset of the platform-wide "revocation ≤5 min" in NFR-SEC-04) or
if the ≤1 min refers only to custody-internal propagation with ≤5 min as the
end-to-end target; modify the NFR-SEC-29 line in 02-nfrs.md to include a
clarifying phrase (e.g., "≤1 min internal/custody-only; ≤5 min end-to-end
platform target" or "≤1 min stricter SLA for custody credentials (subset of ≤5
min platform-wide)"), and add a cross-reference note in 02-trust-boundaries.md
and NFR-SEC-04 to reflect the chosen interpretation so all three references
(NFR-SEC-29, NFR-SEC-04, 02-trust-boundaries.md) are consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 5d2334cf-785b-47d0-b98a-d1eda4681115
📒 Files selected for processing (6)
docs/architecture/02-trust-boundaries.mddocs/architecture/03-c4-context.mddocs/architecture/04-bounded-contexts.mddocs/architecture/diagrams/02-trust-boundaries.mmddocs/architecture/glossary.mddocs/architecture/manifesto/02-nfrs.md
Three spots stated the "guest holds no upstream credential" invariant two or
three times in one passage. Keep the single load-bearing negative, drop the
restatements: custody glossary ("custody issues no token to the guest"),
egress-edge fail-closed ("never bypassed"), token-table ("not anything the
guest holds —").
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The rewrite left two revoke numbers reading as a conflict (≤5 min propagation vs ≤15 min lease survival). State the three bounds smallest-first: custody revokes the lease ≤1 min (NFR-SEC-29); denylist propagates ≤5 min (NFR-SEC-04); a non-revoked lease self-expires at its ≤15 min TTL. An explicit revoke cuts upstream access within ≤5 min — the 15-min TTL is only the ceiling for a session never revoked. Resolves both CodeRabbit nits. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- NFR-SEC-10: the ≤4h is a per-token TTL, not a session cap. While a session is active the Control plane rotates the token before expiry (a stolen token dies in ≤4h); session length is not bounded by the TTL. A long human session (multi-hour investigation) is not interrupted. Drop the undefined "session-max"; a hard session cap, if any, is a policy tracked at #178. - NFR-SEC-29 (CodeRabbit): ≤1 min revocation is custody-internal — a tighter subset of the ≤5 min platform-wide target (NFR-SEC-04). State the relation. - Layer 3 zone 4 (CodeRabbit): qualify that upstream-auth injection is MITM-inspecting mode only; transparent pass-through cannot inject (§7). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… TTL The ≤4h session-JWT TTL was an unsourced number that conflated three distinct limits. Research (PCI-DSS 4.0, NIST SP 800-63B-4) separates them; we align to AAL3 (privileged automation in a bank perimeter): - NFR-SEC-10: session JWT TTL ≤4h → ≤60min, reframed as an anti-replay window (engineering bound, no regulatory citation), rotated while the session runs. - NFR-SEC-40 (new): idle / inactivity timeout ≤15min — PCI-DSS 4.0 Req 8.2.8 + NIST 800-63B-4 §2.3.3 (AAL3). This is the number a bank auditor looks for. - NFR-SEC-41 (new): absolute session lifetime ≤12h — NIST 800-63B-4 §2.3.3 (AAL3 SHALL). Default off on the minimal shelf, customer-tunable on the full shelf. Closes #178 with a sourced number instead of "if any". Token TTL ≠ idle ≠ absolute: the token rotates under both session limits; session length is bounded by SEC-40/41, not by the token TTL. Updated token taxonomy table, Layer 3 §5/§8 + diagram, glossary Session JWT entry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
§7 revoke paragraph stated IdP-independence three times. Keep the opening
claim and the closing rationale ("that is why ≤5 min revoke holds even during
an IdP outage"); drop the bare middle restatement.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
"MCP allow-list" mixed abstraction levels — it named a private case (one destination type) where the general control (egress allow-list, deny-by- default) was never stated. An MCP server, LLM API, object store, or internal API is one allow-listed destination, not its own enforcement mechanism. - NFR-SEC-08: "MCP allow-list enforcement" → "Egress allow-list" (customer declares reachable destinations; deny-by-default; destination type is not a separate control). - Layer 3 zone 4 + .mmd: same generalization. How a corporate environment authors these egress rules (rule format / standard) is deferred to the egress-proxy component spec + a separate research pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gglers) bounded-contexts §3 and the §02 long-form-scenario list still named the private case. Generalize both. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…kill supply-chain NFR Abstraction-substitution sweep (the MCP-allow-list defect class) found two more embarrassing instances; both fixed: - NFR-SEC-17 named "Egress posture (sandbox trust tier)" — but it is the enforcement mechanism of the NFR-SEC-08 allow-list, and "Egress posture" collides with the glossary/NFR-FLEX-15 term (transparent vs MITM). Renamed to "Egress allow-list enforcement mechanism" + cross-link SEC-08↔SEC-17. - NFR-SEC-05 named "MITM-friendly egress" — but it anchors the whole Egress zone and MITM is the opt-in mode, not the default. Renamed "Single forward-proxy egress (customer-CA injectable; MITM-inspecting opt-in)". - primitives-backlog: same two renames. Also: NFR-SEC-42 (new, v2/tbd) — skill-registry supply chain (author-uploaded, server-signed, provenance-tracked, signature-verified at attach), pairs with SkillProvider NFR-SEC-24; tracks #179. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d example Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Description retired during the initial-public-release history consolidation. The canonical content lives in docs/architecture/ at the current tip.