Skip to content

docs(architecture): reconcile credential model — strict deny-by-default style#177

Merged
Yambr merged 11 commits into
next/v1from
docs/reconcile-credential-egress-injection
May 30, 2026
Merged

docs(architecture): reconcile credential model — strict deny-by-default style#177
Yambr merged 11 commits into
next/v1from
docs/reconcile-credential-egress-injection

Conversation

@Yambr

@Yambr Yambr commented May 30, 2026

Copy link
Copy Markdown
Collaborator

Description retired during the initial-public-release history consolidation. The canonical content lives in docs/architecture/ at the current tip.

The canon carried an internally-contradictory credential model: most rows
said "broker issues a scoped-JWT into the guest, guest presents it" while
NFR-SEC-30 already said "broker terminates outbound TLS". The guest-token
framing was never the intent. Converge everything to the strict model
confirmed against Anthropic's process_api / egress design:

- The guest holds NO upstream credential. It sends an unauthenticated
  request; the Egress trust-edge attaches the upstream authorization on the
  outbound leg, fetched from host-side Credential custody.
- "Credential broker" → "Credential custody" (host-side store, rotation,
  delegated STS). Injection folds into the Egress trust-edge. Custody has no
  guest-facing interface.
- Token taxonomy: the guest holds only the Session JWT (session identity,
  ≤4h, CP→guest). The ≤15min lease is held by the edge (custody credential
  lease), never the guest. "Egress JWT" → "Session JWT" everywhere.
- Injection requires the edge to originate the upstream connection (L7 /
  MITM mode); transparent pass-through cannot inject. Cert-pinning /
  client-mTLS / DPoP upstreams tracked at #176.

Touches: NFR-SEC-23/25/27/29/30/31 + token taxonomy + SEC-10/11 rename;
Layer 3 §2/§3/§5 diagram/§7 revoke/§8/§11; Layer 4 §2/§4; Layer 5 zone
names + context-map vocab + open question; glossary entries. The five
containers do not change — custody and edge stay two boxes (custody ≠
enforcement). Layer 6 (PR #173) reconciled separately on its branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 30, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@Yambr, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 17 minutes and 47 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b322be02-72cd-49d1-a5d6-55189ff816e4

📥 Commits

Reviewing files that changed from the base of the PR and between 3806049 and 68bbc5d.

📒 Files selected for processing (7)
  • docs/architecture/02-trust-boundaries.md
  • docs/architecture/04-bounded-contexts.md
  • docs/architecture/PROCESS.md
  • docs/architecture/diagrams/02-trust-boundaries.mmd
  • docs/architecture/glossary.md
  • docs/architecture/manifesto/02-nfrs.md
  • docs/architecture/primitives-backlog.md

Walkthrough

This PR refactors architecture and security documentation to consolidate "Credential broker" terminology into "Credential custody," restructure the trust-zone identity taxonomy (renaming "Egress JWT" to "Session JWT"), and clarify custody-lease injection semantics and fail-closed behavior constraints across the trust boundary, NFR definitions, glossary, and context documents.

Changes

Credential Custody Architecture Refactoring

Layer / File(s) Summary
Core trust-boundary specification and token taxonomy
docs/architecture/02-trust-boundaries.md
Trust-boundary spec introduces "Credential custody" zone replacing "Credential broker," updates "Workload-identity floor" token classes (Session JWT, Generic internal token, Custody credential lease), expands egress-auth and lifecycle text to clarify upstream-authorization injection requirements by connection mode and fail-closed behavior, and updates external-actor and NFR-anchor mappings.
Glossary and security non-functional requirements
docs/architecture/glossary.md, docs/architecture/manifesto/02-nfrs.md
New "Credential custody" glossary entry replaces "Credential broker," "Egress JWT" renamed to "Session JWT" with expanded TTL and scope distinctions, "Egress trust-edge" entry updated to describe custody-sourced authorization injection and fail-closed routing, and Security NFR entries (NFR-SEC-10, NFR-SEC-11, NFR-SEC-23 through NFR-SEC-31) rewritten to clarify host-side custody, custody-lease TTLs, guest secret-delivery prohibition, network-bound edge identity, and edge/custody enforcement points.
C4 context and bounded-context alignment
docs/architecture/03-c4-context.md, docs/architecture/04-bounded-contexts.md
C4 context and bounded-context documents replace "Credential broker" with "Credential custody," clarify egress trust-edge injects upstream authorization from custody, realign trust-zone-to-context consolidation and policy-evaluation consumption to custody boundary, and rephrased Open Question 4 to ask whether Credential custody remains distinct or collapses into generic Secrets custody.
Trust-boundary diagram implementation
docs/architecture/diagrams/02-trust-boundaries.mmd
Mermaid diagram renames "Credential broker" subgraph to "Credential custody," adds BR→PROXY edge for custody-lease injection, updates ORCH→VM to "Session JWT on WS bound to container_name," clarifies VM→PROXY carries "no credential in request," and updates PROXY→LLM/OBJ to describe "upstream auth injected from custody" with TLS and fail-closed semantics.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

Possibly related PRs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and specifically describes the main change: reconciling the credential model to align with Anthropic's architecture and adopting a strict style, which is the primary objective across all modified files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch docs/reconcile-credential-egress-injection

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Scope-consistency review (6/6 PASS, 2 nits): the §8 table claimed to match
the NFR token table "verbatim", but the two now differ in column header
(Consumer vs Holder) after the rewrite. Claim the three classes / scopes /
TTLs match, not the literal layout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
docs/architecture/02-trust-boundaries.md (1)

131-132: 💤 Low value

Consider clarifying revoke timeline vs. credential survival.

The text states "Revoke propagation target is ≤5 min platform-wide" but also notes that already-fetched credentials at the edge survive up to their lease TTL (≤15 min). This means a revoked session's upstream access could persist for up to 15 minutes after revocation. While technically accurate (denylist propagation is ≤5 min, but credential survival is bounded separately), readers might expect "platform-wide ≤5 min" to mean all access ceases within 5 minutes.

Consider explicitly stating: "Revoke propagation (denylist delivery) is ≤5 min; already-fetched upstream credentials survive up to their lease TTL (≤15 min), so complete upstream-access termination may take up to 15 min in worst case."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/architecture/02-trust-boundaries.md` around lines 131 - 132, Update the
paragraph about revoke timing to explicitly distinguish denylist propagation
from credential survival: state that "Revoke propagation (denylist delivery) is
≤5 min platform-wide per NFR-SEC-04, but already-fetched upstream credentials at
the edge survive up to the credential-lease TTL (≤15 min per NFR-SEC-29), so
complete upstream-access termination may take up to 15 min in the worst case;
the kill-switch (NFR-SEC-01) still provides a ≤30 s p99 stop on the
Compute-plane denylist check." Reference the existing terms "denylist",
"credential-lease TTL", "Revoke propagation", and the NFR IDs to make the
clarification unambiguous.
docs/architecture/manifesto/02-nfrs.md (1)

151-151: 💤 Low value

Verify revocation propagation time consistency.

NFR-SEC-29 states "revocation propagation ≤1 min" for custody credentials, while NFR-SEC-04 (line 126) specifies "revocation ≤5 min platform-wide" and 02-trust-boundaries.md line 131 references this ≤5 min target. Consider clarifying whether:

  • High-value custody credentials have a tighter ≤1 min SLA (subset of the ≤5 min general target), or
  • ≤1 min applies to custody-internal propagation while ≤5 min applies end-to-end

Either interpretation is reasonable, but the relationship between these two targets should be explicit to avoid confusion.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/architecture/manifesto/02-nfrs.md` at line 151, Update the documentation
to resolve the revocation propagation time inconsistency between NFR-SEC-29 and
NFR-SEC-04: explicitly state whether NFR-SEC-29’s "revocation propagation ≤1
min" is a stricter SLA for custody credentials (a subset of the platform-wide
"revocation ≤5 min" in NFR-SEC-04) or if the ≤1 min refers only to
custody-internal propagation with ≤5 min as the end-to-end target; modify the
NFR-SEC-29 line in 02-nfrs.md to include a clarifying phrase (e.g., "≤1 min
internal/custody-only; ≤5 min end-to-end platform target" or "≤1 min stricter
SLA for custody credentials (subset of ≤5 min platform-wide)"), and add a
cross-reference note in 02-trust-boundaries.md and NFR-SEC-04 to reflect the
chosen interpretation so all three references (NFR-SEC-29, NFR-SEC-04,
02-trust-boundaries.md) are consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/architecture/02-trust-boundaries.md`:
- Line 28: The Egress trust-edge description currently implies upstream
credential attachment is always available; update the text for the "Egress
trust-edge" entry (the line starting "Attaches the upstream authorization on the
outbound leg") to explicitly qualify that credential injection only occurs in
MITM-inspecting mode (e.g., append "(MITM-inspecting mode only; transparent
pass-through cannot inject upstream credentials; see §7)") and make the same
clarification in the later paragraph that currently spans lines referencing
transparent pass-through (the lines that state pass-through mode cannot attach
upstream credentials) so both the summary row (Egress trust-edge) and the
detailed section consistently state the mode restriction.

---

Nitpick comments:
In `@docs/architecture/02-trust-boundaries.md`:
- Around line 131-132: Update the paragraph about revoke timing to explicitly
distinguish denylist propagation from credential survival: state that "Revoke
propagation (denylist delivery) is ≤5 min platform-wide per NFR-SEC-04, but
already-fetched upstream credentials at the edge survive up to the
credential-lease TTL (≤15 min per NFR-SEC-29), so complete upstream-access
termination may take up to 15 min in the worst case; the kill-switch
(NFR-SEC-01) still provides a ≤30 s p99 stop on the Compute-plane denylist
check." Reference the existing terms "denylist", "credential-lease TTL", "Revoke
propagation", and the NFR IDs to make the clarification unambiguous.

In `@docs/architecture/manifesto/02-nfrs.md`:
- Line 151: Update the documentation to resolve the revocation propagation time
inconsistency between NFR-SEC-29 and NFR-SEC-04: explicitly state whether
NFR-SEC-29’s "revocation propagation ≤1 min" is a stricter SLA for custody
credentials (a subset of the platform-wide "revocation ≤5 min" in NFR-SEC-04) or
if the ≤1 min refers only to custody-internal propagation with ≤5 min as the
end-to-end target; modify the NFR-SEC-29 line in 02-nfrs.md to include a
clarifying phrase (e.g., "≤1 min internal/custody-only; ≤5 min end-to-end
platform target" or "≤1 min stricter SLA for custody credentials (subset of ≤5
min platform-wide)"), and add a cross-reference note in 02-trust-boundaries.md
and NFR-SEC-04 to reflect the chosen interpretation so all three references
(NFR-SEC-29, NFR-SEC-04, 02-trust-boundaries.md) are consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5d2334cf-785b-47d0-b98a-d1eda4681115

📥 Commits

Reviewing files that changed from the base of the PR and between 256a4f3 and 6c44607.

📒 Files selected for processing (6)
  • docs/architecture/02-trust-boundaries.md
  • docs/architecture/03-c4-context.md
  • docs/architecture/04-bounded-contexts.md
  • docs/architecture/diagrams/02-trust-boundaries.mmd
  • docs/architecture/glossary.md
  • docs/architecture/manifesto/02-nfrs.md

Comment thread docs/architecture/02-trust-boundaries.md Outdated
Yambr and others added 9 commits May 30, 2026 17:56
Three spots stated the "guest holds no upstream credential" invariant two or
three times in one passage. Keep the single load-bearing negative, drop the
restatements: custody glossary ("custody issues no token to the guest"),
egress-edge fail-closed ("never bypassed"), token-table ("not anything the
guest holds —").

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The rewrite left two revoke numbers reading as a conflict (≤5 min propagation
vs ≤15 min lease survival). State the three bounds smallest-first: custody
revokes the lease ≤1 min (NFR-SEC-29); denylist propagates ≤5 min
(NFR-SEC-04); a non-revoked lease self-expires at its ≤15 min TTL. An explicit
revoke cuts upstream access within ≤5 min — the 15-min TTL is only the ceiling
for a session never revoked. Resolves both CodeRabbit nits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- NFR-SEC-10: the ≤4h is a per-token TTL, not a session cap. While a session
  is active the Control plane rotates the token before expiry (a stolen token
  dies in ≤4h); session length is not bounded by the TTL. A long human session
  (multi-hour investigation) is not interrupted. Drop the undefined
  "session-max"; a hard session cap, if any, is a policy tracked at #178.
- NFR-SEC-29 (CodeRabbit): ≤1 min revocation is custody-internal — a tighter
  subset of the ≤5 min platform-wide target (NFR-SEC-04). State the relation.
- Layer 3 zone 4 (CodeRabbit): qualify that upstream-auth injection is
  MITM-inspecting mode only; transparent pass-through cannot inject (§7).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… TTL

The ≤4h session-JWT TTL was an unsourced number that conflated three distinct
limits. Research (PCI-DSS 4.0, NIST SP 800-63B-4) separates them; we align to
AAL3 (privileged automation in a bank perimeter):

- NFR-SEC-10: session JWT TTL ≤4h → ≤60min, reframed as an anti-replay window
  (engineering bound, no regulatory citation), rotated while the session runs.
- NFR-SEC-40 (new): idle / inactivity timeout ≤15min — PCI-DSS 4.0 Req 8.2.8
  + NIST 800-63B-4 §2.3.3 (AAL3). This is the number a bank auditor looks for.
- NFR-SEC-41 (new): absolute session lifetime ≤12h — NIST 800-63B-4 §2.3.3
  (AAL3 SHALL). Default off on the minimal shelf, customer-tunable on the full
  shelf. Closes #178 with a sourced number instead of "if any".

Token TTL ≠ idle ≠ absolute: the token rotates under both session limits;
session length is bounded by SEC-40/41, not by the token TTL. Updated token
taxonomy table, Layer 3 §5/§8 + diagram, glossary Session JWT entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
§7 revoke paragraph stated IdP-independence three times. Keep the opening
claim and the closing rationale ("that is why ≤5 min revoke holds even during
an IdP outage"); drop the bare middle restatement.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
"MCP allow-list" mixed abstraction levels — it named a private case (one
destination type) where the general control (egress allow-list, deny-by-
default) was never stated. An MCP server, LLM API, object store, or internal
API is one allow-listed destination, not its own enforcement mechanism.

- NFR-SEC-08: "MCP allow-list enforcement" → "Egress allow-list" (customer
  declares reachable destinations; deny-by-default; destination type is not a
  separate control).
- Layer 3 zone 4 + .mmd: same generalization.

How a corporate environment authors these egress rules (rule format / standard)
is deferred to the egress-proxy component spec + a separate research pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gglers)

bounded-contexts §3 and the §02 long-form-scenario list still named the
private case. Generalize both.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…kill supply-chain NFR

Abstraction-substitution sweep (the MCP-allow-list defect class) found two
more embarrassing instances; both fixed:

- NFR-SEC-17 named "Egress posture (sandbox trust tier)" — but it is the
  enforcement mechanism of the NFR-SEC-08 allow-list, and "Egress posture"
  collides with the glossary/NFR-FLEX-15 term (transparent vs MITM). Renamed
  to "Egress allow-list enforcement mechanism" + cross-link SEC-08↔SEC-17.
- NFR-SEC-05 named "MITM-friendly egress" — but it anchors the whole Egress
  zone and MITM is the opt-in mode, not the default. Renamed "Single
  forward-proxy egress (customer-CA injectable; MITM-inspecting opt-in)".
- primitives-backlog: same two renames.

Also: NFR-SEC-42 (new, v2/tbd) — skill-registry supply chain (author-uploaded,
server-signed, provenance-tracked, signature-verified at attach), pairs with
SkillProvider NFR-SEC-24; tracks #179.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d example

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Yambr Yambr merged commit ce5b223 into next/v1 May 30, 2026
17 checks passed
@Yambr Yambr deleted the docs/reconcile-credential-egress-injection branch May 30, 2026 16:05
@Yambr Yambr changed the title docs(architecture): reconcile credential model — strict Anthropic style docs(architecture): reconcile credential model — strict deny-by-default style Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant