docs(adr): propose ADR 0044 gate workflows:write on repo secret allowlist#1739
docs(adr): propose ADR 0044 gate workflows:write on repo secret allowlist#1739ifireball wants to merge 1 commit into
Conversation
Site previewPreview: https://65ad1ce2-site.fullsend-ai.workers.dev Commit: |
Records minimal-privilege design for fullsend-ai#470/fullsend-ai#788: triage intent labels, target-repo secret assessment in code only, PR labels for fix elevation. Update architecture.md. Implementation blocked until ADR 41 lands. Signed-off-by: Barak Korren <bkorren@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
fad4933 to
834cb24
Compare
ReviewFindingsMedium
Low
|
ralphbean
left a comment
There was a problem hiding this comment.
I think this needs one change before we can merge. See inline.
| 4. **Default `GITHUB_TOKEN` permissions for workflows** — `GET /repos/{owner}/{repo}/actions/permissions/workflow`. Assessment fails unless defaults are read-only for contents and packages (and do not grant write-all). This blocks elevation when a rogue workflow could exfiltrate via the token even with no named secrets. | ||
|
|
||
| **Allowlist:** Every secret name from (1)–(3) must be in builtins ∪ configurable allowlist. (4) is a separate boolean gate, not a name list. | ||
|
|
There was a problem hiding this comment.
[critical] I think we should be explicit here that check (4) is defense-in-depth, not a reliable gate. GitHub Actions lets individual workflow YAML files set their own permissions: block, escalating beyond the repo default. A rogue workflow pushed via workflows: write can include permissions: contents: write regardless of the repo-level default — the default only controls what workflows get when they omit the permissions: key entirely.
So this check catches repos where the defaults are already wide open (a signal that the org isn't thinking about least privilege), but it can't actually prevent a rogue workflow from obtaining a write-scoped GITHUB_TOKEN. The assessment as written treats it as a hard gate alongside (1)–(3), which I think overstates what it provides.
There was a problem hiding this comment.
hmm, you are right, I was focused on secret exfiltration, but writing into main is also possible via a rogue workflow.
So perhaps I should refocus on trying to ensure the pushed bot commit does not create or modify any workflows that would trigger as a result of the merge itself, or perhaps add that in addition to the mechanism I have already described.
WDYT?
| ### Labels (two roles) | ||
|
|
||
| | Label | On | Meaning | Who may set (provenance) | | ||
| |-------|-----|---------|---------------------------| |
There was a problem hiding this comment.
[moderate, non-blocking] The authorization floor for the PR label override is OWNER | MEMBER | COLLABORATOR — same as /fs-fix. In large orgs, COLLABORATOR can be pretty broad (external contributors with push access). Since this override bypasses assessment entirely, do we want a tighter floor here (OWNER | MEMBER only)? Or does matching the /fs-fix authorization rule feel right for consistency?
|
This only prevent our agents from submitting a PR that wants to extract secrets, but what people is doing without agents? They shoyld have some prevention mechanism so GH does not run workflows without permission, right? Can't we just use that and do not implement this? |
This is not about running workflows, its about writing workflow files. Unauthorized humans typically create branches in their forks and PRs out of those - so as long as the PR isn't merged, a contributor cannot get his workflow to run in the context of the target repo. Bots, however, create PRs by pushing branches directly into the target repo, such a branch can contain a new workflow that would trigger on push. To mitigate that, GitHub blocks commits pushed with workflow file modifications unless the pusher has a token with a "workflow: write" permission. This whole PR and set of issues is about finding a way to grant "workflow: write" to the code agent so that it can edit workflows without causing it to become an attack platform into the repo. |
|
Ok, so the current protection is forcing contributors to open a fork. |
Yes, if you don't trust them. don't let them push (write) into the repo... |
waynesun09
left a comment
There was a problem hiding this comment.
Review squad (5 agents: 2x claude-coder, claude-researcher, gemini-code-review, cursor-code-review) — 14 findings total after dedup, 4 false positives removed.
Filtered out findings already raised by ralphbean (check 4 GITHUB_TOKEN, COLLABORATOR floor) and fullsend-ai-review[bot] (status field, architecture.md section placement). 5 unique MEDIUM+ findings posted inline.
Overall: Core design is sound. The main gap is that several mechanism details (label provenance verification, environment fail-closed criteria, allowlist matching semantics) need to be specified precisely enough for implementers to build consistent behavior. The workflow_secret_allowlist config field should be framed as a new introduction, not a reference to existing schema.
| **Allowlist:** Every secret name from (1)–(3) must be in builtins ∪ configurable allowlist. (4) is a separate boolean gate, not a name list. | ||
|
|
||
| **Builtin secret names (hardcoded):** `FULLSEND_GCP_WIF_PROVIDER`, `FULLSEND_GCP_PROJECT_ID`, `FULLSEND_DISPATCH_TOKEN`. | ||
|
|
There was a problem hiding this comment.
[high] workflow_secret_allowlist is referenced as if it already exists in ADR 0011 and ADR 0033, but neither ADR mentions this field and internal/config/config.go has no corresponding struct field (RepoConfig contains only Roles and Enabled; PerRepoConfig has only Version, KillSwitch, and Roles). This ADR is introducing a new config field, not referencing an existing one — the text should say so explicitly, and note that implementation requires schema changes to the Go structs and both referenced ADRs.
Additionally, "No prefix wildcards" is ambiguous — it could be read as "no wildcards at the prefix position" (allowing suffix wildcards like *_API_KEY). If the intent is exact-match-only, say "Exact secret name match only; no wildcards. Matching is case-insensitive (consistent with GitHub)."
| | `fullsend-workflow-write` | **PR** | Fix (and subsequent code pushes on this PR) may mint with `workflows: write` | **`{org}-coder[bot]`** after code passes assessment, **or** human with **OWNER \| MEMBER \| COLLABORATOR** (e.g. workflow edits were not foreseen by triage or the first code run) | | ||
|
|
||
| ### What assessment means | ||
|
|
There was a problem hiding this comment.
[medium] The threat model scopes to "runs on push to an agent branch" but on: push is not the only trigger that fires without review. A rogue workflow could use on: workflow_run (fires when another workflow completes), on: create (fires on branch creation), or other non-PR-gated triggers to execute on agent branches without merge or review.
Consider broadening to: "if the coder agent adds a workflow on source_repo that runs on an agent branch via any non-PR-gated trigger (on: push, on: create, on: workflow_run, etc.), what credentials could that workflow obtain?"
| **In scope — enumerate and allowlist-check:** | ||
|
|
||
| 1. **Repository Actions secrets** — `GET /repos/{owner}/{repo}/actions/secrets` | ||
| 2. **Organization Actions secrets visible to the repository** — `GET /repos/{owner}/{repo}/actions/organization-secrets` |
There was a problem hiding this comment.
[medium] "fail closed when policies are permissive" is underspecified — what counts as "permissive" for agent branches?
GitHub environment deployment-branch policies can be: (a) no restriction / "All branches", (b) "Protected branches only", (c) specific branch name patterns. The ADR should define which configurations are permissive in the context of agent branches. For example: fail closed if policy is "All branches" or includes patterns matching agent branch naming (fullsend/*, *, etc.). Environments restricted to specific protected branches where agent branches cannot match could be excluded.
Also: should environment protection rules (required reviewers, wait timers) factor in, or are they out of scope since a rogue workflow could bypass them with the right token?
| **Gate:** `grant_workflows = true` only when (1)–(3) pass the allowlist **and** (4) passes. On success, mint with `workflows: write` and add `fullsend-workflow-write` on the PR. On failure, mint without it; workflow output names unexpected secrets and/or which non-read-only `GITHUB_TOKEN` default blocked the grant. | ||
|
|
||
| ### When assessment runs | ||
|
|
There was a problem hiding this comment.
[medium] Label provenance verification via "issue labeled timeline" is underspecified. The ADR should clarify:
- Which event is authoritative? If a label is added by an unauthorized user, removed, then re-added by an authorized user (or vice versa), does the check use the most recent
labeledevent or any matching event? - Timing: Is the actor's association (OWNER/MEMBER/COLLABORATOR) checked at the time the label was applied, or at workflow run time? Associations can change between the two.
- Label removal semantics: If
fullsend-needs-workflow-writeis removed from the issue after assessment passes but before fix runs, does that invalidate the PR label grant? Similarly, does removingfullsend-workflow-writefrom the PR revoke elevation for subsequent fix runs?
Recommend specifying: "The most recent labeled timeline event for the label name must have a trusted actor. Label removal from the PR revokes elevation for subsequent runs."
| **Do not implement until ADR 41 is merged.** Touch triage post-script/agent, assess script/composite, `reusable-code.yml`, `reusable-fix.yml`, mint/coder API permissions as needed, and `config.yaml` v1. | ||
|
|
||
| ## Consequences | ||
|
|
There was a problem hiding this comment.
[medium] The consequences section covers the happy path well but omits key failure modes that implementers will need to handle:
- API errors during assessment — should fail closed (no elevation), but this isn't stated.
- Assessment staleness — secrets or environments can change between the code assessment and subsequent fix runs. The PR label represents a point-in-time grant, not a continuous assessment. Worth noting explicitly.
- UX on assessment failure — what feedback does the coder give when assessment fails? Does it produce a PR without workflow changes? Output which secrets blocked elevation? Both?
- Triage guidance — triage agent guidance and
label_actionsconfig need updating to recognize when issues require workflow edits; failure to label means the coder can't push workflow changes, leading to a failed run that must be retried after manual labeling.
|
Following @ralphbean comment, it seems the direction I was going for here has some major security flaws in it and it does not sufficiently address the code injection risk. I am moving the PR to draft for now to reconsider the approach. |
|
As mentioned before, the approach here was troublesome. PR #2548 was yet another attempt to fix the underlying issue (Now closed as well, but with a suggested plan at the closing comment) |
|
🤖 Finished Retro · ✅ Success · Started 9:59 AM UTC · Completed 10:05 AM UTC |
Retro: PR #1739 — ADR 0044 (gate
|
Summary
Proposes ADR 0044 (Proposed): a minimal-privilege path to grant the coder/fix GitHub App
workflows: writeonly when target-repo workflow exposure is acceptable — without implementing the full repo-readiness checklist from #788.Background / why this ADR exists
.github/workflows/changes today; GitHub requiresworkflowspermission (see failed push in that issue).workflows: writewithout repo hardening enables prompt-injection via newon: pushworkflows on agent branches; issue argues for a gate before elevation. PR #787 was closed pending this work.reusable-code.yml/reusable-fix.ymlneeds stable dispatch.Design discussion refined the #788 “full assessment” into a narrower, deterministic gate:
contents: writealone does not allow authoring workflow files;workflows: writeis the distinct escalation.source_repo: repository secrets, org secrets visible to the repo, environment secrets (with fail-closed deployment-branch rules), plus defaultGITHUB_TOKENworkflow permissions.fullsend-needs-workflow-writeon the issue; builtins (FULLSEND_GCP_*,FULLSEND_DISPATCH_TOKEN) are hard-allowlisted; admins may extend via explicitworkflow_secret_allowlistinconfig.yaml(noFULLSEND_*prefix wildcard).fullsend-workflow-writefrom coder (post-assess) or an authorized human (OWNER | MEMBER | COLLABORATOR).Test plan
pre-commitADR linters pass on the new fileMade with Cursor