Skip to content

Feature: Rewrite CI for dbt_artifacts and close pull_request_target vulnerability#554

Merged
mtcarlone merged 16 commits into
mainfrom
feat/improve-ci
May 18, 2026
Merged

Feature: Rewrite CI for dbt_artifacts and close pull_request_target vulnerability#554
mtcarlone merged 16 commits into
mainfrom
feat/improve-ci

Conversation

@mtcarlone
Copy link
Copy Markdown
Contributor

Security: rework CI to close pull_request_target vulnerability

TL;DR

Replaces the entire CI surface with a three-tier model gated by event
type and branch protection. Closes a live pull_request_target
vulnerability
that had been exposing warehouse credentials to PR code
on every PR, and restores integration test signal that has been broken
since the prior CI was disabled.


Security significance — read this first

Two workflow files were live on main and firing on every PR with full
warehouse credentials in scope:

  • .github/workflows/ci_test_latest_version_on_feature_branch.yml
  • .github/workflows/ci_test_supported_versions_on_feature_branch.yml

Both used pull_request_target (runs in the base repo context, with
secrets) and then checked out the PR head SHA. That gives
attacker-controlled PR code direct execution against Snowflake / GCP
Workload Identity Federation / Databricks
credentials.


The new model

Three tiers, gated by GitHub event:

Tier Trigger Secrets What runs
1pr.yml pull_requestmain None Postgres + Trino + SQL Server in Docker
2main.yml pushmain, manual Yes Tier 1 + lint + Snowflake + BigQuery on latest dbt
3release.yml pushrelease-candidate/**, manual, weekly Yes Tier 2 + 42-entry warehouse × dbt-version matrix

The critical change: Tier 1 uses pull_request (not
pull_request_target). Secrets are structurally absent from the
runner — not gated by approval, just not there. Branch protection on
main is the trust gate before Tier 2/3 can run.

All third-party actions SHA-pinned (not tag-pinned) for supply-chain
safety. Dependabot configured for weekly grouped re-resolution.


What's in this PR

  • scripts/ci/ — single source of truth for test execution. Local
    dev and CI both invoke the same scripts.
  • compose.yml — Postgres / Trino / SQL Server with healthchecks
    and shifted host ports (55432 / 58080 / 51433) to avoid clashing
    with developer-local services.
  • tox.ini — modernized: added dbt 1.10 / 1.11 envs, tightened
    adapter caps to <2.0.0, bumped sqlfluff to latest 3.x.
  • Three new workflows plus cut-release-candidate.yml (a
    workflow_dispatch cutter that bumps version + creates a
    release-candidate branch + pushes; auto-bump, not auto-merge).
  • .github/CODEOWNERS + .github/dependabot.yml for
    defense-in-depth on sensitive paths and weekly action-SHA updates.
  • Tracked pyproject.toml + uv.lock — both were gitignored.
    Without this fix, every workflow we added would have failed on the
    first CI run because uv sync would have had nothing to sync from.
    Latent P0 surfaced and resolved in passing.
  • Trino microbatch model gated with enabled = target.type != 'trino' (memory connector doesn't support MERGE).
  • pyarrow[pandas] extra dropped — extra doesn't exist; was
    emitting a uv warning.
  • Removed: Clean up of in progress files, not-secure actions
  • Docs: new specs/ci-rework/README.md (design + threat model +
    tech debt backlog), new docs/dev-workflow.md (feature → tag
    walkthrough), rewritten CONTRIBUTING.md, refreshed
    docs/MAINTAINERS.md (with organisation-specific identifiers
    scrubbed so the file is usable as a template), updated README.md
    badges.

How to review this

Suggested reading order for design context: specs/ci-rework/README.md
first, then commit-by-commit.


⚠ Action required before / at merge

  1. Update branch protection on main to reference the new Tier 1
    status checks (integration-local for postgres / trino /
    sqlserver) rather than the deleted main_test_package.yml jobs.
  2. Consider flagging this as security-relevant in the release notes
    for whatever release ships it

What's deferred (documented in specs/ci-rework/README.md §12)

  • Drop EOL dbt versions (1.3-1.6) → planned for 3.0.0. Draft
    public announcement included in §12.3.1 of the spec.
  • sqlfluff 4.x bump → coordinate with 3.0.0; migration plan in
    §12.2.1.

Test plan

  • Live-verified ./scripts/ci/test.sh postgres (PASS=49, ERROR=0)
  • Live-verified ./scripts/ci/test.sh trino (PASS=48, ERROR=0)
  • Live-verified ./scripts/ci/test.sh sqlserver (PASS=49, ERROR=0)
  • Live-verified ./scripts/ci/test.sh postgres 1_10_0 (new pinned
    env)
  • Validated pr.yml, main.yml, release.yml,
    cut-release-candidate.yml via act --list and YAML parsing
  • First real PR push tests pr.yml end-to-end on GitHub
  • First push to main after merge tests main.yml end-to-end on
    GitHub (incl. WIF for BigQuery)
  • First release-candidate/X.Y.Z push tests release.yml
    end-to-end

The scripts/ directory holds the test driver layer that local
development and CI workflows both invoke. No separate "how CI runs
tests" implementation that drifts from "how I run tests locally."

  setup.sh, compose-up.sh, compose-down.sh,
  lint.sh, test.sh, test-all-local.sh, _lib.sh, README.md

compose.yml updates:
- Healthchecks on postgres / trino / sqlserver so `compose up --wait`
  is deterministic; old 15-second sleep workaround no longer needed.
- Host ports shifted to 55432 / 58080 / 51433 to avoid clashing with
  local Postgres / web apps / SQL Server. profiles.yml updated to
  match.
- Image tags pinned (postgres:15-alpine, trinodb/trino:476).
- Broken sqlserver post_start removed (was installing ODBC inside
  the container, where it's useless — the driver belongs on the host).

tox.ini modernization:
- Add envs for dbt 1.10 and 1.11 (snowflake/bigquery/databricks).
  Postgres/Trino get 1.10 only; SQL Server stays at 1.9 (latest).
- Tighten unversioned envs from <3.0.0 to <2.0.0.
- Bump sqlfluff range from ~=3.0.0 to ~=3.0; companion dbt-snowflake
  ~=1.11.0 (was stale at 1.8.0).
- Commented stub for dbt Fusion.

integration_test_project/models/microbatch.sql:
- Gate with `enabled = target.type != 'trino'`. Trino's memory
  connector doesn't support row-level MERGE which the microbatch
  incremental_strategy requires.
Both files were gitignored. The new scripts/ci/setup.sh delegates
to `uv sync`, which needs pyproject.toml + uv.lock to install deps.
Without this change, every workflow we just added would have failed
on its first CI run — a freshly-checked-out runner would have had
nothing to sync from.

Standard pattern for any uv-managed project: track the manifest and
the lock file so fresh clones reproduce the exact same dep tree.

.gitignore changes:
- Removed `*.lock` (was catching uv.lock; Pipfile.lock is still
  explicitly listed)
- Removed `pyproject.toml`
- Added `__pycache__/` and `*.pyc` (for Python helper scripts)

pyproject.toml: pyarrow[pandas]<19.0.0 -> pyarrow<19.0.0. The
[pandas] extra doesn't exist on pyarrow and emitted a uv warning.
go-task-bin removed (superseded by scripts/ci/; Taskfile.yml deleted
in a later commit).
pr.yml runs on every pull_request to main, including PRs from forks.
Uses the `pull_request` event (NOT `pull_request_target`) so secrets
are structurally absent from the runner — not gated by approval,
absent entirely.

Jobs: matrix of postgres / trino / sqlserver via compose, latest dbt
only. fail-fast: false. concurrency cancels in-progress on PR push.

permissions: contents: read only. No id-token, no escalation.

Actions SHA-pinned (not tag-pinned):
  actions/checkout    @34e114876b0b11c390a56381ad16ebd13914f8d5 (v4)
  astral-sh/setup-uv  @caf0cab7a618c569241d31dcd442f54681755d39 (v3)

Lint is intentionally deferred to Tier 2 — the package's models call
adapter methods at compile time and the sqlfluff dbt-templater needs
a real Snowflake connection. See specs/ci-rework.md §9.
main.yml runs on push to main and workflow_dispatch. First workflow
that touches secrets. The trust gate is branch protection on main —
by the time this fires, a human has signed off on the diff via PR.

Four jobs with per-job secret minimization:
  lint                  Snowflake creds only
  integration-local     No secrets — re-runs Tier 1 matrix on
                        post-merge code to catch merge-resolution
                        issues PR-tier couldn't see
  integration-snowflake Snowflake creds only
  integration-bigquery  GCP_PROJECT + id-token: write (only job
                        with id-token; for Workload Identity
                        Federation)

GCP WIF preserved from pre-rework workflow — no static keys.

concurrency cancel-in-progress: false. Each merge to main deserves
its own validation.

Adds google-github-actions/auth
  @c200f3691d83b41bf9bbd8638997a462592937ed (v2)
release.yml: validation gate before tagging. Fires on push to
release-candidate/**, workflow_dispatch, and a weekly schedule
(Mondays 06:00 UTC) for regression detection against main.

Three jobs:
  lint                          Same as Tier 2
  version-matrix                42 entries (5 unversioned "latest" +
                                37 pinned warehouse x dbt-version).
                                max-parallel: 8 to throttle Snowflake
                                load; runs in ~30-40 min.
  integration-databricks-stub   Visible-but-skipped placeholder

id-token: write granted only at the matrix job level for the
conditional bigquery WIF step.

CODEOWNERS: defense-in-depth review requirement on .github/,
scripts/ci/, compose.yml, tox.ini, pyproject.toml, uv.lock, package
metadata, and specs/. NOTE: the team handle
@brooklyn-data/dbt-artifacts-maintainers is a PLACEHOLDER — replace
with the real team slug before relying on it.

dependabot.yml: weekly grouped action SHA updates. Closes the
maintenance gap that SHA-pinning otherwise creates. Python deps
intentionally NOT auto-bumped (dbt/adapter versions are test
surface area; need deliberate tox.ini coordination).
scripts/release/cut-candidate.py: reads current version from
dbt_project.yml, computes the bump (--patch / --minor / --major /
--version X.Y.Z), updates dbt_project.yml + README.md quickstart
example, creates release-candidate/X.Y.Z, commits, pushes. The
push fires Tier 3 (release.yml) for the full matrix.

Safety guards: must be on main, working tree must be clean, target
branch must not already exist locally or on origin.

.github/workflows/cut-release-candidate.yml: workflow_dispatch
shim. Inputs: bump type (patch/minor/major) or explicit X.Y.Z.
permissions: contents: write (needs to push the new branch).

Auto-bump, not auto-merge: the maintainer drives the tag and any
merge-back to main manually after reviewing Tier 3 results. Same
script runs locally: `./scripts/release/cut-candidate.py --minor`.
Two workflow files were live on main and firing on every PR with
full warehouse credentials in the runner environment:

  ci_test_latest_version_on_feature_branch.yml
  ci_test_supported_versions_on_feature_branch.yml

Both used `pull_request_target` (runs in base repo context, with
secrets in scope) and then checked out the PR head SHA via
`ref: ${{ github.event.pull_request.head.sha }}`. That gives
attacker-controlled PR code direct execution against Snowflake /
GCP Workload Identity Federation / Databricks credentials.

The state was incorrectly believed to be "disabled" because prior
commits had commented out the OTHER test workflows in this
directory; these two `*_on_feature_branch.yml` files were missed.

This commit removes both. The prior commits in this PR added the
replacement CI surface (pr.yml + main.yml + release.yml) using the
`pull_request` event, which is structurally incapable of leaking
secrets to PR code. Fork contributors keep PR signal through the
new Tier 1 workflow.

If a security audit asks "when was the pull_request_target exposure
closed?" — this commit.
Cleanup of files made redundant by the new scripts/ci/ + tiered-
workflow design. None of these are referenced by anything that
remains in the repo.

Deleted:

  .github/workflows/ci_test_package.yml             (commented out)
  .github/workflows/main_test_package.yml           (commented out)
  .github/workflows/main_test_latest_version.yml    (commented out)
  .github/workflows/main_test_supported_versions.yml (commented out)
  .github/workflows/main_lint_package.yml           (active; replaced
                                                     by main.yml's
                                                     lint job)
  Taskfile.yml                                      (parallel
                                                     orchestrator;
                                                     superseded by
                                                     scripts/ci/)
  integration_test_project/run_tests.sh             (replaced by
                                                     scripts/ci/
                                                     test-all-local.sh)
  integration_test_project/docker-compose.yml.bak   (stale duplicate)
  init-scripts/progress.sh                          (15s sleep loop;
                                                     replaced by
                                                     docker
                                                     healthchecks)

Post-cleanup `.github/workflows/` contains exactly four files:
pr.yml, main.yml, release.yml, cut-release-candidate.yml, plus the
unchanged publish_docs_on_release.yml.
Captures the design rationale, threat model, and tech debt backlog
for the three-tier CI rework that landed in the preceding commits.
Also refreshes every contributor-facing doc to match the new flow,
and scrubs organisation-specific identifiers so the docs are usable
as a template by anyone forking the package.

New files
---------

  specs/ci-rework/README.md
  Durable design record. 13 sections covering background, threat
  model (why pull_request_target was unsafe and how the new design
  closes the exposure structurally), the three-tier model, single
  source of truth in scripts/ci/, branching model, per-step
  implementation history, and a tech debt backlog with two
  prepared deprecation plans:

      §12.3.1  dbt 1.3-1.6 EOL drop  (bundled with 3.0.0)
      §12.2.1  sqlfluff 4.x bump      (bundled with 3.0.0)

    The EOL section includes a draft public announcement that can be
    lifted verbatim into release notes / a GitHub Discussion when
    ready.

    Layout convention: specs/<topic>/README.md lets each spec grow
    to include related artifacts (diagrams, draft announcements)
    without cluttering the top-level specs/ dir.

  docs/dev-workflow.md
  End-to-end walkthrough: feature branch -> PR -> main ->
  release-candidate -> tag. Workflow-to-tier table, hotfix flow,
  weekly regression detail, cross-links to the spec.

Updated files
-------------

  CONTRIBUTING.md
  Rewritten for the uv + scripts/ci/ flow. Adds a "what CI will
  and won't do on your PR" section so fork contributors understand
  the Tier 1 scope (no lint, no cloud DWH — by design, not
  oversight).

  README.md
  Badges replaced (the old ones linked to now-deleted workflows).
  Contributing section expanded with pointers.

  docs/MAINTAINERS.md
    - Removed the dead "Approve Integration Tests" section (that
      deployment environment is gone with the new tiered CI).
    - Rewrote the release procedure for the release-candidate/X.Y.Z
      flow.
    - Scrubbed organisation-specific identifiers (internal Slack
      channel names, account/project names) and replaced with
      placeholders so the file is usable as a template.

  scripts/ci/README.md
  .github/dependabot.yml
  .github/workflows/{pr,main,release}.yml
  Link / comment path updates pointing at the new
  specs/ci-rework/README.md location.

  .gitignore
  Adds CLAUDE.md (AI-assistant orientation file kept local while
  the team decides between AI tools; trivial to track later by
  removing the line or `git add -f CLAUDE.md`).
@mtcarlone mtcarlone merged commit 2392b16 into main May 18, 2026
3 checks passed
@mtcarlone mtcarlone deleted the feat/improve-ci branch May 18, 2026 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant