Skip to content

Bug: Fix CI Error In Production#557

Merged
mtcarlone merged 3 commits into
mainfrom
bug/fix-ci
May 18, 2026
Merged

Bug: Fix CI Error In Production#557
mtcarlone merged 3 commits into
mainfrom
bug/fix-ci

Conversation

@mtcarlone
Copy link
Copy Markdown
Contributor

Post-merge CI fixes: lint, SQL Server, Node 20 mitigation

Cleanup PR for issues surfaced by the first real main.yml run after the
CI rework landed. Three independent fixes:

1. Fix lint violations in dim_dbt__current_models + staging models

The new Tier 2 lint job caught real style debt the prior CI was either
tolerating or never running cleanly. Two rules involved:

  • LT08 (layout.cte_newline) — 8 violations across 7 files.
    Auto-fixed by sqlfluff fix: blank line inserted after each inline
    CTE closing bracket.
  • RF02 (references.qualification) — 1 violation in
    dim_dbt__current_models.sql. Resolved by aliasing the inner
    subquery's base table to b and qualifying the reference there.
    Outer WHERE stays unqualified to satisfy RF03.

No SQL semantics changed — layout and reference-style only.

2. Fix compose-up.sh for short-lived init containers

The Tier 2 SQL Server lane was failing even though init.sql ran
successfully. Root cause: docker compose up --wait treats not running
as a failure regardless of exit code, so a container that runs to
completion makes --wait return non-zero.

This worked locally because the configurator was still mid-execution
when --wait polled — a timing race that masked the bug on dev
laptops. CI runners are fast enough to lose the race.

Fix: split the service list in scripts/ci/compose-up.sh into
long-running (use --wait) vs. init containers (use docker compose wait — a different command that blocks on container exit and
surfaces the exit code). Also retracted a wrong comment in compose.yml.

3. Mitigate Node 20 deprecation warning

GitHub Actions is migrating its default JavaScript runtime from Node
20 to Node 24. Our SHA-pinned action versions
(actions/checkout@v4, astral-sh/setup-uv@v3,
google-github-actions/auth@v2) are Node 20-based and would fail
after the June 2, 2026 default flip.

Added ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: "true" at workflow
scope in all four CI workflows. Keeps things running through the flip
and until we regression-test against Node 24-compatible action
versions.

⚠ Hard deadline: 2026-09-16. On that date Node 20 is removed
entirely. Before then, bump action SHAs to Node 24-compatible
releases and remove the env block. Tracked in
specs/ci-rework/README.md §12.5. Dependabot should surface the
bumps automatically.


Test plan

  • ./scripts/ci/lint.sh clean locally
  • ./scripts/ci/test.sh sqlserver passes end-to-end locally (new
    init-container path)
  • All four workflow files validate as YAML
  • First push to main after merge: lint and SQL Server lanes
    both green (previously both red)
  • Node 20 warning persists (expected — the actions themselves
    didn't change) but workflows don't fail when the June flip
    lands

mtcarlone added 3 commits May 18, 2026 17:25
Surfaced by the new Tier 2 lint job — the prior CI was either
tolerating these or never ran sqlfluff cleanly against the models.
The Tier 2 SQL Server lane was failing on CI even though the
sqlserver-configurator container ran init.sql successfully and
exited 0. Root cause: `docker compose up --wait` treats "not
running" as a failure regardless of exit code, so a container
that runs to completion makes --wait return non-zero.

This worked on developer laptops because the configurator was
still mid-execution when --wait polled — a timing race that
masked the bug. CI runners are fast enough to lose the race.

Fix: split the service list in scripts/ci/compose-up.sh into
long-running services (healthchecked, brought up with --wait)
and init containers (brought up plain, then waited on with
`docker compose wait`, a separate command that blocks on exit
and surfaces the exit code).

Convention used: anything matching `*-configurator` is treated
as an init container. Easy to extend if other init sidecars are
added later.

Also retracted a wrong assertion in compose.yml — the comment on
sqlserver-configurator claimed --wait treats exit-0 as success.
It does not. Replaced with an accurate description of the
constraint and a pointer to how compose-up.sh handles it.

Verified locally:
./scripts/ci/test.sh sqlserver   # PASS=49 ERROR=0
GitHub Actions is migrating its default JavaScript runtime from
Node 20 to Node 24. The currently SHA-pinned action versions we
use (actions/checkout@v4, astral-sh/setup-uv@v3,
google-github-actions/auth@v2) are Node 20-based and would fail
after the June 2, 2026 default flip.

Set `ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true` at workflow
scope in all four CI workflows (pr.yml, main.yml, release.yml,
cut-release-candidate.yml). This keeps the existing pins working
through the June flip and until we have time to regression-test
the workflows against Node 24-compatible action versions.

⚠ Hard deadline: 2026-09-16. On that date GitHub removes Node 20
from the runner entirely. The env var stops working then. Before
that date, bump each action SHA to a Node 24-compatible release,
verify all four workflows pass, then remove the env block from
each file.

Tracked in specs/ci-rework/README.md §12.5 alongside the existing
operational debt items. Dependabot (already configured) should
surface compatible bumps automatically once the upstream actions
tag Node 24 releases.
@mtcarlone mtcarlone merged commit 6d37fa0 into main May 18, 2026
3 checks passed
@mtcarlone mtcarlone deleted the bug/fix-ci branch May 18, 2026 21:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant