Follow-up to #238 (issue #129)
While hardening instance/node ID collisions in #238, the reviewers and I agreed to defer three small, independent items so they didn't expand that PR. Filing them together here. None is a correctness defect in the shipped change — they are test-coverage and operability improvements. They can be picked up independently.
Related: #129 (origin), #238 (the change), #241 (schema-snapshot gate follow-up), #261 (ID pruning, future).
1. Focused regression test for update_node_status instance scoping + placeholder/bind ordering (SF5)
Where: src/activities/update_node_status.rs
Problem. The activity builds three dynamic SQL shapes with conditional placeholder numbering — result-present uses $4, while running-no-result and non-running use $3 — and assembles the binds afterward. #238 added a single-row assertion (expected exactly 1) that now catches a wrong cardinality (0 or >1 rows), but it does not catch a wrong placeholder/bind that still updates exactly one row with the wrong column value. The only workflow that exercises this path (tests/e2e/sql/51_node_composite_pk.sql) hits just the happy path and never drives the no-result / non-running branches, so a placeholder↔bind regression in those branches could land silently.
Suggested action. Add a focused regression test that exercises all three branches and asserts the resulting row values (status, result, updated_at), plus behavior over duplicate (instance_id, id) fixtures (one matching instance, one decoy) to prove the instance_id scope is honored. Equivalently, extract the SQL/param-shape construction into a pure helper and unit-test the exact SQL string + ordered param roles for each branch.
2. Friendlier upgrade abort on a pre-0.2.2 NULL instance_id row (C4)
Where: sql/pg_durable--0.2.3--0.2.4.sql (the ALTER COLUMN instance_id SET NOT NULL in the nodes_pkey promotion block, ~L219–220)
Problem. nodes_instance_id_present_chk is NOT VALID, so it does not cover rows written before 0.2.2. If such a database still holds a NULL-instance_id node row, SET NOT NULL aborts with the bare native error column "instance_id" ... contains null values. The abort is safe — it rolls back atomically and the extension stays at 0.2.3 — but the remediation (backfill or remove the offending rows) is only described in a SQL comment and in docs/upgrade-testing.md, so the operator sees a cryptic message with no inline guidance.
Suggested action. Precede the SET NOT NULL with a DO block that counts offending rows and RAISE EXCEPTION with the offending count and the backfill instruction, preserving the atomic rollback. Add an upgrade-test fixture (a seeded NULL-instance_id row) asserting the clear error is raised and the transaction rolls back.
3. Residual: assert the real short_id() format invariant (SF6 — mostly done in #238)
Where: src/types.rs (short_id), src/dsl.rs (pick_id_with_retry + its unit tests)
Status. The core of the original SF6 finding was already addressed in #238: ID selection was extracted into the shared pick_id_with_retry helper, which now has unit tests for first-try success, re-roll on collision, exhaustion-without-returning-an-unverified-id, and claim-error propagation. This item is the small leftover only.
Problem. Those unit tests inject fake ID generators, so they never assert that the real short_id() actually produces the ^[0-9a-f]{8}$ shape the schema and retry logic assume. A future change to short_id() (length/charset) would not be caught.
Suggested action. Add a unit test asserting short_id() matches ^[0-9a-f]{8}$. Optionally, a light e2e assertion on the format of a freshly generated instance ID.
Acceptance criteria
Follow-up to #238 (issue #129)
While hardening instance/node ID collisions in #238, the reviewers and I agreed to defer three small, independent items so they didn't expand that PR. Filing them together here. None is a correctness defect in the shipped change — they are test-coverage and operability improvements. They can be picked up independently.
Related: #129 (origin), #238 (the change), #241 (schema-snapshot gate follow-up), #261 (ID pruning, future).
1. Focused regression test for
update_node_statusinstance scoping + placeholder/bind ordering (SF5)Where:
src/activities/update_node_status.rsProblem. The activity builds three dynamic SQL shapes with conditional placeholder numbering — result-present uses
$4, while running-no-result and non-running use$3— and assembles the binds afterward. #238 added a single-row assertion (expected exactly 1) that now catches a wrong cardinality (0 or >1 rows), but it does not catch a wrong placeholder/bind that still updates exactly one row with the wrong column value. The only workflow that exercises this path (tests/e2e/sql/51_node_composite_pk.sql) hits just the happy path and never drives the no-result / non-running branches, so a placeholder↔bind regression in those branches could land silently.Suggested action. Add a focused regression test that exercises all three branches and asserts the resulting row values (
status,result,updated_at), plus behavior over duplicate(instance_id, id)fixtures (one matching instance, one decoy) to prove theinstance_idscope is honored. Equivalently, extract the SQL/param-shape construction into a pure helper and unit-test the exact SQL string + ordered param roles for each branch.2. Friendlier upgrade abort on a pre-0.2.2 NULL
instance_idrow (C4)Where:
sql/pg_durable--0.2.3--0.2.4.sql(theALTER COLUMN instance_id SET NOT NULLin thenodes_pkeypromotion block, ~L219–220)Problem.
nodes_instance_id_present_chkisNOT VALID, so it does not cover rows written before 0.2.2. If such a database still holds a NULL-instance_idnode row,SET NOT NULLaborts with the bare native errorcolumn "instance_id" ... contains null values. The abort is safe — it rolls back atomically and the extension stays at 0.2.3 — but the remediation (backfill or remove the offending rows) is only described in a SQL comment and indocs/upgrade-testing.md, so the operator sees a cryptic message with no inline guidance.Suggested action. Precede the
SET NOT NULLwith aDOblock that counts offending rows andRAISE EXCEPTIONwith the offending count and the backfill instruction, preserving the atomic rollback. Add an upgrade-test fixture (a seeded NULL-instance_idrow) asserting the clear error is raised and the transaction rolls back.3. Residual: assert the real
short_id()format invariant (SF6 — mostly done in #238)Where:
src/types.rs(short_id),src/dsl.rs(pick_id_with_retry+ its unit tests)Status. The core of the original SF6 finding was already addressed in #238: ID selection was extracted into the shared
pick_id_with_retryhelper, which now has unit tests for first-try success, re-roll on collision, exhaustion-without-returning-an-unverified-id, and claim-error propagation. This item is the small leftover only.Problem. Those unit tests inject fake ID generators, so they never assert that the real
short_id()actually produces the^[0-9a-f]{8}$shape the schema and retry logic assume. A future change toshort_id()(length/charset) would not be caught.Suggested action. Add a unit test asserting
short_id()matches^[0-9a-f]{8}$. Optionally, a light e2e assertion on the format of a freshly generated instance ID.Acceptance criteria
update_node_statushas a focused test covering all three SQL branches, asserting correct row values, plus correct scoping over duplicate(instance_id, id)fixtures. (SF5)instance_idnode row raises a clear, actionable error (offending count + backfill instruction) and rolls back atomically; covered by an upgrade-test fixture. (C4)short_id()'s^[0-9a-f]{8}$format invariant is unit-tested. (SF6 residual)