perf(reparent): bulk SQL + concurrency hardening (4/4)#468
Conversation
🔐 Codex Security Review
Review SummaryOverall Risk: MEDIUM Findings[MEDIUM] Rack reparent lock does not serialize same-device moves
NotesNo auth, SQL injection, command injection, frontend XSS, infrastructure, protobuf wire-format, Rust plugin, or pool-hijack issues were evident in the reviewed diff. The SQL added here uses sqlc parameters rather than interpolation. Generated by Codex Security Review | |
9b93a22 to
a5f761f
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a5f761f9eb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
a5f761f to
95fa4f3
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 95fa4f3262
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
95fa4f3 to
2f1de48
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2f1de48a93
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
2f1de48 to
7cfa2ea
Compare
7cfa2ea to
e4b6b95
Compare
There was a problem hiding this comment.
Pull request overview
This PR completes the reparent performance/concurrency work by introducing bulk SQL query variants to eliminate N+1 write patterns across building/site/rack reparent flows, adding a deadlock-preventing rack lock pre-pass for AssignDevicesToRack, and simplifying the ProtoFleet building management save flow to rely on server-side two-pass ordering.
Changes:
- Added bulk sqlc query family for building→site and rack placement/site cascade updates, reducing per-item DB writes to constant-count bulk statements.
- Hardened
AssignDevicesToRackconcurrency by locking the union of source + target rack IDs in a single globally-sortedFOR UPDATEacquisition. - Collapsed
ManageBuildingModalsave into serialized dispatch + single in-building batch, relying on server-side pass1(clear)/pass2(place) semantics.
Reviewed changes
Copilot reviewed 19 out of 26 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| server/sqlc/queries/site.sql | Adds bulk cascade queries for racks/devices under multiple buildings. |
| server/sqlc/queries/device_set.sql | Adds bulk rack placement updates, bulk device site cascade, and LockRacksForReparent. |
| server/sqlc/queries/building.sql | Adds bulk pass-1 clear and pass-2 place queries for rack grid positions; adds bulk building→site update. |
| server/internal/handlers/sites/handler_test.go | Updates handler tests to expect bulk store calls for building/site and rack/site flows. |
| server/internal/handlers/deviceset/handler_test.go | Updates handler tests to include the new rack-lock pre-pass call ordering. |
| server/internal/handlers/buildings/handler_test.go | Updates handler tests to expect bulk rack placement + bulk position clear/place calls. |
| server/internal/domain/stores/sqlstores/site.go | Implements bulk site-store methods bridging to new sqlc queries. |
| server/internal/domain/stores/sqlstores/collection.go | Implements bulk rack placement and cascade helpers + rack lock pre-pass. |
| server/internal/domain/stores/sqlstores/building.go | Implements bulk clear/place wrappers, including argument length validation. |
| server/internal/domain/stores/interfaces/site.go | Extends SiteStore interface with bulk building/rack/device cascade methods. |
| server/internal/domain/stores/interfaces/collection.go | Extends CollectionStore interface with bulk placement/cascade + rack lock pre-pass. |
| server/internal/domain/stores/interfaces/building.go | Extends BuildingStore interface with bulk clear/place grid operations. |
| server/internal/domain/stores/interfaces/mocks/mock_site_store.go | Regenerates mocks to include new SiteStore bulk methods. |
| server/internal/domain/stores/interfaces/mocks/mock_collection_store.go | Regenerates mocks to include new CollectionStore bulk methods. |
| server/internal/domain/stores/interfaces/mocks/mock_building_store.go | Regenerates mocks to include new BuildingStore bulk methods. |
| server/internal/domain/sites/service.go | Refactors AssignBuildingsToSite and AssignRacksToSite to lock sequentially then write in bulk. |
| server/internal/domain/sites/service_test.go | Adds/updates service tests to assert single bulk write behavior and rollback semantics. |
| server/internal/domain/collection/service.go | Adds rack lock pre-pass to AssignDevicesToRack to prevent deadlocks under concurrent reparents. |
| server/internal/domain/collection/service_test.go | Adds tests pinning lock-order invariants and target-inclusion behavior. |
| server/internal/domain/buildings/service.go | Refactors AssignRacksToBuilding to “lock/read phase” then bulk writes + two-pass position updates. |
| server/internal/domain/buildings/service_test.go | Adds swap/mixed clear+place tests and large-batch “single bulk write” guards. |
| client/src/protoFleet/features/buildings/components/ManageBuildingModal/ManageBuildingModal.tsx | Simplifies save flow to a single in-building batch + serialized unassign-before-assign dispatch. |
| server/generated/sqlc/site.sql.go | Generated sqlc output for new site bulk queries (generated — skip). |
| server/generated/sqlc/device_set.sql.go | Generated sqlc output for new bulk placement/cascade + lock query (generated — skip). |
| server/generated/sqlc/building.sql.go | Generated sqlc output for new bulk building update + clear/place queries (generated — skip). |
| server/generated/sqlc/db.go | Generated statement prepare/close wiring for new sqlc queries (generated — skip). |
Files not reviewed (3)
- server/internal/domain/stores/interfaces/mocks/mock_building_store.go: Generated file
- server/internal/domain/stores/interfaces/mocks/mock_collection_store.go: Generated file
- server/internal/domain/stores/interfaces/mocks/mock_site_store.go: Generated file
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 752506aba1
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…ForReparent Adds bulk SQL queries used by the upcoming AssignRacksToBuilding, AssignBuildingsToSite, AssignRacksToSite refactors and the AssignDevicesToRack lock-pre-pass: - device_set.sql: UpdateRackPlacementBulkForBuilding, UpdateRackPlacementBulkForSite, CascadeRackDeviceSitesBulk, LockRacksForReparent (deadlock-safe via UNION + ORDER BY id). - building.sql: SetRackBuildingPositionBulkClear, SetRackBuildingPositionBulkPlace (parallel arrays + generate_subscripts), AssignBuildingsToSiteBulk. - site.sql: ReassignRacksUnderBuildingsBulk, ReassignDevicesUnderBuildingsBulk. Regenerated sqlc bindings and mocks. Adds store interface + sqlstore wrappers around the new queries. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Refactors AssignRacksToBuilding, AssignBuildingsToSite, and AssignRacksToSite to split into Phase A (sequential per-row locks in sorted order, deadlock-safe) and Phase B (bulk writes for the classified set). - AssignRacksToBuilding: 4 bulk writes (placement, cascade, position clear pass-1, position place pass-2). The explicit clear-before-place ordering preserves F5 swap-safety on the partial unique index. - AssignBuildingsToSite: 3 bulk writes (building.site_id, racks cascade, devices cascade). - AssignRacksToSite: 1 bulk placement + 1 bulk cascade, filtered to site-changed racks. Adds large-batch + swap-safety tests; updates handler test mock expectations for the new bulk signatures. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nt deadlock
Locks source and target racks together in id-sorted order via the new
LockRacksForReparent SQL query (UNIONs the target rack into the lock
set). Two concurrent AssignDevicesToRack calls moving devices in
opposite directions between the same rack pair now acquire {1, 2} in
{1, 2} order regardless of which is source vs target — no deadlock.
LockRackPlacementForWrite stays on the target path because it also
locks the device_set_rack child row and reads site/building/zone; the
device_set re-acquire is a no-op since the same tx already holds it.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
handleSave now sends a single mixed batch (clears + places) per target building bucket. The server's AssignRacksToBuilding tx runs an internal two-pass write (clear-all then place-all) so the partial unique index uk_device_set_rack_building_position cannot collide on swaps or "move into occupied cell" cases inside one call. Removes the old client-side vacate-then-place split and its "retry to finish saving" partial-failure path. The unassign bucket (target_building_id=undefined) is dispatched sequentially BEFORE the in-building batch so the two concurrent calls cannot race the partial unique index on a cell that's about to be freed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
UpdateRackPlacementBulkForBuilding wrote zone = '' when a rack crossed buildings; the per-row UpdateRackPlacement passed an empty string through toNullString which collapses to SQL NULL. Because collection_sort.go orders racks by zone NULLS LAST, the empty-string result sorts as a real-but-empty zone instead of falling into the NULLS LAST bucket. Switch the bulk CASE to emit NULL explicitly. UpdateRackPlacementBulkForSite unconditionally cleared zone, but the old per-rack AssignRacksToSite path only cleared zone when the rack was actually leaving a building. Racks with building_id IS NULL surface as building_id=0 zone refs via ListRackZoneRefs, and the bulk path was silently wiping that user-curated metadata. Wrap the assignment in a CASE so zone clears (to NULL) only when the rack had a building. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Switch the bulk update from :exec to :execrows and have AssignRacksToBuilding compare the returned count against len(allRackIDs). On mismatch, return NotFound rather than silently dropping the unresolved ids. Defense-in-depth — the existing per-rack LockRackPlacementForWrite pre-pass already errors on missing or cross-org ids before any write runs. The count check locks the contract in case that pre-pass is ever refactored: an UPDATE that touches fewer rows than requested means one or more rack ids didn't resolve to a row in this org, and the caller deserves an error rather than a partial commit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Large floor-plan saves chunk the in-building rack list at RACKS_PER_RPC = 1000 to stay under the proto cap. The server's two-pass clear-then-place ordering only applies WITHIN one AssignRacksToBuilding RPC, so a >1000-rack save where chunk 1 places a rack into a cell chunk 2 hasn't vacated yet would trip uk_device_set_rack_building_position. Split the in-building bucket into vacate entries (no aisle/ position, racks staying in the building but clearing their cell) and place entries (with aisle/position). Dispatch all vacates from both buckets (unassign + in-building vacate) first, then dispatch all places. This preserves the global vacate-before-place invariant across chunk boundaries. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ntracts
LockRacksForReparent UNION semantics: existing tests already cover
the happy path with target>0 and clear-rack path with target=0.
Add two more pins:
* Zero target returns only source rack ids — the SQL UNION's
target arm is filtered by `target_rack_id > 0`, so the store
must return the source-only set and the service must not
dereference a target.
* Cross-org target id surfaces NotFound from the downstream
LockRackPlacementForWrite gate, not from a partial commit. The
empty slice from LockRacksForReparent alone must not let the
call slip past the lock pre-pass.
SetRackBuildingPositionBulkPlace length guard: the wrapper takes
three parallel arrays (rack ids, aisles, positions) and refuses
mismatched lengths before reaching sqlc. A new unit test exercises
both mismatch directions plus the empty-rackIDs short-circuit. The
length check fires before any DB call, so the test runs without a
postgres connection.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
752506a to
b677ef9
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b677ef9a8d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Replace `for i := 0; i < N; i++` loops with `for i := range N` in three bulk-write contract tests (intrange lint rule), and add `#nosec G115` comments on the two `int32(i / 10)` / `int32(i % 10)` narrow conversions in the buildings test — `i` is bounded by the loop's constant N=100, so the conversion can't overflow int32. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The old partition split the in-building bucket into vacate (no aisle/position) vs place (with aisle/position), but a rack moving from cell X to cell Y was only in the place bucket — its old cell didn't clear until the place chunk that owned it ran. A >1000-rack save where chunk 1 placed rack A at cell Y while rack B (currently at Y, moving to Z in chunk 2) hadn't been touched yet would trip uk_device_set_rack_building_position. Build the vacate set from the snapshot instead: any rack with a prior non-unplaced placement gets a synthetic pre-place vacate entry, dedup'd by rackId so we never send two clears for the same rack. The existing two-pass dispatch order (unassign → vacate → place) stays. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
LockRacksForReparent previously locked only device_set rows. A
concurrent SaveRack/DeleteCollection (which calls LockRackPlacementForWrite,
locking {device_set_rack, device_set}) could deadlock against an
AssignDevicesToRack tx holding {device_set} and waiting on
{device_set_rack}, with the other side waiting on the inverse.
Extend the query with a LEFT JOIN to device_set_rack and FOR UPDATE OF
ds, dsr so both code paths acquire the rows in the same order. The
wrapper signature (returning []int64 of rack ids) is unchanged — no
caller updates needed. Add an integration test pinning that the
wrapper still surfaces source + target ids (including a source rack
that has no device_set_rack row — the LEFT JOIN keeps it).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 83802331ab
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Postgres rejects FOR UPDATE on the nullable side of an outer join, so the LEFT JOIN + FOR UPDATE OF ds, dsr combination fails every AssignDevicesToRack request at runtime. Every live rack has a device_set_rack row by lifecycle invariant — LockRackPlacementForWrite also uses an inner join for the same reason — so switching LEFT to INNER preserves the cross-RPC lock-order parity without the SQL error. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0653ff357f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Switching LockRacksForReparent from LEFT to INNER JOIN excludes racks without a device_set_rack row from the lock set. The integration test fixture intentionally created an extension-less rack to pin the LEFT JOIN behavior; with the inner-join shape that rack now drops out of the result. Production racks always carry a device_set_rack row by lifecycle invariant, so the fixture is the only thing that needs to change: rackB now gets a CreateRackExtension call matching rackA and target, and the comment + assertion no longer reference the no-ext case. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7ea8e63392
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Same retry-unwrap issue as the earlier RemoveDevicesFromAnyRack / LockDevicesForReassign fix in PR 2. When LockRacksForReparent is the statement that hits 40P01 or 40001 under concurrent reparent/save traffic, %v hides the underlying *pgconn.PgError inside FleetError with no unwrap cause, so db.WithTransaction's IsRetryablePostgresError errors.As cannot reach it and the tx surfaces an internal error instead of retrying. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d58d8f3240
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
LockRackPlacementForWrite walks the per-rack join FROM device_set_rack dsr JOIN device_set ds. The LockRacksForReparent pre-pass was using the reverse order (FROM device_set ds JOIN device_set_rack dsr) with FOR UPDATE OF ds, dsr. Different table order means the planner can take the per-rack locks in opposite directions across the two code paths — concurrent SaveRack and AssignDevicesToRack on the same rack can hold one table while waiting on the other. Flip the FROM ordering and FOR UPDATE OF clause so the new pre-pass mirrors LockRackPlacementForWrite exactly. The deadlock-prevention guarantee now stands across the source/target lock set AND each rack's joined (dsr, ds) row pair. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The new bulk wrappers added in this PR (UpdateRackPlacementBulkForBuilding, UpdateRackPlacementBulkForSite, CascadeRackDeviceSitesBulk, SetRackBuildingPositionBulkClear/Place, AssignBuildingsToSiteBulk, ReassignRacksUnderBuildingsBulk, ReassignDevicesUnderBuildingsBulk) wrapped their pg errors with %v, dropping the unwrap cause. db.WithTransaction's retry loop uses errors.As(err, &pgErr) to detect 40P01 / 40001 — %v hid the PgError so the retry never fired. Swap to %w on every new bulk wrapper. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DeleteSite uses LockBuildingsBySiteForWrite to row-lock every live building under a site before cascading. PR 4 established the deadlock-safe global lock-order contract (sorted ASC) for AssignBuildingsToSite / AssignRacksToBuilding bulk-lock paths. The existing query had no ORDER BY, so its lock acquisition order was plan-dependent and could disagree with the bulk-lock paths, allowing a deadlock against a concurrent AssignBuildingsToSite touching the same buildings. Add ORDER BY id ASC. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…re error ManageBuildingModal's handleSave dispatches chunked saves sequentially. If chunk N+1 fails after chunk N committed, earlier chunks are persisted but the modal still shows the operator's intended state. The generic error toast hid the partial-commit reality, so operators might dismiss the error instead of refreshing and retrying. Track a savedAtLeastOne flag inside dispatch; when set, surface an explicit "some changes saved before the error — refresh the building view to see the current state, then retry" message instead of the generic one. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The interface doc claimed zone clears for every rack in the bulk site update, but the actual SQL preserves zone when building_id IS NULL (matching the per-row UpdateRackPlacement semantics — zone is building-scoped). Update the comment to match reality so future callers don't assume zone is always wiped. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Part 4 of 4 — Bulk SQL + Concurrency Hardening
1. Summary
Converts the three hierarchy-reparent RPCs (
AssignRacksToBuilding,AssignBuildingsToSite,AssignRacksToSite) from per-row write loops to bulk SQL — collapsing 3N–5N writes per request down to a constant 4 or to N+2/N+3 — while preserving the deadlock-safe per-row lock acquisition the previous PRs in this series established. Also tightensAssignDevicesToRack's cross-RPC lock ordering so concurrent reparent calls between the same rack pair can no longer deadlock against each other or againstSaveRack/DeleteCollection. Plus a client-side change that letsManageBuildingModalsave a building's full rack layout (including cross-chunk swaps) in a single coherent operation.Stack
Diff is relative to
main(all three ancestors merged).Required context from upstream:
AssignDevicesToRack(also from refactor(reparent): atomic reparent foundation (1/4) #463) usesLockRacksForReparentas its lock pre-pass; this PR introduces that query and refines it later in the PR to lock the joineddevice_set_rackrows too.AssignBuildingsToSitecross-collection invariants (rack/device site cascade) andAssignRacksToSitepartial-update semantics (building auto-clear on cross-site moves) — both established by refactor(reparent): atomic reparent foundation (1/4) #463 — must be preserved exactly. The bulk SQL encodes those rules inCASEexpressions instead of per-row branches.ManageBuildingModal's save flow was collapsed to a single RPC in earlier review-fix passes; this PR layers chunking + mover-vacate on top of that single-call shape.Out of scope, lands elsewhere:
AssignDevicesToSiteforce-clear andAssignDevicesToRack(closes the40P01deadlock window completely; today it's recoverable via%wretry, mentioned in feat(sites): atomic cross-site reparent (2/4) #464). Filed as a follow-up.2. How it works
Hierarchy-reparent RPCs: from per-row writes to bulk SQL
Each of the three RPCs follows the same phase shape:
Phase A — sequential per-row lock acquisition (unchanged from #463).
The closure iterates the sorted id list and calls the per-row lock primitive (
LockRackPlacementForWrite/LockBuildingForWrite). Locks are taken one row at a time in ascending id order to keep the deadlock-safe global ordering. No writes happen in this phase — only locks + snapshot reads needed to populate activity-log metadata and to classify each row into "site changed" vs "no-op" buckets.Phase B — bulk writes after every lock is held.
With every row lock in hand, the writes collapse into 1–4 bulk SQL statements operating on the whole batch via
ANY($1::bigint[])predicates. The SQL encodes the per-row branching the old loop did (zone clear on building change, site-id preservation on building-only unassign, position clear on building transition) viaCASEexpressions, so the semantics match the per-row path exactly.The three RPCs differ only in which bulk statements run in Phase B:
AssignRacksToBuildingAssignBuildingsToSiteAssignBuildingsToSiteBulk; (2) bulk rack cascade; (3) bulk device cascade.AssignRacksToSiteUpdateRackPlacementBulkForSite(only racks whose site actually changes); (2) bulk site cascade for the same set.Per-rack
LockRackPlacementForWritestill runs N times in Phase A — that's the deadlock-safety contract. Only the writes are bulk.AssignRacksToBuilding's pass-1/pass-2 splitPass-1 NULLs the position for every rack in the batch via
SetRackBuildingPositionBulkClear. Pass-2 then writes the actual(aisle_index, position_in_aisle)for racks that supplied one viaSetRackBuildingPositionBulkPlace(UNNEST over parallel arrays). Splitting the writes into two passes lets a swap or move-into-occupied-cell request commit without tripping the partial unique indexuk_device_set_rack_building_position— every cell touched by the batch holds NULL before any real position is written.AssignDevicesToRacklock pre-pass — deadlock preventionAssignDevicesToRackis racy against itself: two concurrent invocations operating on overlapping device sets that move between the same two racks can deadlock if they acquire row locks in opposite order. The lock pre-pass —LockRacksForReparent— takesFOR UPDATElocks on the source racks and the target rack in a single SELECT, ordered by ascendingdevice_set.id. Both concurrent calls now lock{1, 2}in{1, 2}order regardless of which rack is source vs target.The query joins
device_set_rackand locks rows on both tables (FOR UPDATE OF ds, dsr) so the acquisition order matchesLockRackPlacementForWrite's join shape. This eliminates the cross-RPC deadlock withSaveRack/DeleteCollection, which enterLockRackPlacementForWritedirectly and would otherwise hold one table while waiting for the other.ManageBuildingModaltwo-pass chunked saveThe proto caps
AssignRacksToBuilding.racksat 1000 entries; a building can hold up to 100 × 100 = 10,000 cells. Large floor plans must chunk. But the server's pass-1/pass-2 ordering only applies within one RPC — across chunks, a placement in chunk 1 could collide with an un-vacated cell in chunk 2.The client now:
unassign(removed from this building),inBuildingVacate(synthetic clear-only entries for every previously-placed rack that's moving or unplacing),inBuildingPlace(entries with a newaisle_index/position_in_aisle).unassignfirst, then allinBuildingVacate, then allinBuildingPlace.Per-row error-shape preservation
The bulk SQL keeps the contract of the per-row path:
AssignBuildingsToSiteBulkreturns row count; mismatch surfaces asNotFound(a building id resolved to no row).UpdateRackPlacementBulkForBuildingis:execrowsso the service can compare returned rows againstlen(allRackIDs)and returnNotFoundon mismatch — defense-in-depth against cross-org / stale ids slipping past the per-rackLockRackPlacementForWritepre-pass.NULL(not empty string) socollection_sort.go'szone … NULLS LASTordering still works.building_id IS NULLon cross-site moves (the per-rack path only cleared zone when leaving/crossing a building).3. Diagrams
AssignRacksToBuilding— phase shapeflowchart TB subgraph PhaseA["Phase A: sequential per-rack locks"] L1["LockRackPlacementForWrite (rack 1)"] --> L2["LockRackPlacementForWrite (rack 2)"] L2 --> L3["..."] L3 --> Ln["LockRackPlacementForWrite (rack N)"] Ln --> Cl["Classify each rack:<br/>cascade vs no-op,<br/>position write needed"] end subgraph PhaseB["Phase B: 4 bulk writes"] B1["UpdateRackPlacementBulkForBuilding<br/>site/building/zone/position via CASE"] B2["CascadeRackDeviceSitesBulk<br/>for racks that changed site"] B3["SetRackBuildingPositionBulkClear<br/>NULL every rack's cell"] B4["SetRackBuildingPositionBulkPlace<br/>per-rack aisle/position via UNNEST"] B1 --> B2 --> B3 --> B4 end Cl --> B1 B4 --> EmitAudit["Activity log (post-commit)"]AssignDevicesToRacklock pre-passsequenceDiagram participant TxA as Tx A "device d1: rack 1 → rack 2" participant TxB as Tx B "device d1: rack 2 → rack 1" participant DB Note over TxA,TxB: Both calls hit the server simultaneously TxA->>DB: LockRacksForReparent<br/>(orgID, [d1], target=2) TxB->>DB: LockRacksForReparent<br/>(orgID, [d1], target=1) Note over DB: Single SELECT FOR UPDATE OF ds, dsr<br/>UNION (source rack ids) ∪ (target rack id)<br/>ORDER BY ds.id ASC DB-->>TxA: locks acquired on {ds[1], dsr[1], ds[2], dsr[2]} in order DB-->>TxB: waits (Tx A holds ds[1]) Note over TxA: proceeds to LockRackPlacementForWrite<br/>(re-acquires ds[2]/dsr[2] no-op),<br/>then writes TxA->>DB: COMMIT DB-->>TxB: locks released, Tx B retriesManageBuildingModalchunked save — three-pass dispatchflowchart LR Save["handleSave"] --> P["Partition entries"] P --> U["unassign[]<br/>(removed from this building)"] P --> V["inBuildingVacate[]<br/>(every previously-placed mover<br/>+ in-place unplace)"] P --> Pl["inBuildingPlace[]<br/>(entries with new aisle/position)"] U --> D1["Dispatch unassign chunks<br/>(1000 racks/RPC)"] D1 --> D2["Dispatch inBuildingVacate chunks"] D2 --> D3["Dispatch inBuildingPlace chunks"] D3 --> Done["all chunks committed"]Lock-order parity with
LockRackPlacementForWriteflowchart TB subgraph Before["Before: cross-RPC deadlock window"] A1["AssignDevicesToRack:<br/>LockRacksForReparent<br/>locks ds only"] A2["...then LockRackPlacementForWrite<br/>locks dsr + ds"] B1["SaveRack / DeleteCollection:<br/>LockRackPlacementForWrite<br/>locks dsr + ds"] A1 --> A2 A1 -.->|"holds ds, waits dsr"| B1 B1 -.->|"holds dsr, waits ds"| A1 end subgraph After["After: same lock order across paths"] C1["AssignDevicesToRack:<br/>LockRacksForReparent<br/>locks ds + dsr in one SELECT<br/>(FOR UPDATE OF ds, dsr)"] C2["...then LockRackPlacementForWrite<br/>re-acquires (no-op, same tx)"] D1["SaveRack / DeleteCollection:<br/>LockRackPlacementForWrite<br/>locks ds + dsr"] C1 --> C2 C1 -.->|"global ascending id order"| D1 end4. Areas of the code involved
server/sqlc/queries/device_set.sql(modified)UpdateRackPlacementBulkForBuilding,UpdateRackPlacementBulkForSite,CascadeRackDeviceSitesBulk,LockRacksForReparent.LockRacksForReparentLEFT JOINsdevice_set_rackand usesFOR UPDATE OF ds, dsrso the lock-acquisition order matchesLockRackPlacementForWrite. Zone clears emit SQLNULL(not'') and preserve zone whenbuilding_id IS NULL.CASEexpressions match the per-row path exactly, (b)FOR UPDATE OF ds, dsris correct, (c) NULL vs empty-string zone discipline.server/sqlc/queries/building.sql(modified)SetRackBuildingPositionBulkClear,SetRackBuildingPositionBulkPlace(UNNEST parallel arrays),AssignBuildingsToSiteBulk(:execrows).server/sqlc/queries/site.sql(modified)ReassignRacksUnderBuildingsBulk,ReassignDevicesUnderBuildingsBulk.server/generated/sqlc/{building,db,device_set,site}.sql.go(modified)server/internal/domain/stores/interfaces/{building,collection,site}.go(modified)server/internal/domain/stores/interfaces/mocks/mock_{building,collection,site}_store.go(modified)server/internal/domain/stores/sqlstores/collection.go(modified)CascadeRackDeviceSitesBulk,LockRacksForReparent,UpdateRackPlacementBulkForBuilding(now returns(int64, error)for the row-count check),UpdateRackPlacementBulkForSite.%wdiscipline from #464 is preserved.server/internal/domain/stores/sqlstores/building.go(modified)SetRackBuildingPositionBulkClear/Place,AssignBuildingsToSiteBulk. ThePlacewrapper guards on parallel-array length mismatch.server/internal/domain/stores/sqlstores/site.go(modified)ReassignRacksUnderBuildingsBulk,ReassignDevicesUnderBuildingsBulk.server/internal/domain/buildings/service.go(modified)AssignRacksToBuildingrewritten into Phase A (per-rack lock + classify) + Phase B (4 bulk writes). Row-count check onUpdateRackPlacementBulkForBuildingreturnsNotFoundon mismatch.cascadeRackIDs/positionedRackIDspopulated in Phase A so the post-commit audit event is identical to the per-row path.server/internal/domain/sites/service.go(modified)AssignBuildingsToSiteandAssignRacksToSite.AssignRacksToSitefilters to thechangedRackIDssubset before the bulk write (only racks whose site actually changes get touched).changedRackIDs, (b) audit-event population from Phase A.server/internal/domain/collection/service.go(modified)AssignDevicesToRackaddsLockRacksForReparentas the first tx operation.LockRackPlacementForWritestill runs (re-acquisition is a no-op in the same tx).server/internal/domain/{buildings,sites,collection}/service_test.go(modified)AssignRacksToBuilding;LockRacksForReparentmock-call-order tests for both target and no-target paths.server/internal/domain/stores/sqlstores/building_bulk_place_test.go(new)SetRackBuildingPositionBulkPlace— both mismatch directions + empty-rackIDs short-circuit.server/internal/domain/stores/sqlstores/collection_site_placement_integration_test.go(modified)TestLockRacksForReparent_ReturnsSourceAndTargetIDscovering target+sources case (including a source rack without adevice_set_rackrow to pin the LEFT JOIN), and thetarget_rack_id=0clear path.FOR UPDATE OF ds, dsrsemantics against the schema.server/internal/handlers/{sites,buildings,deviceset}/handler_test.go(modified)client/src/protoFleet/features/buildings/components/ManageBuildingModal/ManageBuildingModal.tsx(modified)handleSaverewritten: single in-building call shape, then layered with 1000-rack chunking, then a mover-vacate pass synthesizes clear entries for every previously-placed rack that's about to move, so cross-chunk swaps preserve the vacate-before-place invariant.initialPlacementRef, (b) the three-pass dispatch order is preserved across chunk boundaries.5. Key technical decisions & trade-offs
SELECT … FOR UPDATE) would risk deadlock against any concurrent caller touching an overlapping set in a different order; the deadlock-safety contract is sorted-ascending acquisition.CASEexpressions in bulk SQL over conditionalWHEN MATCHED-style row scoring. Postgres lacksMERGE's per-row predicate logic in pure-UPDATE, so the per-row branching from the loop (zone clear on building change, site preserve on no-target, position null on building transition) becomes a stack ofCASE WHEN … THEN … ELSE …per column. Trade-off: SQL is denser but tracks the source semantics line-for-line.generate_subscriptsfor the pass-2 place query over a multi-argUNNEST(a, b, c). sqlc's analyzer fails type resolution on multi-arg UNNEST; thegenerate_subscriptsform is the same plan and accepted by both engines.UpdateRackPlacementBulkForBuilding(:execrows+NotFoundon mismatch) over relying solely on the per-rackLockRackPlacementForWritepre-pass. Defense-in-depth: if the pre-pass is ever refactored away, the bulk write still rejects cross-org / stale ids instead of silently no-opping.LockRacksForReparentUNIONs source + target into a single sorted lock set rather than two separate locking SELECTs. One statement, one sort, one global lock order — both concurrent callers see the same order regardless of which rack each one treats as source vs target.LEFT JOIN device_set_rackin the lock pre-pass soFOR UPDATE OF ds, dsracquires both tables in the same orderLockRackPlacementForWritedoes. The LEFT JOIN handles the defensive case of a rack without adevice_set_rackrow (no rows lost from the lock set).initialPlacementRef) and can synthesize the vacate entries cheaply.NULL(viaCASE WHEN … THEN NULL ELSE …) rather than''. Thecollection_sort.gocursor relies onzone … NULLS LASTsemantics — empty-string zones would sort as a real-but-empty zone instead of falling into the trailing bucket the per-row path produced.building_id IS NULL(cross-site moves of racks with no building). The old per-rack path only cleared zone whencurrent.BuildingID != nil; the bulk SQL has to encode that conditional inCASEso unparented racks' zone metadata isn't silently wiped on cross-site moves.6. Testing & validation
Service-layer tests (across
buildings/service_test.go,sites/service_test.go,collection/service_test.go):TestAssignRacksToBuilding_largeBatchIssuesSingleBulkWrites(N=100) — asserts single bulk call regardless of N.TestAssignRacksToBuilding_swapsPositionsInSingleBatch,_mixedClearAndPlaceInSingleBatch— swap safety on the server.TestAssignBuildingsToSite_largeBatchIssuesSingleBulkWrites,TestAssignRacksToSite_largeBatchIssuesSingleBulkWrites— same shape for the other two RPCs.TestService_AssignDevicesToRack_locksIncludeTargetRack,_lockReparentZeroTargetReturnsOnlySources,_lockReparentCrossOrgTargetReturnsEmpty— pin theLockRacksForReparentUNION semantics.TestService_AssignDevicesToRack_crossSiteEmitsCascadeMetadata,_sameSiteSkipsCascadeMetadata— confirm the lock pre-pass doesn't disturb the audit event shape from refactor(device-set): group RPCs + remove generic collection membership (3/4) #466.Store-level tests:
TestSetRackBuildingPositionBulkPlace_RejectsLengthMismatch(building_bulk_place_test.go) — both mismatch directions + empty short-circuit.TestLockRacksForReparent_ReturnsSourceAndTargetIDs(collection_site_placement_integration_test.go, DB-backed) — exercises theFOR UPDATE OF ds, dsrshape against real Postgres including a source rack without adevice_set_rackrow to pin the LEFT JOIN behavior.Handler-layer tests (
server/internal/handlers/{sites,buildings,deviceset}/handler_test.go):Verification:
go build ./...— clean.golangci-lint run -c .golangci.yaml ./...— 0 issues.go test ./internal/domain/sites/... ./internal/domain/buildings/... ./internal/domain/collection/... ./internal/handlers/sites/... ./internal/handlers/buildings/... ./internal/handlers/deviceset/... -count=1— pass.go test ./internal/domain/stores/sqlstores/... -count=1(with DB env vars) — pass.npx tsc --noEmit(client) — clean.npm run lint— clean.Explicitly NOT covered:
FOR UPDATE OF ds, dsrshape is exercised by the integration test against a real DB, but a concurrent stress test that firesAssignDevicesToRack+SaveRackagainst the same rack pair in opposite orders would be the canonical deadlock-prevention proof. Filed as a follow-up.AssignDevicesToSiteforce-clear (from feat(sites): atomic cross-site reparent (2/4) #464) andAssignDevicesToRack—#464's%wretry safety net handles the residual deadlock window for that pair; eliminating the window entirely needs a coordinated change to both flows and is filed separately.Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com