Skip to content

perf(reparent): bulk SQL + concurrency hardening (4/4)#468

Merged
flesher merged 19 commits into
mainfrom
perf/bulk-sql-concurrency-4of4
Jun 17, 2026
Merged

perf(reparent): bulk SQL + concurrency hardening (4/4)#468
flesher merged 19 commits into
mainfrom
perf/bulk-sql-concurrency-4of4

Conversation

@flesher

@flesher flesher commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Part 4 of 4 — Bulk SQL + Concurrency Hardening

1. Summary

Converts the three hierarchy-reparent RPCs (AssignRacksToBuilding, AssignBuildingsToSite, AssignRacksToSite) from per-row write loops to bulk SQL — collapsing 3N–5N writes per request down to a constant 4 or to N+2/N+3 — while preserving the deadlock-safe per-row lock acquisition the previous PRs in this series established. Also tightens AssignDevicesToRack's cross-RPC lock ordering so concurrent reparent calls between the same rack pair can no longer deadlock against each other or against SaveRack/DeleteCollection. Plus a client-side change that lets ManageBuildingModal save a building's full rack layout (including cross-chunk swaps) in a single coherent operation.

Stack

PR Status
#463 — atomic reparent foundation (1/4) merged
#464 — atomic cross-site reparent (2/4) merged
#466 — group RPCs + remove generic membership (3/4) merged
#468 — bulk SQL + concurrency hardening (4/4) this PR

Diff is relative to main (all three ancestors merged).

Required context from upstream:

  • The three RPCs being refactored were introduced by refactor(reparent): atomic reparent foundation (1/4) #463 with sequential per-row lock acquisition followed by per-row writes. This PR replaces the writes; the lock-acquisition phase stays.
  • AssignDevicesToRack (also from refactor(reparent): atomic reparent foundation (1/4) #463) uses LockRacksForReparent as its lock pre-pass; this PR introduces that query and refines it later in the PR to lock the joined device_set_rack rows too.
  • AssignBuildingsToSite cross-collection invariants (rack/device site cascade) and AssignRacksToSite partial-update semantics (building auto-clear on cross-site moves) — both established by refactor(reparent): atomic reparent foundation (1/4) #463 — must be preserved exactly. The bulk SQL encodes those rules in CASE expressions instead of per-row branches.
  • ManageBuildingModal's save flow was collapsed to a single RPC in earlier review-fix passes; this PR layers chunking + mover-vacate on top of that single-call shape.

Out of scope, lands elsewhere:

  • Lock-order alignment between AssignDevicesToSite force-clear and AssignDevicesToRack (closes the 40P01 deadlock window completely; today it's recoverable via %w retry, mentioned in feat(sites): atomic cross-site reparent (2/4) #464). Filed as a follow-up.
  • Real-Postgres concurrency tests for the cross-RPC deadlock paths. Service-layer unit tests pin the call shapes via mocks; a follow-up DB-backed test would lock the SQL semantics.

2. How it works

Hierarchy-reparent RPCs: from per-row writes to bulk SQL

Each of the three RPCs follows the same phase shape:

Phase A — sequential per-row lock acquisition (unchanged from #463).
The closure iterates the sorted id list and calls the per-row lock primitive (LockRackPlacementForWrite / LockBuildingForWrite). Locks are taken one row at a time in ascending id order to keep the deadlock-safe global ordering. No writes happen in this phase — only locks + snapshot reads needed to populate activity-log metadata and to classify each row into "site changed" vs "no-op" buckets.

Phase B — bulk writes after every lock is held.
With every row lock in hand, the writes collapse into 1–4 bulk SQL statements operating on the whole batch via ANY($1::bigint[]) predicates. The SQL encodes the per-row branching the old loop did (zone clear on building change, site-id preservation on building-only unassign, position clear on building transition) via CASE expressions, so the semantics match the per-row path exactly.

The three RPCs differ only in which bulk statements run in Phase B:

RPC Phase B writes Before After
AssignRacksToBuilding (1) bulk placement update — site/building/zone/position via CASE; (2) bulk site cascade for racks whose site changed; (3) bulk pass-1 vacate (position → NULL for every rack); (4) bulk pass-2 place (per-rack aisle/position via UNNEST). up to 5N writes 4 writes (constant)
AssignBuildingsToSite (1) bulk AssignBuildingsToSiteBulk; (2) bulk rack cascade; (3) bulk device cascade. 4N writes N+3
AssignRacksToSite (1) bulk UpdateRackPlacementBulkForSite (only racks whose site actually changes); (2) bulk site cascade for the same set. 3N writes N+2

Per-rack LockRackPlacementForWrite still runs N times in Phase A — that's the deadlock-safety contract. Only the writes are bulk.

AssignRacksToBuilding's pass-1/pass-2 split

Pass-1 NULLs the position for every rack in the batch via SetRackBuildingPositionBulkClear. Pass-2 then writes the actual (aisle_index, position_in_aisle) for racks that supplied one via SetRackBuildingPositionBulkPlace (UNNEST over parallel arrays). Splitting the writes into two passes lets a swap or move-into-occupied-cell request commit without tripping the partial unique index uk_device_set_rack_building_position — every cell touched by the batch holds NULL before any real position is written.

AssignDevicesToRack lock pre-pass — deadlock prevention

AssignDevicesToRack is racy against itself: two concurrent invocations operating on overlapping device sets that move between the same two racks can deadlock if they acquire row locks in opposite order. The lock pre-pass — LockRacksForReparent — takes FOR UPDATE locks on the source racks and the target rack in a single SELECT, ordered by ascending device_set.id. Both concurrent calls now lock {1, 2} in {1, 2} order regardless of which rack is source vs target.

The query joins device_set_rack and locks rows on both tables (FOR UPDATE OF ds, dsr) so the acquisition order matches LockRackPlacementForWrite's join shape. This eliminates the cross-RPC deadlock with SaveRack / DeleteCollection, which enter LockRackPlacementForWrite directly and would otherwise hold one table while waiting for the other.

ManageBuildingModal two-pass chunked save

The proto caps AssignRacksToBuilding.racks at 1000 entries; a building can hold up to 100 × 100 = 10,000 cells. Large floor plans must chunk. But the server's pass-1/pass-2 ordering only applies within one RPC — across chunks, a placement in chunk 1 could collide with an un-vacated cell in chunk 2.

The client now:

  1. Partitions the rack changes into three buckets: unassign (removed from this building), inBuildingVacate (synthetic clear-only entries for every previously-placed rack that's moving or unplacing), inBuildingPlace (entries with a new aisle_index/position_in_aisle).
  2. Dispatches the three buckets sequentially, each chunked into 1000-rack RPCs: all unassign first, then all inBuildingVacate, then all inBuildingPlace.
  3. The vacate buckets together vacate every cell that any place chunk will later try to claim, so the partial unique index can't trip across chunk boundaries.

Per-row error-shape preservation

The bulk SQL keeps the contract of the per-row path:

  • AssignBuildingsToSiteBulk returns row count; mismatch surfaces as NotFound (a building id resolved to no row).
  • UpdateRackPlacementBulkForBuilding is :execrows so the service can compare returned rows against len(allRackIDs) and return NotFound on mismatch — defense-in-depth against cross-org / stale ids slipping past the per-rack LockRackPlacementForWrite pre-pass.
  • Zone clears emit SQL NULL (not empty string) so collection_sort.go's zone … NULLS LAST ordering still works.
  • Zone is preserved when building_id IS NULL on cross-site moves (the per-rack path only cleared zone when leaving/crossing a building).

3. Diagrams

AssignRacksToBuilding — phase shape

flowchart TB
    subgraph PhaseA["Phase A: sequential per-rack locks"]
        L1["LockRackPlacementForWrite (rack 1)"] --> L2["LockRackPlacementForWrite (rack 2)"]
        L2 --> L3["..."]
        L3 --> Ln["LockRackPlacementForWrite (rack N)"]
        Ln --> Cl["Classify each rack:<br/>cascade vs no-op,<br/>position write needed"]
    end
    subgraph PhaseB["Phase B: 4 bulk writes"]
        B1["UpdateRackPlacementBulkForBuilding<br/>site/building/zone/position via CASE"]
        B2["CascadeRackDeviceSitesBulk<br/>for racks that changed site"]
        B3["SetRackBuildingPositionBulkClear<br/>NULL every rack's cell"]
        B4["SetRackBuildingPositionBulkPlace<br/>per-rack aisle/position via UNNEST"]
        B1 --> B2 --> B3 --> B4
    end
    Cl --> B1
    B4 --> EmitAudit["Activity log (post-commit)"]
Loading

AssignDevicesToRack lock pre-pass

sequenceDiagram
    participant TxA as Tx A "device d1: rack 1 → rack 2"
    participant TxB as Tx B "device d1: rack 2 → rack 1"
    participant DB

    Note over TxA,TxB: Both calls hit the server simultaneously
    TxA->>DB: LockRacksForReparent<br/>(orgID, [d1], target=2)
    TxB->>DB: LockRacksForReparent<br/>(orgID, [d1], target=1)
    Note over DB: Single SELECT FOR UPDATE OF ds, dsr<br/>UNION (source rack ids) ∪ (target rack id)<br/>ORDER BY ds.id ASC
    DB-->>TxA: locks acquired on {ds[1], dsr[1], ds[2], dsr[2]} in order
    DB-->>TxB: waits (Tx A holds ds[1])
    Note over TxA: proceeds to LockRackPlacementForWrite<br/>(re-acquires ds[2]/dsr[2] no-op),<br/>then writes
    TxA->>DB: COMMIT
    DB-->>TxB: locks released, Tx B retries
Loading

ManageBuildingModal chunked save — three-pass dispatch

flowchart LR
    Save["handleSave"] --> P["Partition entries"]
    P --> U["unassign[]<br/>(removed from this building)"]
    P --> V["inBuildingVacate[]<br/>(every previously-placed mover<br/>+ in-place unplace)"]
    P --> Pl["inBuildingPlace[]<br/>(entries with new aisle/position)"]
    U --> D1["Dispatch unassign chunks<br/>(1000 racks/RPC)"]
    D1 --> D2["Dispatch inBuildingVacate chunks"]
    D2 --> D3["Dispatch inBuildingPlace chunks"]
    D3 --> Done["all chunks committed"]
Loading

Lock-order parity with LockRackPlacementForWrite

flowchart TB
    subgraph Before["Before: cross-RPC deadlock window"]
        A1["AssignDevicesToRack:<br/>LockRacksForReparent<br/>locks ds only"]
        A2["...then LockRackPlacementForWrite<br/>locks dsr + ds"]
        B1["SaveRack / DeleteCollection:<br/>LockRackPlacementForWrite<br/>locks dsr + ds"]
        A1 --> A2
        A1 -.->|"holds ds, waits dsr"| B1
        B1 -.->|"holds dsr, waits ds"| A1
    end
    subgraph After["After: same lock order across paths"]
        C1["AssignDevicesToRack:<br/>LockRacksForReparent<br/>locks ds + dsr in one SELECT<br/>(FOR UPDATE OF ds, dsr)"]
        C2["...then LockRackPlacementForWrite<br/>re-acquires (no-op, same tx)"]
        D1["SaveRack / DeleteCollection:<br/>LockRackPlacementForWrite<br/>locks ds + dsr"]
        C1 --> C2
        C1 -.->|"global ascending id order"| D1
    end
Loading

4. Areas of the code involved

Area What changed Why it matters for review
server/sqlc/queries/device_set.sql (modified) Added UpdateRackPlacementBulkForBuilding, UpdateRackPlacementBulkForSite, CascadeRackDeviceSitesBulk, LockRacksForReparent. LockRacksForReparent LEFT JOINs device_set_rack and uses FOR UPDATE OF ds, dsr so the lock-acquisition order matches LockRackPlacementForWrite. Zone clears emit SQL NULL (not '') and preserve zone when building_id IS NULL. Core SQL semantics. Reviewer should verify (a) the CASE expressions match the per-row path exactly, (b) FOR UPDATE OF ds, dsr is correct, (c) NULL vs empty-string zone discipline.
server/sqlc/queries/building.sql (modified) Added SetRackBuildingPositionBulkClear, SetRackBuildingPositionBulkPlace (UNNEST parallel arrays), AssignBuildingsToSiteBulk (:execrows). Pass-1/pass-2 split for swap safety; row-count return for the row-count check.
server/sqlc/queries/site.sql (modified) Added ReassignRacksUnderBuildingsBulk, ReassignDevicesUnderBuildingsBulk. Site-cascade fan-out collapsed to two bulks.
server/generated/sqlc/{building,db,device_set,site}.sql.go (modified) sqlc regen. Generated — skip.
server/internal/domain/stores/interfaces/{building,collection,site}.go (modified) Added interface methods for every new sqlc query. Type-check surface for the bulk wrappers.
server/internal/domain/stores/interfaces/mocks/mock_{building,collection,site}_store.go (modified) gomock regen. Generated — skip.
server/internal/domain/stores/sqlstores/collection.go (modified) Wrappers for CascadeRackDeviceSitesBulk, LockRacksForReparent, UpdateRackPlacementBulkForBuilding (now returns (int64, error) for the row-count check), UpdateRackPlacementBulkForSite. Thin wrappers; the wrap-as-%w discipline from #464 is preserved.
server/internal/domain/stores/sqlstores/building.go (modified) Wrappers for SetRackBuildingPositionBulkClear/Place, AssignBuildingsToSiteBulk. The Place wrapper guards on parallel-array length mismatch. Defensive guard against caller bugs.
server/internal/domain/stores/sqlstores/site.go (modified) Wrappers for ReassignRacksUnderBuildingsBulk, ReassignDevicesUnderBuildingsBulk. Thin wrappers.
server/internal/domain/buildings/service.go (modified) AssignRacksToBuilding rewritten into Phase A (per-rack lock + classify) + Phase B (4 bulk writes). Row-count check on UpdateRackPlacementBulkForBuilding returns NotFound on mismatch. Core refactor. Reviewer focus: (a) lock acquisition still sequential in sorted order, (b) cascadeRackIDs / positionedRackIDs populated in Phase A so the post-commit audit event is identical to the per-row path.
server/internal/domain/sites/service.go (modified) Same Phase A / Phase B refactor for AssignBuildingsToSite and AssignRacksToSite. AssignRacksToSite filters to the changedRackIDs subset before the bulk write (only racks whose site actually changes get touched). Reviewer focus: (a) the filter that produces changedRackIDs, (b) audit-event population from Phase A.
server/internal/domain/collection/service.go (modified) AssignDevicesToRack adds LockRacksForReparent as the first tx operation. LockRackPlacementForWrite still runs (re-acquisition is a no-op in the same tx). Lock pre-pass insertion; concurrent-reparent deadlock prevention.
server/internal/domain/{buildings,sites,collection}/service_test.go (modified) Service-level tests: stress tests (N=100) asserting single-bulk-call shape; swap-safety tests for AssignRacksToBuilding; LockRacksForReparent mock-call-order tests for both target and no-target paths. Locks the call shape against future regressions.
server/internal/domain/stores/sqlstores/building_bulk_place_test.go (new) Pins the parallel-array length guard on SetRackBuildingPositionBulkPlace — both mismatch directions + empty-rackIDs short-circuit. Defense-in-depth on the wrapper guard.
server/internal/domain/stores/sqlstores/collection_site_placement_integration_test.go (modified) DB-backed TestLockRacksForReparent_ReturnsSourceAndTargetIDs covering target+sources case (including a source rack without a device_set_rack row to pin the LEFT JOIN), and the target_rack_id=0 clear path. Locks FOR UPDATE OF ds, dsr semantics against the schema.
server/internal/handlers/{sites,buildings,deviceset}/handler_test.go (modified) Updated mock-call expectations to match the new bulk shape. Mirror updates; no production logic change.
client/src/protoFleet/features/buildings/components/ManageBuildingModal/ManageBuildingModal.tsx (modified) handleSave rewritten: single in-building call shape, then layered with 1000-rack chunking, then a mover-vacate pass synthesizes clear entries for every previously-placed rack that's about to move, so cross-chunk swaps preserve the vacate-before-place invariant. Behavioral parity. Reviewer focus: (a) the partition correctly identifies movers via initialPlacementRef, (b) the three-pass dispatch order is preserved across chunk boundaries.

5. Key technical decisions & trade-offs

  • Phase A keeps sequential per-row locks instead of bulk-acquiring locks. Bulk lock acquisition (SELECT … FOR UPDATE) would risk deadlock against any concurrent caller touching an overlapping set in a different order; the deadlock-safety contract is sorted-ascending acquisition.
  • CASE expressions in bulk SQL over conditional WHEN MATCHED-style row scoring. Postgres lacks MERGE's per-row predicate logic in pure-UPDATE, so the per-row branching from the loop (zone clear on building change, site preserve on no-target, position null on building transition) becomes a stack of CASE WHEN … THEN … ELSE … per column. Trade-off: SQL is denser but tracks the source semantics line-for-line.
  • UNNEST + generate_subscripts for the pass-2 place query over a multi-arg UNNEST(a, b, c). sqlc's analyzer fails type resolution on multi-arg UNNEST; the generate_subscripts form is the same plan and accepted by both engines.
  • Row-count check on UpdateRackPlacementBulkForBuilding (:execrows + NotFound on mismatch) over relying solely on the per-rack LockRackPlacementForWrite pre-pass. Defense-in-depth: if the pre-pass is ever refactored away, the bulk write still rejects cross-org / stale ids instead of silently no-opping.
  • LockRacksForReparent UNIONs source + target into a single sorted lock set rather than two separate locking SELECTs. One statement, one sort, one global lock order — both concurrent callers see the same order regardless of which rack each one treats as source vs target.
  • LEFT JOIN device_set_rack in the lock pre-pass so FOR UPDATE OF ds, dsr acquires both tables in the same order LockRackPlacementForWrite does. The LEFT JOIN handles the defensive case of a rack without a device_set_rack row (no rows lost from the lock set).
  • Client-side mover vacate over server-side cross-call coordination. The server could (in theory) accept a multi-RPC "transaction" via a session token, but that's a much bigger contract change. The client knows which racks were previously placed (via initialPlacementRef) and can synthesize the vacate entries cheaply.
  • Zone clears emit SQL NULL (via CASE WHEN … THEN NULL ELSE …) rather than ''. The collection_sort.go cursor relies on zone … NULLS LAST semantics — empty-string zones would sort as a real-but-empty zone instead of falling into the trailing bucket the per-row path produced.
  • Bulk path preserves zone when building_id IS NULL (cross-site moves of racks with no building). The old per-rack path only cleared zone when current.BuildingID != nil; the bulk SQL has to encode that conditional in CASE so unparented racks' zone metadata isn't silently wiped on cross-site moves.

6. Testing & validation

Service-layer tests (across buildings/service_test.go, sites/service_test.go, collection/service_test.go):

  • TestAssignRacksToBuilding_largeBatchIssuesSingleBulkWrites (N=100) — asserts single bulk call regardless of N.
  • TestAssignRacksToBuilding_swapsPositionsInSingleBatch, _mixedClearAndPlaceInSingleBatch — swap safety on the server.
  • TestAssignBuildingsToSite_largeBatchIssuesSingleBulkWrites, TestAssignRacksToSite_largeBatchIssuesSingleBulkWrites — same shape for the other two RPCs.
  • TestService_AssignDevicesToRack_locksIncludeTargetRack, _lockReparentZeroTargetReturnsOnlySources, _lockReparentCrossOrgTargetReturnsEmpty — pin the LockRacksForReparent UNION semantics.
  • TestService_AssignDevicesToRack_crossSiteEmitsCascadeMetadata, _sameSiteSkipsCascadeMetadata — confirm the lock pre-pass doesn't disturb the audit event shape from refactor(device-set): group RPCs + remove generic collection membership (3/4) #466.

Store-level tests:

  • TestSetRackBuildingPositionBulkPlace_RejectsLengthMismatch (building_bulk_place_test.go) — both mismatch directions + empty short-circuit.
  • TestLockRacksForReparent_ReturnsSourceAndTargetIDs (collection_site_placement_integration_test.go, DB-backed) — exercises the FOR UPDATE OF ds, dsr shape against real Postgres including a source rack without a device_set_rack row to pin the LEFT JOIN behavior.

Handler-layer tests (server/internal/handlers/{sites,buildings,deviceset}/handler_test.go):

  • Existing handler tests updated to expect the new bulk mock signatures. The 3 refactored RPCs retain their existing permission gates (no authz changes in this PR).

Verification:

  • go build ./... — clean.
  • golangci-lint run -c .golangci.yaml ./... — 0 issues.
  • go test ./internal/domain/sites/... ./internal/domain/buildings/... ./internal/domain/collection/... ./internal/handlers/sites/... ./internal/handlers/buildings/... ./internal/handlers/deviceset/... -count=1 — pass.
  • go test ./internal/domain/stores/sqlstores/... -count=1 (with DB env vars) — pass.
  • npx tsc --noEmit (client) — clean.
  • npm run lint — clean.

Explicitly NOT covered:

  • Cross-RPC deadlock under real Postgres load — the new FOR UPDATE OF ds, dsr shape is exercised by the integration test against a real DB, but a concurrent stress test that fires AssignDevicesToRack + SaveRack against the same rack pair in opposite orders would be the canonical deadlock-prevention proof. Filed as a follow-up.
  • Lock-order alignment between AssignDevicesToSite force-clear (from feat(sites): atomic cross-site reparent (2/4) #464) and AssignDevicesToRack#464's %w retry safety net handles the residual deadlock window for that pair; eliminating the window entirely needs a coordinated change to both flows and is filed separately.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

@flesher flesher requested a review from a team as a code owner June 15, 2026 21:22
@github-actions github-actions Bot added javascript Pull requests that update javascript code client server labels Jun 15, 2026
@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

  • Reviewed pull request diff only (d34b3cd7fc43747cf82d7170d3bd4e0b5756b8c9...4b99931b6ea0fb148a189f0cd708816340386151, exact PR three-dot diff)
  • Model: gpt-5.5

💡 Click "edited" above to see previous reviews for this PR.


Review Summary

Overall Risk: MEDIUM

Findings

[MEDIUM] Rack reparent lock does not serialize same-device moves

  • Category: Concurrency | Reliability
  • Location: server/sqlc/queries/device_set.sql:344
  • Description: The new LockRacksForReparent pre-pass locks only the source/target rack rows derived from the current membership snapshot. It does not lock the requested device rows or the device_set_membership rows that determine the source set. For two concurrent assignments of the same unassigned device to different racks, the lock sets share no row, so both transactions can reach AddDevicesToCollection; one then hits the partial unique index that allows a device in only one rack. A similar stale-source race can happen when one transaction computes its source set before another commits a reparent.
  • Impact: Concurrent rack assignment can still surface as an internal unique-constraint failure instead of serializing cleanly or returning a deterministic application error. The transaction should roll back, so this is primarily reliability/API correctness rather than data corruption.
  • Recommendation: Serialize by device as well as by rack. For example, lock the selected live device rows with FOR UPDATE before deriving source racks, then compute and lock source+target racks from that stable state. Alternatively use per-device advisory locks or explicitly handle the rack-membership partial unique violation with a retry/reload path.

Notes

No auth, SQL injection, command injection, frontend XSS, infrastructure, protobuf wire-format, Rust plugin, or pool-hijack issues were evident in the reviewed diff. The SQL added here uses sqlc parameters rather than interpolation.


Generated by Codex Security Review |
Triggered by: @flesher |
Review workflow run

@flesher flesher force-pushed the perf/bulk-sql-concurrency-4of4 branch 2 times, most recently from 9b93a22 to a5f761f Compare June 15, 2026 21:50

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a5f761f9eb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/sqlc/queries/device_set.sql Outdated
@flesher flesher force-pushed the perf/bulk-sql-concurrency-4of4 branch from a5f761f to 95fa4f3 Compare June 15, 2026 22:03

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 95fa4f3262

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@flesher flesher force-pushed the perf/bulk-sql-concurrency-4of4 branch from 95fa4f3 to 2f1de48 Compare June 15, 2026 22:24

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2f1de48a93

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/sqlc/queries/device_set.sql Outdated
Base automatically changed from refactor/reparent-foundation-1of4 to main June 16, 2026 00:01
@flesher flesher force-pushed the perf/bulk-sql-concurrency-4of4 branch from 2f1de48 to 7cfa2ea Compare June 16, 2026 00:03
@flesher flesher force-pushed the perf/bulk-sql-concurrency-4of4 branch from 7cfa2ea to e4b6b95 Compare June 16, 2026 18:57
Copilot AI review requested due to automatic review settings June 16, 2026 18:57

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR completes the reparent performance/concurrency work by introducing bulk SQL query variants to eliminate N+1 write patterns across building/site/rack reparent flows, adding a deadlock-preventing rack lock pre-pass for AssignDevicesToRack, and simplifying the ProtoFleet building management save flow to rely on server-side two-pass ordering.

Changes:

  • Added bulk sqlc query family for building→site and rack placement/site cascade updates, reducing per-item DB writes to constant-count bulk statements.
  • Hardened AssignDevicesToRack concurrency by locking the union of source + target rack IDs in a single globally-sorted FOR UPDATE acquisition.
  • Collapsed ManageBuildingModal save into serialized dispatch + single in-building batch, relying on server-side pass1(clear)/pass2(place) semantics.

Reviewed changes

Copilot reviewed 19 out of 26 changed files in this pull request and generated no comments.

Show a summary per file
File Description
server/sqlc/queries/site.sql Adds bulk cascade queries for racks/devices under multiple buildings.
server/sqlc/queries/device_set.sql Adds bulk rack placement updates, bulk device site cascade, and LockRacksForReparent.
server/sqlc/queries/building.sql Adds bulk pass-1 clear and pass-2 place queries for rack grid positions; adds bulk building→site update.
server/internal/handlers/sites/handler_test.go Updates handler tests to expect bulk store calls for building/site and rack/site flows.
server/internal/handlers/deviceset/handler_test.go Updates handler tests to include the new rack-lock pre-pass call ordering.
server/internal/handlers/buildings/handler_test.go Updates handler tests to expect bulk rack placement + bulk position clear/place calls.
server/internal/domain/stores/sqlstores/site.go Implements bulk site-store methods bridging to new sqlc queries.
server/internal/domain/stores/sqlstores/collection.go Implements bulk rack placement and cascade helpers + rack lock pre-pass.
server/internal/domain/stores/sqlstores/building.go Implements bulk clear/place wrappers, including argument length validation.
server/internal/domain/stores/interfaces/site.go Extends SiteStore interface with bulk building/rack/device cascade methods.
server/internal/domain/stores/interfaces/collection.go Extends CollectionStore interface with bulk placement/cascade + rack lock pre-pass.
server/internal/domain/stores/interfaces/building.go Extends BuildingStore interface with bulk clear/place grid operations.
server/internal/domain/stores/interfaces/mocks/mock_site_store.go Regenerates mocks to include new SiteStore bulk methods.
server/internal/domain/stores/interfaces/mocks/mock_collection_store.go Regenerates mocks to include new CollectionStore bulk methods.
server/internal/domain/stores/interfaces/mocks/mock_building_store.go Regenerates mocks to include new BuildingStore bulk methods.
server/internal/domain/sites/service.go Refactors AssignBuildingsToSite and AssignRacksToSite to lock sequentially then write in bulk.
server/internal/domain/sites/service_test.go Adds/updates service tests to assert single bulk write behavior and rollback semantics.
server/internal/domain/collection/service.go Adds rack lock pre-pass to AssignDevicesToRack to prevent deadlocks under concurrent reparents.
server/internal/domain/collection/service_test.go Adds tests pinning lock-order invariants and target-inclusion behavior.
server/internal/domain/buildings/service.go Refactors AssignRacksToBuilding to “lock/read phase” then bulk writes + two-pass position updates.
server/internal/domain/buildings/service_test.go Adds swap/mixed clear+place tests and large-batch “single bulk write” guards.
client/src/protoFleet/features/buildings/components/ManageBuildingModal/ManageBuildingModal.tsx Simplifies save flow to a single in-building batch + serialized unassign-before-assign dispatch.
server/generated/sqlc/site.sql.go Generated sqlc output for new site bulk queries (generated — skip).
server/generated/sqlc/device_set.sql.go Generated sqlc output for new bulk placement/cascade + lock query (generated — skip).
server/generated/sqlc/building.sql.go Generated sqlc output for new bulk building update + clear/place queries (generated — skip).
server/generated/sqlc/db.go Generated statement prepare/close wiring for new sqlc queries (generated — skip).
Files not reviewed (3)
  • server/internal/domain/stores/interfaces/mocks/mock_building_store.go: Generated file
  • server/internal/domain/stores/interfaces/mocks/mock_collection_store.go: Generated file
  • server/internal/domain/stores/interfaces/mocks/mock_site_store.go: Generated file

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 752506aba1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

flesher and others added 7 commits June 16, 2026 15:37
…ForReparent

Adds bulk SQL queries used by the upcoming AssignRacksToBuilding,
AssignBuildingsToSite, AssignRacksToSite refactors and the
AssignDevicesToRack lock-pre-pass:

- device_set.sql: UpdateRackPlacementBulkForBuilding,
  UpdateRackPlacementBulkForSite, CascadeRackDeviceSitesBulk,
  LockRacksForReparent (deadlock-safe via UNION + ORDER BY id).
- building.sql: SetRackBuildingPositionBulkClear,
  SetRackBuildingPositionBulkPlace (parallel arrays +
  generate_subscripts), AssignBuildingsToSiteBulk.
- site.sql: ReassignRacksUnderBuildingsBulk,
  ReassignDevicesUnderBuildingsBulk.

Regenerated sqlc bindings and mocks. Adds store interface + sqlstore
wrappers around the new queries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Refactors AssignRacksToBuilding, AssignBuildingsToSite, and
AssignRacksToSite to split into Phase A (sequential per-row locks in
sorted order, deadlock-safe) and Phase B (bulk writes for the
classified set).

- AssignRacksToBuilding: 4 bulk writes (placement, cascade, position
  clear pass-1, position place pass-2). The explicit clear-before-place
  ordering preserves F5 swap-safety on the partial unique index.
- AssignBuildingsToSite: 3 bulk writes (building.site_id, racks
  cascade, devices cascade).
- AssignRacksToSite: 1 bulk placement + 1 bulk cascade, filtered to
  site-changed racks.

Adds large-batch + swap-safety tests; updates handler test mock
expectations for the new bulk signatures.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nt deadlock

Locks source and target racks together in id-sorted order via the new
LockRacksForReparent SQL query (UNIONs the target rack into the lock
set). Two concurrent AssignDevicesToRack calls moving devices in
opposite directions between the same rack pair now acquire {1, 2} in
{1, 2} order regardless of which is source vs target — no deadlock.

LockRackPlacementForWrite stays on the target path because it also
locks the device_set_rack child row and reads site/building/zone; the
device_set re-acquire is a no-op since the same tx already holds it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
handleSave now sends a single mixed batch (clears + places) per target
building bucket. The server's AssignRacksToBuilding tx runs an internal
two-pass write (clear-all then place-all) so the partial unique index
uk_device_set_rack_building_position cannot collide on swaps or
"move into occupied cell" cases inside one call. Removes the old
client-side vacate-then-place split and its "retry to finish saving"
partial-failure path.

The unassign bucket (target_building_id=undefined) is dispatched
sequentially BEFORE the in-building batch so the two concurrent calls
cannot race the partial unique index on a cell that's about to be
freed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
UpdateRackPlacementBulkForBuilding wrote zone = '' when a rack
crossed buildings; the per-row UpdateRackPlacement passed an empty
string through toNullString which collapses to SQL NULL. Because
collection_sort.go orders racks by zone NULLS LAST, the empty-string
result sorts as a real-but-empty zone instead of falling into the
NULLS LAST bucket. Switch the bulk CASE to emit NULL explicitly.

UpdateRackPlacementBulkForSite unconditionally cleared zone, but
the old per-rack AssignRacksToSite path only cleared zone when the
rack was actually leaving a building. Racks with building_id IS
NULL surface as building_id=0 zone refs via ListRackZoneRefs, and
the bulk path was silently wiping that user-curated metadata. Wrap
the assignment in a CASE so zone clears (to NULL) only when the
rack had a building.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Switch the bulk update from :exec to :execrows and have
AssignRacksToBuilding compare the returned count against
len(allRackIDs). On mismatch, return NotFound rather than silently
dropping the unresolved ids.

Defense-in-depth — the existing per-rack
LockRackPlacementForWrite pre-pass already errors on missing or
cross-org ids before any write runs. The count check locks the
contract in case that pre-pass is ever refactored: an UPDATE that
touches fewer rows than requested means one or more rack ids didn't
resolve to a row in this org, and the caller deserves an error
rather than a partial commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Large floor-plan saves chunk the in-building rack list at
RACKS_PER_RPC = 1000 to stay under the proto cap. The server's
two-pass clear-then-place ordering only applies WITHIN one
AssignRacksToBuilding RPC, so a >1000-rack save where chunk 1
places a rack into a cell chunk 2 hasn't vacated yet would trip
uk_device_set_rack_building_position.

Split the in-building bucket into vacate entries (no aisle/
position, racks staying in the building but clearing their cell)
and place entries (with aisle/position). Dispatch all vacates from
both buckets (unassign + in-building vacate) first, then dispatch
all places. This preserves the global vacate-before-place
invariant across chunk boundaries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ntracts

LockRacksForReparent UNION semantics: existing tests already cover
the happy path with target>0 and clear-rack path with target=0.
Add two more pins:

  * Zero target returns only source rack ids — the SQL UNION's
    target arm is filtered by `target_rack_id > 0`, so the store
    must return the source-only set and the service must not
    dereference a target.
  * Cross-org target id surfaces NotFound from the downstream
    LockRackPlacementForWrite gate, not from a partial commit. The
    empty slice from LockRacksForReparent alone must not let the
    call slip past the lock pre-pass.

SetRackBuildingPositionBulkPlace length guard: the wrapper takes
three parallel arrays (rack ids, aisles, positions) and refuses
mismatched lengths before reaching sqlc. A new unit test exercises
both mismatch directions plus the empty-rackIDs short-circuit. The
length check fires before any DB call, so the test runs without a
postgres connection.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@flesher flesher force-pushed the perf/bulk-sql-concurrency-4of4 branch from 752506a to b677ef9 Compare June 16, 2026 22:40

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b677ef9a8d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/internal/domain/collection/service.go
flesher and others added 3 commits June 16, 2026 17:38
Replace `for i := 0; i < N; i++` loops with `for i := range N` in three
bulk-write contract tests (intrange lint rule), and add `#nosec G115`
comments on the two `int32(i / 10)` / `int32(i % 10)` narrow conversions
in the buildings test — `i` is bounded by the loop's constant N=100, so
the conversion can't overflow int32.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The old partition split the in-building bucket into vacate (no
aisle/position) vs place (with aisle/position), but a rack moving from
cell X to cell Y was only in the place bucket — its old cell didn't
clear until the place chunk that owned it ran. A >1000-rack save where
chunk 1 placed rack A at cell Y while rack B (currently at Y, moving to
Z in chunk 2) hadn't been touched yet would trip
uk_device_set_rack_building_position.

Build the vacate set from the snapshot instead: any rack with a prior
non-unplaced placement gets a synthetic pre-place vacate entry,
dedup'd by rackId so we never send two clears for the same rack. The
existing two-pass dispatch order (unassign → vacate → place) stays.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
LockRacksForReparent previously locked only device_set rows. A
concurrent SaveRack/DeleteCollection (which calls LockRackPlacementForWrite,
locking {device_set_rack, device_set}) could deadlock against an
AssignDevicesToRack tx holding {device_set} and waiting on
{device_set_rack}, with the other side waiting on the inverse.

Extend the query with a LEFT JOIN to device_set_rack and FOR UPDATE OF
ds, dsr so both code paths acquire the rows in the same order. The
wrapper signature (returning []int64 of rack ids) is unchanged — no
caller updates needed. Add an integration test pinning that the
wrapper still surfaces source + target ids (including a source rack
that has no device_set_rack row — the LEFT JOIN keeps it).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 83802331ab

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/sqlc/queries/device_set.sql Outdated
Postgres rejects FOR UPDATE on the nullable side of an outer join, so
the LEFT JOIN + FOR UPDATE OF ds, dsr combination fails every
AssignDevicesToRack request at runtime. Every live rack has a
device_set_rack row by lifecycle invariant — LockRackPlacementForWrite
also uses an inner join for the same reason — so switching LEFT to
INNER preserves the cross-RPC lock-order parity without the SQL error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0653ff357f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Switching LockRacksForReparent from LEFT to INNER JOIN excludes racks
without a device_set_rack row from the lock set. The integration test
fixture intentionally created an extension-less rack to pin the LEFT
JOIN behavior; with the inner-join shape that rack now drops out of
the result. Production racks always carry a device_set_rack row by
lifecycle invariant, so the fixture is the only thing that needs to
change: rackB now gets a CreateRackExtension call matching rackA and
target, and the comment + assertion no longer reference the no-ext
case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7ea8e63392

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/internal/domain/stores/sqlstores/collection.go Outdated
Same retry-unwrap issue as the earlier RemoveDevicesFromAnyRack /
LockDevicesForReassign fix in PR 2. When LockRacksForReparent is the
statement that hits 40P01 or 40001 under concurrent reparent/save
traffic, %v hides the underlying *pgconn.PgError inside FleetError
with no unwrap cause, so db.WithTransaction's IsRetryablePostgresError
errors.As cannot reach it and the tx surfaces an internal error
instead of retrying.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d58d8f3240

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/sqlc/queries/device_set.sql Outdated
LockRackPlacementForWrite walks the per-rack join FROM device_set_rack
dsr JOIN device_set ds. The LockRacksForReparent pre-pass was using
the reverse order (FROM device_set ds JOIN device_set_rack dsr) with
FOR UPDATE OF ds, dsr. Different table order means the planner can
take the per-rack locks in opposite directions across the two code
paths — concurrent SaveRack and AssignDevicesToRack on the same rack
can hold one table while waiting on the other.

Flip the FROM ordering and FOR UPDATE OF clause so the new pre-pass
mirrors LockRackPlacementForWrite exactly. The deadlock-prevention
guarantee now stands across the source/target lock set AND each
rack's joined (dsr, ds) row pair.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread server/internal/domain/stores/sqlstores/collection.go Outdated
Comment thread server/internal/domain/sites/service.go
Comment thread server/internal/domain/stores/interfaces/collection.go
flesher and others added 4 commits June 17, 2026 11:05
The new bulk wrappers added in this PR (UpdateRackPlacementBulkForBuilding,
UpdateRackPlacementBulkForSite, CascadeRackDeviceSitesBulk,
SetRackBuildingPositionBulkClear/Place, AssignBuildingsToSiteBulk,
ReassignRacksUnderBuildingsBulk, ReassignDevicesUnderBuildingsBulk) wrapped
their pg errors with %v, dropping the unwrap cause. db.WithTransaction's
retry loop uses errors.As(err, &pgErr) to detect 40P01 / 40001 — %v hid
the PgError so the retry never fired. Swap to %w on every new bulk wrapper.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DeleteSite uses LockBuildingsBySiteForWrite to row-lock every live building
under a site before cascading. PR 4 established the deadlock-safe global
lock-order contract (sorted ASC) for AssignBuildingsToSite /
AssignRacksToBuilding bulk-lock paths. The existing query had no ORDER BY,
so its lock acquisition order was plan-dependent and could disagree with
the bulk-lock paths, allowing a deadlock against a concurrent
AssignBuildingsToSite touching the same buildings. Add ORDER BY id ASC.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…re error

ManageBuildingModal's handleSave dispatches chunked saves sequentially. If
chunk N+1 fails after chunk N committed, earlier chunks are persisted but
the modal still shows the operator's intended state. The generic error
toast hid the partial-commit reality, so operators might dismiss the error
instead of refreshing and retrying. Track a savedAtLeastOne flag inside
dispatch; when set, surface an explicit "some changes saved before the
error — refresh the building view to see the current state, then retry"
message instead of the generic one.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The interface doc claimed zone clears for every rack in the bulk site
update, but the actual SQL preserves zone when building_id IS NULL
(matching the per-row UpdateRackPlacement semantics — zone is
building-scoped). Update the comment to match reality so future callers
don't assume zone is always wiped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@flesher flesher enabled auto-merge (squash) June 17, 2026 18:34
@flesher flesher merged commit 299dae6 into main Jun 17, 2026
214 of 220 checks passed
@flesher flesher deleted the perf/bulk-sql-concurrency-4of4 branch June 17, 2026 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

client javascript Pull requests that update javascript code server

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants