Skip to content

Atomic reparent RPCs + naming normalization for fleet hierarchy#430

Closed
flesher wants to merge 10 commits into
mainfrom
issue-420
Closed

Atomic reparent RPCs + naming normalization for fleet hierarchy#430
flesher wants to merge 10 commits into
mainfrom
issue-420

Conversation

@flesher

@flesher flesher commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes the orphan window and rename race in the miner reparent flow (issue #420) and normalizes the parent-assignment surface across the fleet hierarchy.

Architectural picture. The fleet has three levels of parent-assignment (site, building, rack). Before this PR, each level had its own RPC verb, shape, and arity (some singular, one prefixed Reassign, one with an add+remove client-side orchestration). The matrix is now consistent: every parent-assignment is one Assign<Children>To<Parent> RPC, bulk-friendly (repeated child ids), and null-aware for unassignment (optional target_*_id unset → "Unassigned"). All five RPCs land their write inside a single transaction.

Orphan window + rename race. The miner→rack reparent flow previously orchestrated RemoveDevicesFromDeviceSet(source) followed by AddDevicesToDeviceSet(target) as two independent calls. A server error or network blip between them left miners removed from their source rack but never added to the target; concurrent rename of the source rack between the client's label→id resolution and the remove call caused the same partial-state outcome. The new AssignDevicesToRack RPC bundles both halves into one server-side transaction with rack identity resolved under lock, closing both holes; the client orchestration collapses from ~70 lines of remove-then-add looping to a single dispatch.

Partial-update rack→site reparent. SaveRack previously required a full round-trip of label / layout / members / slot assignments on every site change. The new AssignRacksToSite is a partial update that only touches site_id; building_id is auto-cleared on site transition (a building belongs to a single site) and the response reports cleared-building count so callers can prompt for re-assignment.

Non-goals / deferred

  • Miner→building reparent (AssignDevicesToBuilding + device.building_id column). Adding a direct FK for building requires five cascade invariants on existing RPCs (rack→building, rack→site, save rack, devices→rack, devices→site) to keep the new column in sync with the rack-derived view. The migration and cascade design exist on a working branch but are deferred to a follow-up PR to keep this one reviewable. Until then, miner→building goes through the existing rack-picker flow.
  • No production data migrations. This PR is server-code + proto only; no schema changes.
  • Naming normalization touches in-flight UI. PR feat(fleet): row-action ellipsis menus + ParentPickerModal re-parent #412's row-action modals had to be re-wired to the new RPC shapes; this PR carries those edits but the underlying UI design is unchanged.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings June 11, 2026 16:52
@flesher flesher requested a review from a team as a code owner June 11, 2026 16:52
@github-actions github-actions Bot added javascript Pull requests that update javascript code client server shared labels Jun 11, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f38d54b85

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/internal/domain/collection/service.go Outdated
@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

  • Reviewed pull request diff only (3f14d374f5c67e14b70014c14d582c2c27273fee...c8bea373a1594f2529195c4780c60c0824547f80, exact PR three-dot diff)
  • Model: gpt-5.5

💡 Click "edited" above to see previous reviews for this PR.


Review Summary

Overall Risk: HIGH

Findings

[HIGH] Site reassignment can delete rack memberships without rack permission

  • Category: Auth
  • Location: server/internal/handlers/sites/handler.go:89
  • Description: AssignDevicesToSite is gated only by site:manage, but the new force_clear_conflicting_rack_membership path deletes rack memberships via RemoveDevicesFromAnyRack before applying the site move.
  • Impact: A caller with site-management permission but no rack-management permission can strip miners out of racks, including across site boundaries, causing unauthorized topology changes and rack-slot data loss.
  • Recommendation: Require rack:manage for the force-clear branch, scoped to the affected source racks/sites, or move the clear operation behind the rack-management API and keep this endpoint as site assignment only.

[HIGH] New bulk mutations bypass site-scoped authorization context

  • Category: Auth
  • Location: server/internal/handlers/sites/handler.go:104
  • Description: The new bulk site/building/rack mutation handlers use authz.ResourceContext{} instead of resolving the affected site IDs before RequirePermission. The same pattern exists in AssignRacksToBuilding and AssignDevicesToRack.
  • Impact: Site-scoped RBAC narrowing is not enforced for these mutations. A user with an org-level grant that is narrowed at a specific site can still mutate resources at that site through these new endpoints.
  • Recommendation: Resolve all affected source and target site IDs before mutation and require the relevant permission with ResourceContext{SiteID: &id} for each. Reject mixed batches if any scoped check fails.

[MEDIUM] Force-clear site moves can deadlock with rack reassignment

  • Category: Concurrency
  • Location: server/internal/domain/sites/service.go:358
  • Description: AssignDevicesToSite locks device rows, then deletes rack membership rows in the force-clear branch. AssignDevicesToRack does the opposite order: it locks rack/source membership state, deletes memberships, then updates device site IDs.
  • Impact: Concurrent site move and rack reassignment for the same miner can form a DB deadlock, causing one bulk operation to fail unpredictably.
  • Recommendation: Establish one lock order across both flows, then update both implementations to follow it. Add an integration test with two concurrent transactions covering force-clear site assignment versus rack reassignment.

[MEDIUM] Existing v1 RPCs are removed or renamed without compatibility shims

  • Category: Protobuf
  • Location: proto/collection/v1/collection.proto:14
  • Description: The diff removes/renames existing v1 RPCs such as AddDevicesToCollection, RemoveDevicesFromCollection, AddDevicesToDeviceSet, RemoveDevicesFromDeviceSet, AssignRackToBuilding, ReassignDevicesToSite, and AssignBuildingToSite.
  • Impact: Existing Connect clients and generated SDK users will get missing-method/unimplemented failures after upgrade, even if the replacement RPCs are functionally equivalent.
  • Recommendation: Keep deprecated RPC methods delegating to the new implementations until a versioned API break, or explicitly document this as an intentional breaking change and coordinate client rollout.

Notes

No new pool/wallet rewriting, shell command execution, plugin execution, migrations, or infrastructure changes were present in the reviewed diff. Also consider adding a server-side max item cap for AssignDevicesToRack device lists, since it currently relies on the common DeviceSelector list shape without a per-RPC bound.


Generated by Codex Security Review |
Triggered by: @flesher |
Review workflow run

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR standardizes fleet hierarchy “parent assignment” RPCs into consistent, bulk-friendly Assign<Children>To<Parent> shapes and closes the miner→rack orphan window by introducing an atomic server-side AssignDevicesToRack operation. It also adds a partial-update rack→site assignment flow and updates the ProtoFleet UI to use the new RPCs.

Changes:

  • Added atomic AssignDevicesToRack (server + client) to replace the prior client-side remove→add orchestration and eliminate partial-state orphaning.
  • Renamed/normalized site-level device + building assignment RPCs (AssignDevicesToSite, AssignBuildingsToSite) and added AssignRacksToSite.
  • Converted rack→building assignment to a bulk RPC (AssignRacksToBuilding) and rewired UI workflows to use the new request shapes.

Reviewed changes

Copilot reviewed 47 out of 59 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
server/sqlc/queries/site.sql Renames ReassignDevicesToSite query to AssignDevicesToSite and updates comments for renamed RPCs.
server/sqlc/queries/device_set.sql Adds RemoveDevicesFromAnyRack query to enable atomic rack reassignment.
server/internal/handlers/sites/translate.go Updates request→domain param translation for renamed/bulk site RPCs.
server/internal/handlers/sites/handler.go Renames device/site + building/site handlers and adds AssignRacksToSite handler.
server/internal/handlers/sites/handler_test.go Updates handler tests for renamed/bulk RPCs and new rack→site flow.
server/internal/handlers/sites/handler_stats_test.go Updates sites.NewService wiring after constructor signature change.
server/internal/handlers/middleware/rpc_permissions.go Updates procedure permission map for renamed/new RPC procedures.
server/internal/handlers/deviceset/handler.go Adds AssignDevicesToRack RPC handler.
server/internal/handlers/buildings/translate.go Updates translation for new bulk AssignRacksToBuilding request shape.
server/internal/handlers/buildings/handler.go Renames handler to AssignRacksToBuilding.
server/internal/handlers/buildings/handler_test.go Updates building handler tests for bulk assignment shape.
server/internal/domain/stores/sqlstores/site.go Renames store method to AssignDevicesToSite.
server/internal/domain/stores/sqlstores/site_invariant_integration_test.go Updates integration tests for renamed site assignment store call.
server/internal/domain/stores/sqlstores/collection.go Adds SQL-backed RemoveDevicesFromAnyRack store method.
server/internal/domain/stores/sqlstores/collection_site_placement_integration_test.go Updates integration tests for renamed site assignment store call(s).
server/internal/domain/stores/interfaces/site.go Renames interface method to AssignDevicesToSite and updates comments.
server/internal/domain/stores/interfaces/mocks/mock_site_store.go Regenerates mock to include AssignDevicesToSite and remove old method.
server/internal/domain/stores/interfaces/mocks/mock_collection_store.go Regenerates mock to include RemoveDevicesFromAnyRack.
server/internal/domain/stores/interfaces/collection.go Extends CollectionStore interface with RemoveDevicesFromAnyRack.
server/internal/domain/stores/interfaces/building.go Updates docs referencing renamed SiteService RPC.
server/internal/domain/sites/service.go Implements AssignBuildingsToSite (bulk) and adds AssignRacksToSite partial-update flow.
server/internal/domain/sites/service_test.go Updates unit tests for renamed/bulk site assignment flows + new rack→site behavior.
server/internal/domain/sites/service_stats_test.go Updates stats tests for sites.NewService signature change.
server/internal/domain/sites/models/models.go Renames/introduces models for bulk assignment flows + rack→site result/params.
server/internal/domain/fleetmanagement/collection_test.go Updates tests to use renamed AssignDevicesToSite store method.
server/internal/domain/collection/service.go Adds domain-level AssignDevicesToRack atomic reassignment behavior + activity logging.
server/internal/domain/collection/service_test.go Adds unit tests for atomic rack reassignment scenarios.
server/internal/domain/buildings/service.go Converts rack→building assignment to bulk AssignRacksToBuilding with validation + deterministic lock order.
server/internal/domain/buildings/service_test.go Updates tests for bulk rack→building assignment behavior.
server/internal/domain/buildings/service_stats_test.go Updates stats test comments referencing renamed SiteService RPC.
server/internal/domain/buildings/models/models.go Introduces bulk rack placement params and replaces singular assignment model.
server/generated/sqlc/site.sql.go Generated sqlc output for renamed AssignDevicesToSite query.
server/generated/sqlc/device_set.sql.go Generated sqlc output for RemoveDevicesFromAnyRack.
server/generated/sqlc/db.go Generated sqlc prepare/close wiring for new/renamed statements.
server/generated/grpc/sites/v1/sitesv1connect/sites.connect.go Generated Connect code for renamed/new site RPC procedures.
server/generated/grpc/device_set/v1/device_setv1connect/device_set.connect.go Generated Connect code for new AssignDevicesToRack RPC.
server/generated/grpc/buildings/v1/buildingsv1connect/buildings.connect.go Generated Connect code for bulk AssignRacksToBuilding RPC.
server/cmd/fleetd/main.go Updates production wiring for sites.NewService to pass collectionStore.
proto/sites/v1/sites.proto Renames ReassignDevicesToSiteAssignDevicesToSite, adds AssignBuildingsToSite + AssignRacksToSite.
proto/device_set/v1/device_set.proto Adds AssignDevicesToRack RPC and request/response messages.
proto/buildings/v1/buildings.proto Replaces singular rack→building RPC with bulk AssignRacksToBuilding + RackPlacement.
client/src/protoFleet/features/sites/hooks/useSiteModals.ts Updates inline TODO reference to renamed site assignment wrapper.
client/src/protoFleet/features/sites/hooks/useSiteModals.test.ts Updates mocks for renamed assignDevicesToSite API.
client/src/protoFleet/features/rackManagement/pages/RacksPage.tsx Rewires rack→building reparent UI to use bulk assignRacksToBuilding.
client/src/protoFleet/features/fleetManagement/pages/FleetBuildingsPage.tsx Rewires building→site reparent UI to use bulk assignBuildingsToSite.
client/src/protoFleet/features/fleetManagement/components/MinerActionsMenu/MinerReparentPicker.tsx Collapses miner→rack flow to single atomic assignDevicesToRack dispatch.
client/src/protoFleet/features/fleetManagement/components/FleetLayout/FleetLayout.test.tsx Updates sites API mock for renamed wrapper.
client/src/protoFleet/features/buildings/components/ManageBuildingModal/types.ts Updates comment to reference AssignRacksToBuilding.
client/src/protoFleet/features/buildings/components/ManageBuildingModal/ManageBuildingModal.tsx Batches rack placement/unplacement via bulk assignRacksToBuilding.
client/src/protoFleet/features/buildings/components/BuildingModals/BuildingModals.test.tsx Updates buildings API mock for renamed bulk wrapper.
client/src/protoFleet/api/useDeviceSets.ts Adds assignDevicesToRack wrapper for new atomic RPC.
client/src/protoFleet/api/sites.ts Renames site wrapper + adds bulk building/site + rack/site assignment wrappers.
client/src/protoFleet/api/generated/buildings/v1/buildings_pb.ts Generated TS protobuf types for bulk AssignRacksToBuilding.
client/src/protoFleet/api/buildings.ts Replaces singular rack→building wrapper with bulk assignRacksToBuilding + RackPlacementInput.

Comment thread server/internal/domain/collection/service.go Outdated
Comment thread server/internal/domain/sites/service.go
flesher and others added 8 commits June 15, 2026 12:14
Standardizes the parent-assignment RPC verb across the fleet hierarchy:
all hierarchical assignment RPCs now use Assign* (no Reassign/Set/Move
variants). No semantic change — the existing optional target_site_id
already encodes both initial assignment and re-parenting via the same
null-equals-unassigned convention.

Sets up the naming convention used by the new RPCs landing in
subsequent commits (AssignDevicesToRack, AssignDevicesToBuilding,
AssignRacksToSite).

Refs #420.
Promotes the building→site assignment RPC to a bulk shape so operators
can move multiple buildings between sites in one atomic transaction.
A single-building caller passes a one-element `building_ids` array;
the response aggregates cascade counts across every building.

The store-level operation stays singular (one UPDATE per building);
the service layer sorts by id for deterministic lock ordering then
loops inside a single tx — if any building fails, the whole batch
rolls back.

Updates the in-flight FleetBuildingsPage row-action call site (#412)
to pass the new array shape.

Refs #420.
…ilding

Promotes the rack→building assignment RPC to a bulk shape so the
ManageBuildingModal save flow (which previously fired N concurrent
singular calls) can land an entire grid layout in two atomic batches
— vacate first, then place — instead of N independent transactions.
Single-rack callers (RacksPage row action) pass a one-element array.

Request shape: repeated RackPlacement { rack_id, optional
aisle_index, optional position_in_aisle } + optional
target_building_id. Each entry carries its own placement so a single
call can mix placed and cleared positions.

Service-layer behavior preserved per rack — building lock once,
canonical lock order (building → rack id ascending), per-rack zone
clear / cascade / position write. If any rack fails, the whole tx
rolls back.

Activity-log payload now carries rack_ids[] + aggregate cascade
counts; consumers reading the singular rack_id metadata key will need
to migrate (no existing consumers in tree).

Refs #420.
Closes the orphan window in the miner reparent flow. Previously the
client orchestrated removeDevicesFromDeviceSet(source) +
addDevicesToDeviceSet(target) as two independent RPCs; a server error
or network blip between them left miners removed from their source
rack but never added to the target. The cross-rack rename race
(source rack renamed mid-confirm) had the same failure shape via the
client-side label→id resolution.

The new RPC bundles both halves into one server-side transaction:
remove from any current rack (single SQL DELETE on
device_set_membership filtered by device_set_type='rack'), insert
into target rack, cascade device.site_id when the target rack's site
differs. target_rack_id unset clears rack membership without
re-assigning (site/building stay intact).

MinerReparentPicker's rack path collapses from ~50 lines of
remove-then-add orchestration + listRacks label resolution to a
single dispatch. Site reparent path is unchanged.

Server: new sqlc query RemoveDevicesFromAnyRack, new
CollectionStore.RemoveDevicesFromAnyRack, new
collection.Service.AssignDevicesToRack with 4 unit tests covering
happy path, unassign, target-must-be-rack, and empty-input
rejection.

Refs #420.
Adds the missing rack→site reparent RPC called out in #420. SaveRack
was a full-replace requiring the rack's label, layout, members, and
slot assignments to round-trip on every site change; the new RPC is a
partial update — only site_id changes.

Behavior:
  - target_site_id unset → racks move to "Unassigned"
  - building_id is auto-cleared on any site transition (a building
    belongs to one site, so the move invalidates building membership);
    the response carries the count of racks whose building was cleared
    so the UI can prompt the operator to pick a building in the new
    site
  - same transaction cascades device.site_id for every rack member
  - lock order: target site → each rack id ascending (matches
    AssignBuildingsToSite + AssignDevicesToSite)
  - bulk-friendly: pass repeated rack_ids; the batch rolls back on any
    per-rack failure

Plumbs collectionStore through sites.Service so the new method can
reuse LockRackPlacementForWrite + UpdateRackPlacement +
CascadeRackDeviceSites instead of duplicating the rack-placement
write path.

Server: 4 handler tests (cascade + clear building, target-unset
unassign, same-site no-op, bulk aggregation with sorted lock order),
2 service tests (empty rejection, nil-store guard). Client:
useSites().assignRacksToSite wrapper.

Refs #420.
CI's generated-code check failed because the local regen on the
naming-normalization commits used protoc-gen-es 2.11.0 (a sibling
worktree's cached node_modules) instead of the pinned 2.12.0. The
2.12.0 output annotates optional Timestamp fields as `Timestamp |
undefined` rather than `Timestamp?`. Also fixes a prettier ternary
formatting issue prettier --write applied to MinerReparentPicker.tsx
during CI's gen pass that didn't run locally.

No semantic change.
… family

Cross-cutting review fixes for the atomic reparent PR.

Orphan-window closure:
- AssignDevicesToSite: optional force_clear_conflicting_rack_membership
  flag clears rack memberships in the same tx; MinerReparentPicker
  collapses to a single call
- AssignDevicesToRack: source-rack FOR UPDATE lock pre-pass serializes
  concurrent overlapping reparent calls; LockRackPlacementForWrite
  reordered before GetCollection so activity-log labels read under lock
- RemoveDevicesFromAnyRack: target-rack exclusion predicate prevents
  cascade-delete of rack_slot rows on no-op reassigns

Reparent surface cleanup:
- AssignDevicesToRack switched to DeviceSelector
- AddDevicesToGroup / RemoveDevicesFromGroup new RPCs replace the
  generic collection/v1 device-membership pair; type-checked at the
  service boundary, atomic clear-then-add for racks
- collection/v1 AddDevicesToCollection / RemoveDevicesFromCollection
  RPCs and Service methods removed; fleetimport migrated to the
  type-specific methods
- DeviceSetService added to reflectEnabledServices

Bulk SQL batch family (N+1 elimination):
- AssignRacksToBuilding: 5N -> 4 writes (constant)
- AssignBuildingsToSite: 4N -> N+3
- AssignRacksToSite: 3N -> N+2
- Sequential sorted-order locks preserved (deadlock invariant)

ManageBuildingModal collapse:
- handleSave two-phase vacate+place replaced by a single RPC
- Server-side two-pass ordering (clear-all then place-all in one tx)
  handles position swaps without the partial-failure window

Frontend reliability:
- MinerReparentPicker: AbortSignal threading, Promise wrapping, double-
  invoke guard, fire-and-forget RPCs awaited
- useDeviceSets.assignDevicesToRack: AbortSignal support
- useSiteModals.detailsSaveEdit: functional setState removes stale-
  closure guard race
- ManageBuildingModal handleSave: savingRef double-invoke guard

Validation + prior-comment hygiene:
- AssignBuildingsToSite / AssignRacksToSite reject TargetSiteID == 0
- AssignRacksToBuilding rejects duplicate rack_ids
- ScopeCount uses len(DeviceIdentifiers) instead of double-counting
- Single-rack building-only unassign captures the rack's current SiteID

Test coverage:
- AssignRacksToSite service-layer tests (cascade, no-op, rollback)
- AssignDevicesToRack handler tests (perm, happy, unassign, source-rack
  lock ordering)
- AssignRacksToBuilding rollback + empty-rejected + duplicate-rejected
- AssignBuildingsToSite empty-rejected
- DeviceSelector variant rejection
- AddDevicesToGroup / RemoveDevicesFromGroup rack-target rejection
- Bulk-write large-batch stress tests across three RPCs

Handler hygiene:
- deviceset/translate.go (new) extracts toAssignDevicesToRackParams
- sites/translate.go adds toAssignRacksToSiteParams

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Picks up main's curtailment automation query registrations alongside
the branch's new bulk SQL queries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 767a22b171

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/internal/domain/sites/service.go Outdated
Comment thread server/sqlc/queries/device_set.sql Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ced5b8211

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/internal/domain/collection/service.go Outdated
LockSourceRacksForDevices: Postgres rejects DISTINCT + FOR UPDATE.
Rewrite with subquery-derived distinct ids and outer FOR UPDATE
SELECT keyed on device_set.id. Every AssignDevicesToRack call was
failing at runtime in prod; mock-only tests didn't catch it.

AssignDevicesToSite force-clear: partition conflicts into clearable
(DEVICE_IN_RACK_AT_OTHER_SITE) vs residual. Abort BEFORE any
RemoveDevicesFromAnyRack deletion when residual non-clearable
conflicts (e.g. DEVICE_NOT_FOUND) remain. When clearable: pass
only the conflicting identifiers so devices already at the target
site keep their rack/rack_slot rows.

ManageBuildingModal.handleSave: serialize unassign before in-
building dispatch. F5's two-pass clear-then-place ordering only
applies within one AssignRacksToBuilding call. With concurrent
calls, the in-building call could hit the partial unique index
before the unassign committed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 211beb85b4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

}
out.SiteReassignedDeviceCount = count
cascadeRan = true
out.SiteReassignedDeviceCount += count

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reset rack-building counts on tx retry

Because out, cascadeRackIDs, and positionedRackIDs are declared outside the RunInTx closure, a retryable transaction failure after this cascade can leave the rolled-back attempt's state in memory and then add the committed attempt's count again. SQLTransactor.RunInTxWithResult delegates to db.WithTransaction, which reruns closures on retryable PostgreSQL/deadlock errors, so a rack move that cascades device sites can return/log an inflated site_reassigned_device_count even though only one transaction committed; keep these tallies in per-attempt locals and copy them out only after the successful attempt.

Useful? React with 👍 / 👎.

changedRackIDs = append(changedRackIDs, rackID)
cascadedRackIDs = append(cascadedRackIDs, rackID)
if current.BuildingID != nil {
clearedCount++

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reset rack-site counts on tx retry

This counter is outside the RunInTx closure, alongside deviceCount and cascadedRackIDs, so any retryable failure after a rack has been classified as site-changing can make the next transaction attempt reuse the rolled-back attempt's counts/metadata. Since the production transactor reruns the closure on retryable PostgreSQL/deadlock errors, AssignRacksToSite can report/log too many cleared buildings or reassigned devices after a retry; accumulate into tx-local variables and publish them only from the successful attempt.

Useful? React with 👍 / 👎.

Codex P2 review: excluding the target rack from the sorted source-
lock query and then locking it separately via LockRackPlacementForWrite
produces inconsistent global lock order. Two concurrent calls moving
devices in opposite directions between the same rack pair can deadlock
(Tx A holds rack 2 waits on rack 1; Tx B holds rack 1 waits on rack 2).

Renamed LockSourceRacksForDevices → LockRacksForReparent and changed
semantics: the query now UNIONs the target rack id into the lock set
when present, so both txs lock {1, 2} in {1, 2} order regardless of
which is source vs target. The membership DELETE in RemoveDevicesFromAnyRack
still excludes the target via its own predicate — this query is purely
about lock acquisition order.

LockRackPlacementForWrite stays on the target path because it also
locks the device_set_rack child row; the parent-row re-acquire is a
no-op since the same tx already holds it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@flesher

flesher commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Closing in favor of the 4 split PRs for easier review:

Dependency graph:

#463 (foundation)
 ├── #464 (cross-site atomic)
 │    └── #466 (group + cleanup — depends on #464 removing the last legacy caller)
 └── #468 (bulk SQL + concurrency — parallel to #464/#466)

PRs 2-4 will rebase off main once PR 1 lands. PRs 3 and 4 are parallel-mergeable after PR 2 lands.

@flesher flesher closed this Jun 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

client javascript Pull requests that update javascript code server shared

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants