refactor: gracefully abort restic tasks on shutdown by nicotsx · Pull Request #947 · nicotsx/zerobyte

nicotsx · 2026-06-05T15:38:36Z

Refactored the repo mutex to be the owner of the abort signal. On a shutdown signal, a simple repoMutex.shutdown() cancels all the in-flight restic operations leaving no orphaned locks behind

Summary by CodeRabbit

Release Notes

New Features
- Added graceful shutdown handling for process termination signals (SIGTERM/SIGINT).
Improvements
- Enhanced support for cancelling long-running operations (backups, restores, and repository operations).
- Improved cleanup and lifecycle management during shutdown and operation termination.

coderabbitai · 2026-06-05T15:38:44Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eb052770-132c-4cb9-9706-c9b629562bc1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

nicotsx · 2026-06-05T15:38:49Z

refactor: gracefully abort restic tasks on shutdown #947 👈 (View in Graphite)
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5860e2bab4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

coderabbitai

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

app/server/modules/lifecycle/shutdown.ts (1)

12-26: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The local-agent unmount guard is inverted.

This only tries to unmount LOCAL_AGENT_ID volumes when enableLocalAgent is false, so the normal local-agent shutdown path skips the cleanup entirely.

Suggested fix

 const unmountVolumes = async () => {
-	if (!config.flags.enableLocalAgent) {
-		const volumes = await db.query.volumesTable.findMany({
-			where: { AND: [{ status: "mounted" }, { agentId: LOCAL_AGENT_ID }] },
-		});
-
-		for (const volume of volumes) {
-			const { status, error } = await withContext({ organizationId: volume.organizationId }, () =>
-				volumeService.unmountVolume(volume.shortId, { persistStatus: false }),
-			).catch((error) => ({ status: "error" as const, error: toMessage(error) }));
-
-			logger.info(`Volume ${volume.name} unmount status: ${status}${error ? `, error: ${error}` : ""}`);
-		}
+	if (!config.flags.enableLocalAgent) {
+		return;
+	}
+
+	const volumes = await db.query.volumesTable.findMany({
+		where: { AND: [{ status: "mounted" }, { agentId: LOCAL_AGENT_ID }] },
+	});
+
+	for (const volume of volumes) {
+		const { status, error } = await withContext({ organizationId: volume.organizationId }, () =>
+			volumeService.unmountVolume(volume.shortId, { persistStatus: false }),
+		).catch((error) => ({ status: "error" as const, error: toMessage(error) }));
+
+		logger.info(`Volume ${volume.name} unmount status: ${status}${error ? `, error: ${error}` : ""}`);
 	}
 };

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/server/modules/lifecycle/shutdown.ts` around lines 12 - 26, The guard in
unmountVolumes is inverted: it currently runs the LOCAL_AGENT_ID unmount logic
only when config.flags.enableLocalAgent is false, skipping cleanup on normal
local-agent shutdown; change the condition to run the findMany/unmount loop when
enableLocalAgent is true (i.e., remove or invert the negation) so volumes from
db.query.volumesTable.findMany filtered by { agentId: LOCAL_AGENT_ID, status:
"mounted" } are iterated and each unmounted via withContext(..., () =>
volumeService.unmountVolume(volume.shortId, { persistStatus: false })) with the
same error handling and logger call preserved.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/server/core/repository-mutex.ts`:
- Around line 148-153: runWithLease should avoid invoking the operation if the
lease's abort signal is already aborted: check lease.signal.aborted immediately
after acquiring the lease and before calling operation({ signal: lease.signal
}); if aborted, still call lease.release() and propagate the abort (e.g. reject
with an AbortError/DOMException or rethrow the signal.reason) instead of
invoking RepositoryOperation; update the runWithLease implementation
(referencing runWithLease, RepositoryLease, RepositoryOperation, lease.signal,
lease.release) to perform this pre-check and proper abort propagation.

In `@app/server/modules/backups/backups.service.ts`:
- Around line 454-514: The catch path for repoMutex.runShared currently treats
shutdown rejections as failures because it only checks
abortController.signal.aborted in one place; update the catch handling (the
block surrounding the repoMutex.runShared call and the similar block at 515-526)
so that if abortController.signal.aborted (or the rejection is an AbortError)
you call handleBackupCancellation(scheduleId, ctx.organizationId, <optional
message>) and taskStore.cancel(task.id, message) and set domainHandlerCompleted
= true, otherwise continue to call handleBackupFailure/taskStore.fail; reference
repoMutex.runShared, abortController.signal, handleBackupCancellation,
handleBackupFailure, taskStore.cancel, and taskStore.fail and apply the same
change to both occurrences.

In `@app/server/modules/backups/helpers/backup-maintenance.ts`:
- Around line 114-130: The mirror copy is incorrectly taking an exclusive lock
on the destination repository; change the lock type for the mirror repository
entries in the repoMutex.runMany calls used by mirror sync to "shared" (i.e.,
where you currently pass { repositoryId: mirrorRepository.id, type: "exclusive",
operation: `mirror_sync:${scheduleId}` } and similar entries) so both source and
mirror use "shared" locks while calling restic.copy (and mirror-related
operations such as those near the other runMany block referenced in the
comment); keep exclusive locks reserved only for maintenance
(prune/rebuild-index).

In `@app/server/modules/lifecycle/shutdown.ts`:
- Around line 28-30: The shutdown function currently awaits
Promise.allSettled([Scheduler.stop(), repoMutex.shutdown(), unmountVolumes(),
stopApplicationRuntime()]) and ignores results so any rejected step is lost;
modify shutdown in shutdown.ts to collect the settled results, find any rejected
entries (e.g., from repoMutex.shutdown() or stopApplicationRuntime()), log or
aggregate their errors, and rethrow (or throw an AggregateError) so failures
propagate to callers like app/server/plugins/bootstrap.ts; reference the
shutdown function and the Promise.allSettled call and ensure the code returns a
rejected promise when any step failed.

In `@app/server/modules/repositories/__tests__/repositories.service.test.ts`:
- Around line 435-456: The test calls the shared singleton repoMutex.shutdown(),
causing later tests to fail; change the test to use an isolated mutex instance
(or a test-only reset) instead of the imported repoMutex: create or inject a
fresh RepositoryMutex for this test and pass it to the code under test (or
reinitialize the module so repositoriesService.dumpSnapshot uses the fresh
mutex), ensure the holder uses that fresh instance (the holderStarted Promise
using repoMutex.runExclusive), and avoid calling shutdown() on the global
repoMutex so other tests are unaffected.

In `@app/server/modules/repositories/restore-executor.ts`:
- Around line 117-134: The catch currently treats any rejection from
repoMutex.runShared as a failure; change it to map abort-style rejections to
cancellation: in the catch block check if the outer signal is aborted OR the
caught error is an abort error (e.g. error?.name === "AbortError" || error?.code
=== "ERR_ABORTED" or use a helper isAbortError) and return { status:
"cancelled", message: "Restore was cancelled" }; otherwise keep returning {
status: "failed", error: toMessage(error) }. Update the catch around
repoMutex.runShared/restore executor accordingly (references:
repoMutex.runShared, signal, executeControllerRestore, executeAgentRestore).

In `@app/server/plugins/bootstrap.ts`:
- Around line 9-24: runGracefulShutdown currently swallows errors by catching
them and not propagating them, so runSignalShutdown always calls
process.exit(0); change runGracefulShutdown (and its use in runSignalShutdown)
so shutdown errors are propagated or converted to a non-zero exit code: let
shutdownPromise rethrow or return a rejected promise when shutdown() fails, and
in runSignalShutdown await runGracefulShutdown(reason).catch(err => {
logger.error("Error during shutdown", err); process.exit(1); }) (or rethrow to
let the caller handle exit code) so a failing shutdown results in process.exit
with a non-zero code; update references to runGracefulShutdown,
runSignalShutdown, shutdownPromise and shutdown accordingly.

In `@packages/core/src/restic/commands/copy.ts`:
- Around line 66-73: The post-run check of options.signal?.aborted wrongly
reports a cancellation after safeExec completes; instead determine cancellation
from safeExec's result/error itself and ensure cleanupTemporaryKeys runs in a
finally block. Change the flow in copy.ts so you call safeExec(...) and catch
its error; if the caught error indicates an abort (e.g. error.name ===
'AbortError' or a specific abort sentinel returned by safeExec) then log via
logger.warn and rethrow; otherwise treat a successful safeExec result as success
even if options.signal is now aborted. Also move cleanupTemporaryKeys(sourceEnv,
deps) and cleanupTemporaryKeys(destEnv, deps) into a finally block so they
always run regardless of success/abort; reference safeExec,
cleanupTemporaryKeys, options.signal, and logger.warn when making these edits.

In `@packages/core/src/restic/commands/delete-snapshots.ts`:
- Around line 39-42: The current abort check inside deleteSnapshots() uses
options.signal?.aborted and throws even after restic forget --prune has
completed; change it to only treat the operation as aborted when the safeExec
result indicates an aborted process (or when safeExec returns an explicit
aborted flag) rather than relying solely on options.signal?.aborted — locate the
aborting branch around cleanupTemporaryKeys() in deleteSnapshots(), consult the
safeExec call that ran restic forget --prune, and gate the throw on the safeExec
result (or an explicit aborted boolean from safeExec) so successful deletes
still run cache invalidation and stats refresh.

In `@packages/core/src/restic/commands/dump.ts`:
- Around line 68-73: The code currently aborts the local abortController
(abortDump) before calling safeSpawn/onSpawn which can cause stream to be unset
and surface a generic "Failed to initialize restic dump stream" instead of
propagating cancellation; update the logic in the dump routine to first
short-circuit when options.signal?.aborted (or when options.signal fires
before/while spawning) by returning/throwing an abort-specific error (e.g.,
propagate the original AbortSignal reason or throw an AbortError) instead of
proceeding to safeSpawn, and also ensure the abortDump handler and the
post-spawn !stream check map their failure path to the same abort-specific
error; apply the same change to the analogous branch around lines 115-117 so all
early/shutdown aborts consistently propagate cancellation rather than a generic
initialization failure.

In `@packages/core/src/restic/commands/forget.ts`:
- Around line 63-69: The current code checks extra.signal?.aborted after
awaiting safeExec and cleanupTemporaryKeys, which can incorrectly report an
already-completed restic forget/prune as "Operation aborted"; instead, determine
cancellation from the safeExec outcome itself (e.g., the returned result or any
thrown abort-specific error) before deciding to throw. Update the forget.ts
block that calls safeExec and cleanupTemporaryKeys so that you inspect the
safeExec result (res, exit code, or thrown AbortError) to detect aborts, only
throw "Operation aborted" when safeExec indicates it was terminated by the
signal, and always perform cleanupTemporaryKeys regardless of abort detection.

---

Outside diff comments:
In `@app/server/modules/lifecycle/shutdown.ts`:
- Around line 12-26: The guard in unmountVolumes is inverted: it currently runs
the LOCAL_AGENT_ID unmount logic only when config.flags.enableLocalAgent is
false, skipping cleanup on normal local-agent shutdown; change the condition to
run the findMany/unmount loop when enableLocalAgent is true (i.e., remove or
invert the negation) so volumes from db.query.volumesTable.findMany filtered by
{ agentId: LOCAL_AGENT_ID, status: "mounted" } are iterated and each unmounted
via withContext(..., () => volumeService.unmountVolume(volume.shortId, {
persistStatus: false })) with the same error handling and logger call preserved.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4016cf83-68ad-4827-a3e2-cadb91a8cca0

📥 Commits

Reviewing files that changed from the base of the PR and between 34fc4f0 and 5860e2b.

📒 Files selected for processing (24)

app/server.ts
app/server/core/__tests__/repository-mutex.test.ts
app/server/core/repository-mutex.ts
app/server/modules/agents/agents-manager.ts
app/server/modules/backups/__tests__/backups.service.execution.test.ts
app/server/modules/backups/backup-executor.ts
app/server/modules/backups/backups.service.ts
app/server/modules/backups/helpers/backup-maintenance.ts
app/server/modules/lifecycle/__tests__/shutdown.test.ts
app/server/modules/lifecycle/shutdown.ts
app/server/modules/repositories/__tests__/repositories.service.test.ts
app/server/modules/repositories/helpers/doctor.ts
app/server/modules/repositories/repositories.service.ts
app/server/modules/repositories/restore-executor.ts
app/server/plugins/bootstrap.ts
app/test/helpers/repository-mutex.ts
packages/core/src/restic/commands/copy.ts
packages/core/src/restic/commands/delete-snapshots.ts
packages/core/src/restic/commands/dump.ts
packages/core/src/restic/commands/forget.ts
packages/core/src/restic/commands/ls.ts
packages/core/src/restic/commands/snapshots.ts
packages/core/src/restic/commands/stats.ts
packages/core/src/restic/commands/tag-snapshots.ts

💤 Files with no reviewable changes (1)

app/server.ts

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e674a8f6bf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c27e717e0f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 729f6822d5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

nicotsx marked this pull request as ready for review June 5, 2026 15:55

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread app/server/modules/backups/backups.service.ts

Comment thread app/server/modules/backups/backup-executor.ts Outdated

Comment thread app/server/modules/repositories/repositories.service.ts

refactor: gracefully abort restic tasks on shutdown

31112f4

coderabbitai Bot reviewed Jun 5, 2026

View reviewed changes

refactor: pr feedbacks

e674a8f

nicotsx force-pushed the 06-05-refactor_shutdown-mutex branch from 5860e2b to e674a8f Compare June 5, 2026 17:29

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread app/server/modules/repositories/restore-executor.ts

Comment thread app/server/modules/repositories/helpers/doctor.ts

Comment thread packages/core/src/restic/commands/tag-snapshots.ts Outdated

Comment thread app/server/modules/backups/helpers/backup-maintenance.ts

nicotsx added 2 commits June 5, 2026 20:12

fix: handle shutdown aborts without persisting failures

6f0bac7

chore: fix liniting issue

c27e717

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread app/server/modules/lifecycle/shutdown.ts

refactor: controller doesn't shudown abort down to the agent

729f682

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread app/server/modules/agents/agents-manager.ts

Comment thread app/server/modules/agents/agents-manager.ts

Comment thread app/server/modules/repositories/repositories.service.ts Outdated

fix: cache set race

9553f8b

nicotsx marked this pull request as draft June 6, 2026 11:38

Conversation

nicotsx commented Jun 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

nicotsx commented Jun 5, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

nicotsx commented Jun 5, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 5, 2026 •

edited

Loading