fix: exit cleanly on fatal startup errors instead of crash-looping (#4253) by apotema · Pull Request #4601 · Dokploy/dokploy

apotema · 2026-06-09T13:09:59Z

Problem

Fixes #4253. After "Migration complete", a fatal error in the server startup window did not terminate the process. Background handles (the ioredis reconnect loop, open sockets) kept the event loop alive, so instead of exiting, the process spun at high CPU, never passed the healthcheck (/api/trpc/settings.health), and Docker Swarm crash-looped the container, showing only:

Using Docker socket (Standard Docker socket): /var/run/docker.sock
ELIFECYCLE  Command failed.

This matches every detail the reporter described: the crash loop pegging CPU (their 700–900%), the server never reaching "Server Started", and Swarm marking the task as "non-zero exit (1): unhealthy container".

Root cause

In apps/dokploy/server/server.ts:

The top-level setupDirectories() / createDefaultTraefikConfig() block had no try/catch.
app.prepare() had error handling inside its .then() but no .catch() — a rejected prepare() became an unhandled rejection.
There were no process-level uncaughtException / unhandledRejection handlers, and a dependency registers a logging-only unhandledRejection listener that suppresses Node's default "crash on unhandled rejection". So a fatal startup error is logged (or not) but the process never exits — it just spins.
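The suppression effect described above can be reproduced in isolation. The following is a minimal sketch, not code from this PR: `childSource` and `runChild` are hypothetical helper names, the `setInterval` is only a stand-in for a background handle such as a reconnect loop, and it assumes Node 15 or later, where an unhandled rejection with no listener is raised as an uncaught exception.

```typescript
import { spawnSync, type SpawnSyncReturns } from "node:child_process";

// Builds the source of a tiny child script (hypothetical, for illustration only).
// The child rejects a promise while a timer keeps its event loop alive.
function childSource(withLoggingListener: boolean): string {
  const listener = withLoggingListener
    ? 'process.on("unhandledRejection", (reason) => console.error("logged only:", reason));'
    : "";
  return [
    listener,
    "setInterval(() => {}, 1000); // stand-in for a background handle",
    'Promise.reject(new Error("simulated startup failure"));',
  ].join("\n");
}

// Runs the child with the current Node binary and kills it after timeoutMs
// if it has not exited on its own.
export function runChild(
  withLoggingListener: boolean,
  timeoutMs = 2000,
): SpawnSyncReturns<string> {
  return spawnSync(process.execPath, ["-e", childSource(withLoggingListener)], {
    encoding: "utf8",
    timeout: timeoutMs,
  });
}
```

Under those assumptions, `runChild(false)` should exit on its own with status 1, while `runChild(true)` should keep running until the timeout kills it (status `null`, signal `SIGTERM`), which mirrors the "logged but never exits" behavior in the third point.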

Fix

Phase-gated error handling, so a failed startup exits cleanly without destabilizing a healthy server:

Before the HTTP server is listening → an uncaught exception / unhandled rejection / sync-init throw / prepare() rejection / bind failure logs the cause and exit(1)s, so the orchestrator restarts cleanly instead of spinning.
After it is listening → a stray unhandled rejection is only logged, so an otherwise-healthy serving instance is never killed.
await the listen() bind so a bind failure (e.g. EADDRINUSE) exits instead of spinning; mark the server ready only once actually listening.
try/catch around the synchronous directory/Traefik init; .catch() on app.prepare() with a labeled diagnostic.
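The phase gate and the awaited bind listed above could look roughly like the following. This is a sketch under stated assumptions, not the code in apps/dokploy/server/server.ts: `createStartupGate`, `markListening`, `onFatal`, `install`, and `listenOrThrow` are hypothetical names, and the `exit` and `log` parameters are injected only so the behavior can be exercised without ending the process. How the PR treats a post-listen uncaught exception is not stated in the description; the sketch gates it the same way as a rejection, which is an assumption.

```typescript
import { once } from "node:events";
import type { Server } from "node:http";

type Exit = (code: number) => void;
type Log = (...args: unknown[]) => void;

// Hypothetical helper: one flag decides whether a fatal error exits or is only logged.
export function createStartupGate(
  exit: Exit = (code) => process.exit(code),
  log: Log = console.error,
) {
  let listening = false;

  const onFatal = (label: string, err: unknown): void => {
    log(`[${label}]`, err);
    // Before the server is listening, startup has failed: exit so the
    // orchestrator can restart the container. Afterwards, only log.
    if (!listening) exit(1);
  };

  return {
    markListening: (): void => {
      listening = true;
    },
    onFatal,
    // Registers the process-level handlers (assumption: both are gated alike).
    install: (): void => {
      process.on("uncaughtException", (err) => onFatal("uncaughtException", err));
      process.on("unhandledRejection", (reason) => onFatal("unhandledRejection", reason));
    },
  };
}

// Resolves once the server is actually listening. events.once() rejects if the
// server emits "error" first, so a bind failure such as EADDRINUSE surfaces
// as a rejection the caller can turn into a logged exit.
export async function listenOrThrow(server: Server, port: number): Promise<void> {
  const ready = once(server, "listening");
  server.listen(port);
  await ready;
}
```

A caller would run `gate.install()` first, then `await listenOrThrow(server, port)` inside a try/catch that reports through `gate.onFatal`, and call `gate.markListening()` only after the await resolves. Keeping the flag flip after the bind is what prevents a failed bind from being treated as a healthy, log-only state.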

Verification (real compiled bundle)

Built the real dist/migration.mjs + dist/server.mjs from this branch (node:24.4.0-slim, real Postgres 16 + Redis on a Docker network) and exercised the actual migration → server boot:

| Scenario | `canary` | This PR |
| --- | --- | --- |
| Pre-listen `prepare()` failure | spins forever, high CPU, killed at timeout (exit 124) | logs `Failed to prepare…`, clean exit 1 in ~5s |
| Normal boot | — | reaches `Server Started on`, healthcheck HTTP 200, running |
| Real post-listen rejection (`docker.sock` ENOENT) | — | logged, server survives and stays healthy |

The post-listen case is why the handlers are phase-gated: a naive "always exit(1) on unhandled rejection" would kill a healthy server on that exact background docker.sock rejection.

Scope

This fixes the crash-loop mechanism — any fatal startup failure now produces a clean, logged exit(1) instead of a silent high-CPU spin. The separate report in the comments (the dokploy-postgres role disappearing after ~1–2h) is a distinct database-lifecycle issue this PR does not address; with this change, that failure would surface as a clean diagnostic exit rather than a silent spin.

Testing

tsc --noEmit clean; biome check clean.
Reproduced before/after with the real bundle as above.

…okploy#4253)

After "Migration complete", a fatal error in the server startup window (e.g. a rejected `app.prepare()`) did not terminate the process. Background handles (the ioredis reconnect loop, open sockets) kept the event loop alive, so instead of exiting the process spun at high CPU, never passed the healthcheck, and Docker Swarm crash-looped the container showing only "ELIFECYCLE Command failed."

Reproduced with the real compiled bundle (node:24.4.0-slim + Postgres 16):
- before: fatal startup error -> process never exits, spins until killed
- after: fatal startup error -> logs the cause, exits 1 in ~3-5s

Changes in apps/dokploy/server/server.ts:
- Phase-gated process handlers: before the HTTP server is listening, an uncaught exception or unhandled rejection logs the cause and exit(1)s so the orchestrator restarts cleanly. After it is listening, a stray rejection is only logged, so a healthy serving instance is never killed (verified: real post-listen docker.sock ENOENT rejection is survived).
- try/catch around the synchronous directory/Traefik init.
- await the listen() bind so a bind failure (e.g. EADDRINUSE) exits instead of spinning; only mark the server ready once actually listening.
- .catch() on app.prepare() with a labeled diagnostic.

apotema requested a review from Siumauricio as a code owner June 9, 2026 13:10

dosubot Bot added the size:L label (this PR changes 100-499 lines, ignoring generated files) Jun 9, 2026

apotema force-pushed the fix/4253-silent-server-crash-diagnostics branch from e2a1f9f to 9e122de Compare June 9, 2026 13:38

apotema changed the title ~~fix: log fatal startup errors instead of crashing silently (#4253)~~ fix: exit cleanly on fatal startup errors instead of crash-looping (#4253) Jun 9, 2026

apotema mentioned this pull request Jun 9, 2026

v0.29.0: Silent ELIFECYCLE crash after "Migration complete" - no error output, crash loops indefinitely #4253

Open

apotema wants to merge 1 commit into Dokploy:canary from apotema:fix/4253-silent-server-crash-diagnostics
