Closed
Description
Problem
With the new DB-persisted fleet sync tracking (#1707), if the SLM backend process dies mid-sync (e.g. OOM, crash, or unclean restart), the job row stays in status='running' forever. There is no startup reconciliation to detect and mark stale jobs as failed.
Expected
On SLM backend startup, any fleet sync job with status='running' that was created more than N minutes ago should be marked as failed with a message like "interrupted by service restart".
Fix
Add a startup hook in main.py lifespan or the code-sync module init that:
- Queries fleet_sync_jobs WHERE status = 'running'
- Marks them as failed with completed_at = now()
- Logs a warning for each recovered job
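The steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses the standard-library sqlite3 module in place of whatever DB layer the SLM backend uses, and the function name reconcile_stale_sync_jobs, the columns id, created_at, and error, and the 30-minute default for "N" are all assumptions made for the example. Only the table name fleet_sync_jobs, the status values, and completed_at come from this issue. The sketch also applies the age cutoff described under Expected, so a job started moments before the restart is left alone.

```python
import logging
import sqlite3
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("fleet_sync.reconcile")

# Hypothetical default for "N" from the Expected section.
STALE_AFTER_MINUTES = 30
INTERRUPTED_MESSAGE = "interrupted by service restart"


def reconcile_stale_sync_jobs(conn, stale_after_minutes=STALE_AFTER_MINUTES, now=None):
    """Mark stale 'running' fleet sync jobs as failed and return their ids.

    Assumes timestamps are stored as UTC ISO-8601 strings, so that plain
    string comparison in SQL matches chronological order.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=stale_after_minutes)

    # Step 1: find jobs still marked running that are older than the cutoff.
    rows = conn.execute(
        "SELECT id, created_at FROM fleet_sync_jobs "
        "WHERE status = 'running' AND created_at < ? ORDER BY id",
        (cutoff.isoformat(),),
    ).fetchall()

    recovered = []
    for job_id, created_at in rows:
        # Step 2: mark the job failed. The status guard keeps the update
        # from overwriting a job that finished in the meantime.
        conn.execute(
            "UPDATE fleet_sync_jobs "
            "SET status = 'failed', completed_at = ?, error = ? "
            "WHERE id = ? AND status = 'running'",
            (now.isoformat(), INTERRUPTED_MESSAGE, job_id),
        )
        # Step 3: log a warning for each recovered job.
        logger.warning(
            "Marked stale fleet sync job %s (created %s) as failed: %s",
            job_id, created_at, INTERRUPTED_MESSAGE,
        )
        recovered.append(job_id)

    conn.commit()
    return recovered
```

A function like this would be called once from the startup hook, before the service begins accepting new sync requests, so that it cannot race with a job that is legitimately starting.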
Impact
Severity: low — cosmetic/reporting, no functional harm. Jobs show incorrect status in API/UI.
Discovered During
Implementing #1707 — fleet sync job DB persistence
Location
autobot-slm-backend/api/code_sync.py — needs startup reconciliation hook