
feat: Resilient background job retry & monitoring #774

Open
wocaoac-cpu wants to merge 1 commit into rohitdash08:main from wocaoac-cpu:feat/background-job-retry

@wocaoac-cpu

Summary

Closes #130

  • Job model: New jobs table with status tracking (PENDING, RUNNING, SUCCESS, FAILED, DEAD), retry count, exponential backoff scheduling, and dead-letter queue support
  • Job manager service (services/job_manager.py): Pluggable job type registry, enqueue() / run_job() / retry_failed_job() / process_pending_jobs() with exponential backoff retry and automatic dead-lettering after max retries
  • REST API (routes/jobs.py): Full CRUD — list jobs (filterable by status/type, paginated), create job, get by ID, retry failed/dead jobs, trigger batch processing, view dead-letter queue, aggregate stats, list registered types
  • Tests: 26 unit + integration tests covering backoff math, full job lifecycle, dead-lettering, pagination, and all API endpoints
  • OpenAPI: Updated openapi.yaml with Jobs tag, all paths, and Job/NewJob schemas
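
For reviewers, the retry scheduling described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the constants `BASE_DELAY_SECONDS`, `MAX_DELAY_SECONDS`, and `MAX_RETRIES` and the helper names `compute_backoff` and `after_failure` are assumptions made for the example, and the real values and signatures in services/job_manager.py may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Hypothetical tuning values; the PR does not state the real ones.
BASE_DELAY_SECONDS = 30
MAX_DELAY_SECONDS = 3600
MAX_RETRIES = 5


def compute_backoff(retry_count: int) -> int:
    """Seconds to wait before the next attempt: base * 2**retry_count, capped."""
    return min(BASE_DELAY_SECONDS * (2 ** retry_count), MAX_DELAY_SECONDS)


def after_failure(
    retry_count: int, now: datetime
) -> Tuple[str, int, Optional[datetime]]:
    """Return (status, new_retry_count, next_run_at) after one failed attempt.

    Once the retry budget is exhausted the job is dead-lettered (status DEAD)
    and is no longer scheduled; otherwise it is marked FAILED and rescheduled.
    """
    new_count = retry_count + 1
    if new_count > MAX_RETRIES:
        return "DEAD", new_count, None
    delay = compute_backoff(retry_count)
    return "FAILED", new_count, now + timedelta(seconds=delay)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for attempt in range(MAX_RETRIES + 1):
        print(attempt, after_failure(attempt, now))
```

Capping the delay keeps a long-failing job from being pushed arbitrarily far into the future before it is finally dead-lettered.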

Test Plan

  • All 26 new tests pass (pytest tests/test_job_manager.py)
  • Existing test suite unaffected
  • Verify the API endpoint tests against a running Redis instance (pre-existing infrastructure dependency)
  • Manual smoke test of /jobs endpoints

- Add Job model with status tracking, retry count, and scheduling fields
- Implement job_manager service with exponential backoff retry, dead-letter
  queue, and batch processing of pending jobs
- Add REST API endpoints: list/create/get jobs, retry failed jobs, process
  pending jobs, view dead-letter queue, and aggregate stats
- Full test suite covering backoff computation, enqueue/run/retry lifecycle,
  dead-lettering after max retries, pagination, and all API endpoints
- Update openapi.yaml with Jobs tag, paths, and schemas
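
As a rough illustration of the pluggable job type registry and the run/retry lifecycle listed above, here is an in-memory sketch. It is an assumption-laden example rather than the PR's implementation: the `register_job_type` decorator, the `Job` dataclass fields, and the `max_retries` default are hypothetical, and there is no database or backoff scheduling here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

# Hypothetical in-memory registry mapping a job type name to its handler.
_HANDLERS: Dict[str, Callable[[dict], Any]] = {}


def register_job_type(name: str):
    """Decorator that registers a handler under a job type name."""
    def decorator(fn: Callable[[dict], Any]) -> Callable[[dict], Any]:
        _HANDLERS[name] = fn
        return fn
    return decorator


@dataclass
class Job:
    job_type: str
    payload: dict = field(default_factory=dict)
    status: str = "PENDING"
    retry_count: int = 0
    last_error: Optional[str] = None


def run_job(job: Job, max_retries: int = 3) -> Job:
    """Run one attempt and move the job to SUCCESS, FAILED, or DEAD."""
    handler = _HANDLERS.get(job.job_type)
    if handler is None:
        raise ValueError(f"unregistered job type: {job.job_type}")
    job.status = "RUNNING"
    try:
        handler(job.payload)
    except Exception as exc:  # any handler error counts as a failed attempt
        job.retry_count += 1
        job.last_error = str(exc)
        # Dead-letter once the retry budget is used up.
        job.status = "DEAD" if job.retry_count >= max_retries else "FAILED"
    else:
        job.status = "SUCCESS"
    return job


@register_job_type("echo")
def echo(payload: dict) -> dict:
    return payload


@register_job_type("always_fails")
def always_fails(payload: dict) -> None:
    raise RuntimeError("simulated failure")
```

Calling `run_job(Job("always_fails"))` repeatedly on the same job walks it through FAILED twice and then to DEAD under the assumed default of three attempts.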
@wocaoac-cpu wocaoac-cpu requested a review from rohitdash08 as a code owner April 4, 2026 02:57
Development

Successfully merging this pull request may close these issues.

Resilient background job retry & monitoring

1 participant