feat: Resilient background job retry & monitoring #755

Open
shaidshark wants to merge 5 commits into rohitdash08:main from shaidshark:feature/job-retry-monitoring

Conversation

@shaidshark

Bounty #130 — Resilient background job retry & monitoring

What's included

Core: Redis-backed Job Queue

  • `job_queue.py`: enqueue, dequeue, mark success/fail, retry with exponential backoff
  • Configurable `RetryPolicy` (max retries, base delay, backoff multiplier, max delay)
  • Job statuses: PENDING → RUNNING → SUCCEEDED | FAILED → RETRYING → DEAD
  • Dead letter queue for permanently failed jobs
  • Job statistics API
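
A minimal sketch of how the four `RetryPolicy` knobs listed above could combine into an exponential backoff schedule. The field names, defaults, and the `delay_for` / `next_status` helpers are assumptions for illustration; the PR description names the parameters but does not show the class itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Illustrative policy; field names are assumed, not taken from job_queue.py."""
    max_retries: int = 3
    base_delay: float = 1.0          # seconds before the first retry
    backoff_multiplier: float = 2.0  # growth factor applied per retry
    max_delay: float = 60.0          # upper bound on any single delay

    def delay_for(self, attempt: int) -> float:
        """Delay before retry number `attempt` (1-based), capped at max_delay."""
        if attempt < 1:
            raise ValueError("attempt must be >= 1")
        raw = self.base_delay * self.backoff_multiplier ** (attempt - 1)
        return min(raw, self.max_delay)

    def next_status(self, retries_used: int) -> str:
        """RETRYING while retries remain, otherwise DEAD (dead letter queue)."""
        return "RETRYING" if retries_used < self.max_retries else "DEAD"


policy = RetryPolicy(max_retries=5, base_delay=1.0, backoff_multiplier=2.0, max_delay=10.0)
delays = [policy.delay_for(n) for n in range(1, 6)]
```

With these example values the delay doubles on each retry until it reaches the cap, and a job that has used all five retries would move to DEAD rather than RETRYING.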

Worker Process

  • `worker.py`: background worker with task registry and graceful shutdown
  • Auto-registers example tasks (send_email, generate_report)
  • Extensible: just call `register_task(name, handler)`
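
As a rough illustration of the task registry idea, the sketch below maps task names to handlers and dispatches by name. Only the `register_task(name, handler)` call shape comes from the description; the `_TASKS` dict, the `run_task` helper, the handler signature (a single payload dict), and the `send_welcome` handler are hypothetical.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Any]

# Hypothetical module-level registry: task name -> handler.
_TASKS: Dict[str, Handler] = {}


def register_task(name: str, handler: Handler) -> None:
    """Map a task name to the callable that executes it."""
    if not callable(handler):
        raise TypeError("handler must be callable")
    _TASKS[name] = handler


def run_task(name: str, payload: Dict[str, Any]) -> Any:
    """Look up a handler by name and call it; unknown names are rejected."""
    try:
        handler = _TASKS[name]
    except KeyError:
        raise ValueError(f"unknown task: {name!r}") from None
    return handler(payload)


def send_welcome(payload: Dict[str, Any]) -> str:
    # Hypothetical handler, not one of the PR's example tasks.
    return f"welcome sent to {payload['to']}"


register_task("send_welcome", send_welcome)
result = run_task("send_welcome", {"to": "user@example.com"})
```

Rejecting unknown names at dispatch time is one way to pair a registry with the task-name whitelist idea; the worker in this PR may handle that differently.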

REST API at /jobs:

| Endpoint | Method | Description |
|---|---|---|
| `/jobs/stats` | GET | Queue statistics (pending, running, dead, etc.) |
| `/jobs` | GET | Job history with status filtering |
| `/jobs/:id` | GET | Specific job details |
| `/jobs` | POST | Enqueue new job with optional retry policy |
| `/jobs/:id/retry` | POST | Re-queue dead letter job |
| `/jobs/all` | GET | All jobs with stats overview |
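
For orientation, here is a hedged sketch of what a client-side enqueue call to `POST /jobs` might look like. The base URL, the JSON field names (`task_name`, `payload`, `retry_policy`), and the Bearer-token header are all assumptions; the PR description does not document the request schema. The request is only constructed, not sent.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # assumed; not stated in the PR


def build_enqueue_request(
    task_name: str,
    payload: Dict[str, Any],
    token: str,
    retry_policy: Optional[Dict[str, Any]] = None,
) -> Request:
    """Build (but do not send) a POST /jobs request with an assumed JSON body."""
    body: Dict[str, Any] = {"task_name": task_name, "payload": payload}
    if retry_policy is not None:
        body["retry_policy"] = retry_policy  # optional, per the endpoint table
    return Request(
        url=f"{BASE_URL}/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_enqueue_request(
    "send_email",
    {"to": "user@example.com"},
    token="example-token",
    retry_policy={"max_retries": 5},
)
```

Passing `req` to `urllib.request.urlopen` would send it; the actual response shape depends on the implementation in this PR.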

Tests:

  • 10 test cases with MockRedis covering all scenarios
  • Enqueue, dequeue, success, retry, dead letter, stats, retry policy

Acceptance Criteria

  • Improve reliability of async job execution
  • Production-ready implementation
  • Includes tests (10 test cases)
  • Documentation (API endpoints, inline docs, worker setup)

shaidshark added 5 commits April 3, 2026 10:48
…sh08#130)

- Redis-backed job queue with priority support
- Configurable retry policy with exponential backoff
- Job statuses: PENDING, RUNNING, SUCCEEDED, FAILED, RETRYING, DEAD
- Dead letter queue for permanently failed jobs
- Job worker process with graceful shutdown
- REST API at /jobs for monitoring and management
- GET /jobs/stats — queue statistics
- GET /jobs — job history with filtering
- POST /jobs — enqueue new jobs
- POST /jobs/:id/retry — retry dead letter jobs
- 10 test cases covering enqueue, dequeue, retry, dead letter, stats
- Implement real retry delay using Redis sorted set (ZADD)
- Remove from PROCESSING_KEY on retry (was leaking)
- Replace keys() with scan_iter for performance
- Add admin check to /all endpoint (403 for non-admins)
- Add ownership check to get_job_detail
- Add task_name whitelist validation
- Fix payload deserialization (robust try/except)
- Add job execution timeout (30min) in worker
- Promote delayed jobs to main queue on dequeue
- keys() blocks Redis event loop on large datasets; scan_iter is cursor-based
- Removed uid==1 hardcoded admin bypass from /all endpoint
- Admin access now purely via JWT is_admin claim
Lines 41, 42, 46, 103 had leading backticks that would cause SyntaxError.
Addresses review feedback from rohitdash08#755
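
The commit notes above mention delayed retries backed by a Redis sorted set and promotion of due jobs on dequeue. The sketch below illustrates that flow with an in-memory stand-in: a heap ordered by due time plays the role of the sorted set (score = time the job becomes ready). The class and method names are hypothetical, and this is not the PR's Redis code; atomicity between promotion and dequeue, which matters with multiple workers, is not addressed here.

```python
import heapq
from collections import deque
from typing import Deque, List, Optional, Tuple


class DelayedQueue:
    """In-memory stand-in: a ready queue plus a due-time-ordered delayed set."""

    def __init__(self) -> None:
        self.ready: Deque[str] = deque()
        self.delayed: List[Tuple[float, str]] = []  # (ready_at, job_id) min-heap

    def schedule_retry(self, job_id: str, now: float, delay: float) -> None:
        # Analogous to ZADD with score = time at which the job becomes due.
        heapq.heappush(self.delayed, (now + delay, job_id))

    def promote_due(self, now: float) -> int:
        # Move every job whose due time is <= now onto the ready queue.
        moved = 0
        while self.delayed and self.delayed[0][0] <= now:
            _, job_id = heapq.heappop(self.delayed)
            self.ready.append(job_id)
            moved += 1
        return moved

    def dequeue(self, now: float) -> Optional[str]:
        # Promotion runs on every dequeue, as the commit notes describe.
        self.promote_due(now)
        return self.ready.popleft() if self.ready else None


q = DelayedQueue()
q.schedule_retry("job-1", now=100.0, delay=5.0)
early = q.dequeue(now=102.0)  # job is not due yet
late = q.dequeue(now=105.0)   # job is due, so it is promoted and returned
```

Passing `now` explicitly keeps the sketch deterministic; a real worker would typically read the clock instead.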
@shaidshark
Author

Bounty submission — ready for review! All code review feedback from previous rounds has been addressed.
