feat: implement asynchronous scan execution with background worker #129

Open
ritiksah141 wants to merge 3 commits into
openshield-org:dev from
ritiksah141:feat/async-scan-execution

@ritiksah141
Collaborator

What does this PR do?
This PR moves OpenShield scan execution from a synchronous, blocking request path to a decoupled, asynchronous architecture. It introduces a
database-backed background worker to handle long-running Azure posture scans, so the API responds immediately and scan duration no longer counts against web
server request timeouts, even when scanning enterprise-scale subscriptions.

Type of change

  • API endpoint (Added status polling and async trigger)
  • Documentation (Added async architecture guide)
  • Background worker implementation (Database-backed queue logic)
  • Stability/Performance improvement (Isolated external API latency)

Detailed Summary of Changes

  • Asynchronous Lifecycle: The POST /api/scans/trigger endpoint was modified to validate requests and immediately return an HTTP 202 Accepted response along with a
    unique scan ID. It no longer waits for the scan to finish.
  • Database-Backed Queue: The scans table in PostgreSQL was enhanced with status (pending, running, completed, failed) and error_message columns. This allows the
    database to function as a persistent, ACID-compliant task queue without requiring additional infrastructure like Redis.
  • Dedicated Background Worker: A new process, scanner/worker.py, was implemented to independently poll for pending scans, manage their state transitions, and execute
    the core scanning logic. It includes robust error handling to capture and persist tracebacks upon failure.
  • Status Polling Endpoint: Added GET /api/scans/<scan_id> so the frontend can poll for the current scan status, completion timestamp, and error details.
  • Automatic Process Management: The startup.sh script was updated to automatically spawn the background worker alongside the Gunicorn web server, ensuring a seamless
    deployment experience.
  • Refined Documentation: Created docs/async-scan-architecture.md to explain the new system flow, technical rationale, and integration patterns for frontend
    developers.
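
The worker lifecycle described above (pending, running, then completed or failed with a persisted traceback) can be sketched roughly as follows. This is an illustration only, not the code in scanner/worker.py: it uses an in-memory SQLite database as a stand-in for PostgreSQL so it runs without setup, and the schema and the helper names `create_schema`, `enqueue_scan`, `claim_next_pending`, and `process_one` are hypothetical.

```python
import sqlite3
import traceback

# Status values named in the PR description.
PENDING, RUNNING, COMPLETED, FAILED = "pending", "running", "completed", "failed"


def create_schema(conn):
    # Hypothetical minimal schema; the real scans table has more columns.
    conn.execute(
        "CREATE TABLE scans ("
        "id INTEGER PRIMARY KEY, "
        "status TEXT NOT NULL DEFAULT 'pending', "
        "error_message TEXT)"
    )


def enqueue_scan(conn):
    """Insert a pending scan and return its ID (what a 202 response would carry)."""
    cur = conn.execute("INSERT INTO scans (status) VALUES (?)", (PENDING,))
    conn.commit()
    return cur.lastrowid


def claim_next_pending(conn):
    """Move the oldest pending scan to running; return its ID, or None if idle."""
    row = conn.execute(
        "SELECT id FROM scans WHERE status = ? ORDER BY id LIMIT 1", (PENDING,)
    ).fetchone()
    if row is None:
        return None
    # The status guard turns the claim into a no-op if another worker won the race.
    cur = conn.execute(
        "UPDATE scans SET status = ? WHERE id = ? AND status = ?",
        (RUNNING, row[0], PENDING),
    )
    conn.commit()
    return row[0] if cur.rowcount == 1 else None


def process_one(conn, run_scan):
    """Claim one scan, execute it, and persist the outcome. Returns the scan ID or None."""
    scan_id = claim_next_pending(conn)
    if scan_id is None:
        return None
    try:
        run_scan(scan_id)
    except Exception:
        # Persist the full traceback so the status endpoint can surface it.
        conn.execute(
            "UPDATE scans SET status = ?, error_message = ? WHERE id = ?",
            (FAILED, traceback.format_exc(), scan_id),
        )
    else:
        conn.execute(
            "UPDATE scans SET status = ? WHERE id = ?", (COMPLETED, scan_id)
        )
    conn.commit()
    return scan_id
```

A long-running worker would call `process_one` in a loop and sleep briefly whenever it returns None. With several workers on PostgreSQL, a row-locking claim such as `SELECT ... FOR UPDATE SKIP LOCKED` is a common alternative to the guarded UPDATE shown here; whether this PR uses either approach is not stated in the description.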

Technical Rationale
Moving to a decoupled worker model addresses the fundamental limitation of synchronous web requests for security scanning: a scan can outlast the request that
started it. By using a database-backed queue rather than ephemeral threads or a separate message broker, queued scans are persisted in PostgreSQL and no additional
infrastructure has to be deployed. The goal is for OpenShield to scan subscriptions with thousands of Azure resources without degrading API responsiveness, as
enterprise CSPM products do.

Testing and Verification

  • Unit Tests: Implemented tests/test_worker.py, which uses mocks to verify the worker state machine.
  • E2E Smoke Tests: Hardened tests/smoke_test.py to verify the full async lifecycle, including successful 202 responses and status polling.
  • Local CI: Successfully ran the consolidated ci.yml logic locally, verifying syntax, rule structure, and security measures across all 44 rule files and new backend
    components.
  • Dependency Audit: Verified that requirements.txt correctly covers all imports used in the new asynchronous logic.
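
For reviewers who want to exercise the async lifecycle by hand, the trigger-then-poll flow could look something like the helper below. It is a sketch under assumptions rather than an excerpt from tests/smoke_test.py: `wait_for_scan` and `fetch_status` are hypothetical names, and the response body is assumed to be JSON with a `status` field holding one of the four values listed earlier.

```python
import time


def wait_for_scan(fetch_status, scan_id, timeout=60.0, interval=1.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until the scan reaches a terminal status or the timeout elapses.

    fetch_status(scan_id) is a hypothetical callable that wraps
    GET /api/scans/<scan_id> and returns the decoded JSON body as a dict.
    The "status" key is an assumption about the response shape.
    """
    deadline = clock() + timeout
    while True:
        body = fetch_status(scan_id)
        if body.get("status") in ("completed", "failed"):
            return body
        if clock() >= deadline:
            raise TimeoutError(
                f"scan {scan_id} still {body.get('status')!r} after {timeout}s"
            )
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the helper testable without real delays; a caller talking to a live server can rely on the defaults and supply only a `fetch_status` built on its HTTP client of choice.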

Checklist

  • My code follows the rule template in CONTRIBUTING.md
  • I have not committed any real Azure credentials
  • My branch name follows the convention: feat/description

Closes Issue #112

@ritiksah141 ritiksah141 self-assigned this Jun 6, 2026