
Add real-time drone telemetry pipeline: Kafka + Spark Structured Streaming + AWS S3/Redshift#1

Draft
Copilot wants to merge 3 commits into main from copilot/create-big-data-pipeline

Conversation


Copilot AI commented Apr 1, 2026

Implements a full big data pipeline for real-time drone telemetry ingestion, route optimization, and collision detection. All configuration is environment-variable-driven; no secrets in source.

Data Flow

Drone Fleet (simulated)
  → Kafka (drone-telemetry)
  → Spark Structured Streaming
      ├─ Route optimization  → Kafka (drone-processed) + S3 Parquet
      └─ Collision detection → Kafka (drone-alerts)
  → Redshift (COPY FROM S3)

Components

  • src/models/drone_telemetry.py — @dataclass capturing drone state (drone_id, lat/lon/alt, speed, heading, battery_level, status, payload_weight) with JSON (de)serialisation helpers
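A minimal sketch of that record as it moves through the topics — the field names come from the bullet above, but the helper names (`to_json` / `from_json`) and types are assumptions, not the repo's exact API:

```python
# Sketch of the telemetry dataclass; helper names and field types
# are assumptions based on the PR description, not the exact code.
from dataclasses import dataclass, asdict
import json


@dataclass
class DroneTelemetry:
    drone_id: str
    lat: float
    lon: float
    alt: float
    speed: float
    heading: float
    battery_level: float
    status: str
    payload_weight: float

    def to_json(self) -> str:
        # Serialise to the JSON payload published to drone-telemetry
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "DroneTelemetry":
        # Inverse of to_json; assumes all fields are present
        return cls(**json.loads(raw))
```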

  • src/kafka/ — DroneProducer simulates N drones at a configurable rate (GZIP compression, acks=all, automatic topic creation); DroneConsumer wraps confluent-kafka with manual offset commits and a pluggable message handler

  • src/spark/telemetry_processor.py — Reads raw JSON from Kafka, applies typed schema, writes date-partitioned Parquet to S3, and forwards enriched records to downstream Kafka topic

  • src/spark/route_optimizer.py — Adds per-row columns via Spark UDFs: Haversine distance_to_dest_m, estimated_flight_time_s, recommended_action (continue / return_to_base / emergency_land / reduce_speed), and a 0–100 optimisation_score (battery × 0.5, speed × 0.3, payload × 0.2)
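The two UDFs above can be sketched as plain functions before wrapping them with pyspark's udf(). The weights (0.5/0.3/0.2) are from the bullet; the normalisation to [0, 1] and the max_speed_ms/max_payload_kg caps are assumptions for illustration:

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def optimisation_score(battery_pct: float, speed_ms: float, payload_kg: float,
                       max_speed_ms: float = 25.0, max_payload_kg: float = 10.0) -> float:
    """0-100 score weighted battery x 0.5, speed x 0.3, payload x 0.2.

    Normalisation is an assumption: each factor is scaled to [0, 1],
    and a lighter payload is treated as better.
    """
    battery = battery_pct / 100.0
    speed = min(speed_ms / max_speed_ms, 1.0)
    payload = 1.0 - min(payload_kg / max_payload_kg, 1.0)
    return 100.0 * (battery * 0.5 + speed * 0.3 + payload * 0.2)
```

A drone at full battery, top speed, and zero payload scores 100; each depleted factor pulls the score down by its weight.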

  • src/spark/collision_detector.py — Self-joins each micro-batch; pairs within COLLISION_SAFE_DISTANCE_M horizontally are emitted as WARNING, and pairs that also breach the vertical threshold as CRITICAL. Alerts are published to the Kafka alerts topic via foreachBatch
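The severity logic reduces to a pairwise check, sketched here in plain Python. COLLISION_SAFE_DISTANCE_M matches the env var above; the vertical-separation threshold name and both default values are assumptions:

```python
from itertools import combinations
from typing import Callable, Iterable, List, Tuple

# Threshold values are illustrative defaults, not the repo's configuration.
COLLISION_SAFE_DISTANCE_M = 50.0
COLLISION_SAFE_ALTITUDE_M = 10.0  # assumed name for the vertical threshold

Drone = Tuple[str, float, float, float]  # (drone_id, lat, lon, alt)


def detect_collisions(drones: Iterable[Drone],
                      horizontal_dist: Callable[[float, float, float, float], float]
                      ) -> List[Tuple[str, str, str]]:
    """Return (id_a, id_b, severity) for every too-close pair.

    Horizontal proximity alone -> WARNING; horizontal and vertical
    proximity together -> CRITICAL.
    """
    alerts = []
    for a, b in combinations(drones, 2):
        if horizontal_dist(a[1], a[2], b[1], b[2]) < COLLISION_SAFE_DISTANCE_M:
            vertical_close = abs(a[3] - b[3]) < COLLISION_SAFE_ALTITUDE_M
            severity = "CRITICAL" if vertical_close else "WARNING"
            alerts.append((a[0], b[0], severity))
    return alerts
```

In the streaming job this comparison runs as a self-join on each micro-batch rather than a Python loop, but the severity rule is the same.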

  • src/aws/s3_handler.py — boto3 helpers with Hive-style date partitioning (year=/month=/day=)
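The partitioning scheme is just a key-building convention; a sketch (the function name and prefix layout are assumptions):

```python
from datetime import datetime


def partition_key(prefix: str, ts: datetime, filename: str) -> str:
    """Build a Hive-style S3 key: <prefix>/year=YYYY/month=MM/day=DD/<filename>.

    Zero-padded month/day so keys sort lexicographically by date.
    """
    return (f"{prefix}/year={ts.year:04d}/month={ts.month:02d}/"
            f"day={ts.day:02d}/{filename}")
```

Spark, Athena, and Redshift Spectrum all recognise this `key=value` layout for partition pruning.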

  • src/aws/redshift_handler.py — Auto-DDL (DISTKEY/SORTKEY), bulk inserts via execute_values, COPY … FORMAT AS PARQUET from S3, analytics queries (get_low_battery_drones, get_recent_alerts)
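The COPY statement the handler issues looks roughly like the following; the builder function, table name, and placeholder ARN are illustrative, not taken from the repo:

```python
def build_copy_sql(table: str, s3_prefix: str, iam_role: str) -> str:
    """Render a Redshift COPY that bulk-loads Parquet files from S3.

    All three arguments are caller-supplied; nothing here is the
    repo's actual configuration.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS PARQUET;"
    )
```

With Parquet input, Redshift maps columns by name and no delimiter or compression options are needed.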

  • config/kafka_config.py, spark_config.py, aws_config.py — all tuneable via env vars, with sane defaults for local dev

  • docker-compose.yml / Dockerfile — ZooKeeper, Kafka, Schema Registry, Kafka UI (localhost:8080); producer and Spark pipeline as services

  • scripts/run_pipeline.py — Entrypoint that wires all three streaming queries and calls awaitAnyTermination()

Tests

57 unit tests across all components. Spark tests (test_route_optimizer.py, test_collision_detector.py) use a local SparkSession — no cluster required.

Original prompt

Create a big data pipeline using Kafka and Spark to process real-time drone telemetry data for route optimization and collision detection. Integrate AWS S3 and Redshift for scalable storage and analytics.

Copilot AI and others added 2 commits April 1, 2026 14:10
Copilot AI changed the title [WIP] Add big data pipeline for processing drone telemetry data Add real-time drone telemetry pipeline: Kafka + Spark Structured Streaming + AWS S3/Redshift Apr 1, 2026
Copilot AI requested a review from Bitu-Singh-Rathoud April 1, 2026 14:15