
Add real-time drone telemetry pipeline: Kafka + Spark Structured Streaming + AWS S3/Redshift#1

Draft
Copilot wants to merge 3 commits into main from copilot/create-big-data-pipeline

Conversation


Copilot AI commented Apr 1, 2026

Implements a full big data pipeline for real-time drone telemetry ingestion, route optimization, and collision detection. All configuration is environment-variable-driven; no secrets in source.

Data Flow

Drone Fleet (simulated)
  → Kafka (drone-telemetry)
  → Spark Structured Streaming
      ├─ Route optimization  → Kafka (drone-processed) + S3 Parquet
      └─ Collision detection → Kafka (drone-alerts)
  → Redshift (COPY FROM S3)

Components

  • src/models/drone_telemetry.py — @dataclass capturing drone state (drone_id, lat/lon/alt, speed, heading, battery_level, status, payload_weight) with JSON (de)serialisation helpers
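A minimal sketch of that record as it moves through the topics — the field names come from the bullet above, but the helper names (`to_json` / `from_json`) and types are assumptions, not the repo's exact API:

```python
# Sketch of the telemetry dataclass; helper names and field types
# are assumptions based on the PR description, not the exact code.
from dataclasses import dataclass, asdict
import json


@dataclass
class DroneTelemetry:
    drone_id: str
    lat: float
    lon: float
    alt: float
    speed: float
    heading: float
    battery_level: float
    status: str
    payload_weight: float

    def to_json(self) -> str:
        # Serialise to the JSON payload published to drone-telemetry
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "DroneTelemetry":
        # Inverse of to_json; assumes all fields are present
        return cls(**json.loads(raw))
```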

  • src/kafka/ — DroneProducer simulates N drones at a configurable rate (GZIP compression, acks=all, automatic topic creation); DroneConsumer wraps confluent-kafka with manual offset commits and a pluggable message handler

  • src/spark/telemetry_processor.py — Reads raw JSON from Kafka, applies typed schema, writes date-partitioned Parquet to S3, and forwards enriched records to downstream Kafka topic

  • src/spark/route_optimizer.py — Adds per-row columns via Spark UDFs: Haversine distance_to_dest_m, estimated_flight_time_s, recommended_action (continue / return_to_base / emergency_land / reduce_speed), and a 0–100 optimisation_score (battery × 0.5, speed × 0.3, payload × 0.2)
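The two UDFs above can be sketched as plain functions before wrapping them with pyspark's udf(). The weights (0.5/0.3/0.2) are from the bullet; the normalisation to [0, 1] and the max_speed_ms/max_payload_kg caps are assumptions for illustration:

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def optimisation_score(battery_pct: float, speed_ms: float, payload_kg: float,
                       max_speed_ms: float = 25.0, max_payload_kg: float = 10.0) -> float:
    """0-100 score weighted battery x 0.5, speed x 0.3, payload x 0.2.

    Normalisation is an assumption: each factor is scaled to [0, 1],
    and a lighter payload is treated as better.
    """
    battery = battery_pct / 100.0
    speed = min(speed_ms / max_speed_ms, 1.0)
    payload = 1.0 - min(payload_kg / max_payload_kg, 1.0)
    return 100.0 * (battery * 0.5 + speed * 0.3 + payload * 0.2)
```

A drone at full battery, top speed, and zero payload scores 100; each depleted factor pulls the score down by its weight.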

  • src/spark/collision_detector.py — Self-joins each micro-batch; pairs within COLLISION_SAFE_DISTANCE_M horizontally are emitted as WARNING, and pairs that also breach the vertical threshold as CRITICAL. Alerts are published to the Kafka alerts topic via foreachBatch
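The severity logic reduces to a pairwise check, sketched here in plain Python. COLLISION_SAFE_DISTANCE_M matches the env var above; the vertical-separation threshold name and both default values are assumptions:

```python
from itertools import combinations
from typing import Callable, Iterable, List, Tuple

# Threshold values are illustrative defaults, not the repo's configuration.
COLLISION_SAFE_DISTANCE_M = 50.0
COLLISION_SAFE_ALTITUDE_M = 10.0  # assumed name for the vertical threshold

Drone = Tuple[str, float, float, float]  # (drone_id, lat, lon, alt)


def detect_collisions(drones: Iterable[Drone],
                      horizontal_dist: Callable[[float, float, float, float], float]
                      ) -> List[Tuple[str, str, str]]:
    """Return (id_a, id_b, severity) for every too-close pair.

    Horizontal proximity alone -> WARNING; horizontal and vertical
    proximity together -> CRITICAL.
    """
    alerts = []
    for a, b in combinations(drones, 2):
        if horizontal_dist(a[1], a[2], b[1], b[2]) < COLLISION_SAFE_DISTANCE_M:
            vertical_close = abs(a[3] - b[3]) < COLLISION_SAFE_ALTITUDE_M
            severity = "CRITICAL" if vertical_close else "WARNING"
            alerts.append((a[0], b[0], severity))
    return alerts
```

In the streaming job this comparison runs as a self-join on each micro-batch rather than a Python loop, but the severity rule is the same.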

  • src/aws/s3_handler.py — boto3 helpers with Hive-style date partitioning (year=/month=/day=)
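The partitioning scheme is just a key-building convention; a sketch (the function name and prefix layout are assumptions):

```python
from datetime import datetime


def partition_key(prefix: str, ts: datetime, filename: str) -> str:
    """Build a Hive-style S3 key: <prefix>/year=YYYY/month=MM/day=DD/<filename>.

    Zero-padded month/day so keys sort lexicographically by date.
    """
    return (f"{prefix}/year={ts.year:04d}/month={ts.month:02d}/"
            f"day={ts.day:02d}/{filename}")
```

Spark, Athena, and Redshift Spectrum all recognise this `key=value` layout for partition pruning.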

  • src/aws/redshift_handler.py — Auto-DDL (DISTKEY/SORTKEY), bulk inserts via execute_values, COPY … FORMAT AS PARQUET from S3, analytics queries (get_low_battery_drones, get_recent_alerts)
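The COPY statement the handler issues looks roughly like the following; the builder function, table name, and placeholder ARN are illustrative, not taken from the repo:

```python
def build_copy_sql(table: str, s3_prefix: str, iam_role: str) -> str:
    """Render a Redshift COPY that bulk-loads Parquet files from S3.

    All three arguments are caller-supplied; nothing here is the
    repo's actual configuration.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS PARQUET;"
    )
```

With Parquet input, Redshift maps columns by name and no delimiter or compression options are needed.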

  • config/kafka_config.py, spark_config.py, aws_config.py — all tuneable via env vars, with sane defaults for local dev

  • docker-compose.yml / Dockerfile — ZooKeeper, Kafka, Schema Registry, Kafka UI (localhost:8080); producer and Spark pipeline as services

  • scripts/run_pipeline.py — Entrypoint that wires all three streaming queries and calls awaitAnyTermination()

Tests

57 unit tests across all components. Spark tests (test_route_optimizer.py, test_collision_detector.py) use a local SparkSession — no cluster required.

Original prompt

Create a big data pipeline using Kafka and Spark to process real-time drone telemetry data for route optimization and collision detection. Integrate AWS S3 and Redshift for scalable storage and analytics.

Copilot AI and others added 2 commits April 1, 2026 14:10
Copilot AI changed the title [WIP] Add big data pipeline for processing drone telemetry data Add real-time drone telemetry pipeline: Kafka + Spark Structured Streaming + AWS S3/Redshift Apr 1, 2026
Copilot AI requested a review from Bitu-Singh-Rathoud April 1, 2026 14:15