Add real-time drone telemetry pipeline: Kafka + Spark Structured Streaming + AWS S3/Redshift (#1)
Agent-Logs-Url: https://github.com/Bitu-Singh-Rathoud/drone-delivery-optimization/sessions/fffb3c85-307e-416e-bdcc-f97093353833
Co-authored-by: Bitu-Singh-Rathoud <247644259+Bitu-Singh-Rathoud@users.noreply.github.com>
Copilot changed the title from "[WIP] Add big data pipeline for processing drone telemetry data" to "Add real-time drone telemetry pipeline: Kafka + Spark Structured Streaming + AWS S3/Redshift" on Apr 1, 2026.
Implements a full big data pipeline for real-time drone telemetry ingestion, route optimization, and collision detection. All configuration is environment-variable-driven; no secrets in source.
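The telemetry record at the heart of the pipeline can be sketched as a plain dataclass. The field names below follow the Components description; the `to_json`/`from_json` helper names are assumptions standing in for the JSON (de)serialisation helpers the PR mentions:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class DroneTelemetry:
    """One telemetry reading from a drone (fields per the PR description)."""
    drone_id: str
    lat: float
    lon: float
    alt: float
    speed: float            # m/s
    heading: float          # degrees, 0-360
    battery_level: float    # percent, 0-100
    status: str             # e.g. "in_flight", "landed"
    payload_weight: float   # kg

    def to_json(self) -> str:
        # Serialise to a JSON string suitable as a Kafka message value.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "DroneTelemetry":
        # Deserialise a Kafka message value back into a dataclass.
        return cls(**json.loads(raw))
```

A round trip through `to_json`/`from_json` returns an equal instance, which is what makes the model safe to pass between producer, Spark, and consumers.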
Data Flow
`DroneProducer` → raw telemetry topic (Kafka) → Spark Structured Streaming (`telemetry_processor` → `route_optimizer`, `collision_detector`) → date-partitioned Parquet on S3 + enriched/alerts Kafka topics → Redshift (`COPY` from S3)
Components
- `src/models/drone_telemetry.py` — `@dataclass` capturing drone state (`drone_id`, `lat`/`lon`/`alt`, `speed`, `heading`, `battery_level`, `status`, `payload_weight`) with JSON (de)serialisation helpers
- `src/kafka/` — `DroneProducer` simulates N drones at a configurable rate (GZIP, `acks=all`, auto topic creation); `DroneConsumer` wraps confluent-kafka with manual commit and a pluggable handler
- `src/spark/telemetry_processor.py` — Reads raw JSON from Kafka, applies a typed schema, writes date-partitioned Parquet to S3, and forwards enriched records to a downstream Kafka topic
- `src/spark/route_optimizer.py` — Adds per-row columns via Spark UDFs: Haversine `distance_to_dest_m`, `estimated_flight_time_s`, `recommended_action` (continue/return_to_base/emergency_land/reduce_speed), and a 0–100 `optimisation_score` (battery × 0.5, speed × 0.3, payload × 0.2)
- `src/spark/collision_detector.py` — Self-join on each micro-batch; pairs within `COLLISION_SAFE_DISTANCE_M` horizontally are emitted as `WARNING`; both axes breached → `CRITICAL`. Published to the Kafka alerts topic via `foreachBatch`
- `src/aws/s3_handler.py` — boto3 helpers with Hive-style date partitioning (`year=`/`month=`/`day=`)
- `src/aws/redshift_handler.py` — Auto-DDL (DISTKEY/SORTKEY), bulk inserts via `execute_values`, `COPY … FORMAT AS PARQUET` from S3, analytics queries (`get_low_battery_drones`, `get_recent_alerts`)
- `config/` — `kafka_config.py`, `spark_config.py`, `aws_config.py` — all tuneable via env vars, with sane defaults for local dev
- `docker-compose.yml` / `Dockerfile` — ZooKeeper, Kafka, Schema Registry, Kafka UI (localhost:8080); producer and Spark pipeline as services
- `scripts/run_pipeline.py` — Entrypoint that wires all three streaming queries and calls `awaitAnyTermination()`

Tests
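The route-optimizer UDFs reduce to per-row arithmetic. Below is a minimal sketch of the Haversine distance and the weighted 0–100 score; the 0.5/0.3/0.2 weights come from the description above, while the normalisation of the speed and payload terms (and the function names) are assumptions:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def optimisation_score(battery_pct: float, speed_frac: float, payload_frac: float) -> float:
    """0-100 score weighted battery x0.5, speed x0.3, payload x0.2.

    speed_frac and payload_frac are assumed pre-normalised to [0, 1]
    (e.g. fraction of max cruise speed, fraction of free payload capacity).
    """
    return 0.5 * battery_pct + 0.3 * 100 * speed_frac + 0.2 * 100 * payload_frac
```

In the repo these would be registered as Spark UDFs and applied per micro-batch row; `estimated_flight_time_s` then follows as distance divided by current speed.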
57 unit tests across all components. Spark tests (`test_route_optimizer.py`, `test_collision_detector.py`) use a local `SparkSession` — no cluster required.
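The Hive-style layout `s3_handler.py` writes is just a key-prefix convention that lets Spark and Redshift `COPY` prune by date. A sketch of the key builder (the `telemetry` prefix and filename format are assumptions):

```python
from datetime import datetime, timezone

def partition_key(prefix: str, ts: datetime, filename: str) -> str:
    """Build a Hive-style date-partitioned S3 object key:
    <prefix>/year=YYYY/month=MM/day=DD/<filename>
    """
    return f"{prefix}/year={ts.year}/month={ts.month:02d}/day={ts.day:02d}/{filename}"

key = partition_key(
    "telemetry", datetime(2026, 4, 1, tzinfo=timezone.utc), "batch-0001.parquet"
)
```

Zero-padding month and day keeps keys lexically sortable, which date-range listing and partition pruning rely on.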