streaming-pipeline

Here are 25 public repositories matching this topic...

qubole / streaminglens

Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines

streaming scala spark spark-streaming structured-streaming sla cluster-management sparklens streaming-pipeline micro-batches

Updated Jan 21, 2020
Scala

prakashdontaraju / google-cloud-ecommerce

E-commerce GCP streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery, and Tableau. GCP batch pipeline: Cloud Storage, Dataproc, PySpark, Cloud Spanner, and Tableau.

Updated Mar 9, 2022
Python

CodersAcademy006 / Data-Sanitizer

AI-powered data sanitizer with schema detection, dedupe, outlier detection, and LLM enrichment.

etl sqlite data-engineering outlier-detection data-cleaning data-pipeline data-quality data-enrichment jsonl streaming-pipeline csv-cleaning

Updated Nov 26, 2025
Python

itsmawna / Seismic-Realtime-Pipeline

react nodejs python api iot kafka spark mongodb websocket seismology data-visualization real-time-data event-processing earthquake structured-streaming streaming-pipeline

Updated Nov 12, 2025
Python

mujahidniaz / iot_device_streaming_pipeline_cloudera-kakfa-spark-hbase

Real Time Data Streaming Pipeline

kafka spark impala cloudera hbase data-pipeline streaming-data data-ingestion streaming-pipeline iots

Updated Jan 9, 2020
Java

quixio / community-highlights

Collecting highlights from the Quix community and social media in the form of interesting questions, comments, challenges, solutions and insights

Updated Aug 30, 2024
Python

KumarVaibhav27 / DATABRICKS-x-DBT-End-To-End-Data-Engineering-Project

This project implements a modern data engineering pipeline using Databricks, PySpark, DBT, and Delta Live Tables. It follows the Medallion Architecture, supports realtime data ingestion with Autoloader, and models data with fact and dimension tables, including Slowly Changing Dimensions (SCD Type 2), all orchestrated in a scalable cloud environment

pyspark databricks data-build-tool delta-lake streaming-pipeline delta-live-tables databricks-unity-catalog dimensional-data-modeling databricks-autoloder

Updated Jul 15, 2025

sahilgundu / big4-audit-compliance-anomaly-detection-azure-databricks

Docs-only case study of a compliance & anomaly detection platform on Azure + Databricks (Streaming ETL + Batch ELT + ML).

etl azure data-engineering elt databricks anomaly-detection delta-lake streaming-pipeline ml-pipelines batch-pipeline bfsi audit-compliance

Updated Nov 21, 2025

JonFillip / transloc_api_gcp_pipeline

Stream data directly from an API using Apache Beam to BigQuery.

gcp apache-beam etl-automation streaming-pipeline gcp-project

Updated Jan 2, 2024
Python

torvyn / torvyn

Ownership-aware reactive streaming runtime on the WebAssembly Component Model

rust observable reactive-streams polyglot rust-library streaming-pipeline contract-first streaming-runtime ownership-aware

Updated Apr 6, 2026
Rust

stereosky / community-highlights

Collecting highlights from the Quix community and social media in the form of interesting questions, comments, challenges, solutions and insights

Updated Feb 4, 2025
Python

rakyankay / GCP-DataEngineerLearningPath

Data Engineer Training Using Google Cloud Platform

machine-learning google-cloud-platform data-engineer data-warehousing etl-pipeline data-analysis-python streaming-pipeline batch-pipeline

Updated Aug 28, 2023
Jupyter Notebook

Yash170204 / crypto-pipeline

Kafka-based real-time cryptocurrency data ingestion pipeline with Python and MongoDB

python docker kafka mongodb etl realtime-data data-engineering-pipeline streaming-pipeline

Updated Jan 21, 2026
Python

sahilgundu / big4-audit-compliance-reporting-azure

Docs-only case study – Compliance Reporting data platform on Azure for a Big-4 Audit & Consulting Firm (BFSI, healthcare-style datasets) using Streaming Pipeline (ETL) + Batch Pipeline (ELT) with Snowflake, Synapse, ADF, Power BI, ML risk scoring, DQ, governance, and lineage.

etl azure power-bi snowflake data-engineering healthcare elt hipaa azure-data-factory streaming-pipeline ml-pipelines batch-pipeline azure-synapse bfsi audit-compliance

Updated Nov 21, 2025

escobarana / twitter_msk_emr

Streaming pipeline using AWS MSK and AWS EMR with Spark, retrieving the data from Twitter Streams API

emr serverless twitter-api amazon pyspark msk streaming-pipeline

Updated Sep 10, 2023
HCL

CrossFil / goit-de-fp

Data Engineering (Final Project)

streaming-pipeline batch-data-lake

Updated Jun 15, 2025
Python

LesiaUKR / goit-de-fp

Master's degree | Data Engineering | Final course projects | goit-de-fp

python docker apache-spark data-lake apache-kafka apache-airflow streaming-pipeline goit-de-fp

Updated Dec 7, 2024
Python

janaom / Streaming-Pipeline-on-GCP

gcp pubsub streaming-pipeline

Updated Nov 27, 2023
Python

JavierPachas / gcp_aviation

Event-driven data pipeline on Google Cloud: AviationStack API → Pub/Sub → Cloud Functions → BigQuery for real-time flight data ingestion.

python bigquery etl data-engineering pub-sub google-cloud-platform cloud-functions streaming-pipeline

Updated Apr 20, 2026
Jupyter Notebook

YuraYara2005 / Hospital

An end-to-end real-time Big Data pipeline for hospital operations. Processes streaming admission/discharge data using Kafka, NiFi, and PySpark to provide live bed occupancy and ER wait time analytics in MySQL.

mysql kafka big-data apache-spark pyspark kafka-consumer apache-nifi nifi-processors nifi-custom-processor streaming-pipeline real-time-analytics-data-engineering

Updated Feb 13, 2026
Python
