Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines
Updated Jan 21, 2020 - Scala
E-commerce GCP streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP batch pipeline: Cloud Storage, Dataproc, PySpark, Cloud Spanner and Tableau.
AI-powered data sanitizer with schema detection, deduplication, outlier detection, and LLM enrichment.
Real Time Data Streaming Pipeline
Collecting highlights from the Quix community and social media in the form of interesting questions, comments, challenges, solutions and insights
This project implements a modern data engineering pipeline using Databricks, PySpark, dbt, and Delta Live Tables. It follows the Medallion Architecture, supports real-time data ingestion with Auto Loader, and models data with fact and dimension tables, including Slowly Changing Dimensions (SCD Type 2), all orchestrated in a scalable cloud environment.
Docs-only case study of a compliance & anomaly detection platform on Azure + Databricks (Streaming ETL + Batch ELT + ML).
Stream data directly from an API using Apache Beam to BigQuery.
Ownership-aware reactive streaming runtime on the WebAssembly Component Model
Data Engineer Training Using Google Cloud Platform
Kafka-based real-time cryptocurrency data ingestion pipeline with Python and MongoDB
Docs-only case study: a compliance reporting data platform on Azure for a Big-4 audit and consulting firm (BFSI and healthcare-style datasets), using a streaming pipeline (ETL) and a batch pipeline (ELT) with Snowflake, Synapse, ADF, Power BI, ML risk scoring, data quality (DQ) checks, governance, and lineage.
Streaming pipeline using AWS MSK and AWS EMR with Spark, retrieving data from the Twitter Streams API.
Master's degree | Data Engineering | Final course projects | goit-de-fp
Event-driven data pipeline on Google Cloud: AviationStack API → Pub/Sub → Cloud Functions → BigQuery for real-time flight data ingestion.
An end-to-end real-time Big Data pipeline for hospital operations. Processes streaming admission/discharge data using Kafka, NiFi, and PySpark to provide live bed occupancy and ER wait time analytics in MySQL.