Technologies:
Azure Data Factory | Azure Data Lake Gen2 | Parquet | Power BI | GitHub
Build a complete analytics pipeline to monitor server performance by:
- Ingesting raw server logs
- Cleaning and aggregating data using ADF
- Storing data in Bronze, Silver, and Gold layers
- Creating interactive dashboards in Power BI
- Managing everything using GitHub version control
Architecture:
- Raw Server Logs + Metadata
- Azure Data Factory (ETL)
- Azure Data Lake Storage Gen2 (Bronze, Silver, and Gold containers)
- Power BI dashboards (built on the Gold-layer tables)
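To make the flow concrete, here is a minimal Python sketch that lands raw logs in the Bronze container unchanged. All names (storage account, container, file paths) are placeholders, and it assumes pandas with the adlfs driver installed; the project itself does this step in ADF.

```python
import pandas as pd

ACCOUNT = "mystorageaccount"  # placeholder storage account name

# Path into the Bronze container (the abfs:// scheme is provided by adlfs)
bronze_path = f"abfs://bronze@{ACCOUNT}.dfs.core.windows.net/server_logs/raw.parquet"

# Read a raw extract and land it in Bronze exactly as received (no transformation)
raw = pd.read_csv("server_logs.csv")  # placeholder raw extract
raw.to_parquet(
    bronze_path,
    index=False,
    storage_options={"account_name": ACCOUNT},  # adlfs resolves credentials, e.g. via Azure AD
)
```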
| Component | Tool |
|---|---|
| ETL | Azure Data Factory |
| Storage | ADLS Gen2 |
| File Format | Parquet |
| Analytics | Power BI Desktop |
| Version Control | GitHub |
| Auth | Azure AD (Organizational Account) |
Bronze Layer:
- Stores data exactly as received
- No transformation
- Format: Parquet
Example fields:
`server_id`, `log_timestamp`, `cpu_utilization`, `memory_usage`, `disk_io`, `uptime_hours`, `downtime_hours`
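For illustration, the Bronze Parquet schema could be expressed in PyArrow as below; the field names come from the list above, but the types are assumptions, not taken from the project.

```python
import pyarrow as pa

# Illustrative schema for the Bronze parquet files (types are assumed)
bronze_schema = pa.schema([
    ("server_id", pa.string()),
    ("log_timestamp", pa.timestamp("ms")),
    ("cpu_utilization", pa.float64()),   # e.g. percent
    ("memory_usage", pa.float64()),      # e.g. percent or MB
    ("disk_io", pa.float64()),
    ("uptime_hours", pa.float64()),
    ("downtime_hours", pa.float64()),
])
```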
Silver Layer transformations applied:
- Removed null values
- Standardized column names
- Converted timestamps
- Joined server metadata
- Unified multiple sources using UNION
Output:
- Clean, analytics-ready dataset
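A minimal pandas sketch of the same transformations, for readers who want to see them outside ADF; paths and column names are illustrative.

```python
import pandas as pd

logs_a = pd.read_parquet("bronze/source_a.parquet")        # placeholder paths
logs_b = pd.read_parquet("bronze/source_b.parquet")
metadata = pd.read_parquet("bronze/server_metadata.parquet")

# Unify multiple sources (UNION)
logs = pd.concat([logs_a, logs_b], ignore_index=True)

# Standardize column names
logs.columns = logs.columns.str.strip().str.lower()

# Remove null values
logs = logs.dropna()

# Convert timestamps
logs["log_timestamp"] = pd.to_datetime(logs["log_timestamp"])

# Join server metadata
silver = logs.merge(metadata, on="server_id", how="left")

silver.to_parquet("silver/server_logs_clean.parquet", index=False)
```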
Gold Layer aggregations:
- Avg CPU utilization
- Avg memory usage
- Total uptime hours
- Total downtime hours
- Metrics grouped by:
  - server_id
  - server_cluster
  - date
Stored in Parquet for reporting.
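A pandas equivalent of these aggregations might look like the following; this is illustrative only, since the project computes the KPIs in an ADF dataflow.

```python
import pandas as pd

silver = pd.read_parquet("silver/server_logs_clean.parquet")  # placeholder path
silver["date"] = silver["log_timestamp"].dt.date

# Gold-layer KPIs grouped by server, cluster, and day
gold = (
    silver
    .groupby(["server_id", "server_cluster", "date"], as_index=False)
    .agg(
        avg_cpu_utilization=("cpu_utilization", "mean"),
        avg_memory_usage=("memory_usage", "mean"),
        total_uptime_hours=("uptime_hours", "sum"),
        total_downtime_hours=("downtime_hours", "sum"),
    )
)

gold.to_parquet("gold/server_kpis.parquet", index=False)
```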
ADF components used:
- Pipelines
- Dataflows
- Linked Services
- Datasets
Mapping Data Flow steps:
- Source → read raw server logs
- Select → keep required columns only
- Derived Column → clean and standardize
- Union → merge multiple server inputs
- Join → enrich with server metadata
- Aggregate → compute KPIs
- Sink → write to the Silver / Gold containers
All ADF artifacts are stored in GitHub.
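With Git integration enabled, ADF serializes each artifact as JSON in the repository. As a trimmed, hypothetical example, a pipeline definition that runs a dataflow might look like this (all names are illustrative, not the project's actual artifacts):

```json
{
  "name": "pl_server_logs_daily",
  "properties": {
    "activities": [
      {
        "name": "TransformServerLogs",
        "type": "ExecuteDataFlow",
        "typeProperties": {
          "dataFlow": {
            "referenceName": "df_clean_server_logs",
            "type": "DataFlowReference"
          }
        }
      }
    ]
  }
}
```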