Technologies:
Azure Data Factory | Azure Data Lake Gen2 | Parquet | Power BI | GitHub
Build a complete analytics pipeline to monitor server performance by:
- Ingesting raw server logs
- Cleaning and aggregating data using ADF
- Storing data in Bronze, Silver, and Gold layers
- Creating interactive dashboards in Power BI
- Managing everything using GitHub version control
Architecture:
- Raw Server Logs + Metadata
- Azure Data Factory (ETL)
- Azure Data Lake Storage Gen2 (Bronze, Silver, and Gold containers)
- Power BI dashboards (built on the Gold-layer tables)
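To make the flow concrete, here is a minimal Python sketch that lands raw logs in the Bronze container unchanged. All names (storage account, container, file paths) are placeholders, and it assumes pandas with the adlfs driver installed; the project itself does this step in ADF.

```python
import pandas as pd

ACCOUNT = "mystorageaccount"  # placeholder storage account name

# Path into the Bronze container (the abfs:// scheme is provided by adlfs)
bronze_path = f"abfs://bronze@{ACCOUNT}.dfs.core.windows.net/server_logs/raw.parquet"

# Read a raw extract and land it in Bronze exactly as received (no transformation)
raw = pd.read_csv("server_logs.csv")  # placeholder raw extract
raw.to_parquet(
    bronze_path,
    index=False,
    storage_options={"account_name": ACCOUNT},  # adlfs resolves credentials, e.g. via Azure AD
)
```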
| Component | Tool |
|---|---|
| ETL | Azure Data Factory |
| Storage | ADLS Gen2 |
| File Format | Parquet |
| Analytics | Power BI Desktop |
| Version Control | GitHub |
| Auth | Azure AD (Organizational Account) |
Bronze Layer:
- Stores data exactly as received
- No transformation
- Format: Parquet
Example fields:
`server_id`, `log_timestamp`, `cpu_utilization`, `memory_usage`, `disk_io`, `uptime_hours`, `downtime_hours`
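For illustration, the Bronze Parquet schema could be expressed in PyArrow as below; the field names come from the list above, but the types are assumptions, not taken from the project.

```python
import pyarrow as pa

# Illustrative schema for the Bronze parquet files (types are assumed)
bronze_schema = pa.schema([
    ("server_id", pa.string()),
    ("log_timestamp", pa.timestamp("ms")),
    ("cpu_utilization", pa.float64()),   # e.g. percent
    ("memory_usage", pa.float64()),      # e.g. percent or MB
    ("disk_io", pa.float64()),
    ("uptime_hours", pa.float64()),
    ("downtime_hours", pa.float64()),
])
```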
Silver Layer transformations applied:
- Removed null values
- Standardized column names
- Converted timestamps
- Joined server metadata
- Unified multiple sources using UNION
Output:
- Clean, analytics-ready dataset
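A minimal pandas sketch of the same transformations, for readers who want to see them outside ADF; paths and column names are illustrative.

```python
import pandas as pd

logs_a = pd.read_parquet("bronze/source_a.parquet")        # placeholder paths
logs_b = pd.read_parquet("bronze/source_b.parquet")
metadata = pd.read_parquet("bronze/server_metadata.parquet")

# Unify multiple sources (UNION)
logs = pd.concat([logs_a, logs_b], ignore_index=True)

# Standardize column names
logs.columns = logs.columns.str.strip().str.lower()

# Remove null values
logs = logs.dropna()

# Convert timestamps
logs["log_timestamp"] = pd.to_datetime(logs["log_timestamp"])

# Join server metadata
silver = logs.merge(metadata, on="server_id", how="left")

silver.to_parquet("silver/server_logs_clean.parquet", index=False)
```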
Gold Layer aggregations:
- Avg CPU utilization
- Avg memory usage
- Total uptime hours
- Total downtime hours
- Metrics grouped by:
  - server_id
  - server_cluster
  - date
Stored in Parquet for reporting.
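A pandas equivalent of these aggregations might look like the following; this is illustrative only, since the project computes the KPIs in an ADF dataflow.

```python
import pandas as pd

silver = pd.read_parquet("silver/server_logs_clean.parquet")  # placeholder path
silver["date"] = silver["log_timestamp"].dt.date

# Gold-layer KPIs grouped by server, cluster, and day
gold = (
    silver
    .groupby(["server_id", "server_cluster", "date"], as_index=False)
    .agg(
        avg_cpu_utilization=("cpu_utilization", "mean"),
        avg_memory_usage=("memory_usage", "mean"),
        total_uptime_hours=("uptime_hours", "sum"),
        total_downtime_hours=("downtime_hours", "sum"),
    )
)

gold.to_parquet("gold/server_kpis.parquet", index=False)
```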
ADF components used:
- Pipelines
- Dataflows
- Linked Services
- Datasets
Mapping Data Flow steps:
- Source → read raw server logs
- Select → keep required columns only
- Derived Column → clean and standardize
- Union → merge multiple server inputs
- Join → enrich with server metadata
- Aggregate → compute KPIs
- Sink → write to the Silver / Gold containers
All ADF artifacts are stored in GitHub.
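With Git integration enabled, ADF serializes each artifact as JSON in the repository. As a trimmed, hypothetical example, a pipeline definition that runs a dataflow might look like this (all names are illustrative, not the project's actual artifacts):

```json
{
  "name": "pl_server_logs_daily",
  "properties": {
    "activities": [
      {
        "name": "TransformServerLogs",
        "type": "ExecuteDataFlow",
        "typeProperties": {
          "dataFlow": {
            "referenceName": "df_clean_server_logs",
            "type": "DataFlowReference"
          }
        }
      }
    ]
  }
}
```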