A sample project demonstrating configuration-driven Apache Airflow DAG development using dag-factory.
This project showcases how to build scalable Apache Airflow workflows using YAML configuration files instead of traditional Python DAG definitions. By leveraging dag-factory, DAGs can be defined declaratively, making them easier to maintain, version, and manage at scale.
- Configuration-Driven DAGs: Define workflows in YAML for better maintainability
- Docker Development Environment: Complete local Airflow setup with Docker Compose
- Code Quality Automation: Pre-commit hooks with linting and formatting
- Multiple DAG Examples: Sample workflows demonstrating different scheduling patterns
- Scalable Architecture: Easy to add new DAGs without writing Python code
```
.
├── .dev/
│   ├── Dockerfile               # Custom Airflow image
│   └── docker-compose.yaml      # Local development environment
├── dags/
│   ├── example_dag.py           # Python wrapper for dag-factory
│   ├── example_dag_1.yml        # DAG configuration (runs at 1 AM)
│   ├── example_dag_2.yml        # DAG configuration (runs at 2 AM)
│   └── example_dag_3.yml        # DAG configuration (runs at 3 AM)
├── .pre-commit-config.yaml      # Code quality automation
├── Makefile                     # Development workflow automation
├── pyproject.toml               # Python dependencies
└── README.md
```
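The Python dependencies live in pyproject.toml; as a sketch (names and pins below are illustrative rather than the repository's exact list), it declares at least Airflow and dag-factory:

```toml
# Illustrative sketch; see pyproject.toml for the authoritative dependency list.
[project]
name = "airflow-dags-sample"
requires-python = ">=3.12"
dependencies = [
    "apache-airflow",
    "dag-factory",
]
```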
- Python 3.12+
- Docker and Docker Compose
- Git
- Clone the repository:

  ```bash
  git clone https://github.com/dave6892/airflow-dags-sample.git
  cd airflow-dags-sample
  ```

- Install dependencies:

  ```bash
  # Using uv (recommended)
  uv sync

  # Or using pip
  pip install -e .
  ```

- Set up pre-commit hooks:

  ```bash
  pre-commit install
  ```

- Start the Airflow environment:

  Option A: Using Makefile (Recommended):

  ```bash
  make airflow-up
  ```

  Option B: Using Docker Compose directly:

  ```bash
  cd .dev
  docker-compose up -d
  ```

- Access the Airflow UI:

  - URL: http://localhost:8080
  - Username: airflow
  - Password: airflow
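To confirm the stack came up cleanly before logging in, you can optionally check the container status and the Airflow webserver's /health endpoint:

```bash
# Check that all services are running
cd .dev && docker-compose ps

# The webserver reports metadatabase and scheduler health as JSON
curl http://localhost:8080/health
```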
To create a new DAG, simply add a new YAML file in the dags/ directory:
```yaml
# dags/my_new_dag.yml
default:
  default_args:
    catchup: false
    start_date: 2024-11-11

my_workflow:
  default_args:
    owner: "your_name"
  description: "Description of your workflow"
  schedule_interval: "0 4 * * *"  # Run at 4 AM daily
  render_template_as_native_obj: True
  tasks:
    extract_data:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: "echo 'Extracting data...'"
    process_data:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: "echo 'Processing data...'"
      dependencies: [extract_data]
    load_data:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: "echo 'Loading data...'"
      dependencies: [process_data]
```

The DAG will be automatically discovered and loaded by Airflow.
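Discovery works through the small Python wrapper in dags/. A minimal sketch of what example_dag.py might look like (the real file may locate its configs differently) is:

```python
# dags/example_dag.py -- minimal sketch of a dag-factory wrapper; the actual
# file in this repository may differ in how it finds the YAML configs.
from pathlib import Path

import dagfactory

# Register the DAGs defined in each YAML config into this module's globals,
# which is where the Airflow scheduler looks for DAG objects.
for config_file in Path(__file__).parent.glob("example_dag_*.yml"):
    dag_factory = dagfactory.DagFactory(str(config_file))
    dag_factory.clean_dags(globals())
    dag_factory.generate_dags(globals())
```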
The project includes three sample DAGs:
- example_dag_1: Runs daily at 1:00 AM
- example_dag_2: Runs daily at 2:00 AM
- example_dag_3: Runs daily at 3:00 AM
Each DAG demonstrates a simple workflow with three bash tasks showing task dependencies.
This project includes a comprehensive Makefile to streamline development workflows. Here are the most commonly used commands:
```bash
make setup             # Complete project setup (uv + pre-commit)
make setup-pip         # Complete project setup using pip
make dev-start         # Setup and start complete development environment

make airflow-up        # Start Airflow development environment
make airflow-down      # Stop Airflow development environment
make airflow-restart   # Restart Airflow services
make airflow-logs      # View all Airflow logs
make airflow-status    # Check status of Airflow services
make airflow-shell     # Open shell in Airflow container

make lint              # Run linting checks
make format            # Format code
make lint-fix          # Auto-fix linting issues
make check             # Run all quality checks (lint + format + test)
make fix               # Fix all code issues and format

make test              # Run tests
make test-cov          # Run tests with coverage
make test-verbose      # Run tests with verbose output

make clean             # Clean temporary files and caches
make clean-all         # Clean everything including Docker resources
make dev-reset         # Reset and restart development environment

make help              # Show all available commands
make                   # Default target (shows help)
```
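The Airflow targets are thin wrappers around the Docker Compose commands shown earlier; as a rough sketch (the actual Makefile may define them differently), they likely look something like this:

```make
# Illustrative sketch; see the Makefile for the actual target definitions.
# Recipe lines are indented with a tab.
airflow-up:     ## Start Airflow development environment
	cd .dev && docker-compose up -d

airflow-down:   ## Stop Airflow development environment
	cd .dev && docker-compose down

airflow-logs:   ## View all Airflow logs
	cd .dev && docker-compose logs -f
```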
This project uses several tools to maintain code quality:
- Ruff: Python linting and formatting
- Pre-commit hooks: Automatic checks before commits
- YAML validation: Ensures configuration files are valid
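As an illustration of how these checks are typically wired up (the repository's actual .pre-commit-config.yaml may differ), a configuration along these lines covers linting, formatting, and YAML validation:

```yaml
# Illustrative sketch; see .pre-commit-config.yaml for the real configuration.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: check-yaml           # Ensure configuration files are valid YAML
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9
    hooks:
      - id: ruff                 # Lint
        args: [--fix]
      - id: ruff-format          # Format
```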
To run tests:
```bash
pytest
```

The Docker Compose setup includes the following services (see the illustrative excerpt after this list):
- Airflow Webserver: Web UI at http://localhost:8080
- Airflow Scheduler: DAG scheduling and execution
- Airflow Worker: Task execution (Celery)
- PostgreSQL: Metadata database
- Redis: Message broker for Celery
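As a rough sketch of how these services fit together (an assumed layout; .dev/docker-compose.yaml is the source of truth), the Airflow containers share a common definition that mounts the dags/ directory, so new YAML configurations are picked up without rebuilding the image:

```yaml
# Illustrative excerpt; see .dev/docker-compose.yaml for the real definition.
x-airflow-common: &airflow-common
  build: .                                    # Custom Dockerfile in .dev/
  environment:
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor   # Tasks run on the Celery worker
  volumes:
    - ../dags:/opt/airflow/dags               # YAML configs mounted, not baked in

services:
  airflow-webserver:
    <<: *airflow-common
    ports:
      - "8080:8080"
  airflow-scheduler:
    <<: *airflow-common
  airflow-worker:
    <<: *airflow-common
```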
Customize the environment by creating a .env file in the .dev/ directory:
```bash
# .dev/.env
AIRFLOW_UID=50000
_AIRFLOW_WWW_USER_USERNAME=admin
_AIRFLOW_WWW_USER_PASSWORD=admin
ENVIRONMENT=development
```

Refer to the dag-factory documentation for advanced configuration options.
Note: the DAG code shown in the Airflow UI is the Python wrapper (example_dag.py) rather than the actual YAML configuration that defines the DAG structure.
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes
- Run pre-commit checks: `pre-commit run --all-files`
- Commit your changes: `git commit -m 'Add feature'`
- Push to the branch: `git push origin feature-name`
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.