Schema management for AWS Athena - a CLI tool for managing Athena table schemas as code.
- DDL as Code: Manage Athena table definitions using SQL files in a Git-friendly directory structure
- Change Preview: See exactly what will change before applying (similar to Terraform plan)
- Export Capability: Export existing tables to local SQL files
- Target Filtering: Apply changes to specific tables or databases using flexible patterns
- Parallel Execution: Fast operations with concurrent query execution
- CI/CD Ready: GitHub Action available for automated deployments
brew install riseshia/x/athenadef
Alternatively, download pre-compiled binaries from the release page, or install from source with Cargo:
cargo install --git https://github.com/riseshia/athenadef
A GitHub Action is available for automated deployments:
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: riseshia/athenadef@v0
        with:
          version: v0.1.0 # or latest
Basic workflow:
- Export existing tables (optional, to get started with existing infrastructure):
athenadef export
This creates SQL files for all your existing tables.
- Review changes:
athenadef plan
- Apply changes:
athenadef apply
The following options are available for all commands:
-c, --config <FILE> Config file path [default: athenadef.yaml]
-t, --target <TABLES> Filter tables using <database>.<table> format
--debug Enable debug logging
-h, --help Print help information
-V, --version Print version information
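These global flags can be combined with any subcommand. For example (the config path and table name below are only illustrative):
# Plan with an explicit config file, a single target table, and debug logging
athenadef plan --config path/to/athenadef.yaml --target salesdb.customers --debug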
Show what changes will be made to match your local configuration:
athenadef plan [OPTIONS]
Options:
--show-unchanged: Show tables with no changes
Example output:
Plan: 2 to add, 1 to change, 0 to destroy.
+ salesdb.new_customers
Will create table
~ marketingdb.leads
Will update table
--- remote: marketingdb.leads
+++ local: marketingdb.leads
CREATE EXTERNAL TABLE leads (
- score int,
+ score double,
+ created_at timestamp,
email string
)
Apply the changes to make your Athena tables match your local configuration:
athenadef apply [OPTIONS]
Options:
-a, --auto-approve: Skip interactive approval
--dry-run: Show what would be done without executing
Note that table updates are performed as a DROP TABLE followed by a CREATE TABLE. This means tables will be temporarily unavailable during the update process.
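If even a brief gap is a concern, an apply can be narrowed to a single table with --target so only that table is dropped and recreated at any one time (the table name below is just an example):
# Limit the apply, and therefore the drop/recreate window, to one table
athenadef apply --target salesdb.customers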
Example output:
Plan: 2 to add, 1 to change, 0 to destroy.
Do you want to perform these actions? (yes/no): yes
salesdb.new_customers: Creating...
salesdb.new_customers: Creation complete
marketingdb.leads: Modifying...
marketingdb.leads: Modification complete
Apply complete! Resources: 1 added, 1 changed, 0 destroyed.
Export existing Athena table definitions to local SQL files:
athenadef export [OPTIONS]
Options:
--overwrite: Overwrite existing files
Example output:
Exporting table definitions...
salesdb.customers: Exported to salesdb/customers.sql
salesdb.orders: Exported to salesdb/orders.sql
Export complete! 2 tables exported.
Use --target to filter operations to specific tables or databases:
# Specific table
athenadef plan --target salesdb.customers
# Multiple tables
athenadef plan --target salesdb.customers --target marketingdb.leads
# All tables in a database
athenadef plan --target salesdb.*
# Tables with same name across databases
athenadef plan --target *.customers
Organize your SQL files in a directory structure that mirrors your databases and tables:
project-root/
├── athenadef.yaml        # Configuration file
├── salesdb/              # Database name
│   ├── customers.sql     # Table definition
│   └── orders.sql
└── marketingdb/
    ├── leads.sql
    └── campaigns.sql
Each .sql file should contain a complete CREATE EXTERNAL TABLE statement:
-- customers.sql
CREATE EXTERNAL TABLE customers (
  customer_id bigint,
  name string,
  email string COMMENT 'Customer email address',
  registration_date date
)
PARTITIONED BY (
  year string,
  month string
)
STORED AS PARQUET
LOCATION 's3://your-data-bucket/customers/'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.year.type' = 'integer',
  'projection.year.range' = '2020,2030',
  'projection.month.type' = 'integer',
  'projection.month.range' = '1,12',
  'projection.month.digits' = '2'
);
Create an athenadef.yaml file in your project root:
# athenadef.yaml
# Optional: Athena workgroup (default: "primary")
workgroup: "primary"
# Optional: S3 location for query results
# If not specified, uses workgroup's default output location (recommended)
# output_location: "s3://athena-results-bucket/athenadef/"
# Optional: List of databases to manage
# If specified and --target is not provided, only these databases will be processed
# This is useful to avoid scanning all databases in your account
# databases:
# - salesdb
# - marketingdb
# Optional: AWS region (uses default from AWS config if not specified)
# region: "us-west-2"
# Optional: Query timeout in seconds (default: 300)
# query_timeout_seconds: 600
# Optional: Maximum concurrent queries (default: 5)
# max_concurrent_queries: 10
See the examples directory for complete sample projects:
- examples/basic - Simple setup with a few tables
- examples/partitioned - Tables with partitions and partition projection
- examples/multi-database - Multiple databases with many tables
When using AWS managed storage (default, no output_location specified), you only need these permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:StopQueryExecution"
      ],
      "Resource": "arn:aws:athena:*:*:workgroup/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:CreateTable",
        "glue:UpdateTable",
        "glue:DeleteTable"
      ],
      "Resource": "*"
    }
  ]
}
The following S3 permissions are only required when specifying output_location in your configuration:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-query-results-bucket",
        "arn:aws:s3:::your-query-results-bucket/*"
      ]
    }
  ]
}
athenadef uses a simple but effective approach:
- Reads local SQL files organized in a database/table.sql structure
- Fetches current state from AWS Athena using SHOW CREATE TABLE
- Compares definitions using text-based diff (like git diff)
- Delegates SQL validation to AWS Athena (no local parsing)
- Applies changes by executing DDL statements through Athena
This design ensures:
- Simplicity: No complex SQL parsing
- Compatibility: Supports all Athena features automatically
- Reliability: SQL validation by AWS Athena itself
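For reference, the remote half of that comparison can be reproduced by hand with the AWS CLI and jq. This is only a sketch for sanity-checking a reported diff; the table and workgroup names are placeholders, and the output will usually differ in formatting because SHOW CREATE TABLE normalizes the DDL:
# Ask Athena for the current definition of one table (placeholder names)
QID=$(aws athena start-query-execution \
  --query-string "SHOW CREATE TABLE salesdb.customers" \
  --work-group primary \
  --query QueryExecutionId --output text)

# Poll until the query leaves the QUEUED/RUNNING states
while true; do
  STATE=$(aws athena get-query-execution --query-execution-id "$QID" \
    --query QueryExecution.Status.State --output text)
  [ "$STATE" = "QUEUED" ] || [ "$STATE" = "RUNNING" ] || break
  sleep 1
done

# Each result row holds one line of the DDL; compare it with the local file
aws athena get-query-results --query-execution-id "$QID" --output json \
  | jq -r '.ResultSet.Rows[].Data[0].VarCharValue' > /tmp/remote.sql
diff -u /tmp/remote.sql salesdb/customers.sql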
Configuration file not found:
# Specify config file explicitly
athenadef plan --config path/to/athenadef.yaml
AWS authentication errors:
# Check AWS credentials
aws sts get-caller-identity
# Set AWS profile
export AWS_PROFILE=your-profile
SQL syntax errors:
SQL errors are reported by Athena and include the file name and query that failed. Check the SQL syntax in your .sql files.
Enable debug logging to see detailed execution information:
athenadef plan --debug
For more detailed troubleshooting help, see the Troubleshooting Guide.
- Troubleshooting Guide - Detailed solutions for common issues
- AWS Permissions Guide - Complete IAM permissions reference and examples
- Advanced Usage - Advanced patterns, CI/CD integration, multi-environment setup
- FAQ - Frequently asked questions
- Architecture - System architecture and design
- Specification - Complete technical specification
- Technical Design - Core algorithms and implementation details
- JSON Output - JSON output format specification
- Basic Example - Simple setup with a few tables
- Partitioned Tables - Partition projection and time-series data
- Multi-Database - Managing multiple databases
Contributions are welcome! Please see our contributing guidelines for details.
This project is licensed under the MIT License - see the LICENSE file for details.
This project includes software developed by:
- aws-sdk-config: Licensed under the Apache License, Version 2.0
- aws-sdk-athena: Licensed under the Apache License, Version 2.0
- aws-sdk-s3: Licensed under the Apache License, Version 2.0