A serverless ETL pipeline for tracking stock market data using AWS Lambda, RDS PostgreSQL, and EventBridge. Originally built as a local Python script with SQLite, this project marked my transition to cloud-based data engineering.
Automatically tracks daily stock prices, calculates technical indicators, and sends weekly email reports with performance metrics and candlestick charts.
Evolution: Local SQLite script → AWS Lambda + RDS → Airflow + dbt + BigQuery
Tracked Stocks: GOOG, MSFT, NVDA, QUBT, RKLB (configurable)
┌─────────────┐         ┌──────────────┐         ┌─────────────┐
│ Polygon.io  │────────>│  AWS Lambda  │────────>│     RDS     │
│     API     │         │   + Layers   │         │ PostgreSQL  │
└─────────────┘         └──────────────┘         └─────────────┘
                               │
                        ┌──────▼──────┐
                        │ EventBridge │
                        └─────────────┘
                               │
                               ▼
                      Weekly Email Report
AWS Services:
- Lambda - Serverless compute with event-driven architecture
- Lambda Layers - AWSSDKPandas-Python313 + custom dependencies layer
- RDS PostgreSQL - Time-series data storage (db.t3.micro)
- EventBridge - Scheduled triggers (daily updates, weekly reports)
Event Types:
- initialize - Load ~70 days of historical data per ticker
- daily_update - Fetch previous trading day (Tue-Sat at 1300 UTC)
- weekly_report - Generate email with charts (Sat at 1600 UTC)
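Each EventBridge rule passes a job_type in its event payload, so a single Lambda entry point can dispatch all three jobs. A minimal sketch of that pattern — the function bodies here are placeholders, not the repo's actual lambda_main.py:

```python
# Illustrative dispatch on the job_type field of the EventBridge payload.
# Handler bodies are stand-ins; the real ones live in lambda_main.py.

def initialize(event):
    return "loaded ~70 days of history"

def daily_update(event):
    return "fetched previous trading day"

def weekly_report(event):
    return "sent weekly email"

JOBS = {
    "initialize": initialize,
    "daily_update": daily_update,
    "weekly_report": weekly_report,
}

def lambda_handler(event, context):
    job_type = event.get("job_type")
    if job_type not in JOBS:
        raise ValueError(f"Unknown job_type: {job_type}")
    return JOBS[job_type](event)
```

Failing loudly on an unknown job_type keeps a misconfigured EventBridge rule from succeeding silently.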
Core:
- Python 3.13
- pandas, numpy (via AWS Lambda Layer)
- pg8000, sqlalchemy
- requests
External APIs:
- Polygon.io (stock data - free tier: 5 calls/min, unlimited daily)
- QuickChart.io (candlestick chart generation)
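With Polygon.io's 5-calls-per-minute free tier, transient failures and throttling are expected, so api_client.py wraps requests in retry logic. A sketch of one way to do that with exponential backoff — the wrapper name and parameters are illustrative, not the repo's actual API:

```python
import time

def fetch_with_retry(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying on exceptions with exponential backoff.

    `fetch` is any zero-argument callable (e.g. a Polygon.io request);
    `sleep` is injectable so tests don't actually wait.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # surface the failure instead of dropping data silently
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Re-raising on the final attempt is what prevents the "silent failures" called out in Lessons Learned below.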
CREATE TABLE stock_data (
date TIMESTAMP NOT NULL,
symbol VARCHAR(10) NOT NULL,
open NUMERIC,
high NUMERIC,
low NUMERIC,
close NUMERIC,
volume BIGINT,
volatility NUMERIC,
rolling_avg NUMERIC,
daily_pct_change NUMERIC,
UNIQUE(symbol, date)
);
Design highlights:
- UNIQUE(symbol, date) for idempotent loading
- Indexed on (symbol, date) for time-series queries
- NUMERIC types avoid floating-point precision issues
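The UNIQUE(symbol, date) constraint is what makes loads idempotent: replaying a day's job inserts nothing new. A self-contained demo using stdlib sqlite3 (fitting, since the project started on SQLite) — PostgreSQL accepts the same ON CONFLICT DO NOTHING idiom, though the real loader in database.py may be structured differently:

```python
import sqlite3

# Tiny stand-in for the stock_data schema, keeping only the relevant columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stock_data (
        date TEXT NOT NULL,
        symbol TEXT NOT NULL,
        close REAL,
        UNIQUE(symbol, date)
    )
""")

row = ("2024-06-03", "NVDA", 1150.0)
sql = """
    INSERT INTO stock_data (date, symbol, close) VALUES (?, ?, ?)
    ON CONFLICT(symbol, date) DO NOTHING
"""
conn.execute(sql, row)
conn.execute(sql, row)  # replayed load: the duplicate is silently skipped

count = conn.execute("SELECT COUNT(*) FROM stock_data").fetchone()[0]
```

Running the insert twice leaves exactly one row, so a retried or re-triggered Lambda can't duplicate data.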
- Volatility - (high - low) / close * 100
- 5-Day Rolling Average
- Daily % Change
- Weekly Return % - Open-to-close performance
- Weekly Range % - High-low spread
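The per-bar formulas are simple enough to show on toy numbers. A stdlib-only sketch (transformations.py computes these with pandas; the sample prices are made up):

```python
from statistics import mean

# Toy OHLC bars for one ticker: (open, high, low, close)
bars = [(100, 104, 99, 102), (102, 106, 101, 105), (105, 107, 103, 104),
        (104, 109, 104, 108), (108, 110, 106, 109)]

closes = [b[3] for b in bars]

# Volatility: (high - low) / close * 100, per bar
volatility = [(h - l) / c * 100 for (_, h, l, c) in bars]

# 5-day rolling average of closes (needs a full 5-bar window)
rolling_avg_5 = mean(closes[-5:])

# Daily % change: close vs. previous close
daily_pct_change = [
    (curr - prev) / prev * 100 for prev, curr in zip(closes, closes[1:])
]
```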
Subject: Weekly Market Report - Best: NVDA: 8.5% | Worst: QUBT: -3.2%
Includes:
- Performance summary for all tickers
- Weekly return % and range %
- Attached candlestick charts (30 days)
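Picking the best and worst performers for the subject line is a small reduction over the weekly returns. A sketch with made-up numbers (only NVDA's 8.5% and QUBT's -3.2% come from the example subject above):

```python
# Hypothetical weekly returns (%) per tracked ticker
weekly_returns = {"GOOG": 1.2, "MSFT": 2.1, "NVDA": 8.5, "QUBT": -3.2, "RKLB": 0.4}

best = max(weekly_returns, key=weekly_returns.get)
worst = min(weekly_returns, key=weekly_returns.get)
subject = (
    f"Weekly Market Report - Best: {best}: {weekly_returns[best]}% | "
    f"Worst: {worst}: {weekly_returns[worst]}%"
)
```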
aws-stock-tracker/
├── api_client.py # Polygon.io API with retry logic
├── config.py # Environment variables
├── database.py # PostgreSQL operations
├── transformations.py # Data cleaning and metrics
├── utils.py # Database query helpers
├── reporting.py # Email generation and charts
├── lambda_main.py # AWS Lambda handler
├── main.py # Original local scheduler
├── requirements.txt
├── .env.example # Template for environment variables
├── .gitignore
├── LICENSE
└── README.md
1. AWS-Managed Layer:
Layer: AWSSDKPandas-Python313
ARN: arn:aws:lambda:<region>:336392948345:layer:AWSSDKPandas-Python313:<version>
Includes: pandas, numpy, pyarrow
2. Custom Layer:
mkdir -p layer/python/lib/python3.13/site-packages
pip install pg8000 sqlalchemy requests python-dotenv \
-t layer/python/lib/python3.13/site-packages/
cd layer && zip -r ../custom-layer.zip . && cd ..
aws s3 cp custom-layer.zip s3://your-bucket/layers/

Why Layers? Reduces package from ~150MB to ~10MB, enables faster deployments.
# Package application code (no dependencies)
mkdir package && cp *.py package/
cd package && zip -r ../lambda_function.zip . && cd ..
# Upload to S3 (required for packages > 50MB)
aws s3 cp lambda_function.zip s3://your-bucket/
# Create Lambda function via console:
# - Runtime: Python 3.13
# - Timeout: 5 minutes
# - Memory: 256 MB
# - Layers: AWSSDKPandas-Python313 + custom layer
# - Code: Upload from S3
# - VPC: Same as RDS

Daily: cron(0 13 ? * TUE-SAT *) → {"job_type": "daily_update"}
Weekly: cron(0 16 ? * SAT *) → {"job_type": "weekly_report"}
Challenge: Built initially with SQLite + schedule library. Needed production reliability and cloud scheduling.
Solution:
- Migrated SQLite → RDS PostgreSQL (proper indexing, connection pooling)
- Replaced schedule → EventBridge (managed scheduling, no always-on machine)
- Refactored monolithic script → event-driven Lambda functions
Learning: Local prototyping is great for iteration; cloud requires stateless, event-driven design.
Challenge: Initial package with mplfinance exceeded 250MB limit.
Solution:
- Switched to QuickChart.io API (eliminated matplotlib ~100MB)
- Used Lambda Layers (AWSSDKPandas + custom)
- Final package: ~10MB
Learning: External APIs can replace heavy dependencies; layers enable modular dependency management.
Challenge: RDS consumed ~$10/month of free tier credits for only ~350 records.
Solution: Eventually migrated to BigQuery (serverless, pay-per-query).
Learning: Managed databases have hidden costs. Evaluate serverless alternatives for cost-sensitive projects.
Week 1: Local Python script with SQLite
Week 2: Migrated to AWS (RDS + Lambda + EventBridge)
Weeks 3-4: Optimizations (layers, QuickChart.io, monitoring)
Month 2+: Hit limits, rebuilt as Airflow + dbt + BigQuery
Key Insight: This progression taught me when to use serverless (lightweight workloads) vs. orchestration tools (complex dependencies, large-scale processing).
Limitations encountered:
- Observability - CloudWatch logs scattered, no pipeline visualization
- Orchestration - EventBridge can't handle complex dependencies
- Cost - RDS expensive for small datasets
- Scalability - Expanding to Russell 3000 would be prohibitively expensive
- Development - Lambda packaging and env var management were tedious
New pipeline: 5.3M+ records, better observability, lower costs, easier development.
What Worked:
- Event-driven architecture with clean separation of concerns
- Polygon.io's unlimited daily calls vs. competitors' strict limits
- Retry logic prevented silent failures
- Local prototyping with SQLite enabled fast iteration
What I'd Improve:
- Use Lambda Layers from day one
- Add CloudWatch alarms for monitoring
- Implement data quality tests
- Start with Airflow for complex orchestration
Skills Gained:
- Serverless architecture (Lambda, EventBridge)
- Managed databases (RDS PostgreSQL)
- ETL design patterns
- API integration with error handling
- Event-driven systems
Monthly (AWS Free Tier):
- Lambda: $0 (~200 invocations/month)
- RDS: ~$10 in credits (storage + compute + backups)
- EventBridge: $0 (always free)
Total: ~$10/month
MIT License - see LICENSE
Trent Moore
GitHub: @tmoore-prog
LinkedIn: https://www.linkedin.com/in/trent-moore-data-engineer
Email: trent.moore.prog@gmail.com
Local prototype → Cloud migration → Production system. Built with curiosity, debugged with persistence, deprecated with wisdom.