AWS Serverless Stock Tracker

A serverless ETL pipeline for tracking stock market data using AWS Lambda, RDS PostgreSQL, and EventBridge. Originally built as a local Python script with SQLite, this project marked my transition to cloud-based data engineering.

License: MIT

Overview

Automatically tracks daily stock prices, calculates technical indicators, and sends weekly email reports with performance metrics and candlestick charts.

Evolution: Local SQLite script → AWS Lambda + RDS → Airflow + dbt + BigQuery

Tracked Stocks: GOOG, MSFT, NVDA, QUBT, RKLB (configurable)


Architecture

                        ┌─────────────┐
                        │ EventBridge │
                        └──────┬──────┘
                               │ triggers
                               ▼
┌─────────────┐         ┌──────────────┐         ┌─────────────┐
│  Polygon.io │────────>│ AWS Lambda   │────────>│     RDS     │
│     API     │         │  + Layers    │         │  PostgreSQL │
└─────────────┘         └──────────────┘         └─────────────┘
                               │
                               ▼
                        Weekly Email Report

AWS Services:

  • Lambda - Serverless compute with event-driven architecture
  • Lambda Layers - AWSSDKPandas-Python313 + custom dependencies layer
  • RDS PostgreSQL - Time-series data storage (db.t3.micro)
  • EventBridge - Scheduled triggers (daily updates, weekly reports)

Event Types:

  • initialize - Load ~70 days of historical data per ticker
  • daily_update - Fetch previous trading day (Tue-Sat at 1300 UTC)
  • weekly_report - Generate email with charts (Sat at 1600 UTC)
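
A minimal sketch of how the handler can route these event types (the function names here are placeholders, not necessarily the repo's):

# lambda_main.py - illustrative sketch; real function names may differ
def run_initial_load():
    """Backfill ~70 days of history per ticker (placeholder)."""

def run_daily_update():
    """Fetch the previous trading day for each ticker (placeholder)."""

def send_weekly_report():
    """Build the weekly email with charts (placeholder)."""

JOBS = {
    "initialize": run_initial_load,
    "daily_update": run_daily_update,
    "weekly_report": send_weekly_report,
}

def lambda_handler(event, context):
    job_type = event.get("job_type")
    if job_type not in JOBS:
        raise ValueError(f"Unknown job_type: {job_type}")
    JOBS[job_type]()
    return {"status": "ok", "job_type": job_type}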

Tech Stack

Core:

  • Python 3.13
  • pandas, numpy (via AWS Lambda Layer)
  • pg8000, sqlalchemy
  • requests

External APIs:

  • Polygon.io (stock data - free tier: 5 calls/min, unlimited daily)
  • QuickChart.io (candlestick chart generation)
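
Because the free tier allows only 5 calls per minute, the API client has to back off when throttled. A minimal sketch of that retry logic against Polygon's daily aggregates endpoint (retry counts and delays are assumptions, not the repo's exact values):

import time
import requests

BASE = "https://api.polygon.io/v2/aggs/ticker/{sym}/range/1/day/{start}/{end}"

def fetch_daily_bars(sym, start, end, api_key, retries=3, backoff=15):
    """Fetch daily OHLCV bars, backing off when the 5 calls/min cap hits."""
    url = BASE.format(sym=sym, start=start, end=end)
    for attempt in range(1, retries + 1):
        resp = requests.get(url, params={"apiKey": api_key}, timeout=10)
        if resp.status_code == 429:          # throttled: wait, then retry
            time.sleep(backoff * attempt)
            continue
        resp.raise_for_status()              # fail loudly on other errors
        return resp.json().get("results", [])
    raise RuntimeError(f"Still rate-limited after {retries} attempts: {sym}")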

Key Features

Database Schema

CREATE TABLE stock_data (
    date TIMESTAMP NOT NULL,
    symbol VARCHAR(10) NOT NULL,
    open NUMERIC,
    high NUMERIC,
    low NUMERIC,
    close NUMERIC,
    volume BIGINT,
    volatility NUMERIC,
    rolling_avg NUMERIC,
    daily_pct_change NUMERIC,
    UNIQUE(symbol, date)
);

Design highlights:

  • UNIQUE(symbol, date) for idempotent loading
  • Indexed on (symbol, date) for time-series queries
  • NUMERIC types avoid floating-point precision issues
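
Idempotent loading means a failed run can simply be replayed: duplicate rows hit the UNIQUE constraint and are skipped. A sketch of how that load can look with SQLAlchemy (the connection string and function name are placeholders):

from sqlalchemy import create_engine, text

# pg8000 is the project's PostgreSQL driver; credentials are placeholders
engine = create_engine("postgresql+pg8000://user:pass@host:5432/stocks")

UPSERT = text("""
    INSERT INTO stock_data (date, symbol, open, high, low, close, volume)
    VALUES (:date, :symbol, :open, :high, :low, :close, :volume)
    ON CONFLICT (symbol, date) DO NOTHING
""")

def load_rows(rows):
    """Replay-safe load: rows with an already-seen (symbol, date) are skipped."""
    with engine.begin() as conn:
        conn.execute(UPSERT, rows)  # rows: list of dicts keyed by column name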

Calculated Metrics

  • Volatility - (high - low) / close * 100
  • 5-Day Rolling Average
  • Daily % Change
  • Weekly Return % - Open-to-close performance
  • Weekly Range % - High-low spread
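
A sketch of how the per-row metrics can be derived with pandas (column names follow the schema above; the repo's transformations.py may differ in detail):

import pandas as pd

def add_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Add per-symbol indicators to a frame of daily OHLCV rows."""
    df = df.sort_values(["symbol", "date"]).copy()
    df["volatility"] = (df["high"] - df["low"]) / df["close"] * 100
    by_symbol = df.groupby("symbol")["close"]
    df["rolling_avg"] = by_symbol.transform(lambda s: s.rolling(5).mean())
    df["daily_pct_change"] = by_symbol.transform(lambda s: s.pct_change() * 100)
    return df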

Weekly Email Reports

Subject: Weekly Market Report - Best: NVDA: 8.5% | Worst: QUBT: -3.2%

Includes:

  • Performance summary for all tickers
  • Weekly return % and range %
  • Attached candlestick charts (30 days)
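
Offloading chart rendering keeps matplotlib out of the deployment package: QuickChart accepts a Chart.js config over HTTP and returns a PNG. A hedged sketch of the request (the exact payload reporting.py sends is an assumption):

import requests

def candlestick_png(symbol, bars):
    """Render ~30 days of OHLC bars as a candlestick chart PNG."""
    chart = {
        "type": "candlestick",
        "data": {"datasets": [{
            "label": symbol,
            "data": [{"x": b["date"], "o": b["open"], "h": b["high"],
                      "l": b["low"], "c": b["close"]} for b in bars],
        }]},
    }
    resp = requests.post(
        "https://quickchart.io/chart",
        json={"chart": chart, "width": 800, "height": 400, "format": "png"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.content  # PNG bytes, attached to the weekly email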

Project Structure

aws-stock-tracker/
├── api_client.py          # Polygon.io API with retry logic
├── config.py              # Environment variables
├── database.py            # PostgreSQL operations
├── transformations.py     # Data cleaning and metrics
├── utils.py               # Database query helpers
├── reporting.py           # Email generation and charts
├── lambda_main.py         # AWS Lambda handler
├── main.py                # Original local scheduler
├── requirements.txt
├── .env.example           # Template for environment variables
├── .gitignore
├── LICENSE
└── README.md

Deployment

Lambda Layers Setup

1. AWS-Managed Layer:

Layer: AWSSDKPandas-Python313
ARN: arn:aws:lambda:<region>:336392948345:layer:AWSSDKPandas-Python313:<version>
Includes: pandas, numpy, pyarrow

2. Custom Layer:

mkdir -p layer/python/lib/python3.13/site-packages
pip install pg8000 sqlalchemy requests python-dotenv \
    -t layer/python/lib/python3.13/site-packages/
cd layer && zip -r ../custom-layer.zip . && cd ..
aws s3 cp custom-layer.zip s3://your-bucket/layers/

Why Layers? They reduce the deployment package from ~150MB to ~10MB and enable faster deployments.

Deploy to Lambda

# Package application code (no dependencies)
mkdir package && cp *.py package/
cd package && zip -r ../lambda_function.zip . && cd ..

# Upload to S3 (required for packages > 50MB)
aws s3 cp lambda_function.zip s3://your-bucket/

# Create Lambda function via console:
# - Runtime: Python 3.13
# - Timeout: 5 minutes
# - Memory: 256 MB
# - Layers: AWSSDKPandas-Python313 + custom layer
# - Code: Upload from S3
# - VPC: Same as RDS
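
The console steps above can also be scripted for repeatability; a sketch with boto3 (role ARN, bucket, VPC IDs, and the handler path are placeholders):

import boto3

boto3.client("lambda").create_function(
    FunctionName="stock-tracker",
    Runtime="python3.13",
    Role="arn:aws:iam::<account>:role/stock-tracker-role",   # placeholder
    Handler="lambda_main.lambda_handler",                    # assumed entry point
    Code={"S3Bucket": "your-bucket", "S3Key": "lambda_function.zip"},
    Timeout=300,       # 5 minutes
    MemorySize=256,
    Layers=[
        "arn:aws:lambda:<region>:336392948345:layer:AWSSDKPandas-Python313:<version>",
        "arn:aws:lambda:<region>:<account>:layer:custom-layer:<version>",
    ],
    VpcConfig={"SubnetIds": ["subnet-..."],                  # same VPC as RDS
               "SecurityGroupIds": ["sg-..."]},
)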

EventBridge Schedules

Daily: cron(0 13 ? * TUE-SAT *) → {"job_type": "daily_update"}
Weekly: cron(0 16 ? * SAT *) → {"job_type": "weekly_report"}
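
A sketch of wiring the daily schedule to the function with boto3 (the weekly rule is analogous; names and ARNs are placeholders):

import json
import boto3

events = boto3.client("events")
events.put_rule(
    Name="stock-tracker-daily",
    ScheduleExpression="cron(0 13 ? * TUE-SAT *)",
    State="ENABLED",
)
events.put_targets(
    Rule="stock-tracker-daily",
    Targets=[{
        "Id": "stock-tracker-lambda",
        "Arn": "arn:aws:lambda:<region>:<account>:function:stock-tracker",
        "Input": json.dumps({"job_type": "daily_update"}),
    }],
)

# EventBridge also needs permission to invoke the function
boto3.client("lambda").add_permission(
    FunctionName="stock-tracker",
    StatementId="allow-eventbridge-daily",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn="arn:aws:events:<region>:<account>:rule/stock-tracker-daily",
)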

Challenges & Solutions

Local to Cloud Migration

Challenge: Built initially with SQLite + schedule library. Needed production reliability and cloud scheduling.

Solution:

  • Migrated SQLite → RDS PostgreSQL (proper indexing, connection pooling)
  • Replaced schedule → EventBridge (managed scheduling, no always-on machine)
  • Refactored monolithic script → event-driven Lambda functions

Learning: Local prototyping is great for iteration; cloud requires stateless, event-driven design.

Lambda Package Size

Challenge: Initial package with mplfinance exceeded 250MB limit.

Solution:

  • Switched to QuickChart.io API (eliminated matplotlib ~100MB)
  • Used Lambda Layers (AWSSDKPandas + custom)
  • Final package: ~10MB

Learning: External APIs can replace heavy dependencies; layers enable modular dependency management.

Database Costs

Challenge: RDS consumed ~$10/month in free-tier credits for only ~350 records.

Solution: Eventually migrated to BigQuery (serverless, pay-per-query).

Learning: Managed databases have hidden costs. Evaluate serverless alternatives for cost-sensitive projects.


Project Evolution

Week 1: Local Python script with SQLite
Week 2: Migrated to AWS (RDS + Lambda + EventBridge)
Weeks 3-4: Optimizations (layers, QuickChart.io, monitoring)
Month 2+: Hit limits, rebuilt as Airflow + dbt + BigQuery

Key Insight: This progression taught me when to use serverless (lightweight workloads) vs. orchestration tools (complex dependencies, large-scale processing).


Why I Migrated to Airflow

Limitations encountered:

  1. Observability - CloudWatch logs scattered, no pipeline visualization
  2. Orchestration - EventBridge can't handle complex dependencies
  3. Cost - RDS expensive for small datasets
  4. Scalability - Expanding to Russell 3000 would be prohibitively expensive
  5. Development - Lambda packaging and env var management were tedious

New pipeline: 5.3M+ records, better observability, lower costs, easier development.


Lessons Learned

What Worked:

  • Event-driven architecture with clean separation of concerns
  • Polygon.io's unlimited daily calls vs. competitors' strict limits
  • Retry logic prevented silent failures
  • Local prototyping with SQLite enabled fast iteration

What I'd Improve:

  • Use Lambda Layers from day one
  • Add CloudWatch alarms for monitoring
  • Implement data quality tests
  • Start with Airflow for complex orchestration

Skills Gained:

  • Serverless architecture (Lambda, EventBridge)
  • Managed databases (RDS PostgreSQL)
  • ETL design patterns
  • API integration with error handling
  • Event-driven systems

Cost Breakdown

Monthly (AWS Free Tier):

  • Lambda: $0 (~200 invocations/month)
  • RDS: ~$10 in credits (storage + compute + backups)
  • EventBridge: $0 (always free)

Total: ~$10/month


License

MIT License - see LICENSE


Contact

Trent Moore
GitHub: @tmoore-prog
LinkedIn: https://www.linkedin.com/in/trent-moore-data-engineer
Email: trent.moore.prog@gmail.com


Local prototype → Cloud migration → Production system. Built with curiosity, debugged with persistence, deprecated with wisdom.
