Description
Hello!
I was looking through the self-hosted infrastructure setup in the legacy/terraform/aws/ folder to understand the baseline AWS deployment. Since this is often used by teams spinning up MLOps infrastructure at scale, I decided to run it through a static analysis tool my team is developing (InfraScan) to see if there are any opportunities to tighten the out-of-the-box cost and security defaults.
Overall, the infrastructure looks solid, but the scanner highlighted a few quick wins:
1. Expensive NAT Gateway
The aws_nat_gateway.wandb resource (infra.tf:131) is provisioned by default.
- Impact: NAT Gateways incur an hourly charge plus per-GB data processing fees. In an MLOps environment where massive datasets or models are pulled and pushed, the data processing fee can grow quickly without anyone noticing.
- Suggestion: It might be worth documenting this cost heavily or providing an option to use VPC endpoints for S3/ECR to bypass the NAT Gateway for heavy data transfers.
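
To make the suggestion concrete, here is a rough sketch of an S3 gateway endpoint. The variable names (`aws_region`, `vpc_id`, `private_route_table_ids`) are placeholders I made up for illustration; I have not matched them to the names used in infra.tf, so they would need to be wired to the existing VPC and route table resources.

```hcl
# Hypothetical inputs: these names are illustrative, not taken from infra.tf.
variable "aws_region" {
  type        = string
  description = "Region the deployment runs in, e.g. us-west-2"
}

variable "vpc_id" {
  type        = string
  description = "ID of the VPC that hosts the private subnets"
}

variable "private_route_table_ids" {
  type        = list(string)
  description = "Route tables of the private subnets that currently route through the NAT Gateway"
}

# Gateway endpoint: S3 traffic from the listed route tables goes to S3
# directly instead of through the NAT Gateway.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.aws_region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}
```

S3 gateway endpoints carry no hourly or data processing charge, which is why I would start there. ECR would need interface endpoints instead, which are billed separately, so that part is probably better left as an opt-in.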
2. Missing S3 Lifecycle Policy
The aws_s3_bucket.file_storage (infra.tf:551) lacks a lifecycle configuration.
- Impact: W&B artifacts, model checkpoints, and logs can accumulate into terabytes very quickly. Without a lifecycle rule transitioning old data to cheaper storage tiers (like Glacier) or expiring it, self-hosted users will see their S3 costs creep up indefinitely.
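
A minimal sketch of what an opt-in lifecycle rule could look like, assuming an AWS provider version that supports the standalone `aws_s3_bucket_lifecycle_configuration` resource (v4 or later). The variable names and default values are my own placeholders, not something taken from the repo.

```hcl
# Hypothetical inputs: names and defaults are illustrative only.
variable "file_storage_transition_days" {
  type        = number
  default     = 90
  description = "Age in days after which objects move to the cheaper storage class"
}

variable "file_storage_transition_class" {
  type        = string
  default     = "STANDARD_IA"
  description = "Target storage class for older objects"
}

resource "aws_s3_bucket_lifecycle_configuration" "file_storage" {
  bucket = aws_s3_bucket.file_storage.id

  rule {
    id     = "tier-old-objects"
    status = "Enabled"

    # An empty filter applies the rule to every object in the bucket.
    filter {}

    transition {
      days          = var.file_storage_transition_days
      storage_class = var.file_storage_transition_class
    }

    # Clean up parts left behind by uploads that never completed.
    abort_incomplete_multipart_upload {
      days_after_initiation = 7
    }
  }
}
```

I defaulted to `STANDARD_IA` rather than Glacier because objects in the `GLACIER` and `DEEP_ARCHIVE` classes have to be restored before they can be read, which may not play well with how the server fetches older artifacts. I also left out an expiration rule on purpose, since deleting data automatically seems too risky as a default.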
3. No AWS Budget
- Suggestion: Adding an optional aws_budgets_budget resource would act as a financial safety net for teams evaluating the self-hosted version, alerting them early to accidental overspending on compute or storage.
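
As a rough illustration, an optional budget could look something like the sketch below. The variable names, the budget name, and the 80% threshold are all assumptions on my side; the resource is only created when a limit and at least one email address are provided.

```hcl
# Hypothetical inputs: names and defaults are illustrative only.
variable "monthly_budget_usd" {
  type        = number
  default     = 0
  description = "Monthly cost limit in USD; 0 disables the budget"
}

variable "budget_alert_emails" {
  type        = list(string)
  default     = []
  description = "Email addresses notified when the threshold is crossed"
}

resource "aws_budgets_budget" "wandb" {
  # Opt-in: created only when a limit and a subscriber are both set.
  count = var.monthly_budget_usd > 0 && length(var.budget_alert_emails) > 0 ? 1 : 0

  name         = "wandb-self-hosted-monthly"
  budget_type  = "COST"
  limit_amount = tostring(var.monthly_budget_usd)
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  # Notify once actual spend exceeds 80% of the monthly limit.
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = var.budget_alert_emails
  }
}
```

Note that a budget defined this way only sends alerts. It does not stop or cap any resources on its own.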
If you'd like to trace these findings down to the specific files and lines of code, the full interactive report is available here:
👉 View Full InfraScan Report for wandb/server
(Full disclosure: the link above is generated by our tool, but I manually reviewed the findings to make sure I'm only suggesting things that actually make sense for your specific use case.)
If you're open to it, I’d be happy to submit a quick PR. Let me know what you think!