fix: add resource limits, PodDisruptionBudgets, and backup error handling #427

Open
Flegma wants to merge 3 commits into main from audit/415-prod-readiness

Flegma commented Apr 8, 2026

Summary

  • Add resource requests/limits to all deployments and statefulsets (API, Web, Redis, Hasura, MinIO, Typesense, TimescaleDB)
  • Create PodDisruptionBudgets for API, Hasura, and TimescaleDB to protect against involuntary disruptions
  • Fix backup CronJob: remove || true from apk add, add pg_dump output validation, add S3 upload error checking

Addresses #415 and #416
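
The backup hardening described in the third bullet could look roughly like the following. This is a minimal sketch, not the manifest from this PR: the CronJob name, schedule, image tag, Secret name, `S3_BUCKET` and `S3_ENDPOINT` variables, and the use of the AWS CLI for the upload are all assumptions made for illustration.

```yaml
# Hypothetical sketch; names, image, schedule, and env vars are assumptions.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: timescaledb-backup        # hypothetical name
spec:
  schedule: "0 3 * * *"           # assumed schedule
  jobTemplate:
    spec:
      backoffLimit: 1
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: backup
              image: alpine:3.20  # assumed; the PR only implies an apk-based image
              envFrom:
                - secretRef:
                    name: backup-env   # hypothetical Secret holding PG* and AWS_* vars
              command: ["/bin/sh", "-c"]
              args:
                - |
                  set -eu
                  # No "|| true" here: a failed install must fail the job.
                  apk add --no-cache postgresql-client aws-cli
                  DUMP="/tmp/backup-$(date +%Y%m%d%H%M%S).sql"
                  # pg_dump reads PGHOST, PGUSER, PGPASSWORD, PGDATABASE from the environment.
                  pg_dump -f "$DUMP"
                  # Validate the output: the dump file must exist and be non-empty.
                  if [ ! -s "$DUMP" ]; then
                    echo "pg_dump produced an empty file" >&2
                    exit 1
                  fi
                  # Check the upload explicitly so a failure is logged and fails the job.
                  if ! aws s3 cp "$DUMP" "s3://${S3_BUCKET}/$(basename "$DUMP")" --endpoint-url "$S3_ENDPOINT"; then
                    echo "S3 upload failed" >&2
                    exit 1
                  fi
```

With `set -eu`, any failing command aborts the script with a non-zero exit code, so Kubernetes marks the Job as failed instead of reporting a silent success.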

Test plan

  • Verify all pods start within resource limits
  • Verify PDBs are created and prevent draining below minAvailable
  • Verify backup CronJob fails properly on pg_dump or S3 upload errors
  • Confirm kustomize build succeeds with new PDB resources
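
For the last item, `kustomize build` only picks up the new PDBs if they are listed under `resources` in the relevant kustomization. A rough sketch, with hypothetical file names that are not taken from this PR:

```yaml
# Hypothetical kustomization.yaml; all file names below are assumptions.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - api-deployment.yaml
  - hasura-deployment.yaml
  - timescaledb-statefulset.yaml
  # New PodDisruptionBudget manifests added alongside the workloads
  - api-pdb.yaml
  - hasura-pdb.yaml
  - timescaledb-pdb.yaml
```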

Flegma added 2 commits April 8, 2026 14:21
Change PDBs from minAvailable:1 to maxUnavailable:1 so single-replica
workloads don't block node drains and cluster upgrades. Bump API and
Hasura memory limits from 512Mi to 1Gi and CPU from 500m to 1000m to
handle NestJS+BullMQ+WebSocket and Hasura subscription load.
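
As an illustration of the PDB change described above, a budget using `maxUnavailable: 1` might look like this. The resource name and label selector are hypothetical; only the `maxUnavailable: 1` setting comes from the commit message.

```yaml
# Sketch only: the name and labels are hypothetical.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  # With a single replica, minAvailable: 1 would block every voluntary eviction.
  # maxUnavailable: 1 lets a node drain evict the one pod and proceed.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: api
```

The trade-off is that a single-replica workload gets no real protection from this budget; it only starts to limit evictions once the workload runs two or more replicas.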
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "1Gi"
Contributor

Not for larger installs. Make the max 4 GB.

Contributor Author

Bumped API memory limit to 4Gi in ffa968e.

    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "1Gi"
Contributor

Way too little. In a production instance I have 4 GB. Let's limit to 4 GB.

Contributor Author

Bumped Hasura memory limit to 4Gi in ffa968e.

lukepolo commented Apr 9, 2026

Also not a fan of CPU limits. We are not concerned enough about starvation here to need them. The memory limits on these are mostly too low.
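
One way to read this suggestion is a container `resources` block that keeps CPU and memory requests, raises the memory limit, and omits the CPU limit entirely. The fragment below is a sketch under that reading: the request values and the 4Gi limit are taken from the review thread above, and whether the PR adopted this shape is not confirmed here.

```yaml
# Container-level fragment; a sketch of the reviewer's suggestion, not the merged change.
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"      # still used for scheduling and proportional CPU shares
  limits:
    memory: "4Gi"    # memory stays capped; exceeding it gets the container OOM-killed
    # No cpu limit: the container may use idle CPU on the node and is not throttled.
```

Pods configured this way fall into the Burstable QoS class, since their requests and limits are not equal.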
