fix(deploy): remove startCommand from railway.json to restore prod #310
Railway changed how startCommand is evaluated; the $PORT variable was being passed as the literal string instead of being shell-expanded, causing uvicorn to crash on every boot and the /health probe to time out across 11 attempts. The Dockerfile's built-in CMD already binds to the EXPOSE'd port with --proxy-headers, so removing the override restores boot. Same railway.json shipped fine for PR OpenCodeIntel#293 two months ago, and no runtime code, Dockerfile, or requirements changed between OpenCodeIntel#293 and the failing OpenCodeIntel#302 deploy (only docs touched). Root cause is a Railway platform behavior change. Hotfix: skipped /oci-design gate (Phase 1F warn) because prod is fully down. Backfilling an ADR or dogfood finding after recovery.
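The literal-versus-expanded distinction described above can be reproduced locally. The following is a minimal sketch, not Railway's actual launch mechanism: it assumes a POSIX shell is available, uses a throwaway child process in place of uvicorn, and the helper names (`run_without_shell`, `run_with_shell`, `parse_port`) and the port value 8080 are hypothetical.

```python
import os
import shlex
import subprocess
import sys

# Stand-in for the server process: it prints the first argument it receives.
CHILD = [sys.executable, "-c", "import sys; print(sys.argv[1])"]


def run_without_shell(env):
    """Exec-style launch: no shell runs, so "$PORT" is passed through untouched."""
    result = subprocess.run(
        CHILD + ["$PORT"], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def run_with_shell(env):
    """Shell-style launch: the shell expands $PORT before the child starts."""
    command = shlex.join(CHILD) + " $PORT"
    result = subprocess.run(
        command, shell=True, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def parse_port(raw):
    """Mimic an integer-typed port option; return None if raw is not a number."""
    try:
        return int(raw)
    except ValueError:
        return None


env = {**os.environ, "PORT": "8080"}  # hypothetical port value
literal = run_without_shell(env)      # "$PORT"
expanded = run_with_shell(env)        # "8080"
```

Here `parse_port(literal)` returns `None` while `parse_port(expanded)` returns `8080`, which matches the failure described: a port option that expects an integer cannot accept the unexpanded string.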
CodeRabbit review: no actionable comments. Estimated code review effort: 1 (Trivial), ~2 minutes. Pre-merge checks: 5 passed.
Summary
Production is down. api.opencodeintel.com is timing out on every request. The Railway healthcheck failure on PR #302 took the live replica with it; subsequent retries cannot boot.
Root cause: Railway changed how startCommand in railway.json is evaluated. The $PORT variable is now being passed as the literal string "$PORT" to uvicorn instead of being shell-expanded, so uvicorn crashes on every start attempt. The healthcheck at /health then fails all 11 attempts over 5 minutes because the server never comes up.
The same railway.json deployed successfully for PR #293 two months ago. A full diff of every file between PR #293 (last working) and PR #302 (failed) shows zero runtime changes: only docs and backend/CLAUDE.md were touched. No Dockerfile, requirements.txt, or backend code changed. This is a platform-side behavior change, not a code regression.
Fix
Remove the startCommand line from railway.json. The Dockerfile's built-in CMD already handles boot correctly.
Bonus side effect: this also restores --proxy-headers, which the deleted startCommand was silently dropping (the comment in backend/Dockerfile:32-33 flagged this for the proxy IP allowlist use case).
Risk
Low. A failed deploy cannot take prod down any further: it is already fully offline. If this fix is wrong, the deploy fails again and we stay where we are. If it is right, prod comes back on the latest main.
Process note
This hotfix skipped the /oci-design ADR gate (Phase 1F warn-only hook fired). Justification: production restoration is higher priority than the design-gate process. Will backfill an ADR or dogfood-finding entry after prod recovers, documenting the bypass and the Railway platform behavior change.
Test plan
curl https://api.opencodeintel.com/health returns 200
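The curl check in the test plan could also be scripted so that it retries while the new deploy boots. This is a sketch under stated assumptions: the function names `fetch_status` and `wait_for_health` are hypothetical, and the defaults of 11 attempts spaced 30 seconds apart only loosely mirror the "11 attempts over 5 minutes" figure from the summary; Railway's real probe timing is not known from this PR.

```python
import time
from urllib.error import HTTPError
from urllib.request import urlopen

HEALTH_URL = "https://api.opencodeintel.com/health"  # endpoint from the test plan


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error status (for example 503).
        return err.code
    except OSError:
        # Covers URLError, timeouts, and connection failures.
        return 0


def wait_for_health(url, attempts=11, delay=30.0, fetch=fetch_status, sleep=time.sleep):
    """Poll url until it returns 200; give up after the given number of attempts."""
    for attempt in range(1, attempts + 1):
        if fetch(url) == 200:
            return True
        if attempt < attempts:
            sleep(delay)  # wait before the next probe, but not after the last one
    return False
```

Calling `wait_for_health(HEALTH_URL)` after the deploy returns `True` once the endpoint answers 200. The `fetch` and `sleep` parameters are injectable so the loop can be exercised without network access or real waiting.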