Deployment: Production CD for guardrails #100
Conversation
📝 Walkthrough

Adds a GitHub Actions CD workflow that deploys tagged releases to an EC2 instance via SSM: the job assumes an AWS role via OIDC, sends a shell script that fetches code, builds the Docker Compose images, runs Alembic migrations, starts services, and prunes images, then polls until the SSM command completes. Also updates validator normalization to drop metadata fields (id, name, organization_id, project_id, stage, is_enabled, created_at, updated_at) from guardrail configs.

Changes
- Production CD Workflow
- Validator Normalization
Sequence Diagram

```mermaid
sequenceDiagram
    participant GH as "GitHub Actions"
    participant AWS as "AWS (STS / SSM)"
    participant EC2 as "EC2 Instance (SSM Agent)"
    participant Backend as "Backend Service (Docker Compose)"
    participant DB as "Database"
    GH->>AWS: Assume role via OIDC
    AWS-->>GH: Temporary credentials
    GH->>AWS: SendCommand (AWS-RunShellScript) -> target EC2_INSTANCE_ID, script (checkout, build, alembic, up, prune)
    AWS->>EC2: Deliver SSM command
    EC2->>EC2: Execute script (chown, checkout, build images)
    EC2->>Backend: Start containers via docker compose
    Backend->>DB: Run Alembic migrations (uv run alembic upgrade head)
    EC2->>AWS: Command execution result (stdout/stderr, status)
    loop Poll until completion
        GH->>AWS: get-command-invocation(CommandId)
        AWS-->>GH: status (Pending/Success/Failed/...)
    end
    GH-->>GH: On Success -> print output & exit 0
    GH-->>GH: On Failure/Timeout -> print invocation JSON & exit 1
```
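For orientation, here is a minimal sketch of the diagram's first half (OIDC role assumption, then SendCommand). The role ARN, region, and secret name are placeholders, not values from this PR:

```yaml
# Sketch only — role ARN, region, and the INSTANCE_ID secret name are placeholders.
permissions:
  id-token: write   # required for GitHub's OIDC federation with AWS
  contents: read

steps:
  - name: Assume AWS role via OIDC
    uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/deploy-role  # placeholder
      aws-region: us-east-1                                       # placeholder

  - name: Trigger deploy on EC2 via SSM
    id: ssm
    env:
      INSTANCE_ID: ${{ secrets.EC2_INSTANCE_ID }}  # assumed secret name
    run: |
      # SendCommand returns immediately; capture the CommandId for the polling step.
      CMD_ID=$(aws ssm send-command \
        --instance-ids "$INSTANCE_ID" \
        --document-name "AWS-RunShellScript" \
        --parameters commands='["set -eux","echo deploy steps run here"]' \
        --query "Command.CommandId" --output text)
      echo "cmd_id=$CMD_ID" >> "$GITHUB_OUTPUT"
```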
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/schemas/guardrail_config.py (1)
78-88: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove duplicate "name" in the drop_fields set (lint failure risk).

"name" appears twice in the same set (Line 80 and Line 87). This is redundant and triggers Ruff B033.

Suggested fix

```diff
 drop_fields = {
     "id",
     "name",
     "organization_id",
     "project_id",
     "stage",
     "is_enabled",
     "created_at",
     "updated_at",
-    "name",
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/schemas/guardrail_config.py` around lines 78 - 88, The drop_fields set in guardrail_config.py contains a duplicate entry for "name" which is redundant and triggers lint B033; edit the drop_fields set (the variable named drop_fields) to remove the duplicate "name" so each field appears only once.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/cd-production.yml:
- Around line 16-17: The workflow is currently bound to staging-protected
environment and instance secrets (e.g., the environment key
"AWS_STAGING_ENV_SECRETS" and the staging instance secret referenced later);
update all staging-scoped references to the production equivalents so the CD job
targets production infrastructure — replace "AWS_STAGING_ENV_SECRETS" with the
production environment secret name and swap any staging instance secret names
(the ones used around lines 31-33 and 47-49) for their production instance
secret names so the release tag deploys to production.
- Around line 34-40: The deployment currently invokes Alembic directly and
bypasses the project's prestart flow; instead invoke the documented prestart
profile so backend/scripts/prestart.sh (which runs backend_pre_start.py and
initial_data.py) executes. Replace the direct alembic run in the remote command
sequence with the compose-based prestart invocation: after building (docker
compose build) run the service using the prestart profile (docker compose run
--rm --profile prestart backend) so the prestart script runs, then bring
services up (docker compose up -d --remove-orphans) and prune images as before;
update the command sequence around the docker compose build / run / up steps
accordingly.
- Around line 34-40: The deployment script currently hard-resets the remote
checkout to origin/main (the commands array contains git reset --hard
origin/main) which ignores the release tag; change the remote git commands
inside the AWS SSM send-command parameters so the instance checks out the exact
release tag that triggered the workflow (e.g., fetch tags and git checkout the
provided release/tag variable) instead of resetting to origin/main; update the
invocation that builds the commands string (the parameters entry in the
send-command call that uses BUILD_DIRECTORY and AWS-RunShellScript) to use the
workflow-provided tag variable (or GITHUB_REF/GITHUB_SHA passed into the runner)
and ensure quoting/escaping is preserved (see the consolidated sketch after this list).
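Taken together, a hedged sketch of a corrected job honoring all three comments — the AWS_ENV_SECRETS / EC2_INSTANCE_ID names, the BUILD_DIRECTORY path, and the prestart profile come from the suggestions above and are not verified against the repo:

```yaml
# Sketch consolidating the three inline comments; names and paths are assumptions.
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: AWS_ENV_SECRETS                       # production-scoped (assumed name)
    steps:
      - name: Trigger deploy on EC2 via SSM
        env:
          INSTANCE_ID: ${{ secrets.EC2_INSTANCE_ID }}  # production instance (assumed name)
          BUILD_DIRECTORY: /opt/kaapi-guardrails       # placeholder path
        run: |
          # Check out the release tag that triggered the run (GITHUB_REF_NAME is the
          # tag name on tag pushes) and run the documented prestart flow instead of
          # invoking Alembic directly.
          DEPLOY_CMD="cd ${BUILD_DIRECTORY} && git fetch --tags origin && git checkout --force ${GITHUB_REF_NAME} && docker compose build && docker compose --profile prestart up prestart && docker compose up -d --remove-orphans && docker image prune -f"
          aws ssm send-command \
            --instance-ids "$INSTANCE_ID" \
            --document-name "AWS-RunShellScript" \
            --parameters commands="[\"$DEPLOY_CMD\"]"
```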
---
Outside diff comments:
In `@backend/app/schemas/guardrail_config.py`:
- Around line 78-88: The drop_fields set in guardrail_config.py contains a
duplicate entry for "name" which is redundant and triggers lint B033; edit the
drop_fields set (the variable named drop_fields) to remove the duplicate "name"
so each field appears only once.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b3324244-b54a-4600-a7cf-0b8df45b36b0
📒 Files selected for processing (2)
- .github/workflows/cd-production.yml
- backend/app/schemas/guardrail_config.py
```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: AWS_STAGING_ENV_SECRETS
```

Suggested change:

```diff
-    environment: AWS_STAGING_ENV_SECRETS
+    environment: AWS_ENV_SECRETS
```
```yaml
- name: Wait for SSM command to finish
  env:
    INSTANCE_ID: ${{ secrets.STAGING_EC2_INSTANCE_ID }}
```

Suggested change:

```diff
-    INSTANCE_ID: ${{ secrets.STAGING_EC2_INSTANCE_ID }}
+    INSTANCE_ID: ${{ secrets.EC2_INSTANCE_ID }}
```
```yaml
- name: Trigger deploy on EC2 via SSM
  id: ssm
  env:
    INSTANCE_ID: ${{ secrets.STAGING_EC2_INSTANCE_ID }}
```

Suggested change:

```diff
-    INSTANCE_ID: ${{ secrets.STAGING_EC2_INSTANCE_ID }}
+    INSTANCE_ID: ${{ secrets.EC2_INSTANCE_ID }}
```
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/cd-production.yml:
- Around line 51-85: The polling loop uses a fixed 20-iteration limit (for i in
{1..20}) which gives only 300s total and can time out during cold builds; update
the workflow to increase the polling budget (e.g., raise the iteration count or
make the sleep/iterations configurable via inputs/env) and align it with the SSM
command timeout so CMD_ID/INSTANCE_ID polling waits long enough for long docker
compose builds; adjust the loop controlling symbols (for i in {1..20}, sleep 15)
and expose a configurable variable for total timeout or number of retries used
when checking aws ssm get-command-invocation so the workflow can be bumped for
production cold builds.
- Around line 3-7: The workflow currently allows manual runs via
workflow_dispatch which can bypass the tag-only deploy gate; update the
deployment job/steps that perform the production deploy (the steps triggered by
on: workflow_dispatch) to only proceed when the run target is a tag by adding a
conditional check like if: github.ref_type == 'tag' on the job or critical
steps, or change workflow_dispatch to require an explicit ref input (e.g.,
inputs: ref) and validate that input is a tag before running the deploy; look
for the workflow_dispatch block and the production deploy job/steps to apply the
if check or input validation.
- Line 39: Update the commands array used in the CD workflow where the shell
string currently runs "docker compose up -d --remove-orphans" so it uses "docker
compose up --wait --remove-orphans" (you may also add "--wait-timeout <seconds>"
if you want an upper bound). Change the literal in the parameter passed to
--parameters commands to replace the "-d" form with the "--wait" form so the
step will block until container healthchecks (e.g., the backend healthcheck
defined in docker-compose) report healthy before the script exits (see the sketch after this list).
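A hedged sketch combining the three suggestions — the 30-minute budget and variable names are illustrative, and the deploy command itself would swap `docker compose up -d` for `docker compose up --wait --remove-orphans` so the step blocks on healthchecks:

```yaml
# Sketch only — tag-gated job with a configurable SSM polling budget.
jobs:
  deploy:
    if: github.ref_type == 'tag'   # refuse manual runs that don't target a tag
    runs-on: ubuntu-latest
    env:
      POLL_INTERVAL: 15            # seconds between status checks (illustrative)
      MAX_POLLS: 120               # 120 * 15s = 30 min total budget (illustrative)
    steps:
      - name: Wait for SSM command to finish
        run: |
          # CMD_ID and INSTANCE_ID are expected from earlier steps/env.
          for i in $(seq 1 "$MAX_POLLS"); do
            STATUS=$(aws ssm get-command-invocation \
              --command-id "$CMD_ID" --instance-id "$INSTANCE_ID" \
              --query "Status" --output text)
            case "$STATUS" in
              Success) exit 0 ;;
              Failed|Cancelled|TimedOut) echo "SSM command $STATUS" >&2; exit 1 ;;
            esac
            sleep "$POLL_INTERVAL"
          done
          echo "Timed out waiting for SSM command" >&2
          exit 1
```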
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6e765c79-6adb-43f9-bee0-77aa5ba9cf2b
📒 Files selected for processing (1)
.github/workflows/cd-production.yml
```yaml
push:
  tags:
    - "v*" # Deploy only when tags like v1.0.0, v2.1.0, etc., are created
workflow_dispatch:
```

We don't need this workflow_dispatch trigger; the flow is triggered automatically when a release tag is added.
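With workflow_dispatch dropped as suggested, the trigger block would reduce to (sketch):

```yaml
on:
  push:
    tags:
      - "v*"   # deploy only on release tags like v1.0.0
```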
```yaml
--instance-ids "$INSTANCE_ID" \
--document-name "AWS-RunShellScript" \
--comment "Deploy kaapi-guardrails production" \
--parameters commands="[\"set -eux\",\"sudo chown -R ec2-user:ec2-user ${BUILD_DIRECTORY}\",\"sudo -iu ec2-user bash -lc \\\"cd ${BUILD_DIRECTORY} && git fetch --tags origin && git checkout --force ${{ github.sha }} && docker compose build && docker compose run --rm --entrypoint \\\\\\\"\\\\\\\" backend uv run alembic upgrade head && docker compose up -d --remove-orphans && docker image prune -f\\\"\"]" \
```

Remove these unwanted backslash escapes and write the command cleanly here.
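One hedged way to avoid the nested escaping: build the remote script in a shell variable and let jq emit the JSON parameters. DEPLOY_CMD and this jq usage are illustrative, not the PR's final form:

```yaml
- name: Trigger deploy on EC2 via SSM
  run: |
    # Build the remote script once; jq produces the JSON array, so no
    # hand-written backslash escaping is needed (jq ships on ubuntu-latest).
    DEPLOY_CMD="cd ${BUILD_DIRECTORY} && git fetch --tags origin && git checkout --force ${{ github.sha }} && docker compose build && docker compose up -d --remove-orphans && docker image prune -f"
    PARAMS=$(jq -n --arg cmd "$DEPLOY_CMD" '{commands: ["set -eux", $cmd]}')
    aws ssm send-command \
      --instance-ids "$INSTANCE_ID" \
      --document-name "AWS-RunShellScript" \
      --comment "Deploy kaapi-guardrails production" \
      --parameters "$PARAMS"
```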
♻️ Duplicate comments (2)
.github/workflows/cd-production.yml (2)
34-34: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the repo's prestart path and block until services are actually ready.

At Line 34, the command still runs Alembic directly and then uses detached startup. That bypasses the prestart service flow defined in docker-compose.yml (which runs backend/scripts/prestart.sh) and can mark deployment successful before service health is confirmed.

Suggested update

```diff
-          DEPLOY_CMD="cd ${BUILD_DIRECTORY} && git fetch --tags origin && git checkout --force ${{ github.sha }} && docker compose build && docker compose run --rm --entrypoint '' backend uv run alembic upgrade head && docker compose up -d --remove-orphans && docker image prune -f"
+          DEPLOY_CMD="cd ${BUILD_DIRECTORY} && git fetch --tags origin && git checkout --force ${{ github.sha }} && docker compose build && docker compose --profile prestart up prestart && docker compose up --wait --wait-timeout 300 --remove-orphans && docker image prune -f"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/cd-production.yml at line 34, Replace the one-liner DEPLOY_CMD that runs Alembic directly and starts services detached with a flow that invokes the repo's prestart path and blocks until services are healthy: update the DEPLOY_CMD to fetch and checkout the commit as before, then use docker compose to run the prestart flow (the service that executes backend/scripts/prestart.sh as defined in docker-compose.yml) instead of running Alembic directly, and bring the compose stack up in a foreground/blocking way (or include an explicit health-check wait loop) so the job only finishes when services report healthy; reference DEPLOY_CMD, backend/scripts/prestart.sh and the docker-compose prestart service when making the change.
52-86: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Increase the polling window to match real production deploy duration.

At Line 52, 20 * 15s = 300s is a tight cap for cold builds/migrations and can fail the workflow while SSM is still progressing. Make the timeout configurable and align it with your expected deploy envelope.

Suggested update

```diff
-          for i in {1..20}; do
+          MAX_WAIT_SECONDS=1800
+          POLL_INTERVAL_SECONDS=15
+          MAX_POLLS=$((MAX_WAIT_SECONDS / POLL_INTERVAL_SECONDS))
+          for i in $(seq 1 "$MAX_POLLS"); do
@@
-            sleep 15
+            sleep "$POLL_INTERVAL_SECONDS"
           done
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/cd-production.yml around lines 52 - 86, The current SSM polling loop uses a hardcoded 20 iterations with sleep 15 ("for i in {1..20}" and "sleep 15"), which caps the wait to ~300s and can expire during cold builds/migrations; make the timeout configurable by replacing the fixed loop/count with a parameter (e.g. POLL_RETRIES or POLL_TIMEOUT) and a configurable interval (e.g. POLL_INTERVAL) that are read from environment or workflow inputs, then use those variables instead of the literal "1..20" and "sleep 15" while still checking CMD_ID/INSTANCE_ID statuses and returning the same success/failure behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In @.github/workflows/cd-production.yml:
- Line 34: Replace the one-liner DEPLOY_CMD that runs Alembic directly and
starts services detached with a flow that invokes the repo's prestart path and
blocks until services are healthy: update the DEPLOY_CMD to fetch and checkout
the commit as before, then use docker compose to run the prestart flow (the
service that executes backend/scripts/prestart.sh as defined in
docker-compose.yml) instead of running Alembic directly, and bring the compose
stack up in a foreground/blocking way (or include an explicit health-check wait
loop) so the job only finishes when services report healthy; reference
DEPLOY_CMD, backend/scripts/prestart.sh and the docker-compose prestart service
when making the change.
- Around line 52-86: The current SSM polling loop uses a hardcoded 20 iterations
with sleep 15 ("for i in {1..20}" and "sleep 15"), which caps the wait to ~300s
and can expire during cold builds/migrations; make the timeout configurable by
replacing the fixed loop/count with a parameter (e.g. POLL_RETRIES or
POLL_TIMEOUT) and a configurable interval (e.g. POLL_INTERVAL) that are read
from environment or workflow inputs, then use those variables instead of the
literal "1..20" and "sleep 15" while still checking CMD_ID/INSTANCE_ID statuses
and returning the same success/failure behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a8069f6c-57de-40b4-bda1-6c95b7a2676d
📒 Files selected for processing (1)
.github/workflows/cd-production.yml
Summary
Target issue is #103
Notes
Chores
Bug Fixes
Summary by CodeRabbit
Deployment
Configuration