add deployment and observability scaffold#1
Conversation
ansible/: systemd unit, env config, rolling deploy playbook, Vector→Axiom log forwarding, per-region inventory files (3 USW, 2 EUW, 2 EUC, staging) grafana/: dashboard with clone rate, cache hit rate, latency, upstream fetches, NVMe usage, per-repo traffic, errors. Alert rules for node health, NVMe usage, upstream errors, clone latency, eviction rate. Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
- grafana: align metric names with internal/metrics/metrics.go (requests_total, request_seconds, sync_total, errors_total, responses_total) instead of invented names - ansible/setup: fail explicitly when /mnt/nvme is missing instead of silently skipping mirror directory creation - ansible/roll: download goreleaser tar.gz archive and extract, matching the actual release asset naming convention Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
| shell: | | ||
| set -euo pipefail | ||
| export PATH=/usr/local/go/bin:$PATH | ||
| cd /tmp/smart-git-proxy-build | ||
| make build | ||
| args: | ||
| executable: /bin/bash | ||
| when: branch != "" |
There was a problem hiding this comment.
🔴 make build fails on remote servers because mise is not installed
The Makefile defines GO ?= mise exec -- go (Makefile:1), which only uses the environment's GO if it's already set. The Ansible build step (roll.yaml:48-54) exports PATH=/usr/local/go/bin:$PATH but never sets GO=go, so make build expands to mise exec -- go build .... Since mise is not installed on remote deployment targets, the build will always fail with a command-not-found error.
| shell: | | |
| set -euo pipefail | |
| export PATH=/usr/local/go/bin:$PATH | |
| cd /tmp/smart-git-proxy-build | |
| make build | |
| args: | |
| executable: /bin/bash | |
| when: branch != "" | |
| - name: Build from branch | |
| shell: | | |
| set -euo pipefail | |
| export PATH=/usr/local/go/bin:$PATH | |
| export GO=go | |
| cd /tmp/smart-git-proxy-build | |
| make build | |
| args: | |
| executable: /bin/bash | |
| when: branch != "" | |
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Fixed in aa31b37. Added export GO=go before make build so it bypasses the mise exec -- go default in the Makefile.
| shell: | | ||
| set -euo pipefail | ||
| mkdir -p /tmp/smart-git-proxy-release | ||
| tar -xzf /tmp/smart-git-proxy-release.tar.gz -C /tmp/smart-git-proxy-release | ||
| install -m 0755 /tmp/smart-git-proxy-release/smart-git-proxy /usr/local/bin/smart-git-proxy |
There was a problem hiding this comment.
🔴 Service unconditionally restarted even when no new binary is deployed
The "Restart smart-git-proxy" task at lines 81-85 has no when condition. If roll.yaml is invoked without -e branch=... or -e release_tag=..., both variables default to "", all binary-install steps are skipped, but the service is still restarted — causing unnecessary downtime on production nodes during a rolling deploy (serial: 1).
| shell: | | |
| set -euo pipefail | |
| mkdir -p /tmp/smart-git-proxy-release | |
| tar -xzf /tmp/smart-git-proxy-release.tar.gz -C /tmp/smart-git-proxy-release | |
| install -m 0755 /tmp/smart-git-proxy-release/smart-git-proxy /usr/local/bin/smart-git-proxy | |
| - name: Restart smart-git-proxy | |
| systemd: | |
| name: smart-git-proxy | |
| state: restarted | |
| enabled: yes | |
| when: branch != "" or release_tag != "" | |
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Fixed in aa31b37. Rather than a conditional when, the playbook now fails early if neither branch nor release_tag is provided — since running roll.yaml without deploying anything is always a mistake.
| # Encrypted with ansible-vault. | ||
| # Contains: github_token, axiom_token | ||
| # To edit: ansible-vault edit secrets.yml |
There was a problem hiding this comment.
🚩 secrets.yml is a plaintext placeholder committed to the repo
The ansible/secrets.yml file is committed as a plaintext comment-only placeholder. While it currently contains no actual secrets (just comments explaining what should go there), the intent per ansible/README.md:58-60 is for this to be an ansible-vault encrypted file. Having a plaintext placeholder in the repo means someone could accidentally add real secrets to it without encrypting first. Consider adding it to .gitignore or committing an actually-encrypted empty vault file instead.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Fair point. This is intentionally a placeholder — the file needs to be replaced with an actual ansible-vault encrypted file before first use. Adding to .gitignore would mean operators need to create it from scratch without a template. The README documents the expected contents and ansible-vault edit workflow. Happy to add a .gitignore entry if preferred.
| "id": 2, | ||
| "targets": [ | ||
| { | ||
| "expr": "sum(rate(smart_git_proxy_requests_total{instance=~\"$instance\"}[$__rate_interval])) by (kind)", |
There was a problem hiding this comment.
🚩 Grafana dashboard JSON has a truncated PromQL expression
At line 57 of the dashboard JSON, the Cache Hit Rate panel's expr field is truncated: it ends with *... followed by [truncated]. This appears to be a diff rendering artifact — if the actual file on disk is also truncated, the dashboard import will fail or show a broken panel. Worth verifying the actual file content.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
This is a diff rendering artifact — the actual file on disk is not truncated. The panel no longer exists in the latest commit anyway (replaced with "Requests by Kind" which uses the real smart_git_proxy_requests_total metric).
| type: axiom | ||
| inputs: | ||
| - add_metadata | ||
| token: "{{ axiom_token }}" |
There was a problem hiding this comment.
🚩 Vector config template exposes axiom_token in plaintext on disk
The Vector config at ansible/templates/vector.yaml.j2:33 renders {{ axiom_token }} directly into /etc/vector/vector.yaml with mode 0644 (world-readable, set at ansible/tasks/setup_axiom.yaml:42). This means the Axiom API token is readable by any user on the system. Consider tightening the file permissions to 0600 or using a systemd environment variable approach similar to how HOSTNAME is handled.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Fixed in aa31b37. Changed vector.yaml file mode from 0644 to 0600.
…, vector perms - roll.yaml: export GO=go before make build (Makefile defaults to mise exec -- go which isn't on deploy targets) - roll.yaml: fail early if neither branch nor release_tag provided, preventing needless service restart - grafana: group histogram_quantile by (le, instance) to avoid merging latencies across nodes - setup_axiom: tighten vector.yaml to 0600 (contains axiom_token) Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit aa31b37. Configure here.
| # Goreleaser publishes archives as: smart-git-proxy_<version>_linux_amd64.tar.gz | ||
| - name: Download release archive | ||
| get_url: | ||
| url: "https://github.com/useblacksmith/smart-git-proxy/releases/download/{{ release_tag }}/smart-git-proxy_{{ release_tag | regex_replace('^v', '') }}_linux_amd64.tar.gz" |
There was a problem hiding this comment.
Wrong GitHub release download URL
High Severity
The release deploy path downloads tarballs from useblacksmith/smart-git-proxy, but GoReleaser in this repo publishes GitHub releases to runs-on/smart-git-proxy. A -e release_tag=... roll therefore requests assets from a repo that does not receive those releases, so the download step fails or never finds the expected archive.
Reviewed by Cursor Bugbot for commit aa31b37. Configure here.
There was a problem hiding this comment.
This is by design — releases will be published to useblacksmith/smart-git-proxy (this fork), not the upstream runs-on/smart-git-proxy. The goreleaser config in this repo will publish to this repo's releases page once configured.
| - name: Ensure ENV is determined | ||
| fail: | ||
| msg: "Failed to determine environment from inventory file name." | ||
| when: ENV == '' |
There was a problem hiding this comment.
Environment tied to inventory filename
Medium Severity
Setup and roll playbooks derive ENV only from whether the inventory filename contains production or staging, then fail if neither substring matches. Inventory files already define env under [all:vars], so a valid inventory with a different filename fails even when env is set correctly for Vector metadata.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit aa31b37. Configure here.
There was a problem hiding this comment.
Intentional — all inventory files follow the production-*.ini / staging-*.ini naming convention, so the substring check is reliable. Using the inventory-level env var would work too, but the filename approach matches how the storage-agent playbooks handle it and provides an extra safety net (you can't accidentally run a production playbook against a staging inventory even if env is misconfigured).
Ensures apt cache is refreshed after adding the Vector repo so the package is discoverable on first run. Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>


Summary
Scaffolds the deployment and observability infrastructure for the git mirror proxy cluster (7 nodes: 3 USW, 2 EUW, 2 EUC), modeled after
storage-agent's Ansible/Grafana setup.ansible/— Deployment automation:setup.yaml: One-time node setup (systemd unit, NVMe mirror dir, env config, Vector→Axiom logging)roll.yaml: Rolling deploy withserial: 1— build from branch or download release binary, restart, health-check before moving to next nodetemplates/: systemd unit (SIGTERM graceful shutdown,EnvironmentFile), env config (MIRROR_DIR,MIRROR_MAX_SIZE=80%,SYNC_STALE_AFTER=2s,AUTH_MODE=pass-through), Vector config (journald → Axiomsmart-git-proxydataset)grafana/— Monitoring:Related: design doc PR
Link to Devin session: https://app.devin.ai/sessions/9193bf03dfa14cf684447384ef17a8db
Need help on this PR? Tag
/codesmithwith what you need. Autofix is disabled.Need help on this PR? Tag
/codesmithwith what you need. Autofix is disabled. (Staging)