notifications (4/5): Grafana-backed service, handlers, and server wiring #455
illegalprime wants to merge 3 commits into
Conversation
🔐 Codex Security Review
Review Summary
Overall Risk: HIGH
Findings
- [HIGH] Notification destination SSRF validation is not enforced at send time
- [MEDIUM] Grafana admin password is exposed in process arguments during token provisioning
Notes
No pool configuration, miner command execution, plugin execution, frontend rendering, protobuf wire-format changes, or nmap/discovery code was changed in
Generated by Codex Security Review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8c55e88d7a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6cdd7d2724
507a4e5 to
3bd492f
Compare
The prod overlay left FLEET_METRICS_GRAFANA_PASSWORD empty so fleet-api never held the Grafana admin credential, but that meant a default deployment had no way to authenticate to Grafana and every Grafana-proxying notification RPC 401'd while the stack came up healthy. Instead of defaulting fleet-api to the admin password, mint a least-privilege Grafana service account (Editor on org 1) and token post-boot in run-fleet.sh, persist it to .env, and recreate fleet-api so it authenticates with the scoped token. The basic-auth password stays empty by default. Idempotent across re-runs. Addresses Codex security review finding (MEDIUM) on #455. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
6cdd7d2 to
6655a15
Compare
Codex findings on this PR:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6655a150e3
3bd492f to
66e55db
Compare
6655a15 to
753a5b8
Compare
…Ls in parse errors

Two follow-ups from review:
- run-fleet.sh swallowed the post-token fleet-api restart (>/dev/null 2>&1 || true), so a failed recreate left fleet-api on the pre-token env (still 401ing) while the script reported success. Check the result and fail loudly with manual-restart instructions.
- checkDestinationURL returned url.Parse's error verbatim, which embeds the raw input, and a webhook/Slack URL can carry a capability token. Return a generic 'not parseable' message so the secret can't leak via errors or logs.

Addresses Codex security review findings (MEDIUM) on #455.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Re the re-review (the first two are now fixed in commit 75185f6):
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 75185f6639
    "smtpHost":     c.SMTP.Host,
    "smtpPort":     c.SMTP.Port,
    "smtpUsername": c.SMTP.Username,
    "fromAddress":  c.SMTP.From,
Don't store SMTP server settings on email contact points
For SMTP channels, these per-channel smtpHost/smtpPort/credential fields are written into the Grafana email contact point, but Grafana OSS reads SMTP server configuration from [smtp]/GF_SMTP_*, not from the contact point; the Grafana email docs require SMTP to be configured in grafana.ini and list the email contact-point settings as addresses/single-email/template fields (https://grafana.com/docs/grafana/latest/alerting/configure-notifications/manage-contact-points/integrations/configure-email/). Since the bundled server/monitoring/grafana/grafana.ini has no [smtp] section, creating/testing an SMTP channel with a host/password here can appear saved while real email delivery still uses an unconfigured global SMTP transport and fails.
66e55db to
d2685fc
Compare
…K, slack kind-change, and open-ended silences

- Re-add the notification services to registeredServices here (with the main.go mounts), so TestRPCContract_RegisteredServicesMatchMainMux passes; the entries were moved out of the authz PR where the services aren't mounted yet.
- UpdateContactPoint: Grafana's provisioning PUT answers a 202 Ack, not the contact point. Stop decoding the (empty) body and build the UpdateChannel response from the contact point we sent, so a successful edit is no longer reported as an RPC error.
- UpdateChannel: only keep the stored Slack URL when the channel was already Slack; changing kind to Slack without a fresh URL no longer grafts the prior kind's secret (e.g. a webhook URL) onto the Slack contact point.
- domainSilenceToGrafana: materialize an open-ended mute's zero EndsAt to the far-future sentinel pause silences already use, so Alertmanager (which requires endsAt) accepts indefinite user silences.

Addresses Codex connector findings (P1/P2) on #455.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
75185f6 to
876a9df
Compare
codex-connector triage (the earlier #459 reference is folded in here; tracking on this thread, not as issues):
Fixed:
Won't-change (intentional):
Deferred (design-level, tracked here, no separate issue):
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 876a9df6f9
d2685fc to
23cd610
Compare
876a9df to
5d7cfa0
Compare
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5d7cfa0298
23cd610 to
8cd5d9a
Compare
5d7cfa0 to
382a0e6
Compare
382a0e6 to
ffdd5ad
Compare
8cd5d9a to
341655e
Compare
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ffdd5addbf
341655e to
9548ca8
Compare
ffdd5ad to
7b2fdcd
Compare
Add the Grafana-backed notifications domain service and client, the Connect handlers, and the fleetd server wiring (mounting the Channel, Rule, MaintenanceWindow, and History services). run-fleet.sh provisions a least-privilege Grafana service-account token for fleet-api. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
7b2fdcd to
69f0098
Compare
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 69f0098d5d
…ator, SSRF ranges)

- UpdateChannel: only reuse the stored webhook URL when the channel was already a webhook, mirroring the Slack fix, so a kind change can't graft the prior kind's secret (e.g. a Slack URL) onto a webhook contact point.
- UpdateMaintenanceWindow: carry the matched silence's CreatedBy so an update (which has no created_by) doesn't wipe the audit owner.
- checkDestinationHost: reject CGNAT (100.64/10), benchmarking (198.18/15), and reserved (240/4) ranges that net.IP.IsPrivate misses.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6901a8b145
    create_body=$(curl -fsS --max-time 10 -u "admin:${admin_pass}" \
      -H "Content-Type: application/json" \
      -d "{\"name\":\"${sa_name}\",\"role\":\"Editor\",\"isDisabled\":false}" \
Mint a token with provisioning permissions
In the production --enable-beta-notifications path, this creates the fleet-api service account as Editor, but the service only talks to Grafana's Alerting Provisioning API (/api/v1/provisioning/...) for contact points and rules; Grafana's role mapping assigns fixed:alerting.provisioning:writer to Admin, not Editor (https://grafana.com/docs/grafana/latest/administration/roles-and-permissions/access-control/rbac-fixed-basic-role-definitions/). With the generated token, channel/rule management will authenticate but still get Grafana 403s in deployments enforcing those roles, so mint/update the service account with a role that includes provisioning access.
TestChannel loads the stored Grafana contact point whenever an id is set and ignores kind/config, so don't run kind conversion (which rejects the default CHANNEL_KIND_UNSPECIFIED) for id-only test requests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 85f424eea1
    out := Channel{
        ID:             cp.UID,
        OrganizationID: orgID,
        Name:           channelDisplayName(orgID, cp.Name),
    }
Preserve channel timestamps when round-tripping Grafana contact points
When channels are created or listed, the only state reconstructed from Grafana is the contact point payload, and this initializes Channel without CreatedAt or UpdatedAt. Since CreateChannel and UpdateChannel both return the result of contactPointToChannel, and ListChannels sorts by CreatedAt, every Grafana-backed channel comes back with the Go zero timestamp rather than the time set just before the write; clients will display bogus dates and ordering is effectively meaningless unless these fields are persisted in settings or copied into the response.
Stack 4/5, base: eden/notifications-3-authz (#454)

1. Summary

This PR makes the Notifications API real: operators can create and test alert-delivery channels (webhook / SMTP / Slack), browse the org's provisioned alert rules and pause/resume them, and schedule maintenance windows that mute alerts for a rule, group, site, or set of devices. fleet-api implements all four notifications.v1 Connect services as a thin, org-scoped proxy in front of a Grafana sidecar, translating each RPC into Grafana HTTP API calls and persisting all state in Grafana, with no new notification tables of its own. Channel mutations are marked secret-redacted and the whole API is session-only so a leaked API key can't reconfigure delivery or silence alerts.

2. How it works
The server owns no notification storage; Grafana is the source of truth and fleet-api is a translation + isolation layer in front of it.

Request path (every RPC): client → Connect handler → authorize() runs the per-route RBAC gate (notification:read for lists, notification:manage for mutations) and extracts the caller's organization_id from the session (a zero org id fails closed) → the domain Service does org-scoped work → the Grafana HTTP adapter calls the sidecar → state lands in Grafana (which itself queries TimescaleDB for alert evaluation).

The three Grafana surfaces each concept maps to:

- /api/v1/provisioning/contact-points. Test posts a synthetic alert to .../test.
- /api/alertmanager/grafana/api/v2/silences

Org isolation is enforced entirely inside the domain layer because Grafana does not sandbox these objects per Proto-Fleet org:

- org-<id>-; reads filter by prefix, writes inject it, and an update/delete first re-verifies the prefix before touching the row.
- organization_id label (a global YAML default) or one matching the caller.
- organization_id=<id> equality matcher; reads filter on it, writes always inject it. A cross-org id never matches, so it surfaces as not-found rather than acting as a list oracle.

Secrets are write-only. Webhook bearer credentials, SMTP passwords, and Slack URLs are accepted on writes but never echoed on reads (the proto omits them; webhook URLs are reduced to scheme://host). On an edit with no fresh secret, the stored secret is carried forward, but only if the destination is unchanged, so an old credential can never be replayed against a new destination. Grafana request/response bodies are scrubbed (known secret keys redacted, plus regex scrubbing of URLs and Bearer … substrings) before they reach logs.

Pause/resume does not flip Grafana's isPaused (forbidden on YAML-provisioned rules). Instead PauseRule writes a marker "pause silence": an org-scoped silence carrying __alert_rule_uid__=<rule> plus a proto_fleet_pause=true matcher and a far-future end time. ResumeRule deletes that silence. List responses overlay active pause silences onto each rule's Enabled flag, and pause silences are hidden from the maintenance-window list since they're an implementation detail.

Maintenance-window scope → matchers. A structured scope (rule / group / site / device-ids) compiles to Alertmanager matchers alongside the org matcher. Device lists become an anchored, regex-escaped alternation (^(?:a|b)$); ids are restricted to a safe alphabet and capped at 500 so a crafted id like .* can't widen the mute. Targetless scopes are rejected (they'd compile to just the org matcher and silence everything).

SSRF pre-flight. Because Grafana opens the outbound connection, a user-supplied webhook/Slack URL or SMTP host is an SSRF vector. Before any write, fleet-api parses the destination, requires http(s), and resolves the host, rejecting loopback / link-local / private / unspecified addresses (DNS failure fails closed). This is a guardrail, not the egress boundary (see decisions).

History (ListNotifications) is the one read backed by Proto Fleet's own DB: it clamps page size (default 50, max 200), fetches one extra row to compute has_more exactly, and calls the #453 keyset store.

Auth to Grafana. The adapter prefers a bearer service-account token and falls back to basic auth only when no token is set.
run-fleet.sh mints a least-privilege token post-boot (a fleet-api service account, Editor on org 1), writes it to .env (chmod 600), and recreates fleet-api so it stops using the admin password; the basic-auth password now defaults empty.

    flowchart LR
        client["UI / API client"] -->|"Connect RPC"| handler["notifications Connect handlers"]
        handler -->|"RBAC gate + org id"| authz["interceptors + middleware<br/>(session-only, redacted)"]
        authz --> svc["notifications.Service<br/>org-scoping, secret redaction,<br/>SSRF pre-flight, scope compile"]
        svc --> adapter["Grafana HTTP adapter<br/>(grafana_client.go)"]
        adapter -->|"provisioning + Alertmanager API"| grafana["Grafana sidecar"]
        grafana --> tsdb[("TimescaleDB")]
        handler -->|"ListNotifications only"| histstore["notification_history store (#453)"]
        histstore --> tsdb

CreateChannel (representative mutating flow)

Rule paused/enabled state (via pause silence, not isPaused)

    stateDiagram-v2
        [*] --> Enabled
        Enabled --> Paused: "PauseRule writes org-scoped pause silence"
        Paused --> Enabled: "ResumeRule deletes pause silence"
        Paused --> Paused: "PauseRule (idempotent)"
        Enabled --> Enabled: "ResumeRule (idempotent)"
        note right of Paused
            ListRules ORs an active pause silence onto the rule's Enabled flag. A YAML isPaused stays paused even after Resume.
        end note

3. Areas of the code involved
- server/internal/domain/notifications/grafana_client.go (new)
  X-Disable-Provenance on writes, token-vs-basic-auth, secret redaction in logs/errors. Keeps Grafana's "silence" vocabulary because it models the Alertmanager API.
  GrafanaError surfacing, and that redaction covers logs and returned error messages.
- server/internal/domain/notifications/service.go (new)
- server/internal/domain/notifications/models.go (new)
  Channel / Rule / MaintenanceWindow / MaintenanceWindowScope, kinds, validation state.
- server/internal/handlers/notifications/handler.go (new)
  ListNotifications page-size clamp + has_more.
  notification:manage, reads on notification:read, and secrets aren't mapped back into responses.
- server/cmd/fleetd/main.go
- server/internal/handlers/interceptors/config.go
- server/internal/infrastructure/metrics/provider.go
  GrafanaConfig and DestinationPolicy config (env-prefixed).
- deployment-files/run-fleet.sh
  .env, recreates fleet-api. Loud failure on restart error.
- deployment-files/docker-compose.notifications.yaml, server/docker-compose.notifications.yaml
- server/internal/handlers/middleware/rpc_permissions_test.go
  TestRPCContract_RegisteredServicesMatchMainMux green.
- server/internal/domain/notifications/service_test.go, grafana_client_test.go (new)

Generated pb.go/connect/pb.ts are not in this PR (they ship in #452); nothing here is generated.

4. Key technical decisions & trade-offs

- isPaused: required because Grafana 11.6+ forbids the provisioning API from editing YAML-provisioned rules; the trade-off is an overlay computed at list time and a hidden "pause silence" class.

5. Testing & validation
Covered (unit): secret redaction (known keys, empty-value preservation, arrays, secrets embedded in string values, punctuation-bearing bearer tokens, non-JSON bodies); SSRF destination validation; maintenance-window scope validation + device-id regex compilation/rejection (.*, alternation, anchors); webhook/SMTP/Slack secret preservation, replacement, and redaction; cross-org rejection; the RPC contract test confirming all four services are mounted.

Explicitly NOT covered / known limitations:

- group_id/site_id matchers that the current alert rules don't emit, so only rule- and device-scoped windows actually match.

🤖 Generated with Claude Code