Skip to content

notifications (1/5): proto definitions and generated code#452

Merged
illegalprime merged 1 commit into
mainfrom
eden/notifications-1-proto
Jun 16, 2026
Merged

notifications (1/5): proto definitions and generated code#452
illegalprime merged 1 commit into
mainfrom
eden/notifications-1-proto

Conversation

@illegalprime

@illegalprime illegalprime commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Summary

This is the bottom of a 5-PR stack introducing Notifications to Fleet — an operator-facing capability to manage where alerts are delivered, which alert rules are active, when alerts are temporarily suppressed during maintenance, and to review what has already fired. This PR delivers only the contract: the protobuf service/message definitions plus the generated Go and TypeScript code. It defines four services — ChannelService, RuleService, MaintenanceWindowService, and HistoryService — under notifications.v1. Nothing is wired up or served yet; these are the type/contract definitions consumed by the later PRs in the stack (history DB layer → authz → Grafana-backed service + handlers → client UI).

How it works

There is no runtime behavior in this PR. It establishes the API contract from which code is generated, and that generated code is what downstream PRs consume.

The shape of the eventual system, which this contract describes:

  • The contract lives in proto/notifications/v1/notifications.proto. It is compiled by buf into two consumer-facing artifacts: Go (message types + Connect handler interfaces/stubs) for the fleet-api server, and TypeScript (message types + client) for the web UI.
  • fleet-api is intended to be the security boundary: every RPC will be scoped to the caller's organization (note organization_id on Channel, Rule, MaintenanceWindow), and fleet-api will proxy a Grafana sidecar that actually performs delivery, rule evaluation, and suppression.
  • The four services map to four Grafana-backed concerns:
    • ChannelService — alert delivery destinations, backed by Grafana contact points. Channels carry a kind (webhook / SMTP / Slack) and a corresponding config message. Secrets are write-only: zeroed on reads, with Channel.has_secret signalling presence so the UI can render a masked "set" state. TestChannel sends a synthetic alert through either a saved channel or an unsaved definition.
    • RuleServiceread-only plus pause/resume. Rules are provisioned by ops YAML, so there is deliberately no create/update/delete. Rule.id is the Grafana alert-rule UID (stable across deploys because the YAML pins it); enabled is the inverse of Grafana's isPaused.
    • MaintenanceWindowService — temporary suppression of alerts during planned maintenance. A MaintenanceWindow carries a MaintenanceWindowScope that targets a rule, group, site, or set of devices, plus a time window and comment.
    • HistoryService — read-only access to the delivered-notification log, newest-first and keyset-paginated on the entry id (before_id cursor + has_more). NotificationHistoryEntry fields are denormalized free-form strings sourced directly from Grafana alerts.
  • Input constraints are declared inline via buf.validate field options (e.g. channel name length 1–255, comment max 1024, non-negative page_size, non-empty ids), so validation is part of the contract rather than hand-rolled later.

The end-to-end request path (realized in later PRs, shown here for context):

flowchart LR
    UI["Web UI<br/>(generated TS client)"] -->|"Connect-RPC over HTTP"| API["fleet-api<br/>(generated Go handlers)"]
    API -->|"scope to caller org"| AUTHZ["authz / org scoping"]
    AUTHZ --> GRAFANA["Grafana sidecar"]
    GRAFANA --> CP["contact points<br/>(channels)"]
    GRAFANA --> RULES["provisioned alert rules"]
    GRAFANA --> MW["maintenance windows<br/>(alert suppression)"]
    API -->|"history read path"| DB[("notification_history<br/>(keyset paginated)")]
Loading

This PR contributes the leftmost and the fleet-api boundary box as type definitions only.

Contract shape

flowchart TB
    subgraph svc["notifications.v1 services"]
        CS["ChannelService<br/>List / Create / Update / Delete / Test"]
        RS["RuleService<br/>List / Pause / Resume"]
        MWS["MaintenanceWindowService<br/>List / Create / Update / Delete"]
        HS["HistoryService<br/>ListNotifications"]
    end
    CS --> Channel["Channel<br/>+ Webhook/Smtp/Slack config<br/>(write-only secrets)"]
    RS --> Rule["Rule<br/>(RuleTemplate, enabled)"]
    MWS --> MaintenanceWindow["MaintenanceWindow<br/>+ MaintenanceWindowScope<br/>(rule/group/site/device)"]
    HS --> Entry["NotificationHistoryEntry<br/>(keyset by id)"]
Loading

The TestChannel flow is the one RPC with non-obvious selection logic:

sequenceDiagram
    participant UI
    participant API as "fleet-api (future)"
    participant G as "Grafana (future)"
    UI->>API: "TestChannelRequest (id OR inline kind+config)"
    Note over API: "unsaved definition wins when both set"
    API->>G: "send synthetic alert"
    G-->>API: "delivery result"
    API-->>UI: "TestChannelResponse{ok, error, response_code}"
Loading

Areas of the code involved

Area / package / file What changed Why it matters for review
proto/notifications/v1/notifications.proto New. Four services, their request/response messages, Channel/Rule/MaintenanceWindow/NotificationHistoryEntry types, enums (ChannelKind, ValidationState, RuleTemplate, MaintenanceWindowScopeKind), and buf.validate field constraints. The file is comment-free — there are no proto doc comments to read. The only hand-authored file to review. This is the entire contract; everything else is derived from it.
server/generated/grpc/notifications/v1/notifications.pb.go New. Generated Go message types. Generated — skip. Any comments here are generator boilerplate (// Code generated ... DO NOT EDIT), not authored proto docs.
server/generated/grpc/notifications/v1/notificationsv1connect/notifications.connect.go New. Generated Connect handler interfaces + client stubs for the four services. Generated — skip. Confirms the four services map to handler stubs the later PRs implement.
client/src/protoFleet/api/generated/notifications/v1/notifications_pb.ts New. Generated TypeScript message types / client. Generated — skip. Comments here are generator boilerplate, not authored proto docs.

Key technical decisions & trade-offs

  • RuleService is read-only + pause/resume only — chosen over a full CRUD rule surface, because rules are provisioned by ops YAML; the API never authors them.
  • Secrets are write-only with a has_secret boolean — chosen over returning masked secret values, so secret material never leaves the server while the UI can still show a "set" state.
  • History pagination is keyset on entry id (before_id / has_more) — chosen over offset/limit, to keep newest-first paging stable and cheap as the log grows.
  • Channel config is a separate message per kind (Webhook/Smtp/Slack) rather than a generic key/value map — gives each destination type an explicit, validated schema at the cost of one message per new kind.
  • Validation lives in the proto via buf.validate — chosen over hand-written server checks, so constraints travel with the contract and apply uniformly to generated consumers.
  • Rule.id reuses the Grafana alert-rule UID rather than a Fleet-minted id — relies on the YAML pinning the UID so it stays stable across deploys.
  • TestChannelRequest accepts an id or an inline definition, with the inline definition winning — lets the UI test before save, at the cost of a slightly ambiguous request shape.
  • The proto is stripped of doc comments — the contract is expressed entirely through names and buf.validate constraints, keeping the hand-authored file minimal; semantic documentation lives in this stack's PRs and the downstream service code.

Testing & validation

@illegalprime illegalprime requested a review from a team as a code owner June 15, 2026 15:53
@github-actions github-actions Bot added javascript Pull requests that update javascript code client server shared labels Jun 15, 2026
@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

  • Reviewed pull request diff only (3f14d374f5c67e14b70014c14d582c2c27273fee...cd175c127b14bc0386b38e0f36e910d5d3111753, exact PR three-dot diff)
  • Model: gpt-5.5

💡 Click "edited" above to see previous reviews for this PR.


Review Summary

Overall Risk: HIGH

Findings

[HIGH] Channel list/response types can expose stored notification secrets

  • Category: gRPC
  • Location: proto/notifications/v1/notifications.proto:89
  • Description: ListChannelsResponse returns full Channel objects, and Channel embeds WebhookConfig, SmtpConfig, and SlackConfig. Those config messages contain secret-bearing fields such as bearer_header, SMTP password, and Slack webhook_url. The presence of has_secret suggests redaction is intended, but the response schema still provides normal fields for the raw secrets.
  • Impact: A caller that can list or receive channels could exfiltrate Slack webhook URLs, SMTP passwords, and bearer tokens if handlers populate these fields. These values may also end up in frontend state or logs.
  • Recommendation: Split request/write config from response/read models. Return a redacted ChannelSummary that omits secret values and includes only safe metadata plus has_secret. Do not reuse secret-bearing config messages in list/create/update responses.

[HIGH] Test/channel config accepts arbitrary outbound targets without validation

  • Category: gRPC
  • Location: proto/notifications/v1/notifications.proto:120
  • Description: TestChannelRequest can carry arbitrary webhook, Slack webhook, or SMTP host/port data. The target fields have no URI, scheme, host, port, length, or private-network restrictions.
  • Impact: A straightforward implementation of TestChannel becomes an SSRF primitive from the server: probing internal services, loopback, link-local/cloud metadata endpoints, or sending attacker-controlled requests with configured headers.
  • Recommendation: Add strict validation and server-side canonicalization before any outbound request: require HTTPS for webhooks/Slack, reject userinfo and redirects to disallowed networks, block loopback/private/link-local/multicast targets, constrain SMTP ports/hosts, set short timeouts, and rate-limit tests.

[MEDIUM] Channel create/update schema allows invalid or conflicting channel configurations

  • Category: gRPC
  • Location: proto/notifications/v1/notifications.proto:92
  • Description: CreateChannelRequest and UpdateChannelRequest accept kind, webhook, smtp, and slack as independent fields. kind permits CHANNEL_KIND_UNSPECIFIED, and clients can send no config or multiple configs at once.
  • Impact: Handlers must guess which config is authoritative, creating room for dropped secrets, wrong validation, inconsistent persistence, or bypasses where a request’s kind does not match the supplied secret config.
  • Recommendation: Use a required oneof for channel config, or add Buf validate/CEL rules enforcing exactly one config and matching kind. Add enum.defined_only and not_in: [0] for kind.

[MEDIUM] Maintenance windows can be created without required scope/time invariants

  • Category: Reliability
  • Location: proto/notifications/v1/notifications.proto:208
  • Description: CreateMaintenanceWindowRequest and UpdateMaintenanceWindowRequest do not require scope, starts_at, or ends_at, and there is no validation that ends_at is after starts_at. MaintenanceWindowScope also allows an unspecified kind and multiple/missing target fields.
  • Impact: Invalid windows can accidentally silence alerts too broadly or indefinitely, depending on handler defaults. Missing message fields can also lead to handler panics if dereferenced without checks.
  • Recommendation: Mark scope and timestamps required, reject unspecified scope kinds, enforce exactly one target field for the selected scope, cap device_ids, and add a CEL rule requiring ends_at > starts_at plus any maximum-duration policy.

[MEDIUM] Notification history page size is unbounded

  • Category: Reliability
  • Location: proto/notifications/v1/notifications.proto:253
  • Description: ListNotificationsRequest.page_size only validates gte = 0; there is no upper bound.
  • Impact: An authenticated caller can request very large notification pages, forcing large DB reads and large protobuf/JSON responses. Notification history is naturally unbounded over time, so this can become a memory/CPU pressure point.
  • Recommendation: Add an lte cap consistent with other list APIs, and have the handler clamp 0 to a safe default.

Notes

No hand-written server handlers, SQL, frontend views, plugin code, Docker files, or infrastructure changes are present in this diff. I did not find cryptostealing/pool-hijack logic in the changed files. The main risk is that the new protobuf contract bakes in unsafe defaults before the RPC implementation lands.


Generated by Codex Security Review |
Triggered by: @illegalprime |
Review workflow run

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c64a5c2fd8

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread proto/notifications/v1/notifications.proto Outdated
@illegalprime illegalprime force-pushed the eden/notifications-1-proto branch from c64a5c2 to d032a2d Compare June 15, 2026 16:02

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d032a2dfba

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread proto/notifications/v1/notifications.proto
@illegalprime

Copy link
Copy Markdown
Contributor Author

Both Codex findings on this PR are resolved downstream in #455 (server layer), not here — they read as open only because Codex reviews this PR's proto diff in isolation:

No change needed on this PR.

@illegalprime

Copy link
Copy Markdown
Contributor Author

codex-connector triage:

  • [P1] Scope rule pauses per organizationfalse positive. PauseRule (service.go) does not flip Grafana's global isPaused; it "mutes a rule via a marker pause-silence" — buildPauseSilence(orgID, ruleID, …) is org-scoped, and pauseSilencedRules derives enabled per-org. So a pause by org A does not affect org B. No change.
  • [P2] Explicit secret-preservation semantics — handled in notifications (4/5): Grafana-backed service, handlers, and server wiring #455's service impl: requestHasNewSecret + carrySecretSettings carry the stored secret forward only when the destination is unchanged. The shared message shape is intentional (reads return redacted/empty secrets). No proto change.

@illegalprime illegalprime force-pushed the eden/notifications-1-proto branch from d032a2d to 7a2ceb2 Compare June 15, 2026 18:35

@flesher flesher left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Where did we land on for the naming of "Silences" / how is it being reflected in the UI? I thought the general consensus was maintenance windows but forget where it ended up... Regardless we should try to align the api to whatever the UI calls them

@illegalprime

Copy link
Copy Markdown
Contributor Author

Great point. Thanks!

Adds the notifications.proto service definitions (Channel, Rule, Silence,
History) and the generated Go (Connect handlers + messages) and TypeScript
clients. Generated code only; no behavior is wired up yet.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@illegalprime illegalprime force-pushed the eden/notifications-1-proto branch from 7a2ceb2 to cd175c1 Compare June 15, 2026 19:24

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd175c127b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread proto/notifications/v1/notifications.proto
@illegalprime illegalprime merged commit 591590e into main Jun 16, 2026
76 checks passed
@illegalprime illegalprime deleted the eden/notifications-1-proto branch June 16, 2026 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

client javascript Pull requests that update javascript code server shared

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants