diff --git a/hyperfleet/README.md b/hyperfleet/README.md index 9f7c2f1e..6a0d278c 100644 --- a/hyperfleet/README.md +++ b/hyperfleet/README.md @@ -145,7 +145,7 @@ Kubernetes resources (Jobs, Secrets, ConfigMaps, Services) created by adapters t - Creates Kubernetes resources if conditions met - Reports status → PUT /clusters/{id}/statuses 6. API aggregates adapter statuses → Updates cluster status -7. Cycle repeats until cluster reaches Ready phase +7. Cycle repeats until the cluster's `Reconciled` condition is `True` ``` --- diff --git a/hyperfleet/components/sentinel/sentinel.md b/hyperfleet/components/sentinel/sentinel.md index 781203f6..d1035f21 100644 --- a/hyperfleet/components/sentinel/sentinel.md +++ b/hyperfleet/components/sentinel/sentinel.md @@ -70,7 +70,7 @@ The Sentinel solves these problems by: - **Closing the reconciliation loop**: Continuously polls resources and publishes events to trigger adapter evaluation - **Uses adapter status updates**: Reads `status.conditions[].last_updated_time` and condition statuses (updated by adapters on every check) to determine when to create next event -- **Fully configurable decision logic**: Named CEL params and a boolean result expression define the complete decision logic (e.g., different age thresholds for ready vs not-ready resources) -- **Relies on API-computed Ready condition**: The API aggregates adapter statuses into a `Ready` condition — when `Ready != True` (including after spec changes that increment `generation`), the Sentinel's default rules trigger reconciliation +- **Fully configurable decision logic**: Named CEL params and a boolean result expression define the complete decision logic (e.g., different age thresholds for reconciled vs not-reconciled resources) +- **Relies on API-computed Reconciled condition**: The API aggregates adapter statuses into a `Reconciled` condition — when `Reconciled != True` (including after spec changes that increment `generation`), the Sentinel's default rules trigger reconciliation - **Self-healing**: Automatically retries without manual intervention - **Horizontal scalability**: Resource filtering allows multiple Sentinels to handle different resource subsets - **Event-driven
architecture**: Maintains decoupling by publishing CloudEvents to message broker @@ -251,7 +251,7 @@ The service uses a fully configurable decision logic based on the `message_decis **Publish Event IF**: - Evaluate all `message_decision.params` in dependency order (each param is a CEL expression or duration literal) -- Params can reference other params (e.g., `is_ready` can be used in `ready_and_stale`) +- Params can reference other params (e.g., `is_reconciled` can be used in `reconciled_and_stale`) - Evaluate `message_decision.result` boolean expression (standard CEL logical operators) - If result is `true` → publish event @@ -260,11 +260,11 @@ The service uses a fully configurable decision logic based on the `message_decis **Key Insight — Why No Hardcoded Generation Check**: -The API already aggregates adapter statuses into the `Ready` condition. When a user changes the resource spec (incrementing `generation`), the API sets `Ready` to `False` because not all adapters have reconciled the new generation yet. This means `Ready == False` already covers the generation mismatch case — there is no need for the Sentinel to duplicate this logic with a separate generation check. +The API already aggregates adapter statuses into the `Reconciled` condition. When a user changes the resource spec (incrementing `generation`), the API sets `Reconciled` to `False` because not all adapters have reconciled the new generation yet. This means `Reconciled == False` already covers the generation mismatch case — there is no need for the Sentinel to duplicate this logic with a separate generation check. 
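The aggregation rule can be illustrated with a minimal Python sketch. It assumes the rule stated in the glossary: `Reconciled` is `True` only when every required adapter has reported `Available: True` at the resource's current generation. `AdapterStatus`, `compute_reconciled`, and the adapter names are hypothetical illustrations, not the HyperFleet API's actual code.

```python
from dataclasses import dataclass


@dataclass
class AdapterStatus:
    # Hypothetical shape of one adapter's status report.
    adapter: str
    available: bool           # adapter-level Available condition is "True"
    observed_generation: int  # generation the adapter last reconciled


def compute_reconciled(generation: int, required: list[str],
                       statuses: list[AdapterStatus]) -> bool:
    """Sketch of the aggregated Reconciled condition (assumed rule, see lead-in)."""
    by_adapter = {s.adapter: s for s in statuses}
    for name in required:
        status = by_adapter.get(name)
        if status is None:
            return False  # required adapter has not reported yet
        if not status.available:
            return False  # adapter reports Available: False
        if status.observed_generation != generation:
            return False  # adapter has not caught up with the current spec
    return True


# A spec change bumps generation 1 -> 2 while one adapter is still at 1,
# so Reconciled becomes False without any generation check in the Sentinel.
statuses = [AdapterStatus("validation", True, 2), AdapterStatus("dns", True, 1)]
print(compute_reconciled(2, ["validation", "dns"], statuses))  # False
```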
This simplifies the Sentinel to a single unified rule engine: -- The `Ready` condition is the canonical signal for "this resource needs reconciliation" -- The `Available` condition is informational but not used for decision-making in the default configuration +- The `Reconciled` condition is the canonical signal for "this resource needs reconciliation" +- The `LastKnownReconciled` condition is informational but not used for decision-making in the default configuration - Operators can write custom rules using any condition type if their use case requires it ### Message Decision @@ -282,13 +282,13 @@ The Sentinel uses a `message_decision` configuration with named **params** and a | Param Name | Type | Expression | Purpose | |------------|------|------------|---------| -| `ref_time` | CEL → string | `condition("Ready").last_updated_time` | Reference timestamp for age calculation | -| `is_ready` | CEL → bool | `condition("Ready").status == "True"` | Whether resource is ready | -| `is_new_resource` | CEL → bool | `!is_ready && resource.generation == 1` | Brand-new resource that needs immediate reconciliation | -| `ready_and_stale` | CEL → bool | `is_ready && now - timestamp(ref_time) > duration("30m")` | Ready resource whose last check is stale | -| `not_ready_and_debounced` | CEL → bool | `!is_ready && now - timestamp(ref_time) > duration("10s")` | Not-ready resource, debounce period elapsed | +| `ref_time` | CEL → string | `condition("Reconciled").last_updated_time` | Reference timestamp for age calculation | +| `is_reconciled` | CEL → bool | `condition("Reconciled").status == "True"` | Whether resource is reconciled | +| `is_new_resource` | CEL → bool | `!is_reconciled && resource.generation == 1` | Brand-new resource that needs immediate reconciliation | +| `reconciled_and_stale` | CEL → bool | `is_reconciled && now - timestamp(ref_time) > duration("30m")` | Reconciled resource whose last check is stale | +| `not_reconciled_and_debounced` | CEL → bool | `!is_reconciled && now - timestamp(ref_time) > duration("10s")` | Not-reconciled resource, debounce period elapsed | -**Result**: `is_new_resource || ready_and_stale || not_ready_and_debounced` +**Result**: `is_new_resource || reconciled_and_stale || not_reconciled_and_debounced` **Why debounce?** @@ -305,7 +306,7 @@ By introducing a debounce interval (10s default), the Sentinel limits the messag - The `now` variable (current timestamp) is available in all expressions - The `result` is the **sole decision maker** — all time-based checks, condition evaluations, and reconciliation triggers are encoded in params (no hardcoded checks) - The `result` expression uses standard CEL logical operators (`&&`, `||`). No aliases or custom operator syntax — pure CEL. -- A single custom helper function `condition(name)` provides access to resource status data (see reference below). Fields are accessed directly (e.g., `condition("Ready").status`), keeping the API surface minimal. +- A single custom helper function `condition(name)` provides access to resource status data (see reference below). Fields are accessed directly (e.g., `condition("Reconciled").status`), keeping the API surface minimal.
- This aligns with the adapter framework's preconditions pattern (CEL-based evaluation) #### Custom CEL Function Reference @@ -324,14 +325,14 @@ Returns the full condition object matching the given `type` name, with fields: **Examples**: ```cel -condition("Ready").status == "True" # check if resource is ready -condition("Ready").last_updated_time # get timestamp for age calculation -condition("Available").observed_generation # get last reconciled generation +condition("Reconciled").status == "True" # check if resource is reconciled +condition("Reconciled").last_updated_time # get timestamp for age calculation +condition("LastKnownReconciled").observed_generation # get last reconciled generation ``` **Notes**: - Searches `resource.status.conditions[]` by the `type` field -- Works with **any** condition type present on the resource (e.g., `"Ready"`, `"Available"`, `"Applied"`, `"Health"`, or custom conditions) +- Works with **any** condition type present on the resource (e.g., `"Reconciled"`, `"LastKnownReconciled"`, `"Applied"`, `"Health"`, or custom conditions) - When accessing `.last_updated_time` on a missing condition, the zero time value will cause `timestamp()` conversion to produce a very old timestamp, which naturally triggers age-exceeded checks — acting as a fail-safe that ensures new or unknown resources get reconciled (see Test 7) **Configuration** (via YAML files): @@ -347,12 +348,12 @@ poll_interval: 5s # Message decision - configurable decision logic message_decision: params: - ref_time: 'condition("Ready").last_updated_time' - is_ready: 'condition("Ready").status == "True"' - is_new_resource: '!is_ready && resource.generation == 1' - ready_and_stale: 'is_ready && now - timestamp(ref_time) > duration("30m")' - not_ready_and_debounced: '!is_ready && now - timestamp(ref_time) > duration("10s")' - result: 'is_new_resource || ready_and_stale || not_ready_and_debounced' + ref_time: 'condition("Reconciled").last_updated_time' + is_reconciled: 'condition("Reconciled").status == "True"' + is_new_resource: '!is_reconciled && resource.generation == 1'
+ reconciled_and_stale: 'is_reconciled && now - timestamp(ref_time) > duration("30m")' + not_reconciled_and_debounced: '!is_reconciled && now - timestamp(ref_time) > duration("10s")' + result: 'is_new_resource || reconciled_and_stale || not_reconciled_and_debounced' # Resource selector - only process resources matching these labels resource_selector: @@ -469,7 +470,7 @@ Integration tests MUST verify that: **Status Tracking**: -The Sentinel reads the resource's status conditions to evaluate the message decision rules. The default configuration relies on the `Ready` condition, but custom rules can reference any condition: +The Sentinel reads the resource's status conditions to evaluate the message decision rules. The default configuration relies on the `Reconciled` condition, but custom rules can reference any condition: ```json { @@ -485,7 +486,7 @@ The Sentinel reads the resource's status conditions to evaluate the message deci "last_transition_time": "2025-10-21T10:00:00Z" }, { - "type": "Ready", + "type": "Reconciled", "status": "False", "observed_generation": 1, "last_updated_time": "2025-10-21T12:00:00Z", @@ -500,22 +501,22 @@ The Sentinel reads the resource's status conditions to evaluate the message deci - **`generation`**: User's desired state version. Increments when the resource spec changes (e.g., user scales nodes from 3 to 5). This is the "what the user wants" field. -- **`condition.observed_generation`**: Which generation was last reconciled by a given adapter. The API uses this to compute the aggregated `Ready` condition — when any adapter's `observed_generation` is behind `resource.generation`, the API sets `Ready` to `False`. +- **`condition.observed_generation`**: Which generation was last reconciled by a given adapter. The API uses this to compute the aggregated `Reconciled` condition — when any adapter's `observed_generation` is behind `resource.generation`, the API sets `Reconciled` to `False`. 
-- **`condition.last_transition_time`**: Updates ONLY when the condition status changes (e.g., Ready False → True) +- **`condition.last_transition_time`**: Updates ONLY when the condition status changes (e.g., Reconciled False → True) - **`condition.last_updated_time`**: Updates EVERY time an adapter checks the resource, regardless of whether status changed **How generation changes flow through the system:** -When a user changes the cluster spec (e.g., scales nodes), `generation` increments (1 → 2). The API detects that not all adapters have reconciled this generation and sets `Ready` to `False`. The Sentinel's default rules see `Ready != True` and trigger reconciliation — no separate generation check is needed in the Sentinel. +When a user changes the cluster spec (e.g., scales nodes), `generation` increments (1 → 2). The API detects that not all adapters have reconciled this generation and sets `Reconciled` to `False`. The Sentinel's default rules see `Reconciled != True` and trigger reconciliation — no separate generation check is needed in the Sentinel. **Why this matters for age calculation in message decision:** If a cluster stays in "Provisioning" state for 2 hours, `last_transition_time` would remain at the time it entered "Provisioning" (e.g., 10:00), even though adapters check it at 11:00, 11:30, 12:00. Using `last_transition_time` for age calculation would incorrectly trigger events too frequently. Using `last_updated_time` ensures age is calculated from the last adapter check, not the last status change. 
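The default decision rules, including the use of `last_updated_time` for age, can be sketched in Python to show how the params combine into the result. This is a hedged illustration, not the Sentinel's CEL evaluator: `Condition`, `Resource`, and `should_publish` are hypothetical names, the thresholds mirror the default 30m/10s configuration, and a missing condition returns a zero value whose timestamp is treated as very old, matching the fail-safe described for Test 7.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Stand-in for the zero time value: a missing condition looks "very old".
ZERO_TIME = datetime.min.replace(tzinfo=timezone.utc)


@dataclass
class Condition:
    type: str = ""
    status: str = ""
    observed_generation: int = 0
    last_updated_time: datetime = ZERO_TIME


@dataclass
class Resource:
    generation: int
    conditions: list[Condition] = field(default_factory=list)


def condition(resource: Resource, name: str) -> Condition:
    """Mimics condition(name): match by type, else return a zero-value Condition."""
    for c in resource.conditions:
        if c.type == name:
            return c
    return Condition()


def should_publish(resource: Resource, now: datetime,
                   stale_after: timedelta = timedelta(minutes=30),
                   debounce: timedelta = timedelta(seconds=10)) -> bool:
    # Params, evaluated in dependency order.
    ref_time = condition(resource, "Reconciled").last_updated_time
    is_reconciled = condition(resource, "Reconciled").status == "True"
    is_new_resource = not is_reconciled and resource.generation == 1
    # Age is measured from the last adapter check, not the last status transition.
    age = now - ref_time
    reconciled_and_stale = is_reconciled and age > stale_after
    not_reconciled_and_debounced = not is_reconciled and age > debounce
    # Result: the sole decision maker.
    return is_new_resource or reconciled_and_stale or not_reconciled_and_debounced
```

For example, with `now` fixed, a generation-2 resource whose `Reconciled` condition is `False` and was last updated 15 seconds ago yields `True` (Test 2), while the same resource last updated 5 seconds ago yields `False` (Test 3).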
**For complete details on generation and observed_generation semantics, see:** -- [HyperFleet Status Guide](../../docs/status-guide.md) - Complete documentation of the status contract, including how adapters report `observed_generation` and how the API aggregates it into the `Ready` condition +- [HyperFleet Status Guide](../../docs/status-guide.md) - Complete documentation of the status contract, including how adapters report `observed_generation` and how the API aggregates it into the `Reconciled` condition ### Resource Filtering Architecture @@ -550,12 +551,12 @@ resource_type: clusters poll_interval: 5s message_decision: params: - ref_time: 'condition("Ready").last_updated_time' - is_ready: 'condition("Ready").status == "True"' - is_new_resource: '!is_ready && resource.generation == 1' - ready_and_stale: 'is_ready && now - timestamp(ref_time) > duration("30m")' - not_ready_and_debounced: '!is_ready && now - timestamp(ref_time) > duration("10s")' - result: 'is_new_resource || ready_and_stale || not_ready_and_debounced' + ref_time: 'condition("Reconciled").last_updated_time' + is_reconciled: 'condition("Reconciled").status == "True"' + is_new_resource: '!is_reconciled && resource.generation == 1' + reconciled_and_stale: 'is_reconciled && now - timestamp(ref_time) > duration("30m")' + not_reconciled_and_debounced: '!is_reconciled && now - timestamp(ref_time) > duration("10s")' + result: 'is_new_resource || reconciled_and_stale || not_reconciled_and_debounced' resource_selector: - label: region value: us-east @@ -576,12 +577,12 @@ resource_type: clusters poll_interval: 5s message_decision: params: - ref_time: 'condition("Ready").last_updated_time' - is_ready: 'condition("Ready").status == "True"' - is_new_resource: '!is_ready && resource.generation == 1' - ready_and_stale: 'is_ready && now - timestamp(ref_time) > duration("1h")' # Different! - not_ready_and_debounced: '!is_ready && now - timestamp(ref_time) > duration("15s")' # Different! 
- result: 'is_new_resource || ready_and_stale || not_ready_and_debounced' + ref_time: 'condition("Reconciled").last_updated_time' + is_reconciled: 'condition("Reconciled").status == "True"' + is_new_resource: '!is_reconciled && resource.generation == 1' + reconciled_and_stale: 'is_reconciled && now - timestamp(ref_time) > duration("1h")' # Different! + not_reconciled_and_debounced: '!is_reconciled && now - timestamp(ref_time) > duration("15s")' # Different! + result: 'is_new_resource || reconciled_and_stale || not_reconciled_and_debounced' resource_selector: - label: region value: us-west @@ -602,12 +603,12 @@ resource_type: nodepools poll_interval: 5s message_decision: params: - ref_time: 'condition("Ready").last_updated_time' - is_ready: 'condition("Ready").status == "True"' - is_new_resource: '!is_ready && resource.generation == 1' - ready_and_stale: 'is_ready && now - timestamp(ref_time) > duration("10m")' - not_ready_and_debounced: '!is_ready && now - timestamp(ref_time) > duration("5s")' - result: 'is_new_resource || ready_and_stale || not_ready_and_debounced' + ref_time: 'condition("Reconciled").last_updated_time' + is_reconciled: 'condition("Reconciled").status == "True"' + is_new_resource: '!is_reconciled && resource.generation == 1' + reconciled_and_stale: 'is_reconciled && now - timestamp(ref_time) > duration("10m")' + not_reconciled_and_debounced: '!is_reconciled && now - timestamp(ref_time) > duration("5s")' + result: 'is_new_resource || reconciled_and_stale || not_reconciled_and_debounced' hyperfleet_api: endpoint: http://hyperfleet-api.hyperfleet-system.svc.cluster.local:8080 @@ -867,114 +868,114 @@ The following test scenarios ensure the Decision Engine correctly implements the ### Message Decision Tests -**Test 1: Ready resource with recent check → skip** +**Test 1: Reconciled resource with recent check → skip** ``` Given: - - Resource Ready condition status: True + - Resource Reconciled condition status: True - resource.generation = 2 - - 
condition("Ready").last_updated_time = now() - 5m (age < 30m) + - condition("Reconciled").last_updated_time = now() - 5m (age < 30m) Then: - Decision: SKIP - Reason: "message decision not matched" - - Params evaluated: ref_time, is_ready=true, is_new_resource=false, - ready_and_stale=false, not_ready_and_debounced=false + - Params evaluated: ref_time, is_reconciled=true, is_new_resource=false, + reconciled_and_stale=false, not_reconciled_and_debounced=false - Result: false || false || false = false ``` -**Test 2: Not-Ready resource with debounce elapsed → publish** +**Test 2: Not-reconciled resource with debounce elapsed → publish** ``` Given: - - Resource Ready condition status: False + - Resource Reconciled condition status: False - resource.generation = 2 - - condition("Ready").last_updated_time = now() - 15s (age > 10s) + - condition("Reconciled").last_updated_time = now() - 15s (age > 10s) Then: - Decision: PUBLISH - Reason: "message decision matched" - - Params evaluated: ref_time, is_ready=false, is_new_resource=false, - ready_and_stale=false, not_ready_and_debounced=true + - Params evaluated: ref_time, is_reconciled=false, is_new_resource=false, + reconciled_and_stale=false, not_reconciled_and_debounced=true - Result: false || false || true = true ``` -**Test 3: Not-Ready resource within debounce period → skip** +**Test 3: Not-reconciled resource within debounce period → skip** ``` Given: - - Resource Ready condition status: False + - Resource Reconciled condition status: False - resource.generation = 2 - - condition("Ready").last_updated_time = now() - 5s (age < 10s) + - condition("Reconciled").last_updated_time = now() - 5s (age < 10s) Then: - Decision: SKIP - Reason: "message decision not matched" - - Params evaluated: ref_time, is_ready=false, is_new_resource=false, - ready_and_stale=false, not_ready_and_debounced=false + - Params evaluated: ref_time, is_reconciled=false, is_new_resource=false, + reconciled_and_stale=false, 
not_reconciled_and_debounced=false - Result: false || false || false = false ``` -**Test 4: Ready resource with stale check → publish (periodic health check)** +**Test 4: Reconciled resource with stale check → publish (periodic health check)** ``` Given: - - Resource Ready condition status: True + - Resource Reconciled condition status: True - resource.generation = 2 - - condition("Ready").last_updated_time = now() - 31m (age > 30m) + - condition("Reconciled").last_updated_time = now() - 31m (age > 30m) Then: - Decision: PUBLISH - Reason: "message decision matched" - - Params evaluated: ref_time, is_ready=true, is_new_resource=false, - ready_and_stale=true, not_ready_and_debounced=false + - Params evaluated: ref_time, is_reconciled=true, is_new_resource=false, + reconciled_and_stale=true, not_reconciled_and_debounced=false - Result: false || true || false = true ``` -**Test 5: Brand-new resource (generation 1, not ready) → publish immediately** +**Test 5: Brand-new resource (generation 1, not reconciled) → publish immediately** ``` Given: - - Resource Ready condition status: False + - Resource Reconciled condition status: False - resource.generation = 1 - - condition("Ready").last_updated_time = now() - 2s (within debounce period) + - condition("Reconciled").last_updated_time = now() - 2s (within debounce period) Then: - Decision: PUBLISH - Reason: "message decision matched" - - Params evaluated: ref_time, is_ready=false, is_new_resource=true, - ready_and_stale=false, not_ready_and_debounced=false + - Params evaluated: ref_time, is_reconciled=false, is_new_resource=true, + reconciled_and_stale=false, not_reconciled_and_debounced=false - Result: true || false || false = true Note: - Brand-new resources bypass the debounce because no adapter has processed them yet — there is no "previous work" to wait for.
``` -**Test 6: Not-Ready resource due to generation mismatch → publish via debounce** +**Test 6: Not-reconciled resource due to generation mismatch → publish via debounce** ``` Given: - resource.generation = 2 (user changed spec) - - API has set Ready condition status: False (because adapters haven't reconciled generation 2) - - condition("Ready").last_updated_time = now() - 15s (debounce elapsed) + - API has set Reconciled condition status: False (because adapters haven't reconciled generation 2) + - condition("Reconciled").last_updated_time = now() - 15s (debounce elapsed) Then: - Decision: PUBLISH - Reason: "message decision matched" - - Params evaluated: ref_time, is_ready=false, is_new_resource=false, - ready_and_stale=false, not_ready_and_debounced=true + - Params evaluated: ref_time, is_reconciled=false, is_new_resource=false, + reconciled_and_stale=false, not_reconciled_and_debounced=true - Result: false || false || true = true Note: - - The generation mismatch is handled implicitly: the API sets Ready=False when + - The generation mismatch is handled implicitly: the API sets Reconciled=False when any adapter's observed_generation is behind resource.generation. - The Sentinel's default rules pick this up via the not_ready_and_debounced path. + The Sentinel's default rules pick this up via the not_reconciled_and_debounced path. 
``` ### Edge Cases -**Test 7: Missing Ready condition on resource (zero-value fail-safe)** +**Test 7: Missing Reconciled condition on resource (zero-value fail-safe)** ``` Given: - - Resource has no Ready condition + - Resource has no Reconciled condition - resource.generation = 1 Then: - - condition("Ready") returns zero-value Condition - - is_ready = false (zero-value .status == "" != "True") - - is_new_resource = true (generation == 1 && !is_ready) + - condition("Reconciled") returns zero-value Condition + - is_reconciled = false (zero-value .status == "" != "True") + - is_new_resource = true (generation == 1 && !is_reconciled) - Decision: PUBLISH - Reason: "message decision matched" Note: - Brand-new resources with no conditions are caught by is_new_resource. Even without is_new_resource, the zero-value ref_time would produce - a very old timestamp, making not_ready_and_debounced true as well. + a very old timestamp, making not_reconciled_and_debounced true as well. ``` **Test 8: CEL expression compilation failure at startup** @@ -995,22 +996,22 @@ Then: - Clear error message indicating the circular dependency ``` -**Test 10: Brand-new resource with no Ready condition and generation > 1** +**Test 10: Brand-new resource with no Reconciled condition and generation > 1** ``` Given: - - Resource has no Ready condition (no adapter has reported yet) + - Resource has no Reconciled condition (no adapter has reported yet) - resource.generation = 2 (created with a spec update before any adapter ran) Then: - - condition("Ready") returns zero-value Condition - - is_ready = false (.status == "" != "True") + - condition("Reconciled") returns zero-value Condition + - is_reconciled = false (.status == "" != "True") - is_new_resource = false (generation != 1) - ref_time = zero time → age is effectively infinite - - not_ready_and_debounced = true + - not_reconciled_and_debounced = true - Decision: PUBLISH - Reason: "message decision matched" Note: - Even without is_new_resource, 
the zero-value ref_time produces a very old - timestamp, so not_ready_and_debounced catches it as a fail-safe. + timestamp, so not_reconciled_and_debounced catches it as a fail-safe. ``` **Test 11: message_decision omitted from configuration** @@ -1026,7 +1027,7 @@ Note: decision logic rather than relying on hidden defaults. ``` -**Test 12: Params reference non-Ready condition types** +**Test 12: Params reference non-Reconciled condition types** ``` Given: - Configuration uses custom condition types: @@ -1042,7 +1043,7 @@ Then: - Params evaluated: ref_time=Applied.last_updated_time, is_applied=true, age_exceeded=true Note: - The condition() function works with ANY condition type present in - resource.status.conditions[], not just "Ready". + resource.status.conditions[], not just "Reconciled". - If the referenced condition type does not exist on the resource, condition() returns a zero-value Condition, which naturally triggers age-exceeded checks (zero timestamp = very old age). @@ -1068,11 +1069,11 @@ Integration tests should verify the complete Sentinel workflow: 1. **Event Publishing**: Sentinel successfully publishes CloudEvents to the message broker when message decision result is true -2. **Not-Ready triggers reconciliation**: When a resource's Ready condition is False (including after spec changes that increment generation), Sentinel publishes an event based on message decision rules +2. **Not-reconciled triggers reconciliation**: When a resource's Reconciled condition is False (including after spec changes that increment generation), Sentinel publishes an event based on message decision rules 3. **Message decision evaluation**: Sentinel evaluates message_decision params and result to determine whether to publish -4. **Adapter feedback loop**: Adapters receive events, process resources, and update conditions correctly, which the API aggregates into the `Ready` condition for Sentinel to read in subsequent polls +4. 
**Adapter feedback loop**: Adapters receive events, process resources, and update conditions correctly, which the API aggregates into the `Reconciled` condition for Sentinel to read in subsequent polls --- diff --git a/hyperfleet/docs/glossary.md b/hyperfleet/docs/glossary.md index 697fec1b..7d8e056b 100644 --- a/hyperfleet/docs/glossary.md +++ b/hyperfleet/docs/glossary.md @@ -1,7 +1,7 @@ --- Status: Active Owner: HyperFleet Architecture Team -Last Updated: 2026-05-25 +Last Updated: 2026-06-10 --- # HyperFleet Glossary @@ -31,6 +31,7 @@ Definitions for HyperFleet-specific terms, concepts, and abbreviations used acro | **Fan-out** | The messaging pattern used in HyperFleet where a single reconciliation event published to one topic is independently delivered to multiple adapter subscriptions simultaneously. Each adapter receives its own copy of the event via a dedicated subscription. | Broker | +| **Finalized (Condition)** | An adapter condition introduced in v1.0.0 for the deletion lifecycle. Adapters report `Finalized: True` after completing cleanup of their managed resources during deletion. The API's hard-delete mechanism checks that all required adapters have reported `Finalized: True` before permanently removing a resource from the database. | Adapter Framework, API | | **GCP Pub/Sub** | Google Cloud Pub/Sub — the primary Message Broker implementation used in GCP-hosted HyperFleet deployments. Configured with `broker.type: googlepubsub` in `broker.yaml`. | Broker | | **Generation** | An integer field on HyperFleet API resources that increments each time the resource's spec changes. Adapters include `observed_generation` in their status reports to indicate which version of the spec they reconciled. Sentinel uses generation changes to detect new desired state requiring reconciliation. | API, Sentinel, Adapter Framework | | **Health (Condition)** | One of the three standard adapter status conditions.
`Health: True` means the adapter is operating normally (no unexpected errors). `Health: False` indicates an infrastructure-level problem (e.g., can't connect to cloud API) as distinct from a business logic failure (e.g., validation failed, which sets `Available: False` but leaves `Health: True`). See also: Available, Applied | Adapter Framework, Sentinel | | **HyperFleet** | The Red Hat platform for managing the lifecycle of HyperShift-based OpenShift clusters at scale. HyperFleet provides APIs, orchestration (Sentinel), event-driven provisioning (Adapters), and observability for multi-cloud cluster provisioning. | All | | **HyperFleet API** | The REST API service providing CRUD operations for HyperFleet resources (clusters, node pools) and their statuses. The API is intentionally simple — no business logic, no event creation. It is the data layer for the system. See: [Architecture Summary](../README.md) | API | @@ -38,6 +39,7 @@ Definitions for HyperFleet-specific terms, concepts, and abbreviations used acro | **HyperShift** | The Red Hat project for running hosted OpenShift control planes on Kubernetes. HyperFleet manages the provisioning and lifecycle of HyperShift-based clusters. | Adapter Framework, API | | **Idempotent** | A property of Adapter operations: processing the same event multiple times produces the same result as processing it once. Required because the Message Broker provides at-least-once delivery (events may be delivered more than once). All HyperFleet Adapters must be idempotent. | Adapter Framework, Broker | | **Landing Zone Adapter** | An adapter that performs preparatory provisioning work before other adapters run — creating namespaces, secrets, and ConfigMaps that subsequent adapters depend on. | Adapter Framework | +| **LastKnownReconciled (Condition)** | A resource-level condition introduced in v1.0.0 that replaces the former `Available` aggregated condition. 
Represents the last-known reconciled state of the resource, computed from required adapter statuses. Unlike `Reconciled`, this condition preserves the last successful reconciliation state even when adapters are processing a new generation. | API |
 | **Maestro** | The work orchestration service that HyperFleet Adapters integrate with to manage distributed provisioning work items. Adapters use the Maestro SDK (not CLI) to submit and track work. See: [Maestro Integration](../components/adapter/maestro-integration/maestro-architecture-introduction.md) | Adapter Framework |
 | **Message Broker** | The pub/sub infrastructure component that decouples Sentinel from Adapters. Sentinel publishes CloudEvents to a topic; the broker delivers each event to every adapter subscription independently (fan-out). Supported implementations: GCP Pub/Sub (`googlepubsub`) and RabbitMQ (`rabbitmq`). See: [Broker Design](../components/broker/broker.md) | Broker, Sentinel, Adapter Framework |
 | **message_decision** | The Sentinel configuration block that defines when to publish a reconciliation event. Contains `params` (named CEL expressions) and a `result` (boolean CEL expression combining the params). Sentinel evaluates this logic for every resource on every poll cycle. | Sentinel |
@@ -47,7 +49,8 @@ Definitions for HyperFleet-specific terms, concepts, and abbreviations used acro
 | **Precondition** | A condition an Adapter checks before deciding to act on a reconciliation event. Preconditions verify that dependencies are met (e.g., Validation adapter has completed before DNS adapter runs) and that the current resource state requires the adapter's action. | Adapter Framework |
 | **Pulse** | A proposed extension to the HyperFleet status model that introduces periodic heartbeat status updates from adapters. Pulses disambiguate between "new generation not yet reconciled" and "system error" in the `status.phase` field. See: [Sentinel Pulses](sentinel-pulses.md) | Sentinel, Adapter Framework |
 | **RabbitMQ** | An AMQP-based self-hosted message broker used in on-premise HyperFleet deployments. Configured with `broker.type: rabbitmq` in `broker.yaml`. | Broker |
-| **Ready (Condition / Phase)** | The aggregated cluster-level status derived from all adapter conditions. A cluster reaches `phase: Ready` when all registered adapters report `Available: True`. Used by Sentinel's default decision logic to determine when a cluster is fully provisioned. | Sentinel, API, Adapter Framework |
+| **Ready (Condition / Phase)** | **Removed in v1.0.0.** The aggregated cluster-level status formerly derived from all adapter conditions. Replaced by the `Reconciled` and `LastKnownReconciled` conditions. See [v1.0.0 breaking changes](release/v1.0.0/breaking-changes.md). | Sentinel, API, Adapter Framework |
+| **Reconciled (Condition)** | The resource-level condition that replaces `Ready` in v1.0.0. Set to `True` when all required adapters have reported `Available: True` at the current resource generation. Used by Sentinel's default CEL decision logic (`condition("Reconciled")`) to determine when a resource is fully reconciled. | API, Sentinel, Adapter Framework |
 | **Reconciliation** | The process of comparing desired state (resource spec) with actual state (cloud provider resources) and taking action to close the gap. In HyperFleet, reconciliation is triggered by Sentinel publishing a CloudEvent, which causes all relevant Adapters to evaluate whether they need to act. | Sentinel, Adapter Framework |
 | **Sentinel** | The HyperFleet reconciliation service that continuously polls the API for resources, evaluates configurable CEL-based decision logic, and publishes CloudEvents to the Message Broker to trigger adapter processing. Multiple Sentinel instances can be deployed with different resource selectors for horizontal scalability. See: [Sentinel Design](../components/sentinel/sentinel.md) | Sentinel |
 | **Shard / Sharding** | The strategy of deploying multiple Sentinel instances, each watching a different subset of resources (e.g., by region label). Enables horizontal scaling of the reconciliation loop without coordination between instances. Note: this is label-based filtering, not true sharding — operators must ensure full resource coverage. | Sentinel |
diff --git a/hyperfleet/docs/release/v1.0.0/breaking-changes.md b/hyperfleet/docs/release/v1.0.0/breaking-changes.md
new file mode 100644
index 00000000..f549269a
--- /dev/null
+++ b/hyperfleet/docs/release/v1.0.0/breaking-changes.md
@@ -0,0 +1,61 @@
+---
+Status: Active
+Owner: HyperFleet Team
+Last Updated: 2026-06-10
+---
+
+# v0.2.0 to v1.0.0 Breaking-Change and Reconfiguration Checklist
+
+> **Audience:** Internal partner teams (GCP, ROSA) upgrading from HyperFleet experimental (v0.2.0) to v1.0.0, and the HyperFleet release team.
+
+## Overview
+
+This document is the deliverable for [HYPERFLEET-1177](https://redhat.atlassian.net/browse/HYPERFLEET-1177). It lists every breaking change and required reconfiguration step between v0.2.0 and v1.0.0 so partner teams can reconfigure and redeploy with no surprises.
+
+## Breaking Changes
+
+### API Contract
+
+| # | Change | Partner Action | Impact if Missed | Classification | Ticket | Parent |
+|---|--------|---------------|-----------------|----------------|--------|--------|
+| 1 | **Status report: POST to PUT** for clusters and nodepools | Change the HTTP method in all adapter post-actions from POST to PUT | **405 Method Not Allowed.** PUT-only routes confirmed in `plugins/clusters/plugin.go`; no POST route exists | Automatable (1178) | [HYPERFLEET-978](https://redhat.atlassian.net/browse/HYPERFLEET-978) | - |
+| 2 | **Ready condition removed; replaced by Reconciled** | Replace all `Ready` references with `Reconciled` in status queries, scripts, and monitoring | **Silent hang.** The API accepts `type="Ready"` in status reports without error but never aggregates it into resource conditions; scripts polling for `Ready=True` wait forever | Automatable (1178) | [HYPERFLEET-1052](https://redhat.atlassian.net/browse/HYPERFLEET-1052) | [HYPERFLEET-559](https://redhat.atlassian.net/browse/HYPERFLEET-559) |
+| 3 | **Aggregated condition Available renamed to LastKnownReconciled** | Replace `Available` with `LastKnownReconciled` in all resource-level condition queries | **Not found.** Aggregation code produces only `Reconciled` and `LastKnownReconciled`; resource-level `Available` is no longer emitted | Automatable (1178) | [HYPERFLEET-1017](https://redhat.atlassian.net/browse/HYPERFLEET-1017) | - |
+| 4 | **List responses: kind field removed** | Remove `kind` expectations from list response parsing | **Parse error** if the client schema requires `kind`. Confirmed: `kind` removed from `ClusterList`, `NodePoolList`, `ResourceList`, `AdapterStatusList`; test asserts `raw.NotTo(HaveKey("kind"))` | Automatable (1178) | [HYPERFLEET-1143](https://redhat.atlassian.net/browse/HYPERFLEET-1143) | - |
+
+### Configuration
+
+| # | Change | Partner Action | Impact if Missed | Classification | Ticket | Parent |
+|---|--------|---------------|-----------------|----------------|--------|--------|
+| 5 | **Sentinel: messaging_system field removed and config parser now strict** | Remove `messaging_system` from all Sentinel configs and remove the `MESSAGING_SYSTEM` env var. Also remove any other unrecognized fields. | **Sentinel fails to start.** v0.2.0 used permissive `v.Unmarshal()`; v1.0.0 uses strict `v.UnmarshalExact()`, which rejects unknown fields, so any unrecognized field in the config causes a startup failure | Automatable (1178) | Sentinel CHANGELOG | - |
+| 6 | **Sentinel and Adapter CEL: Ready to Reconciled** | Update ALL `message_decision` CEL expressions in Sentinel configs AND all precondition/capture CEL expressions in Adapter configs | **CRITICAL SILENT FAILURE.** `condition("Ready")` returns a zero-value struct (empty status, generation 0) via a fallback in `decision.go`; CEL evaluates to false; Sentinel never publishes events; adapters never fire; clusters stay stuck in pending. No error is logged | Automatable (1178) | [HYPERFLEET-857](https://redhat.atlassian.net/browse/HYPERFLEET-857) | [HYPERFLEET-559](https://redhat.atlassian.net/browse/HYPERFLEET-559) |
+| 7 | **Sentinel Helm: config.hyperfleetApi moved to config.clients.hyperfleetApi** | Move API client config from `config.hyperfleetApi.baseUrl` to `config.clients.hyperfleetApi.baseUrl` in Sentinel Helm values | **Sentinel fails to start.** The Helm template references `.Values.config.clients.hyperfleetApi.baseUrl`; the old path renders empty and validation fails with "clients.hyperfleet_api.base_url required" | Automatable (1178) | [HYPERFLEET-549](https://redhat.atlassian.net/browse/HYPERFLEET-549), [HYPERFLEET-866](https://redhat.atlassian.net/browse/HYPERFLEET-866) | - |
+| 8 | **Sentinel Helm: messageDecision params changed from map to list** | Restructure `messageDecision.params` from map format (`key: 'expr'`) to list format (`- name: key, expr: 'expr'`) | **Helm template fails to render.** The template iterates over list entries and reads their `.name` and `.expr` fields; the old map format has no such fields. The list format also fixes non-deterministic param ordering | Automatable (1178) | [HYPERFLEET-1011](https://redhat.atlassian.net/browse/HYPERFLEET-1011) | - |
+| 9 | **API Helm: jwt.enabled default changed from true to false** | Explicitly set `config.server.jwt.enabled: true` in Helm values if JWT authentication is required | **API accepts unauthenticated requests.** The v0.2.0 Helm chart defaulted to `jwt.enabled: true`; v1.0.0 defaults to `false`. Requests without a valid JWT, which were previously rejected, are now accepted | Doc-only (1163/1179) | Helm chart values.yaml diff | - |
+| 10 | **JWT identity_claim required when JWT enabled** | Add `server.jwt.identity_claim`, set to the name of the JWT claim that carries the caller identity | **API fails to start** if JWT is enabled without `identity_claim`. Config validation: `if c.IdentityClaim == "" { return fmt.Errorf("server.jwt.identity_claim is required") }`. If the claim name is set but the claim is missing from the token, mutating requests (POST/PUT/PATCH/DELETE) fail with an auth error; GET requests proceed without an identity | Doc-only (1163/1179) | [HYPERFLEET-1134](https://redhat.atlassian.net/browse/HYPERFLEET-1134) | [HYPERFLEET-824](https://redhat.atlassian.net/browse/HYPERFLEET-824) |
+| 11 | **Default log format changed to JSON (Sentinel and Adapter)** | Update log parsing pipelines that expect text-format output from Sentinel or Adapter | **Log parsing breaks** for Sentinel and Adapter output. `DefaultConfig()` changed from `FormatText` to `FormatJSON` in both components (PRs #103 and #109). The API already logged JSON in v0.2.0 and is unchanged | Doc-only (1163/1179) | [HYPERFLEET-908](https://redhat.atlassian.net/browse/HYPERFLEET-908) | - |
+
+### Auth
+
+| # | Change | Partner Action | Impact if Missed | Classification | Ticket | Parent |
+|---|--------|---------------|-----------------|----------------|--------|--------|
+| 12 | **OCM SDK removed; standalone JWT handler** | Remove OCM-specific auth config (`server.acl`, `server.authz`); switch to JWT config (`issuer_url`, `audience`, `identity_claim`) | **Old auth config silently has no effect.** The entire `pkg/client/ocm/` package and `pkg/config/ocm.go` are removed, and old `server.acl` and `server.authz` Helm values are ignored. Combined with #9 (`jwt.enabled` defaults to `false`), the API may start with no authentication unless JWT is explicitly configured | Doc-only (1163/1179) | [HYPERFLEET-492](https://redhat.atlassian.net/browse/HYPERFLEET-492) | - |
+
+### Helm Charts
+
+| # | Change | Partner Action | Impact if Missed | Classification | Ticket | Parent |
+|---|--------|---------------|-----------------|----------------|--------|--------|
+| 13 | **PgBouncer sidecar replaced by generic sidecars** | Rewrite the PgBouncer config as a generic sidecar entry in the `sidecars` list | **No DB proxy sidecar.** The entire `database.pgbouncer.*` Helm values tree is removed (not deprecated); old values are silently ignored and the pod starts without a proxy | Doc-only (1163/1179) | [HYPERFLEET-937](https://redhat.atlassian.net/browse/HYPERFLEET-937) | - |
+| 14 | **Sentinel config mount paths changed** | Update any custom scripts, init containers, or sidecar configs referencing `/etc/sentinel/config.yaml` or `/etc/sentinel/broker.yaml` to `/etc/hyperfleet/config.yaml` and `/etc/hyperfleet/broker.yaml` | **Config not found.** Custom scripts or containers referencing the old paths fail | Doc-only (1163/1179) | [HYPERFLEET-549](https://redhat.atlassian.net/browse/HYPERFLEET-549) | - |
+| 15 | **Sentinel BROKER_TOPIC env var removed** | Remove `BROKER_TOPIC` env var overrides; use `broker.topic` in Helm values instead | **Topic override ignored.** The deployment template no longer injects the env var; a topic override set via `BROKER_TOPIC` is silently ignored and the Helm value is used instead | Doc-only (1163/1179) | [HYPERFLEET-549](https://redhat.atlassian.net/browse/HYPERFLEET-549) | - |
+| 16 | **Adapter Helm: broker.type now explicitly required; RabbitMQ fields validated** | Add an explicit `broker.type: rabbitmq` or `broker.type: googlepubsub` to adapter Helm values. For RabbitMQ, `url`, `queue`, `exchange`, and `routingKey` are now required | **Helm rendering fails.** The `brokerType` function in `_helpers.tpl` now uses `required` instead of inferring the type, so a missing `broker.type` fails at render time. Additionally, the `validateBrokerConfig` template requires `url`, `queue`, `exchange`, and `routingKey` for RabbitMQ | Automatable (1178) | Adapter Helm chart `_helpers.tpl` | - |
+
+### Database
+
+| # | Change | Partner Action | Impact if Missed | Classification | Ticket | Parent |
+|---|--------|---------------|-----------------|----------------|--------|--------|
+| 17 | **Fresh database required** | Deploy a completely fresh database. Do not reuse the v0.2.0 database | **Policy, not a technical limitation.** Migrations can technically run on the v0.2.0 schema (`RENAME COLUMN` works), but the project explicitly states "fresh DB, no migration." Reusing the old database risks table locks during column renames and relies on untested migration paths | Doc-only (1163/1179) | Epic description | [HYPERFLEET-1176](https://redhat.atlassian.net/browse/HYPERFLEET-1176) |
+
+## Open Actions
+
+1. **Track [HYPERFLEET-1117](https://redhat.atlassian.net/browse/HYPERFLEET-1117):** "API: skip Reconciled and LastKnownReconciled conditions when no required adapters are configured" is still in New status. If it merges before v1.0.0, add it to this checklist.