Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
166 changes: 166 additions & 0 deletions hyperfleet/docs/release-contract.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,166 @@
---
Status: Draft
Owner: HyperFleet Team
Last Updated: 2026-06-02
---

# HyperFleet Release Contract

## What & Why

### What

This document defines the formal release contract between the HyperFleet team and its partner teams (GCP Offering Team and ROSA Regional Platform Team). It covers:

- **Release handoff contract**: what artifacts are produced, how partners are notified, and what SLAs apply
- **Integration testing strategy**: how each team gates HyperFleet changes and how tests are coordinated across teams
- **Test ownership map**: which team owns which layer of testing to eliminate redundancy and close coverage gaps

### Why

- Without a defined contract, release handoffs are ad-hoc and require manual coordination between teams, slowing delivery and increasing error risk
- Unclear test ownership creates either coverage gaps (bugs reaching production) or overlapping test suites (longer pipelines without benefit)
- Partner teams deploying HyperFleet via Argo CD and Terraform need predictable, machine-consumable artifacts (OCI Helm charts) to automate rollout
- A shared testing strategy is required before building confidence in continuous delivery pipelines end-to-end

### Out of scope

- No automated deployment of HyperFleet releases to partner integration environments

---

## Partner Teams

| Team | Platform | Deployment Method |
|------|----------|-------------------|
| GCP Offering Team | GCP | Argo CD + Terraform + Tekton Pipelines |
| ROSA Regional Platform | AWS ROSA | Argo CD + Terraform + AWS CodePipelines |

---

## Release Handoff Contract

### Release Artifacts

For each HyperFleet release, the following artifacts are produced and made available to partner teams:

| Artifact | Location | Format | Notes |
|----------|----------|--------|-------|
| Container images | `quay.io/redhat-services-prod/hyperfleet-tenant/hyperfleet/hyperfleet-{component}:{version}` | OCI image | Built automatically by Konflux on release tag push (PaC tag-triggered pipeline); includes signed provenance (SLSA L3) and SBOM |
| Helm charts | `quay.io/redhat-services-prod/hyperfleet-tenant/hyperfleet-{component}-chart:{version}` (see [Helm Chart Distribution](#helm-chart-distribution)) | OCI artifact | Published via Konflux `build-helm-chart-oci-ta` task; required for ROSA/Argo CD consumption |
| Release notes | `hyperfleet-release` repo, `releases/release-X.Y/` | Markdown | Compatibility matrix, breaking changes, upgrade guide |
Comment thread
rh-amarin marked this conversation as resolved.
| Compatibility matrix | `hyperfleet-release` repo | Markdown table | Maps validated component version combinations |
| Git tags | Per-component repos + `hyperfleet-release` | `vX.Y.Z` / `release-X.Y` | See [Release Process](hyperfleet-release-process.md) |

@mliptak0 mliptak0 Jun 2, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to navigate to this link but it's missing

See [Release Process](hyperfleet-release-process.md) |



When a GA release is published, it will have detail of which ROSA/GCP versions have passed the integration tests to use as compatibility matrix. This allows to potentially introduce a breaking change in one release, that may be only deployable by another pillar.
Comment thread
ciaranRoche marked this conversation as resolved.

### Helm Chart Distribution

HyperFleet team publishes charts as OCI artifacts to `quay.io/redhat-services-prod/hyperfleet-tenant/` via Konflux (using the `build-helm-chart-oci-ta` task), eliminating the temporary workaround and the Git coupling. Blocked on the Konflux team providing a managed release pipeline for external registries (RELEASE-2363).


### Notification SLA

When a HyperFleet GA release is published:

| Event | Channel | Timeline | Recipients |
|-------|---------|----------|------------|
| Release candidate available | `#hyperfleet-releases` Slack | RC cut day | GCP team, ROSA team |
| GA release published | `#hyperfleet-releases` Slack | GA day | GCP team, ROSA team |
| Breaking change in next release | `#hyperfleet-releases` Slack | ≥ 1 sprint before GA | GCP team, ROSA team |
Comment thread
ciaranRoche marked this conversation as resolved.
| Hotfix / patch release | `#hyperfleet-releases` Slack | Within 2 hours of GA tag | GCP team, ROSA team |


At this point in time (Jun 26) breaking changes are not blockers to HyperFleet releases as ROSA/GCP teams do not have to keep long running clusters and migrate data.

### Rollback / Recovery

HyperFleet uses a **roll-forward** strategy for MVP: issues are fixed via patch releases rather than rollback. See [Release Process — Release Recovery Strategy](hyperfleet-release-process.md#55-release-recovery-strategy).
Comment thread
ciaranRoche marked this conversation as resolved.

HyperFleet commits to (exact times TBD):

- Producing a patch release within **48 hours** for Blocker/Critical regressions
- Producing a patch release within **1 week** for Major regressions
Comment thread
rh-amarin marked this conversation as resolved.
- Maintaining N-1 backward compatibility so partner teams can remain pinned to the previous validated release while a fix is in flight

---

## Integration Testing Strategy

### Decision: Nightly Runs with OCI Chart Injection

**Agreed approach** (as of March 31, 2026 meeting):

- Start with **nightly runs** against HyperFleet `main` branch, not presubmit jobs
- Test against the **latest known-good stable version** of the ROSA regional platform (production Maestro version), replacing only the HyperFleet component under test
- The ROSA team will **temporarily enable OCI chart pushing** so the HyperFleet team can inject PR-built charts into the ROSA deployment pipeline
- Evaluate **non-blocking presubmit** integration with the HyperFleet release repository as a follow-up

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to add similar prow jobs—like the E2E testing Prow job—as a standard requirement and include them in the release X.Y checklist?


**Rationale**: Running full ROSA environment provisioning (~40 minutes + E2E duration) as a presubmit would significantly impact development velocity without proportional benefit at the current team scale. Nightly runs provide meaningful feedback without blocking day-to-day development.

**Note on ROSA's existing pre-merge capability**: The ROSA repo already has a working cross-component E2E pre-merge mechanism (triggered via Prow comment on PRs). The decision to start with nightly runs is about HyperFleet's readiness to onboard to that mechanism — not a limitation of the ROSA infrastructure. Per-PR testing remains the target once the OCI chart injection step is stable.


### Team Test Ownership

| Layer | Owner | Scope | Runs on |
|-------|-------|-------|---------|
| Unit tests | HyperFleet | Each component in isolation | Every PR (presubmit) |
| Integration tests | HyperFleet | Cross-component API contracts | Every PR (presubmit) |
| HyperFleet E2E | HyperFleet | HyperFleet stack end-to-end | Nightly (main branch) |
| ROSA integration | ROSA Team | Full ROSA region + HyperFleet override | Nightly (HyperFleet main) |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who is responsible to update the configuration of breaking changes for nightly integration testing? And how to notify?

| GCP integration | GCP Team | GCP deployment + HyperFleet | Nightly (HyperFleet main) via Tekton `gcp-region-e2e-pipeline` |
| Release gate | HyperFleet | All of the above must pass | Before GA tag |

Comment thread
coderabbitai[bot] marked this conversation as resolved.
**Important**: HyperFleet releases ship with a compatibility matrix for the ROSA/GCP versions tested. If newer ROSA/GCP integration tests fail, that doesn't gate the HyperFleet release as long as it passes the successful ROSA/GCP compatible versions.

Comment on lines +114 to +117

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make GA gate criteria internally operable and unambiguous.

Line 114 says “all of the above must pass,” but Line 116 introduces an exception for failed newer ROSA/GCP versions. This creates two gate definitions in the same contract. Define one canonical rule in the table row itself (e.g., “all mandatory compatible-version layers must pass”).

🧰 Tools
🪛 LanguageTool

[grammar] ~116-~116: Use a hyphen to join words.
Context: ...ong as it passes the successful ROSA/GCP compatible versions. ### Testing Gaps I...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hyperfleet/docs/release-contract.md` around lines 114 - 117, The "Release
gate" table row currently says "All of the above must pass" but is contradicted
by the ROSA/GCP exception; update the table row text to a single, unambiguous
canonical rule (for example: "All mandatory compatible-version layers must
pass") and move the ROSA/GCP exception out of the gate cell into a separate
clarifying note under the table (or rephrase it as "Compatibility matrix: newer
untested ROSA/GCP versions do not block release if tested compatible versions
pass") so the gate definition and the exception do not conflict; edit the table
row entry (the Release gate / HyperFleet cell) and adjust the following
paragraph accordingly to ensure only one authoritative gate rule exists.

### Testing Gaps Identified

| Gap | Owning Team | Mitigation |
|-----|-------------|------------|
| HyperFleet not yet onboarded to ROSA's pre-merge E2E mechanism | HyperFleet | Onboard to `openshift/release` Prow config + create `quay.io/rrp-dev-ci/` image repos (see onboarding steps above) |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the onboarding steps referencing here?

| Helm chart override (OCI) not yet wired into ROSA CI | ROSA + HyperFleet | Temporary OCI setup by ROSA team (Q2 2026, immediate action); replaced by Konflux `build-helm-chart-oci-ta` pipeline — blocked on Konflux managed release pipeline for external registries (RELEASE-2363); see [Helm OCI Distribution Design](../release/helm-oci-distribution-design.md) |
| GCP integration tests not yet publishing chart overrides into HyperFleet CI | GCP + HyperFleet | Blocked on GCP-334 (CLM/Argo CD integration in progress); nightly Tekton pipeline already exists but consumes pinned tags, not PR-built charts |
| Multi-component PR testing (API + Adapter in same PR) | HyperFleet | Nightly tests use `main` for all other components; single-component override per nightly run is the starting point |
| Presubmit integration gate for HyperFleet release repo | HyperFleet | Future action: non-blocking presubmit on `hyperfleet-release` repo |


---

## Alternatives Considered

### 1. Non-blocking Presubmit on HyperFleet Release Repository

Run the full ROSA integration pipeline as an optional, non-blocking presubmit job triggered on the `hyperfleet-release` repo.

**Rejected for now**: A ~40-minute+ non-blocking job provides weak signal — developers may ignore it, especially if failures are infrequent. Starting with nightly runs builds confidence in the pipeline before promoting it to presubmit. This remains a **future action**.

### 2. Consumer-Driven Contract Testing (Pact-style)

Define formal API contracts using a consumer-driven contract testing tool (e.g., Pact). ROSA and GCP publish their expectations; HyperFleet CI verifies them on every PR.

**Rejected for MVP**: The integration surface between HyperFleet and partner teams is primarily at the Helm chart / deployment configuration level, not a REST API contract boundary. Consumer-driven contract testing tools are better suited to service-to-service REST contracts. Helm value schema validation is a lighter-weight alternative to investigate post-MVP.

### 3. Automated Rollout to Integration Environments on GA

Trigger automatic deployment of each HyperFleet GA release to ROSA and GCP integration environments via webhooks.

**Rejected for MVP**: ROSA's pipeline takes ~40 minutes per run and requires environment-specific configuration overrides. Automating this safely requires tooling (OCI charts via Konflux, pipeline webhooks) not yet in place. Deferred to post-Q2 2026.

Comment on lines +131 to +150

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add an explicit “Trade-offs” section for this architectural decision doc.

You have “Alternatives Considered,” but there is no explicit “Trade-offs” section describing gains/losses/risks of the chosen approach. Please add one near the decision content.
As per coding guidelines, “when adding/altering architectural decision content, ensure the document includes required context: ‘Trade-offs’ and ‘Alternatives Considered’ (trade-offs and alternatives go hand-in-hand).”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hyperfleet/docs/release-contract.md` around lines 135 - 154, The document
lacks an explicit "Trade-offs" section; add a new "## Trade-offs" heading near
the decision content (for example immediately after the "## Alternatives
Considered" block) and list the specific gains, losses, and risks of the chosen
approach (e.g., nightly runs vs presubmit, skipping consumer-driven contract
testing, deferred automated rollout), mapping each trade-off to the related
alternative to satisfy the guideline that both "Alternatives Considered" and
"Trade-offs" are present; ensure the section uses the same markdown heading
style and is concise but covers operational risk, developer experience,
confidence level, and future work implications.

---

## Related Documents

- [HyperFleet Release Process](../release/hyperfleet-release-process.md) — release cadence, branching, artifacts

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these paths work? I think it should just be release/ instead of ../

- [Konflux Release Pipeline Design](../release/konflux-release-pipeline-design.md) — build and release pipeline architecture, CEL tag matching, version injection
- [Helm OCI Distribution Design](../release/helm-oci-distribution-design.md) — how Helm charts are published as OCI artifacts via Konflux
- [ADR 0014 — Konflux for Container Image Build and Release](../../adrs/0014-konflux-build-and-release.md)
- [ADR 0016 — Helm OCI Distribution via Konflux](../../adrs/0016-helm-oci-distribution.md)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, think we are jumping back an extra hop to get to adrs

- [Versioning Trade-offs](versioning-trade-offs.md) — SDK versioning, rollback considerations
- [E2E Testing Framework Spike Report](e2e-testing/e2e-testing-framework-spike-report.md)
- [E2E Run Strategy Spike Report](e2e-testing/e2e-run-strategy-spike-report.md)
- [ROSA — Adding a Component for Pre-merge E2E Testing](https://github.com/openshift-online/rosa-regional-platform/blob/main/docs/adding-component-pre-merge.md) — onboarding guide for the `/test rosa-regionality-compatibility-e2e` Prow trigger
- [ROSA — Testing Strategy Design](https://github.com/openshift-online/rosa-regional-platform/blob/main/docs/design/testing-strategy.md) — three CI workflows (pre-merge, nightly integration, nightly ephemeral)
- GCP-334 — CLM Components Deployment (linked Jira epic)
- HYPERFLEET-633 — Define release contract and integration testing strategy