From 5243d5330737f1a888de5748b7d966d2bcc08353 Mon Sep 17 00:00:00 2001 From: sherine-k Date: Wed, 10 Jun 2026 14:16:50 +0200 Subject: [PATCH 1/5] HYPERFLEET-1186 - docs: align documentation with codebase Update README.md, api-resources.md, database.md, and deployment.md to match the current state of the codebase. - Update Go version from 1.24 to 1.25 and PostgreSQL from 13 to 14 - Add generic resource types (WifConfigs, Channels, Versions) and plugins/ to project structure - Remove Ready condition from all JSON response examples (only Reconciled and LastKnownReconciled remain) - Fix adapter_statuses polymorphic columns: owner_type/owner_id to resource_type/resource_id - Fix soft delete column: deleted_at to deleted_time with deleted_by audit field - Replace labels table documentation with resources table documentation - Add Spec Validation section with opt-in qualifier linking to deployment.md - Add Statuses Endpoint vs. Resource Endpoint section - Restructure deployment.md into two chapters: Kubernetes/Helm and Local Execution - Fix pageSize default from 100 to 20 to match TypeSpec contract - Change migration tool reference from GORM to gormigrate - Update license to Apache License 2.0 Co-Authored-By: Claude Opus 4.6 --- README.md | 47 +++-- docs/api-resources.md | 61 +++---- docs/database.md | 27 +-- docs/deployment.md | 398 ++++++++++++++++++++++-------------------- 4 files changed, 284 insertions(+), 249 deletions(-) diff --git a/README.md b/README.md index b683cdea..0d6b5a29 100755 --- a/README.md +++ b/README.md @@ -6,7 +6,7 @@ HyperFleet API - Simple REST API for cluster lifecycle management. Provides CRUD ### Technology Stack -- **Language**: Go 1.24+ +- **Language**: Go 1.25+ - **API Definition**: OpenAPI 3.0 - **Code Generation**: oapi-codegen - **Database**: PostgreSQL with GORM ORM @@ -18,10 +18,12 @@ HyperFleet API - Simple REST API for cluster lifecycle management. 
Provides CRUD * OpenAPI 3.0 specification * Automated Go code generation from OpenAPI * Cluster and NodePool lifecycle management +* Generic resource types (WifConfigs, Channels, Versions) via plugin-based registration * Adapter-based status reporting with Kubernetes-style conditions +* Runtime spec validation against custom OpenAPI schemas * Pagination and search capabilities * Complete integration test coverage -* Database migrations with GORM +* Database migrations with gormigrate * Embedded OpenAPI specification using `//go:embed` ### Project Structure @@ -30,11 +32,18 @@ HyperFleet API - Simple REST API for cluster lifecycle management. Provides CRUD hyperfleet-api/ ├── cmd/hyperfleet-api/ # Application entry point ├── pkg/ -│ ├── api/ # API models and handlers +│ ├── api/ # API models and type definitions │ ├── dao/ # Data access layer -│ ├── db/ # Database setup and migrations +│ ├── db/ # Database setup, migrations, and session management │ ├── handlers/ # HTTP request handlers -│ └── services/ # Business logic +│ └── services/ # Service layer (status aggregation, CRUD) +├── plugins/ # Plugin-based resource registration +│ ├── clusters/ # Cluster resource plugin +│ ├── nodePools/ # NodePool resource plugin +│ ├── wifconfigs/ # WifConfig resource plugin +│ ├── channels/ # Channel resource plugin +│ ├── versions/ # Version resource plugin (child of Channel) +│ └── generic/ # Generic resource framework ├── openapi/ # Generated artifacts from hyperfleet-api-spec module ├── test/ # Integration tests and factories ├── docs/ # Detailed documentation @@ -45,7 +54,7 @@ hyperfleet-api/ ### Prerequisites -- **Go 1.24+**, **Podman**, **PostgreSQL 13+**, **Make** +- **Go 1.25+**, **Podman**, **PostgreSQL 14+**, **Make** See [PREREQUISITES.md](PREREQUISITES.md) for installation instructions. @@ -61,18 +70,26 @@ go mod download # 3. Build binary make build -# 4. Setup database +# 4. Setup database (local PostgreSQL container) make db/setup -# 5. Run migrations +# 5. 
Copy config file +cp configs/config.yaml.example configs/config.yaml + +# 6. Run migrations ./bin/hyperfleet-api migrate -# 6. Start service (no auth) +# 7. Start service (no auth) make run-no-auth ``` **Note**: Generated code is not tracked in git. You must run `make generate-all` after cloning. +The `migrate` and `serve` commands require a configuration file. The loader checks the `--config` flag, then the `HYPERFLEET_CONFIG` environment variable, then `/etc/hyperfleet/config.yaml`, then `./configs/config.yaml`. +If none are found, the command fails with `failed to load configuration`. Copy the example config or point to your own. + +For production database setup (external PostgreSQL, Cloud SQL, etc.), see [docs/deployment.md](docs/deployment.md#production-deployment). + ### Accessing the API The service starts on `localhost:8000`: @@ -110,7 +127,15 @@ Groups of compute nodes within clusters. - `GET /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}` - `GET/PUT /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses` -Both resources support pagination, label-based search, and adapter status reporting. See [docs/api-resources.md](docs/api-resources.md) for complete API documentation. +### Generic Resources + +The API also supports generic resource types registered via the plugin system. Currently available: + +- **WifConfigs**: `GET/POST /api/hyperfleet/v1/wifconfigs`, `GET/PATCH/DELETE .../wifconfigs/{id}` +- **Channels**: `GET/POST /api/hyperfleet/v1/channels`, `GET/PATCH/DELETE .../channels/{id}` +- **Versions**: `GET/POST /api/hyperfleet/v1/channels/{parent_id}/versions`, `GET/PATCH/DELETE .../versions/{id}` (child of Channel) + +All resources support pagination, label-based search, and spec validation. Clusters and NodePools additionally support adapter status reporting. See [docs/api-resources.md](docs/api-resources.md) for complete API documentation. 
## Example Usage @@ -183,4 +208,4 @@ This project uses [pre-commit](https://pre-commit.io/) for code quality checks. ## License -[License information to be added] +This project is licensed under the Apache License 2.0. See [LICENSE](LICENSE) for details. diff --git a/docs/api-resources.md b/docs/api-resources.md index 37f2d1b4..14fc9b65 100644 --- a/docs/api-resources.md +++ b/docs/api-resources.md @@ -72,16 +72,6 @@ PUT /api/hyperfleet/v1/clusters/{cluster_id}/statuses "last_updated_time": "2025-01-01T00:00:00Z", "last_transition_time": "2025-01-01T00:00:00Z" }, - { - "type": "Ready", - "status": "False", - "reason": "ReconciledMissingAdapters", - "message": "Required adapters have not yet reported status", - "observed_generation": 1, - "created_time": "2025-01-01T00:00:00Z", - "last_updated_time": "2025-01-01T00:00:00Z", - "last_transition_time": "2025-01-01T00:00:00Z" - } ] } } @@ -89,7 +79,7 @@ PUT /api/hyperfleet/v1/clusters/{cluster_id}/statuses -**Note**: Status initially has `Reconciled=False`, `LastKnownReconciled=False`, and `Ready=False` conditions until adapters report status. +**Note**: Status initially has `Reconciled=False` and `LastKnownReconciled=False` conditions until adapters report status. 
### Get Cluster @@ -137,16 +127,6 @@ PUT /api/hyperfleet/v1/clusters/{cluster_id}/statuses "last_updated_time": "2025-01-01T00:00:00Z", "last_transition_time": "2025-01-01T00:00:00Z" }, - { - "type": "Ready", - "status": "True", - "reason": "ReconciledAll", - "message": "All required adapters reported Available=True or Finalized=True at the current generation", - "observed_generation": 1, - "created_time": "2025-01-01T00:00:00Z", - "last_updated_time": "2025-01-01T00:00:00Z", - "last_transition_time": "2025-01-01T00:00:00Z" - } ] } } @@ -343,16 +323,6 @@ PUT /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses "last_updated_time": "2025-01-01T00:00:00Z", "last_transition_time": "2025-01-01T00:00:00Z" }, - { - "type": "Ready", - "status": "False", - "reason": "ReconciledMissingAdapters", - "message": "Required adapters have not yet reported status", - "observed_generation": 1, - "created_time": "2025-01-01T00:00:00Z", - "last_updated_time": "2025-01-01T00:00:00Z", - "last_transition_time": "2025-01-01T00:00:00Z" - } ] } } @@ -404,13 +374,6 @@ PUT /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses "message": "All required adapters report Available=True for the tracked generation", "observed_generation": 1 }, - { - "type": "Ready", - "status": "True", - "reason": "ReconciledAll", - "message": "All required adapters reported Available=True or Finalized=True at the current generation", - "observed_generation": 1 - } ] } } @@ -437,7 +400,7 @@ GET /api/hyperfleet/v1/clusters?page=1&pageSize=10 **Parameters:** - `page` - Page number (default: 1) -- `pageSize` - Items per page (default: 100) +- `pageSize` - Items per page (default: 20) **Response:** @@ -485,7 +448,6 @@ The status object contains synthesized conditions computed from adapter reports: - `conditions` - Array of resource conditions, including: - **Reconciled** - Whether all adapters have reconciled at the current spec generation - **LastKnownReconciled** - Whether 
resource is running at any known good configuration - - **Ready** *(deprecated — alias of Reconciled)* - Same semantics as Reconciled; prefer `Reconciled` for new integrations - Additional conditions from adapters (with `observed_generation`, timestamps) ### Condition Fields @@ -502,7 +464,7 @@ The status object contains synthesized conditions computed from adapter reports: - All above fields plus: - `observed_generation` - Generation this condition reflects - `created_time` - When condition was first created (API-managed) -- `last_updated_time` - API-managed. For per-adapter conditions, taken from `AdapterStatus.last_report_time`. For aggregated conditions (`Reconciled`, `LastKnownReconciled`, `Ready`), computed as the oldest valid adapter report time within the relevant generation bucket — not the latest report time +- `last_updated_time` - API-managed. For per-adapter conditions, taken from `AdapterStatus.last_report_time`. For aggregated conditions (`Reconciled`, `LastKnownReconciled`), computed as the oldest valid adapter report time within the relevant generation bucket — not the latest report time - `last_transition_time` - When status last changed (API-managed) ## Parameter Restrictions @@ -578,6 +540,23 @@ Same naming rules as cluster, but with a shorter maximum length. - **ResourceConditionStatus** (used in cluster/nodepool conditions): `True`, `False` - **OrderDirection**: `asc`, `desc` +## Spec Validation + +When an OpenAPI schema is configured (see [deployment.md](deployment.md#schema-validation) for setup), the API validates cluster and nodepool `spec` fields on every create and update request. If no schema is configured, all specs are accepted without validation. 
When a schema is configured: + +- `POST /clusters` and `POST /nodepools` validate `spec` against `ClusterSpec` or `NodePoolSpec` from the schema +- `PATCH /clusters/{id}` and `PATCH /nodepools/{id}` validate the merged result +- Invalid specs return a `400` with validation details in the error response + +The schema is configured via `--server-openapi-schema-path` or the `validationSchema` section in the Helm chart. See [Validation Schema](../openapi/README.md#validation-schema) for details. + +## Statuses Endpoint vs. Resource Endpoint + +- `GET /clusters/{id}` returns the cluster with **aggregated** status conditions (`Reconciled`, `LastKnownReconciled`, and per-adapter conditions synthesized from adapter reports). +- `GET /clusters/{id}/statuses` returns the **raw adapter status records**, one per adapter that has reported. These are the individual reports, not the aggregated view. + +The same distinction applies to nodepools. + ## Related Documentation - [Example Usage](../README.md#example-usage) - Practical examples diff --git a/docs/database.md b/docs/database.md index 9f40ed0c..eaf89185 100644 --- a/docs/database.md +++ b/docs/database.md @@ -9,20 +9,26 @@ HyperFleet API uses PostgreSQL with GORM ORM. The schema follows a simple relati ## Core Tables ### clusters -Primary resources for cluster management. Contains cluster metadata and JSONB spec field for provider-specific configuration. +Primary resources for cluster management. It contains: +* cluster metadata, +* a JSONB `spec` field for provider-specific configuration, +* a JSONB `labels` field for key-value categorization, +* a JSONB `status_conditions` field for synthesized status, +* `deleted_time` for soft delete, +* and `deleted_by` for audit. ### node_pools -Child resources owned by clusters, representing groups of compute nodes. Uses foreign key relationship with cascade delete. +Child resources owned by clusters, representing groups of compute nodes. 
References clusters via `owner_id` with a `RESTRICT` foreign key. Same column layout as clusters (including `labels`, `status_conditions`, `deleted_time`, `deleted_by`). ### adapter_statuses -Polymorphic status records for both clusters and node pools. Stores adapter-reported conditions in JSONB format. +Polymorphic status records for both clusters and node pools. Stores adapter-reported conditions in JSONB format. No soft delete — rows are hard-deleted or replaced. **Polymorphic pattern:** -- `owner_type` + `owner_id` allows one table to serve both clusters and node pools -- Enables efficient status lookups across resource types +- `resource_type` + `resource_id` allows one table to serve both clusters and node pools +- Unique constraint on `(resource_type, resource_id, adapter)` — one record per adapter per resource -### labels -Key-value pairs for resource categorization and search. Uses polymorphic association to support both clusters and node pools. +### resources +Generic resource table used by the plugin system for extensible resource types (WifConfigs, Channels, Versions, etc.). Stores `kind`, `name`, `spec` (JSONB), `labels` (JSONB), and optional owner references (`owner_id`, `owner_kind`, `owner_href`) for parent-child relationships. Uses `deleted_time`/`deleted_by` for soft delete. Unique name constraints are scoped by `kind` and `owner_id`. ## Schema Relationships @@ -32,8 +38,9 @@ clusters (1) ──→ (N) node_pools │ │ └────────┬───────────┘ │ - ├──→ adapter_statuses (polymorphic) - └──→ labels (polymorphic) + └──→ adapter_statuses (polymorphic via resource_type + resource_id) + +resources (standalone, self-referencing parent-child via owner_id) ``` ## Key Design Patterns @@ -52,7 +59,7 @@ Flexible schema storage for: ### Soft Delete -Resources use GORM's soft delete pattern with `deleted_at` timestamp. Soft-deleted records are excluded from queries by default. 
+Clusters, node pools, and generic resources use a custom soft delete pattern with a `deleted_time` timestamp and `deleted_by` audit field. Soft-deleted records are excluded from queries by default. Adapter statuses do not use soft delete. ### Migration System diff --git a/docs/deployment.md b/docs/deployment.md index 9d4d2fa5..71c75435 100644 --- a/docs/deployment.md +++ b/docs/deployment.md @@ -1,12 +1,19 @@ # Deployment Guide -This guide covers building container images and deploying HyperFleet API to Kubernetes using Helm. +This guide covers two deployment modes: -## Container Image +- **[Kubernetes Deployment (Helm)](#kubernetes-deployment-helm)** — deploying to a cluster via Helm chart (partners, staging, production) +- **[Local Execution](#local-execution)** — running the binary directly on your machine (HF engineers, development, debugging) -### Building Images +--- -Build and push container images: +## Kubernetes Deployment (Helm) + +Deploy HyperFleet API to a Kubernetes cluster using the included Helm chart. Typical use cases: partner deployments, staging, production, engineer validation on a cluster. + +### Container Image + +#### Building Images ```bash # Build container image with default tag @@ -22,11 +29,9 @@ make image-push QUAY_USER=myuser make image-dev ``` -### Image Registry Configuration +#### Image Registry Configuration -The `image.registry` value defaults to `CHANGE_ME` - a placeholder that intentionally prevents accidental deployments with an incorrect registry. You **must** set this to your actual container registry before deploying. - -#### Image Locations by Environment +The `image.registry` value in [`charts/values.yaml`](../charts/values.yaml) defaults to `CHANGE_ME` — a placeholder that intentionally prevents accidental deployments with an incorrect registry. You **must** set this to your actual container registry before deploying. 
| Environment | Image | |-------------|-------| @@ -34,53 +39,39 @@ The `image.registry` value defaults to `CHANGE_ME` - a placeholder that intentio | Staging | `quay.io/openshift-hyperfleet/hyperfleet-api:v` | | Production | `quay.io/openshift-hyperfleet/hyperfleet-api:v` | -#### Example values.yaml +Example `values.yaml` overrides: -Personal development image: ```yaml +# Production/Staging (official image) +image: + registry: quay.io + repository: openshift-hyperfleet/hyperfleet-api + tag: v1.2.3 + +# Personal development image image: registry: quay.io repository: user/hyperfleet-api tag: dev-abc1234 -``` -Production/Staging (official image): -```yaml -image: - registry: quay.io - repository: openshift-hyperfleet/hyperfleet-api - tag: v1.2.3 -``` -### Custom Registry +``` -To use a custom container registry: +#### Custom Registry ```bash -# Build with custom registry make image \ IMAGE_REGISTRY=your-registry.io/yourorg \ IMAGE_TAG=v1.0.0 -# Push to custom registry podman push your-registry.io/yourorg/hyperfleet-api:v1.0.0 ``` -## Configuration - -HyperFleet API is configured via environment variables and configuration files. - -### Configuration Methods +### Configuration in Kubernetes -**Kubernetes deployments (recommended):** -- Non-sensitive config: ConfigMap (automatically created by Helm Chart from `values.yaml`) -- Sensitive data: Secrets with `secretKeyRef` (Kubernetes best practice, automatic via Helm Chart) - -**Local development:** -- Configuration file: `./configs/config.yaml` or `--config` flag -- Environment variables: Direct values for quick testing - -**See [Configuration Guide](config.md) for complete reference and priority rules.** +The Helm chart manages configuration through: +- **ConfigMap** — generated from [`charts/values.yaml`](../charts/values.yaml) for non-sensitive settings +- **Secrets** — database credentials injected via `secretKeyRef`
Configuration Flow in Kubernetes (click to expand) @@ -141,15 +132,17 @@ HyperFleet API is configured via environment variables and configuration files.
-### Schema Validation - -The API validates cluster and nodepool `spec` fields against an OpenAPI schema. This allows different providers (GCP, AWS, Azure) to have different spec structures. +**Example: Setting required adapters:** +```bash +--set 'config.adapters.required.cluster={validation,dns,pullsecret,hypershift}' \ +--set 'config.adapters.required.nodepool={validation,hypershift}' +``` -The schema path is configured via `--server-openapi-schema-path` (or `HYPERFLEET_SERVER_OPENAPI_SCHEMA_PATH`). The default is `openapi/openapi.yaml`. The API **will fail to start** if the configured schema file is missing, unreadable, or invalid — this ensures misconfigured deployments are caught immediately rather than silently accepting invalid data. +See [Configuration Guide](config.md) for the complete reference, and [`charts/values.yaml`](../charts/values.yaml) for all Helm-specific settings. -#### Validation Schema via Helm +### Schema Validation via Helm -Partners can supply a custom OpenAPI schema using the Helm chart: +Partners can supply a custom OpenAPI schema for `spec` field validation: ```yaml validationSchema: @@ -186,58 +179,7 @@ validationSchema: existingConfigMap: my-validation-schema ``` -See [Configuration Guide](config.md) for all configuration options. 
- -### Configuration - -HyperFleet API configuration is managed through: -- **Helm Chart values** (`values.yaml`) for Kubernetes deployments -- **Configuration file** (`config.yaml`) for local development -- **Environment variables** for overrides - -**For Kubernetes deployments**, the Helm Chart generates: -- **ConfigMap** from `values.yaml` for non-sensitive configuration -- **Secret mounts** for credentials (using `*_FILE` environment variables) - -**Example: Setting required adapters (Helm):** -```bash ---set 'config.adapters.required.cluster={validation,dns,pullsecret,hypershift}' \ ---set 'config.adapters.required.nodepool={validation,hypershift}' -``` - -**Example: Development override (environment variable):** -```bash -export HYPERFLEET_LOGGING_LEVEL=debug -``` - -**For complete configuration reference**, including all available settings, defaults, and validation rules, see: -- **[Configuration Guide](config.md)** - Complete reference for all configuration options -- **[Helm Chart values.yaml](../charts/values.yaml)** - Kubernetes-specific settings - -## Kubernetes Deployment - -### Using Helm Chart - -The project includes a Helm chart for Kubernetes deployment with configurable PostgreSQL support. - -#### Development Deployment - -Deploy with built-in PostgreSQL for development and testing: - -```bash -helm install hyperfleet-api ./charts/ \ - --namespace hyperfleet-system \ - --create-namespace \ - --set image.registry=quay.io \ - --set 'config.adapters.required.cluster={validation,dns,pullsecret,hypershift}' \ - --set 'config.adapters.required.nodepool={validation,hypershift}' -``` - -This creates: -- HyperFleet API deployment -- PostgreSQL StatefulSet -- Services for both components -- ConfigMaps and Secrets +### Deploying #### Production Deployment @@ -277,13 +219,14 @@ helm install hyperfleet-api ./charts/ \ This is the Kubernetes-native pattern for handling sensitive data securely. 
-#### Custom Image Deployment +#### Development Deployment (Using custom images) -Deploy with custom container image (e.g., `quay.io/myuser/hyperfleet-api:v1.0.0`): +Deploy with built-in PostgreSQL for development and testing (e.g., for engineer validation on a cluster): ```bash helm install hyperfleet-api ./charts/ \ --namespace hyperfleet-system \ + --create-namespace \ --set image.registry=quay.io \ --set image.repository=myuser/hyperfleet-api \ --set image.tag=v1.0.0 \ @@ -291,11 +234,16 @@ helm install hyperfleet-api ./charts/ \ --set 'config.adapters.required.nodepool={validation,hypershift}' ``` -**Note**: The `registry` should contain only the registry domain (e.g., `quay.io`, `docker.io`). The `repository` includes the organization and image name (e.g., `myuser/hyperfleet-api`). +This creates: +- HyperFleet API deployment +- PostgreSQL StatefulSet +- Services for both components +- ConfigMaps and Secrets + -#### Upgrade Deployment +**Note**: The `registry` should contain only the registry domain (e.g., `quay.io`, `docker.io`). The `repository` includes the organization and image name (e.g., `myuser/hyperfleet-api`). 
-Upgrade to a new version: +#### Upgrade ```bash helm upgrade hyperfleet-api ./charts/ \ @@ -305,46 +253,15 @@ helm upgrade hyperfleet-api ./charts/ \ #### Uninstall -Remove the deployment: - ```bash helm uninstall hyperfleet-api --namespace hyperfleet-system ``` -## Helm Values +#### Custom Values File -### Key Configuration Options - -| Parameter | Description | Default | -|-----------|-------------|---------| -| `image.registry` | Container registry | `CHANGE_ME` (must be set explicitly) | -| `image.repository` | Image repository | `openshift-hyperfleet/hyperfleet-api` | -| `image.tag` | Image tag | `latest` | -| `image.pullPolicy` | Image pull policy | `Always` | -| `config.adapters.required.cluster` | Cluster adapters required for Ready state | `[]` | -| `config.adapters.required.nodepool` | Nodepool adapters required for Ready state | `[]` | -| `config.server.jwt.enabled` | Enable JWT authentication | `true` | -| `database.postgresql.enabled` | Enable built-in PostgreSQL | `true` | -| `database.external.enabled` | Use external database | `false` | -| `database.external.secretName` | Secret containing database credentials | `hyperfleet-db-external` | -| `serviceMonitor.enabled` | Enable Prometheus Operator ServiceMonitor | `false` | -| `serviceMonitor.interval` | Metrics scrape interval | `30s` | -| `serviceMonitor.scrapeTimeout` | Metrics scrape timeout | `10s` | -| `serviceMonitor.labels` | Additional labels for Prometheus selector | `{}` | -| `serviceMonitor.namespace` | Namespace for ServiceMonitor (if different) | `""` | -| `replicaCount` | Number of API replicas | `1` | -| `resources.limits.cpu` | CPU limit | `500m` | -| `resources.limits.memory` | Memory limit | `512Mi` | -| `podDisruptionBudget.enabled` | Enable PodDisruptionBudget | `false` | -| `podDisruptionBudget.minAvailable` | Minimum available pods during disruption | `1` | -| `podDisruptionBudget.maxUnavailable` | Maximum unavailable pods during disruption | - | - -### Custom Values File - 
-Create a `values.yaml` file: +Create a `values.yaml` file for repeatable deployments: ```yaml -# values.yaml image: registry: quay.io repository: myuser/hyperfleet-api @@ -384,73 +301,78 @@ resources: memory: 512Mi ``` -Deploy with custom values: ```bash helm install hyperfleet-api ./charts/ \ --namespace hyperfleet-system \ --values values.yaml ``` -## Helm Operations +### Helm Values Reference + +| Parameter | Description | Default | +|-----------|-------------|---------| +| `image.registry` | Container registry | `CHANGE_ME` (must be set explicitly) | +| `image.repository` | Image repository | `openshift-hyperfleet/hyperfleet-api` | +| `image.tag` | Image tag | `latest` | +| `image.pullPolicy` | Image pull policy | `Always` | +| `config.adapters.required.cluster` | Cluster adapters required for Reconciled state | `[]` | +| `config.adapters.required.nodepool` | Nodepool adapters required for Reconciled state | `[]` | +| `config.server.jwt.enabled` | Enable JWT authentication | `true` | +| `database.postgresql.enabled` | Enable built-in PostgreSQL | `true` | +| `database.external.enabled` | Use external database | `false` | +| `database.external.secretName` | Secret containing database credentials | `hyperfleet-db-external` | +| `serviceMonitor.enabled` | Enable Prometheus Operator ServiceMonitor | `false` | +| `serviceMonitor.interval` | Metrics scrape interval | `30s` | +| `serviceMonitor.scrapeTimeout` | Metrics scrape timeout | `10s` | +| `serviceMonitor.labels` | Additional labels for Prometheus selector | `{}` | +| `serviceMonitor.namespace` | Namespace for ServiceMonitor (if different) | `""` | +| `replicaCount` | Number of API replicas | `1` | +| `resources.limits.cpu` | CPU limit | `500m` | +| `resources.limits.memory` | Memory limit | `512Mi` | +| `podDisruptionBudget.enabled` | Enable PodDisruptionBudget | `false` | +| `podDisruptionBudget.minAvailable` | Minimum available pods during disruption | `1` | +| `podDisruptionBudget.maxUnavailable` | Maximum 
unavailable pods during disruption | - | -### Check Deployment Status +### Operations + +#### Check Deployment Status ```bash -# Get deployment status helm status hyperfleet-api --namespace hyperfleet-system - -# List all releases helm list --namespace hyperfleet-system - -# Check pods kubectl get pods --namespace hyperfleet-system - -# Check services kubectl get svc --namespace hyperfleet-system ``` -### View Logs +#### View Logs ```bash -# View API logs kubectl logs -f deployment/hyperfleet-api --namespace hyperfleet-system - -# View logs from all pods kubectl logs -f -l app=hyperfleet-api --namespace hyperfleet-system -# View PostgreSQL logs (if using built-in) +# PostgreSQL logs (if using built-in) kubectl logs -f statefulset/hyperfleet-postgresql --namespace hyperfleet-system ``` -### Troubleshooting +#### Troubleshooting ```bash -# Describe pod for events and status kubectl describe pod --namespace hyperfleet-system - -# Check deployment events kubectl get events --namespace hyperfleet-system --sort-by='.lastTimestamp' - -# Exec into pod for debugging kubectl exec -it deployment/hyperfleet-api --namespace hyperfleet-system -- /bin/sh - -# Check secrets kubectl get secrets --namespace hyperfleet-system - -# Verify ConfigMaps kubectl get configmaps --namespace hyperfleet-system ``` -## Health Checks +### Health Checks The deployment includes: - Liveness probe: `GET /healthz` (port 8080) - Returns 200 if the process is alive - Readiness probe: `GET /readyz` (port 8080) - Returns 200 when ready to receive traffic, 503 during startup/shutdown - Metrics: `GET /metrics` (port 9090) - Prometheus metrics endpoint -## Scaling +### Scaling -Scale replicas: ```bash # Manual scaling kubectl scale deployment hyperfleet-api --replicas=3 --namespace hyperfleet-system @@ -463,34 +385,27 @@ helm upgrade hyperfleet-api ./charts/ \ Enable autoscaling via Helm values (`autoscaling.enabled=true`). 
-## Monitoring +### Monitoring Prometheus metrics available at `http://:9090/metrics`. -### Prometheus Operator Integration - -For clusters with Prometheus Operator, enable the ServiceMonitor to automatically discover and scrape metrics: +#### Prometheus Operator Integration ```bash +# Enable ServiceMonitor helm install hyperfleet-api ./charts/ \ --namespace hyperfleet-system \ --set image.registry=quay.io \ --set serviceMonitor.enabled=true -``` - -If your Prometheus requires specific labels for service discovery, add them: -```bash +# With custom Prometheus selector labels helm install hyperfleet-api ./charts/ \ --namespace hyperfleet-system \ --set image.registry=quay.io \ --set serviceMonitor.enabled=true \ --set serviceMonitor.labels.release=prometheus -``` - -To create the ServiceMonitor in a different namespace (e.g., `monitoring`): -```bash +# ServiceMonitor in a different namespace helm install hyperfleet-api ./charts/ \ --namespace hyperfleet-system \ --set image.registry=quay.io \ @@ -498,7 +413,7 @@ helm install hyperfleet-api ./charts/ \ --set serviceMonitor.namespace=monitoring ``` -## Production Deployment Checklist +### Production Checklist Before deploying to production, ensure: @@ -513,19 +428,7 @@ Before deploying to production, ensure: - [ ] **Monitoring**: ServiceMonitor enabled if using Prometheus Operator - [ ] **TLS**: HTTPS enabled for API endpoint (optional) -## Production Best Practices - -- Use external managed database (Cloud SQL, RDS, Azure Database) with automated backups -- Store all sensitive data in Kubernetes Secrets, never in ConfigMap or values.yaml -- Enable authentication with `config.server.jwt.enabled=true` -- Set resource limits and use multiple replicas for high availability -- Use specific image tags (semantic versioning) instead of `latest` -- Enable PodDisruptionBudget for zero-downtime during cluster maintenance -- Configure health probes with appropriate timeouts for your workload - -## Complete Deployment Example - 
-### GKE Deployment +### Complete Example: GKE Deployment ```bash # 1. Build and push image @@ -570,7 +473,128 @@ kubectl port-forward svc/hyperfleet-api 8000:8000 curl http://localhost:8000/api/hyperfleet/v1/clusters ``` +--- + +## Local Execution + +Run HyperFleet API directly on your machine without Helm or Kubernetes. Typical use cases: local development, debugging, integration testing. + +### Prerequisites + +- Go 1.25+, Podman, Make +- A running PostgreSQL instance (local container or external) + +### Configuration + +The application loads configuration in this priority order: **CLI flags > environment variables > config file > defaults**. + +**Config file:** Copy the example and adjust as needed: + +```bash +cp configs/config.yaml.example configs/config.yaml +``` + +The loader searches for a config file in this order: +1. `--config` flag (explicit path) +2. `HYPERFLEET_CONFIG` environment variable +3. `/etc/hyperfleet/config.yaml` (production default) +4. `./configs/config.yaml` (development default) + +If none are found, the command fails with `failed to load configuration`. + +**Environment variables:** Override any config value with the `HYPERFLEET_*` prefix: + +```bash +export HYPERFLEET_DATABASE_HOST=localhost +export HYPERFLEET_DATABASE_PORT=5432 +export HYPERFLEET_DATABASE_NAME=hyperfleet +export HYPERFLEET_DATABASE_USER=hyperfleet +export HYPERFLEET_DATABASE_PASSWORD=hyperfleet-dev-password +export HYPERFLEET_LOGGING_LEVEL=debug +export HYPERFLEET_SERVER_PORT=8000 +``` + +See [Configuration Guide](config.md) for the complete reference and all available settings. 
+ +### Database Setup + +**Option A: Local PostgreSQL container (quickest)** + +```bash +make db/setup # Creates a PostgreSQL container via Podman +make db/login # Connect to the database for inspection +``` + +**Option B: External PostgreSQL** + +Point the config or environment variables to your PostgreSQL instance: + +```bash +export HYPERFLEET_DATABASE_HOST=my-postgres-host.example.com +export HYPERFLEET_DATABASE_PORT=5432 +export HYPERFLEET_DATABASE_NAME=hyperfleet +export HYPERFLEET_DATABASE_USER=hyperfleet +export HYPERFLEET_DATABASE_PASSWORD=my-password +export HYPERFLEET_DATABASE_SSL_MODE=require # for remote databases +``` + +### Running + +```bash +# 1. Generate code (required after clone) +make generate-all + +# 2. Build +make build + +# 3. Run migrations +./bin/hyperfleet-api migrate + +# 4. Start the server (no JWT auth) +make run-no-auth + +# Or start with auth enabled: +./bin/hyperfleet-api serve +``` + +### Schema Validation (Local) + +The API validates cluster and nodepool `spec` fields against an OpenAPI schema. Configure the schema path: + +```bash +# Via flag +./bin/hyperfleet-api serve --server-openapi-schema-path ./openapi/openapi.yaml + +# Via environment variable +export HYPERFLEET_SERVER_OPENAPI_SCHEMA_PATH=./openapi/openapi.yaml +``` + +The API **will fail to start** if the configured schema file is missing, unreadable, or invalid. 
+ +### Endpoints + +Once running, the API is available at: + +- **REST API**: `http://localhost:8000/api/hyperfleet/v1/` +- **OpenAPI spec**: `http://localhost:8000/api/hyperfleet/v1/openapi` +- **Swagger UI**: `http://localhost:8000/api/hyperfleet/v1/openapi.html` +- **Liveness probe**: `http://localhost:8080/healthz` +- **Readiness probe**: `http://localhost:8080/readyz` +- **Metrics**: `http://localhost:9090/metrics` + +### CLI Subcommands + +```bash +./bin/hyperfleet-api serve # Start the HTTP server +./bin/hyperfleet-api migrate # Run database migrations +./bin/hyperfleet-api version # Print version, commit, and build date +``` + +--- + + ## Related Documentation -- [Development Guide](development.md) - Local development setup +- [Configuration Guide](config.md) - Complete configuration reference - [Authentication](authentication.md) - Authentication configuration +- [Development Guide](development.md) - Local development setup and workflows \ No newline at end of file From d2d1e75e3b970b55e033b5050a507425fc0741fb Mon Sep 17 00:00:00 2001 From: sherine-k Date: Wed, 10 Jun 2026 14:21:19 +0200 Subject: [PATCH 2/5] HYPERFLEET-1186 - fix: align pageSize default with TypeSpec contract The runtime default was 100 but the TypeSpec API contract defines pageSize as 20. Update the code to match the spec. 
- Change default Size from 100 to 20 in NewListArguments (types.go) - Update nil-guard defaults in ListByKind and ListByOwner (resource.go) - Update unit test assertion to expect 20 (types_test.go) Co-Authored-By: Claude --- pkg/services/resource.go | 4 ++-- pkg/services/types.go | 2 +- pkg/services/types_test.go | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/pkg/services/resource.go b/pkg/services/resource.go index 47c407f3..3423ab63 100644 --- a/pkg/services/resource.go +++ b/pkg/services/resource.go @@ -272,7 +272,7 @@ func (s *sqlResourceService) List( return nil, nil, svcErr } if args == nil { - args = &ListArguments{Page: 1, Size: 100} + args = &ListArguments{Page: 1, Size: 20} } scopedArgs := *args kindFilter := fmt.Sprintf("kind = '%s'", kind) @@ -300,7 +300,7 @@ func (s *sqlResourceService) ListByOwner( return nil, nil, svcErr } if args == nil { - args = &ListArguments{Page: 1, Size: 100} + args = &ListArguments{Page: 1, Size: 20} } scopedArgs := *args kindFilter := fmt.Sprintf("kind = '%s' AND owner_id = '%s'", kind, ownerID) diff --git a/pkg/services/types.go b/pkg/services/types.go index 9466a83a..e52bd812 100755 --- a/pkg/services/types.go +++ b/pkg/services/types.go @@ -25,7 +25,7 @@ const MaxListSize = 65500 func NewListArguments(params url.Values) *ListArguments { listArgs := &ListArguments{ Page: 1, - Size: 100, + Size: 20, Search: "", } if v := strings.Trim(params.Get("page"), " "); v != "" { diff --git a/pkg/services/types_test.go b/pkg/services/types_test.go index b5d700b7..813f8946 100644 --- a/pkg/services/types_test.go +++ b/pkg/services/types_test.go @@ -74,7 +74,7 @@ func TestNewListArguments_DefaultValues(t *testing.T) { listArgs := NewListArguments(url.Values{}) Expect(listArgs.Page).To(Equal(1), "Default page should be 1") - Expect(listArgs.Size).To(Equal(int64(100)), "Default size should be 100") + Expect(listArgs.Size).To(Equal(int64(20)), "Default size should be 20") Expect(listArgs.Search).To(Equal(""), "Default 
search should be empty") Expect(listArgs.OrderBy).To(Equal([]string{"created_time desc"}), "Default orderBy should be created_time desc") } From bdb9bed521431eafe2c657bf2fcadd4c8672242a Mon Sep 17 00:00:00 2001 From: sherine-k Date: Thu, 11 Jun 2026 13:21:23 +0200 Subject: [PATCH 3/5] HYPERFLEET-1168 - docs: consolidate documentation with v1.0.0 features MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Document all features implemented between v0.2.0 and v1.0.0 that were missing or partially covered in existing documentation. - Add PATCH, DELETE, and force-delete endpoint sections to api-resources.md - Add RFC 9457 Problem Details error response section with examples - Add delete lifecycle (Active → Finalizing → Hard-Deleted) to database.md - Add descriptor-driven delete policies (restrict/cascade) to database.md - Add caller identity documentation (header vs JWT precedence) to config.md - Add caller identity Helm config examples to deployment.md - Update endpoint listing tables for clusters and nodepools - Update README.md feature list and endpoint summaries for v1.0.0 - Fix incorrect "fails to load configuration" claims in deployment.md - Fix "startup remains non-blocking" claim in openapi/README.md - Fix trailing whitespace and double blank lines in database.md, deployment.md Co-Authored-By: Claude --- README.md | 13 +- docs/api-resources.md | 276 +++++++++++++++++++++++++++++++++++++++++- docs/config.md | 34 +++++- docs/database.md | 41 +++++-- docs/deployment.md | 18 ++- openapi/README.md | 68 ++++++----- 6 files changed, 393 insertions(+), 57 deletions(-) diff --git a/README.md b/README.md index 0d6b5a29..bf1fd207 100755 --- a/README.md +++ b/README.md @@ -17,9 +17,13 @@ HyperFleet API - Simple REST API for cluster lifecycle management. 
Provides CRUD * OpenAPI 3.0 specification * Automated Go code generation from OpenAPI -* Cluster and NodePool lifecycle management +* Cluster and NodePool lifecycle management (create, patch, delete, force-delete) * Generic resource types (WifConfigs, Channels, Versions) via plugin-based registration * Adapter-based status reporting with Kubernetes-style conditions +* Soft-delete with adapter finalization and force-delete for stuck resources +* Descriptor-driven delete policies (restrict/cascade) for generic resources +* RFC 9457 Problem Details error responses +* Configurable caller identity for audit fields (HTTP header or JWT claim) * Runtime spec validation against custom OpenAPI schemas * Pagination and search capabilities * Complete integration test coverage @@ -114,7 +118,8 @@ Kubernetes clusters with provider-specific configurations, labels, and adapter-b **Main endpoints:** - `GET/POST /api/hyperfleet/v1/clusters` -- `GET /api/hyperfleet/v1/clusters/{id}` +- `GET/PATCH/DELETE /api/hyperfleet/v1/clusters/{id}` +- `POST /api/hyperfleet/v1/clusters/{id}/force-delete` - `GET/PUT /api/hyperfleet/v1/clusters/{id}/statuses` ### NodePools @@ -124,7 +129,8 @@ Groups of compute nodes within clusters. **Main endpoints:** - `GET /api/hyperfleet/v1/nodepools` - `GET/POST /api/hyperfleet/v1/clusters/{cluster_id}/nodepools` -- `GET /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}` +- `GET/PATCH/DELETE /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}` +- `POST /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/force-delete` - `GET/PUT /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses` ### Generic Resources @@ -194,6 +200,7 @@ This project uses [pre-commit](https://pre-commit.io/) for code quality checks. 
- **[Development Guide](docs/development.md)** - Local setup, testing, code generation, and workflows - **[Database](docs/database.md)** - Schema, migrations, and data model - **[Deployment](docs/deployment.md)** - Container images, Kubernetes deployment, and configuration +- **[Configuration](docs/config.md)** - Complete configuration reference (database, server, caller identity, adapters) - **[Authentication](docs/authentication.md)** - Development and production auth - **[Logging](docs/logging.md)** - Structured logging, OpenTelemetry integration, and data masking - **[Validation Schema](openapi/README.md#validation-schema)** - How to supply a custom OpenAPI schema for runtime `spec` field validation diff --git a/docs/api-resources.md b/docs/api-resources.md index 14fc9b65..c126577a 100644 --- a/docs/api-resources.md +++ b/docs/api-resources.md @@ -10,8 +10,11 @@ This document provides detailed information about the HyperFleet API resources, GET /api/hyperfleet/v1/clusters POST /api/hyperfleet/v1/clusters GET /api/hyperfleet/v1/clusters/{cluster_id} +PATCH /api/hyperfleet/v1/clusters/{cluster_id} +DELETE /api/hyperfleet/v1/clusters/{cluster_id} +POST /api/hyperfleet/v1/clusters/{cluster_id}/force-delete GET /api/hyperfleet/v1/clusters/{cluster_id}/statuses -PUT /api/hyperfleet/v1/clusters/{cluster_id}/statuses +PUT /api/hyperfleet/v1/clusters/{cluster_id}/statuses ``` ### Create Cluster @@ -247,6 +250,138 @@ Adapters use this endpoint to report their status. **Note**: The API automatically sets `created_time`, `last_report_time`, and `last_transition_time` fields. +### Patch Cluster + +**PATCH** `/api/hyperfleet/v1/clusters/{cluster_id}` + +Updates a cluster's `spec` and/or `labels`. Only the fields provided in the request body are modified; omitted fields are left unchanged. The `generation` counter increments when `spec` is updated. 
+ +**Request Body:** + +```json +{ + "spec": { + "region": "us-east-1", + "instanceType": "m5.xlarge" + }, + "labels": { + "environment": "staging" + } +} +``` + +**Response (200 OK):** + +
+JSON response + +```json +{ + "kind": "Cluster", + "id": "2abc123...", + "href": "/api/hyperfleet/v1/clusters/2abc123...", + "name": "my-cluster", + "generation": 2, + "spec": { + "region": "us-east-1", + "instanceType": "m5.xlarge" + }, + "labels": { + "environment": "staging" + }, + "created_time": "2025-01-01T00:00:00Z", + "updated_time": "2025-01-01T12:00:00Z", + "created_by": "user@example.com", + "updated_by": "user@example.com", + "status": { + "conditions": [ + { + "type": "Reconciled", + "status": "False", + "reason": "ReconciledMissingAdapters", + "message": "Required adapters have not yet reported status", + "observed_generation": 2, + "created_time": "2025-01-01T00:00:00Z", + "last_updated_time": "2025-01-01T12:00:00Z", + "last_transition_time": "2025-01-01T12:00:00Z" + }, + { + "type": "LastKnownReconciled", + "status": "True", + "reason": "AllAdaptersReconciled", + "message": "All required adapters report Available=True for the tracked generation", + "observed_generation": 1, + "created_time": "2025-01-01T00:00:00Z", + "last_updated_time": "2025-01-01T00:00:00Z", + "last_transition_time": "2025-01-01T00:00:00Z" + } + ] + } +} +``` + +
+ +**Note**: After a spec update, `Reconciled` transitions to `False` until adapters report at the new generation. `LastKnownReconciled` retains the last known good state. + +### Delete Cluster + +**DELETE** `/api/hyperfleet/v1/clusters/{cluster_id}` + +Soft-deletes a cluster. Sets `deleted_time` and `deleted_by`, increments `generation`, and cascades the soft-delete to all child nodepools. The cluster enters a **Finalizing** state — it remains in the database until adapters report `Finalized=True`, at which point it is hard-deleted automatically. + +**Response (202 Accepted):** + +
+JSON response + +```json +{ + "kind": "Cluster", + "id": "2abc123...", + "href": "/api/hyperfleet/v1/clusters/2abc123...", + "name": "my-cluster", + "generation": 3, + "spec": {}, + "labels": {}, + "created_time": "2025-01-01T00:00:00Z", + "updated_time": "2025-01-01T14:00:00Z", + "created_by": "user@example.com", + "updated_by": "user@example.com", + "deleted_time": "2025-01-01T14:00:00Z", + "deleted_by": "user@example.com", + "status": { + "conditions": [...] + } +} +``` + +
+ +Once a cluster is soft-deleted, creating or updating child nodepools returns `409 Conflict`. + +### Force Delete Cluster + +**POST** `/api/hyperfleet/v1/clusters/{cluster_id}/force-delete` + +Permanently removes a cluster that is stuck in the Finalizing state. This bypasses the normal adapter finalization flow — use it only when adapters are unable to report `Finalized=True`. The cluster, all its child nodepools, and all associated adapter statuses are hard-deleted immediately. + +The cluster **must** already be soft-deleted (have a `deleted_time`). Calling force-delete on an active cluster returns `409 Conflict`. + +**Request Body:** + +```json +{ + "reason": "Adapter crashed and cannot finalize" +} +``` + +| Field | Type | Required | Constraints | +|----------|--------|----------|---------------------| +| `reason` | string | Yes | Non-empty, max 1024 | + +**Response:** `204 No Content` + ## NodePool Management ### Endpoints @@ -256,8 +391,11 @@ GET /api/hyperfleet/v1/nodepools GET /api/hyperfleet/v1/clusters/{cluster_id}/nodepools POST /api/hyperfleet/v1/clusters/{cluster_id}/nodepools GET /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id} +PATCH /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id} +DELETE /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id} +POST /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/force-delete GET /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses -PUT /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses +PUT /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses ``` ### Create NodePool @@ -387,6 +525,49 @@ PUT /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/statuses Same format as cluster status reporting (see above). +### Patch NodePool + +**PATCH** `/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}` + +Updates a nodepool's `spec` and/or `labels`. 
Same semantics as [Patch Cluster](#patch-cluster) — only provided fields are modified, and `generation` increments on spec changes. + +**Request Body:** + +```json +{ + "spec": { + "machineType": "n2-standard-4", + "replicas": 5 + } +} +``` + +**Response (200 OK):** Full nodepool resource with incremented `generation` and updated `updated_time`/`updated_by`. + +### Delete NodePool + +**DELETE** `/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}` + +Soft-deletes a nodepool. Same lifecycle as [Delete Cluster](#delete-cluster) — sets `deleted_time` and `deleted_by`, enters the Finalizing state, and is hard-deleted when adapters report `Finalized=True`. + +**Response (202 Accepted):** Full nodepool resource with `deleted_time` and `deleted_by` fields set. + +### Force Delete NodePool + +**POST** `/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/force-delete` + +Same semantics as [Force Delete Cluster](#force-delete-cluster). The nodepool must already be soft-deleted. + +**Request Body:** + +```json +{ + "reason": "Adapter unable to finalize nodepool" +} +``` + +**Response:** `204 No Content` + ## Pagination and Search ### Pagination @@ -542,7 +723,7 @@ Same naming rules as cluster, but with a shorter maximum length. ## Spec Validation -When an OpenAPI schema is configured (see [deployment.md](deployment.md#schema-validation) for setup), the API validates cluster and nodepool `spec` fields on every create and update request. If no schema is configured, all specs are accepted without validation. When a schema is configured: +When an OpenAPI schema is configured (see [deployment.md](deployment.md#schema-validation-via-helm) for setup), the API validates cluster and nodepool `spec` fields on every create and update request. If no schema is configured, all specs are accepted without validation. 
When a schema is configured: - `POST /clusters` and `POST /nodepools` validate `spec` against `ClusterSpec` or `NodePoolSpec` from the schema - `PATCH /clusters/{id}` and `PATCH /nodepools/{id}` validate the merged result @@ -557,6 +738,95 @@ The schema is configured via `--server-openapi-schema-path` or the `validationSc The same distinction applies to nodepools. +## Error Responses + +All error responses use the [RFC 9457](https://www.rfc-editor.org/rfc/rfc9457) Problem Details format with content type `application/problem+json`. + +### Fields + +| Field | Type | Always present | Description | +|-------------|----------|----------------|-------------| +| `type` | string | Yes | URI reference identifying the problem type | +| `title` | string | Yes | Short human-readable summary | +| `status` | integer | Yes | HTTP status code | +| `detail` | string | No | Human-readable explanation specific to this occurrence | +| `code` | string | No | Machine-readable error code in `HYPERFLEET-CAT-NUM` format | +| `timestamp` | string | No | RFC 3339 timestamp of when the error occurred | +| `trace_id` | string | No | Distributed trace ID for correlation (from `X-Request-Id` header) | +| `instance` | string | No | URI reference for this specific occurrence | +| `errors` | array | No | Field-level validation errors (see below) | + +### Error Code Categories + +Error codes follow the `HYPERFLEET-CAT-NUM` format: + +| Category | Meaning | +|----------|---------| +| `VAL` | Request validation failures | +| `AUT` | Authentication errors | +| `NTF` | Resource not found | +| `CNF` | Resource conflicts | +| `LMT` | Rate limiting | +| `INT` | Internal server errors | +| `SVC` | Upstream service errors | + +### Example: Validation Error (400) + +
+JSON response + +```json +{ + "type": "about:blank", + "title": "Validation failed", + "status": 400, + "detail": "Request body validation failed", + "code": "HYPERFLEET-VAL-003", + "timestamp": "2025-01-01T12:00:00Z", + "trace_id": "abc123-def456", + "instance": "/api/hyperfleet/v1/clusters", + "errors": [ + { + "field": "name", + "message": "name is required" + }, + { + "field": "spec", + "message": "spec must not be null", + "constraint": "required" + } + ] +} +``` + +
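
A client could decode these responses into a small struct. The sketch below mirrors the fields table and the `HYPERFLEET-CAT-NUM` code format; `ProblemDetails`, `FieldError`, `parseProblem`, and `Category` are hypothetical client-side names, not types exported by the API. The timestamp is kept as a string to avoid assuming a particular parsing strategy.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// FieldError mirrors one entry of the "errors" array.
type FieldError struct {
	Field      string `json:"field"`
	Message    string `json:"message"`
	Constraint string `json:"constraint,omitempty"`
}

// ProblemDetails mirrors the documented RFC 9457 response fields.
// Hypothetical client-side type for illustration only.
type ProblemDetails struct {
	Type      string       `json:"type"`
	Title     string       `json:"title"`
	Status    int          `json:"status"`
	Detail    string       `json:"detail,omitempty"`
	Code      string       `json:"code,omitempty"`
	Timestamp string       `json:"timestamp,omitempty"`
	TraceID   string       `json:"trace_id,omitempty"`
	Instance  string       `json:"instance,omitempty"`
	Errors    []FieldError `json:"errors,omitempty"`
}

// Category extracts the CAT part of a HYPERFLEET-CAT-NUM code.
// It returns "" when the code is absent or does not have that shape.
func (p ProblemDetails) Category() string {
	parts := strings.Split(p.Code, "-")
	if len(parts) != 3 || parts[0] != "HYPERFLEET" {
		return ""
	}
	return parts[1]
}

// parseProblem decodes an application/problem+json body.
func parseProblem(body []byte) (ProblemDetails, error) {
	var p ProblemDetails
	err := json.Unmarshal(body, &p)
	return p, err
}

func main() {
	body := []byte(`{
	  "type": "about:blank",
	  "title": "Validation failed",
	  "status": 400,
	  "code": "HYPERFLEET-VAL-003",
	  "errors": [{"field": "name", "message": "name is required"}]
	}`)
	p, err := parseProblem(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%d %s (category %s)\n", p.Status, p.Title, p.Category())
	for _, fe := range p.Errors {
		fmt.Printf("  %s: %s\n", fe.Field, fe.Message)
	}
}
```

Branching on the category rather than on `detail` keeps client logic stable if the human-readable text changes.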
+ +### Example: Not Found (404) + +```json +{ + "type": "about:blank", + "title": "Not found", + "status": 404, + "detail": "Cluster with id='2abc123...' not found", + "code": "HYPERFLEET-NTF-001", + "timestamp": "2025-01-01T12:00:00Z" +} +``` + +### Example: Conflict (409) + +```json +{ + "type": "about:blank", + "title": "Conflict", + "status": 409, + "detail": "Cannot create nodepool: parent cluster is being deleted", + "code": "HYPERFLEET-CNF-001", + "timestamp": "2025-01-01T12:00:00Z" +} +``` + ## Related Documentation - [Example Usage](../README.md#example-usage) - Practical examples diff --git a/docs/config.md b/docs/config.md index 88b5416d..2b855ba4 100644 --- a/docs/config.md +++ b/docs/config.md @@ -100,7 +100,7 @@ The configuration file is resolved in the following order: - Production: `/etc/hyperfleet/config.yaml` - Development: `./configs/config.yaml` -If no configuration file is found, the application continues using environment variables, CLI flags, and defaults. +If none are found, the command fails with `failed to load configuration`. Copy the example config or point to your own. --- @@ -280,6 +280,38 @@ server: cert_url: https://your-idp.example.com/auth/realms/your-realm/protocol/openid-connect/certs ``` +#### Caller Identity + +The API records who performed each mutation in the `created_by`, `updated_by`, and `deleted_by` audit fields. Two settings control how the caller identity is resolved: + +| Setting | Purpose | +|---------|---------| +| `server.identity_header` | HTTP header to read the caller identity from (e.g., `X-Forwarded-Email`) | +| `server.jwt.identity_claim` | JWT claim to use as fallback (e.g., `email`, `preferred_username`, `sub`) | + +**Precedence:** If both are configured and the header is present in the request, the header value wins. The JWT claim is used only when the header is not configured or is empty in the request. 
+ +**Validation:** Identity values are trimmed, must not exceed 256 characters, and must not contain control characters. + +**Example — header-based identity (behind an authenticating proxy):** +```yaml +server: + identity_header: X-Forwarded-Email + jwt: + enabled: false +``` + +**Example — JWT-based identity:** +```yaml +server: + jwt: + enabled: true + issuer_url: https://idp.example.com/realms/hyperfleet + identity_claim: email + jwk: + cert_url: https://idp.example.com/realms/hyperfleet/protocol/openid-connect/certs +``` +
diff --git a/docs/database.md b/docs/database.md index eaf89185..18ce79c4 100644 --- a/docs/database.md +++ b/docs/database.md @@ -9,12 +9,12 @@ HyperFleet API uses PostgreSQL with GORM ORM. The schema follows a simple relati ## Core Tables ### clusters -Primary resources for cluster management. It contains : -* cluster metadata, -* a JSONB `spec` field for provider-specific configuration, -* a JSONB `labels` field for key-value categorization, -* a JSONB `status_conditions` field for synthesized status. -* `deleted_time` for soft delete +Primary resources for cluster management. It contains: +* cluster metadata, +* a JSONB `spec` field for provider-specific configuration, +* a JSONB `labels` field for key-value categorization, +* a JSONB `status_conditions` field for synthesized status, +* `deleted_time` for soft delete, * and `deleted_by` for audit. ### node_pools @@ -57,9 +57,34 @@ Flexible schema storage for: - Runtime validation against OpenAPI schema - PostgreSQL JSON query capabilities -### Soft Delete +### Delete Lifecycle -Clusters, node pools, and generic resources use a custom soft delete pattern with a `deleted_time` timestamp and `deleted_by` audit field. Soft-deleted records are excluded from queries by default. Adapter statuses do not use soft delete. +Resources follow a three-phase delete lifecycle: + +```text +Active ──(DELETE)──▶ Finalizing ──(adapters report Finalized=True)──▶ Hard-Deleted + │ + └──(POST /force-delete)──▶ Hard-Deleted +``` + +1. **Active** — Normal state. Resource is visible in list queries and can be updated. +2. **Finalizing** (soft-deleted) — `DELETE` sets `deleted_time` and `deleted_by`, increments `generation`. The resource stays in the database so adapters can observe the deletion and clean up external state. Soft-deleted records are excluded from list queries by default. New child resources cannot be created under a finalizing parent (returns `409 Conflict`). +3. **Hard-Deleted** — Permanently removed from the database. 
This happens automatically when all required adapters report `Finalized=True` at the current generation. If adapters are stuck, `POST .../force-delete` bypasses finalization and hard-deletes immediately. + +Adapter statuses do not use soft delete — they are hard-deleted when their parent resource is hard-deleted. + +### Delete Policies + +Generic resources (the `resources` table) use descriptor-driven delete policies to control child behavior when a parent is deleted. Each resource type declares its policy in its `EntityDescriptor`: + +| Policy | Behavior | +|------------|----------| +| `restrict` | Parent delete is rejected with `409 Conflict` if active children exist | +| `cascade` | All children are soft-deleted (marked Finalizing) along with the parent | + +Policies are enforced recursively — a cascade on a parent triggers policy checks on grandchildren. For clusters and nodepools, the cascade is built-in: deleting a cluster always cascades the soft-delete to all its nodepools. + +Resources without `RequiredAdapters` in their descriptor skip the Finalizing phase entirely — they are hard-deleted immediately on `DELETE`. 
### Migration System diff --git a/docs/deployment.md b/docs/deployment.md index 71c75435..91881f1e 100644 --- a/docs/deployment.md +++ b/docs/deployment.md @@ -39,22 +39,22 @@ The `image.registry` value in [`charts/values.yaml`](../charts/values.yaml) defa | Staging | `quay.io/openshift-hyperfleet/hyperfleet-api:v` | | Production | `quay.io/openshift-hyperfleet/hyperfleet-api:v` | -Example `values.yaml` overrides: +Example `values.yaml` overrides (pick one): +**Production/Staging:** ```yaml -# Production/Staging (official image) image: registry: quay.io repository: openshift-hyperfleet/hyperfleet-api tag: v1.2.3 - -# Personal development image +``` + +**Development:** +```yaml image: registry: quay.io repository: user/hyperfleet-api tag: dev-abc1234 - - ``` #### Custom Registry @@ -138,7 +138,7 @@ The Helm chart manages configuration through: --set 'config.adapters.required.nodepool={validation,hypershift}' ``` -See [Configuration Guide](config.md) for the complete reference, and [`charts/values.yaml`](../charts/values.yaml) for all Helm-specific settings. +See [Configuration Guide](config.md) for the complete reference (including [caller identity details](config.md#caller-identity)), and [`charts/values.yaml`](../charts/values.yaml) for all Helm-specific settings. ### Schema Validation via Helm @@ -240,7 +240,6 @@ This creates: - Services for both components - ConfigMaps and Secrets - **Note**: The `registry` should contain only the registry domain (e.g., `quay.io`, `docker.io`). The `repository` includes the organization and image name (e.g., `myuser/hyperfleet-api`). 
#### Upgrade @@ -592,9 +591,8 @@ Once running, the API is available at: --- - ## Related Documentation - [Configuration Guide](config.md) - Complete configuration reference - [Authentication](authentication.md) - Authentication configuration -- [Development Guide](development.md) - Local development setup and workflows \ No newline at end of file +- [Development Guide](development.md) - Local development setup and workflows diff --git a/openapi/README.md b/openapi/README.md index 9705eb88..ffd50a9c 100644 --- a/openapi/README.md +++ b/openapi/README.md @@ -6,32 +6,11 @@ This directory contains the code-generation configuration for the HyperFleet API OpenAPI schemas are **not authored here**. They are defined in the [`hyperfleet-api-spec`](https://github.com/openshift-hyperfleet/hyperfleet-api-spec) repository (TypeSpec) and consumed by this repository as a Go module. The `openapi/openapi.yaml` file is extracted from the module cache at code-generation time and is **not tracked in git**. -## Directory Contents +## For Operators -| File | Purpose | -|------|---------| -| `oapi-codegen.yaml` | Code-generation config for `oapi-codegen` | -| `openapi.yaml` | **Not in git** — extracted from the Go module by `make generate` | - -## How Schemas Are Imported - -1. The `github.com/openshift-hyperfleet/hyperfleet-api-spec` module is declared in `go.mod`. -2. `make generate` locates the module's on-disk path via `go list -m -f '{{.Dir}}'` and copies `schemas/core/openapi.yaml` to `openapi/openapi.yaml`. Code generation always uses the `core` variant. -3. `oapi-codegen` reads `openapi/openapi.yaml` and produces `pkg/api/openapi/openapi.gen.go` — Go model structs, an HTTP client, and an embedded resolved spec. 
- -## Generated Artifacts +### Validation Schema -| Artifact | Location | Description | -|----------|----------|-------------| -| Extracted spec | `openapi/openapi.yaml` | Copied from Go module; input to oapi-codegen | -| Go models + client | `pkg/api/openapi/openapi.gen.go` | Never edit — regenerate with `make generate` | -| Embedded resolved spec | Inside `openapi.gen.go` | Fully resolved; served at `/api/hyperfleet/v1/openapi` | - -**Never edit `openapi.yaml` or `openapi.gen.go` directly.** Both are overwritten by `make generate`. - -## Validation Schema - -### Why this exists +#### Why this exists HyperFleet API is intentionally schema-agnostic at its core: it stores clusters and nodepools as long as the `spec` field is present and non-null, without caring what is inside it. This is by design — the API serves multiple deployments with different provider-specific payloads. @@ -39,7 +18,7 @@ Deployers, however, **do** care. A GCP deployment might require a `region` field The `--server-openapi-schema-path` flag solves this: at deploy time, the operator points the API at a deployment-specific OpenAPI schema file. The API then validates every `POST`/`PATCH` request's `spec` payload against that schema in HTTP middleware — before any service or database code runs. -### What the schema file must contain +#### What the schema file must contain The schema file must be a valid OpenAPI 3.0 document. The API looks up two specific component schemas by name: @@ -82,9 +61,9 @@ components: maximum: 100 ``` -If `ClusterSpec` or `NodePoolSpec` is absent from the file, the API will fail to load the validator and log a warning (startup remains non-blocking). +If `ClusterSpec` or `NodePoolSpec` is absent from the file, the API **fails to start** with an error — this ensures invalid schemas are caught immediately rather than silently skipping validation. 
-### How to configure it +#### How to configure it Three equivalent ways to supply the path: @@ -96,13 +75,38 @@ Three equivalent ways to supply the path: **Default:** `openapi/openapi.yaml` (the core schema extracted by `make generate` — provider-agnostic, accepts any non-null spec). -### Runtime behaviour +#### Runtime behaviour - Validation runs in HTTP middleware on every `POST` and `PATCH` request, before the service or database layer. - Invalid specs return `400 Bad Request` with field-level error details. -- Startup is **non-blocking**: if the schema file is missing or malformed, the API logs a warning and starts without validation — specs are accepted without field-level checks. +- If validationSchema is enabled and the schema file is missing or malformed, the API **fails to start** with an error — this ensures misconfigured deployments are caught immediately. + +## For Developers + +### Directory Contents + +| File | Purpose | +|------|---------| +| `oapi-codegen.yaml` | Code-generation config for `oapi-codegen` | +| `openapi.yaml` | **Not in git** — extracted from the Go module by `make generate` | + +### How Schemas Are Imported + +1. The `github.com/openshift-hyperfleet/hyperfleet-api-spec` module is declared in `go.mod`. +2. `make generate` locates the module's on-disk path via `go list -m -f '{{.Dir}}'` and copies `schemas/core/openapi.yaml` to `openapi/openapi.yaml`. Code generation always uses the `core` variant. +3. `oapi-codegen` reads `openapi/openapi.yaml` and produces `pkg/api/openapi/openapi.gen.go` — Go model structs, an HTTP client, and an embedded resolved spec. 
+ +### Generated Artifacts + +| Artifact | Location | Description | +|----------|----------|-------------| +| Extracted spec | `openapi/openapi.yaml` | Copied from Go module; input to oapi-codegen | +| Go models + client | `pkg/api/openapi/openapi.gen.go` | Never edit — regenerate with `make generate` | +| Embedded resolved spec | Inside `openapi.gen.go` | Fully resolved; served at `/api/hyperfleet/v1/openapi` | + +**Never edit `openapi.yaml` or `openapi.gen.go` directly.** Both are overwritten by `make generate`. -## Updating the API Schema +### Updating the API Schema 1. Update TypeSpec definitions in the [`hyperfleet-api-spec`](https://github.com/openshift-hyperfleet/hyperfleet-api-spec) repository and publish a new release. @@ -126,7 +130,7 @@ For local development before a new spec version is published, add a `replace` di replace github.com/openshift-hyperfleet/hyperfleet-api-spec => /path/to/local/hyperfleet-api-spec ``` -## Code Generation Commands +### Code Generation Commands ```shell make generate # Extract schema from spec module, then run oapi-codegen @@ -134,7 +138,7 @@ make generate-mocks # Regenerate mock implementations (go generate) make generate-all # Both of the above ``` -## oapi-codegen Configuration +### oapi-codegen Configuration From `oapi-codegen.yaml`: From 83ff86644dff6d71e4d40f7928a9b23882e816b2 Mon Sep 17 00:00:00 2001 From: sherine-k Date: Fri, 12 Jun 2026 14:05:27 +0200 Subject: [PATCH 4/5] HYPERFLEET-1168 - docs: streamline docs and take various reviews into account --- CLAUDE.md | 2 +- README.md | 111 ++-------- docs/api-resources.md | 22 +- docs/config.md | 2 +- docs/database.md | 89 ++------ docs/deployment.md | 473 +++++++++++++++++------------------------- docs/development.md | 145 +++++++++---- 7 files changed, 345 insertions(+), 499 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index cdeefc50..d0a9cb46 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -4,7 +4,7 @@ HyperFleet API is a **stateless REST API** serving as the pure CRUD 
data layer for HyperFleet cluster lifecycle management. It persists clusters, node pools, and adapter statuses to PostgreSQL — no business logic, no events. Sentinel handles orchestration; adapters execute and report back. -- **Language**: Go 1.24+ with FIPS crypto (`CGO_ENABLED=1 GOEXPERIMENT=boringcrypto`) +- **Language**: Go 1.25+ with FIPS crypto (`CGO_ENABLED=1 GOEXPERIMENT=boringcrypto`) - **Database**: PostgreSQL 14.2 with GORM ORM - **API Spec**: TypeSpec → `hyperfleet-api-spec` Go module → oapi-codegen → Go models - **Architecture**: Plugin-based route registration, transaction-per-request middleware diff --git a/README.md b/README.md index bf1fd207..c71a56ed 100755 --- a/README.md +++ b/README.md @@ -4,14 +4,6 @@ HyperFleet API - Simple REST API for cluster lifecycle management. Provides CRUD ## Architecture -### Technology Stack - -- **Language**: Go 1.25+ -- **API Definition**: OpenAPI 3.0 -- **Code Generation**: oapi-codegen -- **Database**: PostgreSQL with GORM ORM -- **Container Runtime**: Podman -- **Testing**: Gomega + Resty ### Core Features @@ -22,77 +14,28 @@ HyperFleet API - Simple REST API for cluster lifecycle management. 
Provides CRUD * Adapter-based status reporting with Kubernetes-style conditions * Soft-delete with adapter finalization and force-delete for stuck resources * Descriptor-driven delete policies (restrict/cascade) for generic resources -* RFC 9457 Problem Details error responses * Configurable caller identity for audit fields (HTTP header or JWT claim) * Runtime spec validation against custom OpenAPI schemas * Pagination and search capabilities -* Complete integration test coverage -* Database migrations with gormigrate -* Embedded OpenAPI specification using `//go:embed` - -### Project Structure - -```text -hyperfleet-api/ -├── cmd/hyperfleet-api/ # Application entry point -├── pkg/ -│ ├── api/ # API models and type definitions -│ ├── dao/ # Data access layer -│ ├── db/ # Database setup, migrations, and session management -│ ├── handlers/ # HTTP request handlers -│ └── services/ # Service layer (status aggregation, CRUD) -├── plugins/ # Plugin-based resource registration -│ ├── clusters/ # Cluster resource plugin -│ ├── nodePools/ # NodePool resource plugin -│ ├── wifconfigs/ # WifConfig resource plugin -│ ├── channels/ # Channel resource plugin -│ ├── versions/ # Version resource plugin (child of Channel) -│ └── generic/ # Generic resource framework -├── openapi/ # Generated artifacts from hyperfleet-api-spec module -├── test/ # Integration tests and factories -├── docs/ # Detailed documentation -└── Makefile # Build automation -``` - -## Quick Start -### Prerequisites - -- **Go 1.25+**, **Podman**, **PostgreSQL 14+**, **Make** - -See [PREREQUISITES.md](PREREQUISITES.md) for installation instructions. - -### Installation - -```bash -# 1. Generate OpenAPI code and mocks -make generate-all - -# 2. Install dependencies -go mod download - -# 3. Build binary -make build - -# 4. Setup database (local PostgreSQL container) -make db/setup +### Technology Stack -# 5. 
Copy config file -cp configs/config.yaml.example configs/config.yaml +- **Language**: Go 1.25+ +- **API Definition**: OpenAPI 3.0 +- **Code Generation**: oapi-codegen +- **Database**: PostgreSQL with GORM ORM +- **Container Runtime**: Podman +- **Testing**: Gomega + Resty -# 6. Run migrations -./bin/hyperfleet-api migrate +## Getting Started -# 7. Start service (no auth) -make run-no-auth -``` +### Deploying to Kubernetes -**Note**: Generated code is not tracked in git. You must run `make generate-all` after cloning. +For Helm-based deployment to staging, production, or partner environments, see the **[Deployment Guide](docs/deployment.md)** — covers container images, Helm values, external databases, schema validation, monitoring, and production checklists. -The `migrate` and `serve` commands require a configuration file. The loader checks `--config` flag, then `HYPERFLEET_CONFIG` env var, then `/etc/hyperfleet/config.yaml`, then `./configs/config.yaml`. -If none are found, the command fails with `failed to load configuration`. Copy the example config or point to your own. +### Local Development -For production database setup (external PostgreSQL, Cloud SQL, etc.), see [docs/deployment.md](docs/deployment.md#production-deployment). +For setting up a local development environment, see the **[Development Guide](docs/development.md)** — covers prerequisites, code generation, mock generation, database setup, running tests, pre-commit hooks, and development workflows. ### Accessing the API @@ -162,36 +105,6 @@ curl -G http://localhost:8000/api/hyperfleet/v1/clusters \ See [docs/search.md](docs/search.md) for search and filtering documentation. 
-## Development - -### Common Commands - -```bash -make build # Build binary to bin/ -make run-no-auth # Run without authentication -make test # Run unit tests -make test-integration # Run integration tests -make generate # Generate OpenAPI models -make generate-mocks # Generate test mocks -make generate-all # Generate OpenAPI models and mocks -make db/setup # Create PostgreSQL container -make image # Build container image -``` - -See [docs/development.md](docs/development.md) for detailed workflows. - -### CLI Subcommands - -```bash -./bin/hyperfleet-api serve # Start the HTTP server -./bin/hyperfleet-api migrate # Run database migrations -./bin/hyperfleet-api version # Print version, commit, and build date -``` - -### Pre-commit Hooks - -This project uses [pre-commit](https://pre-commit.io/) for code quality checks. See [docs/development.md](docs/development.md#pre-commit-hooks-optional) for setup instructions. - ## Documentation ### Core Documentation diff --git a/docs/api-resources.md b/docs/api-resources.md index c126577a..c0636c15 100644 --- a/docs/api-resources.md +++ b/docs/api-resources.md @@ -2,6 +2,14 @@ This document provides detailed information about the HyperFleet API resources, including endpoints, request/response formats, and usage patterns. +## Authentication Prerequisites + +All API endpoints require a valid JWT bearer token when authentication is enabled (the default in production). Requests without a valid token receive `401 Unauthorized`. See [authentication.md](authentication.md) for configuration details, token format, and caller identity resolution. + +Mutating requests (POST, PATCH, PUT, DELETE) additionally require a resolvable caller identity — either from a JWT claim or an identity header — which is recorded in audit fields (`created_by`, `updated_by`, `deleted_by`). Read requests (GET, LIST) are allowed without caller identity. + +> **Note**: The API does not enforce role-based access control (RBAC). 
Any authenticated caller can invoke any endpoint, including destructive operations like force-delete. Access control should be enforced at the infrastructure layer (e.g., ingress policies, gateway authorization). + ## Cluster Management ### Endpoints @@ -328,7 +336,7 @@ Updates a cluster's `spec` and/or `labels`. Only the fields provided in the requ **DELETE** `/api/hyperfleet/v1/clusters/{cluster_id}` -Soft-deletes a cluster. Sets `deleted_time` and `deleted_by`, increments `generation`, and cascades the soft-delete to all child nodepools. The cluster enters a **Finalizing** state — it remains in the database until adapters report `Finalized=True`, at which point it is hard-deleted automatically. +Soft-deletes a cluster. Sets `deleted_time` and `deleted_by`, increments `generation`, and cascades deletion to child nodepools according to the deletion policy: nodepools with required adapters are soft-deleted (their `deleted_time` and `deleted_by` are set and `generation` is incremented, entering **Finalizing**), while nodepools without required adapters are hard-deleted immediately. The cluster itself enters a **Finalizing** state — it remains in the database until adapters report `Finalized=True`, at which point it is hard-deleted automatically. **Response (202 Accepted):** @@ -364,7 +372,7 @@ Once a cluster is soft-deleted, creating or updating child nodepools returns `40 **POST** `/api/hyperfleet/v1/clusters/{cluster_id}/force-delete` -Permanently removes a cluster that is stuck in the Finalizing state. This bypasses the normal adapter finalization flow — use it only when adapters are unable to report `Finalized=True`. The cluster, all its child nodepools, and all associated adapter statuses are hard-deleted immediately. +Permanently removes a cluster that is stuck in the Finalizing state. This bypasses the normal adapter finalization flow — use it only when adapters are unable to report `Finalized=True`. 
The cluster, all its child nodepools, and all associated adapter statuses are hard-deleted immediately. The caller and reason are recorded in an audit log entry before deletion. The cluster **must** already be soft-deleted (have a `deleted_time`). Calling force-delete on an active cluster returns `409 Conflict`. @@ -723,7 +731,7 @@ Same naming rules as cluster, but with a shorter maximum length. ## Spec Validation -When an OpenAPI schema is configured (see [deployment.md](deployment.md#schema-validation-via-helm) for setup), the API validates cluster and nodepool `spec` fields on every create and update request. If no schema is configured, all specs are accepted without validation. When a schema is configured: +When an OpenAPI schema is configured (see [deployment.md](deployment.md#configuring-schema-validation) for setup), the API validates cluster and nodepool `spec` fields on every create and update request. If no schema is configured, all specs are accepted without validation. When a schema is configured: - `POST /clusters` and `POST /nodepools` validate `spec` against `ClusterSpec` or `NodePoolSpec` from the schema - `PATCH /clusters/{id}` and `PATCH /nodepools/{id}` validate the merged result @@ -749,7 +757,7 @@ All error responses use the [RFC 9457](https://www.rfc-editor.org/rfc/rfc9457) P | `type` | string | Yes | URI reference identifying the problem type | | `title` | string | Yes | Short human-readable summary | | `status` | integer | Yes | HTTP status code | -| `detail` | string | No | Human-readable explanation specific to this occurrence | +| `detail` | string | Yes | Human-readable explanation specific to this occurrence | | `code` | string | No | Machine-readable error code in `HYPERFLEET-CAT-NUM` format | | `timestamp` | string | No | RFC 3339 timestamp of when the error occurred | | `trace_id` | string | No | Distributed trace ID for correlation (from `X-Request-Id` header) | @@ -777,7 +785,7 @@ Error codes follow the `HYPERFLEET-CAT-NUM` format: 
```json { - "type": "about:blank", + "type": "https://api.hyperfleet.io/errors/validation-error", "title": "Validation failed", "status": 400, "detail": "Request body validation failed", @@ -805,7 +813,7 @@ Error codes follow the `HYPERFLEET-CAT-NUM` format: ```json { - "type": "about:blank", + "type": "https://api.hyperfleet.io/errors/not-found", "title": "Not found", "status": 404, "detail": "Cluster with id='2abc123...' not found", @@ -818,7 +826,7 @@ Error codes follow the `HYPERFLEET-CAT-NUM` format: ```json { - "type": "about:blank", + "type": "https://api.hyperfleet.io/errors/conflict", "title": "Conflict", "status": 409, "detail": "Cannot create nodepool: parent cluster is being deleted", diff --git a/docs/config.md b/docs/config.md index 2b855ba4..b4e8fa20 100644 --- a/docs/config.md +++ b/docs/config.md @@ -100,7 +100,7 @@ The configuration file is resolved in the following order: - Production: `/etc/hyperfleet/config.yaml` - Development: `./configs/config.yaml` -If none are found, the command fails with `failed to load configuration`. Copy the example config or point to your own. +If none are found, the application continues normally using environment variables and CLI flags. --- diff --git a/docs/database.md b/docs/database.md index 18ce79c4..3996ff60 100644 --- a/docs/database.md +++ b/docs/database.md @@ -18,7 +18,7 @@ Primary resources for cluster management. It contains: * and `deleted_by` for audit. ### node_pools -Child resources owned by clusters, representing groups of compute nodes. References clusters via `owner_id` with a `RESTRICT` foreign key. Same column layout as clusters (including `labels`, `status_conditions`, `deleted_time`, `deleted_by`). +Child resources owned by clusters, representing groups of compute nodes. References clusters via `owner_id` with a `RESTRICT` foreign key. Shares the same core columns as clusters (`labels`, `status_conditions`, `deleted_time`, `deleted_by`) plus `owner_id` for the parent relationship. 
### adapter_statuses Polymorphic status records for both clusters and node pools. Stores adapter-reported conditions in JSONB format. No soft delete — rows are hard-deleted or replaced. @@ -68,14 +68,14 @@ Active ──(DELETE)──▶ Finalizing ──(adapters report Finalized=True) ``` 1. **Active** — Normal state. Resource is visible in list queries and can be updated. -2. **Finalizing** (soft-deleted) — `DELETE` sets `deleted_time` and `deleted_by`, increments `generation`. The resource stays in the database so adapters can observe the deletion and clean up external state. Soft-deleted records are excluded from list queries by default. New child resources cannot be created under a finalizing parent (returns `409 Conflict`). -3. **Hard-Deleted** — Permanently removed from the database. This happens automatically when all required adapters report `Finalized=True` at the current generation. If adapters are stuck, `POST .../force-delete` bypasses finalization and hard-deletes immediately. +2. **Finalizing** (soft-deleted) — `DELETE` sets `deleted_time` and `deleted_by`, increments `generation`. The resource stays in the database so adapters can observe the deletion and clean up external state. Soft-deleted records are excluded from list queries by default. Creating new child resources under a finalizing parent is rejected with `409 Conflict`. +3. **Hard-Deleted** — Permanently removed from the database. This happens automatically when all required adapters report `Finalized=True` at the current generation. If adapters are stuck, `POST .../force-delete` bypasses the adapter gating and hard-deletes immediately — but the resource must already be in Finalizing state; calling force-delete on an active resource returns `409 Conflict`. Repeated force-delete calls after hard-deletion return `404 Not Found`. Cluster force-delete cascades to all child NodePools and their adapter statuses. NodePool force-delete only removes the NodePool and its adapter statuses. 
Adapter statuses do not use soft delete — they are hard-deleted when their parent resource is hard-deleted. ### Delete Policies -Generic resources (the `resources` table) use descriptor-driven delete policies to control child behavior when a parent is deleted. Each resource type declares its policy in its `EntityDescriptor`: +Generic resources (the `resources` table) use delete policies to control child behavior when a parent is deleted. Each resource type declares its policy: | Policy | Behavior | |------------|----------| @@ -84,56 +84,18 @@ Generic resources (the `resources` table) use descriptor-driven delete policies Policies are enforced recursively — a cascade on a parent triggers policy checks on grandchildren. For clusters and nodepools, the cascade is built-in: deleting a cluster always cascades the soft-delete to all its nodepools. -Resources without `RequiredAdapters` in their descriptor skip the Finalizing phase entirely — they are hard-deleted immediately on `DELETE`. +Resources without required adapters skip the Finalizing phase entirely — they are hard-deleted immediately on `DELETE`. ### Migration System -Uses GORM AutoMigrate: +Migrations are: - Non-destructive (never drops columns or tables) - Additive (creates missing tables, columns, indexes) - Run via `./bin/hyperfleet-api migrate` ### Migration Coordination -**Problem:** During rolling deployments, multiple pods attempt to run migrations simultaneously, causing race conditions and deployment failures. - -**Solution:** PostgreSQL advisory locks ensure exclusive migration execution. - -#### How It Works - -```go -// Only one pod/process acquires the lock and runs migrations -// Others wait until the lock is released -db.MigrateWithLock(ctx, factory) -``` - -**Implementation:** -1. Pod sets statement timeout (5 minutes) to prevent indefinite blocking -2. Pod acquires advisory lock via `pg_advisory_xact_lock(hash("migrations"), hash("Migrations"))` -3. Lock holder runs migrations exclusively -4. 
Other pods block until lock is released or timeout is reached -5. Lock automatically released on transaction commit - -**Key Features:** -- **Zero infrastructure overhead** - Uses native PostgreSQL locks -- **Automatic cleanup** - Locks released on transaction end or pod crash -- **Timeout protection** - 5-minute timeout prevents indefinite blocking if a pod hangs -- **Nested lock support** - Same lock can be acquired in nested contexts without deadlock -- **UUID-based ownership** - Only original acquirer can unlock - -#### Testing Concurrent Migrations - -Integration tests validate concurrent behavior: - -```bash -make test-integration # Runs TestConcurrentMigrations -``` - -**Test coverage:** -- `TestConcurrentMigrations` - Multiple pods running migrations simultaneously -- `TestAdvisoryLocksConcurrently` - Lock serialization under race conditions -- `TestAdvisoryLocksWithTransactions` - Lock + transaction interaction -- `TestAdvisoryLockBlocking` - Lock blocking behavior +During rolling deployments, multiple pods may attempt to run migrations simultaneously. The API uses PostgreSQL advisory locks to ensure only one pod runs migrations at a time — other pods wait (up to 5 minutes) until the lock is released. Locks are automatically cleaned up on transaction commit or pod crash, so no manual intervention is needed. ## Database Setup @@ -152,43 +114,16 @@ See [development.md](development.md) for detailed setup instructions. ## Transaction Strategy -The API uses an optimized transaction strategy to maximize connection pool efficiency and reduce latency under high adapter polling load. 
- -### Write Operations (POST/PUT/PATCH/DELETE) - -Write operations create full GORM transactions with ACID guarantees: -- Transaction begins before handler execution -- Automatic commit on success, rollback on error (via `MarkForRollback()`) -- Transaction ID tracked in logs for debugging - -### Read Operations (GET) - -Read operations skip transaction creation entirely for performance: -- Direct database session without BEGIN/COMMIT overhead -- No transaction ID consumption -- Reduced connection hold time and pool pressure - -### Trade-offs - -**List Operations**: COUNT and SELECT queries execute as separate autocommit statements (read operations don't use transactions). PostgreSQL's default READ COMMITTED isolation level means each statement gets a fresh snapshot: - -- Under concurrent deletes, `total` count may slightly exceed actual `items` returned -- This is a cosmetic pagination issue, not a data integrity problem -- Occurs only during the ~1ms window between COUNT and SELECT -- Low probability in practice (requires delete between two consecutive queries) - -**Why not use transactions for reads?** Creating transactions for every GET request would: -- Increase connection pool pressure under high adapter polling load -- Consume transaction IDs unnecessarily -- Add latency (BEGIN/COMMIT overhead) +- **Write operations** (POST/PUT/PATCH/DELETE) run inside a database transaction with automatic commit on success and rollback on error. +- **Read operations** (GET) run without a transaction for lower latency and reduced connection pool pressure. -**Why not use REPEATABLE READ?** The current inconsistency is acceptable for pagination UX. REPEATABLE READ would add overhead and doesn't align with the read-heavy workload optimization. +### Pagination note -**Alternative**: Clients can use continuation tokens (Kubernetes pattern) instead of page/total pagination if strict consistency is required. 
+Because list queries run without a transaction, the `total` count and the returned `items` are computed in separate statements. Under concurrent deletes, `total` may briefly exceed the actual number of items returned. This is a cosmetic pagination artifact, not a data integrity issue. ## Connection Pool Configuration -The API manages a Go `sql.DB` connection pool with the following tunable parameters, exposed as CLI flags: +The connection pool is configured via CLI flags: | Flag | Default | Description | |------|---------|-------------| diff --git a/docs/deployment.md b/docs/deployment.md index 91881f1e..75d98f19 100644 --- a/docs/deployment.md +++ b/docs/deployment.md @@ -1,80 +1,99 @@ # Deployment Guide -This guide covers two deployment modes: +This guide covers deploying HyperFleet API to a Kubernetes cluster via Helm chart. -- **[Kubernetes Deployment (Helm)](#kubernetes-deployment-helm)** — deploying to a cluster via Helm chart (partners, staging, production) -- **[Local Execution](#local-execution)** — running the binary directly on your machine (HF engineers, development, debugging) +For running the binary directly on your machine (development, debugging), see the **[Development Guide](development.md)**. --- -## Kubernetes Deployment (Helm) +## Prerequisites -Deploy HyperFleet API to a Kubernetes cluster using the included Helm chart. Typical use cases: partner deployments, staging, production, engineer validation on a cluster. 
+Before deploying, ensure you have: -### Container Image +- **Kubernetes cluster** (1.25+) with **Helm 3** installed +- **PostgreSQL database** — either: + - An external managed instance (Cloud SQL, RDS, Azure Database) for production, or + - The chart's built-in PostgreSQL pod for evaluation and testing +- **Container image** — a released hyperfleet-api image, a pre-built image from your registry, or one you build yourself: + ```bash + make image \ + IMAGE_REGISTRY=quay.io/yourorg \ + IMAGE_TAG=v1.0.0 -#### Building Images + podman push quay.io/yourorg/hyperfleet-api:v1.0.0 + ``` -```bash -# Build container image with default tag -make image +--- -# Build with custom tag -make image IMAGE_TAG=v1.0.0 +## Quick Start -# Build and push to default registry -make image-push +The fastest path to a running deployment. This uses the chart's built-in PostgreSQL and no authentication — suitable for evaluation and testing. -# Build and push to personal Quay registry (for development) -QUAY_USER=myuser make image-dev -``` +**Three values are required** (they have no usable defaults): -#### Image Registry Configuration +| Value | What to set | Example | +|-------|-------------|---------| +| `image.registry` | Container registry domain | `quay.io` | +| `image.repository` | Organization and image name | `openshift-hyperfleet/hyperfleet-api` | +| `image.tag` | Image version | `v1.0.0` | -The `image.registry` value in [`charts/values.yaml`](../charts/values.yaml) defaults to `CHANGE_ME` — a placeholder that intentionally prevents accidental deployments with an incorrect registry. You **must** set this to your actual container registry before deploying. 
+**Deploy:** -| Environment | Image | -|-------------|-------| -| Development | `quay.io//hyperfleet-api:dev-` | -| Staging | `quay.io/openshift-hyperfleet/hyperfleet-api:v` | -| Production | `quay.io/openshift-hyperfleet/hyperfleet-api:v` | +```bash +helm install hyperfleet-api ./charts/ \ + --namespace hyperfleet-system \ + --create-namespace \ + --set image.registry=quay.io \ + --set image.repository=openshift-hyperfleet/hyperfleet-api \ + --set image.tag=v1.0.0 +``` -Example `values.yaml` overrides (pick one): +**Verify:** -**Production/Staging:** -```yaml -image: - registry: quay.io - repository: openshift-hyperfleet/hyperfleet-api - tag: v1.2.3 +```bash +kubectl get pods --namespace hyperfleet-system +kubectl port-forward svc/hyperfleet-api 8000:8000 --namespace hyperfleet-system +curl http://localhost:8000/api/hyperfleet/v1/clusters ``` -**Development:** -```yaml -image: - registry: quay.io - repository: user/hyperfleet-api - tag: dev-abc1234 -``` +This creates a HyperFleet API deployment, a PostgreSQL StatefulSet, and the necessary Services, ConfigMaps, and Secrets. -#### Custom Registry +--- -```bash -make image \ - IMAGE_REGISTRY=your-registry.io/yourorg \ - IMAGE_TAG=v1.0.0 +## Production Deployment + +For production, use an external managed database and store credentials in a Kubernetes Secret. 
-podman push your-registry.io/yourorg/hyperfleet-api:v1.0.0 +### Step 1: Create database secret + +```bash +kubectl create secret generic hyperfleet-db-external \ + --namespace hyperfleet-system \ + --from-literal=db.host= \ + --from-literal=db.port=5432 \ + --from-literal=db.name=hyperfleet \ + --from-literal=db.user=hyperfleet \ + --from-literal=db.password= ``` -### Configuration in Kubernetes +### Step 2: Deploy with external database -The Helm chart manages configuration through: -- **ConfigMap** — generated from [`charts/values.yaml`](../charts/values.yaml) for non-sensitive settings -- **Secrets** — database credentials injected via `secretKeyRef` +```bash +helm install hyperfleet-api ./charts/ \ + --namespace hyperfleet-system \ + --create-namespace \ + --set image.registry=quay.io \ + --set image.repository=openshift-hyperfleet/hyperfleet-api \ + --set image.tag=v1.0.0 \ + --set database.postgresql.enabled=false \ + --set database.external.enabled=true \ + --set database.external.secretName=hyperfleet-db-external +``` + +The chart injects database credentials as environment variables using `secretKeyRef` — credentials are never exposed in ConfigMaps or pod specs.
-Configuration Flow in Kubernetes (click to expand) +How configuration flows in Kubernetes (click to expand) ``` ┌─────────────────────────────────────────────────────────────┐ @@ -95,7 +114,6 @@ The Helm chart manages configuration through: │ - server.port │ │ - db.user │ │ _CONFIG │ │ - logging.level │ │ - db.pass │ │ - secretKeyRef│ └──────┬───────────┘ └──────┬──────┘ └───────┬───────┘ - │ │ │ │ │ │ └────────────────────┴────────────────────┘ │ @@ -132,17 +150,71 @@ The Helm chart manages configuration through:
-**Example: Setting required adapters:** +--- + +## Configuring Authentication + +JWT authentication is **disabled by default** in the Helm chart. To enable it: + +```bash +helm install hyperfleet-api ./charts/ \ + --namespace hyperfleet-system \ + --set image.registry=quay.io \ + --set image.repository=openshift-hyperfleet/hyperfleet-api \ + --set image.tag=v1.0.0 \ + --set config.server.jwt.enabled=true \ + --set config.server.jwt.issuer_url=https://your-idp.example.com/auth/realms/your-realm \ + --set config.server.jwk.cert_url=https://your-idp.example.com/auth/realms/your-realm/protocol/openid-connect/certs +``` + +| Value | Required when JWT enabled | Description | +|-------|---------------------------|-------------| +| `config.server.jwt.enabled` | Yes | Set to `true` | +| `config.server.jwt.issuer_url` | Yes | Expected JWT issuer URL for token validation | +| `config.server.jwk.cert_url` | Yes (unless `cert_file` is set) | URL to fetch JWK signing keys | +| `config.server.jwt.audience` | No | Expected JWT audience claim | +| `config.server.jwt.identity_claim` | No | JWT claim used as caller identity (default: `email`) | + +See [Authentication](authentication.md) for full reference including identity header configuration and caller identity details. + +--- + +## Configuring Required Adapters + +Adapters are external components (validation, DNS, pull-secret, HyperShift) that report status back to HyperFleet API. The `required` adapter lists define which adapters must report "ready" before a resource is considered **Reconciled**. + +By default, no adapters are required (`[]`). 
For production, configure the adapters your deployment uses: + ```bash --set 'config.adapters.required.cluster={validation,dns,pullsecret,hypershift}' \ --set 'config.adapters.required.nodepool={validation,hypershift}' ``` -See [Configuration Guide](config.md) for the complete reference (including [caller identity details](config.md#caller-identity)), and [`charts/values.yaml`](../charts/values.yaml) for all Helm-specific settings. +Or in a values file: + +```yaml +config: + adapters: + required: + cluster: + - validation + - dns + - pullsecret + - hypershift + nodepool: + - validation + - hypershift +``` + +--- -### Schema Validation via Helm +## Configuring Schema Validation -Partners can supply a custom OpenAPI schema for `spec` field validation: +The API can validate cluster and nodepool `spec` fields against a custom OpenAPI schema on every create/update request. This is **disabled by default**. + +### Inline schema + +Provide the schema content directly in your values file: ```yaml validationSchema: @@ -169,9 +241,9 @@ validationSchema: type: string ``` -When `validationSchema.enabled` is `true`, the chart creates a ConfigMap with the schema content, mounts it into the container, and sets `server.openapi_schema_path` in the generated config file to point to it. +### Existing ConfigMap -Alternatively, reference an existing ConfigMap (must contain an `openapi.yaml` key): +Reference a ConfigMap that already exists in the namespace (must contain an `openapi.yaml` key): ```yaml validationSchema: @@ -179,70 +251,13 @@ validationSchema: existingConfigMap: my-validation-schema ``` -### Deploying - -#### Production Deployment - -Deploy with external database (recommended for production): +When enabled, the chart creates (or references) a ConfigMap with the schema, mounts it into the container, and configures the API to validate against it. The API **will fail to start** if the schema is invalid. 
-##### Step 1: Create database secret - -```bash -kubectl create secret generic hyperfleet-db-external \ - --namespace hyperfleet-system \ - --from-literal=db.host= \ - --from-literal=db.port=5432 \ - --from-literal=db.name=hyperfleet \ - --from-literal=db.user=hyperfleet \ - --from-literal=db.password= -``` - -##### Step 2: Deploy with external database - -```bash -helm install hyperfleet-api ./charts/ \ - --namespace hyperfleet-system \ - --set image.registry=quay.io \ - --set database.postgresql.enabled=false \ - --set database.external.enabled=true \ - --set database.external.secretName=hyperfleet-db-external \ - --set 'config.adapters.required.cluster={validation,dns,pullsecret,hypershift}' \ - --set 'config.adapters.required.nodepool={validation,hypershift}' -``` - -**How it works:** -1. Helm Chart creates a ConfigMap with non-sensitive configuration -2. Your Secret (created in Step 1) contains database credentials -3. Helm Chart injects credentials as environment variables using `secretKeyRef` -4. Application reads credentials from environment variables -5. Credentials are never exposed in pod specs or ConfigMaps - -This is the Kubernetes-native pattern for handling sensitive data securely. - -#### Development Deployment (Using custom images) - -Deploy with built-in PostgreSQL for development and testing (e.g., for engineer validation on a cluster): - -```bash -helm install hyperfleet-api ./charts/ \ - --namespace hyperfleet-system \ - --create-namespace \ - --set image.registry=quay.io \ - --set image.repository=myuser/hyperfleet-api \ - --set image.tag=v1.0.0 \ - --set 'config.adapters.required.cluster={validation,dns,pullsecret,hypershift}' \ - --set 'config.adapters.required.nodepool={validation,hypershift}' -``` - -This creates: -- HyperFleet API deployment -- PostgreSQL StatefulSet -- Services for both components -- ConfigMaps and Secrets +--- -**Note**: The `registry` should contain only the registry domain (e.g., `quay.io`, `docker.io`). 
The `repository` includes the organization and image name (e.g., `myuser/hyperfleet-api`). +## Managing the Deployment -#### Upgrade +### Upgrade ```bash helm upgrade hyperfleet-api ./charts/ \ @@ -250,26 +265,29 @@ helm upgrade hyperfleet-api ./charts/ \ --set image.tag=v1.1.0 ``` -#### Uninstall +### Uninstall ```bash helm uninstall hyperfleet-api --namespace hyperfleet-system ``` -#### Custom Values File +### Custom Values File -Create a `values.yaml` file for repeatable deployments: +For repeatable deployments, create a `values.yaml` file: ```yaml image: registry: quay.io - repository: myuser/hyperfleet-api + repository: openshift-hyperfleet/hyperfleet-api tag: v1.0.0 config: server: jwt: enabled: true + issuer_url: https://your-idp.example.com/auth/realms/your-realm + jwk: + cert_url: https://your-idp.example.com/auth/realms/your-realm/protocol/openid-connect/certs adapters: required: @@ -306,35 +324,41 @@ helm install hyperfleet-api ./charts/ \ --values values.yaml ``` -### Helm Values Reference +--- + +## Helm Values Reference | Parameter | Description | Default | |-----------|-------------|---------| -| `image.registry` | Container registry | `CHANGE_ME` (must be set explicitly) | -| `image.repository` | Image repository | `openshift-hyperfleet/hyperfleet-api` | -| `image.tag` | Image tag | `latest` | +| `image.registry` | Container registry | `CHANGE_ME` (must be set) | +| `image.repository` | Image repository | `CHANGE_ME` (must be set) | +| `image.tag` | Image tag | `""` (must be set) | | `image.pullPolicy` | Image pull policy | `Always` | +| `config.server.jwt.enabled` | Enable JWT authentication | `false` | | `config.adapters.required.cluster` | Cluster adapters required for Reconciled state | `[]` | | `config.adapters.required.nodepool` | Nodepool adapters required for Reconciled state | `[]` | -| `config.server.jwt.enabled` | Enable JWT authentication | `true` | | `database.postgresql.enabled` | Enable built-in PostgreSQL | `true` | | 
`database.external.enabled` | Use external database | `false` | -| `database.external.secretName` | Secret containing database credentials | `hyperfleet-db-external` | -| `serviceMonitor.enabled` | Enable Prometheus Operator ServiceMonitor | `false` | -| `serviceMonitor.interval` | Metrics scrape interval | `30s` | -| `serviceMonitor.scrapeTimeout` | Metrics scrape timeout | `10s` | -| `serviceMonitor.labels` | Additional labels for Prometheus selector | `{}` | -| `serviceMonitor.namespace` | Namespace for ServiceMonitor (if different) | `""` | +| `database.external.secretName` | Secret containing database credentials | `""` | +| `validationSchema.enabled` | Enable spec validation schema | `false` | | `replicaCount` | Number of API replicas | `1` | | `resources.limits.cpu` | CPU limit | `500m` | | `resources.limits.memory` | Memory limit | `512Mi` | | `podDisruptionBudget.enabled` | Enable PodDisruptionBudget | `false` | | `podDisruptionBudget.minAvailable` | Minimum available pods during disruption | `1` | -| `podDisruptionBudget.maxUnavailable` | Maximum unavailable pods during disruption | - | +| `serviceMonitor.enabled` | Enable Prometheus Operator ServiceMonitor | `false` | +| `serviceMonitor.interval` | Metrics scrape interval | `30s` | +| `serviceMonitor.scrapeTimeout` | Metrics scrape timeout | `10s` | +| `serviceMonitor.labels` | Additional labels for Prometheus selector | `{}` | +| `serviceMonitor.namespace` | Namespace for ServiceMonitor (if different) | `""` | + +See [Configuration Guide](config.md) for the complete application configuration reference and [`charts/values.yaml`](../charts/values.yaml) for all Helm-specific settings. 
-### Operations +--- -#### Check Deployment Status +## Operations + +### Check Deployment Status ```bash helm status hyperfleet-api --namespace hyperfleet-system @@ -343,7 +367,7 @@ kubectl get pods --namespace hyperfleet-system kubectl get svc --namespace hyperfleet-system ``` -#### View Logs +### View Logs ```bash kubectl logs -f deployment/hyperfleet-api --namespace hyperfleet-system @@ -353,7 +377,7 @@ kubectl logs -f -l app=hyperfleet-api --namespace hyperfleet-system kubectl logs -f statefulset/hyperfleet-postgresql --namespace hyperfleet-system ``` -#### Troubleshooting +### Troubleshooting ```bash kubectl describe pod --namespace hyperfleet-system @@ -366,9 +390,9 @@ kubectl get configmaps --namespace hyperfleet-system ### Health Checks The deployment includes: -- Liveness probe: `GET /healthz` (port 8080) - Returns 200 if the process is alive -- Readiness probe: `GET /readyz` (port 8080) - Returns 200 when ready to receive traffic, 503 during startup/shutdown -- Metrics: `GET /metrics` (port 9090) - Prometheus metrics endpoint +- Liveness probe: `GET /healthz` (port 8080) — returns 200 if the process is alive +- Readiness probe: `GET /readyz` (port 8080) — returns 200 when ready to receive traffic, 503 during startup/shutdown +- Metrics: `GET /metrics` (port 9090) — Prometheus metrics endpoint ### Scaling @@ -386,7 +410,7 @@ Enable autoscaling via Helm values (`autoscaling.enabled=true`). ### Monitoring -Prometheus metrics available at `http://:9090/metrics`. +Prometheus metrics are available at `http://:9090/metrics`. #### Prometheus Operator Integration @@ -395,39 +419,48 @@ Prometheus metrics available at `http://:9090/metrics`. 
helm install hyperfleet-api ./charts/ \ --namespace hyperfleet-system \ --set image.registry=quay.io \ + --set image.repository=openshift-hyperfleet/hyperfleet-api \ + --set image.tag=v1.0.0 \ --set serviceMonitor.enabled=true # With custom Prometheus selector labels -helm install hyperfleet-api ./charts/ \ - --namespace hyperfleet-system \ - --set image.registry=quay.io \ - --set serviceMonitor.enabled=true \ - --set serviceMonitor.labels.release=prometheus +--set serviceMonitor.labels.release=prometheus # ServiceMonitor in a different namespace -helm install hyperfleet-api ./charts/ \ - --namespace hyperfleet-system \ - --set image.registry=quay.io \ - --set serviceMonitor.enabled=true \ - --set serviceMonitor.namespace=monitoring +--set serviceMonitor.namespace=monitoring ``` -### Production Checklist +--- + +## Production Checklist Before deploying to production, ensure: +- [ ] **Image**: Specific version tag set (not `latest` or empty) - [ ] **Database**: External managed database configured (Cloud SQL, RDS, Azure Database) -- [ ] **Secrets**: Database credentials stored in Secret (not ConfigMap) -- [ ] **Authentication**: JWT enabled (`config.server.jwt.enabled=true`) +- [ ] **Secrets**: Database credentials stored in a Secret (not ConfigMap) +- [ ] **Authentication**: JWT enabled with issuer and JWK URL configured - [ ] **Adapters**: Required adapters specified for cluster and nodepool +- [ ] **Config file permissions**: Config files (`--config` / `HYPERFLEET_CONFIG`) must be operator-trusted — see [below](#configuration-file-security) - [ ] **Resources**: CPU/memory limits and requests set - [ ] **Replicas**: Multiple replicas configured (`replicaCount >= 2`) -- [ ] **Image**: Specific version tag (not `latest`) - [ ] **Disruption**: PodDisruptionBudget enabled (`podDisruptionBudget.enabled=true`) - [ ] **Monitoring**: ServiceMonitor enabled if using Prometheus Operator -- [ ] **TLS**: HTTPS enabled for API endpoint (optional) -### Complete Example: GKE 
Deployment +### Configuration File Security + +The configuration file path — set via `--config` or `HYPERFLEET_CONFIG` — is a trust boundary. The API validates configuration **content** on startup (unknown fields are rejected, required values are enforced, TLS/JWT/timeout settings are checked) and will refuse to start with an invalid configuration. However, **path and permission safety is the operator's responsibility**. The API reads whatever file the process can access at the given path without checking permissions or ownership. + +Ensure configuration files are: +- Owned by the service account running the API (e.g., `root:root` or a dedicated user) +- Mode `0600` (owner read/write only) or `0640` if group-readable access is needed +- Never world-writable + +In Helm deployments, the chart mounts the configuration as a ConfigMap volume at `/etc/hyperfleet/config.yaml` with default Kubernetes permissions, which satisfies these requirements. This guidance applies primarily to bare-metal or VM deployments where config files are managed directly on disk. + +--- + +## Complete Example: GKE Deployment ```bash # 1. Build and push image @@ -444,7 +477,7 @@ gcloud container clusters get-credentials my-cluster \ kubectl create namespace hyperfleet-system kubectl config set-context --current --namespace=hyperfleet-system -# 4. Create database secret (for production) +# 4. 
Create database secret kubectl create secret generic hyperfleet-db-external \ --from-literal=db.host=10.10.10.10 \ --from-literal=db.port=5432 \ @@ -460,6 +493,7 @@ helm install hyperfleet-api ./charts/ \ --set config.server.jwt.enabled=false \ --set database.postgresql.enabled=false \ --set database.external.enabled=true \ + --set database.external.secretName=hyperfleet-db-external \ --set 'config.adapters.required.cluster={validation,dns,pullsecret,hypershift}' \ --set 'config.adapters.required.nodepool={validation,hypershift}' @@ -474,125 +508,8 @@ curl http://localhost:8000/api/hyperfleet/v1/clusters --- -## Local Execution - -Run HyperFleet API directly on your machine without Helm or Kubernetes. Typical use cases: local development, debugging, integration testing. - -### Prerequisites - -- Go 1.25+, Podman, Make -- A running PostgreSQL instance (local container or external) - -### Configuration - -The application loads configuration in this priority order: **CLI flags > environment variables > config file > defaults**. - -**Config file:** Copy the example and adjust as needed: - -```bash -cp configs/config.yaml.example configs/config.yaml -``` - -The loader searches for a config file in this order: -1. `--config` flag (explicit path) -2. `HYPERFLEET_CONFIG` environment variable -3. `/etc/hyperfleet/config.yaml` (production default) -4. `./configs/config.yaml` (development default) - -If none are found, the command fails with `failed to load configuration`. - -**Environment variables:** Override any config value with the `HYPERFLEET_*` prefix: - -```bash -export HYPERFLEET_DATABASE_HOST=localhost -export HYPERFLEET_DATABASE_PORT=5432 -export HYPERFLEET_DATABASE_NAME=hyperfleet -export HYPERFLEET_DATABASE_USER=hyperfleet -export HYPERFLEET_DATABASE_PASSWORD=hyperfleet-dev-password -export HYPERFLEET_LOGGING_LEVEL=debug -export HYPERFLEET_SERVER_PORT=8000 -``` - -See [Configuration Guide](config.md) for the complete reference and all available settings. 
- -### Database Setup - -**Option A: Local PostgreSQL container (quickest)** - -```bash -make db/setup # Creates a PostgreSQL container via Podman -make db/login # Connect to the database for inspection -``` - -**Option B: External PostgreSQL** - -Point the config or environment variables to your PostgreSQL instance: - -```bash -export HYPERFLEET_DATABASE_HOST=my-postgres-host.example.com -export HYPERFLEET_DATABASE_PORT=5432 -export HYPERFLEET_DATABASE_NAME=hyperfleet -export HYPERFLEET_DATABASE_USER=hyperfleet -export HYPERFLEET_DATABASE_PASSWORD=my-password -export HYPERFLEET_DATABASE_SSL_MODE=require # for remote databases -``` - -### Running - -```bash -# 1. Generate code (required after clone) -make generate-all - -# 2. Build -make build - -# 3. Run migrations -./bin/hyperfleet-api migrate - -# 4. Start the server (no JWT auth) -make run-no-auth - -# Or start with auth enabled: -./bin/hyperfleet-api serve -``` - -### Schema Validation (Local) - -The API validates cluster and nodepool `spec` fields against an OpenAPI schema. Configure the schema path: - -```bash -# Via flag -./bin/hyperfleet-api serve --server-openapi-schema-path ./openapi/openapi.yaml - -# Via environment variable -export HYPERFLEET_SERVER_OPENAPI_SCHEMA_PATH=./openapi/openapi.yaml -``` - -The API **will fail to start** if the configured schema file is missing, unreadable, or invalid. 
- -### Endpoints - -Once running, the API is available at: - -- **REST API**: `http://localhost:8000/api/hyperfleet/v1/` -- **OpenAPI spec**: `http://localhost:8000/api/hyperfleet/v1/openapi` -- **Swagger UI**: `http://localhost:8000/api/hyperfleet/v1/openapi.html` -- **Liveness probe**: `http://localhost:8080/healthz` -- **Readiness probe**: `http://localhost:8080/readyz` -- **Metrics**: `http://localhost:9090/metrics` - -### CLI Subcommands - -```bash -./bin/hyperfleet-api serve # Start the HTTP server -./bin/hyperfleet-api migrate # Run database migrations -./bin/hyperfleet-api version # Print version, commit, and build date -``` - ---- - ## Related Documentation -- [Configuration Guide](config.md) - Complete configuration reference -- [Authentication](authentication.md) - Authentication configuration -- [Development Guide](development.md) - Local development setup and workflows +- [Configuration Guide](config.md) — Complete configuration reference +- [Authentication](authentication.md) — Authentication configuration +- [Development Guide](development.md) — Local execution, development setup, and workflows diff --git a/docs/development.md b/docs/development.md index ecb4b050..455477d0 100644 --- a/docs/development.md +++ b/docs/development.md @@ -6,14 +6,14 @@ This guide covers the complete development workflow for HyperFleet API, from ini Before running hyperfleet-api, ensure these prerequisites are installed. See [PREREQUISITES.md](../PREREQUISITES.md) for detailed installation instructions. -- **Go 1.24 or higher** +- **Go 1.25 or higher** - **Podman** -- **PostgreSQL 13+** +- **PostgreSQL 14+** - **Make** Verify installations: ```bash -go version # Should show 1.24+ +go version # Should show 1.25+ podman version make --version ``` @@ -32,11 +32,11 @@ go mod download # 3. Build the binary make build -# 4. Setup PostgreSQL database +# 4. Setup PostgreSQL database (see Database Setup below) make db/setup # 5. 
Run database migrations -./bin/hyperfleet-api migrate +make db/migrate # 6. Verify database schema make db/login @@ -45,42 +45,58 @@ make db/login **Important**: Generated code is not tracked in git. You must run `make generate-all` after cloning to generate both OpenAPI models and mocks. -## Pre-commit Hooks (Optional) +## Configuration -This project uses pre-commit hooks for code quality and security checks. +The application loads configuration in this priority order: **CLI flags > environment variables > config file > defaults**. -### Setup +**Config file:** Copy the example and adjust as needed: ```bash -# Install pre-commit -brew install pre-commit # macOS -# or -pip install pre-commit +cp configs/config.yaml.example configs/config.yaml +``` -# Install hooks -pre-commit install -pre-commit install --hook-type pre-push +The loader searches for a config file in this order: +1. `--config` flag (explicit path) +2. `HYPERFLEET_CONFIG` environment variable +3. `/etc/hyperfleet/config.yaml` (production default) +4. `./configs/config.yaml` (development default) -# Test -pre-commit run --all-files +If none are found, the application continues normally using environment variables and CLI flags. + +**Environment variables:** Override any config value with the `HYPERFLEET_*` prefix: + +```bash +export HYPERFLEET_DATABASE_HOST=localhost +export HYPERFLEET_DATABASE_PORT=5432 +export HYPERFLEET_DATABASE_NAME=hyperfleet +export HYPERFLEET_DATABASE_USER=hyperfleet +export HYPERFLEET_DATABASE_PASSWORD=hyperfleet-dev-password +export HYPERFLEET_LOGGING_LEVEL=debug +export HYPERFLEET_SERVER_PORT=8000 ``` -### For External Contributors +See [Configuration Guide](config.md) for the complete reference and all available settings. -The `.pre-commit-config.yaml` includes `rh-pre-commit` which requires access to Red Hat's internal GitLab. 
External contributors can skip it: +## Database Setup + +**Option A: Local PostgreSQL container (quickest)** ```bash -# Skip internal hook when committing -SKIP=rh-pre-commit git commit -m "your message" +make db/setup # Creates a PostgreSQL container via Podman +make db/login # Connect to the database for inspection ``` -Or comment out the internal hook in `.pre-commit-config.yaml`. +**Option B: External PostgreSQL** -### Update Hooks +Point the config or environment variables to your PostgreSQL instance: ```bash -pre-commit autoupdate -pre-commit run --all-files +export HYPERFLEET_DATABASE_HOST=my-postgres-host.example.com +export HYPERFLEET_DATABASE_PORT=5432 +export HYPERFLEET_DATABASE_NAME=hyperfleet +export HYPERFLEET_DATABASE_USER=hyperfleet +export HYPERFLEET_DATABASE_PASSWORD=my-password +export HYPERFLEET_DATABASE_SSL_MODE=require # for remote databases ``` ## Running the Service @@ -93,13 +109,7 @@ make run-no-auth **Note**: The default runtime environment is `production` (JWT and TLS enabled). The `make run-no-auth` target explicitly disables authentication for local development. If running the binary directly, set `HYPERFLEET_ENV=development` or use `--server-jwt-enabled=false`. -The service starts on `localhost:8000`: -- REST API: `http://localhost:8000/api/hyperfleet/v1/` -- OpenAPI spec: `http://localhost:8000/api/hyperfleet/v1/openapi` -- Swagger UI: `http://localhost:8000/api/hyperfleet/v1/openapi.html` -- Liveness probe: `http://localhost:8080/healthz` -- Readiness probe: `http://localhost:8080/readyz` -- Metrics: `http://localhost:9090/metrics` +The service starts on `localhost:8000` — see [Accessing the API](../README.md#accessing-the-api) for all available endpoints. ### Testing the API @@ -131,7 +141,29 @@ curl -H "Authorization: Bearer ${TOKEN}" \ http://localhost:8000/api/hyperfleet/v1/clusters ``` -See [Deployment](deployment.md) and [Authentication](authentication.md) for complete configuration options. 
+See [Deployment](deployment.md) for Kubernetes/Helm deployment and [Authentication](authentication.md) for JWT configuration. + +### Schema Validation (Local) + +The API validates cluster and nodepool `spec` fields against an OpenAPI schema. Configure the schema path: + +```bash +# Via flag +./bin/hyperfleet-api serve --server-openapi-schema-path ./openapi/openapi.yaml + +# Via environment variable +export HYPERFLEET_SERVER_OPENAPI_SCHEMA_PATH=./openapi/openapi.yaml +``` + +The API **will fail to start** if the configured schema file is missing, unreadable, or invalid. + +### CLI Subcommands + +```bash +./bin/hyperfleet-api serve # Start the HTTP server +./bin/hyperfleet-api migrate # Run database migrations +./bin/hyperfleet-api version # Print version, commit, and build date +``` ## Testing @@ -292,6 +324,45 @@ bingo list Tool versions are tracked in `.bingo/*.mod` files and loaded automatically via `include .bingo/Variables.mk` in the Makefile. + +### Pre-commit Hooks (Optional) + +This project uses pre-commit hooks for code quality and security checks. + +#### Setup + +```bash +# Install pre-commit +brew install pre-commit # macOS +# or +pip install pre-commit + +# Install hooks +pre-commit install +pre-commit install --hook-type pre-push + +# Test +pre-commit run --all-files +``` + +#### For External Contributors + +The `.pre-commit-config.yaml` includes `rh-pre-commit` which requires access to Red Hat's internal GitLab. External contributors can skip it: + +```bash +# Skip internal hook when committing +SKIP=rh-pre-commit git commit -m "your message" +``` + +Or comment out the internal hook in `.pre-commit-config.yaml`. + +#### Update Hooks + +```bash +pre-commit autoupdate +pre-commit run --all-files +``` + ### Making Changes 1. 
**Create a feature branch**: @@ -382,6 +453,8 @@ make test-integration ## Related Documentation -- [Database](database.md) - Database schema and migrations -- [Deployment](deployment.md) - Container and Kubernetes deployment -- [API Resources](api-resources.md) - API endpoints and data models +- [Configuration Guide](config.md) — Complete configuration reference +- [Database](database.md) — Database schema and migrations +- [Deployment](deployment.md) — Kubernetes/Helm deployment (ops) +- [Authentication](authentication.md) — Authentication configuration +- [API Resources](api-resources.md) — API endpoints and data models From bcf389630c010f1ccd3239eb4b2c77dffd2f8001 Mon Sep 17 00:00:00 2001 From: sherine-k Date: Mon, 15 Jun 2026 14:42:19 +0200 Subject: [PATCH 5/5] HYPERFLEET-1168 - docs: fix review findings and reorganize delete lifecycle --- README.md | 28 ++++++---------------------- docs/api-resources.md | 22 +++++++++++++++++++++- docs/database.md | 15 +-------------- docs/deployment.md | 9 ++++++--- 4 files changed, 34 insertions(+), 40 deletions(-) diff --git a/README.md b/README.md index c71a56ed..3e96a344 100755 --- a/README.md +++ b/README.md @@ -1,31 +1,15 @@ # HyperFleet API -HyperFleet API - Simple REST API for cluster lifecycle management. Provides CRUD operations for clusters and status sub-resources. Pure data layer with PostgreSQL integration - no business logic or event creation. Stateless design enables horizontal scaling. +The HyperFleet API is the data storage and status aggregation layer of the HyperFleet platform. It exposes a REST API for CRUD operations on customizable entities (e.g. Cluster, NodePool) backed by PostgreSQL. -## Architecture +The API is the source of truth for desired state of resources that live in remote clusters. It persists resource specs, increments generation on spec changes, and aggregates adapter-reported conditions into Kubernetes-style status. +It does not reconcile infrastructure or publish events itself. 
For that, it collaborates with other HyperFleet components: -### Core Features +* The **[Sentinel](https://github.com/openshift-hyperfleet/hyperfleet-sentinel)** component polls the API for unreconciled resources and publishes messages that trigger reconciliation. +* The **[Adapter](https://github.com/openshift-hyperfleet/hyperfleet-adapter)** component listens for those events, performs the actions needed to reconcile a resource, and reports the status back to the API. -* OpenAPI 3.0 specification -* Automated Go code generation from OpenAPI -* Cluster and NodePool lifecycle management (create, patch, delete, force-delete) -* Generic resource types (WifConfigs, Channels, Versions) via plugin-based registration -* Adapter-based status reporting with Kubernetes-style conditions -* Soft-delete with adapter finalization and force-delete for stuck resources -* Descriptor-driven delete policies (restrict/cascade) for generic resources -* Configurable caller identity for audit fields (HTTP header or JWT claim) -* Runtime spec validation against custom OpenAPI schemas -* Pagination and search capabilities - -### Technology Stack - -- **Language**: Go 1.25+ -- **API Definition**: OpenAPI 3.0 -- **Code Generation**: oapi-codegen -- **Database**: PostgreSQL with GORM ORM -- **Container Runtime**: Podman -- **Testing**: Gomega + Resty +The stateless design enables horizontal scaling. Adapters fetch the full resource state from the API after receiving minimal CloudEvents (anemic events pattern). ## Getting Started diff --git a/docs/api-resources.md b/docs/api-resources.md index c0636c15..184cb93f 100644 --- a/docs/api-resources.md +++ b/docs/api-resources.md @@ -338,6 +338,8 @@ Updates a cluster's `spec` and/or `labels`. Only the fields provided in the requ Soft-deletes a cluster. 
Sets `deleted_time` and `deleted_by`, increments `generation`, and cascades deletion to child nodepools according to the deletion policy: nodepools with required adapters are soft-deleted (their `deleted_time` and `deleted_by` are set and `generation` is incremented, entering **Finalizing**), while nodepools without required adapters are hard-deleted immediately. The cluster itself enters a **Finalizing** state — it remains in the database until adapters report `Finalized=True`, at which point it is hard-deleted automatically. +For more information, see [Delete Lifecycle](#delete-lifecycle). + **Response (202 Accepted):**
@@ -372,7 +374,9 @@ Once a cluster is soft-deleted, creating or updating child nodepools returns `40 **POST** `/api/hyperfleet/v1/clusters/{cluster_id}/force-delete` -Permanently removes a cluster that is stuck in the Finalizing state. This bypasses the normal adapter finalization flow — use it only when adapters are unable to report `Finalized=True`. The cluster, all its child nodepools, and all associated adapter statuses are hard-deleted immediately. The caller and reason are recorded in an audit log entry before deletion. +Permanently removes a cluster that is stuck in the Finalizing state. This bypasses the normal adapter finalization flow — use it only when adapters are unable to report `Finalized=True`. See [delete lifecycle](#delete-lifecycle) for more details. + +The cluster, all its child nodepools, and all associated adapter statuses are hard-deleted immediately. The caller and reason are recorded in an audit log entry before deletion. The cluster **must** already be soft-deleted (have a `deleted_time`). Calling force-delete on an active cluster returns `409 Conflict`. @@ -558,6 +562,8 @@ Updates a nodepool's `spec` and/or `labels`. Same semantics as [Patch Cluster](# Soft-deletes a nodepool. Same lifecycle as [Delete Cluster](#delete-cluster) — sets `deleted_time` and `deleted_by`, enters the Finalizing state, and is hard-deleted when adapters report `Finalized=True`. +For more information, please refer to [delete lifecycle](#delete-lifecycle). + **Response (202 Accepted):** Full nodepool resource with `deleted_time` and `deleted_by` fields set. ### Force Delete NodePool @@ -576,6 +582,20 @@ Same semantics as [Force Delete Cluster](#force-delete-cluster). The nodepool mu **Response:** `204 No Content` +## Delete Lifecycle + +Resources follow a three-phase delete lifecycle: + +```text +Active ──(DELETE)──▶ Finalizing ──(adapters report Finalized=True)──▶ Hard-Deleted + │ + └──(POST /force-delete)──▶ Hard-Deleted +``` + +1. **Active** — Normal state. 
Resource is visible in list queries and can be updated. +2. **Finalizing** (soft-deleted) — `DELETE` sets `deleted_time` and `deleted_by`, increments `generation`. The resource stays in the database so adapters can observe the deletion and clean up external state. Soft-deleted records are excluded from list queries by default. Creating new child resources under a finalizing parent is rejected with `409 Conflict`. +3. **Hard-Deleted** — Permanently removed from the database. This happens automatically when all required adapters report `Finalized=True` at the current generation. If adapters are stuck, `POST .../force-delete` bypasses the adapter gating and hard-deletes immediately — but the resource must already be in Finalizing state; calling force-delete on an active resource returns `409 Conflict`. Repeated force-delete calls after hard-deletion return `404 Not Found`. Cluster force-delete cascades to all child NodePools and their adapter statuses. NodePool force-delete only removes the NodePool and its adapter statuses. + ## Pagination and Search ### Pagination diff --git a/docs/database.md b/docs/database.md index 3996ff60..e41170f8 100644 --- a/docs/database.md +++ b/docs/database.md @@ -57,19 +57,6 @@ Flexible schema storage for: - Runtime validation against OpenAPI schema - PostgreSQL JSON query capabilities -### Delete Lifecycle - -Resources follow a three-phase delete lifecycle: - -```text -Active ──(DELETE)──▶ Finalizing ──(adapters report Finalized=True)──▶ Hard-Deleted - │ - └──(POST /force-delete)──▶ Hard-Deleted -``` - -1. **Active** — Normal state. Resource is visible in list queries and can be updated. -2. **Finalizing** (soft-deleted) — `DELETE` sets `deleted_time` and `deleted_by`, increments `generation`. The resource stays in the database so adapters can observe the deletion and clean up external state. Soft-deleted records are excluded from list queries by default. 
Creating new child resources under a finalizing parent is rejected with `409 Conflict`. -3. **Hard-Deleted** — Permanently removed from the database. This happens automatically when all required adapters report `Finalized=True` at the current generation. If adapters are stuck, `POST .../force-delete` bypasses the adapter gating and hard-deletes immediately — but the resource must already be in Finalizing state; calling force-delete on an active resource returns `409 Conflict`. Repeated force-delete calls after hard-deletion return `404 Not Found`. Cluster force-delete cascades to all child NodePools and their adapter statuses. NodePool force-delete only removes the NodePool and its adapter statuses. Adapter statuses do not use soft delete — they are hard-deleted when their parent resource is hard-deleted. @@ -82,7 +69,7 @@ Generic resources (the `resources` table) use delete policies to control child b | `restrict` | Parent delete is rejected with `409 Conflict` if active children exist | | `cascade` | All children are soft-deleted (marked Finalizing) along with the parent | -Policies are enforced recursively — a cascade on a parent triggers policy checks on grandchildren. For clusters and nodepools, the cascade is built-in: deleting a cluster always cascades the soft-delete to all its nodepools. +Policies are enforced recursively — a cascade on a parent triggers policy checks on children. For clusters and nodepools, the cascade is built-in: deleting a cluster cascades to all its nodepools — those with required adapters are soft-deleted (entering Finalizing), while those without are hard-deleted immediately. Resources without required adapters skip the Finalizing phase entirely — they are hard-deleted immediately on `DELETE`. 
diff --git a/docs/deployment.md b/docs/deployment.md index 75d98f19..81c87d91 100644 --- a/docs/deployment.md +++ b/docs/deployment.md @@ -10,11 +10,12 @@ For running the binary directly on your machine (development, debugging), see th Before deploying, ensure you have: -- **Kubernetes cluster** (1.25+) with **Helm 3** installed +- **Kubernetes cluster** (1.25+) +- **Helm 3** CLI - **PostgreSQL database** — either: - An external managed instance (Cloud SQL, RDS, Azure Database) for production, or - The chart's built-in PostgreSQL pod for evaluation and testing -- **Container image** — a released hypershift-api image, a pre-built image from your registry, or build your own: +- **Container image** — a released hyperfleet-api image, a pre-built image from your registry, or build your own: ```bash make image \ IMAGE_REGISTRY=quay.io/yourorg \ @@ -154,7 +155,7 @@ The chart injects database credentials as environment variables using `secretKey ## Configuring Authentication -JWT authentication is **disabled by default** in the Helm chart. To enable it: +JWT authentication is **disabled by default** in the Helm chart. To enable it, set the `config.server.jwt.*` values: ```bash helm install hyperfleet-api ./charts/ \ @@ -265,6 +266,8 @@ helm upgrade hyperfleet-api ./charts/ \ --set image.tag=v1.1.0 ``` +If the new version includes schema changes, the database migration runs automatically during the upgrade. See [Migration](./database.md#migration-system). + ### Uninstall ```bash