
refactor(hardware): replace flat requirements lookup with per-node RequirementsProvider #796

Merged: alex-au merged 1 commit into main from 00684-refactor-hardware-requirements-provider on Jul 3, 2026


alex-au commented Jul 1, 2026

Description

Replaces the static 2-D requirementsRegistry map in pkg/hardware/requirements.go with a RequirementsProvider interface. Each node type owns a []Rule list; predicates fire on DeploymentSpec{NodeType, Profile, Options}, and Reduce() accumulates contributions with Max for CPU/memory and Sum for storage.
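A minimal Go sketch of the rules engine described above, assuming hypothetical field names (CPUCores, MemoryGB, StorageGB, Why, When, Contrib) and a hypothetical ruleProvider type; the PR's real definitions in pkg/hardware/rules.go and provider.go may differ, and the interface here is trimmed to Compute only. It shows the key property: a Contribution carries no operator, and Reduce alone decides that CPU and memory combine with Max while storage combines with Sum.

```go
package main

// DeploymentSpec is what every rule predicate is evaluated against.
type DeploymentSpec struct {
	NodeType string
	Profile  string
	Options  map[string]string
}

// Contribution is what a firing rule adds. It has no Op field:
// the reducer alone decides how each resource combines.
type Contribution struct {
	CPUCores  int
	MemoryGB  int
	StorageGB int
}

// Rule pairs a predicate with its contribution and a human-readable reason.
type Rule struct {
	Why     string
	When    func(DeploymentSpec) bool
	Contrib Contribution
}

// Requirements is the reduced floor for one DeploymentSpec.
type Requirements struct {
	CPUCores  int
	MemoryGB  int
	StorageGB int
}

// RequirementsProvider is implemented once per node type (trimmed to Compute).
type RequirementsProvider interface {
	Compute(spec DeploymentSpec) Requirements
}

// ruleProvider (hypothetical) backs a provider with a fixed rule list.
type ruleProvider struct {
	rules []Rule
}

func (p ruleProvider) Compute(spec DeploymentSpec) Requirements {
	return Reduce(p.rules, spec)
}

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// Reduce folds the contributions of every rule whose predicate fires:
// Max for CPU and memory, Sum for storage.
func Reduce(rules []Rule, spec DeploymentSpec) Requirements {
	var req Requirements
	for _, r := range rules {
		if !r.When(spec) {
			continue
		}
		req.CPUCores = maxInt(req.CPUCores, r.Contrib.CPUCores)
		req.MemoryGB = maxInt(req.MemoryGB, r.Contrib.MemoryGB)
		req.StorageGB += r.Contrib.StorageGB
	}
	return req
}
```

With a base rule of 2 vCPU / 8 GB / 100 GB and a preset rule of 4 vCPU / 4 GB / 50 GB, a spec carrying that preset reduces to 4 vCPU / 8 GB / 150 GB: the larger CPU and memory values win, and the disk floors add up. One plausible reading of this split is that co-located components share CPU and memory but each needs its own disk.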

Apart from the block-node floors updated below, all existing hardware floors are reproduced exactly with empty Options (regression-safe). Plugin-preset-aware sizing is wired end-to-end through both block node check and block node install: both commands build a DeploymentSpec with Options["preset"], which flows into NewNodeSafetyCheckWorkflow. Failing checks call ComputeWithWhy() and attach the binding rule's Why string via ErrPropertyWhyFloor; the doctor compact panel surfaces it as " Set by: <reason>" below the Cause line.
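The why-attribution step can be sketched the same way. This is an illustration under stated assumptions, not the PR's ComputeWithWhy: the binding rule for CPU or memory is taken to be the firing rule with the strictly largest contribution (so the earliest rule wins ties), and because storage is summed, every rule that contributed disk is listed. The Rule, Requirements, and Why types and their fields are hypothetical.

```go
package main

import "strings"

type DeploymentSpec struct {
	Profile string
	Options map[string]string
}

// Rule is a trimmed-down rule: a reason, a predicate, and three floors.
type Rule struct {
	Why                      string
	When                     func(DeploymentSpec) bool
	CPU, MemoryGB, StorageGB int
}

type Requirements struct {
	CPU, MemoryGB, StorageGB int
}

// Why holds, per resource, the reason(s) behind the binding floor.
type Why struct {
	CPU, Memory, Storage string
}

// ComputeWithWhy reduces like Compute (Max for CPU/memory, Sum for storage)
// and records which firing rule(s) set each value.
func ComputeWithWhy(rules []Rule, spec DeploymentSpec) (Requirements, Why) {
	var req Requirements
	var why Why
	var storageWhys []string
	for _, r := range rules {
		if !r.When(spec) {
			continue
		}
		// Strictly greater: on a tie the earlier rule keeps the attribution.
		if r.CPU > req.CPU {
			req.CPU, why.CPU = r.CPU, r.Why
		}
		if r.MemoryGB > req.MemoryGB {
			req.MemoryGB, why.Memory = r.MemoryGB, r.Why
		}
		// Storage is summed, so every rule that adds disk shares the attribution.
		if r.StorageGB > 0 {
			req.StorageGB += r.StorageGB
			storageWhys = append(storageWhys, r.Why)
		}
	}
	why.Storage = strings.Join(storageWhys, "; ")
	return req, why
}
```

Calling this only in the failing branch, as the checklist notes for preflight.go, keeps the extra bookkeeping off the passing path.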

Block-node hardware floors are updated to the BN team's confirmed numbers (Slack thread 2026-06-17): testnet/perfnet LFH (n2d-standard-16) 16 vCPU / 64 GB / 5 TB; testnet/perfnet RFH (c3d-standard-8) 8 vCPU / 32 GB / 150 GB; previewnet LFH 16 vCPU / 64 GB / 3 TB; previewnet RFH 8 vCPU / 32 GB / 150 GB; mainnet cloud (n2d-highmem-32) 32 vCPU / 256 GB / 150 GB. A not predicate combinator was added so that LFH and RFH rules are mutually exclusive; without it, a tier1-rfh spec would fire both rules, so the Max reducer would always select the larger LFH CPU and memory numbers and the Sum reducer would add both storage floors.
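The mutual-exclusion trick can be illustrated with small predicate combinators named after the ones in pkg/hardware/predicates.go (profilePredicate, presetPredicate, and, not). Their bodies, the rule struct, and the reduce helper below are hypothetical; only the testnet numbers and the why strings come from this PR. Perfnet and previewnet rules are omitted for brevity.

```go
package main

type DeploymentSpec struct {
	NodeType string
	Profile  string
	Options  map[string]string
}

type predicate func(DeploymentSpec) bool

func profilePredicate(profile string) predicate {
	return func(s DeploymentSpec) bool { return s.Profile == profile }
}

// presetPredicate matches Options["preset"]; a nil Options map reads as "".
func presetPredicate(preset string) predicate {
	return func(s DeploymentSpec) bool { return s.Options["preset"] == preset }
}

// and fires only when every operand fires.
func and(ps ...predicate) predicate {
	return func(s DeploymentSpec) bool {
		for _, p := range ps {
			if !p(s) {
				return false
			}
		}
		return true
	}
}

// not inverts a predicate; it keeps the LFH and RFH rules mutually exclusive.
func not(p predicate) predicate {
	return func(s DeploymentSpec) bool { return !p(s) }
}

type rule struct {
	why                      string
	when                     predicate
	cpu, memoryGB, storageGB int
}

var blockRules = []rule{
	{
		why:  "block node testnet LFH: n2d-standard-16, 5 TB local disk",
		when: and(profilePredicate("testnet"), not(presetPredicate("tier1-rfh"))),
		cpu:  16, memoryGB: 64, storageGB: 5000,
	},
	{
		why:  "block node testnet RFH: c3d-standard-8, 150 GB local disk",
		when: and(profilePredicate("testnet"), presetPredicate("tier1-rfh")),
		cpu:  8, memoryGB: 32, storageGB: 150,
	},
}

// reduce applies Max to CPU and memory and Sum to storage.
func reduce(rules []rule, s DeploymentSpec) (cpu, memoryGB, storageGB int) {
	for _, r := range rules {
		if !r.when(s) {
			continue
		}
		if r.cpu > cpu {
			cpu = r.cpu
		}
		if r.memoryGB > memoryGB {
			memoryGB = r.memoryGB
		}
		storageGB += r.storageGB
	}
	return cpu, memoryGB, storageGB
}
```

With no preset only the LFH rule fires (16 vCPU / 64 GB / 5000 GB); with Options["preset"] set to tier1-rfh only the RFH rule fires (8 / 32 / 150). Dropping the not(...) guard would make a tier1-rfh spec fire both rules, so Max would report 16 vCPU and Sum would report 5150 GB.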

SubstrateProvider is registered under "k8s-substrate" for use by #685.
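A sketch of a copy-on-read provider registry with the substrate floor registered under "k8s-substrate". The register helper, the mutex, and the init-time registration are assumptions about how provider.go might work; the 2 CPU / 2 GB / 20 GB floor is the one listed for provider_substrate.go in this PR.

```go
package main

import "sync"

type DeploymentSpec struct {
	NodeType string
	Profile  string
	Options  map[string]string
}

type Requirements struct {
	CPUCores, MemoryGB, StorageGB int
}

type RequirementsProvider interface {
	Compute(spec DeploymentSpec) Requirements
}

// substrateProvider returns a fixed Kubernetes substrate floor.
type substrateProvider struct{}

func (substrateProvider) Compute(DeploymentSpec) Requirements {
	return Requirements{CPUCores: 2, MemoryGB: 2, StorageGB: 20}
}

var (
	registryMu sync.RWMutex
	registry   = map[string]RequirementsProvider{}
)

// register (hypothetical) adds or replaces a provider under key.
func register(key string, p RequirementsProvider) {
	registryMu.Lock()
	defer registryMu.Unlock()
	registry[key] = p
}

// Providers returns a fresh copy of the registry on every call,
// so callers can never mutate the shared map.
func Providers() map[string]RequirementsProvider {
	registryMu.RLock()
	defer registryMu.RUnlock()
	out := make(map[string]RequirementsProvider, len(registry))
	for k, v := range registry {
		out[k] = v
	}
	return out
}

func init() {
	register("k8s-substrate", substrateProvider{})
}
```

Because Providers() hands back a fresh map, a caller that deletes or overwrites an entry in its copy cannot change what later callers see.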

Note: block node check always runs hardware checks, regardless of --skip-hardware-checks. NewBlockNodePreflightCheckWorkflow hardcodes skipHardwareChecks: false because the command's sole purpose is to validate hardware. --skip-hardware-checks is meaningful only for block node install and other workflow commands that go through NodeSetupWorkflow.

Files changed

File | Change
--- | ---
pkg/hardware/provider.go | New RequirementsProvider interface (Compute, ComputeWithWhy), DeploymentSpec, copy-on-read Providers() registry
pkg/hardware/rules.go | New Rule, Contribution (no Op field), Reduce() with hardcoded Max/Sum reducer
pkg/hardware/predicates.go | Unexported predicate combinators: profilePredicate, presetPredicate, hasPlugin, anyProfile, and, or, not (new: inverts a predicate; needed to make LFH/RFH rules mutually exclusive), always
pkg/hardware/provider_block.go | Block-node rules (vCPU/GB RAM/GB disk) aligned with BN team Slack numbers (2026-06-17): local 3/1/1; testnet/perfnet LFH 16/64/5000 and RFH 8/32/150; previewnet LFH 16/64/3000 and RFH 8/32/150; mainnet cloud RFH 32/256/150; mainnet LFH (bare metal, no preset) → no rule fires → no-op check
pkg/hardware/provider_consensus.go | Consensus stub returning today's numbers via Reduce
pkg/hardware/provider_substrate.go | K8s substrate floor: 2 CPU / 2 GB / 20 GB; registered under "k8s-substrate"
pkg/hardware/requirements.go | Stripped to OS constants only; requirementsRegistry and GetRequirements removed
pkg/hardware/node_spec.go | NewNodeSpec(spec DeploymentSpec, host HostProfile); delegates to the provider
pkg/hardware/factory.go | CreateNodeSpec(spec DeploymentSpec, host HostProfile) new signature; added SupportedNodeTypes()
pkg/hardware/hardware_test.go | All call sites updated to DeploymentSpec; mainnet and previewnet test cases updated for new numbers
pkg/hardware/requirements_test.go | GetRequirements calls replaced; previewnet block-node comparisons updated; mainnet block no-preset explicitly asserted as no-op
pkg/hardware/provider_block_test.go | Rule-level unit tests for each profile×preset combination; TestNotPredicate and TestBlockNodeProvider_MainnetLFH added
pkg/hardware/provider_consensus_test.go | Consensus stub: asserts current numbers reproduced
pkg/hardware/provider_substrate_test.go | Substrate: asserts K8s floor values
internal/workflows/preflight.go | NewNodeSafetyCheckWorkflow(spec hardware.DeploymentSpec, skipHardwareChecks bool); failed checks attach ErrPropertyWhyFloor via ComputeWithWhy
internal/workflows/setup.go | NodeSetupWorkflow gains pluginPreset string; a non-empty value is placed in DeploymentSpec.Options["preset"]
internal/workflows/cluster.go | InstallClusterWorkflow gains pluginPreset string; forwarded to NodeSetupWorkflow
internal/workflows/blocknode.go | NewBlockNodePreflightCheckWorkflow(spec hardware.DeploymentSpec); always runs checks (skipHardwareChecks: false)
internal/bll/blocknode/install_handler.go | Passes ins.PluginPreset to InstallClusterWorkflow so the install path enforces preset-aware floors
cmd/cli/commands/block/node/check.go | Validates --plugins, resolves --plugins/--plugin-preset precedence (mirrors init.go); builds normalised DeploymentSpec with Options
cmd/cli/commands/kube/cluster/install.go | Updated call site; passes "" (no preset for kube cluster install)
pkg/models/errors.go | Added ErrPropertyWhyFloor errorx property
internal/doctor/diagnose.go | Added WhyFloor to ErrorDiagnosis; findWhyFloor() helper; displays " Set by: <why>" below Cause in checkErrCompact
docs/quickstart.md | Documents --plugin-preset and --plugins for block node check
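The preset plumbing in setup.go amounts to one conditional. The helper name newDeploymentSpec is hypothetical; the behaviour it sketches (a non-empty preset lands in Options["preset"], while an empty one leaves that key absent so no preset rule fires) follows the setup.go and kube cluster install rows above.

```go
package main

type DeploymentSpec struct {
	NodeType string
	Profile  string
	Options  map[string]string
}

// newDeploymentSpec (hypothetical) mirrors what NodeSetupWorkflow is described
// as doing with its pluginPreset argument.
func newDeploymentSpec(nodeType, profile, pluginPreset string) DeploymentSpec {
	spec := DeploymentSpec{
		NodeType: nodeType,
		Profile:  profile,
		Options:  map[string]string{},
	}
	// Only a non-empty preset is recorded; "" (e.g. from kube cluster install)
	// leaves the key absent, so preset-guarded rules do not fire.
	if pluginPreset != "" {
		spec.Options["preset"] = pluginPreset
	}
	return spec
}
```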

Review guide

Code review checklist

  • rules.go: Reduce() uses Max for CPU and memory, Sum for storage (not the other way around)
  • provider_block.go: LFH and RFH rules are mutually exclusive via not(presetPredicate("tier1-rfh")); without this, Max would always pick the larger LFH numbers, since RFH has lower requirements
  • provider_block.go: numbers match the BN team Slack thread (2026-06-17): testnet/perfnet LFH 16/64/5000; RFH 8/32/150; previewnet LFH 16/64/3000; RFH 8/32/150; mainnet cloud 32/256/150; mainnet LFH no-op
  • blocknode.go: NewBlockNodePreflightCheckWorkflow hardcodes skipHardwareChecks: false; --skip-hardware-checks has no effect on block node check (intentional: the command's purpose is to run the checks)
  • setup.go: NodeSetupWorkflow passes pluginPreset into DeploymentSpec.Options; an empty string means no preset rules fire
  • install_handler.go: ins.PluginPreset is now forwarded; before this PR it was silently dropped at the InstallClusterWorkflow call site
  • errors.go: ErrPropertyWhyFloor is declared in pkg/models/ (not internal/doctor/) to avoid circular imports
  • diagnose.go: findWhyFloor() walks the full errorx cause chain, not just the top error
  • preflight.go: ComputeWithWhy is called in the failing branch only (not on every check) to avoid double-compute
  • No errors.New or fmt.Errorf: all production errors use errorx (enforced by the forbidigo linter; task lint is clean)
  • All new .go files carry the // SPDX-License-Identifier: Apache-2.0 header
  • provider_substrate.go: registered under the "k8s-substrate" key (consumed by #685, refactor(cli): drop --profile and --node-type from kube cluster install)
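findWhyFloor() walks the whole cause chain rather than inspecting only the top error. The PR does this over errorx errors and the ErrPropertyWhyFloor property; the sketch below substitutes a hypothetical whyFloorError type and the standard library's errors.Unwrap purely to stay self-contained, so it illustrates the traversal, not the errorx API.

```go
package main

import "errors"

// whyFloorError is a hypothetical stand-in for an errorx error that may carry
// the ErrPropertyWhyFloor property.
type whyFloorError struct {
	msg   string
	why   string // empty when this link carries no attribution
	cause error
}

func (e *whyFloorError) Error() string { return e.msg }

// Unwrap exposes the cause so errors.Unwrap can walk the chain.
func (e *whyFloorError) Unwrap() error { return e.cause }

// findWhyFloor returns the first non-empty why found anywhere in the chain,
// not just on the top-level error.
func findWhyFloor(err error) (string, bool) {
	for e := err; e != nil; e = errors.Unwrap(e) {
		if w, ok := e.(*whyFloorError); ok && w.why != "" {
			return w.why, true
		}
	}
	return "", false
}
```

Note that errors.Unwrap follows only single-cause chains; errors joined via an Unwrap() []error method would need a breadth-first walk instead.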

Unit tests

# macOS — no Linux-only deps in these packages:
go test -tags='!integration' -run . ./pkg/hardware/... ./pkg/models/...

# Full unit suite (UTM VM required for Linux-only packages):
task vm:test:unit

# Lint
task lint

Manual UAT

Build for Linux and run inside the UTM VM (or any amd64/arm64 Linux host).

task build:cli GOOS=linux GOARCH=amd64
# scp binary to the VM
  1. Default check (no preset) — testnet LFH requires 16 cores / 5 TB

    sudo solo-provisioner block node check --profile testnet
    # On a VM with < 16 cores or < 5 TB local disk:
    #   Expected: FAIL — CPU or storage validation
    #   Expected panel: "  Set by: block node testnet LFH: n2d-standard-16, 5 TB local disk"
  2. tier1-rfh preset reduces the floor to 8 cores / 150 GB (c3d-standard-8)

    sudo solo-provisioner block node check --profile testnet --plugin-preset tier1-rfh
    # On a VM with >= 8 cores, >= 32 GB RAM, >= 150 GB disk:
    #   Expected: PASS all checks
    #
    # On a VM with < 8 cores:
    #   Expected: FAIL — "Set by: block node testnet RFH: c3d-standard-8, 150 GB local disk"
  3. Install path also enforces preset-aware checks

    sudo solo-provisioner block node install --profile testnet --plugin-preset tier1-rfh --non-interactive
    # On a VM failing tier1-rfh requirements (e.g. < 8 cores):
    #   Expected: FAIL at the hardware check phase after prompts complete
    #   Expected panel: "  Set by: block node testnet RFH: c3d-standard-8, 150 GB local disk"
  4. kube cluster install also enforces hardware checks via the same path

    sudo solo-provisioner kube cluster install --profile testnet --node-type block --non-interactive
    # kube cluster install passes no preset, so the testnet LFH floor applies (16 cores / 5 TB).
    # On an undersized VM:
    #   Expected: FAIL — CPU or storage validation
    #   Expected panel: "  Set by: block node testnet LFH: n2d-standard-16, 5 TB local disk"
  5. --skip-hardware-checks bypasses checks on install but has no effect on check

    # install — skip-hardware-checks is respected:
    sudo solo-provisioner block node install --profile local --plugin-preset tier1-lfh --non-interactive --skip-hardware-checks
    #   Expected: PASS hardware check phase and proceed to installation
    #
    # check — always enforces checks regardless (--skip-hardware-checks not available on check):
    sudo solo-provisioner block node check --profile testnet
    #   Expected: FAIL storage if < 5 TB — check ignores any skip flag
  6. Why-string appears in compact error panel on hardware failure

    sudo solo-provisioner block node check --profile mainnet --plugin-preset tier1-rfh
    # On an undersized host — expected compact error panel:
    #
    #   ✗ Error: storage validation failed
    #     Cause: ...
    #     Set by: block node mainnet cloud minimum: n2d-highmem-32, 150 GB local disk
  7. Invalid profile gives a clear error listing supported profiles

    sudo solo-provisioner block node check --profile does-not-exist
    # Expected: errorx.IllegalArgument with message listing supported profile names
  8. Invalid --plugins value is rejected early

    sudo solo-provisioner block node check --profile local --plugins "plugin-a, plugin-b"
    # Expected: FAIL — "invalid --plugins value: plugin list entry ... must not have surrounding whitespace"
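A possible shape for the early --plugins validation in check.go. parsePlugins is hypothetical, its error wording is illustrative rather than the CLI's actual message, and rejecting empty entries is an extra assumption; it uses fmt.Errorf only to stay self-contained, whereas the production code uses errorx. It splits on commas and rejects any entry whose trimmed form differs from the raw form, which is why "plugin-a, plugin-b" fails on its second entry.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePlugins (hypothetical) validates a comma-separated --plugins value.
func parsePlugins(raw string) ([]string, error) {
	if raw == "" {
		return nil, nil // flag not set: no plugins
	}
	entries := strings.Split(raw, ",")
	for i, entry := range entries {
		if entry == "" {
			// Assumption: empty entries such as "a,,b" are rejected too.
			return nil, fmt.Errorf("invalid --plugins value: entry %d is empty", i)
		}
		if strings.TrimSpace(entry) != entry {
			return nil, fmt.Errorf("invalid --plugins value: entry %q has surrounding whitespace", entry)
		}
	}
	return entries, nil
}
```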

Risks / rollback

  • All existing (nodeType, profile) pairs return numbers from the confirmed BN team Slack thread — verified by requirements_test.go and provider_block_test.go. No behavioral regression relative to what was intended.
  • --skip-hardware-checks behaviour is intentionally asymmetric: respected by install, ignored by check. This is by design and documented in the checklist above.
  • No template changes — no startup migration impact.
  • Rollback: revert this PR; GetRequirements and requirementsRegistry are restored, callers revert to their prior signatures.

Related Issues

alex-au requested a review from a team as a code owner July 1, 2026 06:45
alex-au requested a review from crypto-pablo July 1, 2026 06:45
swirlds-automation commented Jul 1, 2026

Snyk checks have passed. No issues have been found so far.

Scan Engine | Critical | High | Medium | Low | Total (0)
--- | --- | --- | --- | --- | ---
Open Source Security | 0 | 0 | 0 | 0 | 0 issues
Licenses | 0 | 0 | 0 | 0 | 0 issues


Copilot AI left a comment


Pull request overview

This PR refactors hardware requirement computation from a static (nodeType, profile) lookup table into a provider-based rules engine keyed by DeploymentSpec{NodeType, Profile, Options}. It also plumbs “why” attribution through failing preflight checks so doctor can display which rule set the binding floor, and introduces a "k8s-substrate" provider intended for substrate-only validation work.

Changes:

  • Replaces the flat requirements registry with RequirementsProvider + per-node providers, driven by Rule + Reduce() (CPU/memory=max, storage=sum).
  • Wires plugin preset / plugin list inputs into block node checks via DeploymentSpec.Options.
  • Adds a whyFloor errorx property and displays "Set by: …" in the doctor compact error panel.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.

File | Description
--- | ---
pkg/hardware/provider.go | Adds RequirementsProvider, DeploymentSpec, and a global provider registry.
pkg/hardware/rules.go | Introduces Rule/Contribution and the Reduce() reducer logic.
pkg/hardware/predicates.go | Adds predicate combinators for rules (profile/preset/plugins).
pkg/hardware/provider_block.go | Block-node provider rules reproducing existing floors (with TODOs for TBD values).
pkg/hardware/provider_consensus.go | Consensus provider stub using rules/reducer to reproduce current floors.
pkg/hardware/provider_substrate.go | Adds "k8s-substrate" provider for Kubernetes substrate minimums.
pkg/hardware/requirements.go | Removes the old requirements registry and GetRequirements.
pkg/hardware/node_spec.go | Updates node spec creation to compute requirements via providers.
pkg/hardware/factory.go | Updates CreateNodeSpec to accept DeploymentSpec and validates inputs.
pkg/hardware/hardware_test.go | Updates tests to use DeploymentSpec-driven spec creation.
pkg/hardware/requirements_test.go | Updates legacy-coverage tests to use provider registry lookups.
pkg/hardware/provider_block_test.go | Adds rule/reducer semantics and why-attribution unit tests for block rules.
pkg/hardware/provider_consensus_test.go | Adds unit tests verifying consensus numbers unchanged.
pkg/hardware/provider_substrate_test.go | Adds unit tests for substrate provider values and registry presence.
internal/workflows/preflight.go | Threads DeploymentSpec through preflight and attaches whyFloor on failures.
internal/workflows/setup.go | Keeps NodeSetupWorkflow signature while internally building DeploymentSpec.
internal/workflows/blocknode.go | Updates block node preflight workflow to accept DeploymentSpec.
cmd/cli/commands/block/node/check.go | Parses preset/plugins flags into DeploymentSpec.Options for sizing.
pkg/models/errors.go | Adds ErrPropertyWhyFloor property key.
internal/doctor/diagnose.go | Extracts and displays whyFloor as “Set by:” in compact error output.
docs/quickstart.md | Documents --plugin-preset / --plugins usage for block node check.


Comment thread cmd/cli/commands/block/node/check.go
Comment thread pkg/hardware/provider_substrate.go Outdated
Comment thread pkg/hardware/provider.go
Comment thread internal/workflows/preflight.go
Comment thread internal/workflows/preflight.go
Comment thread internal/workflows/preflight.go
alex-au force-pushed the 00684-refactor-hardware-requirements-provider branch 6 times, most recently from 693150f to b5a12cd July 2, 2026 13:16
brunodam previously approved these changes Jul 3, 2026

brunodam left a comment

LGTM

alex-au force-pushed the 00684-refactor-hardware-requirements-provider branch from b5a12cd to 3fc47ec July 3, 2026 05:23
…quirementsProvider

Replace the static 2-D map in pkg/hardware/requirements.go with a
RequirementsProvider interface. Each node type owns a []Rule list;
predicates fire on DeploymentSpec{NodeType, Profile, Options}, and
Reduce() accumulates contributions (Max for CPU/memory, Sum for storage).

All existing hardware floors are reproduced exactly with empty Options.
Plugin-preset-aware sizing (e.g. tier1-rfh reduced local disk) is wired
end-to-end: check.go reads --plugin-preset/--plugins into DeploymentSpec,
preflight steps call ComputeWithWhy() on failure and attach Why strings
via ErrPropertyWhyFloor, and the doctor compact panel surfaces them as
"Set by: <reason>" below the Cause line.

NodeSetupWorkflow public signature is unchanged (no cascade to cluster.go
or install_handler.go in this PR). SubstrateProvider is registered under
"k8s-substrate" for use by #685.

Closes #684

Signed-off-by: alex-au <alex.w.aus@gmail.com>
alex-au force-pushed the 00684-refactor-hardware-requirements-provider branch from 3fc47ec to 3c2188a July 3, 2026 05:25
alex-au merged commit ab7d07d into main Jul 3, 2026
20 checks passed
alex-au deleted the 00684-refactor-hardware-requirements-provider branch July 3, 2026 13:21

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

refactor(hardware): replace flat requirements lookup with per-node RequirementsProvider

4 participants