[DNM] Fix Local Zone subnet validation in cucushift-installer-check-aws-custom-vpc#80456
Conversation
…tom-vpc Problem: The validation step cucushift-installer-check-aws-custom-vpc fails for Local Zone configurations because it expects ALL subnets to have the kubernetes.io/cluster/<infra-id>:shared tag, but Local Zone subnets are correctly tagged as kubernetes.io/cluster/unmanaged:true instead. Failed Job: periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-aws-ipi-localzone-byo-subnet-sdn-f60-destructive Error: FAIL: check tag [kubernetes.io/cluster/ci-op-ghrnctn8-00fbe-l6scc:shared], found 6, but expect 7 Root Cause Analysis: The script validates ALL subnets from install-config must have the 'shared' tag (lines 73-74), but doesn't account for Edge/Local Zone subnets that are intentionally tagged as 'unmanaged'. The ENABLE_AWS_EDGE_ZONE parameter exists but is only used for EdgeNode validation later (line 193), after the main tag check has already failed. When the provision chain sets ENABLE_AWS_EDGE_ZONE=yes: - 7 total subnets (6 regular + 1 Local Zone) - 6 subnets have 'shared' tag (correct) - 1 Local Zone subnet has 'unmanaged' tag (correct) - Script expects 7 with 'shared' tag (incorrect) Solution: Updated the tag count logic to exclude subnets tagged as 'unmanaged' when ENABLE_AWS_EDGE_ZONE=yes. This allows Local Zone subnets to be properly excluded from the managed subnet count while still validating that all managed subnets have the required cluster tag. Changes: - Added conditional logic to check ENABLE_AWS_EDGE_ZONE parameter - Count unmanaged subnets when Edge Zone is enabled - Subtract unmanaged count from expected managed subnet count - Added INFO logging for Edge Zone subnet count calculation Testing: /test rehearse periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-aws-ipi-localzone-byo-subnet-sdn-f60-destructive Expected output after fix: INFO: Edge Zone enabled - total subnets=7, unmanaged subnets=1, expecting managed subnets=6 PASS: check tag [kubernetes.io/cluster/<infra-id>:shared]
|
/DNM |
|
@MrSanketkumar: The specified target(s) for The following commands are available to trigger optional jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/hold |
WalkthroughThis PR updates a CI step script to conditionally adjust expected subnet count validation based on AWS edge zone configuration. When edge zones are enabled, the expected count subtracts unmanaged subnets from the total; otherwise it uses the original total-subnet value. ChangesEdge-zone-aware subnet count validation
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (14 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: MrSanketkumar The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
[REHEARSALNOTIFIER]
A total of 1539 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
ci-operator/step-registry/cucushift/installer/check/aws/custom-vpc/cucushift-installer-check-aws-custom-vpc-commands.sh (1)
78-80: ⚡ Quick winConsider validating numeric values before arithmetic.
The edge-zone branch performs arithmetic (
total_subnets - unmanaged_cnt) without validating that the jq query succeeded or thatunmanaged_cntis numeric. If the jq filter fails silently or returns non-numeric output, the arithmetic could produce an incorrect or negativeexpect_cnt, leading to a confusing failure message at line 87.🛡️ Proposed validation
# For Edge Zone scenarios, exclude subnets tagged as 'kubernetes.io/cluster/unmanaged:true' - unmanaged_cnt=$(jq -r '[.Subnets[] | select(any(.Tags[]; .Key == "kubernetes.io/cluster/unmanaged" and .Value == "true"))] | length' "$out") + unmanaged_cnt=$(jq -r '[.Subnets[] | select(any(.Tags[]; .Key == "kubernetes.io/cluster/unmanaged" and .Value == "true"))] | length' "$out" || echo "0") + if ! [[ "$unmanaged_cnt" =~ ^[0-9]+$ ]]; then + echo "ERROR: Failed to count unmanaged subnets" + exit 1 + fi total_subnets=$(echo "${ic_subnets}" | wc -w) expect_cnt=$((total_subnets - unmanaged_cnt)) + if ((expect_cnt < 0)); then + echo "ERROR: Unmanaged subnet count ($unmanaged_cnt) exceeds total subnets ($total_subnets)" + exit 1 + fi echo "INFO: Edge Zone enabled - total subnets=${total_subnets}, unmanaged subnets=${unmanaged_cnt}, expecting managed subnets=${expect_cnt}"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/cucushift/installer/check/aws/custom-vpc/cucushift-installer-check-aws-custom-vpc-commands.sh` around lines 78 - 80, Validate that the jq call produced a numeric unmanaged_cnt and that total_subnets is numeric before computing expect_cnt: check jq's exit status and/or test unmanaged_cnt with a numeric regex (e.g., ^[0-9]+$) and treat non-numeric or failed jq results as an error or default to 0, likewise validate total_subnets derived from ic_subnets, then compute expect_cnt only after both are sanitized so subtraction cannot produce unexpected negative or non-numeric values; update the variables mentioned (unmanaged_cnt, total_subnets, expect_cnt) and add a clear error/log path if validation fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/cucushift/installer/check/aws/custom-vpc/cucushift-installer-check-aws-custom-vpc-commands.sh`:
- Line 78: The jq invocations use unquoted variable expansions which can cause
word-splitting/globbing (SC2086); update the commands that compute unmanaged_cnt
and the similar expressions on lines 79 and 83 to quote the file/path variable
(e.g., change occurrences of $out to "$out") so the jq arguments are passed as a
single token; locate the three jq calls that reference the shell variable $out
(the one setting unmanaged_cnt and the other two nearby jq uses) and wrap the
expansions in double quotes.
---
Nitpick comments:
In
`@ci-operator/step-registry/cucushift/installer/check/aws/custom-vpc/cucushift-installer-check-aws-custom-vpc-commands.sh`:
- Around line 78-80: Validate that the jq call produced a numeric unmanaged_cnt
and that total_subnets is numeric before computing expect_cnt: check jq's exit
status and/or test unmanaged_cnt with a numeric regex (e.g., ^[0-9]+$) and treat
non-numeric or failed jq results as an error or default to 0, likewise validate
total_subnets derived from ic_subnets, then compute expect_cnt only after both
are sanitized so subtraction cannot produce unexpected negative or non-numeric
values; update the variables mentioned (unmanaged_cnt, total_subnets,
expect_cnt) and add a clear error/log path if validation fails.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 26a9aede-c88c-4ef0-a3c4-f8036a026c62
📒 Files selected for processing (1)
ci-operator/step-registry/cucushift/installer/check/aws/custom-vpc/cucushift-installer-check-aws-custom-vpc-commands.sh
| # Calculate expected count based on whether Edge Zones are enabled | ||
| if [[ ${ENABLE_AWS_EDGE_ZONE:-no} == "yes" ]]; then | ||
| # For Edge Zone scenarios, exclude subnets tagged as 'kubernetes.io/cluster/unmanaged:true' | ||
| unmanaged_cnt=$(jq -r '[.Subnets[] | select(any(.Tags[]; .Key == "kubernetes.io/cluster/unmanaged" and .Value == "true"))] | length' $out) |
There was a problem hiding this comment.
Quote variable expansions to prevent word splitting.
Lines 78, 79, and 83 use unquoted variable expansions that could cause word splitting or globbing issues, flagged by Shellcheck SC2086. While unlikely to cause issues in practice (file paths and AWS subnet IDs don't typically contain spaces), quoting is a shell best practice.
🛡️ Proposed fix
- unmanaged_cnt=$(jq -r '[.Subnets[] | select(any(.Tags[]; .Key == "kubernetes.io/cluster/unmanaged" and .Value == "true"))] | length' $out)
- total_subnets=$(echo ${ic_subnets} | wc -w)
+ unmanaged_cnt=$(jq -r '[.Subnets[] | select(any(.Tags[]; .Key == "kubernetes.io/cluster/unmanaged" and .Value == "true"))] | length' "$out")
+ total_subnets=$(echo "${ic_subnets}" | wc -w)
expect_cnt=$((total_subnets - unmanaged_cnt))
echo "INFO: Edge Zone enabled - total subnets=${total_subnets}, unmanaged subnets=${unmanaged_cnt}, expecting managed subnets=${expect_cnt}"
else
- expect_cnt=$(echo ${ic_subnets} | wc -w)
+ expect_cnt=$(echo "${ic_subnets}" | wc -w)
fiAlso applies to: 79-79, 83-83
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 78-78: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/cucushift/installer/check/aws/custom-vpc/cucushift-installer-check-aws-custom-vpc-commands.sh`
at line 78, The jq invocations use unquoted variable expansions which can cause
word-splitting/globbing (SC2086); update the commands that compute unmanaged_cnt
and the similar expressions on lines 79 and 83 to quote the file/path variable
(e.g., change occurrences of $out to "$out") so the jq arguments are passed as a
single token; locate the three jq calls that reference the shell variable $out
(the one setting unmanaged_cnt and the other two nearby jq uses) and wrap the
expansions in double quotes.
Source: Linters/SAST tools
|
/test rehearse periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-aws-ipi-localzone-byo-subnet-sdn-f60-destructive |
|
@MrSanketkumar: The specified target(s) for The following commands are available to trigger optional jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-aws-ipi-localzone-byo-subnet-sdn-f60-destructive |
|
@MrSanketkumar: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-aws-ipi-localzone-byo-subnet-sdn-f60-destructive |
|
@MrSanketkumar: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@MrSanketkumar: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Problem:
The validation step cucushift-installer-check-aws-custom-vpc fails for Local Zone configurations because it expects ALL subnets to have the kubernetes.io/cluster/:shared tag, but Local Zone subnets are correctly tagged as kubernetes.io/cluster/unmanaged:true instead.
Failed Job:
periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-aws-ipi-localzone-byo-subnet-sdn-f60-destructive
Error:
FAIL: check tag [kubernetes.io/cluster/ci-op-ghrnctn8-00fbe-l6scc:shared], found 6, but expect 7
Root Cause Analysis:
The script validates ALL subnets from install-config must have the 'shared' tag (lines 73-74), but doesn't account for Edge/Local Zone subnets that are intentionally tagged as 'unmanaged'. The ENABLE_AWS_EDGE_ZONE parameter exists but is only used for EdgeNode validation later (line 193), after the main tag check has already failed.
When the provision chain sets ENABLE_AWS_EDGE_ZONE=yes:
Solution:
Updated the tag count logic to exclude subnets tagged as 'unmanaged' when ENABLE_AWS_EDGE_ZONE=yes. This allows Local Zone subnets to be properly excluded from the managed subnet count while still validating that all managed subnets have the required cluster tag.
Changes:
Testing:
/test rehearse periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-aws-ipi-localzone-byo-subnet-sdn-f60-destructive
Expected output after fix:
INFO: Edge Zone enabled - total subnets=7, unmanaged subnets=1, expecting managed subnets=6 PASS: check tag [kubernetes.io/cluster/:shared]
Summary by CodeRabbit
Bug Fixes