Conversation
Summary
Testing
|
Summary
Testing
|
Summary
Testing
|
Summary
Testing
|
Summary
Testing
|
Summary
Testing
|
Summary
Testing
|
Summary
Testing
|
Summary
Testing
|
Summary
Testing
|
Summary
Testing
|
Summary
Testing
|
|
CI is green ✅ — full pipeline passes in ~22 minutes. Requesting review. All stacks deploy end-to-end: k8s → system → routing → ziti → data → platform. |
noa-lucent
left a comment
There was a problem hiding this comment.
Good work overall — the stack structure is clean, the Helm configs match the architecture spec, and the iterative debugging (visible in the commit history) resulted in a working deployment. The TLS passthrough approach is correct and the policy/identity model aligns well with the issue requirements.
One major issue must be addressed before merging: the Management API readiness loop in apply.sh silently falls through on timeout, which will produce confusing downstream failures. The remaining comments are minor cleanups and consistency suggestions.
Summary:
- 1 major: missing timeout failure for Management API readiness check
- 4 minor: repeated kubeconfig path, unused variable/remote_state, explicit hostname list documentation, brittle port-forward readiness
- 2 nit: inconsistent gateway reference style, ctrl endpoint port documentation
Summary
Testing
|
Summary
Testing
|
noa-lucent
left a comment
There was a problem hiding this comment.
All 8 prior review comments have been verified against the actual code and resolved:
- ✅ Management API readiness timeout — now fails explicitly with error message and port-forward cleanup.
- ✅ Kubeconfig path duplication — extracted to
KUBECONFIG_PATHvariable, used consistently (includingmerge_kubeconfig). - ✅ Port-forward readiness — replaced
sleep 2with a proper poll loop against127.0.0.1:1281. - ✅ Platform hostname list — extracted to
local.platform_gateway_hostswith a clear maintenance comment. - ✅ Gateway reference consistency — all three ziti VirtualServices now use the resource reference.
- ✅ Unused
platform_namespacevariable — removed. - ✅ Unused
systemremote state — removed. - ✅ ctrl endpoint port comment — added.
The readiness flow in apply.sh is now robust: service discovery → port-forward → poll-based readiness → fail-on-timeout. Clean work.
|
✅ Ready to merge
|
Summary
Testing
|
Summary
Testing
|
noa-lucent
left a comment
There was a problem hiding this comment.
The CoreDNS rewrite approach is solid — good use of kubernetes_config_map_v1_data with the NodeHosts lifecycle ignore, and the null_resource trigger on Corefile hash is the right pattern. The comment explaining why the rewrites exist (enrollment JWTs advertise external hostnames) is helpful.
One major issue with the health check: the Ziti namespace check runs once before the poll loop, so transient not-ready states during pod initialization will cause an immediate hard failure with no retry — unlike every other check in the script.
Summary:
- 1 major: Ziti health check needs retry tolerance (move into poll loop or add its own)
- 2 minor: duplicated jq filter, unquoted kubeconfig path in
local-exec
Previously-approved files are unchanged.
Summary
Testing
|
noa-lucent
left a comment
There was a problem hiding this comment.
All 3 prior comments verified against the actual code and resolved:
- ✅ Ziti health check retry tolerance — moved inside the main poll loop. Terminal states (CrashLoopBackOff/ImagePull) fail immediately with
dump_ziti_diagnostics; transient states (missing/not-ready pods) are added tooutstandingand retried untilTOTAL_TIMEOUT. - ✅ Duplicated jq filters — extracted into
jq_unhealthy_pods()andjq_crash_backoffs()helpers. Also consolidated the platform crash-backoff inline filter to use the same helper — nice cleanup. - ✅ Unquoted kubeconfig path — now properly double-quoted in the
local-execcommand.
Previously-approved files unchanged. LGTM — ready to merge.
Summary
Testing
|
Test & Lint Summary
Tests: 0 passed, 0 failed, 0 skipped (not run) |
Summary
Testing
|
Test & Lint Summary
Tests: 0 passed, 0 failed, 0 skipped (not run) |
f8490e3 to
05ce33f
Compare
Summary
Test & Lint Summary
Tests: 0 passed, 0 failed, 0 skipped (not run) |
Summary
Test & Lint Summary
Tests: 0 passed, 0 failed, 0 skipped (not run) |
Summary
Test & Lint Summary
Tests: 0 passed, 0 failed, 0 skipped (not run) |
Summary
Test & Lint Summary
Tests: 0 passed, 0 failed, 0 skipped (not run) |
Summary
Test & Lint Summary
Tests: 0 passed, 0 failed, 0 skipped (not run) |
Summary
Test & Lint Summary
Tests: 0 passed, 0 failed, 0 skipped (not run) |
|
CI is now green ✅. Requesting review. Summary of changes:
|
noa-lucent
left a comment
There was a problem hiding this comment.
The overall architecture is clean — the deps stack for ArgoCD-managed apps is a well-reasoned separation from the direct Helm installs in system, the TLS passthrough routing is correct, the Ziti provisioning (identities, services, policies) aligns with the spec, and the apply.sh orchestration is robust with proper readiness checks and failure diagnostics.
One major fix required:
- jq operator precedence bug in
jq_crash_backoffs: the//vs+precedence meansinitContainerStatusesis silently dropped whencontainerStatusesexists. Parenthesization fix is straightforward.
Two minor cleanups:
- Unused chart version variables in
stacks/system/variables.tf(with a misleadingly differentcert_manager_chart_versiondefault) - Unused
local.ingress_portinstacks/system/remote_state.tf
Summary
Test & Lint Summary
Tests: 0 passed, 0 failed, 0 skipped (not run) |
Summary
Test & Lint Summary
Tests: 0 passed, 0 failed, 0 skipped (not run) |
noa-lucent
left a comment
There was a problem hiding this comment.
All 3 prior comments verified against actual code and resolved:
- ✅ jq operator precedence bug —
jq_crash_backoffsnow correctly parenthesizes each//independently:(($pod.status.containerStatuses // []) + ($pod.status.initContainerStatuses // [])). - ✅ Unused chart version variables — all three removed from
stacks/system/variables.tf. - ✅ Unused
local.ingress_port— removed fromstacks/system/remote_state.tf.
New changes reviewed:
- k3s upgrade
v1.28.4-k3s1→v1.34.3-k3s1and matching kubectlv1.28.7→v1.34.3— consistent. - cert-manager restored to
v1.20.0instacks/deps/variables.tf— the k8s upgrade resolves theselectableFieldsincompatibility, so the latest version is usable now.
No new issues. LGTM.
Summary
Testing
Issue