chore: make Windows smoke block resilient to vault + setup failures #3344
Daniel Ayaz (danielayaz) wants to merge 1 commit into main
…t metric
Two fixes after the first windows/amd64 run failed at vault auth:
1. The hardcoded `vault kv get -field=CONFLUENT_CLOUD_EMAIL` calls failed
because the actual field names in v1/ci/kv/apif/cli/live-testing-data
are not the same as the env var names the live test expects. Linux
gets away with this because vault-sem-get-secret normalizes field
names; vault-sem-get-secret is Linux-only.
Replaced the hardcoded lookups with a Set-VaultFields helper that:
- Pulls each secret as JSON
- Logs the field names it found (so future failures are debuggable)
- Exports every field under BOTH its original name AND an
UPPER_SNAKE_CASE variant, covering every common naming convention
(email/EMAIL, confluent-cloud-email, confluent_cloud_email, etc.)
2. Wrapped the entire vault + build + test sequence in one PowerShell
try/catch/finally block. The emitter is built BEFORE this block, and
the finally clause ALWAYS calls it with the final RESULT. So vault
auth failures, build failures, test failures, or any thrown error
now report cli_smoke_test_result=0 instead of leaving the windows
panel showing "No data" on infra issues.
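The always-emit pattern from item 2 can be sketched as follows. This is a minimal illustration, not the PR's actual block: `Invoke-SmokeBlock`, `Invoke-Native`, and the `-Steps`/`-Emit` parameters are hypothetical names introduced here; only the `"0"`/`"1"` result values and the idea of calling the emitter from `finally` come from the change itself. The `Invoke-Native` wrapper is included because, by default, a native executable that exits non-zero does not raise a PowerShell error, so `catch` would otherwise never see it.

```powershell
# Hypothetical helper: run a native command and turn a non-zero exit code into a thrown error.
function Invoke-Native {
    param([Parameter(Mandatory)][scriptblock]$Command)
    $global:LASTEXITCODE = 0          # avoid reading a stale or unset exit code
    & $Command
    if ($LASTEXITCODE -ne 0) { throw "native command exited with code $LASTEXITCODE" }
}

# Hypothetical wrapper: run the failable steps, then always report "0" or "1".
function Invoke-SmokeBlock {
    param(
        [Parameter(Mandatory)][scriptblock]$Steps,   # vault auth + build + test
        [Parameter(Mandatory)][scriptblock]$Emit     # receives "0" or "1"
    )
    $result = "0"                     # default: failure unless every step succeeds
    try {
        & $Steps | Out-Host           # show step output without returning it
        $result = "1"                 # reached only after a clean pass
    }
    catch {
        Write-Host "Smoke block failed: $($_.Exception.Message)"
    }
    finally {
        & $Emit $result               # runs on success, failure, or thrown error
    }
}
```

In the pipeline, `-Steps` would hold the vault, build, and test commands (each native call wrapped in `Invoke-Native`), and `-Emit` would be something like `{ param($r) & .\bin\otel-smoke-metric.exe $r }`.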
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🎉 All Contributor License Agreements have been signed. Ready to merge.
Pull request overview
Updates the Windows/amd64 Semaphore smoke-test block to be more resilient to Vault/infra failures and to consistently report the OTLP smoke-test metric (so the Windows panel doesn’t show “No data” on failure).
Changes:
- Replaces hardcoded `vault kv get -field=...` lookups with a PowerShell helper that reads Vault secrets as JSON and exports all fields (including an uppercased variant).
- Wraps Vault auth + CLI build + smoke test in a `try/catch/finally` so Slack + metric emission happen even when earlier steps fail.
- Builds the metric emitter before the failable section.
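The first change above (read each secret as JSON, export every field under its original and uppercased name) could look roughly like the sketch below. The PR's real `Set-VaultFields` body is not shown in this conversation, so everything here is an assumption: `ConvertTo-UpperSnake` and `Export-SecretFields` are hypothetical helpers, and the check for a nested `data`/`metadata` pair assumes a KV v2 style response. Only field names are returned for logging; values go straight into process environment variables.

```powershell
# Hypothetical helper: normalize a Vault field name to UPPER_SNAKE_CASE.
function ConvertTo-UpperSnake {
    param([Parameter(Mandatory)][string]$Name)
    # Collapse each run of non-alphanumeric characters into one underscore, then uppercase.
    ($Name -replace '[^A-Za-z0-9]+', '_').ToUpperInvariant()
}

# Hypothetical helper: export every field of a secret's JSON as process env vars.
function Export-SecretFields {
    param([Parameter(Mandatory)][string]$Json)
    $fields = ($Json | ConvertFrom-Json).data
    $names = @($fields.PSObject.Properties.Name)
    # Assumption: a KV v2 response nests the fields one level deeper, next to "metadata".
    if (($names -contains 'data') -and ($names -contains 'metadata')) {
        $fields = $fields.data
        $names = @($fields.PSObject.Properties.Name)
    }
    foreach ($p in $fields.PSObject.Properties) {
        $value = [string]$p.Value
        [Environment]::SetEnvironmentVariable($p.Name, $value)
        [Environment]::SetEnvironmentVariable((ConvertTo-UpperSnake $p.Name), $value)
    }
    $names   # field names only; values are never written to the log
}

# Sketch of a Set-VaultFields built on the helpers above (not the PR's implementation).
function Set-VaultFields {
    param([Parameter(Mandatory)][string]$Path)
    $raw = (& vault kv get -format=json $Path) | Out-String
    if ($LASTEXITCODE -ne 0) { throw "vault kv get failed for $Path" }
    $names = Export-SecretFields -Json $raw
    Write-Host "Fields in ${Path}: $($names -join ', ')"
}
```

Two limits of this sketch are worth noting: a camelCase field such as `confluentCloudEmail` would become `CONFLUENTCLOUDEMAIL`, and on Windows environment variable names are case-insensitive, so `email` and `EMAIL` resolve to the same variable there.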
- $Env:SMOKE_COMMAND = "environment_list"

# Build emitter + CLI directly (no `make` available without bash; mirrors Makefile targets)
# Build the emitter FIRST so it's available even if vault/test setup fails later.

Set-VaultFields -Path "v1/ci/kv/apif/cli/live-testing-data"
Set-VaultFields -Path "v1/ci/kv/apif/cli/slack-notifications-live-testing"

# Build the CLI under test.

# Report pass/fail metric (never fails the pipeline; emitter always exits 0)
& .\bin\otel-smoke-metric.exe $RESULT
# Always emit the metric so the windows/amd64 panel never goes to "No data" on infra failure.
& .\bin\otel-smoke-metric.exe $RESULT
Can we guarantee that we won't also log the secret values themselves?
Summary
Follow-up to #3342. In the first scheduled run of the smoke pipeline against `main`, `vault kv get -field=CONFLUENT_CLOUD_EMAIL …` failed with `Field "CONFLUENT_CLOUD_EMAIL" not present in secret`. Because the failure was on a regular YAML command line, the job aborted before reaching the metric emitter, so the windows panel showed "No data" instead of `0`. This PR fixes both issues.
Changes (all in `.semaphore/smoke-tests.yml`, windows/amd64 block only)

1. Dynamic vault field handling

Replaced 3 hardcoded `vault kv get -field=…` calls with a `Set-VaultFields` PowerShell helper that:
- Pulls each secret with `vault kv get -format=json`
- Logs the field names it found, e.g. `Fields in v1/ci/kv/apif/cli/live-testing-data: email, password`, so we'll know exactly what's there
- Exports every field under both its original name and an `UPPER_SNAKE_CASE` variant, covering every common Vault → env-var convention (`email`/`EMAIL`, `confluent-cloud-email`/`CONFLUENT_CLOUD_EMAIL`, `confluent_cloud_email`, etc.) at once

This mirrors what `vault-sem-get-secret` does on Linux, which is why Linux works without specifying field names. `vault-sem-get-secret` ships with the Semaphore Linux agent toolbox and is not available on Windows agents.

2. Always emit the metric
The emitter binary is now built before the failable section. The vault auth + CLI build + smoke test are wrapped in one PowerShell `try / catch / finally`, and the `finally` clause always calls `otel-smoke-metric.exe` with the captured `$RESULT`:
- `$RESULT = "0"` is the default
- It is set to `"1"` only after a clean test pass

So vault failures, build failures, test failures, or any thrown exception all report `cli_smoke_test_result{os="windows",arch="amd64"} = 0` to Heracles. The windows panel in cc-terraform-monitoring#9639 will now go red on infra failures instead of going silent.

Test plan
- The `Fields in v1/ci/kv/apif/cli/live-testing-data: …` log line lists field names that include something matching `CONFLUENT_CLOUD_EMAIL` / `CONFLUENT_CLOUD_PASSWORD` (after upper-snake-case normalization)
- On a passing run, `cli_smoke_test_result{os="windows",arch="amd64"} = 1` shows up in Heracles
- On a failing run, `cli_smoke_test_result{os="windows",arch="amd64"} = 0` shows up; the new always-emit guarantee makes the failure visible

Out of scope

- `{os, arch}` label, which is the standard Prometheus per-platform pattern.

🤖 Generated with Claude Code