Fix 5 deterministically failing CLI E2E tests in quarantine runs #15919
@copilot this isn't correct. In the outerloop and quarantined test workflows, we build packages and the cli archives. We should install the cli from these archives, and make these packages available. Investigate how to do this correctly. Maybe in the specialized workflow we can install the cli to the system, so tests that run on the system can use these. Also, the tests should validate that they are running the correct and expected version of
Done in commit Workflow (
Test code — the tests now detect the pre-installed CLI via the new
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 15919
Or
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 15919"
Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as a retry-safe transient failure in the CI run attempt.
65a413f to b5a89f9
When a non-default channel (e.g. 'ci') is configured, the CLI shows a version selection prompt before the template list. AspireNewAsync now detects either the template list or the version picker, and if the picker appears, accepts the first (latest) version before proceeding.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…PreInstalled
Replace the three-mode install system (SourceBuild, GaRelease, PullRequest) with two modes (PreInstalled, GaRelease):
- Remove IsRunningInCI, GetRequiredPrNumber(), GetRequiredCommitSha()
- Add ExpectedCliVersion (from ASPIRE_CLI_VERSION env var)
- Add PreInstalledCliDir (from ASPIRE_CLI_PATH_DIR env var)
- Collapse SourceBuild+PullRequest into PreInstalled mode
- FindLocalCliBinary now checks ASPIRE_CLI_PATH_DIR first
- Remove GetVersionPrefix(), IsStabilizedBuild() (version parsing)
- Remove InstallAspireCliFromPullRequestAsync, InstallAspireBundleFromPullRequestAsync, SourceAspireBundleEnvironmentAsync
- Simplify VerifyAspireCliVersionAsync to compare against env var
- Update Docker InstallAspireCliInDockerAsync for hive setup
The CI workflow now pre-installs the CLI before tests run, so tests no longer need to download artifacts at runtime.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
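
The lookup order described for FindLocalCliBinary (pre-installed directory first, then a local build) can be illustrated in shell. This is a rough sketch rather than the C# helper itself: the function name `find_aspire_cli` and the fallback-directory argument are hypothetical, and only `ASPIRE_CLI_PATH_DIR` comes from the commit message.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the "ASPIRE_CLI_PATH_DIR first" lookup order.
# find_aspire_cli and its fallback argument are hypothetical names.
find_aspire_cli() {
  local fallback_dir="${1:-}"   # e.g. a repo artifacts directory (assumed)
  local dir
  for dir in "${ASPIRE_CLI_PATH_DIR:-}" "$fallback_dir"; do
    # Skip empty candidates; accept the first directory with an executable CLI.
    if [ -n "$dir" ] && [ -x "$dir/aspire" ]; then
      printf '%s\n' "$dir/aspire"
      return 0
    fi
  done
  echo "aspire CLI not found (ASPIRE_CLI_PATH_DIR='${ASPIRE_CLI_PATH_DIR:-}')" >&2
  return 1
}
```

A caller could run `cli="$(find_aspire_cli ./artifacts/bin)"` and skip its own install step whenever that succeeds; the `./artifacts/bin` path here is only a placeholder.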
When requiresCliArchive is true, the workflow now:
1. Downloads the CLI native archive artifact (Linux only)
2. Extracts it to ~/.aspire/bin and verifies the version
3. Copies built NuGet packages to a local hive
4. Configures the CLI channel to 'ci'
5. Exports ASPIRE_CLI_PATH_DIR and ASPIRE_CLI_VERSION env vars
This replaces the old approach where tests downloaded PR artifacts at runtime using GITHUB_PR_NUMBER, GITHUB_PR_HEAD_SHA, and GH_TOKEN. Those env vars are removed from all test step environments. The requiresCliArchive comment is updated to reflect the new env vars.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
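
Steps 2, 3, and 5 above could be sketched as a single shell function. This is not the actual content of run-tests.yml. It assumes the archive holds an `aspire` binary at its top level and that `aspire --version` prints the version string. The function name and its third argument are hypothetical. `GITHUB_ENV` and `GITHUB_PATH` are the standard GitHub Actions files for passing variables and PATH entries to later steps. The channel configuration (step 4) is left out because the command is not shown in this PR.

```shell
#!/usr/bin/env bash
# Sketch of the pre-install step; not the real workflow script.
# install_aspire_cli and the optional root argument are hypothetical.
install_aspire_cli() {
  local archive="$1"              # a downloaded aspire-cli-*.tar.gz
  local nugets_dir="$2"           # directory holding the built *.nupkg files
  local root="${3:-$HOME/.aspire}"
  local bin_dir="$root/bin"
  local hive_dir="$root/hives/ci/packages"

  mkdir -p "$bin_dir" "$hive_dir"
  tar -xzf "$archive" -C "$bin_dir"

  # Fail early if the extracted binary does not run (assumes --version works).
  local version
  version="$("$bin_dir/aspire" --version)" || return 1

  # Make the built packages available as a local hive.
  cp "$nugets_dir"/*.nupkg "$hive_dir"/

  # Step 4 (configure channel 'ci') is intentionally omitted here.

  # Export for later workflow steps; both files are provided by the runner.
  echo "ASPIRE_CLI_PATH_DIR=$bin_dir" >> "$GITHUB_ENV"
  echo "ASPIRE_CLI_VERSION=$version" >> "$GITHUB_ENV"
  echo "$bin_dir" >> "$GITHUB_PATH"
}
```

Because the exported version is read back from the installed binary, the value that tests later compare against is the one that was actually extracted, not one computed separately.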
The Playwright CLI skill is no longer selected by default in the skill selection prompt. Navigate down to the playwright-cli entry and toggle it on before confirming.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Migrate DockerDeploymentTests, JavaScriptPublishTests, KubernetesPublishTests, and TypeScriptCodegenValidationTests from the old IsRunningInCI/PullRequest pattern to the new PreInstalledCliDir pattern:
- Replace IsRunningInCI checks with PreInstalledCliDir null checks
- Remove GetRequiredPrNumber/GetRequiredCommitSha calls
- Use SourceAspireCliEnvironmentAsync (not Bundle variant)
- Use parameterless VerifyAspireCliVersionAsync
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In Docker-in-Docker environments, NuGet restore + build during 'aspire start' can exceed the 120-second backchannel timeout, causing spurious failures. Run 'dotnet build' first, then use 'aspire start --no-build' so the backchannel connects quickly. Also add VerifyAspireCliVersionAsync after CLI install.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
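
The build-then-start ordering might look like the following in shell. The two commands, `dotnet build` and `aspire start --no-build`, are the ones named in the commit message. The wrapper function and the `DOTNET_CMD` / `ASPIRE_CMD` override variables are hypothetical and exist only so the sketch can be exercised without the real tools.

```shell
#!/usr/bin/env bash
# Sketch: pay the restore/build cost before the backchannel timer starts.
# start_apphost_prebuilt, DOTNET_CMD and ASPIRE_CMD are hypothetical names.
start_apphost_prebuilt() {
  local project_dir="$1"
  (
    cd "$project_dir" || exit 1
    # A cold NuGet restore + build happens here, outside any start timeout.
    "${DOTNET_CMD:-dotnet}" build || exit 1
    # The start step then only has to launch already-built output.
    "${ASPIRE_CMD:-aspire}" start --no-build
  )
}
```

If the build fails, the function returns non-zero without ever calling the start step, so a compile error is reported as a build failure rather than as a backchannel timeout.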
Add VerifyAspireCliVersionAsync after CLI install to catch version mismatches early. Increase 'aspire wait' timeout from 60s to 120s (and prompt wait from 90s to 150s) to account for container image pull times in Docker-in-Docker environments.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
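
A version check of this kind could be approximated in shell as shown here. It is a sketch, not the VerifyAspireCliVersionAsync implementation. Two things are assumptions: that `aspire --version` prints the version, and that a substring match against `ASPIRE_CLI_VERSION` is sufficient. Skipping the check when the variable is unset is also a choice made for this example.

```shell
#!/usr/bin/env bash
# Sketch: compare the CLI's reported version with ASPIRE_CLI_VERSION.
# verify_aspire_cli_version is a hypothetical name; the matching rule is assumed.
verify_aspire_cli_version() {
  local cli="${1:-aspire}"
  local expected="${ASPIRE_CLI_VERSION:-}"
  if [ -z "$expected" ]; then
    echo "ASPIRE_CLI_VERSION not set; skipping version check" >&2
    return 0
  fi
  local actual
  actual="$("$cli" --version)" || return 1
  case "$actual" in
    *"$expected"*) return 0 ;;   # reported version contains the expected one
    *)
      echo "version mismatch: expected '$expected', got '$actual'" >&2
      return 1
      ;;
  esac
}
```

Running the check immediately after install means a stale or wrong binary fails fast with a clear message instead of surfacing later as an unrelated test failure.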
Remove ActiveIssue for #15930 from DockerDeploymentTests (CreateAndDeployToDockerCompose, CreateAndDeployToDockerComposeInteractive) and KubernetesPublishTests (CreateAndPublishToKubernetes). These tests were disabled because they failed outside PR context. The PreInstalled CLI mode and CI archive workflow changes fix the root cause, so the tests can run in quarantine/outerloop again.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
e090913 to 99dfb2e
🎬 CLI E2E Test Recordings: 56 recordings uploaded
📹 Recordings uploaded automatically from CI run #24110515133
Description
Fixes all 5 tests listed in issue #8813 as failing 100% of the time in scheduled quarantine CI runs. Each failure was confirmed as a deterministic (non-flaky) root-cause issue rather than intermittent flakiness.
Root causes and fixes
DockerDeploymentTests.CreateAndDeployToDockerCompose and CreateAndDeployToDockerComposeInteractive (issues #15882/#14129, #15871)
These tests use CreateTestTerminal() (runs directly on the GitHub Actions runner, not in Docker). The CLI installation is guarded by if (isCI), where isCI requires both GITHUB_PR_NUMBER and GITHUB_PR_HEAD_SHA to be set. In scheduled quarantine runs (no associated PR) these variables are empty, so isCI = false and the CLI is never installed, resulting in bash: aspire: command not found.
Fix: The run-tests.yml workflow now pre-installs the CLI from the cli-native-archives-linux-x64 artifact (built in the same workflow run) when requiresCliArchive=true on Linux. It sets ASPIRE_CLI_PATH_DIR and ASPIRE_CLI_COMMIT_SHA=github.sha, adds the binary to PATH, and copies built NuGets to ~/.aspire/hives/ci/packages/ with channel "ci" configured. Tests detect the pre-installed CLI via the new PreInstalledCliDir property, skip the install step, and verify the version against ASPIRE_CLI_COMMIT_SHA.
KubernetesPublishTests.CreateAndPublishToKubernetes (issue #15870)
Same root cause: aspire new failed with "command not found" in scheduled CI. Fixed with the same workflow-level pre-install approach.
WaitCommandTests.CreateStartWaitAndStopAspireProject (issue #14993)
aspire start has a 120-second backchannel timeout. In Docker-in-Docker a cold NuGet restore + build exceeds this, causing a spurious timeout. Fixed by adding a dotnet build step before aspire start --no-build. Version verification after install was also added.
ProjectReferenceTests.TypeScriptAppHostWithProjectReferenceIntegration (issue #15831)
aspire wait --timeout 60 was too short for Redis image pull and container startup in Docker-in-Docker. Increased to 120 s (test-side wait from 90 s to 150 s). Version verification after install was also added.
New workflow steps (run-tests.yml)
When requiresCliArchive=true on Linux, two steps are added before tests run:
1. Downloads cli-native-archives-linux-x64 from the current workflow run
2. Extracts aspire-cli-*.tar.gz to $HOME/.aspire/bin/, adds to PATH, sets ASPIRE_CLI_PATH_DIR and ASPIRE_CLI_COMMIT_SHA, copies built NuGets to a local hive, and configures channel "ci"
Updated test helpers
- CliE2ETestHelpers.PreInstalledCliDir: reads ASPIRE_CLI_PATH_DIR; replaces the removed IsRunningOnGitHubActions
- CliE2ETestHelpers.FindLocalCliBinary: checks ASPIRE_CLI_PATH_DIR first, then repo artifacts
- CliE2ETestHelpers.GetRequiredCommitSha: also checks ASPIRE_CLI_COMMIT_SHA as a fallback to GITHUB_PR_HEAD_SHA
- CliE2ETestHelpers.CreateDockerTestTerminal (SourceBuild): mounts BUILT_NUGETS_PATH at /built-nugets:ro
- CliE2EAutomatorHelpers.InstallAspireCliInDockerAsync (SourceBuild): copies /built-nugets to ~/.aspire/hives/ci/packages/ and sets channel "ci"
Removed
- CliE2ETestHelpers.IsRunningOnGitHubActions: superseded by PreInstalledCliDir
- CliE2EAutomatorHelpers.InstallAspireCliGaAsync: GA release install was the wrong approach; CI workflows build and use their own archives
Checklist