Fix 5 deterministically failing CLI E2E tests in quarantine runs #15919

Draft
Copilot wants to merge 8 commits into main from copilot/fix-cli-endtoend-tests

Contributor

Copilot AI commented Apr 7, 2026

Description

Fixes all 5 tests listed in issue #8813 as failing 100% of the time in scheduled quarantine CI runs. Each failure was confirmed to have a deterministic root cause rather than being intermittent.

Root causes and fixes

DockerDeploymentTests.CreateAndDeployToDockerCompose and CreateAndDeployToDockerComposeInteractive (issues #15882/#14129, #15871)

These tests use CreateTestTerminal(), which runs directly on the GitHub Actions runner rather than in Docker. The CLI installation is guarded by if (isCI), where isCI requires both GITHUB_PR_NUMBER and GITHUB_PR_HEAD_SHA to be set. Scheduled quarantine runs have no associated PR, so both variables are empty, isCI is false, and the CLI is never installed. The tests then fail with bash: aspire: command not found.
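
The guard can be illustrated with a small shell sketch. This is not the test code itself (the helpers are C#); the function name is_pr_ci is hypothetical, and only the two environment variable names come from the description above.

```shell
# Hypothetical shell rendering of the isCI guard described above.
# Succeeds only when BOTH PR variables are non-empty.
is_pr_ci() {
  [ -n "${GITHUB_PR_NUMBER:-}" ] && [ -n "${GITHUB_PR_HEAD_SHA:-}" ]
}

# In a scheduled quarantine run neither variable is set, so the install
# branch is skipped and `aspire` never ends up on PATH.
if is_pr_ci; then
  echo "PR context: install CLI from PR artifacts"
else
  echo "no PR context: install step skipped"
fi
```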

Fix: The run-tests.yml workflow now pre-installs the CLI from the cli-native-archives-linux-x64 artifact (built in the same workflow run) when requiresCliArchive=true on Linux. It sets ASPIRE_CLI_PATH_DIR and ASPIRE_CLI_COMMIT_SHA=github.sha, adds the binary to PATH, and copies built NuGets to ~/.aspire/hives/ci/packages/ with channel "ci" configured. Tests detect the pre-installed CLI via the new PreInstalledCliDir property, skip the install step, and verify the version against ASPIRE_CLI_COMMIT_SHA.
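
The version check might look roughly like the sketch below. It assumes the CLI version string has the form "<version>+<commit sha>" and that the embedded SHA may be abbreviated; both are assumptions about the build's version format, not something stated in this PR. The function name version_matches_commit is hypothetical.

```shell
# Sketch: check that a CLI version string embeds the expected commit.
# ASSUMPTION: the version looks like "<version>+<commit sha>".
version_matches_commit() {
  version="$1"       # e.g. the output of the CLI's version command
  expected_sha="$2"  # e.g. the value of ASPIRE_CLI_COMMIT_SHA
  [ -n "$expected_sha" ] || return 1

  case "$version" in
    *+*) actual_sha="${version#*+}" ;;  # text after the first '+'
    *)   return 1 ;;                    # no commit metadata at all
  esac

  # Accept a full SHA or an abbreviated prefix of the expected one.
  case "$expected_sha" in
    "$actual_sha"*) [ -n "$actual_sha" ] ;;
    *) return 1 ;;
  esac
}
```

A test or workflow step would call it as version_matches_commit "$reported_version" "$ASPIRE_CLI_COMMIT_SHA" and fail fast on a non-zero exit, so a stale or GA binary on PATH is caught before any scenario runs.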

KubernetesPublishTests.CreateAndPublishToKubernetes (issue #15870)

Same root cause — aspire new failed with "command not found" in scheduled CI. Fixed with the same workflow-level pre-install approach.

WaitCommandTests.CreateStartWaitAndStopAspireProject (issue #14993)

aspire start has a 120-second backchannel timeout. In Docker-in-Docker a cold NuGet restore + build exceeds this, causing a spurious timeout. Fixed by adding a dotnet build step before aspire start --no-build. Version verification after install was also added.
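
As a sketch of the ordering, assuming the AppHost directory is passed in as a (hypothetical) argument, the build is moved out of the window that the backchannel timeout covers:

```shell
# Sketch: build first so `aspire start` does not pay for restore + build
# inside its backchannel timeout window.
build_then_start() {
  apphost_dir="$1"   # hypothetical: directory containing the AppHost project
  (
    cd "$apphost_dir" || exit 1
    dotnet build || exit 1       # cold NuGet restore and compile happen here
    aspire start --no-build      # startup no longer includes the build
  )
}
```

If dotnet build fails, the function returns non-zero without calling aspire start, so a build error surfaces as a build error rather than as a timeout.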

ProjectReferenceTests.TypeScriptAppHostWithProjectReferenceIntegration (issue #15831)

aspire wait --timeout 60 was too short for Redis image pull and container startup in Docker-in-Docker. Increased to 120 s (test-side wait from 90 s to 150 s). Version verification after install was also added.

New workflow steps (run-tests.yml)

When requiresCliArchive=true on Linux, two steps are added before tests run:

  1. Download CLI archive — downloads cli-native-archives-linux-x64 from the current workflow run
  2. Install CLI from archive — extracts aspire-cli-*.tar.gz to $HOME/.aspire/bin/, adds to PATH, sets ASPIRE_CLI_PATH_DIR and ASPIRE_CLI_COMMIT_SHA, copies built NuGets to a local hive, and configures channel "ci"
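
The second step can be sketched as a shell function. The two directory arguments and the function name are hypothetical; GITHUB_ENV, GITHUB_PATH and GITHUB_SHA are the standard variables GitHub Actions provides on a runner. Configuring channel "ci" is left out because the exact command is not shown in this thread.

```shell
# Sketch of the "Install CLI from archive" step (not the actual workflow YAML).
install_cli_from_archive() {
  archive_dir="$1"   # where the cli-native-archives-linux-x64 artifact was downloaded
  nugets_dir="$2"    # where the built NuGet packages live
  bin_dir="$HOME/.aspire/bin"
  hive_dir="$HOME/.aspire/hives/ci/packages"

  # Pick the archive (assumes at most one aspire-cli-*.tar.gz is present).
  archive="$(find "$archive_dir" -name 'aspire-cli-*.tar.gz' | head -n 1)"
  [ -n "$archive" ] || { echo "no CLI archive in $archive_dir" >&2; return 1; }

  mkdir -p "$bin_dir" "$hive_dir"
  tar -xzf "$archive" -C "$bin_dir" || return 1

  # Make the binary and its metadata visible to later workflow steps.
  echo "$bin_dir" >> "$GITHUB_PATH"
  {
    echo "ASPIRE_CLI_PATH_DIR=$bin_dir"
    echo "ASPIRE_CLI_COMMIT_SHA=$GITHUB_SHA"
  } >> "$GITHUB_ENV"

  # Populate the local hive with the CI-built packages.
  find "$nugets_dir" -name '*.nupkg' -exec cp {} "$hive_dir/" \;
}
```

Writing to GITHUB_PATH and GITHUB_ENV, rather than exporting in the step's own shell, is what lets the later test step see the CLI and the two variables.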

Updated test helpers

  • CliE2ETestHelpers.PreInstalledCliDir — reads ASPIRE_CLI_PATH_DIR; replaces the removed IsRunningOnGitHubActions
  • CliE2ETestHelpers.FindLocalCliBinary — checks ASPIRE_CLI_PATH_DIR first, then repo artifacts
  • CliE2ETestHelpers.GetRequiredCommitSha — also checks ASPIRE_CLI_COMMIT_SHA as a fallback to GITHUB_PR_HEAD_SHA
  • CliE2ETestHelpers.CreateDockerTestTerminal (SourceBuild) — mounts BUILT_NUGETS_PATH at /built-nugets:ro
  • CliE2EAutomatorHelpers.InstallAspireCliInDockerAsync (SourceBuild) — copies /built-nugets to ~/.aspire/hives/ci/packages/ and sets channel "ci"
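
The lookup order in the GetRequiredCommitSha and FindLocalCliBinary bullets can be sketched in shell as follows. The real helpers are C#; the function names, the repo artifacts directory argument, and the binary name aspire inside each directory are illustrative assumptions.

```shell
# Sketch: prefer the PR head SHA, fall back to the SHA exported by the workflow.
required_commit_sha() {
  sha="${GITHUB_PR_HEAD_SHA:-${ASPIRE_CLI_COMMIT_SHA:-}}"
  [ -n "$sha" ] || { echo "no commit SHA available" >&2; return 1; }
  echo "$sha"
}

# Sketch: check the pre-installed directory first, then a repo artifacts directory.
find_local_cli_binary() {
  repo_artifacts_dir="$1"   # hypothetical: where a local build drops the CLI
  for dir in "${ASPIRE_CLI_PATH_DIR:-}" "$repo_artifacts_dir"; do
    if [ -n "$dir" ] && [ -x "$dir/aspire" ]; then
      echo "$dir/aspire"
      return 0
    fi
  done
  return 1
}
```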

Removed

  • CliE2ETestHelpers.IsRunningOnGitHubActions — superseded by PreInstalledCliDir
  • CliE2EAutomatorHelpers.InstallAspireCliGaAsync — GA release install was the wrong approach; CI workflows build and use their own archives

Checklist

  • Is this feature complete?
    • Yes. Ready to ship.
    • No. Follow-up changes expected.
  • Are you including unit tests for the changes and scenario tests if relevant?
    • Yes
    • No
  • Did you add public API?
    • Yes
    • No
  • Does the change make any security assumptions or guarantees?
    • Yes
    • No
  • Does the change require an update in our Aspire docs?
    • Yes
    • No

Member

radical commented Apr 7, 2026

@copilot this isn't correct. In the outerloop and quarantined test workflows, we build packages and the cli archives. We should install the cli from these archives, and make these packages available. Investigate how to do this correctly. Maybe in the specialized workflow we can install the cli to the system, so tests that run on the system can use these. Also, the tests should validate that they are running the correct and expected version of aspire cli.

Contributor Author

Copilot AI commented Apr 7, 2026

Done in commit 1f9cf43. Here's what changed:

Workflow (run-tests.yml) — when requiresCliArchive=true on Linux, two new steps run before the tests:

  1. Download the cli-native-archives-linux-x64 artifact built earlier in the same workflow run
  2. Extract the aspire-cli-*.tar.gz archive to $HOME/.aspire/bin/, add it to PATH, set ASPIRE_CLI_PATH_DIR=$HOME/.aspire/bin and ASPIRE_CLI_COMMIT_SHA=${{ github.sha }}, then copy built NuGets to ~/.aspire/hives/ci/packages/ and configure CLI channel "ci" so aspire new/aspire add resolve CI-built packages without going to nuget.org

Test code — the tests now detect the pre-installed CLI via the new PreInstalledCliDir property (reads ASPIRE_CLI_PATH_DIR):

  • DockerDeploymentTests / KubernetesPublishTests: replace the isGitHubActions → GA install branch with preInstalledCliDir is not null → skip install + verify version against ASPIRE_CLI_COMMIT_SHA
  • GetRequiredCommitSha() now also checks ASPIRE_CLI_COMMIT_SHA as a fallback to GITHUB_PR_HEAD_SHA
  • Docker container tests (WaitCommandTests, ProjectReferenceTests): FindLocalCliBinary checks ASPIRE_CLI_PATH_DIR first; CreateDockerTestTerminal mounts BUILT_NUGETS_PATH at /built-nugets:ro; InstallAspireCliInDockerAsync (SourceBuild) copies the packages to the hive and sets channel "ci"; version verification added after install
  • Removed IsRunningOnGitHubActions and InstallAspireCliGaAsync (GA install approach was wrong)

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • dnceng.pkgs.visualstudio.com
    • Triggering command: /home/REDACTED/work/aspire/aspire/.dotnet/dotnet /home/REDACTED/work/aspire/aspire/.dotnet/dotnet build tests/Aspire.Cli.EndToEnd.Tests/Aspire.Cli.EndToEnd.Tests.csproj /p:TreatWarningsAsErrors=false etin�� er/QuarantinedTestRunsheetBuilder.targets (dns block)
    • Triggering command: /home/REDACTED/work/aspire/aspire/.dotnet/dotnet /home/REDACTED/work/aspire/aspire/.dotnet/dotnet build tests/Aspire.Cli.EndToEnd.Tests/Aspire.Cli.EndToEnd.Tests.csproj /p:TreatWarningsAsErrors=false dbs/actions/diagnostic/extractors/actions ions/log /actions/working e38/dbs/actions/src tions/trap/actions s/actions (dns block)


Contributor

github-actions bot commented Apr 7, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 15919

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 15919"

Contributor

github-actions bot commented Apr 7, 2026

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as a retry-safe transient failure in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

  • Tests / Hosting.Yarp / Hosting.Yarp (windows-latest) - Post-test cleanup steps 'Upload logs, and test results | Copy CLI E2E recordings for upload | Upload CLI E2E recordings | Generate test results summary | Post Checkout code' matched the Windows process initialization failure override allowlist.

@radical radical force-pushed the copilot/fix-cli-endtoend-tests branch from 65a413f to b5a89f9 Compare April 7, 2026 18:25
radical and others added 8 commits April 7, 2026 20:06

When a non-default channel (e.g. 'ci') is configured, the CLI shows a
version selection prompt before the template list. AspireNewAsync now
detects either the template list or the version picker, and if the
picker appears, accepts the first (latest) version before proceeding.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…PreInstalled

Replace the three-mode install system (SourceBuild, GaRelease,
PullRequest) with two modes (PreInstalled, GaRelease):

- Remove IsRunningInCI, GetRequiredPrNumber(), GetRequiredCommitSha()
- Add ExpectedCliVersion (from ASPIRE_CLI_VERSION env var)
- Add PreInstalledCliDir (from ASPIRE_CLI_PATH_DIR env var)
- Collapse SourceBuild+PullRequest into PreInstalled mode
- FindLocalCliBinary now checks ASPIRE_CLI_PATH_DIR first
- Remove GetVersionPrefix(), IsStabilizedBuild() (version parsing)
- Remove InstallAspireCliFromPullRequestAsync,
  InstallAspireBundleFromPullRequestAsync,
  SourceAspireBundleEnvironmentAsync
- Simplify VerifyAspireCliVersionAsync to compare against env var
- Update Docker InstallAspireCliInDockerAsync for hive setup

The CI workflow now pre-installs the CLI before tests run, so tests
no longer need to download artifacts at runtime.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When requiresCliArchive is true, the workflow now:
1. Downloads the CLI native archive artifact (Linux only)
2. Extracts it to ~/.aspire/bin and verifies the version
3. Copies built NuGet packages to a local hive
4. Configures the CLI channel to 'ci'
5. Exports ASPIRE_CLI_PATH_DIR and ASPIRE_CLI_VERSION env vars

This replaces the old approach where tests downloaded PR artifacts at
runtime using GITHUB_PR_NUMBER, GITHUB_PR_HEAD_SHA, and GH_TOKEN.
Those env vars are removed from all test step environments.

The requiresCliArchive comment is updated to reflect the new env vars.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The Playwright CLI skill is no longer selected by default in the skill
selection prompt. Navigate down to the playwright-cli entry and toggle
it on before confirming.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Migrate DockerDeploymentTests, JavaScriptPublishTests,
KubernetesPublishTests, and TypeScriptCodegenValidationTests from the
old IsRunningInCI/PullRequest pattern to the new PreInstalledCliDir
pattern:

- Replace IsRunningInCI checks with PreInstalledCliDir null checks
- Remove GetRequiredPrNumber/GetRequiredCommitSha calls
- Use SourceAspireCliEnvironmentAsync (not Bundle variant)
- Use parameterless VerifyAspireCliVersionAsync

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

In Docker-in-Docker environments, NuGet restore + build during
'aspire start' can exceed the 120-second backchannel timeout, causing
spurious failures. Run 'dotnet build' first, then use
'aspire start --no-build' so the backchannel connects quickly.

Also add VerifyAspireCliVersionAsync after CLI install.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add VerifyAspireCliVersionAsync after CLI install to catch version
mismatches early. Increase 'aspire wait' timeout from 60s to 120s
(and prompt wait from 90s to 150s) to account for container image
pull times in Docker-in-Docker environments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove ActiveIssue for #15930 from DockerDeploymentTests
(CreateAndDeployToDockerCompose, CreateAndDeployToDockerComposeInteractive)
and KubernetesPublishTests (CreateAndPublishToKubernetes).

These tests were disabled because they failed outside PR context. The
PreInstalled CLI mode and CI archive workflow changes fix the root
cause, so the tests can run in quarantine/outerloop again.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@radical radical force-pushed the copilot/fix-cli-endtoend-tests branch from e090913 to 99dfb2e Compare April 8, 2026 00:06
Contributor

github-actions bot commented Apr 8, 2026

🎬 CLI E2E Test Recordings — 56 recordings uploaded (commit 99dfb2e)

Test Recording
AddPackageInteractiveWhileAppHostRunningDetached ▶️ View Recording
AddPackageWhileAppHostRunningDetached ▶️ View Recording
AgentCommands_AllHelpOutputs_AreCorrect ▶️ View Recording
AgentInitCommand_DefaultSelection_InstallsSkillOnly ▶️ View Recording
AgentInitCommand_MigratesDeprecatedConfig ▶️ View Recording
AllPublishMethodsBuildDockerImages ▶️ View Recording
AspireAddPackageVersionToDirectoryPackagesProps ▶️ View Recording
AspireUpdateRemovesAppHostPackageVersionFromDirectoryPackagesProps ▶️ View Recording
Banner_DisplayedOnFirstRun ▶️ View Recording
Banner_DisplayedWithExplicitFlag ▶️ View Recording
Banner_NotDisplayedWithNoLogoFlag ▶️ View Recording
CertificatesClean_RemovesCertificates ▶️ View Recording
CertificatesTrust_WithNoCert_CreatesAndTrustsCertificate ▶️ View Recording
CertificatesTrust_WithUntrustedCert_TrustsCertificate ▶️ View Recording
ConfigSetGet_CreatesNestedJsonFormat ▶️ View Recording
CreateAndRunAspireStarterProject ▶️ View Recording
CreateAndRunAspireStarterProjectWithBundle ▶️ View Recording
CreateAndRunEmptyAppHostProject ▶️ View Recording
CreateAndRunJavaEmptyAppHostProject ▶️ View Recording
CreateAndRunJsReactProject ▶️ View Recording
CreateAndRunPythonReactProject ▶️ View Recording
CreateAndRunTypeScriptEmptyAppHostProject ▶️ View Recording
CreateAndRunTypeScriptStarterProject ▶️ View Recording
CreateJavaAppHostWithViteApp ▶️ View Recording
CreateStartAndStopAspireProject ▶️ View Recording
CreateTypeScriptAppHostWithViteApp ▶️ View Recording
DashboardRunWithOtelTracesReturnsNoTraces ▶️ View Recording
DescribeCommandResolvesReplicaNames ▶️ View Recording
DescribeCommandShowsRunningResources ▶️ View Recording
DetachFormatJsonProducesValidJson ▶️ View Recording
DoctorCommand_DetectsDeprecatedAgentConfig ▶️ View Recording
DoctorCommand_WithSslCertDir_ShowsTrusted ▶️ View Recording
DoctorCommand_WithoutSslCertDir_ShowsPartiallyTrusted ▶️ View Recording
GlobalMigration_HandlesCommentsAndTrailingCommas ▶️ View Recording
GlobalMigration_HandlesMalformedLegacyJson ▶️ View Recording
GlobalMigration_PreservesAllValueTypes ▶️ View Recording
GlobalMigration_SkipsWhenNewConfigExists ▶️ View Recording
GlobalSettings_MigratedFromLegacyFormat ▶️ View Recording
InvalidAppHostPathWithComments_IsHealedOnRun ▶️ View Recording
LegacySettingsMigration_AdjustsRelativeAppHostPath ▶️ View Recording
LogsCommandShowsResourceLogs ▶️ View Recording
PsCommandListsRunningAppHost ▶️ View Recording
PsFormatJsonOutputsOnlyJsonToStdout ▶️ View Recording
PublishWithDockerComposeServiceCallbackSucceeds ▶️ View Recording
RestoreGeneratesSdkFiles ▶️ View Recording
RestoreSupportsConfigOnlyHelperPackageAndCrossPackageTypes ▶️ View Recording
RunFromParentDirectory_UsesExistingConfigNearAppHost ▶️ View Recording
RunWithMissingAwaitShowsHelpfulError ▶️ View Recording
SecretCrudOnDotNetAppHost ▶️ View Recording
SecretCrudOnTypeScriptAppHost ▶️ View Recording
StagingChannel_ConfigureAndVerifySettings_ThenSwitchChannels ▶️ View Recording
StopAllAppHostsFromAppHostDirectory ▶️ View Recording
StopAllAppHostsFromUnrelatedDirectory ▶️ View Recording
StopNonInteractiveMultipleAppHostsShowsError ▶️ View Recording
StopNonInteractiveSingleAppHost ▶️ View Recording
StopWithNoRunningAppHostExitsSuccessfully ▶️ View Recording

📹 Recordings uploaded automatically from CI run #24110515133

@radical radical added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Apr 8, 2026