Skip to content

Make RestoreTask resettable across builds (NuGet/Home#14958 item 1)#7507

Open
JanProvaznik wants to merge 9 commits into
NuGet:devfrom
JanProvaznik:dev-JanProvaznik-restore-state-restoretask-only
Open

Make RestoreTask resettable across builds (NuGet/Home#14958 item 1)#7507
JanProvaznik wants to merge 9 commits into
NuGet:devfrom
JanProvaznik:dev-JanProvaznik-restore-state-restoretask-only

Conversation

@JanProvaznik

Copy link
Copy Markdown

Bug

Progress: NuGet/Home#14958 (Plan item 1 — Make RestoreTask resettable)

Description

MSBuild is moving to an execution model where the entry-point process persists across builds (MSBuild Server, and later the multithreaded server). RestoreTask has always relied on the process dying after each build to drop its cached environment-variable-derived state and to reclaim the credential/plugin child processes. In a reused process that state would leak between restores — stale env-derived caches, a credential service pinned to the first build's credentials, and lingering plugin processes.

This PR implements item 1 of NuGet/Home#14958: make RestoreTask explicitly reset that state — refresh env-derived caches at the start of restore and tear down live resources (plugins) at the end — so a reused process behaves as if it had started fresh.

Scope: this is only item 1. It deliberately does not include item 4 (moving the env-cache reset to the first satellite task that hits cached state), which the issue calls out as lower-priority and a pre-existing concern in the current model. Static-graph restore is out of scope (it already spawns a fresh, short-lived process per the issue). No impact on VS or nuget.exe.

Approach — small public registry, internal self-registration

The only new public API is a tiny keyed registry in NuGet.Common:

public static class NuGetProcessState
{
    public enum ResetKey { StartRestore, EndRestore }
    public static void RegisterResetAction(ResetKey key, Action resetAction);
    public static void Reset(ResetKey key);   // runs all actions for the key, best-effort & isolated
}

Each cache/resource self-registers its own internal reset from its static constructor, so the individual resets stay internal/private and nothing else becomes public. RestoreTask.Execute drives the lifecycle: Reset(StartRestore) before restore, Reset(EndRestore) in a finally. RestoreTask runs once per restore on the entry node, so these are direct calls with no guard.

Reviewable commit-by-commit

  1. Add NuGetProcessState registry (NuGet.Common) — the infrastructure + unit tests + the single PublicAPI.Unshipped entry.
  2. Invoke it from RestoreTaskReset(StartRestore) at start, Reset(EndRestore) in finally. (No-op until caches register, below.)
  3. NuGet.Common env caches — ExceptionLogger (NUGET_SHOW_STACK), ConcurrencyUtilities (lock base path), NuGetEnvironment (home/temp/folder-path caches).
  4. NuGet.ConfigurationProxyCache (http_proxy family + nuget.config; drops cached proxy creds).
  5. NuGet.CredentialsPreviewFeatureSettings; DefaultCredentialServiceUtility (discards the pinned credential service so the next restore rebuilds it instead of reusing the first build's interactivity/providers/cached credentials).
  6. NuGet.ProjectModelDependencyGraphSpec (env-derived hash-function flag).
  7. NuGet.Protocol env caches — NuGetFeatureFlags, NuGetTestMode, PackageIdValidator, HttpSourceResourceProvider.Throttle (per-restore concurrency semaphore).
  8. NuGet.CommandsSourceRepositoryDependencyProvider throttle (NUGET_CONCURRENCY_LIMIT).
  9. End-of-restore teardownPluginManager disposes the shared instance (kills credential/plugin child processes and their timers) under EndRestore.

Completeness audit

Cross-checked against a whole-program, cross-assembly reachability analysis from the restore task entry points (13 assemblies, ~7,267 reachable methods). Every environment-variable-derived cached static reachable from RestoreTask is reset at StartRestore; the live plugin/credential processes are torn down at EndRestore. Items intentionally not reset, with rationale:

  • Self-refreshing each restore (so reset is unnecessary): UserAgent.UserAgentString and X509TrustStore are both re-initialized unconditionally in BuildTasksUtility.RestoreAsync every restore.
  • Not environment-derived / machine-invariant for the process lifetime: RuntimeEnvironmentHelper.* (OS / process-name flags), PathUtility._isFileSystemCaseInsensitive, framework-mapping tables, resource managers, comparers, JSON serializers, assembly-version singletons, NuGetExtractionFileIO._unixPermissions (umask).
  • Deterministic, env-independent computation caches (resetting would be pure perf loss): GraphOperations.Cache, RuntimeGraph.Cache.
  • Experimental knob with a deliberate caching contract: X509ChainBuildPolicyFactory.Policy (NUGET_EXPERIMENTAL_CHAIN_BUILD_RETRY_POLICY).
How the audit was performed (cross-assembly reachability analyzer)

The "every reachable static" claim above is not eyeballed — it comes from a purpose-built Roslyn analyzer: RestoreStateAnalyzer (README · full report.md).

A normal DiagnosticAnalyzer only sees one compilation and only metadata for referenced assemblies, so it cannot follow a call from RestoreTask (in NuGet.Build.Tasks) into the body of NuGet.Protocol, NuGet.Credentials, etc. To be accurate across assemblies, the tool instead:

  1. Builds NuGet.Build.Tasks for net10.0 (the CoreCLR dotnet restore runtime) with /bl, and reads the exact csc command lines from the binlog — so source files, references, #defines (IS_CORECLR, …), nullable and langversion match the shipping build exactly.
  2. Reconstructs one CSharpCompilation per NuGet project, wiring inter-project references as Roslyn CompilationReferences (symbols resolve into the source of every assembly) and running the same source generators. Result: 13 assemblies, 0 binding errors.
  3. Seeds the restore task entry points and performs an interprocedural worklist walk over IOperation trees — following invocations / object creations / property & event accesses / static initializers, expanding virtual & interface dispatch to all source overrides transitively.
  4. Records every reachable static field/property (classified: mutable / readonly-mutable-ref / Lazy / [ThreadStatic] / …) and every lingering process-state sink (child processes, env edits/reads, timers, static event subscriptions, cwd/console/culture/registry/AppDomain mutations), each annotated with the set of restore tasks that reach it and a representative call path.

The audit table above is the result of diffing that enumeration (filtered to RestoreTask-reachable, environment-derived statics) against the set of RegisterResetAction call sites in this PR. The same report drove the e2e validation target list below.

End-to-end validation

The combined behavior (this NuGet change + the matching MSBuild change that runs RestoreTask in the persistent process) was validated end-to-end on a coordinated VMR build using the two-PAT credential-leak scenario from dotnet/msbuild#13660: restore #1 with PAT-A against a private Azure Artifacts feed, revoke PAT-A, restore #2 with PAT-B in the same MSBuild Server process. On main restore #2 fails with NU1301: 401 (the revoked PAT leaks from the cached plugin); with these resets restore #2 succeeds with PAT-B. RestoreTask was confirmed (via binlog) to run in-process in the persistent server, so the leak-prone path is genuinely exercised.

PR Checklist

  • Meaningful title and description — implements item 1 of dotnet restore/build should work correctly when its MSBuild tasks run repeatedly and in the same process Home#14958; reviewable commit-by-commit.
  • TestsNuGetProcessStateTests (registry runs-all / failure-isolation, and re-read after Reset(StartRestore)); ProxyCacheTests.ResetCache, HttpSourceResourceProviderTests.ResetThrottle, DefaultCredentialServiceUtilityTests.ResetCredentialService. Full NuGet.Build.Tasks.Test suite green; build clean on net472 + net10.0.
  • No behavior change for the existing (process-dies-per-build) model — the resets re-read the same environment a fresh process would, and teardown matches the prior process-exit reclamation.
  • PublicAPI.Unshipped.txt updated — only NuGet.Common.NuGetProcessState (the registry); every per-type reset is internal/private.

JanProvaznik and others added 9 commits June 24, 2026 13:54
A small public keyed registry: RegisterResetAction(ResetKey, Action) accumulates resets under a key and Reset(ResetKey) runs them (best-effort, isolated). Keys: StartRestore (refresh env-derived caches) and EndRestore (tear down live resources). This is the only new public API; each cache/resource registers its own internal reset, so nothing else becomes public.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
RestoreTask.Execute calls Reset(StartRestore) before restore and Reset(EndRestore) in a finally. RestoreTask runs once per restore on the entry node, so these are direct calls with no guard. Makes a process reused across builds (MSBuild Server) behave as if started fresh. No-op until caches register (following commits).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ExceptionLogger (NUGET_SHOW_STACK), ConcurrencyUtilities (lock base path) and NuGetEnvironment (home/temp/folder path caches) self-register a reset from their static constructors.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ProxyCache.Instance is rebuilt so a reused process re-reads the http_proxy family of env vars / nuget.config and drops cached proxy credentials.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PreviewFeatureSettings re-reads its env flag; DefaultCredentialServiceUtility discards the pinned credential service so a reused process rebuilds it for the next restore instead of reusing the first build's interactivity/providers/cached credentials.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Re-reads its env-derived hash-function flag.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
NuGetFeatureFlags, NuGetTestMode and PackageIdValidator re-read their env flags; HttpSourceResourceProvider.Throttle (the per-restore concurrency semaphore) is cleared so it does not leak into the next restore.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ore)

Recreates the NUGET_CONCURRENCY_LIMIT-sized semaphore from the current environment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PluginManager registers a private ResetSharedInstance under EndRestore that disposes the shared instance, killing credential/plugin child processes and their timers that the per-build process exit used to reclaim.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JanProvaznik JanProvaznik requested a review from a team as a code owner June 24, 2026 13:20
@JanProvaznik JanProvaznik requested review from nkolev92 and zivkan June 24, 2026 13:20
@dotnet-policy-service dotnet-policy-service Bot added the Community PRs created by someone not in the NuGet team label Jun 24, 2026
@JanProvaznik

Copy link
Copy Markdown
Author

@OvesN @AR-May — requesting your review on this one (item 1 of NuGet/Home#14958: make RestoreTask resettable across builds). Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Community PRs created by someone not in the NuGet team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant