Skip to content

Investigate O(n²) flat-add path in inlined .slnx parser (1000-project flat layout) #261

Description

@ChrisonSimtian

Background

After inlining vs-solutionpersistence in #248, we wrote BenchmarkDotNet baselines in #259. The numbers show a clear performance asymmetry worth a targeted fix — but deferred, because no current user is in the pathological range.

Baseline numbers (Windows 11, .NET 10.0.8, BDN 0.14.0)

Projects Folders Mean Allocated
1 No 157.7 µs 19.6 KB
1 Yes 162.7 µs 21.3 KB
10 No 207.2 µs 41.5 KB
10 Yes 158.4 µs 43.6 KB
100 No 1.03 ms 610 KB
100 Yes 0.59 ms 274 KB
1000 No 71.8 ms 48 MB
1000 Yes 12.2 ms 2.6 MB

The smoking gun

At 1000 projects, flat layout is 5.9× slower and 18× more memory than foldered for identical data:

  • Flat: 100 → 1000 projects = 1 ms → 72 ms (72× for 10× data — super-linear).
  • Foldered: 100 → 1000 = 0.59 ms → 12.2 ms (~21× for 10× data — milder).

Same content, very different perf. The asymmetry strongly suggests an O(n²) flat-add path — likely a linear scan (.Contains(...), .Any(...) over a List) called per-project-add. The foldered path avoids it (each folder is a small batch).

Why deferred

  • Common case is fine. ≤100 projects is sub-millisecond. This repo has ~35; most consumers have 5–50.
  • Pathological case is unusual. A 1000-project flat solution is rare; large monorepos group into folders (where the parser is already 6× faster).
  • No current user complaint. Found by benchmarking, not by users hitting it.

Approach when someone picks it up

  1. Profile the 1000-flat case via dotnet-trace or BenchmarkDotNet [EventPipeProfiler]. The hot path should be obvious — a List/array scan in the model-construction loop.
  2. First candidates:
    • src/Persistence/Fallout.Persistence.Solution/Model/SolutionModel.csAddProject/AddSolutionFolder paths.
    • Any HashSet that's actually a List<T> scanned with .Contains.
    • String-keyed collections doing case-insensitive comparisons the wrong way.
  3. Fix surgically — likely a List<T>Dictionary<string, T> swap or a HashSet<string> for de-dup.
  4. Re-run tests/Benchmarks/Fallout.Persistence.Solution.Benchmarks. The 1000-flat number should drop to single-digit ms, allocations to low MB.
  5. Don't rewrite to XmlSerializer unless the trace shows the XML layer is the bottleneck (the data says model construction dominates).

Done when

  • The 1000-flat case is within 2× of 1000-foldered in time and allocations.
  • A benchmark snippet is included in the fix PR for proof.
  • The fix is < 50 lines (otherwise re-scope as a bigger refactor).

Coordinates with

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requesttarget/backlogNo committed release year; long-tail / demand-driven

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions