Skip to content

Seeder - Adding density distributions #7191

Merged
theMickster merged 3 commits intomainfrom
PM-31023/density-distributions-and-fixes
Mar 10, 2026
Merged

Seeder - Adding density distributions #7191
theMickster merged 3 commits intomainfrom
PM-31023/density-distributions-and-fixes

Conversation

@theMickster
Copy link
Contributor

🎟️ Tracking

PM-31023 - Relational Density Modeling
PM-32777 - Baked-In Density Preset Profiles

📔 Objective

Add 5 density distributions to the Seeder's preset density block and fix the Hamilton apportionment bug in Distribution<T>. This is PR 2 of 3 splitting #7157 — PR 1 (#7176, preset reorg) is merged, PR 3 (scale presets + docs) follows.

New distributions: collections-per-user, cipher type selection, multi-collection cipher assignment, personal cipher count per user, folder count per user.

Bug fix: Distribution<T> remainder items now use largest-fraction apportionment instead of dumping on the last bucket. Zero-weight buckets get exactly zero.

More on Hamilton apportionment

The Distribution<T>.Select() method divides items into percentage-based buckets using integer truncation, which leaves unclaimed remainder items. The old code silently dumped all remainder onto the last bucket — so a zero-weight HidePasswords bucket would still receive items. The fix uses Hamilton apportionment (largest-remainder method): remainder items go one-at-a-time to whichever buckets lost the most from truncation.

Alexander Hamilton — the first U.S. Secretary of the Treasury. He proposed this method in 1792 to apportion congressional seats among states. The math problem is the same: distribute a fixed number of indivisible items (seats, or in our case collection permissions) proportionally across groups when the proportional shares aren't whole numbers.

🧪 Testing

Expand for detailed instructions

Step 1: Verify backward compatibility

Seed the no-density validation preset:

dotnet run -- seed --preset validation.density-modeling-no-density-test --mangle

Verify: 0 CollectionGroup records, uniform round-robin group membership, every cipher assigned to at least one collection. This confirms the null-density path is unchanged.

Step 2: Verify density distributions

Seed a density validation preset:

dotnet run -- seed --preset validation.density-modeling-power-law-test --mangle

Verify: group membership follows power-law decay, permission percentages match configured weights, orphan ciphers present at configured rate.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 10, 2026

Logo
Checkmarx One – Scan Summary & Details4fb5cdf4-ca4e-4ac6-ad41-d6a146158a57

Great job! No new security vulnerabilities introduced in this pull request

@codecov
Copy link

codecov bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.38%. Comparing base (bf9bc84) to head (56c7dc7).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7191      +/-   ##
==========================================
- Coverage   57.38%   57.38%   -0.01%     
==========================================
  Files        2031     2031              
  Lines       89283    89283              
  Branches     7936     7936              
==========================================
- Hits        51237    51236       -1     
- Misses      36205    36206       +1     
  Partials     1841     1841              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sonarqubecloud
Copy link

var digest = userDigests[index];
var range = distribution.Select(index, userDigests.Count);
var count = range.Min + (index % Math.Max(range.Max - range.Min, 1));
var count = range.Min + (index % Math.Max(range.Max - range.Min + 1, 1));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a 🐛 fix... range.Max - range.Min excluded the max value from the reachable range. With (4, 8) the old formula produced 4-7, never 8. The + 1 makes max inclusive. Same fix applied in GeneratePersonalCiphersStep.RangeCalculationTests validates this.

}

internal static List<CollectionUser> BuildCollectionUsers(
internal List<CollectionUser> BuildCollectionUsers(
Copy link
Contributor Author

@theMickster theMickster Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed from static to instance because the new ComputeCollectionsPerUser() method needs _density to read UserCollectionShape and UserCollectionSkew. Three existing tests updated accordingly... Same assertions, just constructing a step instance first.

{
var orphanCount = (int)(count * _density.OrphanCipherRate);
var nonOrphanCount = count - orphanCount;
var primaryIndices = new int[nonOrphanCount];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 The multi-collection loop runs after the primary assignment loop completes. We need to know which collection each cipher was originally assigned to, and the collection index isn't stored on the CollectionCipher entity. Capturing it during the first pass gives O(1) lookup instead of searching back through the list.

}
else if (density?.PersonalCipherDistribution is not null)
{
builder.AddPersonalCiphers(0, density: density);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 The countPerUser: 0 looks a little strange and that it would skip personal cipher generation entirely, but it doesn't.

When PersonalCipherDistribution is set on the density profile, the distribution overrides the flat count — each user gets a variable number. The step's early-return guard checks both: if (countPerUser == 0 && density?.PersonalCipherDistribution is null).

context.Registry.CipherIds.AddRange(cipherIds);
}

private static int EstimateTotal(int userCount, Distribution<(int Min, int Max)> dist)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed because typeDistribution.Select(globalIndex, total) needs the exact total cipher count upfront to place each cipher in the correct type bucket. When using a distribution, each user gets a different count, so we can't just multiply userDigests.Count * countPerUser. Without pre-computing, the total is wrong and later ciphers overflow into the last bucket (SSHKey) 🤢 🤮.

This was a real bug we caught in development.

@theMickster theMickster added the ai-review Request a Claude code review label Mar 10, 2026
@theMickster theMickster marked this pull request as ready for review March 10, 2026 13:39
@claude
Copy link
Contributor

claude bot commented Mar 10, 2026

Claude finished @theMickster's task in 3m 28s —— View job


Code Review: Seeder - Adding density distributions

Reviewed commit: 56c7dc79 (Remove unnecessary and confusing guard)
PR: #7191 - Seeder - Adding density distributions

Summary

This PR adds 5 density distributions (collections-per-user, cipher type selection, multi-collection cipher assignment, personal cipher count per user, folder count per user) and fixes a significant Hamilton apportionment bug in Distribution<T>.GetCounts(). This is PR 2 of 3 splitting a larger change.

Review Result: Approved -- No Issues Found

Security: No concerns. This is a seeder/test utility; no production code paths, no sensitive data handling, and no changes to authentication or encryption logic.

Correctness: Verified the following are correct:

  • Hamilton apportionment fix (Distribution.cs): Remainder items are distributed to buckets with the largest fractional parts instead of being dumped on the last bucket. Zero-weight buckets correctly receive zero items. The -1.0 sentinel after awarding a remainder item prevents double-counting.
  • Range formula fix (GenerateFoldersStep.cs, GeneratePersonalCiphersStep.cs): The +1 in Math.Max(range.Max - range.Min + 1, 1) correctly makes the max value inclusive. The old formula excluded range.Max from the reachable set.
  • Multi-collection assignment (GenerateCiphersStep.cs): The secondary index formula (primaryIndices[i] + 1 + j) % collectionIds.Count guarantees no secondary equals primary. The extraCount clamp to collectionIds.Count - 1 prevents overflow. The primaryIndices array is correctly bounded by nonOrphanCount, and multiCount <= nonOrphanCount ensures no out-of-bounds access.
  • EstimateTotal pre-computation (GeneratePersonalCiphersStep.cs): Uses the same formula as the generation loop, so typeDistribution.Select(globalIndex, expectedTotal) receives an accurate total.
  • Backward compatibility contract maintained: All steps guard if (_density == null) to preserve the original round-robin path.

Breaking Changes: None. BuildCollectionUsers changed from static to instance, but it is internal and tests were updated accordingly. New parameters on AddFolders() and AddPersonalCiphers() have default values of null.

Test Coverage: Thorough. New tests cover Hamilton apportionment, zero-weight buckets, range formula inclusivity, multi-collection duplicate prevention, and all three ComputeCollectionsPerUser shapes (Uniform, PowerLaw, FrontLoaded). Codecov confirms all modified lines are covered.

CI Status: All checks pass -- build, tests, lint, Checkmarx security scan, SonarQube quality gate, and code coverage.

@theMickster theMickster merged commit bf40668 into main Mar 10, 2026
80 checks passed
@theMickster theMickster deleted the PM-31023/density-distributions-and-fixes branch March 10, 2026 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-review Request a Claude code review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants