Seeder - Adding density distributions #7191
Conversation
|
Great job! No new security vulnerabilities introduced in this pull request |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
| var digest = userDigests[index]; | ||
| var range = distribution.Select(index, userDigests.Count); | ||
| var count = range.Min + (index % Math.Max(range.Max - range.Min, 1)); | ||
| var count = range.Min + (index % Math.Max(range.Max - range.Min + 1, 1)); |
There was a problem hiding this comment.
This was a 🐛 fix... range.Max - range.Min excluded the max value from the reachable range. With (4, 8) the old formula produced 4-7, never 8. The + 1 makes max inclusive. Same fix applied in GeneratePersonalCiphersStep.RangeCalculationTests validates this.
| } | ||
|
|
||
| internal static List<CollectionUser> BuildCollectionUsers( | ||
| internal List<CollectionUser> BuildCollectionUsers( |
There was a problem hiding this comment.
Changed from static to instance because the new ComputeCollectionsPerUser() method needs _density to read UserCollectionShape and UserCollectionSkew. Three existing tests updated accordingly... Same assertions, just constructing a step instance first.
| { | ||
| var orphanCount = (int)(count * _density.OrphanCipherRate); | ||
| var nonOrphanCount = count - orphanCount; | ||
| var primaryIndices = new int[nonOrphanCount]; |
There was a problem hiding this comment.
📝 The multi-collection loop runs after the primary assignment loop completes. We need to know which collection each cipher was originally assigned to, and the collection index isn't stored on the CollectionCipher entity. Capturing it during the first pass gives O(1) lookup instead of searching back through the list.
| } | ||
| else if (density?.PersonalCipherDistribution is not null) | ||
| { | ||
| builder.AddPersonalCiphers(0, density: density); |
There was a problem hiding this comment.
📝 The countPerUser: 0 looks a little strange and that it would skip personal cipher generation entirely, but it doesn't.
When PersonalCipherDistribution is set on the density profile, the distribution overrides the flat count — each user gets a variable number. The step's early-return guard checks both: if (countPerUser == 0 && density?.PersonalCipherDistribution is null).
| context.Registry.CipherIds.AddRange(cipherIds); | ||
| } | ||
|
|
||
| private static int EstimateTotal(int userCount, Distribution<(int Min, int Max)> dist) |
There was a problem hiding this comment.
This is needed because typeDistribution.Select(globalIndex, total) needs the exact total cipher count upfront to place each cipher in the correct type bucket. When using a distribution, each user gets a different count, so we can't just multiply userDigests.Count * countPerUser. Without pre-computing, the total is wrong and later ciphers overflow into the last bucket (SSHKey) 🤢 🤮.
This was a real bug we caught in development.
|
Claude finished @theMickster's task in 3m 28s —— View job Code Review: Seeder - Adding density distributionsReviewed commit: SummaryThis PR adds 5 density distributions (collections-per-user, cipher type selection, multi-collection cipher assignment, personal cipher count per user, folder count per user) and fixes a significant Hamilton apportionment bug in Review Result: Approved -- No Issues FoundSecurity: No concerns. This is a seeder/test utility; no production code paths, no sensitive data handling, and no changes to authentication or encryption logic. Correctness: Verified the following are correct:
Breaking Changes: None. Test Coverage: Thorough. New tests cover Hamilton apportionment, zero-weight buckets, range formula inclusivity, multi-collection duplicate prevention, and all three CI Status: All checks pass -- build, tests, lint, Checkmarx security scan, SonarQube quality gate, and code coverage. |




🎟️ Tracking
PM-31023 - Relational Density Modeling
PM-32777 - Baked-In Density Preset Profiles
📔 Objective
Add 5 density distributions to the Seeder's preset density block and fix the Hamilton apportionment bug in
Distribution<T>. This is PR 2 of 3 splitting #7157 — PR 1 (#7176, preset reorg) is merged, PR 3 (scale presets + docs) follows.New distributions: collections-per-user, cipher type selection, multi-collection cipher assignment, personal cipher count per user, folder count per user.
Bug fix:
Distribution<T>remainder items now use largest-fraction apportionment instead of dumping on the last bucket. Zero-weight buckets get exactly zero.More on Hamilton apportionment
The
Distribution<T>.Select()method divides items into percentage-based buckets using integer truncation, which leaves unclaimed remainder items. The old code silently dumped all remainder onto the last bucket — so a zero-weight HidePasswords bucket would still receive items. The fix uses Hamilton apportionment (largest-remainder method): remainder items go one-at-a-time to whichever buckets lost the most from truncation.Alexander Hamilton — the first U.S. Secretary of the Treasury. He proposed this method in 1792 to apportion congressional seats among states. The math problem is the same: distribute a fixed number of indivisible items (seats, or in our case collection permissions) proportionally across groups when the proportional shares aren't whole numbers.
🧪 Testing
Expand for detailed instructions
Step 1: Verify backward compatibility
Seed the no-density validation preset:
dotnet run -- seed --preset validation.density-modeling-no-density-test --mangleVerify: 0 CollectionGroup records, uniform round-robin group membership, every cipher assigned to at least one collection. This confirms the null-density path is unchanged.
Step 2: Verify density distributions
Seed a density validation preset:
dotnet run -- seed --preset validation.density-modeling-power-law-test --mangleVerify: group membership follows power-law decay, permission percentages match configured weights, orphan ciphers present at configured rate.