PM 31023 - Creating density models for Seeder by theMickster · Pull Request #7157 · bitwarden/server

theMickster · 2026-03-05T08:39:42Z

Not planning to publish this because it is too large. I really just wanted to get the whole thing working and get some eyes on it before breaking it down.

🎟️ Tracking

PM-31023 - Relational Density Modeling
PM-32777 - Baked-In Density Preset Profiles

📔 Objective

Complete the density modeling additions to the Seeder in our presets. This work is a sizable shift from how the Seeder originally allocated entities as it created them. With the new JSON density property, we can now tune entity distribution precisely without changing the Seeder itself.

Key changes

Add 5 density distributions to preset density block
Create 9 production-calibrated scale presets (XS-XL)
Fix Hamilton apportionment bug in Distribution
Reorganize presets into purpose-based folders
Consolidate docs into Seeds/docs/ with cross-refs
Add Q5-Q8 verification queries for new distributions
Deprecate wonka-teams-small and large-enterprise in favor of the production scale presets

Note on the Hamilton apportionment bug

The Distribution.Select() method divides items into percentage-based buckets using integer truncation, which leaves unclaimed remainder items. The old code silently dumped all remainder onto the last bucket — so a zero-weight HidePasswords bucket would still receive items. The fix uses Hamilton apportionment (largest-remainder method): remainder items go one-at-a-time to whichever buckets lost the most from truncation, and zero-weight buckets are guaranteed to receive exactly zero.

The method is named for Alexander Hamilton, the first U.S. Secretary of the Treasury, who proposed it in 1792 to apportion congressional seats among states. The math problem is the same: distribute a fixed number of indivisible items (seats, or in our case collection permissions) proportionally across groups when the proportional shares aren't whole numbers.
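For reviewers who want to see the arithmetic, here is a minimal Python sketch of the largest-remainder method described in this note. It is an illustration only, not the C# code in Distribution.Select(); the function name `apportion` and the example weights are assumptions made for the sketch.

```python
import math
from fractions import Fraction


def apportion(total, weights):
    """Split `total` indivisible items across buckets in proportion to
    `weights` using the largest-remainder (Hamilton) method.

    Hypothetical helper for illustration; not the Seeder's actual API.
    """
    if total < 0 or any(w < 0 for w in weights):
        raise ValueError("total and weights must be non-negative")
    exact_weights = [Fraction(w) for w in weights]
    weight_sum = sum(exact_weights)
    if weight_sum == 0:
        raise ValueError("at least one weight must be positive")

    # Exact proportional share of each bucket (usually not a whole number).
    quotas = [total * w / weight_sum for w in exact_weights]
    # Integer truncation: every bucket first receives the floor of its quota.
    counts = [math.floor(q) for q in quotas]
    leftover = total - sum(counts)

    # Rank buckets by how much truncation cost them; ties go to the lower index.
    order = sorted(range(len(counts)), key=lambda i: (-(quotas[i] - counts[i]), i))
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Last bucket has weight 0 (think of a zero-weight HidePasswords bucket).
print(apportion(11, [50, 30, 20, 0]))  # [6, 3, 2, 0]
```

With 11 items and weights 50/30/20/0, truncation yields 5/3/2/0 and leaves one item unclaimed. Dumping it on the last bucket would give 5/3/2/1; the largest-remainder rule gives it to the first bucket, which lost 0.5 to truncation, and the zero-weight bucket stays at zero.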

Where did our distribution statistics come from?

The scale preset archetypes are modeled after three real production organizations analyzed in DBOPS-91: Company A (hierarchical, 2,795 users/74 groups), Company B (flat, 11,491 users/5 groups/13,906 collections), and Company C (balanced, 954 users/99 groups). These profiles revealed that production relationship patterns follow power-law and mega-group distributions — not the uniform round-robin the seeder previously generated. Each scale preset's density parameters (membership skew, collection fan-out, permission weights, orphan rates) were calibrated to reproduce these observed production shapes at five tiers from family (6 users) to mega-corp (10,000 users).
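To make the contrast between uniform round-robin and power-law membership concrete, here is a small Python sketch. It assumes group weights proportional to 1/rank^skew; that formula, the `skew` parameter name, and both function names are assumptions for illustration, since this description does not spell out the Seeder's actual curve. It also treats membership as a fixed pool of slots, which is a simplification.

```python
import math
from fractions import Fraction


def round_robin_sizes(slots, groups):
    """Uniform allocation: group sizes differ by at most one."""
    base, extra = divmod(slots, groups)
    return [base + (1 if i < extra else 0) for i in range(groups)]


def power_law_sizes(slots, groups, skew=1.0):
    """Skewed allocation: the group at rank r gets weight 1 / r**skew.

    Hypothetical model; the real density parameters live in the preset JSON.
    """
    if groups < 1 or slots < 0:
        raise ValueError("need at least one group and a non-negative slot count")
    weights = [Fraction(1.0 / (rank ** skew)) for rank in range(1, groups + 1)]
    weight_sum = sum(weights)
    quotas = [slots * w / weight_sum for w in weights]
    sizes = [math.floor(q) for q in quotas]
    # Hand out the truncation remainder by largest fractional part.
    leftover = slots - sum(sizes)
    order = sorted(range(groups), key=lambda i: (-(quotas[i] - sizes[i]), i))
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


print(round_robin_sizes(100, 5))
print(power_law_sizes(100, 5, skew=1.0))
```

The first call produces five equal groups, while the second concentrates membership in the top-ranked group and decays from there, which is the shape Q1 in the verification doc is meant to confirm.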

Why the re-organization of presets?

The seeder is still in early adoption, so breaking preset names now costs nearly nothing; doing it after teams build scripts around them would be costly. Purpose-based folders (features/qa/scale/validation) make preset discovery self-documenting, so engineers don't need to read a README to find the right one. Consolidating docs into Seeds/docs/ eliminates duplication across scattered READMEs and separates everyday usage from developer-only verification content.

🧪 Testing


Step 1: Verify preset resolution (all 4 folders)

From util/SeederUtility/, run one preset from each folder:

dotnet run -- seed --preset features.sso-enterprise --mangle
dotnet run -- seed --preset qa.enterprise-basic --mangle
dotnet run -- seed --preset scale.sm-balanced-planet-express --mangle
dotnet run -- seed --preset validation.density-modeling-power-law-test --mangle

All four should seed successfully with no errors.

Step 2: Verify density distributions on a scale preset

Seed a mid-tier and large-tier preset:

dotnet run -- seed --preset scale.md-balanced-sterling-cooper --mangle
dotnet run -- seed --preset scale.lg-highperm-tyrell-corp --mangle

After each, run the verification queries from util/Seeder/Seeds/docs/verification.md against your local MSSQL database. Compare results to the expected-value tables in the same doc.

Key things to verify

Q1: Group membership follows power-law decay (not uniform)
Q3: Permission percentages match configured weights
Q4: Orphan cipher count matches configured rate
Q7: Collections-per-user shows min/max spread (not flat 1-2-3)
Q8: Multi-collection ciphers present at configured rate
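If you are checking a query like Q3 by hand, the comparison boils down to "observed share versus configured weight, within some tolerance". The Python sketch below shows one way to express that. The function names, the 1-point tolerance, and the bucket names other than HidePasswords are assumptions for illustration; the real expected values are in the verification doc.

```python
def percentage_drift(observed_counts, configured_weights):
    """Return {bucket: observed_pct - expected_pct} in percentage points."""
    total = sum(observed_counts.values())
    weight_sum = sum(configured_weights.values())
    if total == 0 or weight_sum == 0:
        raise ValueError("need at least one observed row and one positive weight")
    drift = {}
    for bucket, weight in configured_weights.items():
        observed_pct = 100.0 * observed_counts.get(bucket, 0) / total
        expected_pct = 100.0 * weight / weight_sum
        drift[bucket] = observed_pct - expected_pct
    return drift


def matches_weights(observed_counts, configured_weights, tolerance_pct=1.0):
    """True when every bucket is within tolerance and zero-weight buckets are empty."""
    drift = percentage_drift(observed_counts, configured_weights)
    for bucket, weight in configured_weights.items():
        # A zero-weight bucket must receive exactly zero items.
        if weight == 0 and observed_counts.get(bucket, 0) != 0:
            return False
        if abs(drift[bucket]) > tolerance_pct:
            return False
    return True


# Hypothetical weights and query results, for illustration only.
weights = {"ReadOnly": 50, "ReadWrite": 30, "Manage": 20, "HidePasswords": 0}
observed = {"ReadOnly": 498, "ReadWrite": 302, "Manage": 200, "HidePasswords": 0}
print(matches_weights(observed, weights))
```

The strict check on zero-weight buckets mirrors the apportionment fix: being within tolerance is not enough if a bucket configured at 0 received anything at all.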

Step 3: Verify backward compatibility

Seed the no-density validation preset:

dotnet run -- seed --preset validation.density-modeling-no-density-test --mangle

Key things to verify

0 CollectionGroup records
Uniform round-robin group membership
Every cipher assigned to at least one collection

This confirms the null-density path is unchanged.
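The three backward-compatibility checks can be written down as simple predicates over the query results. This Python sketch assumes you have already pulled the numbers out of the database yourself; the function name and input shapes are hypothetical, and "uniform round-robin" is interpreted here as group sizes differing by at most one.

```python
def check_null_density(collection_group_rows, group_sizes, collections_per_cipher):
    """Return a list of problems; an empty list means the null-density shape holds.

    collection_group_rows: count of CollectionGroup records for the org
    group_sizes: member count for each group
    collections_per_cipher: number of collections each cipher is assigned to
    """
    problems = []
    if collection_group_rows != 0:
        problems.append(f"expected 0 CollectionGroup records, found {collection_group_rows}")
    # Round-robin assignment leaves group sizes within one of each other.
    if group_sizes and max(group_sizes) - min(group_sizes) > 1:
        problems.append("group membership is not uniform round-robin")
    orphans = sum(1 for n in collections_per_cipher if n < 1)
    if orphans:
        problems.append(f"{orphans} cipher(s) have no collection")
    return problems


# Made-up numbers, for illustration only.
print(check_null_density(0, [4, 4, 3], [1, 2, 1]))
```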

Claude Code prompt for verification

Note: Mick has a reading-bw-mssql skill that automates the pwsh/SqlClient connection pattern. If you'd like it for your Claude Code setup, ask him to share it.

If you'd like Claude Code to run the verification queries for you, use this prompt after seeding:

1. Read util/Seeder/Seeds/docs/verification.md for the SQL queries.
2. Run Q1 through Q8 against org ID '{paste-org-id-here}' using pwsh with $env:BW_READ_ONLY_MSSQL_CONNECTION_STRING.
3. Present results as markdown tables and compare against the expected values for {preset-name} in the verification doc.

github-actions · 2026-03-05T08:52:58Z

Checkmarx One – Scan Summary & Details – 7acff828-9ba9-4e2b-8e28-5cb492a54a88

Great job! No new security vulnerabilities introduced in this pull request

codecov · 2026-03-05T10:03:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.77%. Comparing base (996f479) to head (9832ae8).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7157      +/-   ##
==========================================
+ Coverage   56.68%   56.77%   +0.08%     
==========================================
  Files        2026     2026              
  Lines       88681    88685       +4     
  Branches     7905     7906       +1     
==========================================
+ Hits        50272    50348      +76     
+ Misses      36585    36507      -78     
- Partials     1824     1830       +6


sonarqubecloud · 2026-03-05T15:12:39Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


nthompson-bitwarden · 2026-03-09T13:05:37Z

util/Seeder/Seeds/docs/verification.md

+
+---
+
+## Scale Preset Expected Values


I wonder if this adds a documentation burden. What if these expected values change? It may be more effective, documentation-wise, to provide a high-level description of the shapes and diagrams, similar to what is shown in PM-31023, then link to the presets so readers can see how those high-level data distribution strategies get mapped to each preset.

I had Claude add this because I found it more than helpful for Claude to do the verifications for me as we tested. It could do the math of where we were slightly off from expected; and then it went and fixed the various loops/distributions until it was correct.

Maybe chat about this topic this week?

nthompson-bitwarden · 2026-03-09T13:08:10Z

util/Seeder/Seeds/docs/presets.md

+
+## QA
+
+Handcrafted fixture data for visual UI verification. Known users, groups, collections, and permissions you can point at in the web vault.


This is overly verbose - QA does not only do visual UI verification with importable data. Maybe just, "Known users, groups, collections, and permissions you can point a client to." is a better representation

10-4. Claude got a little excited and I didn't trim that.

nthompson-bitwarden · 2026-03-09T13:09:08Z

util/Seeder/Seeds/docs/presets.md

+dotnet run -- seed --preset scale.{name} --mangle
+```
+
+| Preset | Tier | Archetype | Users | Groups | Collections | Ciphers | Plan |


This documentation is duplicated by verification.md. May be a nothingburger - but any updates to this need to then be updated in the verification.md too

I'll work to trim these down when I break the work into 3-4 PRs shortly.

nthompson-bitwarden · 2026-03-09T13:10:01Z

util/Seeder/Seeds/docs/presets.md

@@ -0,0 +1,82 @@
+# Preset Catalog
+
+Complete catalog of all seeder presets, organized by purpose. Use `--mangle` to avoid collisions with existing data.


Random note here (probably out of scope): is there a situation where a user would not use --mangle?

So I thought about this and, frankly, I am not so sure anymore 🤷🏼 Perhaps we look to invert the flag and, instead, it's a --no-mangle.

Say we're in a near perfect state in Q4 of 2026 with test automation, would you foresee it making more sense that we always use the mangle feature? 🤔

I do foresee that. I imagine a "--no-mangle" would be for more rare "one-off" situations. Whereas, we would want most situations to be using mangle to prevent any confusion that might arise if someone did not have mangled data and did end up having data collisions

theMickster added 5 commits March 5, 2026 07:55

Initial scaffold of density presets

9caf1c0

Add collections-per-user distribution to density model

d086acf

Add cipher type distribution to density model

c37993c

Add multi-collection cipher assignment to density model

0192387

Add personal cipher count distribution to density model

4174100

theMickster added 3 commits March 5, 2026 10:02

Add folder distribution selection to density model

d5433bf

Minor code review fix-ups

4b6d6be

Fix guards

f68e7c9

Properly organize presets and tidy-up documentation

20f36ba

theMickster changed the title ~~Pm 31023 - Creating density models for Seeder~~ PM 31023 - Creating density models for Seeder Mar 5, 2026

theMickster added 3 commits March 5, 2026 12:29

Removing the 'Hamilton' specific language as it's fluff

4e23ca7

Resolve a couple off-by-one errors

8c02904

Remove unnecessary using

9832ae8

nthompson-bitwarden reviewed Mar 9, 2026


theMickster mentioned this pull request Mar 10, 2026

Seeder - Adding density distributions #7191

Open

theMickster wants to merge 12 commits into main from PM-31023/creating-density-models-for-seeder


2 participants

