Add fine-grained MoE experts: decouple expert FFN hidden dim from dense FFN by amazloumi · Pull Request #118 · KempnerInstitute/KempnerForge

amazloumi · 2026-06-08T15:47:51Z

Summary

Add ModelConfig.moe_expert_ffn_multiplier (default 1.0) + a computed_expert_ffn_hidden_dim property: expert FFN hidden = computed_ffn_hidden_dim × multiplier, rounded to a multiple of 16.
Thread the expert hidden dim into every expert-construction site: transformer.py (standard build_moe), mot.py (MoT per-modality build_moe), moma.py (ExpertChoiceMoE).
Update num_params_estimate to count experts at the (possibly smaller) expert hidden dim.
Enables DeepSeek-style fine-grained experts and activated-FLOP-matched MoE↔dense comparisons. With top-k routing and dense FFN hidden dim F, set multiplier = 1/k so that k × (F/k) = F (e.g. top-2 with multiplier 0.5 activates the same FFN compute as dense). Verified on a 70M backbone: top-2 with multiplier 0.5 → 1536 activated = dense (1.0×).
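
For reviewers, a minimal standalone sketch of the arithmetic described above. It is not the code in this PR: the function names are hypothetical, the rounding direction (up to the next multiple of 16) is an assumption since the summary only says "rounded to a multiple of 16", and the 1536 figure is read here as the dense FFN hidden dim of the 70M backbone.

```python
import math


def expert_ffn_hidden_dim(
    dense_ffn_hidden_dim: int,
    multiplier: float = 1.0,
    multiple_of: int = 16,
) -> int:
    """Hypothetical stand-in for computed_expert_ffn_hidden_dim.

    Scales the dense FFN hidden dim by `multiplier`, then rounds to a
    multiple of `multiple_of`. Rounding UP is an assumption; the real
    property may round differently.
    """
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    scaled = dense_ffn_hidden_dim * multiplier
    return multiple_of * math.ceil(scaled / multiple_of)


def activated_ffn_hidden(top_k: int, expert_hidden: int) -> int:
    """Total expert hidden units active per token under top-k routing."""
    return top_k * expert_hidden


# Top-2 routing with multiplier 0.5, dense hidden dim assumed to be 1536.
dense_hidden = 1536
expert_hidden = expert_ffn_hidden_dim(dense_hidden, multiplier=0.5)
activated = activated_ffn_hidden(top_k=2, expert_hidden=expert_hidden)
print(expert_hidden, activated, activated / dense_hidden)
```

Under these assumptions each expert gets a hidden dim of 768 and two active experts sum to 1536, matching the dense FFN. When F × multiplier is not already a multiple of 16, rounding makes the match approximate rather than exact.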

Default 1.0 reproduces current behavior exactly (expert hidden == dense FFN); dense and existing MoE configs are unchanged.
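
A rough sketch of how the expert parameter count scales with the expert hidden dim. This is not the actual num_params_estimate: it assumes a gated FFN with three bias-free weight matrices per expert (gate, up, down), ignores router and shared-expert parameters, and uses a hypothetical model dim of 512 with 8 experts.

```python
def expert_ffn_params(
    model_dim: int,
    expert_hidden: int,
    num_experts: int,
    matrices_per_expert: int = 3,
) -> int:
    """Weight count for all expert FFNs in one MoE layer.

    Assumption: each expert holds `matrices_per_expert` bias-free
    matrices of shape (model_dim, expert_hidden) or its transpose.
    """
    return num_experts * matrices_per_expert * model_dim * expert_hidden


model_dim = 512        # hypothetical
dense_hidden = 1536    # assumed dense FFN hidden dim
num_experts = 8        # hypothetical

# Multiplier 1.0: expert hidden equals the dense FFN hidden dim.
default_params = expert_ffn_params(model_dim, dense_hidden, num_experts)

# Multiplier 0.5: expert hidden is halved (768 is already a multiple of 16).
fine_grained_params = expert_ffn_params(model_dim, dense_hidden // 2, num_experts)

print(default_params, fine_grained_params)
```

With the multiplier at 1.0 the count is simply num_experts times the dense FFN weight count, so existing estimates would not move; halving the expert hidden dim halves the expert weights in this simplified model.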

uv run ruff check kempnerforge/ tests/ passes
uv run ruff format --check kempnerforge/ tests/ scripts/ passes
uv run pyright kempnerforge/ passes (0 errors; parity with main)
uv run pytest tests/unit/ -v --timeout=60 passes (1377 passed, 2 skipped; +9 new tests in test_config / test_moe / test_mot / test_moma)
distributed: n/a (no distributed code changed)
e2e: n/a (no training-loop/parallelism/optimizer change). Ran tests/e2e --e2e anyway: 25 passed, 5 pre-existing failures (checkpoint-resume / pipeline-parallel / SIGTERM), identical on main.

Closes #116

codecov · 2026-06-08T15:51:22Z

✅ All modified and coverable lines are covered by tests.

Files with missing lines	Coverage Δ
kempnerforge/config/model.py	`100.00% <100.00%> (ø)`
kempnerforge/model/moma.py	`97.97% <ø> (ø)`
kempnerforge/model/mot.py	`93.85% <ø> (ø)`
kempnerforge/model/transformer.py	`94.41% <ø> (ø)`

Add fine-grained MoE experts: decouple expert FFN hidden dim from dense FFN

6e8db76

amazloumi requested review from Naeemkh and mmshad June 8, 2026 15:47