Add fine-grained MoE experts: decouple expert FFN hidden dim from dense FFN #118

Open
amazloumi wants to merge 1 commit into main from feature/moe-finegrained-experts

@amazloumi

Member

Summary

  • Add ModelConfig.moe_expert_ffn_multiplier (default 1.0) + a computed_expert_ffn_hidden_dim property: expert FFN hidden = computed_ffn_hidden_dim × multiplier, rounded to a multiple of 16.
  • Thread the expert hidden dim into every expert-construction site: transformer.py (standard build_moe), mot.py (MoT per-modality build_moe), moma.py (ExpertChoiceMoE).
  • Update num_params_estimate to count experts at the (possibly smaller) expert hidden dim.
  • Enables DeepSeek-style fine-grained experts and activated-FLOP-matched MoE↔dense comparisons: with top-k routing, set multiplier = 1/k so that k × (F/k) = F, where F is the dense FFN hidden dim (e.g. top-2 with multiplier 0.5 activates the same FFN compute as dense). Verified on a 70M backbone: top-2 with multiplier 0.5 gives 1536 activated, equal to dense (1.0×).

Default 1.0 reproduces current behavior exactly (expert hidden == dense FFN); dense and existing MoE configs are unchanged.
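
For reviewers who want to sanity-check the arithmetic, here is a minimal standalone sketch of the sizing rule described above. It is not the code in this PR: `SketchModelConfig`, `dense_ffn_hidden_dim`, `moe_top_k`, and `activated_ffn_hidden_dim` are hypothetical names used only for illustration. Rounding up (rather than to nearest) and skipping the rounding at multiplier 1.0 are both assumptions.

```python
import math
from dataclasses import dataclass


def round_up_to_multiple(value: float, multiple: int = 16) -> int:
    """Round value up to a multiple of `multiple` (rounding direction is an assumption)."""
    return max(multiple, math.ceil(value / multiple) * multiple)


@dataclass
class SketchModelConfig:
    # Stand-in for the dense FFN hidden dim (computed_ffn_hidden_dim in the PR).
    dense_ffn_hidden_dim: int
    moe_expert_ffn_multiplier: float = 1.0
    moe_top_k: int = 2  # hypothetical field name, for illustration only

    @property
    def computed_expert_ffn_hidden_dim(self) -> int:
        # Assumption: at the default multiplier the dense dim is returned as-is,
        # so existing configs see exactly the same expert width as before.
        if self.moe_expert_ffn_multiplier == 1.0:
            return self.dense_ffn_hidden_dim
        scaled = self.dense_ffn_hidden_dim * self.moe_expert_ffn_multiplier
        return round_up_to_multiple(scaled)

    @property
    def activated_ffn_hidden_dim(self) -> int:
        # Each token is routed to top_k experts, each of the expert width.
        return self.moe_top_k * self.computed_expert_ffn_hidden_dim


# Top-2 routing with multiplier 0.5 on a dense FFN hidden dim of 1536.
cfg = SketchModelConfig(dense_ffn_hidden_dim=1536, moe_expert_ffn_multiplier=0.5, moe_top_k=2)
print(cfg.computed_expert_ffn_hidden_dim, cfg.activated_ffn_hidden_dim)  # 768 1536
```

With multiplier = 1/k the activated width equals the dense width whenever F/k is already a multiple of 16. Otherwise rounding makes the match approximate, which is worth keeping in mind when reading FLOP-matched comparisons.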

Testing

  • uv run ruff check kempnerforge/ tests/ passes
  • uv run ruff format --check kempnerforge/ tests/ scripts/ passes
  • uv run pyright kempnerforge/ passes (0 errors; parity with main)
  • uv run pytest tests/unit/ -v --timeout=60 passes (1377 passed, 2 skipped; +9 new tests in test_config / test_moe / test_mot / test_moma)
  • distributed: n/a (no distributed code changed)
  • e2e: n/a (no training-loop/parallelism/optimizer change). Ran tests/e2e --e2e anyway: 25 passed, 5 pre-existing failures (checkpoint-resume / pipeline-parallel / SIGTERM), identical on main.

Closes #116

@amazloumi requested review from Naeemkh and mmshad on June 8, 2026, 15:47
@codecov

codecov Bot commented Jun 8, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.

| Files with missing lines | Coverage Δ |
| --- | --- |
| kempnerforge/config/model.py | 100.00% <100.00%> (ø) |
| kempnerforge/model/moma.py | 97.97% <ø> (ø) |
| kempnerforge/model/mot.py | 93.85% <ø> (ø) |
| kempnerforge/model/transformer.py | 94.41% <ø> (ø) |
