multi-token-prediction

Here are 11 public repositories matching this topic...

ChemMiniQ3-SAbRLo is a lightweight experimental generative model for chemistry, built on a mini Qwen2-like architecture. It is designed for rapid prototyping of HuggingFace AutoModel and AutoTokenizer compatibility and for fast iteration on Multi-Token Prediction (MTP) and RL fine-tuning algorithms and rewards.

  • Updated Oct 1, 2025
  • Python

Optimized vLLM setup, run under Docker, for Gemma 4 31B NVFP4 with MTP on dual RTX PRO 6000 Blackwell GPUs: native FP4 Tensor Cores, Multi-Token Prediction (96.5% acceptance rate), and prefix caching. Includes benchmark results and replication scripts.

  • Updated May 10, 2026
  • Shell
