Kuramoto oscillators in attention weights, not hidden state.
Phase-Modulated Attention (PMA) is a hybrid SSM-attention architecture where coupled Kuramoto oscillators generate a phase coherence matrix that multiplicatively modulates attention scores. This moves oscillator dynamics from the hidden state — where they decouple from generation (the K-SSM negative result) — to attention weights, where they directly gate information routing.
```
Token → Embedding → N × [SSM Block] → M × [PMA Block] → LM Head
                          ↓                   ↓
                    h_SSM (context)      K = f(h_SSM)
                                         φ → P → A_mod
```
- SSM blocks (pure-PyTorch Mamba): Handle sequential dynamics, produce context-rich hidden states
- PMA blocks: Phase-modulated multi-head attention where oscillator coupling strength K is derived from SSM hidden states, making synchronization context-dependent
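The context-dependent synchronization described above follows the standard Kuramoto update, with the coupling strength K supplied externally (in PMA, derived from SSM hidden states). A minimal PyTorch sketch, not the repo's implementation — `kuramoto_step`, `order_parameter`, and the per-head `(H, T)` shapes are illustrative assumptions:

```python
import torch

def kuramoto_step(phi, omega, K, dt=0.1):
    """One Euler step of Kuramoto dynamics.

    phi:   (H, T) oscillator phases (per head, per position)
    omega: (H, T) natural frequencies
    K:     (H, T) coupling strengths (in PMA: K = f(h_SSM))
    """
    # Pairwise sine coupling: mean over j of sin(phi_j - phi_i)
    diff = phi.unsqueeze(-2) - phi.unsqueeze(-1)   # (H, T, T): phi_j - phi_i
    coupling = torch.sin(diff).mean(dim=-1)        # (H, T)
    return phi + dt * (omega + K * coupling)

def order_parameter(phi):
    """Kuramoto order parameter R in [0, 1]; 1 = full synchrony.

    R = |mean_j exp(i * phi_j)|, computed with real arithmetic.
    """
    return torch.sqrt(phi.cos().mean(-1) ** 2 + phi.sin().mean(-1) ** 2)
```

With large K and small frequency spread, repeated `kuramoto_step` calls drive the phases together and `order_parameter` rises toward 1; this R is the same quantity that feeds the adaptive temperature below.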
Standard attention:

```
A = softmax(QK^T / √d_k) @ V
```

Phase-modulated attention:

```
P[h,i,j] = cos(φ[h,i] - φ_pos[j])                    # Phase coherence matrix
P_eff    = λ·P + (1-λ)·1                             # Residual gate (init λ=0.1)
τ        = τ_base · (1 + β·Var(P)) · (1 + α·(1-R))   # Adaptive temperature
A[h]     = softmax((QK^T / √d_k) · P_eff / τ) @ V
```
At initialization (λ≈0.1), this behaves like standard attention. During training, the model learns to use phase coherence for routing — or not, if it doesn't help.
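The score modulation above can be sketched in a few lines of PyTorch. This is a single-batch illustration, not the repo's API — the helper name `phase_modulated_attention`, the `(H, T, d)` shapes, the per-head Var(P), and the default hyperparameters are all assumptions:

```python
import torch
import torch.nn.functional as F

def phase_modulated_attention(Q, K, V, phi, phi_pos, lam=0.1,
                              tau_base=1.0, alpha=1.0, beta=1.0):
    """Q, K, V: (H, T, d); phi: (H, T) head phases; phi_pos: (T,) position phases."""
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(-1, -2) / d_k ** 0.5              # (H, T, T)

    # Phase coherence matrix: P[h,i,j] = cos(phi[h,i] - phi_pos[j])
    P = torch.cos(phi.unsqueeze(-1) - phi_pos.view(1, 1, -1))  # (H, T, T)

    # Residual gate: near-identity when lam is small (init 0.1)
    P_eff = lam * P + (1 - lam) * torch.ones_like(P)

    # Adaptive temperature from coherence variance and order parameter R
    R = torch.sqrt(phi.cos().mean(-1) ** 2 + phi.sin().mean(-1) ** 2)     # (H,)
    tau = tau_base * (1 + beta * P.var(dim=(-2, -1))) * (1 + alpha * (1 - R))
    tau = tau.view(-1, 1, 1)

    A = F.softmax(scores * P_eff / tau, dim=-1)
    return A @ V
```

Setting `lam=0, alpha=0, beta=0` recovers standard attention exactly, which is the point of the residual gate: the modulation can be annealed in (or learned away) without destabilizing early training.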
The Relational Coupling Experiment (3,830 inference runs, 5 architectures) demonstrated that:
- Attention is required for the R×E superadditive interaction (+0.19 to +0.21 in transformers, absent in Falcon-Mamba SSM)
- Oscillators in hidden state decouple from generation (K-SSM negative result)
- The fix: move the oscillators to where they cannot be ignored, the attention weights
Two independent external reviews (Grok, Kimi K2) converged on the same architecture.
See docs/ARCHITECTURE_SYNTHESIS.md for the synthesis.
```bash
git clone https://github.com/TheTempleofTwo/phase-modulated-attention.git
cd phase-modulated-attention
pip install -e ".[dev]"
```

```python
from pma.model import PhaseModulatedAttentionLM
from pma.config import create_130m_config

config = create_130m_config()
model = PhaseModulatedAttentionLM(config)

# Forward pass (input_ids: LongTensor of token ids, shape (batch, seq_len))
logits, metrics = model(input_ids, return_metrics=True)
print(f"R = {metrics['R_mean']:.4f}")  # Phase coherence

# Intervention: override coupling strength
from pma.interventions import set_coupling_override
set_coupling_override(model, K=3.0)  # Force high coupling
```

Entropy Regime Switching in Language Models via Phase-Modulated Attention, Anthony J Vasquez Sr (2026)
```
pma/                  # Main package
├── layers/           # Kuramoto, SSM, PMA, FFN, RMSNorm
├── blocks/           # SSM block, PMA block
├── model.py          # Full language model
├── interventions.py  # R/K/τ/λ/phase override hooks
└── metrics.py        # Entropy, Fisher, R tracking
data/                 # Data pipeline
training/             # Training loop, losses, scheduler
eval/                 # Perplexity, generation, R intervention test
experiments/          # Ablation suite
paper/                # NeurIPS submission
docs/                 # Architecture reviews
tests/                # Unit tests
```
MIT
Anthony J Vasquez Sr — @TheTempleofTwo