
Phase-Modulated Attention

Kuramoto oscillators in attention weights, not hidden state.

Phase-Modulated Attention (PMA) is a hybrid SSM-attention architecture where coupled Kuramoto oscillators generate a phase coherence matrix that multiplicatively modulates attention scores. This moves oscillator dynamics from the hidden state — where they decouple from generation (the K-SSM negative result) — to attention weights, where they directly gate information routing.

Architecture

Token → Embedding → N × [SSM Block] → M × [PMA Block] → LM Head
                         ↓                    ↓
                    h_SSM (context)     K = f(h_SSM)
                                        φ → P → A_mod

  • SSM blocks (pure-PyTorch Mamba): handle sequential dynamics and produce context-rich hidden states
  • PMA blocks: phase-modulated multi-head attention in which the oscillator coupling strength K is derived from SSM hidden states, making synchronization context-dependent
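The context-dependent coupling K = f(h_SSM) above can be sketched as a small projection from SSM hidden states to per-head coupling strengths. This is a hypothetical illustration, not the repository's actual implementation: the module name, the softplus choice, and the shapes are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CouplingFromContext(nn.Module):
    """Hypothetical sketch: derive a per-token, per-head Kuramoto coupling
    strength K from SSM hidden states, so synchronization depends on context."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.proj = nn.Linear(d_model, n_heads)

    def forward(self, h_ssm: torch.Tensor) -> torch.Tensor:
        # h_ssm: (batch, seq, d_model) -> K: (batch, seq, n_heads)
        # softplus keeps coupling strengths strictly positive
        return F.softplus(self.proj(h_ssm))

h = torch.randn(2, 16, 64)           # toy SSM hidden states
K = CouplingFromContext(64, 8)(h)    # per-token, per-head coupling
print(K.shape)                       # torch.Size([2, 16, 8])
```

The softplus here is one reasonable way to keep K positive; the actual repository may parameterize the coupling differently.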

Key Innovation

Standard attention: A = softmax(QK^T / √d_k) @ V

Phase-modulated attention:

P[h,i,j] = cos(φ[h,i] - φ_pos[j])           # Phase coherence matrix
P_eff = λ·P + (1-λ)·1                         # Residual gate (init λ=0.1)
τ = τ_base · (1 + β·Var(P)) · (1 + α·(1-R))  # Adaptive temperature
A[h] = softmax((QK^T / √d_k) · P_eff / τ) @ V

At initialization (λ≈0.1), this behaves like standard attention. During training, the model learns to use phase coherence for routing — or not, if it doesn't help.
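The four lines above can be sketched end to end in PyTorch. Tensor shapes, the function name, and the default constants (λ, α, β, τ_base) are illustrative assumptions, not the repository's code:

```python
import torch

def phase_modulated_attention(Q, K, V, phi, phi_pos,
                              lam=0.1, tau_base=1.0, alpha=1.0, beta=1.0):
    """Sketch of phase-modulated attention.

    Q, K, V: (heads, seq, d_k); phi: (heads, seq) oscillator phases;
    phi_pos: (seq,) positional phases. Defaults are assumptions.
    """
    d_k = Q.shape[-1]
    # P[h, i, j] = cos(phi[h, i] - phi_pos[j]): phase coherence matrix
    P = torch.cos(phi.unsqueeze(-1) - phi_pos.view(1, 1, -1))
    # Residual gate: a near-identity map for small lam
    P_eff = lam * P + (1.0 - lam)
    # Kuramoto order parameter R = |mean_i exp(i*phi_i)|, in [0, 1]
    R = torch.sqrt(torch.cos(phi).mean() ** 2 + torch.sin(phi).mean() ** 2)
    # Adaptive temperature: flattens attention when coherence is low
    tau = tau_base * (1 + beta * P.var()) * (1 + alpha * (1 - R))
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5
    A = torch.softmax(scores * P_eff / tau, dim=-1)
    return A @ V, R
```

With lam = 0, P_eff collapses to all-ones and, up to the adaptive temperature factor, this reduces to standard scaled dot-product attention, matching the claim that the model behaves like standard attention near initialization.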

Motivation

The Relational Coupling Experiment (3,830 inference runs, 5 architectures) demonstrated that:

  1. Attention is required for the R×E superadditive interaction (+0.19 to +0.21 in transformers, absent in Falcon-Mamba SSM)
  2. Oscillators in hidden state decouple from generation (K-SSM negative result)
  3. The fix: Move oscillators to where they can't be ignored — attention weights

Two independent external reviews (Grok, Kimi K2) converged on the same architecture. See docs/ARCHITECTURE_SYNTHESIS.md for the synthesis.

Installation

git clone https://github.com/TheTempleofTwo/phase-modulated-attention.git
cd phase-modulated-attention
pip install -e ".[dev]"

Quick Start

from pma.model import PhaseModulatedAttentionLM
from pma.config import create_130m_config

config = create_130m_config()
model = PhaseModulatedAttentionLM(config)

# Forward pass
logits, metrics = model(input_ids, return_metrics=True)
print(f"R = {metrics['R_mean']:.4f}")  # Phase coherence

# Intervention: override coupling strength
from pma.interventions import set_coupling_override
set_coupling_override(model, K=3.0)  # Force high coupling
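One way such an override could work (a generic sketch, not the actual pma.interventions implementation) is a forward hook that replaces a submodule's computed output with a fixed value:

```python
import torch
import torch.nn as nn

def set_fixed_output(module: nn.Module, value: float):
    """Hypothetical sketch: force a submodule (e.g. the K-projection)
    to emit a constant by overwriting its output in a forward hook."""
    def hook(mod, inputs, output):
        # Returning a tensor from a forward hook replaces the output
        return torch.full_like(output, value)
    return module.register_forward_hook(hook)

# Usage: clamp a toy coupling head to a constant K = 3.0
coupling_head = nn.Linear(8, 4)
handle = set_fixed_output(coupling_head, 3.0)
out = coupling_head(torch.randn(2, 8))   # every entry is 3.0
handle.remove()                          # restore normal behavior
```

Keeping the returned handle around makes the intervention reversible, which matters for before/after comparisons like the R intervention test in eval/.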

Paper

Entropy Regime Switching in Language Models via Phase-Modulated Attention
Anthony J Vasquez Sr, 2026

PDF | RCT Data

Project Structure

pma/                     # Main package
├── layers/              # Kuramoto, SSM, PMA, FFN, RMSNorm
├── blocks/              # SSM block, PMA block
├── model.py             # Full language model
├── interventions.py     # R/K/τ/λ/phase override hooks
└── metrics.py           # Entropy, Fisher, R tracking

data/                    # Data pipeline
training/                # Training loop, losses, scheduler
eval/                    # Perplexity, generation, R intervention test
experiments/             # Ablation suite
paper/                   # NeurIPS submission
docs/                    # Architecture reviews
tests/                   # Unit tests

License

MIT

Author

Anthony J Vasquez Sr — @TheTempleofTwo
