achyuthan.s Achyuthan-S

Achyuthan Sivasankar

I work on one problem: how sparse neural systems learn to route computation, and when routing actually helps.

Currently a research assistant in Prof. Anna Choromanska's lab at NYU, working on self-supervised world models for autonomous driving with LiDAR.

What I'm building

AD-LiST-JEPA — spatiotemporal JEPA world model for autonomous driving; predicts future BEV LiDAR embeddings without labels or contrastive pairs
KAN-Multi — routing layer that selects among 6 function bases with zero supervision; +6.8% over MLP on CIFAR-100
MoE-Bench — open diagnostic toolkit for expert collapse & routing entropy in sparse MoE LLMs (OLMoE, JetMoE, Qwen)
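The two MoE-Bench signals above, expert collapse and routing entropy, can be illustrated with a minimal, framework-free sketch. This is not MoE-Bench's API. The function names are hypothetical, and the sketch assumes you already have per-token router logits from a single MoE layer that uses top-k routing.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def routing_diagnostics(router_logits: Sequence[Sequence[float]], top_k: int = 2) -> dict[str, float]:
    """Hypothetical helper: summarize one layer's routing for a batch of tokens.

    router_logits: one list of per-expert logits for each token.
    """
    num_experts = len(router_logits[0])
    counts = [0] * num_experts
    token_entropies = []
    for logits in router_logits:
        probs = softmax(logits)
        # How uncertain the router is for this token.
        token_entropies.append(entropy(probs))
        # Count which experts this token is actually dispatched to.
        top = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:top_k]
        for e in top:
            counts[e] += 1
    # Fraction of all token-to-expert assignments that each expert receives.
    total = sum(counts)
    load = [c / total for c in counts]
    return {
        "mean_token_entropy": sum(token_entropies) / len(token_entropies),
        # 1.0 means perfectly balanced load; 0.0 means every token goes to one expert.
        "load_entropy_normalized": entropy(load) / math.log(num_experts),
        "max_expert_load": max(load),
    }
```

A normalized load entropy near 0 together with a `max_expert_load` near 1 is a simple red flag for collapse. Per-token entropy, by contrast, indicates how confident the router is, independent of how load is spread across experts.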

What I care about

Self-supervised learning · Sparse MoE architectures · Neural routing · World models · LiDAR perception

Stack

Python · PyTorch · C/C++ · Go · HuggingFace · Docker · FastAPI · AWS

Notable open-source contributions

NVIDIA-NeMo/Megatron-Bridge #4601 Make finetuning batch sampler epoch-aware on checkpoint resume
vllm-project/vllm #47062 Return raw output when GPT-OSS Harmony parser ends in a non-terminal state
NVIDIA-NeMo/Automodel #2805 Reject `tie_word_embeddings=True` on separate-head model families
deepspeedai/DeepSpeed #8078 Avoid CUDA context initialization during import-time op compatibility checks (fork-safe import)
NVIDIA-NeMo/Automodel #2732 Resolve `tie_word_embeddings` top-level-first to match HF tying semantics
vllm-project/vllm #44795 Fix nightly Docker `ImportError: AnthropicOutputConfig`
NVIDIA-NeMo/Automodel #2601 Re-tie `lm_head` to active `embed_tokens` on Gemma4 MoE path
NVIDIA-NeMo/Automodel #2709 Cherry-pick #2601 into `r0.5.0`
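The Megatron-Bridge item above is about making a finetuning batch sampler epoch-aware on checkpoint resume. The following is a generic illustration of that idea, not the Megatron-Bridge code. The class and parameter names are hypothetical, and it assumes the dataset length is a multiple of the batch size. The sampler derives the current epoch and the offset within that epoch from the consumed-sample count restored from a checkpoint. It then reseeds the shuffle with that epoch, so a resumed run continues the same permutation. A sampler that ignores the epoch might, for example, reshuffle with epoch 0's seed after resuming.

```python
import random
from typing import Iterator


class EpochAwareResumableSampler:
    """Hypothetical sampler: reshuffles every epoch and can resume mid-epoch."""

    def __init__(self, dataset_len: int, batch_size: int, seed: int = 0, consumed_samples: int = 0):
        if dataset_len % batch_size != 0:
            raise ValueError("this sketch assumes dataset_len is a multiple of batch_size")
        self.dataset_len = dataset_len
        self.batch_size = batch_size
        self.seed = seed
        # Restored from a checkpoint when resuming; 0 for a fresh run.
        self.consumed_samples = consumed_samples

    def _epoch_permutation(self, epoch: int) -> list[int]:
        # Seed depends on the epoch, so each epoch has its own deterministic order.
        rng = random.Random(self.seed + epoch)
        perm = list(range(self.dataset_len))
        rng.shuffle(perm)
        return perm

    def __iter__(self) -> Iterator[list[int]]:
        while True:
            # Recover (epoch, position within epoch) from the global sample count.
            epoch, offset = divmod(self.consumed_samples, self.dataset_len)
            perm = self._epoch_permutation(epoch)
            for start in range(offset, self.dataset_len, self.batch_size):
                batch = perm[start:start + self.batch_size]
                # Count the batch as consumed before handing it out, so a
                # checkpoint saved after this step resumes at the next batch.
                self.consumed_samples += len(batch)
                yield batch
```

A resumed sampler constructed with `consumed_samples=10` (with `dataset_len=8` and `batch_size=2`) lands in epoch 1 at offset 2. From there it yields exactly the batches an uninterrupted run would have produced.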