  • New York University
  • New York

Achyuthan-S/README.md

Achyuthan Sivasankar

I work on one problem: how sparse neural systems learn to route computation — and when routing actually helps.

Currently a research assistant in Prof. Anna Choromanska's lab at NYU, working on self-supervised world models for autonomous driving with LiDAR.


What I'm building

  • AD-LiST-JEPA — spatiotemporal JEPA world model for autonomous driving; predicts future BEV LiDAR embeddings without labels or contrastive pairs
  • KAN-Multi — routing layer that selects among 6 function bases with zero supervision; +6.8% over MLP on CIFAR-100
  • MoE-Bench — open diagnostic toolkit for expert collapse & routing entropy in sparse MoE LLMs (OLMoE, JetMoE, Qwen)
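As a rough illustration of what a routing-entropy diagnostic can measure, here is a minimal sketch in plain Python. It is not MoE-Bench's actual code: the function name `routing_entropy` and its inputs (a flat list of chosen expert indices and the expert count) are assumptions made for this example. It computes the Shannon entropy of expert usage, normalized so that 1.0 means tokens are spread evenly and 0.0 means every token went to one expert (full collapse).

```python
import math
from collections import Counter


def routing_entropy(expert_ids, num_experts):
    """Normalized Shannon entropy of expert usage (hypothetical helper).

    expert_ids: iterable of expert indices chosen by the router, one per token.
    num_experts: total number of experts available to the router.
    Returns a value in [0, 1]: 1.0 = uniform usage, 0.0 = full collapse.
    """
    expert_ids = list(expert_ids)
    if num_experts < 2:
        raise ValueError("need at least two experts")
    if not expert_ids:
        raise ValueError("no routing decisions given")

    counts = Counter(expert_ids)
    total = len(expert_ids)

    # Sum -p * log(p) over experts that received at least one token;
    # unused experts contribute zero to the entropy.
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log(p)

    # Divide by the maximum possible entropy, log(num_experts).
    return entropy / math.log(num_experts)


if __name__ == "__main__":
    balanced = [0, 1, 2, 3, 0, 1, 2, 3]
    collapsed = [2, 2, 2, 2, 2, 2, 2, 2]
    print(routing_entropy(balanced, num_experts=4))
    print(routing_entropy(collapsed, num_experts=4))
```

A low score on a layer would be one possible signal of expert collapse. A real toolkit would more likely work from router probabilities per layer than from hard assignments.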

What I care about: Self-supervised learning · Sparse MoE architectures · Neural routing · World models · LiDAR perception

Stack: Python · PyTorch · C/C++ · Go · HuggingFace · Docker · FastAPI · AWS



Notable open-source contributions

NVIDIA-NeMo/Megatron-Bridge #4601
Make finetuning batch sampler epoch-aware on checkpoint resume
NVIDIA ready to merge
vllm-project/vllm #47062
Return raw output when GPT-OSS Harmony parser ends in a non-terminal state
vLLM merged
NVIDIA-NeMo/Automodel #2805
Reject tie_word_embeddings=True on separate-head model families
NVIDIA merged
deepspeedai/DeepSpeed #8078
Avoid CUDA context initialization during import-time op compatibility checks (fork-safe import)
deepspeedai merged
NVIDIA-NeMo/Automodel #2732
Resolve tie_word_embeddings top-level-first to match HF tying semantics
NVIDIA merged
vllm-project/vllm #44795
Fix nightly Docker ImportError: AnthropicOutputConfig
vLLM merged
NVIDIA-NeMo/Automodel #2601
Re-tie lm_head to active embed_tokens on Gemma4 MoE path
NVIDIA merged
NVIDIA-NeMo/Automodel #2709
Cherry-pick #2601 into r0.5.0
NVIDIA merged

Personal Portfolio


📫 achyuthan.sivasankar@gmail.com · LinkedIn · Portfolio

Pinned

  1. moe-bench Public

    Open benchmark for expert collapse and routing efficiency in sparse MoE LLMs

    Python 16

  2. rag-acga-knowledge-base-memory-system Public

    Production-ready RAG with adaptive memory system, hybrid retrieval, and graph augmentation. Plug-and-play components for any RAG project.

    Python 12

  3. Automodel Public

    Forked from NVIDIA-NeMo/Automodel

    🚀 PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support

    Python

  4. TensorRT-LLM Public

    Forked from NVIDIA/TensorRT-LLM

    TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

    Python

  5. vllm Public

    Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python

  6. DeepSpeed Public

    Forked from deepspeedai/DeepSpeed

    DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

    Python