- ArXiv (paper): NavSpace — How Navigation Agents Follow Spatial Intelligence Instructions
- Project website: https://navspace.github.io/
- NavSpace is the first benchmark for evaluating spatial intelligence in embodied navigation
- We manually collect 6 task categories and 1,228 trajectory-instruction pairs
- We comprehensively evaluate 22 navigation agents
- We propose a strong baseline model, SNav, validated on both NavSpace and real-robot tests; SNav achieves state-of-the-art results in all of them.
Instruction-following navigation is a key step toward embodied intelligence. Existing benchmarks mainly focus on semantic understanding but overlook a systematic evaluation of spatial perception and reasoning. NavSpace fills this gap with six task categories and 1,228 trajectory-instruction pairs, and evaluates multimodal large language models, local navigation models, and the proposed SNav baseline on the same benchmark.
This repository contains everything you need to reproduce, extend, and build on top of the paper:
- 📊 Benchmark data — all six NavSpace subtasks.
- 🧪 Evaluation suite — LLM API / SNav / StreamVLN routes with a unified result format.
- ✏️ Annotation pipeline — Flask + Habitat-Sim web UI for collecting new trajectories, see docs/annotation.md.
- 🎓 SNav training code (Stage-1 vanilla baseline) — runnable end-to-end from Habitat rendering to DeepSpeed SFT, see docs/training.md. This is a baseline release: Video-QA mixing, height / lighting perturbation and full data-augmentation flows are out of scope and left for users to layer on top.
The figure below shows typical agent rollouts on the six NavSpace subtasks — each column is a different spatial-intelligence skill (environment state, space structure, precise movement, viewpoint shifting, vertical perception, spatial relationship).
Our SNav baseline is fine-tuned on top of LLaVA-Video-7B-Qwen2. The Stage-1 vanilla SFT recipe (Habitat rendering → LLaVA-Video-7B-Qwen2 SFT via DeepSpeed) is open-sourced under snav_training/ — see the training guide. For the full paper recipe you additionally need Video-QA mixing (hook already exposed), height / lighting variation during rendering, and the Stage-2/3 data-augmentation pipelines; these are deliberately out of scope here, and the baseline is meant to be extended.
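For orientation, the kind of DeepSpeed setting such an SFT stage typically runs with is sketched below as a plain Python dict (ZeRO-2, bf16). The values are generic illustrations, not the configuration shipped in snav_training/ — use the config referenced in the training guide for actual runs.

```python
import json

# Generic ZeRO-2 style DeepSpeed config (illustrative values only — the actual
# Stage-1 config lives under snav_training/ and is documented in docs/training.md).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "gradient_clipping": 1.0,
}

with open("ds_zero2_example.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```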
| Module | Folder | Docs |
|---|---|---|
| 🧪 Evaluation (LLM + SNav + StreamVLN) | evaluation/ | Chinese · English |
| ✏️ Annotation pipeline (Flask + Habitat-Sim) | annotation_pipeline/ | docs/annotation.md (CN & EN bilingual) |
| 🎓 SNav training (Stage-1 vanilla baseline) | snav_training/ | docs/training.md (CN & EN bilingual) |
| 📦 Benchmark data | NavSpace-Datasets/ | built into docs/evaluation.md |
| 🧰 Utilities | tools/ | built into docs/evaluation.md §0, §6 |
Jump straight to a module:
- Evaluation Guide (Chinese) — deploy Habitat-Sim / HM3D, run LLM / SNav / StreamVLN, merge shard results.
- Evaluation Guide (English) — complete English translation of the above.
- Annotation Pipeline Guide — deploy the web UI, the 200-step familiarization gate, output JSON format.
- SNav Training Guide — render expert trajectories, run Stage-1 vanilla SFT with DeepSpeed, and hand off the checkpoint to the evaluation suite.
```text
NavSpace-main/
├── frontpage.png                   # README hero figure
├── visualization.png               # qualitative rollouts figure
├── snav-finetune.png               # SNav pipeline figure
├── NavSpace-Datasets/              # benchmark data for the 6 subtasks
├── evaluation/                     # unified evaluation suite (LLM / SNav / StreamVLN)
├── annotation_pipeline/            # Flask + Habitat-Sim web UI for annotation
├── snav_training/                  # SNav Stage-1 vanilla SFT (render + train + pipeline)
├── tools/                          # smoke_test / merge_results / llm_client / ...
├── docs/
│   ├── evaluation.md               # evaluation guide (Chinese)
│   ├── evaluation_en.md            # English evaluation guide
│   ├── annotation.md               # bilingual annotation pipeline guide
│   └── training.md                 # bilingual training setup guide
├── gpt_eval.py                     # legacy wrapper -> evaluation/run_llm_eval
├── run_annotation_server.sh        # cd to repo root + start Flask annotation UI
├── el.sh                           # 8-way shard launcher
├── requirements-base.txt           # base deps
├── requirements-llm.txt            # LLM route deps
├── requirements-local-model.txt    # local-model route deps
└── requirements-annotation.txt     # annotation web UI deps (Flask + SocketIO)
```
The NavSpace dataset contains VLN-style trajectories for six subtasks (1,228 episodes total):
| Environment State | Space Structure | Precise Movement | Viewpoint Shifting | Vertical Perception | Spatial Relationship | Total |
|---|---|---|---|---|---|---|
| 200 | 200 | 201 | 207 | 208 | 212 | 1,228 |
Each subfolder under NavSpace-Datasets/ ships three JSON flavours:
- `*_vln.json` — standard VLN format (coordinates / instruction / goal / path). This is what the evaluation scripts in this repository consume.
- `*_action.json` — ground-truth action sequences aligned with `*_vln.json`.
- `*_with_tokens.json` — pre-tokenized format for lightweight navigation models.
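As a quick orientation, the sketch below loads one of the `*_vln.json` files and prints a couple of episodes. The file path and field names (`episode_id`, `instruction`, `reference_path`) are assumptions for illustration only — check the keys in the shipped files before relying on them.

```python
import json
from pathlib import Path

# Hypothetical path and field names — adjust to the actual files under NavSpace-Datasets/.
vln_file = Path("NavSpace-Datasets/environment_state/environment_state_vln.json")
episodes = json.loads(vln_file.read_text())  # assuming a JSON list of episode dicts

print(f"{len(episodes)} episodes loaded")
for ep in episodes[:2]:
    # Typical VLN-style fields; the real schema may differ.
    print(ep.get("episode_id"), ep.get("instruction"))
    print("  path waypoints:", len(ep.get("reference_path", [])))
```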
| Action | Effect |
|---|---|
| `forward` | move 0.25 m straight |
| `left` / `right` | rotate 30° left / right |
| `look-up` / `look-down` | tilt camera up / down 30° |
| `backward` | move 0.25 m backward |
| `stop` | end of trajectory |
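To make the action granularity concrete, here is a minimal sketch of the pose update each discrete action implies (0.25 m translation steps, 30° rotation/tilt steps, per the table above). It is an illustration of the action semantics, not code from the evaluation suite.

```python
import math

STEP_M, TURN_DEG = 0.25, 30.0  # granularity from the action table above

def apply_action(x, y, heading_deg, tilt_deg, action):
    """Return the new (x, y, heading, tilt) after one discrete action."""
    if action == "forward":
        x += STEP_M * math.cos(math.radians(heading_deg))
        y += STEP_M * math.sin(math.radians(heading_deg))
    elif action == "backward":
        x -= STEP_M * math.cos(math.radians(heading_deg))
        y -= STEP_M * math.sin(math.radians(heading_deg))
    elif action == "left":
        heading_deg += TURN_DEG
    elif action == "right":
        heading_deg -= TURN_DEG
    elif action == "look-up":
        tilt_deg += TURN_DEG
    elif action == "look-down":
        tilt_deg -= TURN_DEG
    # "stop" leaves the pose unchanged and ends the episode.
    return x, y, heading_deg, tilt_deg

pose = (0.0, 0.0, 0.0, 0.0)
for a in ["forward", "left", "forward", "stop"]:
    pose = apply_action(*pose, a)
print(pose)  # ≈ (0.467, 0.125, 30.0, 0.0)
```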
```bash
# 1. Clone / download and enter the repo
cd NavSpace-main

# 2. (Optional) offline sanity check — no Habitat-Sim / HM3D / API key needed
python tools/smoke_test.py

# 3. Validate the shipped benchmark data
python NavSpace-Datasets/validate_dataset_integrity.py

# 4. After installing habitat-sim + downloading HM3D, run one LLM evaluation
export OPENAI_API_KEY=sk-xxxxx
python evaluation/run_llm_eval.py \
    --profile gemini-pro \
    --task environment_state \
    --hm3d-base-path /path/to/hm3d_v0.2
```

For everything else — provider selection, API keys, SNav / StreamVLN setup, parallel sharding, result merging and offline verification — open the Evaluation Guide.
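When runs are sharded across GPUs (e.g. via el.sh), each shard writes its own result file and the per-task numbers come from merging them; the repository ships tools/merge_results for this. The snippet below is only a hedged illustration of what that merge step amounts to — the shard file names and the per-episode `success` field are made up here, so use the actual tool and the Evaluation Guide for real runs.

```python
import json
from glob import glob

# Hypothetical shard outputs; real paths and result schema come from the evaluation suite.
records = []
for shard_file in sorted(glob("results/environment_state_shard*.json")):
    with open(shard_file) as f:
        records.extend(json.load(f))

success = sum(1 for r in records if r.get("success"))  # assumed per-episode "success" flag
print(f"{len(records)} episodes merged, SR = {success / max(len(records), 1):.3f}")
```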
Dependency files are split by usage so you only install what you need:
- `requirements-base.txt` — common runtime dependencies.
- `requirements-llm.txt` — LLM API clients (OpenAI-compatible + Zhipu).
- `requirements-local-model.txt` — local-model route (torch / transformers / decord / ...).
- `requirements-annotation.txt` — Flask + Flask-SocketIO web UI for the annotation pipeline.
habitat-sim / habitat-lab and the HM3D / MP3D assets still have to be installed separately
per your platform and CUDA version — see §1 of the Evaluation Guide.
- Public benchmark data for all six subtasks.
- Unified evaluation suite (LLM / SNav / StreamVLN).
- Annotation pipeline with a 200-step familiarization gate.
- Offline verification (`tools/smoke_test.py`, `--dry-run`).
- SNav Stage-1 vanilla SFT baseline (`snav_training/`).
- Height / lighting variation during rendering.
- Data-augmentation scripts (LLM-based instruction rewriting, panorama aug, DAgger).
- Pretrained SNav checkpoints.
```bibtex
@misc{yang2026navspacenavigationagentsfollow,
  title={NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions},
  author={Haolin Yang and Yuxing Long and Zhuoyuan Yu and Zihan Yang and Minghan Wang and Jiapeng Xu and Yihan Wang and Ziyan Yu and Wenzhe Cai and Lei Kang and Hao Dong},
  year={2026},
  eprint={2510.08173},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2510.08173},
}
```

Paper: https://arxiv.org/abs/2510.08173 · Project website: https://navspace.github.io/
If NavSpace is useful for your work, a GitHub star helps others discover the repo and keeps us motivated. If you use this benchmark or code in a paper or report, please cite the BibTeX entry above. Bug reports, feature ideas, and discussion are welcome — please open an Issue on GitHub so we can track and improve the project together.


