Your star means a lot to us and helps us keep developing this project! ✨
- Upload sparse attention weights.
The point-cloud rendering pipeline depends on π³, which is included as a git submodule. Make sure to clone recursively so that Pi3/ is fetched at the same time:
git clone --recursive https://github.com/OpenImagingLab/AnyRecon.git
# If you already cloned without --recursive, run:
# git submodule update --init --recursive
cd AnyRecon
conda create -n anyrecon python=3.10 -y
conda activate anyrecon
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
pip install -r Pi3/requirements.txt
AnyRecon relies on specific pre-trained weights. Download them and place them in the ./checkpoints folder (the π³ checkpoint goes into Pi3/ instead, as noted below):
- Base Video Diffusion Model (Wan2.1 I2V 14B 720P) [download]
- AnyRecon LoRA weights [download]
- π³ checkpoint (for point-cloud rendering) [download] → place at Pi3/model.safetensors
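Before running anything, a quick pre-flight check can catch missing weights. The sketch below is not part of the repository: the helper name check_checkpoints is hypothetical, and it only checks the locations this README names (a non-empty ./checkpoints folder and Pi3/model.safetensors), because the exact file names inside ./checkpoints are not listed here.

```python
from pathlib import Path


def check_checkpoints(repo_root="."):
    """Hypothetical helper: return expected checkpoint locations that are missing.

    Only the locations named in this README are checked. The file names
    inside ./checkpoints are not specified, so that folder is only checked
    for existence and non-emptiness.
    """
    root = Path(repo_root)
    missing = []

    ckpt_dir = root / "checkpoints"
    # Short-circuit: iterdir() is only called if the folder exists.
    if not ckpt_dir.is_dir() or not any(ckpt_dir.iterdir()):
        missing.append(str(ckpt_dir))

    pi3_weights = root / "Pi3" / "model.safetensors"
    if not pi3_weights.is_file():
        missing.append(str(pi3_weights))

    return missing


if __name__ == "__main__":
    problems = check_checkpoints()
    if problems:
        print("Missing or empty:", *problems, sep="\n  ")
    else:
        print("Checkpoint locations look OK.")
```

Run it from the repository root; an empty result only means the expected paths exist, not that the files are the correct versions.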
To reproduce the provided example, run:
bash test.sh
Or directly:
python run_AnyRecon.py \
--root_dir example/valley \
--output_dir example/valley \
--lora_path full_attention.ckpt
run_AnyRecon.py expects point-cloud-rendered condition videos as input. To prepare them from a raw video, we provide a helper script built on top of π³:
bash run_pi3.sh
Input video format. Your input video must be organized so that:
- the first --num_cond_frames frames are the capture views: these provide the 3D point cloud;
- the remaining frames are the test views: they are only used to estimate the camera poses at which the point cloud is rendered, and do not contribute any points to the reconstruction.
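The required frame order can be illustrated with a small NumPy sketch. The function assemble_input_frames is hypothetical and not part of the repository; it assumes frames are HxWx3 uint8 arrays of identical size, stacks the capture views first and the test views after them, and returns the value to pass as --num_cond_frames.

```python
import numpy as np


def assemble_input_frames(capture_frames, test_frames):
    """Hypothetical helper: order frames as capture views, then test views.

    capture_frames: list of HxWx3 uint8 arrays (build the point cloud).
    test_frames:    list of HxWx3 uint8 arrays (only used for pose estimation).
    Returns (video, num_cond_frames), where video has shape (T, H, W, 3).
    """
    if not capture_frames:
        raise ValueError("need at least one capture view")
    h, w, _ = capture_frames[0].shape
    frames = list(capture_frames) + list(test_frames)
    # All frames must share one resolution to form a single video.
    if any(f.shape != (h, w, 3) for f in frames):
        raise ValueError("all frames must share the same HxWx3 shape")
    video = np.stack(frames)
    # The number of leading capture frames is the --num_cond_frames value.
    return video, len(capture_frames)
```

The resulting array can then be written to a video file with any encoder you prefer (for example imageio.mimwrite, assuming imageio and its ffmpeg plugin are installed) before running the helper script.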
Custom test-view trajectory (no test frames needed). If you'd rather specify a custom rendering trajectory instead of estimating poses from real test-view images, you can replace the test-view portion of the video with any placeholder frames and override target_extrinsics[num_cond_frames:] inside process_scene with your desired sequence of world→camera 4×4 matrices. The capture views (the first num_cond_frames frames) will still be used to build the point cloud, and rendering proceeds along your chosen trajectory.
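As one way to generate such a trajectory, the sketch below builds a circular orbit of look-at cameras as world→camera 4×4 matrices. Several things here are assumptions rather than documented behavior: the camera frame is taken to be OpenCV-style (x right, y down, z forward), and look_at_w2c and orbit_trajectory are hypothetical helpers. If the world frame follows the first capture camera in OpenCV convention, "up" is roughly (0, -1, 0); verify the convention against process_scene before use.

```python
import numpy as np


def look_at_w2c(eye, target, up):
    """World-to-camera 4x4 matrix for a camera at `eye` looking at `target`.

    Assumes an OpenCV-style camera frame (x right, y down, z forward).
    `up` must not be parallel to the viewing direction.
    """
    eye = np.asarray(eye, dtype=np.float64)
    forward = np.asarray(target, dtype=np.float64) - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(up, dtype=np.float64))
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)  # completes a right-handed basis
    R = np.stack([right, down, forward])  # rows: camera axes in world coords
    w2c = np.eye(4)
    w2c[:3, :3] = R
    w2c[:3, 3] = -R @ eye
    return w2c


def orbit_trajectory(center, radius, height, num_frames, up):
    """Hypothetical helper: num_frames cameras on a circle, all facing `center`."""
    center = np.asarray(center, dtype=np.float64)
    up = np.asarray(up, dtype=np.float64)
    up = up / np.linalg.norm(up)
    # Two unit vectors spanning the plane orthogonal to `up`.
    a = np.cross(up, [1.0, 0.0, 0.0])
    if np.linalg.norm(a) < 1e-6:
        a = np.cross(up, [0.0, 0.0, 1.0])
    a /= np.linalg.norm(a)
    b = np.cross(up, a)
    poses = []
    for t in np.linspace(0.0, 2.0 * np.pi, num_frames, endpoint=False):
        eye = center + radius * (np.cos(t) * a + np.sin(t) * b) + height * up
        poses.append(look_at_w2c(eye, center, up))
    return np.stack(poses)  # shape (num_frames, 4, 4)
```

num_frames should equal the number of test-view frames in the video (total frames minus num_cond_frames), and center and radius should be chosen in the same world units as the reconstructed point cloud. If target_extrinsics inside process_scene is a torch tensor rather than a NumPy array, convert the result (e.g. with torch.from_numpy) and match its dtype and device before assigning it to target_extrinsics[num_cond_frames:].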
Once run_pi3.py has produced the condition videos in --output_dir, pass that directory to run_AnyRecon.py as --root_dir and run inference as shown above.
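If you prepare several scenes this way, a thin Python wrapper can build the inference command for each prepared directory. This is a hypothetical sketch, not a script shipped with the repository: it uses only the three flags from the example above (--root_dir, --output_dir, --lora_path), writes outputs back into each input folder as that example does, and defaults to a dry run that just prints the commands.

```python
import shlex
import subprocess


def anyrecon_command(cond_dir, output_dir, lora_path="full_attention.ckpt"):
    """Build the run_AnyRecon.py command line for one prepared scene."""
    return [
        "python", "run_AnyRecon.py",
        "--root_dir", str(cond_dir),
        "--output_dir", str(output_dir),
        "--lora_path", str(lora_path),
    ]


def run_scenes(cond_dirs, lora_path="full_attention.ckpt", dry_run=True):
    """Print (and optionally run) inference for each condition-video directory.

    Outputs go to the same folder as the inputs, mirroring the
    example/valley command. Set dry_run=False to actually launch the jobs.
    """
    commands = [anyrecon_command(d, d, lora_path) for d in cond_dirs]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first scene whose inference fails.
            subprocess.run(cmd, check=True)
    return commands
```

For example, run_scenes(["example/valley"]) prints the same command as the direct invocation shown earlier; jobs run sequentially, which avoids several 14B-model processes competing for the same GPU.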
Thanks to these great repositories: Wan2.1, DiffSynth-Studio, and π³.
If you find our work helpful, please cite it:
@article{chen2026anyrecon,
title={AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model},
author={Chen, Yutian and Guo, Shi and Jin, Renbiao and Yang, Tianshuo and Cai, Xin and Luo, Yawen and Yang, Mingxin and Yu, Mulin and Xu, Linning and Xue, Tianfan},
journal={arXiv preprint arXiv:2604.19747},
year={2026}
}

