I have successfully run lingbot-map in my environment.
Here is the information; I hope it helps others.
Environment
| Item | Detail |
|---|---|
| Machine | Dell G15 5530 laptop |
| GPU | NVIDIA RTX 4060 Laptop GPU (8 GB) |
| OS | Ubuntu 22.04, kernel 6.0.0-1020-oem, Xfce on X11, gdm3 |
| NVIDIA driver | 580-server |
Modifications
- Use only 21 frames of example/courthouse (the example/courthouse_0_20 folder in the command below).
- In load_model (demo.py), load the model onto the CPU first, then move it to the GPU to avoid a GPU memory peak during loading.
- Use Chrome instead of Firefox. Firefox may be incompatible with Viser, in which case the point cloud fails to render.
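To build a reduced frame set like the 21-frame one used here, a small stdlib-only sketch could copy the first N images into a new folder. The helper name `make_frame_subset`, the extension list, and the assumption that the frames sit directly in example/courthouse and sort correctly by filename are all hypothetical, not taken from the repository.

```python
import shutil
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed image extensions


def make_frame_subset(src_dir: str, dst_dir: str, num_frames: int = 21) -> list:
    """Hypothetical helper: copy the first `num_frames` images of src_dir into dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    # Sort by filename so zero-padded frame names (000.jpg, 001.jpg, ...) stay in order;
    # non-image files are skipped.
    frames = sorted(p for p in src.iterdir() if p.suffix.lower() in IMAGE_EXTS)[:num_frames]
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for p in frames:
        target = dst / p.name
        shutil.copy2(p, target)  # copy file contents plus metadata
        copied.append(target)
    return copied
```

For example, `make_frame_subset("example/courthouse", "example/courthouse_0_20", 21)` would produce a folder that can be passed to `--image_folder`, provided the frame layout matches these assumptions.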
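The CPU-first loading change can be sketched in generic PyTorch as follows. This is not the actual `load_model` in demo.py: the helper name `load_model_cpu_first` and the assumed checkpoint layout (a bare state dict or a dict with a `"model"` key) are hypothetical. The idea is that deserializing with `map_location="cpu"` keeps the checkpoint tensors out of GPU memory, so in a typical setup the GPU never holds the checkpoint and the model parameters at the same time.

```python
import torch
from torch import nn


def load_model_cpu_first(model: nn.Module, ckpt_path: str, device: str = "cuda") -> nn.Module:
    """Hypothetical helper: load weights on the CPU, then move the model to `device`."""
    # Keep the model on the CPU while its weights are filled in.
    model = model.to("cpu")

    # Deserialize straight into CPU memory; no GPU allocation happens here.
    checkpoint = torch.load(ckpt_path, map_location="cpu")

    # Assumed layout: either a bare state dict or {"model": state_dict}.
    if isinstance(checkpoint, dict) and "model" in checkpoint:
        state_dict = checkpoint["model"]
    else:
        state_dict = checkpoint
    model.load_state_dict(state_dict)  # copies tensors into the CPU parameters

    # Drop the CPU-side checkpoint before the transfer.
    del checkpoint, state_dict

    # One transfer of the already-populated parameters to the target device.
    return model.to(device).eval()
```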
CMD and Log
```
python demo.py --model_path ../models/lingbot-map-long.pt --image_folder example/courthouse_0_20 --use_sdpa --offload_to_cpu --mask_sky
```

```
Loading 21 images...
Loading images: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 509.37it/s]
Preprocessed images to 518x294 using canonical crop mode
/home/jcy/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:181: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled.
We recommend installing via pip install torch-c-dlpack-ext
  warnings.warn(
Building model...
pretrained_path:
Failed to load pretrained weights: [Errno 2] No such file or directory: ''
Loading checkpoint: ../models/lingbot-map-long.pt
Checkpoint loaded.
Total load time: 9.4s
Casting aggregator to torch.bfloat16 (heads kept in fp32)
Input: 21 frames, shape (21, 3, 294, 518)
Mode: streaming
GPU mem after load: alloc=2.85 GB, reserved=2.88 GB
Running streaming inference (dtype=torch.bfloat16)...
Streaming inference: 100%|████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:02<00:00, 4.63it/s]
Inference done in 5.4s
GPU peak during inference: 6.02 GB (reserved peak 6.42 GB)
Moving results to CPU...
╭────── viser (listening *:8080) ───────╮
│ ╷ │
│ HTTP │ http://localhost:8080 │
│ Websocket │ ws://localhost:8080 │
│ ╵ │
╰───────────────────────────────────────╯
Generating sky masks from image array...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 784.82it/s]
Sky segmentation applied successfully
```
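On an 8 GB card, the figures worth comparing are the memory after load and the peak during inference. To check your own run against them, the following sketch uses PyTorch's CUDA memory counters. The helper name `report_gpu_memory` is hypothetical; it reports GiB (1024**3 bytes), and demo.py may compute or round its "GB" figures differently.

```python
from typing import Dict, Optional

import torch

GIB = 1024 ** 3


def report_gpu_memory(tag: str, peak: bool = False) -> Optional[Dict[str, float]]:
    """Hypothetical helper: print allocated/reserved CUDA memory in GiB.

    With peak=True it reports the high-water marks since the last
    torch.cuda.reset_peak_memory_stats() call instead of current usage.
    Returns None when no CUDA device is available.
    """
    if not torch.cuda.is_available():
        return None
    if peak:
        alloc = torch.cuda.max_memory_allocated()
        reserved = torch.cuda.max_memory_reserved()
    else:
        alloc = torch.cuda.memory_allocated()
        reserved = torch.cuda.memory_reserved()
    stats = {"alloc_gb": alloc / GIB, "reserved_gb": reserved / GIB}
    print(f"{tag}: alloc={stats['alloc_gb']:.2f} GiB, reserved={stats['reserved_gb']:.2f} GiB")
    return stats
```

Typical use is to call `torch.cuda.reset_peak_memory_stats()` right before inference and `report_gpu_memory("GPU peak during inference", peak=True)` right after it.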