Lightweight test with 8GB GPU

I have successfully run lingbot-map in my environment.
Here are the details; I hope they help others.

### Environment
| Item | Detail |
|------|------|
| Machine | Dell G15 5530 Laptop |
| GPU  | NVIDIA RTX4060 Laptop 8GB |
| OS | Ubuntu 22.04 + 6.0.0-1020-oem kernel + Xfce X11 gdm3 |
| NVIDIA driver | 580-server |

### Modifications
1. Use only 21 frames of `example/courthouse` (the `example/courthouse_0_20` folder in the command below).
2. In `load_model` (`demo.py`), load the model onto the CPU first, then move it to the GPU, to avoid a memory peak during loading.
3. Use Chrome instead of Firefox. Firefox may be incompatible with Viser, in which case the point cloud fails to render.
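
For anyone who wants to reproduce modifications 1 and 2, here is a rough sketch, not the actual change I made to `demo.py`. Both function names are hypothetical. `copy_first_frames` assumes the frames are image files whose names sort in frame order and that the first 21 are the ones wanted. `load_on_cpu_then_gpu` assumes the checkpoint is a plain `state_dict`; if the file wraps the weights in another dictionary, unwrap it first.

```python
from pathlib import Path
import shutil

# Assumption: frames are stored as ordinary image files with these suffixes.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def copy_first_frames(src_dir, dst_dir, count=21):
    """Copy the first `count` frames (sorted by file name) into dst_dir."""
    src = Path(src_dir)
    dst = Path(dst_dir)
    frames = sorted(
        p for p in src.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for frame in frames[:count]:
        copied.append(Path(shutil.copy2(frame, dst / frame.name)))
    return copied


def load_on_cpu_then_gpu(model, checkpoint_path, device="cuda"):
    """Load weights into CPU memory first, then move the model to `device`.

    Hypothetical helper: `model` is an already constructed torch.nn.Module
    that still lives on the CPU, and the checkpoint is assumed to be a plain
    state_dict.
    """
    import torch  # imported lazily so copy_first_frames works without torch

    # map_location="cpu" keeps the checkpoint tensors in host memory, so the
    # GPU never holds the checkpoint and the model weights at the same time.
    state = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(state)
    del state  # drop the extra CPU copy before the transfer
    return model.to(device)
```

Usage for the frame subset would look like `copy_first_frames("example/courthouse", "example/courthouse_0_20")`, after which `--image_folder example/courthouse_0_20` points at the smaller set.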

### CMD and Log
python demo.py --model_path ../models/lingbot-map-long.pt --image_folder example/courthouse_0_20 --use_sdpa --offload_to_cpu --mask_sky

Loading 21 images...
Loading images: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 509.37it/s]
Preprocessed images to 518x294 using canonical crop mode
/home/jcy/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:181: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled.
We recommend installing via `pip install torch-c-dlpack-ext`
  warnings.warn(
Building model...
pretrained_path: 
Failed to load pretrained weights: [Errno 2] No such file or directory: ''
Loading checkpoint: ../models/lingbot-map-long.pt
  Checkpoint loaded.
Total load time: 9.4s
Casting aggregator to torch.bfloat16 (heads kept in fp32)
Input: 21 frames, shape (21, 3, 294, 518)
Mode: streaming
GPU mem after load: alloc=2.85 GB, reserved=2.88 GB
Running streaming inference (dtype=torch.bfloat16)...
Streaming inference: 100%|████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:02<00:00,  4.63it/s]
Inference done in 5.4s
GPU peak during inference: 6.02 GB (reserved peak 6.42 GB)
Moving results to CPU...
╭────── viser (listening *:8080) ───────╮
│             ╷                         │
│   HTTP      │ http://localhost:8080   │
│   Websocket │ ws://localhost:8080     │
│             ╵                         │
╰───────────────────────────────────────╯
Generating sky masks from image array...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 784.82it/s]
Sky segmentation applied successfully

### Screenshot

<img width="1920" height="1080" alt="Image" src="https://github.com/user-attachments/assets/50526b54-4860-4fbb-9481-d999e85c528e" />
