
Question about the image-to-world expansion loop in the demo #21

@xiyichen


Hi, thanks for sharing this impressive project. I have a question about the technical details of the image-to-world mode.

From the demos, it looks like the Gaussian scene keeps expanding as new video trajectories are generated beyond the currently reconstructed region. However, I am not fully sure how this is implemented, given that the diffusion model appears to operate on a fixed 81-frame input.

I am wondering which of the following better matches your implementation:

  1. Chunk-wise expansion with later merging.
    Each 81-frame diffusion pass only extrapolates from the currently reconstructed Gaussian scene. Then, to obtain the final world Gaussian, you either:

    • combine all generated video chunks and feed them jointly into the reconstruction model, or
    • reconstruct separate Gaussian scenes from different chunks and then align/merge them.
  2. Progressive expansion within overlapping 81-frame windows.
    For each 81-frame diffusion pass, the trajectory is chosen so that the generated video contains both:

    • regions already covered by the current Gaussian scene, and
    • newly extrapolated regions beyond it.

    In this case, the final Gaussian would simply be reconstructed from the last generated 81-frame video, without explicitly merging multiple video chunks or Gaussian reconstructions.
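To make the two options concrete, here is a minimal Python sketch of both pipelines. All names here (`diffuse_81_frames`, `reconstruct_gaussians`, `merge_gaussians`) are hypothetical placeholders standing in for your diffusion and reconstruction stages, not your actual API:

```python
WINDOW = 81  # frames per diffusion pass, as in the demos

def diffuse_81_frames(scene, trajectory):
    """Placeholder for one diffusion pass: returns WINDOW generated frames."""
    return [f"frame({trajectory},{i})" for i in range(WINDOW)]

def reconstruct_gaussians(frames):
    """Placeholder for feed-forward Gaussian reconstruction from frames."""
    return {"gaussians_from": len(frames)}

def merge_gaussians(scenes):
    """Placeholder for aligning/merging per-chunk Gaussian scenes."""
    return {"merged": len(scenes)}

def option1_chunkwise(trajectories):
    # Option 1: each pass extrapolates independently; merging happens at the end.
    chunks = [diffuse_81_frames(None, t) for t in trajectories]
    # Variant (a): feed all chunks jointly into the reconstruction model.
    joint = reconstruct_gaussians([f for chunk in chunks for f in chunk])
    # Variant (b): reconstruct each chunk separately, then align/merge.
    merged = merge_gaussians([reconstruct_gaussians(c) for c in chunks])
    return joint, merged

def option2_progressive(trajectories):
    # Option 2: each window overlaps the already-covered region, so the
    # final Gaussians come from the last 81-frame video alone.
    scene = None
    for t in trajectories:
        frames = diffuse_81_frames(scene, t)  # covers old + new regions
        scene = reconstruct_gaussians(frames)
    return scene
```

The key distinction the sketch tries to capture: option 1 needs an explicit combination step (joint reconstruction or scene merging), while option 2 relies on each window's overlap with the existing scene and keeps only the last reconstruction.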

Could you clarify which of these is closer to the actual implementation, or whether the real pipeline differs from both?

Thanks a lot.

