Inference on single_example_image.json takes 1 hr on 4 3090 GPUs; is there anything I can do to speed it up? #197

@Haosonn

export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
GPU_NUM=4
torchrun --nproc_per_node=$GPU_NUM --standalone generate_infinitetalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir 'weights/chinese-wav2vec2-base' \
    --infinitetalk_dir weights/InfiniteTalk/single/infinitetalk.safetensors \
    --dit_fsdp --t5_fsdp \
    --ulysses_size=$GPU_NUM \
    --input_json examples/single_example_image.json \
    --size infinitetalk-480 \
    --sample_steps 40 \
    --mode streaming \
    --motion_frame 9 \
    --save_file infinitetalk_res_multigpu

Here is the script I used. PYTORCH_CUDA_ALLOC_CONF is set because of an OOM problem.
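To compare settings, the same command can be wrapped in a small timing harness. This is only a sketch: `DRY_RUN` and `SAMPLE_STEPS` are hypothetical environment variables added for illustration, not options of the repository, and every flag is copied unchanged from the command above. With `DRY_RUN=1` (the default here) it only prints the command; with `DRY_RUN=0` it runs it and reports wall-clock seconds.

```shell
#!/bin/sh
# Sketch of a timing wrapper around the command from this post.
# DRY_RUN and SAMPLE_STEPS are hypothetical knobs for illustration only.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

GPU_NUM="${GPU_NUM:-4}"
SAMPLE_STEPS="${SAMPLE_STEPS:-40}"   # 40 is the value used in the post
DRY_RUN="${DRY_RUN:-1}"              # 1 = print the command, do not launch

# Store the command in the positional parameters so each argument
# stays a separate word when it is executed with "$@".
set -- torchrun --nproc_per_node="$GPU_NUM" --standalone generate_infinitetalk.py \
    --ckpt_dir weights/Wan2.1-I2V-14B-480P \
    --wav2vec_dir weights/chinese-wav2vec2-base \
    --infinitetalk_dir weights/InfiniteTalk/single/infinitetalk.safetensors \
    --dit_fsdp --t5_fsdp \
    --ulysses_size="$GPU_NUM" \
    --input_json examples/single_example_image.json \
    --size infinitetalk-480 \
    --sample_steps "$SAMPLE_STEPS" \
    --mode streaming \
    --motion_frame 9 \
    --save_file infinitetalk_res_multigpu

if [ "$DRY_RUN" = "1" ]; then
    # Show what would be executed.
    printf '%s ' "$@"; echo
else
    start=$(date +%s)
    "$@"
    end=$(date +%s)
    echo "wall-clock seconds: $((end - start))"
fi
```

For example, `DRY_RUN=0 SAMPLE_STEPS=40 sh time_run.sh` (the file name is arbitrary) gives a baseline to compare against runs with other values.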
