MLX-Video is a package for inference and fine-tuning of image, video, and audio generation models on your Mac using MLX.
pip install git+https://github.com/Blaizzy/mlx-video.git

# or with uv
uv pip install git+https://github.com/Blaizzy/mlx-video.git

- LTX-2 — 19B parameter video generation model from Lightricks
- Wan2.1 — 1.3B / 14B parameter T2V models (single-model pipeline)
- Wan2.2 — T2V-14B, TI2V-5B, and I2V-14B models (dual-model pipeline)
LTX-2 / LTX-2.3
- Text-to-Video (T2V), Image-to-Video (I2V), Audio-to-Video (A2V)
- Audio-Video joint generation
- Multi-pipeline: distilled, dev, dev-two-stage, dev-two-stage-hq
- 2x spatial upscaling for images and videos
- Prompt enhancement via Gemma
Wan2.1 / Wan2.2
- Text-to-Video (T2V) — 1.3B and 14B models
- Image-to-Video (I2V) — 14B model
- Flow-matching diffusion with classifier-free guidance
- LoRA support (e.g. Wan2.2-Lightning for 4-step generation)
General
- Optimized for Apple Silicon using MLX
# Text-to-Video (distilled, fastest)
uv run mlx_video.ltx_2.generate --prompt "Two dogs wearing sunglasses, cinematic, sunset" -n 97 --width 768
# Image-to-Video
uv run mlx_video.ltx_2.generate --prompt "A person dancing" --image photo.jpg
# Audio-to-Video
uv run mlx_video.ltx_2.generate --audio-file music.wav --prompt "A band playing music"
# Dev pipeline with CFG (higher quality)
uv run mlx_video.ltx_2.generate --pipeline dev --prompt "A cinematic scene" --cfg-scale 3.0
# Dev two-stage HQ (highest quality)
uv run mlx_video.ltx_2.generate --pipeline dev-two-stage-hq \
--prompt "A cinematic scene of ocean waves at golden hour" \
  --model-repo prince-canuma/LTX-2-dev

Converting weights:
Pre-converted weights are available on HuggingFace (LTX-2-distilled, LTX-2-dev, LTX-2.3-distilled, LTX-2.3-dev), or convert from the original Lightricks checkpoint:
| Option | Default | Description |
|---|---|---|
| --prompt, -p | (required) | Text description of the video |
| --height, -H | 512 | Output height (must be divisible by 64) |
| --width, -W | 512 | Output width (must be divisible by 64) |
| --num-frames, -n | 100 | Number of frames |
| --seed, -s | 42 | Random seed for reproducibility |
| --fps | 24 | Frames per second |
| --output, -o | output.mp4 | Output video path |
| --save-frames | false | Save individual frames as images |
| --model-repo | Lightricks/LTX-2 | HuggingFace model repository |
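Since height and width must each be divisible by 64, a small helper (hypothetical, not part of mlx-video) can snap a requested resolution to the nearest valid size before passing it on the command line:

```python
def snap_to_multiple(value: int, multiple: int = 64) -> int:
    """Round a dimension to the nearest multiple (at least one multiple)."""
    return max(multiple, round(value / multiple) * multiple)

# Example: snap a 720p request to LTX-2-compatible dimensions.
width, height = snap_to_multiple(1280), snap_to_multiple(720)
print(width, height)  # 1280 704
```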
Both Wan2.1 and Wan2.2 are text-to-video diffusion models built on a DiT (Diffusion Transformer) backbone with a T5 text encoder and 3D VAE.
See the dedicated Wan2.1/Wan2.2 README.md for details.
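The classifier-free guidance step used by these pipelines can be sketched as follows. This is a NumPy illustration of the standard CFG formula, not mlx-video's actual internals; in the real pipeline the two predictions come from the conditional and unconditional text embeddings:

```python
import numpy as np

def cfg_combine(cond: np.ndarray, uncond: np.ndarray, guide_scale: float) -> np.ndarray:
    """Classifier-free guidance: push the conditional prediction
    away from the unconditional one by guide_scale."""
    return uncond + guide_scale * (cond - uncond)

cond = np.array([1.0, 2.0])
uncond = np.array([0.5, 1.0])
print(cfg_combine(cond, uncond, 5.0))  # [3. 6.]
```

Note that guide_scale = 1.0 reduces to the conditional prediction alone, which is why the distilled Lightning LoRA example below passes --guide-scale 1.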
# Wan2.1 — uses defaults from config (50 steps, shift=5.0, guide=5.0)
python -m mlx_video.wan_2.generate \
--model-dir wan21_mlx \
--prompt "A cat playing piano in a cozy room"
# Wan2.2 — uses defaults from config (40 steps, shift=12.0, guide=3.0,4.0)
python -m mlx_video.wan_2.generate \
--model-dir wan22_mlx \
  --prompt "A cat playing piano in a cozy room"

With custom settings:
python -m mlx_video.wan_2.generate \
--model-dir wan21_mlx \
--prompt "Ocean waves at sunset, cinematic, 4K" \
--negative-prompt "blurry, low quality" \
--width 1280 \
--height 720 \
--num-frames 81 \
--steps 50 \
--guide-scale 5.0 \
--shift 5.0 \
--seed 42 \
  --output-path my_video.mp4

The pipeline auto-detects the model version from config.json and selects the right pipeline mode (single or dual model).
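A minimal sketch of what such auto-detection might look like. The config.json field name model_type is an assumption for illustration, not mlx-video's actual schema:

```python
import json
from pathlib import Path

def detect_pipeline(model_dir: str) -> str:
    """Sketch: pick single- vs dual-model mode from config.json.
    The 'model_type' key is a hypothetical field name."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    model_type = config.get("model_type", "")
    # Wan2.2 checkpoints use a dual (high-noise / low-noise) pipeline.
    return "dual" if "2.2" in model_type else "single"
```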
python -m mlx_video.wan_2.generate \
--model-dir wan22_i2v_mlx \
--prompt "The camera slowly zooms in as the subject begins to move" \
--image start.png \
--num-frames 81 \
  --output-path my_video.mp4

LoRAs can be used with the --lora-high and --lora-low command line switches.
For example, using the distilled Wan2.2-Lightning LoRA for 4-step generation:
python -m mlx_video.wan_2.generate \
--model-dir /Volumes/SSD/Wan-AI/Wan2.2-T2V-A14B-MLX \
--width 480 \
--height 704 \
--num-frames 41 \
--prompt "Two dogs of the poodle breed sitting on a beach wearing sunglasses, nodding with their heads, close up, cinematic, sunset" \
--steps 4 \
--guide-scale 1 \
--trim-first-frames 1 \
--seed 2391784614 \
--lora-high /Volumes/SSD/Wan-AI/lightx2v/Wan2.2-Lightning/Wan2.2-T2V-A14B-4steps-lora-rank64-Seko-V2.0/high_noise_model.safetensors 1 \
  --lora-low /Volumes/SSD/Wan-AI/lightx2v/Wan2.2-Lightning/Wan2.2-T2V-A14B-4steps-lora-rank64-Seko-V2.0/low_noise_model.safetensors 1

| Option | Default | Description |
|---|---|---|
| --model-dir | (required) | Path to converted MLX model directory |
| --prompt | (required) | Text description of the video |
| --image | None | Input image path (for I2V models) |
| --negative-prompt | "" | Negative prompt for guidance |
| --width | 1280 | Video width |
| --height | 720 | Video height |
| --num-frames | 81 | Number of frames (must be 4n+1) |
| --steps | from config | Number of diffusion steps |
| --guide-scale | from config | Guidance scale: float or low,high pair |
| --shift | from config | Noise schedule shift |
| --seed | -1 (random) | Random seed for reproducibility |
| --output-path | output.mp4 | Output video path |
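Because num-frames must be of the form 4n+1 (e.g. 41, 81), a tiny helper (hypothetical, not part of mlx-video) can snap an arbitrary count to the nearest valid value:

```python
def nearest_valid_frames(n: int) -> int:
    """Snap a frame count to the nearest 4n+1 value (minimum 5)."""
    return max(5, 4 * round((n - 1) / 4) + 1)

print(nearest_valid_frames(80))   # 81
print(nearest_valid_frames(100))  # 101
```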
- macOS with Apple Silicon
- Python >= 3.11
- MLX >= 0.22.0
- For weight conversion: PyTorch (pip install torch)
MIT

