
feat: load LTX-2.3 connector weights from GGUF on Apple Silicon#431

Open
Samir Hassen (samirhassen) wants to merge 2 commits into Lightricks:master from samirhassen:ltx23-gguf-support
Conversation

@samirhassen

Summary
Enables LTXVGemmaCLIPModelLoader to load LTX-2.3 (22B AV) connector weights directly from a GGUF checkpoint on Apple Silicon, with no separate safetensors extraction step required. Tested on M4 Max (36 GB) with the Q4_K_S GGUF (~16 GB).

Motivation
The existing loader only supports safetensors checkpoints. On Apple Silicon, the primary distribution format for large models is GGUF (via llama.cpp-style quantisation). The connector and projection weights are embedded in the same GGUF file as the diffusion model, so users shouldn't need to unpack or convert anything manually.

Changes
text_embeddings_connectors.py

  • Added _load_gguf_connector_sd(): reads connector tensors directly from GGUF via gguf.GGUFReader, reverses shape order (GGUF stores dimensions innermost-first), and handles F32 / F16 / BF16 natively. Falls back to ComfyUI-GGUF's dequant.py for quantised types.
  • Hardcoded transformer_config for LTX-2.3 (22B AV), which is not stored in the GGUF metadata: 32 heads × 128 head_dim for video connector (inner_dim=4096), 32 heads × 64 head_dim for audio connector (inner_dim=2048).
  • Auto-discovers proj_linear.safetensors from ComfyUI's text_encoders folder paths and merges it into the state dict, so the text_embedding_projection weights are always picked up without manual configuration.
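A minimal sketch of the GGUF-to-state-dict path described above. The helper names (`gguf_shape_to_torch`, `load_gguf_connector_sd`) and the `prefixes` default are illustrative stand-ins, not the actual identifiers in `text_embeddings_connectors.py`; the quantised-type fallback to ComfyUI-GGUF's `dequant.py` is omitted for brevity:

```python
import numpy as np

def gguf_shape_to_torch(gguf_shape):
    """GGUF records dimensions innermost-first; PyTorch expects outermost-first,
    so the shape must be reversed before reshaping the tensor data."""
    return tuple(reversed([int(d) for d in gguf_shape]))

def load_gguf_connector_sd(path, prefixes=("video_connector.", "audio_connector.")):
    """Read connector tensors from a GGUF checkpoint into a torch state dict.

    Handles F32/F16 natively; other tensor types would need a dequantisation
    fallback (e.g. ComfyUI-GGUF's dequant.py), raised as NotImplementedError here.
    """
    import torch
    from gguf import GGUFReader  # pip install gguf

    reader = GGUFReader(path)
    sd = {}
    for tensor in reader.tensors:
        if not tensor.name.startswith(prefixes):
            continue
        shape = gguf_shape_to_torch(tensor.shape)
        data = np.asarray(tensor.data)
        if data.dtype in (np.float32, np.float16):
            sd[tensor.name] = torch.from_numpy(data.copy()).reshape(shape)
        else:
            raise NotImplementedError(f"quantised type for {tensor.name}")
    return sd
```

The shape reversal is the load-bearing step: a GGUF linear weight recorded as `[in_features, out_features]` becomes the `[out_features, in_features]` layout that `nn.Linear` expects.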

embeddings_connector.py

  • load_embeddings_connector: changed strict=True → strict=False in load_state_dict to tolerate minor key mismatches between GGUF-extracted tensors and the module definition.
  • Embeddings1DConnector.forward: auto-selects RoPE frequency spacing based on inner_dim. The existing "exp" spacing uses POS_EMBEDDING_EXP_VALUES, which is sized for inner_dim=3840 (19B model). LTX-2.3's connector has inner_dim=4096, so "exp_2" (standard scaled formula) is used instead, preventing a shape mismatch at inference time.
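The spacing auto-selection can be sketched as follows; `select_rope_spacing` and the constant name are illustrative, but the widths match the models quoted above (3840 for the 19B connector, anything else falls through to the standard formula):

```python
# Width that POS_EMBEDDING_EXP_VALUES was precomputed for (19B model).
EXP_TABLE_INNER_DIM = 3840

def select_rope_spacing(inner_dim: int) -> str:
    """Pick the RoPE frequency-spacing mode for a connector.

    "exp" indexes a precomputed table sized for inner_dim=3840, so any other
    width (e.g. LTX-2.3's 4096) must use "exp_2", the standard scaled formula,
    to avoid a shape mismatch at inference time.
    """
    return "exp" if inner_dim == EXP_TABLE_INNER_DIM else "exp_2"
```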

gemma_encoder.py

  • LTXVGemmaCLIPModelLoader: made ltxv_path optional ([""] + ...). When empty, auto-discovers the GGUF from ComfyUI's unet/ folder so the UI doesn't require a duplicate path entry.
  • GGUF Gemma loading: falls back to AutoModelForCausalLM.from_pretrained(..., gguf_file=...) when no model*.safetensors is found, enabling the text encoder itself to be loaded from GGUF.

Why is_av must remain enabled
preprocess_text_embeds in the LTXAV transformer checks whether the embedding dimension is cross_attention_dim + audio_cross_attention_dim (4096 + 2048 = 6144) to decide whether it has already been processed. If is_av=False, only the video connector runs and the output is 4096-dim — the transformer then double-processes it and produces garbage. The is_av flag is correctly detected from the presence of audio_adaln_single.linear.weight in the state dict.
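An illustrative version of that dimensionality test (the function name is a stand-in; the constants are the LTX-2.3 22B AV values quoted above):

```python
CROSS_ATTENTION_DIM = 4096        # video connector inner_dim
AUDIO_CROSS_ATTENTION_DIM = 2048  # audio connector inner_dim

def already_processed(embed_dim: int) -> bool:
    """True when the embeddings carry both video and audio channels (6144-dim),
    i.e. both connectors already ran and the transformer must not reprocess.
    A 4096-dim input (video-only, is_av=False) fails this check and would be
    run through the connectors a second time."""
    return embed_dim == CROSS_ATTENTION_DIM + AUDIO_CROSS_ATTENTION_DIM
```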

Testing
Ran full text-to-video inference in ComfyUI on Mac Studio M4 Max (36 GB) with ltx-video-2b-v0.9.5-distilled.gguf (Q4_K_S).
Video output via VHS_VideoCombine node confirmed correct (motion, coherence, no artefacts from mis-sized embeddings).

Known limitations
transformer_config for LTX-2.3 is hardcoded rather than read from GGUF metadata (the metadata does not contain it). If Lightricks releases a new architecture variant, this will need updating.
The dequantisation fallback for non-float types depends on ComfyUI-GGUF being present alongside this plugin.

user1000 and others added 2 commits March 10, 2026 23:06
- Load video/audio embeddings connector weights directly from GGUF
- Auto-find proj_linear.safetensors for text_embedding_projection
- Fix RoPE spacing (exp_2) for connectors with inner_dim != 3840
- Set is_av correctly for AV model to output 6144-dim conditioning
- Add audio_connector_attention_head_dim config for proper 2048-dim audio connector
- Replace all print()/DEBUG statements with proper logger calls
- Move stdlib imports (glob, importlib.util, logging, os) to module level
- Move folder_paths import to module level in text_embeddings_connectors
- Add logger = logging.getLogger(__name__) to text_embeddings_connectors
- Fix comment typo in gemma_encoder.py (bytesed → bytes)
- embeddings_connector.py had no debug statements (no changes needed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>