Conversation

@xingliu14 (Collaborator) commented Dec 5, 2025

Description

  • Add 'auto' as the default value for the MODEL_IMPL_TYPE env var (see the sketch after this list)
  • For GptOssForCausalLM, 'auto' resolves to 'vllm' for better performance
  • For all other architectures, 'auto' resolves to 'flax_nnx' for better performance
  • Add tests for the 'auto' resolution behavior
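A minimal sketch of the env-var side of this change (the accessor name below is an assumption for illustration; MODEL_IMPL_TYPE and the 'auto' default are taken from the description above):

import os

def get_model_impl_type() -> str:
    # "auto" becomes the default; "flax_nnx" and "vllm" remain valid explicit values.
    return os.environ.get("MODEL_IMPL_TYPE", "auto").lower()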

Tests

pytest tests/test_envs.py tests/models/common/test_model_loader.py

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

@xingliu14 (Collaborator, Author)

@kyuyeunk Please review.

@kyuyeunk kyuyeunk (Collaborator) left a comment

Wouldn't it be possible to move 'auto' into the 'match/case' as well?

- Add 'auto' as default value for MODEL_IMPL_TYPE env var
- For GptOssForCausalLM, 'auto' resolves to 'vllm' for better performance
- For all other architectures, 'auto' resolves to 'flax_nnx'
- Add _VLLM_REQUIRED_ARCHITECTURES frozenset in model_loader.py
- Use match/case pattern in get_model() for implementation selection
- Add tests for 'auto' resolution behavior

Signed-off-by: Xing Liu <xingliu14@gmail.com>
@xingliu14 (Collaborator, Author)

It is possible to move it into the match/case, but then there would be duplicated code, including get_vllm_model, get_flax_model, and the fallback check. I think resolving first and then using the same code path is cleaner.
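For context, a rough sketch of the "resolve first, then dispatch" shape being argued for here; the helper and constant names follow the ones mentioned in this thread, but the bodies are stubs and this is not the PR's actual code:

_VLLM_REQUIRED_ARCHITECTURES = frozenset({"GptOssForCausalLM"})

class UnsupportedArchitectureError(Exception): ...

def get_flax_model(*args): ...  # stub for the real flax_nnx loader
def get_vllm_model(*args): ...  # stub for the real vLLM loader

def get_model(vllm_config, rng, mesh, is_draft_model, impl="auto"):
    if impl == "auto":
        archs = getattr(vllm_config.model_config.hf_config, "architectures", [])
        impl = "vllm" if any(a in _VLLM_REQUIRED_ARCHITECTURES for a in archs) else "flax_nnx"
    # A single match/case then dispatches on the resolved value, so get_flax_model,
    # get_vllm_model and the fallback check each appear only once.
    match impl:
        case "flax_nnx":
            try:
                return get_flax_model(vllm_config, rng, mesh, is_draft_model)
            except UnsupportedArchitectureError:
                return get_vllm_model(vllm_config, rng, mesh, is_draft_model)
        case "vllm":
            return get_vllm_model(vllm_config, rng, mesh, is_draft_model)
        case _:
            raise ValueError(f"unsupported MODEL_IMPL_TYPE: {impl!r}")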

return jit_model, compute_logits_fn, combine_hidden_states_fn, None, params, lora_manager, model


# Architectures that require "vllm" implementation type when MODEL_IMPL_TYPE is "auto".
Collaborator:

"require" might be too strong word. replace it with "prefer"

# Architectures that require "vllm" implementation type when MODEL_IMPL_TYPE is "auto".
# These architectures are listed here because they have better performance with the
# vLLM PyTorch backend compared to the flax_nnx JAX backend for now.
_VLLM_REQUIRED_ARCHITECTURES: frozenset[str] = frozenset({"GptOssForCausalLM"})
Collaborator:

In general, constants like this should be placed at the start of the file. Please move it.

vllm_config.model_config.dtype.dtype)
if impl == "auto":
    # Resolve "auto" based on architecture
    architectures = getattr(vllm_config.model_config.hf_config,
Collaborator:

Dumb question: are there cases where there are multiple "architectures" for a single model?
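For reference, the HF config's "architectures" field is a list, so it can in principle hold more than one entry, though in practice it usually has exactly one. A quick illustration with transformers (the model name is chosen purely for illustration):

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("gpt2")
print(cfg.architectures)  # ["GPT2LMHeadModel"], a list with a single entry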

# Resolve "auto" based on architecture
architectures = getattr(vllm_config.model_config.hf_config,
"architectures", [])
for arch in architectures:
Collaborator:

Similar to the above comment: can we just add an assert that len(architectures) == 1 and do a simple hash-map lookup instead of iterating with a for loop?
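A minimal sketch of what that alternative could look like (hypothetical; it assumes the single-architecture case and reuses the constant name from this PR, but it is not the PR's code):

_VLLM_REQUIRED_ARCHITECTURES = frozenset({"GptOssForCausalLM"})

def resolve_auto(hf_config) -> str:
    architectures = getattr(hf_config, "architectures", [])
    assert len(architectures) == 1, f"expected exactly one architecture, got {architectures}"
    # Single membership test instead of looping over the list.
    return "vllm" if architectures[0] in _VLLM_REQUIRED_ARCHITECTURES else "flax_nnx"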

try:
    # Try to load the flax model first
    return get_flax_model(vllm_config, rng, mesh, is_draft_model)
except UnsupportedArchitectureError as e:
Collaborator:

Probably a nit question: in C's switch statements, if we don't put a break;, execution automatically falls through to the next case. Is that not the case for Python's match/case? I.e., if UnsupportedArchitectureError is thrown, could we skip the break; and automatically let the next case (which is case "vllm") be invoked?
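For reference, a minimal demonstration (not the PR's code) of the semantics in question: Python's match/case has no break and no fallthrough, and an exception raised inside a case body propagates out of the whole match statement rather than transferring control to the next case.

def describe(x):
    match x:
        case 1:
            return "one"   # no break needed; control never falls into case 2
        case 2:
            return "two"
        case _:
            return "other"

assert describe(1) == "one"
assert describe(3) == "other"

# If the flax_nnx branch raises UnsupportedArchitectureError, the exception
# propagates out of the match; the fallback to the vLLM loader therefore has
# to be written explicitly with try/except, as in the snippet above.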

@kyuyeunk kyuyeunk added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Dec 9, 2025