feat: integrate RDMA support with MLX backend#1

Draft
localai-bot wants to merge 1 commit into master from feature/issue-8505

Conversation

@localai-bot (Owner)

This PR adds RDMA support to the MLX backend in LocalAI, enabling distributed inference across Apple Silicon machines.

Summary of Changes

Backend Enhancements (backend/python/mlx/backend.py)

  • Added _initialize_rdma() method to initialize mlx.distributed when MLX_GRPC_SERVERS is set
  • RDMA workers are identified via an environment variable (no hostfile needed; worker coordination is handled manually over P2P)
  • Model loading and generation logic now supports collective operations when RDMA is active
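
A minimal sketch of this initialization path, assuming the backend gates distributed setup on the presence of MLX_GRPC_SERVERS (the helper names parse_grpc_servers/initialize_rdma and the exact mx.distributed.init() call are illustrative assumptions, not the PR's verbatim code):

```python
import os

def parse_grpc_servers(value):
    """Split a "host1:port1,host2:port2" string into worker addresses."""
    return [addr.strip() for addr in value.split(",") if addr.strip()]

def initialize_rdma():
    """Enter distributed mode only when MLX_GRPC_SERVERS is set;
    otherwise stay in single-node mode and skip collective ops."""
    servers = os.environ.get("MLX_GRPC_SERVERS", "")
    if not servers:
        return None  # single-node mode
    workers = parse_grpc_servers(servers)
    try:
        import mlx.core as mx
        # mx.distributed.init() joins the process group; we assume the
        # JACCL backend picks up the worker list from the environment.
        return mx.distributed.init()
    except Exception:
        # mlx (or its distributed backend) unavailable in this environment
        return workers
```

Model loading and generation would then branch on whether a distributed group was returned before issuing collective operations.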

CLI Worker Support (core/cli/worker/worker_mlx.go)

  • New mlx-rdma worker type for launching MLX backends in distributed mode
  • Follows the same pattern as llama.cpp workers (no P2P registration, uses MLX_GRPC_SERVERS)

P2P Integration (core/cli/run.go)

  • Sets MLX_GRPC_SERVERS environment variable alongside LLAMACPP_GRPC_SERVERS in TunnelCallback
  • Enables automatic worker discovery and IP collection for MLX RDMA workers
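
The address list is presumably serialized the same way as LLAMACPP_GRPC_SERVERS. A sketch of how the main instance might assemble the variable from discovered worker IPs (the function name and default port are hypothetical):

```python
def build_mlx_grpc_servers(worker_ips, port=50051):
    """Join discovered worker IPs into the comma-separated host:port
    format, assumed to mirror LLAMACPP_GRPC_SERVERS."""
    return ",".join(f"{ip}:{port}" for ip in worker_ips)
```

The main instance would export the result as MLX_GRPC_SERVERS before launching the MLX backend process.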

Design Decisions

  1. No hostfile needed: Since workers are launched manually via local-ai worker mlx-rdma, we don't need automatic process spawning via mlx.launch --hostfile
  2. Environment variable-based: Uses MLX_GRPC_SERVERS (same as LLAMACPP_GRPC_SERVERS) to pass worker IPs to the backend
  3. P2P reuse: Leverages existing LocalAI P2P infrastructure for worker discovery, but backend handles RDMA coordination

Testing

  1. Launch main instance with P2P enabled: local-ai run --p2p --token <token>
  2. Launch workers on each node: local-ai worker mlx-rdma
  3. Workers register via P2P; main instance sets MLX_GRPC_SERVERS env var
  4. MLX backend initializes RDMA when MLX_GRPC_SERVERS is set

Notes

  • Requires mlx with JACCL backend support (mlx-jaccl-cluster integration)
  • Current implementation assumes all workers have identical model paths
  • Future work: Add model sharding via model.shard(mx.distributed.world_size())
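
As a rough illustration of the sharding mentioned in the last bullet, one simple scheme gives each rank a contiguous slice of layer indices (a sketch only; the eventual model.shard() API may partition weights differently):

```python
def layer_slice(num_layers, world_size, rank):
    """Assign each rank a contiguous range of layer indices,
    spreading any remainder across the earlier ranks."""
    base, extra = divmod(num_layers, world_size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return range(start, end)
```

With 10 layers across 3 workers, ranks 0, 1, and 2 would own layers 0-3, 4-6, and 7-9 respectively.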

- Add mlx_rdma_enabled environment variable check in MLX backend
- Initialize mlx.distributed when MLX_GRPC_SERVERS is set
- Add mlx-rdma worker type to CLI worker commands
- Set MLX_GRPC_SERVERS env var alongside LLAMACPP_GRPC_SERVERS