GPU-accelerated DataFrame library for Apple Silicon, built on MLX.
MLX-DF brings cuDF-style GPU DataFrame operations to Mac, exploiting Apple's unified memory for zero-copy CPU/GPU data sharing. The API mirrors Pandas for easy migration.
Warning
MLX-DF currently supports only Apple Silicon devices (M-series chips).
pip install mlxdf

Using uv:
uv add mlxdf

With PyArrow/Parquet support (the quotes keep zsh, the default macOS shell, from treating the brackets as a glob pattern):
pip install "mlxdf[arrow]"

Using uv:
uv add "mlxdf[arrow]"

From source:
uv sync
uv build

from mlxdf import MlxDataFrame, merge, read_parquet
# Create a DataFrame (string columns auto-detected as CategoricalSeries)
df = MlxDataFrame({
    "product_id": [1.0, 2.0, 1.0, 3.0, 2.0],
    "quantity": [5.0, 3.0, 2.0, 7.0, 1.0],
    "category": ["A", "B", "A", "C", "B"],
})
# Filter
high_qty = df[df["quantity"] > 2.0]
# Computed columns
df["double_qty"] = df["quantity"] * 2
# GroupBy aggregation
result = df.groupby("category")["quantity"].sum()
result.show()
# Join two DataFrames
prices = MlxDataFrame({
    "product_id": [1.0, 2.0, 3.0],
    "price": [10.0, 25.0, 15.0],
})
joined = df.merge(prices, on="product_id", how="inner")
# Parquet I/O (requires the mlxdf[arrow] extra)
df.to_parquet("output.parquet")
df2 = read_parquet("output.parquet", columns=["product_id", "quantity"])

- MlxSeries: Column with boolean null mask, vectorized arithmetic, comparisons, and aggregations
- CategoricalSeries: Dictionary-encoded string column (55× faster filtering vs Pandas)
- MlxDataFrame: Dict-like table with column access, boolean filtering, head/tail/slicing
- GroupBy: Bincount/sort-based groupby with sum/mean/count/max/min aggregations
- Join: Hash-index join supporting inner/left/right/outer (4× faster vs Pandas at 200M rows)
- Pandas Interop: to_pandas()/from_pandas() with automatic type conversion
- PyArrow & Parquet: Read/write Parquet with column pruning and predicate pushdown
- JIT Compilation: compile_fn for fused GPU kernel execution
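
To make the CategoricalSeries and GroupBy entries above more concrete, here is a minimal pure-Python sketch of the two ideas they name: dictionary encoding and a bincount-style group sum. It is not MLX-DF's implementation, which operates on MLX arrays on the GPU. The function names (dictionary_encode, filter_equal, group_sum) are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[int], List[str]]:
    """Map each distinct string to a small integer code, in first-seen order."""
    code_of: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        # setdefault assigns the next free code the first time a string appears.
        codes.append(code_of.setdefault(v, len(code_of)))
    categories = list(code_of)  # dicts preserve insertion order
    return codes, categories


def filter_equal(codes: List[int], categories: List[str], target: str) -> List[bool]:
    """Build a boolean mask with one string lookup, then integer comparisons only."""
    if target not in categories:
        return [False] * len(codes)
    target_code = categories.index(target)
    return [c == target_code for c in codes]


def group_sum(codes: List[int], values: List[float], n_groups: int) -> List[float]:
    """Bincount-style sum: add each value into the slot indexed by its group code."""
    totals = [0.0] * n_groups
    for c, v in zip(codes, values):
        totals[c] += v
    return totals


# Same data as the Quick Start example.
category = ["A", "B", "A", "C", "B"]
quantity = [5.0, 3.0, 2.0, 7.0, 1.0]

codes, categories = dictionary_encode(category)      # [0, 1, 0, 2, 1], ["A", "B", "C"]
mask_b = filter_equal(codes, categories, "B")        # [False, True, False, False, True]
sums = group_sum(codes, quantity, len(categories))   # [7.0, 4.0, 7.0]
```

Because the string column is reduced to integer codes once, filtering and grouping afterwards touch only integers, which is the kind of work that vectorizes well on a GPU.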
uv sync

# Run all unit tests (benchmarks are excluded by default)
uv run pytest
# Run a specific test file
uv run pytest tests/test_series.py
# Run a specific test case
uv run pytest tests/test_series.py::TestArithmetic::test_add_series -v
# Run with verbose output
uv run pytest -v
# Run and stop on first failure
uv run pytest -x

Benchmarks are integrated into pytest via the bench marker, which is deselected by default so that benchmarks don't slow down regular test runs.
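
As a rough illustration of how a test opts into the bench marker, the sketch below decorates a toy timing test with pytest.mark.bench. The workload and test name are hypothetical and do not come from this repository. It also assumes the marker is registered and deselected in the project's pytest configuration, for example with an addopts entry such as -m "not bench"; the actual configuration is not shown here.

```python
import time

import pytest


def sum_of_squares(n: int) -> int:
    """Toy workload standing in for a real DataFrame operation."""
    return sum(i * i for i in range(n))


@pytest.mark.bench  # selected by `pytest -m bench`; hypothetical example test
def test_bench_sum_of_squares():
    start = time.perf_counter()
    result = sum_of_squares(10_000)
    elapsed = time.perf_counter() - start

    # Check correctness first so a fast but wrong run cannot pass.
    assert result == 333283335000
    # A real benchmark would record or report `elapsed`; here we only sanity-check it.
    assert elapsed >= 0.0
```

Under that assumed configuration, a plain uv run pytest would skip this test, while uv run pytest -m bench would select it.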
# Run all benchmarks
uv run pytest -m bench
# Run a specific benchmark
uv run pytest -m bench -k parquet
uv run pytest -m bench -k tpch
uv run pytest -m bench -k categorical
uv run pytest -m bench -k compile
# Run both tests and benchmarks together
uv run pytest -m ""
# Run benchmark scripts directly (also works)
uv run python benchmarks/bench_vs_pandas.py

Available benchmarks: bench_vs_pandas, bench_categorical, bench_parquet, bench_compile_df, and bench_tpch_q1/q3/q4/q6/q18/q19.