# MLX-DF

GPU-accelerated DataFrame library for Apple Silicon, built on MLX.

MLX-DF brings cuDF-style GPU DataFrame operations to Mac, exploiting Apple's unified memory for zero-copy CPU/GPU data sharing. The API mirrors Pandas for easy migration.

> [!WARNING]
> MLX-DF currently supports only Apple Silicon devices (M-series chips).

## Installation

```bash
pip install mlxdf
```

Using uv:

```bash
uv add mlxdf
```

With PyArrow/Parquet support (quoted so the brackets survive zsh, the macOS default shell):

```bash
pip install "mlxdf[arrow]"
```

Using uv:

```bash
uv add "mlxdf[arrow]"
```

From source:

```bash
uv sync
uv build
```

## Quick Start

```python
from mlxdf import MlxDataFrame, merge, read_parquet

# Create a DataFrame (string columns auto-detected as CategoricalSeries)
df = MlxDataFrame({
    "product_id": [1.0, 2.0, 1.0, 3.0, 2.0],
    "quantity":   [5.0, 3.0, 2.0, 7.0, 1.0],
    "category":   ["A", "B", "A", "C", "B"],
})

# Filter rows with a boolean mask
high_qty = df[df["quantity"] > 2.0]

# Add a computed column
df["double_qty"] = df["quantity"] * 2

# GroupBy aggregation
result = df.groupby("category")["quantity"].sum()
result.show()

# Join two DataFrames
prices = MlxDataFrame({
    "product_id": [1.0, 2.0, 3.0],
    "price":      [10.0, 25.0, 15.0],
})
joined = df.merge(prices, on="product_id", how="inner")

# Parquet I/O (requires mlxdf[arrow])
df.to_parquet("output.parquet")
df2 = read_parquet("output.parquet", columns=["product_id", "quantity"])
```
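The `category` column above is stored dictionary-encoded rather than as raw strings. As a rough illustration of that technique in plain NumPy (a sketch of the idea, not MLX-DF's internals):

```python
import numpy as np

# Dictionary-encode a string column: the unique values become a small
# lookup table, and the column itself becomes an array of integer codes.
values = np.array(["A", "B", "A", "C", "B"])
categories, codes = np.unique(values, return_inverse=True)
# categories -> ['A' 'B' 'C'], codes -> [0 1 0 2 1]

# Filtering `category == "B"` becomes a single integer comparison over
# the codes instead of a per-row string comparison.
target = np.searchsorted(categories, "B")
mask = codes == target
print(values[mask])  # ['B' 'B']
```

Integer comparisons like this vectorize trivially on a GPU, which is why dictionary-encoded filtering can beat string-based filtering by a wide margin.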

## Features

- **MlxSeries** — column with a boolean null mask, vectorized arithmetic, comparisons, and aggregations
- **CategoricalSeries** — dictionary-encoded string column (55× faster filtering vs. Pandas)
- **MlxDataFrame** — dict-like table with column access, boolean filtering, and head/tail/slicing
- **GroupBy** — bincount/sort-based groupby with sum/mean/count/max/min aggregations
- **Join** — hash-index join supporting inner/left/right/outer (4× faster vs. Pandas at 200M rows)
- **Pandas interop** — `to_pandas()` / `from_pandas()` with automatic type conversion
- **PyArrow & Parquet** — read/write Parquet with column pruning and predicate pushdown
- **JIT compilation** — `compile_fn` for fused GPU kernel execution
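A bincount-based groupby reduces a grouped aggregation to a single weighted-histogram pass over integer group codes. A minimal NumPy sketch of the idea (illustrative only, not MLX-DF's code):

```python
import numpy as np

# Group-by sum via bincount: encode the key column as integer codes,
# then accumulate the value column into one bin per group.
keys = np.array(["A", "B", "A", "C", "B"])
quantities = np.array([5.0, 3.0, 2.0, 7.0, 1.0])

groups, codes = np.unique(keys, return_inverse=True)
sums = np.bincount(codes, weights=quantities)

for g, s in zip(groups, sums):
    print(g, s)  # A 7.0 / B 4.0 / C 7.0
```

Because the accumulation is one data-parallel pass with no per-group Python loop, the same pattern maps well onto GPU kernels.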

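A hash-index join works in two phases: build a key-to-row index on one table, then probe it with the other table's keys to produce gather indices. A simplified sketch of the build/probe pattern (assuming unique keys on the right table; MLX-DF's actual implementation may differ):

```python
import numpy as np

# Inner join on product_id, mirroring the Quick Start example.
left_keys   = np.array([1, 2, 1, 3, 2])
right_keys  = np.array([1, 2, 3])
right_price = np.array([10.0, 25.0, 15.0])

index = {k: i for i, k in enumerate(right_keys)}  # build phase
probe = np.array([index[k] for k in left_keys])   # probe phase

# Gather the matched right-side rows into left-table order.
joined_price = right_price[probe]
print(joined_price)  # [10. 25. 10. 15. 25.]
```

Left/right/outer variants follow from the same index: unmatched probes emit nulls instead of being dropped.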
## Development

### Setup

```bash
uv sync
```

### Running Tests

```bash
# Run all unit tests (benchmarks are excluded by default)
uv run pytest

# Run a specific test file
uv run pytest tests/test_series.py

# Run a specific test case
uv run pytest tests/test_series.py::TestArithmetic::test_add_series -v

# Run with verbose output
uv run pytest -v

# Stop on the first failure
uv run pytest -x
```

### Benchmarks

Benchmarks are integrated into pytest via the `bench` marker and are deselected by default, so they don't slow down regular test runs.

```bash
# Run all benchmarks
uv run pytest -m bench

# Run a specific benchmark
uv run pytest -m bench -k parquet
uv run pytest -m bench -k tpch
uv run pytest -m bench -k categorical
uv run pytest -m bench -k compile

# Run tests and benchmarks together
uv run pytest -m ""

# Benchmark scripts can also be run directly
uv run python benchmarks/bench_vs_pandas.py
```

Available benchmarks: `bench_vs_pandas`, `bench_categorical`, `bench_parquet`, `bench_compile_df`, `bench_tpch_q1/q3/q4/q6/q18/q19`.
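Marker-based deselection like this is usually wired up in pytest's configuration. A hypothetical fragment showing one common setup (the repo's actual `pyproject.toml` may differ):

```toml
[tool.pytest.ini_options]
markers = ["bench: performance benchmarks"]
addopts = "-m 'not bench'"  # deselect benchmarks by default
```

Because `addopts` is prepended to the command line, an explicit `-m bench` or `-m ""` given at invocation time overrides the default filter, which is what makes `uv run pytest -m ""` run tests and benchmarks together.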
