[Enh]: Add cut method to nw.Expr and nw.Series #3293

@felixgwilliams

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

No response

Please describe the purpose of the new feature or describe the problem to solve.

The polars.Expr.cut and pandas.cut functions are used to bin continuous values into discrete categories. This functionality could be useful in Narwhals.

I see that the Polars function is marked as unstable. Does that mean Narwhals can't implement the function until there is a Polars counterpart with a stable API?

Suggest a solution if possible.

No response

If you have tried alternatives, please describe them below.

One way I worked around this is by coalescing a list of nw.when expressions.

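import narwhals as nw

bins = [1, 2, 3]
labels = ["bin_1", "bin_2", "bin_3", "bin_4"]

to_coalesce = []

# One when/then per upper bin edge; each yields null when x is above the edge.
for upper, label in zip(bins, labels, strict=False):
    expr = nw.when(nw.col("x") <= upper).then(nw.lit(label))
    to_coalesce.append(expr)

# Fallback label for values above the last edge.
to_coalesce.append(nw.lit(labels[-1]))

df.with_columns(
    x_bin=nw.coalesce(to_coalesce)
)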

This approach may perform poorly when the number of bins is large, since it builds one nw.when expression per bin.

I also tried a combination of np.searchsorted and nw.Expr.map_batches, but this didn't work with lazy frames. I think adding is_elementwise to map_batches may enable this approach (#2522).
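For reference, here is a minimal sketch of the binning semantics I am after, written in plain Python so it does not depend on any dataframe backend. The `cut` helper is a hypothetical name, not an existing Narwhals function. It uses `bisect.bisect_left` from the standard library, which plays the same role as `np.searchsorted` with `side="left"`: it returns the index of the first break that is greater than or equal to the value, which gives right-closed bins like the `<=` comparisons in the workaround above. Keeping `None` as `None` is a choice made for this sketch only.

```python
from bisect import bisect_left


def cut(values, breaks, labels):
    """Hypothetical helper: label each value with its right-closed bin.

    `breaks` must be sorted ascending, and `labels` needs one more
    entry than `breaks` (the last label covers values above every break).
    """
    if len(labels) != len(breaks) + 1:
        raise ValueError("expected len(breaks) + 1 labels")
    # bisect_left gives the index of the first break >= value, so a value
    # equal to a break falls into the bin that ends at that break.
    return [
        None if v is None else labels[bisect_left(breaks, v)]
        for v in values
    ]


breaks = [1, 2, 3]
labels = ["bin_1", "bin_2", "bin_3", "bin_4"]
result = cut([0.5, 1, 1.5, 3, 4, None], breaks, labels)
print(result)
```

A vectorised version of the same lookup is what I was trying to run through nw.Expr.map_batches.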

Additional information that may help us understand your needs.

No response
