
Implement njit map_rows #451

Draft

Graciaaa3 wants to merge 8 commits into main from map_rows-njit

Conversation

@Graciaaa3
Collaborator

Change Description

Adding an njit version of map_rows (originally reduce) to achieve a potential speedup for row-wise operations. The njit-decorated helper methods are factored out into a new module, njit_funcs.py.

Adding a documentation notebook for njit map_rows usage with examples and benchmark results.

Closes #277

Solution Description

  • Added an njit-compiled execution path for map_rows.
  • Support for user-defined functions with one or two arguments.
  • Explicit preprocessing of base and nested columns to satisfy Numba’s static typing requirements.
  • Keeping the existing Python map_rows execution as the default implementation when njit doesn't support the function.
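The dispatch described above could look roughly like the sketch below. This is not the PR's actual code: the helper names `_map_rows_njit` / `_map_rows_python`, the row/column representation, and the two-argument limit are all illustrative assumptions; only the `CPUDispatcher` check and the Python fallback are taken from the change description.

```python
# Hypothetical sketch of the njit/Python dispatch in map_rows.
try:
    from numba.core.registry import CPUDispatcher
except ImportError:  # numba not installed: always take the Python path
    CPUDispatcher = None


def map_rows(rows, func, columns):
    # Use the jitted path only for numba-compiled funcs that fit the
    # criteria (here, illustratively, at most two columns).
    if (
        CPUDispatcher is not None
        and isinstance(func, CPUDispatcher)
        and len(columns) <= 2
    ):
        return _map_rows_njit(rows, func, columns)
    return _map_rows_python(rows, func, columns)


def _map_rows_python(rows, func, columns):
    # Existing default: plain Python iteration over the rows.
    return [func(*(row[c] for c in columns)) for row in rows]


def _map_rows_njit(rows, func, columns):
    # Placeholder for the njit-compiled execution path.
    raise NotImplementedError("jitted path not shown in this sketch")
```

With a plain (non-jitted) `func`, the `isinstance` check fails and the call silently falls through to the existing Python implementation, which matches the "default implementation" bullet above.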

Code Quality

  • I have read the Contribution Guide and agree to the Code of Conduct
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

Project-Specific Pull Request Checklists

New Feature Checklist

  • I have added or updated the docstrings associated with my feature using the NumPy docstring format
  • I have updated the tutorial to highlight my new feature (if appropriate)
  • I have added unit/End-to-End (E2E) test cases to cover my new feature
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

I will add the tests to cover added logic and update the documentation for map_rows afterward.


@github-actions

github-actions bot commented Feb 6, 2026

| Before [39a4681] | After [3be5eb5] | Ratio | Benchmark (Parameter) |
| --- | --- | --- | --- |
| 256M | failed | n/a | benchmarks.AssignSingleDfToNestedSeries.peakmem_run |
| 29.8±2ms | failed | n/a | benchmarks.AssignSingleDfToNestedSeries.time_run |
| 139M | failed | n/a | benchmarks.CountNestedBy.peakmem_run |
| 66.5±0.9ms | failed | n/a | benchmarks.CountNestedBy.time_run |
| 105M | failed | n/a | benchmarks.NestedFrameAddNested.peakmem_run |
| 11.0±0.3ms | failed | n/a | benchmarks.NestedFrameAddNested.time_run |
| 110M | failed | n/a | benchmarks.NestedFrameQuery.peakmem_run |
| 10.7±0.04ms | failed | n/a | benchmarks.NestedFrameQuery.time_run |
| 109M | failed | n/a | benchmarks.NestedFrameReduce.peakmem_run |
| 1.26±0.02ms | failed | n/a | benchmarks.NestedFrameReduce.time_run |


@hombit
Collaborator

hombit commented Feb 9, 2026

Hey, @Graciaaa3, is it actually ready for review? I'm a little bit confused because the PR is marked as a draft, but I'm selected as a reviewer.

@Graciaaa3
Collaborator Author

> Hey, @Graciaaa3, is it actually ready for review? I'm a little bit confused because the PR is marked as a draft, but I'm selected as a reviewer.

Hi @hombit, sorry for the confusion, it's ready for review! I put it in as a draft PR since I haven't finished writing the tests, but the code part and the tutorial notebook are finished.

Collaborator

@hombit left a comment


Very deep and useful work, thank you!

I'm not 100% sure how the numba compilation pipeline works, but I guess that our @njit functions would be recompiled for every new user's func. For the same reason, I guess it is not worth asking the user to pre-jit-compile their function; we should just try to do it ourselves, maybe conditionally on some keyword argument, e.g. engine=None, like in pandas' apply method. Other reasons to make it conditional on a new argument: 1) your benchmarks show that speed-up is not guaranteed, so the choice should be more explicit, 2) if the user wants to use a numba-compiled func, but not a numba-compiled map_rows, for example for an arbitrary number of columns, we absolutely should allow it, and the warning would be confusing.

I'm wondering how hard it would be to generalize to an arbitrary number of columns using a bit of Python metaprogramming magic?

Thank you for making the notebook and the plots, they are great and very informative! One cosmetic issue: could you please set ylim to [0; None], so 0 is always visible? I would also like to see how much time jit-compilation takes, because it is not clear if it is actually worth it for single-frame use, and what guidelines we should give LSDB users about using it.

Tests would be appreciated.
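The `engine=None` idea suggested above could be sketched as follows. This is a hypothetical API, not the PR's implementation: `engine` accepts any callable decorator (e.g. `numba.njit`), and `None` keeps the default Python execution, mirroring the pandas `apply` precedent mentioned in the comment.

```python
# Hypothetical engine keyword for map_rows; rows/columns handling is
# simplified to a plain-Python loop for illustration.
def map_rows(rows, func, columns, engine=None):
    if engine is not None:
        # e.g. engine=numba.njit jit-compiles the user's func for them,
        # so users never need to pre-wrap their function in njit().
        func = engine(func)
    return [func(*(row[c] for c in columns)) for row in rows]
```

Because the opt-in is explicit, there is no need for a warning when the jitted path is unavailable or slower, addressing both reasons (1) and (2) above.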

```python
    iterators.append(self[col])
else:
    iterators.append(self[layer].array.iter_field_lists(col))
# check if func is jitted by numba and fits the criteria
```
Collaborator


There is a lot of code here, maybe worth refactoring to a new helper function/method?

Collaborator

@dougbrn left a comment


Thank you for all your work on this! I just have an initial comment here, and then I'll look through the tutorial notebook separately.

```python
from nested_pandas.series.nestedseries import NestedSeries
from nested_pandas.series.packer import pack, pack_lists, pack_sorted_df_into_struct
from nested_pandas.series.utils import is_pa_type_a_list
from numba.core.registry import CPUDispatcher
```
Collaborator


You'll need to add numba as a dependency to pyproject.toml, and that should hopefully allow the docs to build.

I'm a little conflicted about adding it as a core dependency, but I think it's the right choice for now.

Collaborator


I agree with keeping numba optional. It always causes a lot of weird dependency problems with CPython/numpy upgrades.
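One common pattern for the optional-dependency approach discussed above: declare numba under `[project.optional-dependencies]` in pyproject.toml and guard the import at runtime so the core package works without it. The extra name `njit` and the error message below are illustrative, not the project's actual configuration.

```python
# Runtime guard for an optional numba dependency (illustrative sketch).
try:
    from numba import njit
    HAS_NUMBA = True
except ImportError:
    njit = None
    HAS_NUMBA = False


def require_numba():
    # Raise a helpful error if the njit path is requested without numba.
    if not HAS_NUMBA:
        raise ImportError(
            "numba is required for the njit map_rows path; "
            "install it with the hypothetical extra "
            "`pip install nested-pandas[njit]`"
        )
```

The corresponding pyproject.toml entry would look something like `njit = ["numba"]` under `[project.optional-dependencies]`, so `map_rows` can keep its Python path as the zero-dependency default.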

@dougbrn
Collaborator

dougbrn commented Feb 11, 2026

Please see this for notebook comments: https://app.reviewnb.com/lincc-frameworks/nested-pandas/blob/map_rows-njit/docs%2Fpre_executed%2Fnjit_map_rows.ipynb%2F/discussion

I think the notebook is looking really good!

@Graciaaa3
Collaborator Author

> Very deep and useful work, thank you!
>
> I'm not 100% sure how the numba compilation pipeline works, but I guess that our @njit functions would be recompiled for every new user's func. For the same reason, I guess it is not worth asking the user to pre-jit-compile their function; we should just try to do it ourselves, maybe conditionally on some keyword argument, e.g. engine=None, like in pandas' apply method. Other reasons to make it conditional on a new argument: 1) your benchmarks show that speed-up is not guaranteed, so the choice should be more explicit, 2) if the user wants to use a numba-compiled func, but not a numba-compiled map_rows, for example for an arbitrary number of columns, we absolutely should allow it, and the warning would be confusing.
>
> I'm wondering how hard it would be to generalize to an arbitrary number of columns using a bit of Python metaprogramming magic?
>
> Thank you for making the notebook and the plots, they are great and very informative! One cosmetic issue: could you please set ylim to [0; None], so 0 is always visible? I would also like to see how much time jit-compilation takes, because it is not clear if it is actually worth it for single-frame use, and what guidelines we should give LSDB users about using it.
>
> Tests would be appreciated.

Thanks for the comment! I agree we should make njit optional if the user doesn't want to use it. I looked into pandas' apply and its engine kwarg, which accepts a decorator or {'python', 'numba'}. I saw that "String parameters will stop being supported in a future pandas version" in the warning section. What would be our take on this?

And would we want to support a similar feature of compiling a Python custom func with njit inside map_rows? If so, I think it might be better for me to finish up what I have currently and add that feature afterward?

Also, for the time it takes to jit-compile the custom function and/or the map_rows logic, which one, or both, would be more useful to see? To make sure I'm understanding it correctly: we are looking for a comparison between a single call to njit map_rows (with the method and/or custom func not pre-compiled) and the Python execution?

@hombit
Collaborator

hombit commented Feb 15, 2026

> I saw that "String parameters will stop being supported in a future pandas version" in the warning section. What would be our take on this?

I don't have a strong opinion. I'm OK with supporting callables only, e.g. engine=None/engine=numba.njit, etc.

> And would we want to support a similar feature of compiling python custom func with njit inside map_rows? If so, I think maybe it would be better for me to finish up what I have currently and add in the feature after?

No, but I also believe that any func passed as an argument to another function would be (re)compiled anyway. That's why I see no reason to ask the user to wrap func into njit(). We can easily do it for the user. See some benchmark experiments below.

> Also, for the time it takes for jit-compiling custom function and/or map_rows logic, which one, or both, would be more useful to see? To make sure I'm understanding it correctly, we are looking for the comparison between just one call to map_rows in njit (with method and/or custom func not pre-compiled) with python execution?

Yes, I think even a single compilation + single jitted map_rows evaluation to pure Python map_rows evaluation ratio would be useful. Again, it is interesting to see these numbers because njitted map_rows would be recompiled for every new func it sees.
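A small, library-agnostic helper for the measurement described above: time the first call to a function (which includes jit compilation when the function is a freshly created numba dispatcher) separately from steady-state calls, so the compile overhead is visible against the per-call cost. The helper itself is an illustrative sketch and works with any callable.

```python
import time


def first_vs_steady(fn, *args, repeat=100):
    # First call: pays one-time costs (jit compilation, caches, etc.).
    t0 = time.perf_counter()
    fn(*args)
    first = time.perf_counter() - t0
    # Steady state: average over repeated calls to cached machine code.
    t0 = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    steady = (time.perf_counter() - t0) / repeat
    return first, steady
```

Applied to a jitted map_rows versus the pure-Python path, the `first` numbers would answer whether compilation is worth it for single-frame use, while `steady` reflects the repeated-use case relevant to LSDB.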

Justification that a function argument causes recompilation of an outer function:

```python
# Feel free to set NUMBA_DEBUG_JIT=1 to see compilation logs
from numba import njit

def inner1(x, y):
    return x**0.2 + y**0.3

inner1_compiled = njit(inner1)
inner1_compiled(1.0, 1.0)

def inner2(x, y):
    return x**0.3 - y**0.2

inner2_compiled = njit(inner2)
inner2_compiled(1.0, 2.0)

def outer(f, x, y):
    return f(x, y) + x**y

outer_compiled = njit(outer)
outer_compiled(inner1_compiled, 1.0, 1.0)

## Check if recompilation is caused for any new input function
## Using the same function argument is fast, no recompilation I guess
%time outer_compiled(inner1_compiled, 2.0, 2.0)
# Wall time: 19.8 μs
## Using a different function is slow: note milliseconds vs microseconds
%time outer_compiled(inner2_compiled, 2.0, 2.0)
# Wall time: 16.4 ms
```



Development

Successfully merging this pull request may close these issues.

Implement a jit-compiled version of reduce

3 participants