|
7 | 7 | "source": [ |
8 | 8 | "# Custom Aggregations\n", |
9 | 9 | "\n", |
10 | | - "This notebook is motivated by a [post](https://discourse.pangeo.io/t/using-xhistogram-to-bin-measurements-at-particular-stations/2365/4) on the Pangeo discourse forum.\n", |
| 10 | + "This notebook is motivated by a\n", |
| 11 | + "[post](https://discourse.pangeo.io/t/using-xhistogram-to-bin-measurements-at-particular-stations/2365/4)\n", |
| 12 | + "on the Pangeo discourse forum.\n", |
11 | 13 | "\n", |
12 | 14 | "> Even better would be a command that lets me simply do the following.\n", |
13 | 15 | ">\n", |
14 | 16 | "> A = da.groupby(['lon_bins', 'lat_bins']).mode()\n", |
15 | 17 | "\n", |
16 | | - "This notebook will describe how to accomplish this using a custom `Aggregation` since `mode` and `median` aren't supported by flox yet." |
| 18 | + "This notebook will describe how to accomplish this using a custom `Aggregation`\n", |
| 19 | + "since `mode` and `median` aren't supported by flox yet.\n" |
17 | 20 | ] |
18 | 21 | }, |
19 | 22 | { |
|
439 | 442 | "source": [ |
440 | 443 | "## A built-in reduction\n", |
441 | 444 | "\n", |
442 | | - "First a simple example of lat-lon binning using a built-in reduction: mean" |
| 445 | + "First a simple example of lat-lon binning using a built-in reduction: mean\n" |
443 | 446 | ] |
444 | 447 | }, |
445 | 448 | { |
|
494 | 497 | "source": [ |
495 | 498 | "## Aggregations\n", |
496 | 499 | "\n", |
497 | | - "flox knows how to interpret `func=\"mean\"` because it's been implemented in `aggregations.py` as an [Aggregation](https://flox.readthedocs.io/en/latest/generated/flox.aggregations.Aggregation.html)\n", |
| 500 | + "flox knows how to interpret `func=\"mean\"` because it's been implemented in\n", |
| 501 | + "`aggregations.py` as an\n", |
| 502 | + "[Aggregation](https://flox.readthedocs.io/en/latest/generated/flox.aggregations.Aggregation.html)\n", |
498 | 503 | "\n", |
499 | | - "An `Aggregation` is a blueprint for computing an aggregation, with both numpy and dask data." |
| 504 | + "An `Aggregation` is a blueprint for computing an aggregation, with both numpy\n", |
| 505 | + "and dask data.\n" |
500 | 506 | ] |
501 | 507 | }, |
502 | 508 | { |
|
545 | 551 | "```python\n", |
546 | 552 | "mean = Aggregation(\n", |
547 | 553 | " name=\"mean\",\n", |
548 | | - " \n", |
549 | | - " # strings in the following are built-in grouped reductions \n", |
| 554 | + "\n", |
| 555 | + " # strings in the following are built-in grouped reductions\n", |
550 | 556 | " # implemented by the underlying \"engine\": flox or numpy_groupies or numbagg\n", |
551 | | - " \n", |
| 557 | + "\n", |
552 | 558 | " # for pure numpy inputs\n", |
553 | | - " numpy=\"mean\", \n", |
554 | | - " \n", |
| 559 | + " numpy=\"mean\",\n", |
| 560 | + "\n", |
555 | 561 | " # The next are for dask inputs and describe how to reduce\n", |
556 | 562 | " # the data in parallel\n", |
557 | 563 | " chunk=(\"sum\", \"nanlen\"), # first compute these blockwise : (grouped_sum, grouped_count)\n", |
558 | 564 | " combine=(\"sum\", \"sum\"), # reduce intermediate results (sum the sums, sum the counts)\n", |
559 | 565 | " finalize=lambda sum_, count: sum_ / count, # final mean value (divide sum by count)\n", |
560 | | - " \n", |
| 566 | + "\n", |
561 | 567 | " fill_value=(0, 0), # fill value for intermediate sums and counts when groups have no members\n", |
562 | 568 | " dtypes=(None, np.intp), # optional dtypes for intermediates\n", |
563 | 569 | " final_dtype=np.floating, # final dtype for output\n", |
|
572 | 578 | "source": [ |
573 | 579 | "## Defining a custom aggregation\n", |
574 | 580 | "\n", |
575 | | - "First we'll need a function that executes the grouped reduction given numpy inputs. \n", |
| 581 | + "First we'll need a function that executes the grouped reduction given numpy\n", |
| 582 | + "inputs.\n", |
| 583 | + "\n", |
| 584 | + "Custom functions are required to have this signature (copied from\n", |
| 585 | + "numpy_groupies):\n", |
576 | 586 | "\n", |
577 | | - "Custom functions are required to have this signature (copied from numpy_groupies):\n", |
578 | | - "``` python\n", |
| 587 | + "```python\n", |
579 | 588 | "\n", |
580 | 589 | "def custom_grouped_reduction(\n", |
581 | 590 | " group_idx, array, *, axis=-1, size=None, fill_value=None, dtype=None\n", |
582 | 591 | "):\n", |
583 | 592 | " \"\"\"\n", |
584 | 593 | " Parameters\n", |
585 | 594 | " ----------\n", |
586 | | - " \n", |
| 595 | + "\n", |
587 | 596 | " group_idx : np.ndarray, 1D\n", |
588 | 597 | " integer codes for group labels (1D)\n", |
589 | 598 | " array : np.ndarray, nD\n", |
|
596 | 605 | " fill_value for when the number of groups in group_idx is less than size\n", |
597 | 606 | " dtype : optional\n", |
598 | 607 | " dtype of output\n", |
599 | | - " \n", |
| 608 | + "\n", |
600 | 609 | " Returns\n", |
601 | 610 | " -------\n", |
602 | | - " \n", |
| 611 | + "\n", |
603 | 612 | " np.ndarray with array.shape[-1] == size, containing a single value per group\n", |
604 | 613 | " \"\"\"\n", |
605 | 614 | " pass\n", |
606 | 615 | "```\n", |
607 | 616 | "\n", |
608 | | - "\n", |
609 | | - "Since numpy_groupies does not implement a median, we'll do it ourselves by passing `np.median` to `numpy_groupies.aggregate_numpy.aggregate`. This will loop over all groups, and then execute `np.median` on the group members in serial. It is not fast, but quite convenient.\n" |
| 617 | + "Since numpy_groupies does not implement a median, we'll do it ourselves by\n", |
| 618 | + "passing `np.median` to `numpy_groupies.aggregate_numpy.aggregate`. This will\n", |
| 619 | + "loop over all groups, and then execute `np.median` on the group members in\n", |
| 620 | + "serial. It is not fast, but quite convenient.\n" |
610 | 621 | ] |
611 | 622 | }, |
612 | 623 | { |
|
639 | 650 | "id": "b356f4f2-ae22-4f56-89ec-50646136e2eb", |
640 | 651 | "metadata": {}, |
641 | 652 | "source": [ |
642 | | - "Now we create the `Aggregation`" |
| 653 | + "Now we create the `Aggregation`\n" |
643 | 654 | ] |
644 | 655 | }, |
645 | 656 | { |
|
682 | 693 | "id": "899ece52-ebd4-47b4-8090-cbbb63f504a4", |
683 | 694 | "metadata": {}, |
684 | 695 | "source": [ |
685 | | - "And apply it!" |
| 696 | + "And apply it!\n" |
686 | 697 | ] |
687 | 698 | }, |
688 | 699 | { |
|