Consider removal of t-digest computation #10

@zblz


Right now the t-digest computation (done with a Python t-digest implementation) takes up most of the time needed to generate a summary. It was originally included to provide an approximation of the histogram information, but since we also compute a fixed-bin-width histogram, it is of limited value. The explorer uses the t-digest information for:

  • arbitrary-bin-width histograms.
  • building percentile functions in lens.Summary that are used to plot a CDF in lens.Explorer.

We have to consider whether these two features are important enough to keep, and whether other approaches could provide the same information.
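As one possible substitute for the percentile functions, a rough percentile (and hence a CDF) could be read off the fixed-bin-width histogram we already compute. The sketch below is only an illustration under stated assumptions: `percentile_from_histogram`, `edges`, and `counts` are hypothetical names and not part of lens, and values are assumed to be spread uniformly within each bin, so accuracy is limited by the bin width.

```python
from bisect import bisect_left
from itertools import accumulate


def percentile_from_histogram(edges, counts, q):
    """Approximate the q-th percentile (0-100) from a binned histogram.

    Hypothetical helper, not part of lens. Assumes values are spread
    uniformly inside each bin, so the result is only as good as the binning.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    if not 0 <= q <= 100:
        raise ValueError("q must be in [0, 100]")
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")

    # Cumulative count at the right edge of each bin.
    cumulative = list(accumulate(counts))
    target = q / 100 * total

    # First bin whose cumulative count reaches the target.
    i = min(bisect_left(cumulative, target), len(counts) - 1)
    if counts[i] == 0:
        # Only reachable for q == 0 with an empty first bin.
        return float(edges[i])

    below = cumulative[i - 1] if i > 0 else 0
    fraction = (target - below) / counts[i]
    # Linear interpolation inside the selected bin.
    return edges[i] + fraction * (edges[i + 1] - edges[i])


edges = [0, 1, 2, 3, 4]
counts = [1, 1, 1, 1]
median = percentile_from_histogram(edges, counts, 50)
```

Evaluating this function over a grid of `q` values would give the points of an approximate CDF. Unlike a t-digest, the resolution in the tails is fixed by the bin width, which is the trade-off to weigh here.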

Having an adaptively binned histogram (e.g., through Bayesian blocks) would go a long way towards replacing the t-digest for our exploration needs.

A significant advantage of a t-digest is that it can be updated in a chunked manner, but we are not currently using that.
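If chunked updates became a requirement later, it may be worth noting that a histogram with bin edges fixed in advance can also be built chunk by chunk, because per-chunk counts simply add. The sketch below illustrates that property only; it is not a t-digest, and `histogram_counts` and `merge_counts` are hypothetical names, not lens functions. It assumes the edges are known before any data is seen, which a t-digest does not require.

```python
from bisect import bisect_right


def histogram_counts(values, edges):
    """Count values into bins [edges[i], edges[i+1]); the last bin is closed.

    Hypothetical helper. Values outside [edges[0], edges[-1]] are dropped.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue
        # bisect_right puts a value equal to an edge into the bin on its
        # right; clamp so that v == edges[-1] falls into the last bin.
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


def merge_counts(a, b):
    """Merge two histograms that share the same edges by summing counts."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return [x + y for x, y in zip(a, b)]


edges = [0, 1, 2, 3]
data = [0.1, 0.5, 1.2, 2.9, 3.0, 1.0]

# Process the data in two chunks and merge the partial results.
merged = merge_counts(
    histogram_counts(data[:3], edges),
    histogram_counts(data[3:], edges),
)
```

Here `merged` equals the counts obtained from a single pass over `data`, so this particular advantage is not lost by dropping the t-digest as long as the bin edges can be chosen up front.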
