Skip to content

Move index of numbers from BKD into DocValuesSkipper? #14850

@gf2121

Description

@gf2121

Description

Immature idea, for discussion purpose :)

Today our range query on numbers have two approaches: BKD or DocValue Skipper.

The drawback of BKD:

  • Allocating too much memory (big bitset)
  • can not respect small lead clause lazily (now eased by IndexOrDocValue)
  • can not respect intra-segment concurrency.
  • doc ids are usually stored 3 bytes.

The drawback of DocValue Skipper:

  • Slow if we can not skip on one block.

I wonder if we can have a doc id list stored in value order in DocValue Skipper to replace BKD totally. These docs should be split into smaller blocks (128 or 512) and can be vectorized decoded. This idea sounds really like we have a small BKD in each block.

For example:

Today we have:
doc 1, 2, 3, 5, 6
value 3, 5, 2, 4, 1

we store another doc list 6, 3, 1, 5, 2
whose values are in order.

Advantages:

  1. solve drawbacks of BKDs.
  2. ease slow problem of DocValue Skipper

Disadvantages:

  1. Comparing to BKD: this can not gather all docs needed on segment level. If docs are evenly distributed, there may be just a few docs required in a block, then we can not append whole doc block without caring values (addAll). Maybe we should not build this on level0 block but higher, like 65536-docs block?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions