feat: support zonemap indexes in ALTER TABLE CREATE INDEX #466
Closed
beinan wants to merge 3 commits into
Closing in favor of #473, which contains all these changes plus the distributed build work.
hamersaw pushed a commit that referenced this pull request on Jun 4, 2026
…516)

## Summary
- Add zonemap as a new index type in `CREATE INDEX` DDL with distributed build support
- Batch fragments into configurable segments via the `num_segments` option (defaults to `spark.default.parallelism`)
- Each segment is built in parallel on Spark executors and committed as a logical index on the driver
- Zonemap indexes currently support a single column only

## What Changed
- `AddIndexExec.scala`: Zonemap-specific path with `ZonemapIndexJob`/`ZonemapIndexTask` and `commitIndexSegments`
- `create-index.md`: Document zonemap index type, options, and usage
- Tests: unit tests for segment creation/validation and an integration test

## Notes
- Rebased cleanly onto current `main`
- Depends on lance-core `7.0.0-beta.10` or newer, which includes zonemap segment support
- Supersedes PR #473 and closed PR #466

## Test plan
- [x] CI passes (lint, unit tests, integration tests across all Spark/Scala versions)
- [x] Zonemap index creation with default segment count
- [x] Zonemap index creation with explicit `num_segments`
- [x] Repeated zonemap index creation replaces existing segments
- [x] Query correctness after zonemap index creation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Beinan Wang <beinanwang@microsoft.com>
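The commit message describes batching fragments into `num_segments` segments, building each segment in parallel, and committing all segments on the driver as one logical index. Below is a minimal Java sketch of that flow under stated assumptions; it is not the `AddIndexExec.scala` code and uses no lance-spark or Lance API. A thread pool stands in for Spark executors, each fragment is treated as a single zone, an in-memory map stands in for the table's index metadata, and the contiguous, near-equal split is an assumed batching strategy. `SegmentedZonemapSketch`, `planSegments`, `buildSegment`, `createZonemapIndex`, and `COMMITTED` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch of a segmented zonemap build. None of these names are
 * lance-spark or Lance APIs; a thread pool stands in for Spark executors.
 */
class SegmentedZonemapSketch {

  /** Min/max statistics for one zone (assumption: one zone per fragment). */
  record Zone(int fragmentId, long min, long max) {}

  /** One independently built piece of the logical index. */
  record Segment(List<Zone> zones) {}

  /** Committed logical indexes by name; stands in for the table's index metadata. */
  static final Map<String, List<Segment>> COMMITTED = new HashMap<>();

  /** Split fragment IDs into at most numSegments contiguous, near-equal batches. */
  static List<List<Integer>> planSegments(List<Integer> fragmentIds, int numSegments) {
    if (numSegments <= 0) {
      throw new IllegalArgumentException("num_segments must be positive");
    }
    int n = fragmentIds.size();
    int k = Math.min(numSegments, n); // never emit empty segments
    List<List<Integer>> batches = new ArrayList<>();
    for (int i = 0; i < k; i++) {
      int start = (int) ((long) i * n / k);
      int end = (int) ((long) (i + 1) * n / k);
      batches.add(new ArrayList<>(fragmentIds.subList(start, end)));
    }
    return batches;
  }

  /** Executor-side work: min/max per fragment in one batch (assumes non-empty fragments). */
  static Segment buildSegment(List<Integer> batch, Map<Integer, long[]> columnByFragment) {
    List<Zone> zones = new ArrayList<>();
    for (int id : batch) {
      long min = Long.MAX_VALUE;
      long max = Long.MIN_VALUE;
      for (long v : columnByFragment.get(id)) {
        min = Math.min(min, v);
        max = Math.max(max, v);
      }
      zones.add(new Zone(id, min, max));
    }
    return new Segment(zones);
  }

  /** Driver-side flow: validate, plan, build segments in parallel, then commit once. */
  static List<Segment> createZonemapIndex(String name, List<String> columns,
      Map<Integer, long[]> columnByFragment, int numSegments) {
    if (columns.size() != 1) {
      throw new IllegalArgumentException("Zonemap index currently supports a single column only");
    }
    List<Integer> ids = new ArrayList<>(columnByFragment.keySet());
    Collections.sort(ids);
    List<List<Integer>> batches = planSegments(ids, numSegments);

    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, batches.size()));
    try {
      List<Future<Segment>> futures = new ArrayList<>();
      for (List<Integer> batch : batches) {
        futures.add(pool.submit(() -> buildSegment(batch, columnByFragment)));
      }
      List<Segment> segments = new ArrayList<>();
      for (Future<Segment> f : futures) {
        try {
          segments.add(f.get());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while building segments", e);
        } catch (ExecutionException e) {
          throw new IllegalStateException("segment build failed", e.getCause());
        }
      }
      // Single commit on the driver: replaces any segments from a previous build.
      COMMITTED.put(name, List.copyOf(segments));
      return segments;
    } finally {
      pool.shutdown();
    }
  }
}
```

Re-running `createZonemapIndex` with the same name overwrites the committed entry, in the spirit of the test-plan item that repeated zonemap index creation replaces existing segments. Capping the segment count at the number of fragments avoids committing empty segments; whether the real implementation does the same is not stated here.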
ivscheianu pushed a commit to ivscheianu/lance-spark that referenced this pull request on Jun 12, 2026
Summary
- `ALTER TABLE ... CREATE INDEX` via Lance direct `createIndex` instead of fragment training
- `SHOW INDEXES` and scan planning, and fix numeric zonemap pruning across mixed numeric types

Testing
- `./mvnw -pl lance-spark-4.0_2.13,lance-spark-4.1_2.13 -Dtest=AddIndexTest,ShowIndexesTest,ZonemapFragmentPrunerTest,CreateIndexStandardSyntaxTest -Dsurefire.failIfNoSpecifiedTests=false test`
- `./mvnw -pl lance-spark-4.0_2.13,lance-spark-4.1_2.13 -Dtest=AddIndexTest,ShowIndexesTest -Dsurefire.failIfNoSpecifiedTests=false test`

Notes
- `CREATE INDEX` syntax accepts a column list, but current Lance core rejects multi-column zonemap creation with `LanceError(Index): Only support building index on 1 column at the moment`
- Zonemap index currently supports a single column only
- `pytest` or the expected `/home/lance/data` fixture path
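The Summary mentions fixing numeric zonemap pruning across mixed numeric types, without showing the fix. Below is a minimal Java sketch, assuming pruning compares a zone's min/max statistics against a query literal that may have a different numeric type (for example a `LONG` column filtered with a `DOUBLE` literal). It is not the `ZonemapFragmentPruner` code; `MixedNumericPruning`, `Op`, `toDecimal`, and `canSkip` are hypothetical names. Both sides are widened to `java.math.BigDecimal` so the comparison is exact, and any value that cannot be compared safely (NaN, infinities, unknown `Number` subtypes) keeps the zone.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/**
 * Hypothetical sketch of zonemap pruning when zone statistics and the query
 * literal have different numeric types. Not the lance-spark implementation.
 */
class MixedNumericPruning {

  enum Op { EQ, LT, GT }

  /** Exact BigDecimal for common boxed numbers, or null when no safe comparison exists. */
  static BigDecimal toDecimal(Number n) {
    if (n instanceof Byte || n instanceof Short || n instanceof Integer || n instanceof Long) {
      return BigDecimal.valueOf(n.longValue());
    }
    if (n instanceof Float || n instanceof Double) {
      double d = n.doubleValue(); // float -> double widening is exact
      if (Double.isNaN(d) || Double.isInfinite(d)) {
        return null; // BigDecimal cannot represent NaN or infinities
      }
      return new BigDecimal(d); // exact binary value, no rounding
    }
    if (n instanceof BigDecimal) {
      return (BigDecimal) n;
    }
    if (n instanceof BigInteger) {
      return new BigDecimal((BigInteger) n);
    }
    return null; // unknown Number subtype: refuse to guess
  }

  /**
   * True only if no value in a zone with the given min/max can satisfy
   * `col op literal`, so the zone can be skipped. Returns false (keep the zone)
   * whenever the values cannot be compared exactly.
   */
  static boolean canSkip(Number zoneMin, Number zoneMax, Op op, Number literal) {
    BigDecimal lo = toDecimal(zoneMin);
    BigDecimal hi = toDecimal(zoneMax);
    BigDecimal v = toDecimal(literal);
    if (lo == null || hi == null || v == null) {
      return false;
    }
    switch (op) {
      case EQ:
        return v.compareTo(lo) < 0 || v.compareTo(hi) > 0; // literal outside [min, max]
      case LT:
        return lo.compareTo(v) >= 0; // even the smallest value is not < literal
      case GT:
        return hi.compareTo(v) <= 0; // even the largest value is not > literal
      default:
        throw new IllegalArgumentException("unsupported operator: " + op);
    }
  }
}
```

One way such bugs can arise: if the literal `1.5` were truncated to a long, `col < 1.5` would become `col < 1`, and a zone with min `1` would be skipped even though the value `1` matches. Comparing both sides exactly keeps that zone.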