feat(table): add bin-pack compaction strategy #850

Open
laskoviymishka wants to merge 4 commits into apache:main from laskoviymishka:feat/compaction-strategy

@laskoviymishka
Contributor

Add CompactionConfig and Plan() that groups FileScanTasks by partition, classifies files as candidates based on size thresholds and delete file counts, and bin-packs candidates into CompactionGroups using the existing SlicePacker. This is the planning layer for RewriteDataFiles (#832).

  • Oversized files skipped unless delete count exceeds threshold
  • Deterministic partition key via sorted field IDs
  • Config validation (target between min/max, positive thresholds)
  • Ceiling division for output file estimation
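The classification and estimation rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `config` type, its field names, and the exact comparison operators are assumptions, and the rule for files that fall between the min and max sizes is a guess (here they are left alone unless they carry too many delete files).

```go
package main

import (
	"errors"
	"fmt"
)

// config is a hypothetical stand-in for the PR's CompactionConfig.
// Field names and comparison operators are assumptions for illustration.
type config struct {
	MinFileSizeBytes    int64
	MaxFileSizeBytes    int64
	TargetFileSizeBytes int64
	DeleteFileThreshold int
}

// validate checks that all thresholds are positive and that the target
// size lies between the min and max sizes.
func (c config) validate() error {
	if c.MinFileSizeBytes <= 0 || c.MaxFileSizeBytes <= 0 ||
		c.TargetFileSizeBytes <= 0 || c.DeleteFileThreshold <= 0 {
		return errors.New("thresholds must be positive")
	}
	if c.TargetFileSizeBytes < c.MinFileSizeBytes ||
		c.TargetFileSizeBytes > c.MaxFileSizeBytes {
		return errors.New("target size must lie between min and max")
	}
	return nil
}

// isCandidate applies the assumed rule: a file is rewritten if its delete
// file count exceeds the threshold, or if it is smaller than the minimum.
// Oversized files with few deletes are therefore skipped.
func (c config) isCandidate(sizeBytes int64, deleteFiles int) bool {
	if deleteFiles > c.DeleteFileThreshold {
		return true
	}
	return sizeBytes < c.MinFileSizeBytes
}

// estimateOutputFiles rounds up, so a partial trailing file still counts.
// targetBytes must be positive (validate guarantees this for a config).
func estimateOutputFiles(totalBytes, targetBytes int64) int64 {
	return (totalBytes + targetBytes - 1) / targetBytes
}

func main() {
	cfg := config{
		MinFileSizeBytes:    64,
		MaxFileSizeBytes:    256,
		TargetFileSizeBytes: 128,
		DeleteFileThreshold: 2,
	}
	if err := cfg.validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(cfg.isCandidate(32, 0))        // small file
	fmt.Println(cfg.isCandidate(512, 0))       // oversized, few deletes
	fmt.Println(cfg.isCandidate(512, 3))       // oversized, many deletes
	fmt.Println(estimateOutputFiles(300, 128)) // ceiling of 300/128
}
```

Adding `targetBytes - 1` before dividing keeps the estimate in integer arithmetic and avoids a float round trip.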

@laskoviymishka laskoviymishka force-pushed the feat/compaction-strategy branch 2 times, most recently from 442605a to 2eb4e80 Compare April 4, 2026 17:55
@laskoviymishka laskoviymishka marked this pull request as ready for review April 4, 2026 18:02
Comment on lines +28 to +29
// CompactionConfig holds tunable thresholds for bin-pack compaction.
type CompactionConfig struct {
Member

I wonder if compaction should actually be a subpackage inside of table or something? The table package is getting quite large and I'd like to either refactor/reduce it or at least avoid putting even more things into it if possible.

What do you think?

Contributor Author

Totally makes sense.

@laskoviymishka laskoviymishka force-pushed the feat/compaction-strategy branch from 2eb4e80 to 0d46bc1 Compare April 7, 2026 20:50
@laskoviymishka laskoviymishka requested a review from zeroshade April 7, 2026 20:58
Member

@zeroshade zeroshade left a comment

Looks good to me, just two questions. One is below; the other is whether we should update PackEnd so it is no longer a footgun that modifies the caller's slice (if we decide to do that, it can be updated in a follow-up rather than here).

Add table/compaction package with Config and PlanCompaction() that
groups FileScanTasks by partition, classifies files as candidates
based on size thresholds and delete file counts, and bin-packs
candidates into Groups using the existing SlicePacker.

- Oversized files skipped unless delete count exceeds threshold
- Config validation (target between min/max, positive thresholds)
- Ceiling division for output file estimation
- EstOutputBytes is an upper bound (the actual size is smaller after
  delete removal and better Parquet compression on larger files)
- Returns Plan by value to avoid unnecessary heap allocation
- Uses map[string]partitionBucket (value type, not pointer)
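Grouping by partition with a value-typed map, as the last bullet describes, might look like the sketch below. Only the `map[string]partitionBucket` shape comes from the commit message; the `fileTask` type, the bucket's fields, and the string partition key are assumptions made for illustration.

```go
package main

import "fmt"

// fileTask is a hypothetical, trimmed-down stand-in for a FileScanTask.
type fileTask struct {
	PartitionKey string
	SizeBytes    int64
}

// partitionBucket's fields are assumptions; only the type name and its
// use as a map value come from the commit message.
type partitionBucket struct {
	tasks      []fileTask
	totalBytes int64
}

// groupByPartition collects tasks per partition key. Because the map
// stores buckets by value, each update reads the bucket, modifies the
// copy, and writes it back.
func groupByPartition(tasks []fileTask) map[string]partitionBucket {
	buckets := make(map[string]partitionBucket)
	for _, ft := range tasks {
		b := buckets[ft.PartitionKey] // zero value if the key is new
		b.tasks = append(b.tasks, ft)
		b.totalBytes += ft.SizeBytes
		buckets[ft.PartitionKey] = b // write the modified copy back
	}
	return buckets
}

func main() {
	buckets := groupByPartition([]fileTask{
		{PartitionKey: "day=1", SizeBytes: 10},
		{PartitionKey: "day=2", SizeBytes: 40},
		{PartitionKey: "day=1", SizeBytes: 30},
	})
	for key, b := range buckets {
		fmt.Println(key, len(b.tasks), b.totalBytes)
	}
}
```

The write-back step is the price of the value type: forgetting it silently drops the update, whereas a pointer-valued map would not need it but would allocate one bucket per partition on the heap.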
@laskoviymishka laskoviymishka force-pushed the feat/compaction-strategy branch from 19cbb42 to 6c122ce Compare April 11, 2026 01:20