Add branch merge strategies#2434

Closed
ForeverAngry wants to merge 3 commits into apache:main from ForeverAngry:branch-merge-strategies

@ForeverAngry
Contributor

@ForeverAngry ForeverAngry commented Sep 7, 2025

Closes #2433

Rationale for this change

This PR adds comprehensive branch merge strategies to PyIceberg, bringing Git-like branch merging capabilities to Iceberg table operations. This enhancement enables users to merge branches with different strategies depending on their workflow needs.

Feature Overview:
Apache Iceberg supports branch operations (create, delete, tag), but lacks merge capabilities between branches. This PR implements 5 standard merge strategies commonly used in version control systems (note: there are differences between this and the Java implementation):

  1. MERGE: Classic three-way merge creating a merge commit that preserves history of both branches
  2. SQUASH: Condenses all commits from source branch into a single clean commit on target branch
  3. REBASE: Creates linear history by replaying commits from source branch on top of target branch
  4. CHERRY_PICK: Selects and applies specific individual commits from one branch to another
  5. FAST_FORWARD: Moves target branch pointer forward when no divergent commits exist (no merge commit needed)
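To make the FAST_FORWARD condition concrete, here is a minimal sketch on a toy history. It is not the code in this PR: the `Parents` mapping and the helper names `ancestors`, `common_ancestor`, and `can_fast_forward` are hypothetical, and real Iceberg snapshots carry far more metadata than a parent id.

```python
from typing import Dict, List, Optional

# Hypothetical toy history: snapshot id -> parent snapshot id (None for the root).
Parents = Dict[int, Optional[int]]


def ancestors(parents: Parents, snapshot_id: int) -> List[int]:
    """Return snapshot_id followed by its ancestors, newest first."""
    chain: List[int] = []
    current: Optional[int] = snapshot_id
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def common_ancestor(parents: Parents, source_head: int, target_head: int) -> Optional[int]:
    """Return the newest snapshot reachable from both heads, or None."""
    target_chain = set(ancestors(parents, target_head))
    for snapshot_id in ancestors(parents, source_head):
        if snapshot_id in target_chain:
            return snapshot_id
    return None


def can_fast_forward(parents: Parents, source_head: int, target_head: int) -> bool:
    """True when the target head is the source head or one of its ancestors."""
    return target_head in ancestors(parents, source_head)


# Snapshot 2 has two children: 3 (on "feature") and 4 (on "main").
history: Parents = {1: None, 2: 1, 3: 2, 4: 2}
print(can_fast_forward(history, source_head=3, target_head=2))  # target has no extra commits
print(can_fast_forward(history, source_head=3, target_head=4))  # branches have diverged
print(common_ancestor(history, source_head=3, target_head=4))
```

In the diverged case a fast-forward is not possible, and one of the other strategies would have to start from the common ancestor.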

Implementation Details:

  • Strategy Pattern: Clean, extensible architecture with abstract base class and concrete implementations
  • Automatic Detection: Fast-forward possibility automatically detected and validated
  • Robust Utilities: Common ancestor finding, branch validation, and snapshot traversal utilities
  • Flexible API: Optional source branch deletion after successful merge
  • Error Handling: Comprehensive validation with clear error messages for invalid operations
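The strategy-pattern layout described above could look roughly like the sketch below. This is an illustration under simplifying assumptions, not the implementation in this PR: branches are modelled as plain lists of commit labels, and `ToyStrategy`, `MergeStrategyImpl`, `FastForward`, `Squash`, and `apply_strategy` are hypothetical names.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Dict, List


class ToyStrategy(Enum):
    """Hypothetical stand-in for the strategy enum; only two members are sketched."""
    SQUASH = "squash"
    FAST_FORWARD = "fast_forward"


class MergeStrategyImpl(ABC):
    """Abstract base class: each strategy returns the new target history."""

    @abstractmethod
    def merge(self, source: List[str], target: List[str]) -> List[str]:
        ...


class FastForward(MergeStrategyImpl):
    def merge(self, source: List[str], target: List[str]) -> List[str]:
        # Valid only when the target history is a prefix of the source history.
        if source[: len(target)] != target:
            raise ValueError("branches have diverged; fast-forward is not possible")
        return list(source)


class Squash(MergeStrategyImpl):
    def merge(self, source: List[str], target: List[str]) -> List[str]:
        # Count the commits both branches share from the root.
        shared = 0
        for s, t in zip(source, target):
            if s != t:
                break
            shared += 1
        new_commits = source[shared:]
        if not new_commits:
            return list(target)
        # Collapse every source-only commit into one entry on the target.
        return list(target) + ["squash(" + "+".join(new_commits) + ")"]


_STRATEGIES: Dict[ToyStrategy, MergeStrategyImpl] = {
    ToyStrategy.FAST_FORWARD: FastForward(),
    ToyStrategy.SQUASH: Squash(),
}


def apply_strategy(source: List[str], target: List[str], strategy: ToyStrategy) -> List[str]:
    """Dispatch to the concrete strategy registered for the enum member."""
    return _STRATEGIES[strategy].merge(source, target)


print(apply_strategy(["a", "b", "c"], ["a", "b"], ToyStrategy.FAST_FORWARD))
print(apply_strategy(["a", "c", "d"], ["a", "b"], ToyStrategy.SQUASH))
```

Adding another strategy in this layout means adding one subclass and one registry entry, which is the extensibility the bullet refers to.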

Use Cases:

  • Development Workflows: Feature branch integration with different merge policies
  • Data Pipeline Management: Merging experimental data processing branches back to production
  • Schema Evolution: Combining schema changes from different development branches

Are these changes tested?

Yes. Test coverage is reasonably comprehensive, with 35 tests across multiple categories.

Are there any user-facing changes?

Yes - New Feature Addition (No Breaking Changes)

New Public API:

from pyiceberg.table.update.snapshot import BranchMergeStrategy

# New enum with 5 merge strategies (I'm open to suggestions on this; I couldn't decide on the best approach)
BranchMergeStrategy.MERGE
BranchMergeStrategy.SQUASH  
BranchMergeStrategy.REBASE
BranchMergeStrategy.CHERRY_PICK
BranchMergeStrategy.FAST_FORWARD

# New method on ManageSnapshots
table.manage_snapshots().merge_branch(
    source_branch="feature",
    target_branch="main", 
    strategy=BranchMergeStrategy.SQUASH,
    delete_source_branch=False  # Optional: preserve or delete source branch
).commit()

- Implemented unit tests for various branch merge strategies including Merge, Squash, Rebase, Cherry-Pick, and Fast-Forward.
- Added tests for utility functions related to snapshot management and ancestor finding.
- Ensured coverage for edge cases such as missing snapshots, circular references, and validation errors during merges.
- Verified that all strategies return consistent structures and handle integration scenarios correctly.
- Included tests for error handling and behavior differences across strategies.
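As a rough picture of the edge cases named above (missing snapshots and circular references), the sketch below guards an ancestor walk against both. It assumes a toy parent mapping rather than real table metadata, and `safe_ancestors` is a hypothetical helper, not a function from this PR.

```python
from typing import Dict, List, Optional, Set


def safe_ancestors(parents: Dict[int, Optional[int]], start: int) -> List[int]:
    """Walk parent pointers from start, failing loudly on bad history."""
    seen: Set[int] = set()
    chain: List[int] = []
    current: Optional[int] = start
    while current is not None:
        if current not in parents:
            # The history references a snapshot that is not recorded.
            raise KeyError(f"snapshot {current} is missing from the history")
        if current in seen:
            # Revisiting a snapshot means the parent pointers form a loop.
            raise ValueError(f"circular parent reference at snapshot {current}")
        seen.add(current)
        chain.append(current)
        current = parents[current]
    return chain


print(safe_ancestors({1: None, 2: 1, 3: 2}, 3))
```

A test for the circular case would pass a mapping such as `{1: 2, 2: 1}` and expect the `ValueError`; the missing-snapshot case would pass `{2: 1}` and expect the `KeyError`.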
@ForeverAngry
Contributor Author

@jayceslesar I know you have done a good bit of work in the ManageSnapshots class. Can you review this as well?

@ForeverAngry ForeverAngry marked this pull request as ready for review September 7, 2025 00:33
@ForeverAngry ForeverAngry changed the title Add comprehensive tests for branch merge strategies in pyiceberg Add branch merge strategies Sep 7, 2025
@ForeverAngry
Contributor Author

ForeverAngry commented Sep 19, 2025

@gabeiglio I noticed you had created a pretty great proposal for this work. It would be awesome if you wanted to review the implementation and help get it in good shape!

@gabeiglio
Contributor

Thanks for the PR! This is awesome. I will review it this week!

@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Mar 18, 2026
@github-actions

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Mar 26, 2026