Skip to content

[FEAT] Diff & Merge Table#188

Draft
Poseidon-fan wants to merge 8 commits intooceanbase:developfrom
Poseidon-fan:develop
Draft

[FEAT] Diff & Merge Table#188
Poseidon-fan wants to merge 8 commits intooceanbase:developfrom
Poseidon-fan:develop

Conversation

@Poseidon-fan
Copy link
Copy Markdown
Contributor

@Poseidon-fan Poseidon-fan commented Mar 20, 2026

Fixes #187

Summary by CodeRabbit

  • New Features

    • Compare collections with diff() to show row-level differences.
    • Merge collections with merge_into(), supporting conflict strategies: FAIL, THEIRS, OURS; invalid strategies are rejected.
  • Tests

    • Added integration tests covering diff (identical/added/modified/deleted/empty/read-only) and merge (FAIL/THEIRS/OURS, no-conflict, invalid-strategy) behaviors.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8d8a3ccf-b21d-4e2e-9a03-3448b262c3c4

📥 Commits

Reviewing files that changed from the base of the PR and between a07caed and 2e8402a.

📒 Files selected for processing (1)
  • src/pyseekdb/client/collection.py

📝 Walkthrough

Walkthrough

Adds server-gated DIFF and MERGE table operations: BaseClient internal support, Collection public APIs (.diff, .merge_into), integration tests for diff/merge behaviors, and a small test helper rename to probe fork-table capability.

Changes

Cohort / File(s) Summary
BaseClient Diff/Merge Support
src/pyseekdb/client/client_base.py
Added _VALID_MERGE_STRATEGIES: ClassVar[set[str]], _diff_merge_enabled() capability check, _collection_diff() to run DIFF TABLE ... AGAINST ..., and _collection_merge() to run MERGE TABLE ... INTO ... STRATEGY ... with ownership and strategy validation.
Collection Public API
src/pyseekdb/client/collection.py
Added diff(self, other) -> list[dict[str, Any]] and merge_into(self, target, strategy: str = "FAIL") -> None that delegate to BaseClient internal methods.
Integration Tests - Diff
tests/integration_tests/test_collection_diff.py
New integration test suite validating DIFF TABLE behavior (identical/added/modified/deleted/empty/read-only) and the public Collection.diff() path; tests skip when server feature is disabled.
Integration Tests - Merge
tests/integration_tests/test_collection_merge.py
New integration test suite validating MERGE TABLE behaviors for strategies FAIL/THEIRS/OURS, conflict handling/rollback, invalid-strategy error, and Collection.merge_into() usage; tests skip when server feature is disabled.
Integration Tests - Fork Refactor
tests/integration_tests/test_collection_fork.py
Renamed test helper _is_fork_enabled -> _is_fork_table_enabled and updated probes to call client._server._fork_table_enabled().

Sequence Diagram(s)

sequenceDiagram
    participant Client as Collection
    participant BaseClient
    participant Server

    Client->>BaseClient: diff(incoming, current)
    activate BaseClient
    BaseClient->>BaseClient: _diff_merge_enabled()
    Note over BaseClient: check db_type == "seekdb" and version >= 1.2.0.0
    BaseClient->>BaseClient: _get_collection_table_name(incoming)\n_get_collection_table_name(current)
    BaseClient->>Server: DIFF TABLE <incoming_table> AGAINST <current_table>
    Server-->>BaseClient: diff rows
    deactivate BaseClient
    BaseClient-->>Client: list of diff entries
Loading
sequenceDiagram
    participant Client as Collection
    participant BaseClient
    participant Server

    Client->>BaseClient: merge_into(incoming, target, strategy)
    activate BaseClient
    BaseClient->>BaseClient: _diff_merge_enabled()
    Note over BaseClient: check db_type == "seekdb" and version >= 1.2.0.0
    BaseClient->>BaseClient: validate strategy in _VALID_MERGE_STRATEGIES
    BaseClient->>BaseClient: _get_collection_table_name(incoming)\n_get_collection_table_name(target)
    BaseClient->>Server: MERGE TABLE <incoming_table> INTO <target_table> STRATEGY <STRATEGY>
    Server-->>BaseClient: execution result / error
    deactivate BaseClient
    BaseClient-->>Client: None (raises on error)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 I hopped through rows both new and old,
I sniffed the diffs, I marked the bold.
THEIRS, OURS, or FAIL — I chose with care,
I stitched the tables, patching each pair.
Hooray, a merge! 🥕✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main feature being added: Diff & Merge Table functionality for collections.
Linked Issues check ✅ Passed All coding requirements from issue #187 are met: DIFF TABLE and MERGE TABLE wrappers are implemented on Collection object, providing git-like workflow operations.
Out of Scope Changes check ✅ Passed All changes are directly scoped to implementing DIFF TABLE and MERGE TABLE wrappers; no unrelated modifications or out-of-scope features are present.
Docstring Coverage ✅ Passed Docstring coverage is 92.86% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/pyseekdb/client/client_base.py`:
- Around line 1577-1584: The diff/merge currently builds table names from
incoming/current but always runs SQL on self, allowing cross-client operations;
add a guard at the start of the diff/merge helpers (before calling
_get_collection_table_name or _execute) that verifies both incoming and current
belong to this client/context (e.g., compare their client attribute or
underlying connection/DB identifier to self) and raise a ValueError if they
differ; apply the same check to the analogous block around the 1600-1613 region
so operations cannot run against a different client/schema.
- Around line 1566-1589: _collection_diff returns the raw payload from _execute
which can be tuples on some backends, but Collection.diff is documented to
return list[dict[str, Any]]; update _collection_diff to normalize the rows into
dicts before returning (like other BaseClient paths do): after calling
self._execute(diff_sql) convert each row to a mapping using the collection
schema/column names from _get_collection_table_name/incoming/current (or
existing helper used elsewhere) so the method (_collection_diff) always returns
a list of dicts when _diff_merge_enabled() is true.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 124ce44a-b7b2-426b-9d05-f80c2d5b2a72

📥 Commits

Reviewing files that changed from the base of the PR and between 0ee6337 and c521d0d.

📒 Files selected for processing (5)
  • src/pyseekdb/client/client_base.py
  • src/pyseekdb/client/collection.py
  • tests/integration_tests/test_collection_diff.py
  • tests/integration_tests/test_collection_fork.py
  • tests/integration_tests/test_collection_merge.py

Copy link
Copy Markdown
Member

@hnwyllmm hnwyllmm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution. I left some comments.
Please add some samples for database fork/diff/merge and test the functions both in server and embedded mode.

with contextlib.suppress(Exception):
self._execute(client, f"DROP TABLE IF EXISTS `{name}`")

def test_diff_identical_tables(self, db_client):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please test with collections but not common tables?

with contextlib.suppress(Exception):
self._execute(client, f"DROP TABLE IF EXISTS `{name}`")

def test_merge_no_conflict(self, db_client):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please test the cases with collections but not common tables.

@hnwyllmm
Copy link
Copy Markdown
Member

The related functions are not available for collections, so let's hold this feature for a while.

@hnwyllmm hnwyllmm marked this pull request as draft March 23, 2026 02:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Diff & Merge Table

2 participants