Implement ColumnValue and RowData types for PostgreSQL logical replication#34

Merged
isdaniel merged 4 commits into main from
feat/remove-serde_json-use-encode-decode-perf-enhancement
Feb 25, 2026

Conversation

@isdaniel
Owner

@isdaniel isdaniel commented Feb 25, 2026

  • Introduced ColumnValue enum to represent PostgreSQL column data as either Null, Text, or Binary.
  • Added methods for encoding and decoding ColumnValue to/from a binary wire format.
  • Implemented RowData struct to hold ordered pairs of column names and values, supporting efficient serialization and deserialization.
  • Included hex encoding and decoding utilities for binary data representation.
  • Added comprehensive tests for both ColumnValue and RowData, covering serialization, deserialization, and various edge cases.
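
The shape of these types and a length-prefixed encoding can be sketched roughly as follows. This is a minimal illustration only, not the merged implementation: the tag values, the 4-byte big-endian length prefixes, the `String` error type, and the helper names `write_len`, `read_bytes`, and `read_len` are all assumptions, and `String`/`Vec<u8>` stand in for whatever buffer types the crate actually uses.

```rust
/// A column value: SQL NULL, text, or raw bytes (std-only stand-in types).
#[derive(Debug, Clone, PartialEq)]
pub enum ColumnValue {
    Null,
    Text(String),
    Binary(Vec<u8>),
}

/// Ordered (column name, value) pairs; column order is preserved.
#[derive(Debug, Clone, PartialEq)]
pub struct RowData {
    pub columns: Vec<(String, ColumnValue)>,
}

// Hypothetical tag bytes; the real wire format may use different values.
const TAG_NULL: u8 = 0;
const TAG_TEXT: u8 = 1;
const TAG_BINARY: u8 = 2;

/// Writes `len` as a 4-byte big-endian prefix, rejecting values that do not fit.
fn write_len(out: &mut Vec<u8>, len: usize) -> Result<(), String> {
    if len > u32::MAX as usize {
        return Err(format!("length {} does not fit in u32", len));
    }
    out.extend_from_slice(&(len as u32).to_be_bytes());
    Ok(())
}

/// Returns the next `len` bytes and advances `pos`, or fails on truncated input.
fn read_bytes<'a>(buf: &'a [u8], pos: &mut usize, len: usize) -> Result<&'a [u8], String> {
    let end = (*pos).checked_add(len).ok_or("length overflow")?;
    let slice = buf.get(*pos..end).ok_or("truncated input")?;
    *pos = end;
    Ok(slice)
}

fn read_len(buf: &[u8], pos: &mut usize) -> Result<usize, String> {
    let b = read_bytes(buf, pos, 4)?;
    Ok(u32::from_be_bytes([b[0], b[1], b[2], b[3]]) as usize)
}

impl ColumnValue {
    /// Appends a tag byte, then (for non-NULL values) a length prefix and payload.
    pub fn encode(&self, out: &mut Vec<u8>) -> Result<(), String> {
        let (tag, payload) = match self {
            ColumnValue::Null => {
                out.push(TAG_NULL);
                return Ok(());
            }
            ColumnValue::Text(s) => (TAG_TEXT, s.as_bytes()),
            ColumnValue::Binary(b) => (TAG_BINARY, b.as_slice()),
        };
        out.push(tag);
        write_len(out, payload.len())?;
        out.extend_from_slice(payload);
        Ok(())
    }

    /// Decodes one value starting at `pos` and advances `pos` past it.
    pub fn decode(buf: &[u8], pos: &mut usize) -> Result<ColumnValue, String> {
        let tag = read_bytes(buf, pos, 1)?[0];
        if tag == TAG_NULL {
            return Ok(ColumnValue::Null);
        }
        let len = read_len(buf, pos)?;
        let payload = read_bytes(buf, pos, len)?.to_vec();
        match tag {
            TAG_TEXT => String::from_utf8(payload).map(ColumnValue::Text).map_err(|e| e.to_string()),
            TAG_BINARY => Ok(ColumnValue::Binary(payload)),
            other => Err(format!("unknown tag {}", other)),
        }
    }
}

impl RowData {
    /// Encodes the column count, then each name and value in order.
    pub fn encode(&self) -> Result<Vec<u8>, String> {
        let mut out = Vec::new();
        write_len(&mut out, self.columns.len())?;
        for (name, value) in &self.columns {
            write_len(&mut out, name.len())?;
            out.extend_from_slice(name.as_bytes());
            value.encode(&mut out)?;
        }
        Ok(out)
    }

    pub fn decode(buf: &[u8]) -> Result<RowData, String> {
        let mut pos = 0;
        let count = read_len(buf, &mut pos)?;
        let mut columns = Vec::new();
        for _ in 0..count {
            let name_len = read_len(buf, &mut pos)?;
            let name_bytes = read_bytes(buf, &mut pos, name_len)?.to_vec();
            let name = String::from_utf8(name_bytes).map_err(|e| e.to_string())?;
            columns.push((name, ColumnValue::decode(buf, &mut pos)?));
        }
        Ok(RowData { columns })
    }
}
```

In this sketch a row round-trips through `RowData::encode` and `RowData::decode`, and decoding a buffer that has been cut short returns an error rather than panicking.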

Benchmark results:

| Suite | N | JSON median | Binary median | Speedup (JSON / Binary) | Binary vs JSON Δ% |
|---|---|---|---|---|---|
| serialize | 5 | 306.15 ns | 91.522 ns | 3.35× | -70.1% |
| serialize | 10 | 471.63 ns | 137.600 ns | 3.43× | -70.8% |
| serialize | 20 | 740.33 ns | 310.540 ns | 2.38× | -58.1% |
| serialize | 50 | 1.4900 µs | 676.380 ns | 2.20× | -54.6% |
| deserialize | 5 | 1.2131 µs | 725.190 ns | 1.67× | -40.2% |
| deserialize | 10 | 1.9736 µs | 1.1972 µs | 1.65× | -39.3% |
| deserialize | 20 | 3.2340 µs | 2.2980 µs | 1.41× | -28.9% |
| deserialize | 50 | 8.7417 µs | 5.8856 µs | 1.49× | -32.7% |
| round_trip | 5 | 1.5761 µs | 797.170 ns | 1.98× | -49.4% |
| round_trip | 10 | 2.4647 µs | 1.3241 µs | 1.86× | -46.3% |
| round_trip | 20 | 4.1536 µs | 2.6145 µs | 1.59× | -37.1% |
| round_trip | 50 | 10.2030 µs | 6.5841 µs | 1.55× | -35.5% |
| payload_size | 5 | 335.66 ns | 73.999 ns | 4.54× | -78.0% |
| payload_size | 10 | 494.18 ns | 131.490 ns | 3.76× | -73.4% |
| payload_size | 20 | 747.12 ns | 278.120 ns | 2.69× | -62.8% |
| payload_size | 50 | 1.4915 µs | 671.910 ns | 2.22× | -55.0% |
| pipeline | 5 | 1.0991 µs | 855.700 ns | 1.28× | -22.1% |
| pipeline | 10 | 1.9750 µs | 1.5645 µs | 1.26× | -20.8% |
| pipeline | 20 | 4.1716 µs | 3.5611 µs | 1.17× | -14.6% |
| pipeline | 50 | 9.9356 µs | 9.0535 µs | 1.10× | -8.9% |
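
For readers checking the last two columns, they appear to follow directly from the two medians. A small sketch of that arithmetic is below; the function names are made up for illustration and are not part of the benchmark code.

```rust
/// Speedup as reported in the table: JSON median divided by binary median.
pub fn speedup(json_ns: f64, binary_ns: f64) -> f64 {
    json_ns / binary_ns
}

/// Relative change of the binary median versus the JSON median, in percent.
/// Negative values mean the binary path took less time.
pub fn delta_percent(json_ns: f64, binary_ns: f64) -> f64 {
    (binary_ns - json_ns) / json_ns * 100.0
}
```

For the first row (serialize, N = 5), `speedup(306.15, 91.522)` is about 3.35 and `delta_percent(306.15, 91.522)` is about -70.1, matching the table.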

for #32

@gemini-code-assist

Summary of Changes

Hello @isdaniel, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors how PostgreSQL logical replication data is handled internally, moving from a serde_json::Value-based representation to custom, optimized ColumnValue and RowData types. This change introduces a compact binary wire format, resulting in substantial performance improvements and reduced data transfer overhead, making the replication process more efficient. The update also includes new integration tests to ensure robust handling of various complex PostgreSQL data types.

Highlights

  • New Data Types for Column Values: Introduced ColumnValue enum to represent PostgreSQL column data as Null, Text, or Binary, and RowData struct to hold ordered column name-value pairs, replacing previous serde_json::Value usage for improved efficiency.
  • Efficient Binary Wire Format: Implemented a compact binary wire format for ColumnValue and RowData. In the benchmarks above it is faster than JSON in every suite, by up to 4.54× (78.0% less time, in the payload_size suite at N = 5). The benchmark table reports timings only, not payload byte sizes.
  • Performance Benchmarks: Added a new benchmark suite (columnvalue_vs_json) to compare the performance of the new binary serialization against JSON serialization, demonstrating substantial gains across various operations like construction, serialization, deserialization, and full pipeline scenarios.
  • Enhanced Complex Type Handling: Included comprehensive integration tests (complex_types.rs) to verify correct streaming and representation of complex PostgreSQL data types such as arrays, JSON/JSONB, and geometric types through logical replication.
  • Dependency Streamlining: Removed serde_json as a core dependency for internal data representation, moving it to dev-dependencies where it's still needed for benchmarks, and updated chrono to a newer version.

Changelog
  • Cargo.lock
    • Updated chrono dependency to 0.4.44.
    • Removed itoa, serde_json, and zmij dependencies.
  • Cargo.toml
    • Updated chrono dependency to 0.4.44.
    • Removed serde_json from main dependencies.
    • Updated tokio in dev-dependencies to 1.49.0.
    • Added serde_json to dev-dependencies.
    • Added new complex_types test target.
    • Renamed and updated benchmark target from rowdata_vs_hashmap to columnvalue_vs_json.
  • benches/columnvalue_vs_json.rs
    • Added new benchmark file comparing JSON and binary serialization performance for ChangeEvent.
  • benches/rowdata_vs_hashmap.rs
    • Removed old benchmark file.
  • examples/basic-streaming/Cargo.lock
    • Updated chrono dependency to 0.4.44.
    • Removed itoa, serde_json, and zmij dependencies.
  • integration-tests/complex_types.rs
    • Added new integration test file to verify streaming of complex PostgreSQL data types (arrays, JSON/JSONB, geometric types).
  • src/column_value.rs
    • Added new module defining ColumnValue enum and RowData struct.
    • Implemented binary encoding and decoding for ColumnValue and RowData.
    • Included hex encoding and decoding utilities.
  • src/lib.rs
    • Exported the new column_value module.
    • Re-exported ColumnValue and RowData from column_value.
    • Removed format_postgres_timestamp utility function.
    • Removed RowData re-export from types module.
  • src/protocol.rs
    • Updated TupleData::to_row_data to use the new ColumnValue and RowData types for column data conversion.
    • Removed the deprecated TupleData::to_hash_map method.
    • Added ColumnData::raw_bytes method for zero-copy access to raw bytes.
    • Updated internal tests to reflect the new ColumnValue usage.
  • src/stream.rs
    • Refactored tuple_to_data function to convert PostgreSQL column data into the new ColumnValue and RowData types.
    • Removed the local hex_encode function, now provided by column_value module.
    • Updated tests to use ColumnValue and RowData directly.
  • src/types.rs
    • Removed the RowData struct definition, now re-exported from column_value module.
    • Removed format_postgres_timestamp function.
    • Updated ChangeEvent's metadata field to use HashMap<String, String> instead of HashMap<String, serde_json::Value>.
    • Implemented binary encode and decode methods for ChangeEvent for efficient wire format.
    • Updated internal tests to align with the new data type structure and binary serialization.
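
The hex utilities listed under src/column_value.rs are not shown in this conversation. A plausible std-only sketch of such a pair is below; the signatures, the lowercase output, and the `String` error type are assumptions rather than the crate's actual API.

```rust
const HEX_DIGITS: &[u8; 16] = b"0123456789abcdef";

/// Renders each byte as two lowercase hex digits.
pub fn hex_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push(HEX_DIGITS[(b >> 4) as usize] as char);
        out.push(HEX_DIGITS[(b & 0x0f) as usize] as char);
    }
    out
}

/// Converts one ASCII hex digit (either case) to its 4-bit value.
fn nibble(c: u8) -> Result<u8, String> {
    match c {
        b'0'..=b'9' => Ok(c - b'0'),
        b'a'..=b'f' => Ok(c - b'a' + 10),
        b'A'..=b'F' => Ok(c - b'A' + 10),
        _ => Err(format!("invalid hex digit: {:?}", c as char)),
    }
}

/// Parses a hex string back into bytes, rejecting odd lengths and non-hex input.
pub fn hex_decode(s: &str) -> Result<Vec<u8>, String> {
    let digits = s.as_bytes();
    if digits.len() % 2 != 0 {
        return Err("hex string has odd length".to_string());
    }
    let mut out = Vec::with_capacity(digits.len() / 2);
    for pair in digits.chunks(2) {
        out.push((nibble(pair[0])? << 4) | nibble(pair[1])?);
    }
    Ok(out)
}
```

Working on bytes rather than `char`s keeps the decoder from slicing a multi-byte UTF-8 character in half; any non-ASCII byte is simply rejected as an invalid digit.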
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/ci.yml

@codecov
codecov bot commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 99.45318% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.26%. Comparing base (352a32f) to head (0e52290).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/types.rs | 99.51% | 4 Missing ⚠️ |
| src/column_value.rs | 99.47% | 3 Missing ⚠️ |
| src/stream.rs | 97.05% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #34      +/-   ##
==========================================
+ Coverage   91.19%   92.26%   +1.06%     
==========================================
  Files           8        9       +1     
  Lines        8210     9461    +1251     
==========================================
+ Hits         7487     8729    +1242     
- Misses        723      732       +9     

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant performance improvement by replacing serde_json::Value with custom ColumnValue and RowData types, and adding a binary serialization format for change events. However, the current implementation of the binary encoder is vulnerable to protocol desynchronization due to unsafe integer truncation when serializing string lengths and collection counts. If any length or count exceeds the 16-bit limit (65,535), the resulting binary stream will be corrupted, which can lead to data integrity issues or denial of service in downstream consumers. Additionally, there is one critical issue that would prevent compilation on stable Rust, and a couple of suggestions to improve error handling and code maintainability.
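
The truncation hazard described in this review can be illustrated with a short sketch. It assumes a 16-bit big-endian length prefix as the review implies; the function names and the `String` error type are hypothetical and not taken from the PR.

```rust
use std::convert::TryFrom;

/// Writes a u16 length prefix followed by the string bytes.
/// Oversized input is rejected before anything is written to `out`.
pub fn write_str_u16(out: &mut Vec<u8>, s: &str) -> Result<(), String> {
    let len = u16::try_from(s.len()).map_err(|_| {
        format!("string of {} bytes exceeds the u16 limit of {}", s.len(), u16::MAX)
    })?;
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(s.as_bytes());
    Ok(())
}

/// The failure mode: an `as` cast silently keeps only the low 16 bits,
/// so the prefix no longer matches the number of payload bytes that follow.
pub fn truncated_len(len: usize) -> u16 {
    len as u16
}
```

With the cast, a 65,536-byte string would be written with a length prefix of 0, and a decoder would then interpret the payload bytes as the next fields. The checked conversion turns that into an error the caller can handle; widening the prefix to 32 bits is another option.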

@isdaniel
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant performance improvement by replacing serde_json::Value with custom ColumnValue and RowData types, which use zero-copy Bytes and support a compact binary wire format. The changes are well-implemented, with thorough benchmarks demonstrating the speedup and extensive integration tests covering complex data types. The refactoring to move RowData and related helpers into a new column_value.rs module is clean. I've identified a critical issue with the serde implementation that could lead to data corruption, along with a couple of suggestions to improve robustness and maintainability.

@isdaniel
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is an excellent pull request that significantly refactors the data representation for logical replication events, leading to major performance improvements as shown by the benchmarks. The introduction of ColumnValue and RowData to replace serde_json::Value and HashMap is a well-executed change that reduces allocations and avoids unnecessary serialization/deserialization. The new binary wire format for ChangeEvent is a great addition for efficient inter-process communication. The code is clean, well-documented, and the new integration tests for complex types are very thorough.

I have a couple of suggestions for improvement, one regarding a breaking API change and another about error handling.

@isdaniel
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces ColumnValue and RowData types, replacing serde_json::Value to significantly improve performance by enabling zero-copy data handling. A custom binary serialization format for ChangeEvent is also added, which is more efficient than JSON, as demonstrated by the comprehensive new benchmarks. The changes are well-supported by extensive integration and unit tests, especially for complex data types and the new serialization logic.

I have one suggestion to refactor a small piece of duplicated code to improve maintainability. It's also worth noting that changing ChangeEvent.metadata to use String values is a breaking change, but a sensible one that helps remove serde_json as a public dependency.
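
For callers affected by the metadata change, one way to keep typed values in a `HashMap<String, String>` is to render them as text on write and parse them on read. The helpers below are a hypothetical sketch, not functions from this crate, and the key names in the usage note are invented.

```rust
use std::collections::HashMap;

/// Stores a typed value under `key` by rendering it as text.
pub fn put<T: ToString>(metadata: &mut HashMap<String, String>, key: &str, value: T) {
    metadata.insert(key.to_string(), value.to_string());
}

/// Reads `key` back and parses it into the requested type.
/// Returns `None` if the key is absent or the text does not parse.
pub fn get_parsed<T: std::str::FromStr>(
    metadata: &HashMap<String, String>,
    key: &str,
) -> Option<T> {
    metadata.get(key)?.parse().ok()
}
```

For example, after `put(&mut m, "retry_count", 3u32)`, the stored value is the string `"3"` and `get_parsed::<u32>(&m, "retry_count")` yields `Some(3)`.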

Overall, this is an excellent contribution that enhances both performance and type safety.

@isdaniel isdaniel merged commit 6244fe1 into main Feb 25, 2026
10 checks passed