Implement ColumnValue and RowData types for PostgreSQL logical replication #33

Closed
isdaniel wants to merge 2 commits into main from feat/remove-serde_json-use-encode-decode-perf-enhancement

@isdaniel (Owner) commented Feb 25, 2026

  • Introduced ColumnValue enum to represent PostgreSQL column data as either Null, Text, or Binary.
  • Added methods for encoding and decoding ColumnValue to/from a binary wire format.
  • Implemented RowData struct to hold ordered pairs of column names and values, supporting efficient serialization and deserialization.
  • Included hex encoding and decoding utilities for binary data representation.
  • Added comprehensive tests for both ColumnValue and RowData, covering serialization, deserialization, and various edge cases.
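
To make the encode/decode idea concrete, here is a minimal sketch of a three-variant value type with a length-prefixed binary encoding. It is not the PR's implementation: the tag bytes, the big-endian `u32` length prefix, and the `write_payload` helper are assumptions made for illustration, and owned `String`/`Vec<u8>` payloads stand in for the `bytes::Bytes` buffers used in the PR so that the sketch has no dependencies.

```rust
/// Simplified stand-in for the PR's `ColumnValue`, using owned std types.
#[derive(Debug, Clone, PartialEq)]
pub enum ColumnValue {
    Null,
    Text(String),
    Binary(Vec<u8>),
}

// Hypothetical tag bytes; the PR's actual wire layout is not shown on this page.
const TAG_NULL: u8 = 0;
const TAG_TEXT: u8 = 1;
const TAG_BINARY: u8 = 2;

impl ColumnValue {
    /// Appends `tag | u32 big-endian length | payload` to `buf`.
    /// `Null` is encoded as the tag byte alone.
    pub fn encode(&self, buf: &mut Vec<u8>) {
        match self {
            ColumnValue::Null => buf.push(TAG_NULL),
            ColumnValue::Text(s) => write_payload(buf, TAG_TEXT, s.as_bytes()),
            ColumnValue::Binary(b) => write_payload(buf, TAG_BINARY, b),
        }
    }

    /// Decodes one value from the front of `input` and returns it together
    /// with the number of bytes consumed. Returns `None` on truncated input,
    /// an unknown tag, or text that is not valid UTF-8.
    pub fn decode(input: &[u8]) -> Option<(ColumnValue, usize)> {
        let (&tag, rest) = input.split_first()?;
        if tag == TAG_NULL {
            return Some((ColumnValue::Null, 1));
        }
        let len_bytes = rest.get(..4)?;
        let len = u32::from_be_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
        let end = 4usize.checked_add(len)?;
        let payload = rest.get(4..end)?;
        let value = match tag {
            TAG_TEXT => ColumnValue::Text(String::from_utf8(payload.to_vec()).ok()?),
            TAG_BINARY => ColumnValue::Binary(payload.to_vec()),
            _ => return None,
        };
        Some((value, 1 + end))
    }
}

/// Hypothetical helper: writes the tag, the payload length, and the payload.
/// Assumes payloads shorter than 4 GiB so the length fits in a `u32`.
fn write_payload(buf: &mut Vec<u8>, tag: u8, payload: &[u8]) {
    buf.push(tag);
    buf.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    buf.extend_from_slice(payload);
}
```

Under these assumptions, `Text("hi")` occupies seven bytes (one tag byte, four length bytes, two payload bytes), and `decode` reports how many bytes it consumed so that several values can be read back to back from one buffer.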

Benchmark results:

| Suite | N | JSON median | Binary median | Speedup (JSON / Binary) | Binary vs JSON Δ% |
|---|---|---|---|---|---|
| serialize | 5 | 306.15 ns | 91.522 ns | 3.35× | -70.1% |
| serialize | 10 | 471.63 ns | 137.600 ns | 3.43× | -70.8% |
| serialize | 20 | 740.33 ns | 310.540 ns | 2.38× | -58.1% |
| serialize | 50 | 1.4900 µs | 676.380 ns | 2.20× | -54.6% |
| deserialize | 5 | 1.2131 µs | 725.190 ns | 1.67× | -40.2% |
| deserialize | 10 | 1.9736 µs | 1.1972 µs | 1.65× | -39.3% |
| deserialize | 20 | 3.2340 µs | 2.2980 µs | 1.41× | -28.9% |
| deserialize | 50 | 8.7417 µs | 5.8856 µs | 1.49× | -32.7% |
| round_trip | 5 | 1.5761 µs | 797.170 ns | 1.98× | -49.4% |
| round_trip | 10 | 2.4647 µs | 1.3241 µs | 1.86× | -46.3% |
| round_trip | 20 | 4.1536 µs | 2.6145 µs | 1.59× | -37.1% |
| round_trip | 50 | 10.2030 µs | 6.5841 µs | 1.55× | -35.5% |
| payload_size | 5 | 335.66 ns | 73.999 ns | 4.54× | -78.0% |
| payload_size | 10 | 494.18 ns | 131.490 ns | 3.76× | -73.4% |
| payload_size | 20 | 747.12 ns | 278.120 ns | 2.69× | -62.8% |
| payload_size | 50 | 1.4915 µs | 671.910 ns | 2.22× | -55.0% |
| pipeline | 5 | 1.0991 µs | 855.700 ns | 1.28× | -22.1% |
| pipeline | 10 | 1.9750 µs | 1.5645 µs | 1.26× | -20.8% |
| pipeline | 20 | 4.1716 µs | 3.5611 µs | 1.17× | -14.6% |
| pipeline | 50 | 9.9356 µs | 9.0535 µs | 1.10× | -8.9% |
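
The two derived columns follow directly from the medians. The helpers below are a small illustration of that arithmetic, not code from the PR; the function names are hypothetical, and both medians must be converted to the same unit (nanoseconds here) before calling them.

```rust
/// Speedup as reported in the table: JSON median divided by binary median.
pub fn speedup(json_ns: f64, binary_ns: f64) -> f64 {
    json_ns / binary_ns
}

/// Relative change of the binary median versus the JSON median, in percent.
/// A negative result means the binary path is faster.
pub fn delta_percent(json_ns: f64, binary_ns: f64) -> f64 {
    (binary_ns - json_ns) / json_ns * 100.0
}
```

For the first `serialize` row, `speedup(306.15, 91.522)` is about 3.35 and `delta_percent(306.15, 91.522)` is about -70.1, which matches the table.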

@gemini-code-assist

Summary of Changes

Hello @isdaniel, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces fundamental improvements to how PostgreSQL logical replication data is represented and processed. By implementing ColumnValue and RowData types, the system now leverages zero-copy data handling and a compact binary wire format, moving away from less efficient JSON-based representations. This refactoring aims to boost performance, reduce memory allocations, and provide more robust handling of diverse PostgreSQL data types, ensuring a more efficient and reliable data streaming experience.

Highlights

  • New Data Types for Column Values: Introduced ColumnValue enum to represent PostgreSQL column data (Null, Text, Binary) with zero-copy semantics using bytes::Bytes.
  • Optimized Row Data Structure: Implemented RowData struct for ordered column name/value pairs, optimized for efficient serialization and deserialization, replacing previous HashMap<String, serde_json::Value> usage.
  • Binary Wire Format: Added a compact binary wire format for encoding and decoding both ColumnValue and RowData, significantly improving performance and reducing payload sizes compared to JSON.
  • ChangeEvent Refactoring: Refactored ChangeEvent to utilize the new ColumnValue and RowData types for its data payloads and implemented binary encoding/decoding for the entire event structure.
  • Enhanced Complex Type Support: Added new integration tests to ensure robust and correct streaming and representation of various complex PostgreSQL data types, including arrays, JSON/JSONB, and geometric types.
  • Performance Benchmarks: Replaced the rowdata_vs_hashmap benchmark with a new columnvalue_vs_json benchmark to validate the performance gains of the new binary serialization over JSON serialization.
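
As a rough illustration of the "ordered column name/value pairs" idea, the sketch below stores a row as a `Vec` of pairs and round-trips it through a compact binary layout. Everything about the layout is an assumption (a `u16` column count, `u16` name lengths, one tag byte per value, `u32` payload lengths), as are the `Reader` helper and the method names; the PR's real `RowData` uses `bytes::Bytes` and may differ in every detail.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ColumnValue {
    Null,
    Text(String),
    Binary(Vec<u8>),
}

/// Ordered (name, value) pairs; unlike a HashMap, column order is preserved.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct RowData {
    columns: Vec<(String, ColumnValue)>,
}

/// Hypothetical cursor over a byte slice used only by this sketch.
struct Reader<'a> {
    buf: &'a [u8],
}

impl<'a> Reader<'a> {
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let buf: &'a [u8] = self.buf;
        if buf.len() < n {
            return None;
        }
        let (head, tail) = buf.split_at(n);
        self.buf = tail;
        Some(head)
    }
    fn u8(&mut self) -> Option<u8> {
        Some(self.take(1)?[0])
    }
    fn u16(&mut self) -> Option<u16> {
        let b = self.take(2)?;
        Some(u16::from_be_bytes([b[0], b[1]]))
    }
    fn u32(&mut self) -> Option<u32> {
        let b = self.take(4)?;
        Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
    }
}

fn put_bytes(out: &mut Vec<u8>, payload: &[u8]) {
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
}

impl RowData {
    pub fn new() -> Self {
        RowData { columns: Vec::new() }
    }

    pub fn push(&mut self, name: &str, value: ColumnValue) {
        self.columns.push((name.to_string(), value));
    }

    /// Linear scan by name; cheap for the small column counts typical of a row.
    pub fn get(&self, name: &str) -> Option<&ColumnValue> {
        self.columns.iter().find(|(n, _)| n == name).map(|(_, v)| v)
    }

    /// Column names in insertion order.
    pub fn names(&self) -> Vec<&str> {
        self.columns.iter().map(|(n, _)| n.as_str()).collect()
    }

    /// Assumes fewer than 65,536 columns and names shorter than 64 KiB.
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::new();
        out.extend_from_slice(&(self.columns.len() as u16).to_be_bytes());
        for (name, value) in &self.columns {
            out.extend_from_slice(&(name.len() as u16).to_be_bytes());
            out.extend_from_slice(name.as_bytes());
            match value {
                ColumnValue::Null => out.push(0),
                ColumnValue::Text(s) => {
                    out.push(1);
                    put_bytes(&mut out, s.as_bytes());
                }
                ColumnValue::Binary(b) => {
                    out.push(2);
                    put_bytes(&mut out, b);
                }
            }
        }
        out
    }

    /// Returns `None` on truncated input, an unknown tag, or invalid UTF-8.
    pub fn decode(input: &[u8]) -> Option<RowData> {
        let mut r = Reader { buf: input };
        let count = r.u16()? as usize;
        let mut columns = Vec::with_capacity(count);
        for _ in 0..count {
            let name_len = r.u16()? as usize;
            let name = String::from_utf8(r.take(name_len)?.to_vec()).ok()?;
            let value = match r.u8()? {
                0 => ColumnValue::Null,
                1 => {
                    let n = r.u32()? as usize;
                    ColumnValue::Text(String::from_utf8(r.take(n)?.to_vec()).ok()?)
                }
                2 => {
                    let n = r.u32()? as usize;
                    ColumnValue::Binary(r.take(n)?.to_vec())
                }
                _ => return None,
            };
            columns.push((name, value));
        }
        Some(RowData { columns })
    }
}
```

A `Vec` of pairs keeps the column order, and a linear `get` is usually cheap for the handful of columns in a typical row. Whether the PR's `RowData` looks up columns this way is not stated on this page.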

Changelog
  • Cargo.lock
    • Updated chrono dependency from 0.4.43 to 0.4.44.
    • Removed itoa, serde_json, and zmij packages, reflecting changes in dependency tree.
  • Cargo.toml
    • Updated chrono dependency to 0.4.44.
    • Removed serde_json from main dependencies and added it to dev-dependencies.
    • Updated tokio dev-dependency version from 1.47.2 to 1.49.0.
    • Added a new [[test]] entry for complex_types integration tests.
    • Renamed the [[bench]] entry from rowdata_vs_hashmap to columnvalue_vs_json.
  • benches/columnvalue_vs_json.rs
    • Added a new benchmark file to compare JSON serialization (serde_json) with the new binary serialization (ColumnValue encode/decode) across various operations like construction, serialization, deserialization, and full round-trips.
  • benches/rowdata_vs_hashmap.rs
    • Removed the old benchmark file comparing RowData with HashMap.
  • examples/basic-streaming/Cargo.lock
    • Updated chrono dependency from 0.4.43 to 0.4.44.
    • Removed itoa, serde_json, and zmij packages.
  • integration-tests/complex_types.rs
    • Added a new integration test file to verify the correct streaming and representation of complex PostgreSQL data types, including arrays, JSON/JSONB, and geometric types, covering inserts, updates, and deletes.
  • src/column_value.rs
    • Added a new module defining the ColumnValue enum (Null, Text, Binary) and RowData struct for efficient column data representation.
    • Implemented binary wire format encoding and decoding for both ColumnValue and RowData.
    • Provided serde::Serialize and serde::Deserialize implementations for ColumnValue and RowData for JSON compatibility.
    • Included helper functions hex_encode and hex_decode.
  • src/lib.rs
    • Declared column_value as a public module.
    • Re-exported ColumnValue and RowData from the new column_value module.
    • Removed the re-export of RowData from types and format_postgres_timestamp.
  • src/protocol.rs
    • Updated TupleData::to_row_data to construct RowData using ColumnValue variants directly, replacing serde_json::Value.
    • Removed the deprecated TupleData::to_hash_map method.
    • Added a raw_bytes() method to ColumnData for zero-copy access to underlying bytes.
    • Updated internal tests to use ColumnValue instead of serde_json::Value for assertions.
  • src/stream.rs
    • Updated the tuple_to_data function to directly map ColumnData to ColumnValue variants (Null, Text, Binary) using zero-copy Bytes.
    • Removed the local hex_encode function, as it is now centralized in src/column_value.rs.
    • Updated internal tests to use ColumnValue for RowData construction and assertions.
  • src/types.rs
    • Removed the local definition of RowData struct and its associated implementations, as it has been moved to src/column_value.rs.
    • Removed the format_postgres_timestamp function.
    • Changed ChangeEvent::metadata from Option<HashMap<String, serde_json::Value>> to Option<HashMap<String, String>>.
    • Implemented encode and decode methods for ChangeEvent to support a compact binary wire format.
    • Added helper functions encode_arc_str, decode_arc_str, decode_string, and micros_to_chrono for binary serialization/deserialization.
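
The `hex_encode` and `hex_decode` helpers listed under src/column_value.rs are not shown on this page. A dependency-free version of such helpers might look like the sketch below; the signatures, the lowercase output, and the `Option` return type are assumptions rather than the PR's actual API.

```rust
/// Encodes bytes as a lowercase hexadecimal string, two digits per byte.
pub fn hex_encode(bytes: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push(DIGITS[(b >> 4) as usize] as char);
        out.push(DIGITS[(b & 0x0f) as usize] as char);
    }
    out
}

/// Decodes a hexadecimal string (upper or lower case) back into bytes.
/// Returns `None` for odd-length input or any non-hex character.
pub fn hex_decode(s: &str) -> Option<Vec<u8>> {
    let raw = s.as_bytes();
    if raw.len() % 2 != 0 {
        return None;
    }
    raw.chunks(2)
        .map(|pair| {
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}
```

With these definitions, `hex_encode(&[0xde, 0xad])` yields `"dead"`, and `hex_decode` accepts either case while rejecting malformed input instead of panicking.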
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/ci.yml
codecov bot commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 99.45318% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.26%. Comparing base (352a32f) to head (95045c6).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/types.rs | 99.51% | 4 Missing ⚠️ |
| src/column_value.rs | 99.47% | 3 Missing ⚠️ |
| src/stream.rs | 97.05% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #33      +/-   ##
==========================================
+ Coverage   91.19%   92.26%   +1.06%     
==========================================
  Files           8        9       +1     
  Lines        8210     9461    +1251     
==========================================
+ Hits         7487     8729    +1242     
- Misses        723      732       +9     

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a significant and well-executed improvement. It introduces ColumnValue and RowData types, moving away from serde_json::Value for representing column data. This greatly improves performance by enabling zero-copy operations and reducing dependencies. The new custom binary serialization format for ChangeEvent is a major enhancement for efficiency. The addition of comprehensive benchmarks and integration tests for complex data types is excellent and provides confidence in the changes.

I have a couple of suggestions for improvement detailed below.

@isdaniel (Owner, Author) commented:

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is an excellent pull request that significantly improves performance by introducing ColumnValue and RowData types and a custom binary serialization format for ChangeEvent. The move away from serde_json::Value in the hot path is a great design choice. The new types are well-implemented with zero-copy in mind.

The changes are thoroughly tested with a comprehensive new integration test suite for complex data types and a new benchmark that clearly demonstrates the performance gains. The code quality is high, and the refactoring is consistent across the codebase.

I've left one minor suggestion for a small performance optimization in the RowData deserialization logic. Overall, this is a very solid and well-executed improvement.

@isdaniel isdaniel closed this Feb 25, 2026
@isdaniel isdaniel deleted the feat/remove-serde_json-use-encode-decode-perf-enhancement branch February 25, 2026 06:54