Implement ColumnValue and RowData types for PostgreSQL logical replication by isdaniel · Pull Request #33 · isdaniel/pg-walstream

isdaniel · 2026-02-25T03:34:28Z

- Introduced `ColumnValue` enum to represent PostgreSQL column data as either `Null`, `Text`, or `Binary`.
- Added methods for encoding and decoding `ColumnValue` to/from a binary wire format.
- Implemented `RowData` struct to hold ordered pairs of column names and values, supporting efficient serialization and deserialization.
- Included hex encoding and decoding utilities for binary data representation.
- Added comprehensive tests for both `ColumnValue` and `RowData`, covering serialization, deserialization, and various edge cases.
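
For readers who have not opened the diff, here is a minimal sketch of what a tagged binary encoding for a three-variant column value can look like. The tag bytes, the 4-byte big-endian length prefix, and the owned `String`/`Vec<u8>` payloads are assumptions made for this example only; the PR's actual wire layout and storage types are not shown in this conversation.

```rust
/// Illustrative stand-in for the PR's `ColumnValue`.
/// Uses owned std types so the sketch needs no external crates.
#[derive(Debug, Clone, PartialEq)]
enum ColumnValue {
    Null,
    Text(String),
    Binary(Vec<u8>),
}

// Hypothetical tag bytes, chosen for this sketch only.
const TAG_NULL: u8 = 0;
const TAG_TEXT: u8 = 1;
const TAG_BINARY: u8 = 2;

impl ColumnValue {
    /// Append `tag [len: u32 big-endian] [payload]` to `out`.
    /// `Null` is encoded as the tag byte alone.
    fn encode(&self, out: &mut Vec<u8>) {
        match self {
            ColumnValue::Null => out.push(TAG_NULL),
            ColumnValue::Text(s) => {
                out.push(TAG_TEXT);
                out.extend_from_slice(&(s.len() as u32).to_be_bytes());
                out.extend_from_slice(s.as_bytes());
            }
            ColumnValue::Binary(b) => {
                out.push(TAG_BINARY);
                out.extend_from_slice(&(b.len() as u32).to_be_bytes());
                out.extend_from_slice(b);
            }
        }
    }

    /// Decode one value from the front of `buf`.
    /// Returns the value and the number of bytes consumed.
    fn decode(buf: &[u8]) -> Result<(ColumnValue, usize), String> {
        let (&tag, rest) = buf.split_first().ok_or("empty buffer")?;
        if tag == TAG_NULL {
            return Ok((ColumnValue::Null, 1));
        }
        if rest.len() < 4 {
            return Err("truncated length prefix".to_string());
        }
        let len = u32::from_be_bytes([rest[0], rest[1], rest[2], rest[3]]) as usize;
        let payload = rest.get(4..4 + len).ok_or("truncated payload")?;
        let value = match tag {
            TAG_TEXT => ColumnValue::Text(
                String::from_utf8(payload.to_vec()).map_err(|e| e.to_string())?,
            ),
            TAG_BINARY => ColumnValue::Binary(payload.to_vec()),
            other => return Err(format!("unknown tag {other}")),
        };
        Ok((value, 1 + 4 + len))
    }
}
```

Returning the consumed byte count from `decode` lets a caller walk a buffer that holds several values back to back, which is one plausible way a row of columns could be laid out.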

Benchmark results:

Suite	N	JSON median	Binary median	Speedup (JSON / Binary)	Binary vs JSON Δ%
serialize	5	306.15 ns	91.522 ns	3.35×	-70.1%
serialize	10	471.63 ns	137.600 ns	3.43×	-70.8%
serialize	20	740.33 ns	310.540 ns	2.38×	-58.1%
serialize	50	1.4900 µs	676.380 ns	2.20×	-54.6%
deserialize	5	1.2131 µs	725.190 ns	1.67×	-40.2%
deserialize	10	1.9736 µs	1.1972 µs	1.65×	-39.3%
deserialize	20	3.2340 µs	2.2980 µs	1.41×	-28.9%
deserialize	50	8.7417 µs	5.8856 µs	1.49×	-32.7%
round_trip	5	1.5761 µs	797.170 ns	1.98×	-49.4%
round_trip	10	2.4647 µs	1.3241 µs	1.86×	-46.3%
round_trip	20	4.1536 µs	2.6145 µs	1.59×	-37.1%
round_trip	50	10.2030 µs	6.5841 µs	1.55×	-35.5%
payload_size	5	335.66 ns	73.999 ns	4.54×	-78.0%
payload_size	10	494.18 ns	131.490 ns	3.76×	-73.4%
payload_size	20	747.12 ns	278.120 ns	2.69×	-62.8%
payload_size	50	1.4915 µs	671.910 ns	2.22×	-55.0%
pipeline	5	1.0991 µs	855.700 ns	1.28×	-22.1%
pipeline	10	1.9750 µs	1.5645 µs	1.26×	-20.8%
pipeline	20	4.1716 µs	3.5611 µs	1.17×	-14.6%
pipeline	50	9.9356 µs	9.0535 µs	1.10×	-8.9%
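
The last two columns of the table follow directly from the two medians. As a small sketch (the function names are made up for this note and are not part of the benchmark code), the arithmetic is:

```rust
/// Speedup factor: how many times faster the binary path is than JSON.
/// Both medians must be in the same unit (e.g. nanoseconds).
fn speedup(json_ns: f64, binary_ns: f64) -> f64 {
    json_ns / binary_ns
}

/// Relative change of the binary median versus the JSON median, in percent.
/// Negative values mean the binary path took less time.
fn delta_percent(json_ns: f64, binary_ns: f64) -> f64 {
    (binary_ns - json_ns) / json_ns * 100.0
}
```

For the `serialize`, N = 5 row, `speedup(306.15, 91.522)` is about 3.35 and `delta_percent(306.15, 91.522)` is about -70.1, matching the table. Rows reported in µs need converting to the same unit first, e.g. 9.9356 µs = 9935.6 ns.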


gemini-code-assist · 2026-02-25T03:34:58Z

Summary of Changes

Hello @isdaniel, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces fundamental improvements to how PostgreSQL logical replication data is represented and processed. By implementing ColumnValue and RowData types, the system now leverages zero-copy data handling and a compact binary wire format, moving away from less efficient JSON-based representations. This refactoring aims to boost performance, reduce memory allocations, and provide more robust handling of diverse PostgreSQL data types, ensuring a more efficient and reliable data streaming experience.

Highlights

- New Data Types for Column Values: Introduced `ColumnValue` enum to represent PostgreSQL column data (`Null`, `Text`, `Binary`) with zero-copy semantics using `bytes::Bytes`.
- Optimized Row Data Structure: Implemented `RowData` struct for ordered column name/value pairs, optimized for efficient serialization and deserialization, replacing previous `HashMap<String, serde_json::Value>` usage.
- Binary Wire Format: Added a compact binary wire format for encoding and decoding both `ColumnValue` and `RowData`, significantly improving performance and reducing payload sizes compared to JSON.
- ChangeEvent Refactoring: Refactored `ChangeEvent` to utilize the new `ColumnValue` and `RowData` types for its data payloads and implemented binary encoding/decoding for the entire event structure.
- Enhanced Complex Type Support: Added new integration tests to ensure robust and correct streaming and representation of various complex PostgreSQL data types, including arrays, JSON/JSONB, and geometric types.
- Performance Benchmarks: Replaced the `rowdata_vs_hashmap` benchmark with a new `columnvalue_vs_json` benchmark to validate the performance gains of the new binary serialization over JSON serialization.
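
To make the "ordered column name/value pairs" idea concrete, here is a minimal sketch of a row type backed by a `Vec` rather than a `HashMap`. The method names (`with_capacity`, `push`, `get`, `names`) and the owned `String`/`Vec<u8>` payloads are assumptions for illustration; the crate's real `RowData` API is not shown in this conversation.

```rust
/// Minimal stand-in for the PR's `ColumnValue` (owned std types, no external crates).
#[derive(Debug, Clone, PartialEq)]
enum ColumnValue {
    Null,
    Text(String),
    Binary(Vec<u8>),
}

/// Hypothetical `RowData`: columns are kept in insertion order.
#[derive(Debug, Default, PartialEq)]
struct RowData {
    columns: Vec<(String, ColumnValue)>,
}

impl RowData {
    /// Pre-size the backing vector when the column count is known up front.
    fn with_capacity(n: usize) -> Self {
        RowData { columns: Vec::with_capacity(n) }
    }

    /// Append a column; order of calls is the order of the row.
    fn push(&mut self, name: &str, value: ColumnValue) {
        self.columns.push((name.to_string(), value));
    }

    /// Look a column up by name with a linear scan (no hashing involved).
    fn get(&self, name: &str) -> Option<&ColumnValue> {
        self.columns
            .iter()
            .find(|(n, _)| n.as_str() == name)
            .map(|(_, v)| v)
    }

    /// Column names in the order they were pushed.
    fn names(&self) -> impl Iterator<Item = &str> {
        self.columns.iter().map(|(n, _)| n.as_str())
    }
}
```

A vector of pairs preserves the column order delivered by the replication stream and can be serialized by walking it once, whereas a `HashMap` iterates in an unspecified order. Whether a linear lookup is acceptable depends on how wide the rows are.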


Changelog

Cargo.lock
- Updated chrono dependency from 0.4.43 to 0.4.44.
- Removed itoa, serde_json, and zmij packages, reflecting changes in dependency tree.
Cargo.toml
- Updated chrono dependency to 0.4.44.
- Removed serde_json from main dependencies and added it to dev-dependencies.
- Updated tokio dev-dependency version from 1.47.2 to 1.49.0.
- Added a new [[test]] entry for complex_types integration tests.
- Renamed the [[bench]] entry from rowdata_vs_hashmap to columnvalue_vs_json.
benches/columnvalue_vs_json.rs
- Added a new benchmark file to compare JSON serialization (serde_json) with the new binary serialization (ColumnValue encode/decode) across various operations like construction, serialization, deserialization, and full round-trips.
benches/rowdata_vs_hashmap.rs
- Removed the old benchmark file comparing RowData with HashMap.
examples/basic-streaming/Cargo.lock
- Updated chrono dependency from 0.4.43 to 0.4.44.
- Removed itoa, serde_json, and zmij packages.
integration-tests/complex_types.rs
- Added a new integration test file to verify the correct streaming and representation of complex PostgreSQL data types, including arrays, JSON/JSONB, and geometric types, covering inserts, updates, and deletes.
src/column_value.rs
- Added a new module defining the ColumnValue enum (Null, Text, Binary) and RowData struct for efficient column data representation.
- Implemented binary wire format encoding and decoding for both ColumnValue and RowData.
- Provided serde::Serialize and serde::Deserialize implementations for ColumnValue and RowData for JSON compatibility.
- Included helper functions hex_encode and hex_decode.
src/lib.rs
- Declared column_value as a public module.
- Re-exported ColumnValue and RowData from the new column_value module.
- Removed the re-export of RowData from types and format_postgres_timestamp.
src/protocol.rs
- Updated TupleData::to_row_data to construct RowData using ColumnValue variants directly, replacing serde_json::Value.
- Removed the deprecated TupleData::to_hash_map method.
- Added a raw_bytes() method to ColumnData for zero-copy access to underlying bytes.
- Updated internal tests to use ColumnValue instead of serde_json::Value for assertions.
src/stream.rs
- Updated the tuple_to_data function to directly map ColumnData to ColumnValue variants (Null, Text, Binary) using zero-copy Bytes.
- Removed the local hex_encode function, as it is now centralized in src/column_value.rs.
- Updated internal tests to use ColumnValue for RowData construction and assertions.
src/types.rs
- Removed the local definition of RowData struct and its associated implementations, as it has been moved to src/column_value.rs.
- Removed the format_postgres_timestamp function.
- Changed ChangeEvent::metadata from Option<HashMap<String, serde_json::Value>> to Option<HashMap<String, String>>.
- Implemented encode and decode methods for ChangeEvent to support a compact binary wire format.
- Added helper functions encode_arc_str, decode_arc_str, decode_string, and micros_to_chrono for binary serialization/deserialization.
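
The changelog mentions `hex_encode` and `hex_decode` helpers in src/column_value.rs without showing them. A minimal sketch of such a pair is below; the signatures, the lowercase output, and the `Option` return type are assumptions for this example and may differ from the crate's versions.

```rust
/// Lowercase hex encoding of arbitrary bytes.
fn hex_encode(bytes: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        // High nibble first, then low nibble.
        out.push(DIGITS[(b >> 4) as usize] as char);
        out.push(DIGITS[(b & 0x0f) as usize] as char);
    }
    out
}

/// Decode a hex string (upper or lower case).
/// Returns `None` for odd-length input or any non-hex character.
fn hex_decode(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 {
        return None;
    }
    s.as_bytes()
        .chunks(2)
        .map(|pair| {
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}
```

Hex is a common way to show `Binary` column payloads in text-oriented outputs such as JSON, where raw bytes cannot be embedded directly.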

Ignored Files

Ignored by pattern: .github/workflows/** (1)
- .github/workflows/ci.yml


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and see the Gemini Code Assist documentation for how to configure it in GitHub. Gemini can make mistakes, so double check it and use code with caution.

codecov · 2026-02-25T03:35:56Z

Codecov Report

❌ Patch coverage is 99.45318% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.26%. Comparing base (352a32f) to head (95045c6).

Files with missing lines	Patch %	Lines
src/types.rs	99.51%	4 Missing ⚠️
src/column_value.rs	99.47%	3 Missing ⚠️
src/stream.rs	97.05%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #33      +/-   ##
==========================================
+ Coverage   91.19%   92.26%   +1.06%     
==========================================
  Files           8        9       +1     
  Lines        8210     9461    +1251     
==========================================
+ Hits         7487     8729    +1242     
- Misses        723      732       +9
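
The coverage percentages in the diff can be reproduced from the hit and line counts. A small sketch (the helper name is made up for this note):

```rust
/// Coverage percentage = covered lines / total lines * 100.
fn coverage_percent(hits: u32, lines: u32) -> f64 {
    f64::from(hits) / f64::from(lines) * 100.0
}
```

With the numbers above, `coverage_percent(7487, 8210)` is about 91.19 for main and `coverage_percent(8729, 9461)` is about 92.26 for this PR.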


gemini-code-assist

Code Review

This pull request is a significant and well-executed improvement. It introduces ColumnValue and RowData types, moving away from serde_json::Value for representing column data. This greatly improves performance by enabling zero-copy operations and reducing dependencies. The new custom binary serialization format for ChangeEvent is a major enhancement for efficiency. The addition of comprehensive benchmarks and integration tests for complex data types is excellent and provides confidence in the changes.

I have a couple of suggestions for improvement detailed below.

src/column_value.rs

src/types.rs


isdaniel · 2026-02-25T03:51:43Z

/gemini review

gemini-code-assist

Code Review

This is an excellent pull request that significantly improves performance by introducing ColumnValue and RowData types and a custom binary serialization format for ChangeEvent. The move away from serde_json::Value in the hot path is a great design choice. The new types are well-implemented with zero-copy in mind.

The changes are thoroughly tested with a comprehensive new integration test suite for complex data types and a new benchmark that clearly demonstrates the performance gains. The code quality is high, and the refactoring is consistent across the codebase.

I've left one minor suggestion for a small performance optimization in the RowData deserialization logic. Overall, this is a very solid and well-executed improvement.

src/column_value.rs


Commit 95045c6: Remove unused payload size printing and improve error handling in ColumnValue decoding


isdaniel closed this Feb 25, 2026

isdaniel deleted the feat/remove-serde_json-use-encode-decode-perf-enhancement branch February 25, 2026 06:54

isdaniel wants to merge 2 commits into main from feat/remove-serde_json-use-encode-decode-perf-enhancement
