Skip to content

Support binary data transfer in COPY FROM#573

Open
ahoppen wants to merge 3 commits intovapor:mainfrom
ahoppen:copy-from-binary
Open

Support binary data transfer in COPY FROM#573
ahoppen wants to merge 3 commits intovapor:mainfrom
ahoppen:copy-from-binary

Conversation

@ahoppen
Copy link
Contributor

@ahoppen ahoppen commented Jul 24, 2025

My benchmark of transferring the integers from 0 to 1,000,000 both as an integer and as a string was about the same speed as the old text-based transfer. I believe that the binary transfer will start to show significant benefits when transferring binary data, other fields that don't need to be represented as fields and also means that the user doesn't need to worry about escapping their data.

A call to binary transfer data looks as follows

let records: [(id: Int, name: String)] = [(1, "Alice"), (42, "Bob")]
try await conn.copyFromBinary(table: "copy_table", columns: ["id", "name"], logger: .psqlTest) { writer in
    for record in records {
        try await writer.writeRow { columnWriter in
            try columnWriter.writeColumn(Int32(record.id))
            try columnWriter.writeColumn(record.name)
        }
    }
}

@codecov
Copy link

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 95.45455% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.49%. Comparing base (d578b86) to head (2b3e55c).

Files with missing lines Patch % Lines
...esNIO/Connection/PostgresConnection+CopyFrom.swift 95.45% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #573      +/-   ##
==========================================
+ Coverage   76.40%   76.49%   +0.08%     
==========================================
  Files         134      134              
  Lines       10161    10227      +66     
==========================================
+ Hits         7764     7823      +59     
- Misses       2397     2404       +7     
Files with missing lines Coverage Δ
...esNIO/Connection/PostgresConnection+CopyFrom.swift 93.95% <95.45%> (+0.85%) ⬆️

... and 6 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ahoppen ahoppen force-pushed the copy-from-binary branch 2 times, most recently from e42bfe1 to 42f110e Compare December 4, 2025 23:54
/// Serialize a single row to the backend. Call `writeColumn` on `columnWriter` for every column that should be
/// included in the row.
@inlinable
public mutating func writeRow(_ body: (_ columnWriter: inout ColumnWriter) throws -> Void) async throws {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Should this method allow an asynchronous body?
  • Should this method be generic over the return type?
  • We should probably make this nonsiolated(nonsending)
Suggested change
public mutating func writeRow(_ body: (_ columnWriter: inout ColumnWriter) throws -> Void) async throws {
public nonisolated(nonsending) mutating func writeRow<Return>(_ body: (_ columnWriter: inout ColumnWriter) throws -> Return) async throws -> Return {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably make this nonsiolated(nonsending)

Ignore this. The package has adopted the upcoming language feature

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this method allow an asynchronous body?

I don’t think so. On a technical level, we can’t call an asynchronous closure in the withUnsafeMutablePointer call of withColumnWriter call here. From a user’s perspective, it should be extremely easy to compute any asynchronous data before the call to writeRow.

Should this method be generic over the return type?

Good call 👍🏽

}

@usableFromInline
static func withColumnWriter<T>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Should this allow an asynchronous body? If yes then we should make it nonisolated(nonsending)
  • This should adopt typed throws instead of rethrows

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this allow an asynchronous body? If yes then we should make it nonisolated(nonsending)

withUnsafeMutablePointer does not support asynchronous bodies, so we can’t. Also see #573 (comment) why I don’t think this is necessary.

This should adopt typed throws instead of rethrows

👍🏽

Comment on lines +198 to +200
if buffer.readableBytes > bufferSize {
try await flush()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we flush this after every call of writeRow otherwise a user might not know when data went out at all. There must be some guarantee to user that if they called writeRow the data went to the DB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely do not want to do that for performance reasons. Sending a package of data to the database has very measurable performance overhead. I don’t have numbers anymore and am not sure what exactly happens in the database’s implementation but on our side starting a new data package will need to send the PGCOPY\n\ff… header again.

Furthermore, sending data to the database doesn’t actually commit it yet anyway. Only when the COPY … FROM operation is finished does the data get committed to the database. So the point in time when we flush data should be transparent to the user.

My benchmark of transferring the integers from 0 to 1,000,000 both as an integer and as a string was about the same speed as the old text-based transfer. I believe that the binary transfer will start to show significant benefits when transferring binary data, other fields that don't need to be represented as fields and also means that the user doesn't need to worry about escapping their data.
Copy link
Contributor

@FranzBusch FranzBusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from my side. @gwynne @fabianfett mind reviewing and approving?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants