Support binary data transfer in COPY FROM#573
Conversation
8032167 to
9650d8c
Compare
e07d36a to
6dd723c
Compare
6dd723c to
f2e1370
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #573 +/- ##
==========================================
+ Coverage 76.40% 76.49% +0.08%
==========================================
Files 134 134
Lines 10161 10227 +66
==========================================
+ Hits 7764 7823 +59
- Misses 2397 2404 +7
🚀 New features to boost your workflow:
|
e42bfe1 to
42f110e
Compare
| /// Serialize a single row to the backend. Call `writeColumn` on `columnWriter` for every column that should be | ||
| /// included in the row. | ||
| @inlinable | ||
| public mutating func writeRow(_ body: (_ columnWriter: inout ColumnWriter) throws -> Void) async throws { |
There was a problem hiding this comment.
- Should this method allow an asynchronous body?
- Should this method be generic over the return type?
- We should probably make this
nonsiolated(nonsending)
| public mutating func writeRow(_ body: (_ columnWriter: inout ColumnWriter) throws -> Void) async throws { | |
| public nonisolated(nonsending) mutating func writeRow<Return>(_ body: (_ columnWriter: inout ColumnWriter) throws -> Return) async throws -> Return { |
There was a problem hiding this comment.
We should probably make this nonsiolated(nonsending)
Ignore this. The package has adopted the upcoming language feature
There was a problem hiding this comment.
Should this method allow an asynchronous body?
I don’t think so. On a technical level, we can’t call an asynchronous closure in the withUnsafeMutablePointer call of withColumnWriter call here. From a user’s perspective, it should be extremely easy to compute any asynchronous data before the call to writeRow.
Should this method be generic over the return type?
Good call 👍🏽
| } | ||
|
|
||
| @usableFromInline | ||
| static func withColumnWriter<T>( |
There was a problem hiding this comment.
- Should this allow an asynchronous body? If yes then we should make it
nonisolated(nonsending) - This should adopt typed throws instead of rethrows
There was a problem hiding this comment.
Should this allow an asynchronous body? If yes then we should make it
nonisolated(nonsending)
withUnsafeMutablePointer does not support asynchronous bodies, so we can’t. Also see #573 (comment) why I don’t think this is necessary.
This should adopt typed throws instead of rethrows
👍🏽
| if buffer.readableBytes > bufferSize { | ||
| try await flush() | ||
| } |
There was a problem hiding this comment.
Shouldn't we flush this after every call of writeRow otherwise a user might not know when data went out at all. There must be some guarantee to user that if they called writeRow the data went to the DB.
There was a problem hiding this comment.
We definitely do not want to do that for performance reasons. Sending a package of data to the database has very measurable performance overhead. I don’t have numbers anymore and am not sure what exactly happens in the database’s implementation but on our side starting a new data package will need to send the PGCOPY\n\ff… header again.
Furthermore, sending data to the database doesn’t actually commit it yet anyway. Only when the COPY … FROM operation is finished does the data get committed to the database. So the point in time when we flush data should be transparent to the user.
My benchmark of transferring the integers from 0 to 1,000,000 both as an integer and as a string was about the same speed as the old text-based transfer. I believe that the binary transfer will start to show significant benefits when transferring binary data, other fields that don't need to be represented as fields and also means that the user doesn't need to worry about escapping their data.
42f110e to
2b3e55c
Compare
FranzBusch
left a comment
There was a problem hiding this comment.
LGTM from my side. @gwynne @fabianfett mind reviewing and approving?
My benchmark of transferring the integers from 0 to 1,000,000 both as an integer and as a string was about the same speed as the old text-based transfer. I believe that the binary transfer will start to show significant benefits when transferring binary data, other fields that don't need to be represented as fields and also means that the user doesn't need to worry about escapping their data.
A call to binary transfer data looks as follows