Summary
When the server rejects a COPY <table> FROM STDIN issued via sql`COPY ...`.writable() (e.g. a malformed CSV row), the writable's end-callback is never invoked, so stream.pipeline() (and any await on stream finish) hangs forever. If the COPY runs inside sql.begin(), the transaction callback never returns, no ROLLBACK is ever sent, and the connection leaks permanently as idle in transaction (aborted). One occurrence leaks one pool connection; repeated occurrences exhaust the pool and take the application down. The real Postgres error is also swallowed — it surfaces nowhere.
This caused a production outage for us: an hourly CSV import job hit a column-count mismatch, leaked one connection per run, and after ~15 hours had consumed the entire pool (max: 10). Every request needing the DB then queued forever.
This is related to, but distinct from, the known COPY-hang reports (#499, #917) and PR #1016 — that PR only patches the success path (socket left paused after CopyDone); this issue is about the error path, which is unfixed on current master.
postgres version: 3.4.9 (also traced on current master)
- Runtime: reproduced on both Bun 1.3 and Node 22
- PostgreSQL: 17
Reproduction
import postgres from 'postgres';
import { pipeline } from 'node:stream/promises';
import { Readable } from 'node:stream';
const sql = postgres(process.env.DATABASE_URL, { max: 10 });
// 3 columns in the row, 2 in the table -> server errors with
// "extra data after last expected column"
const badCsv = 'a,b\n"x","y","EXTRA COLUMN"\n';
await sql.begin(async (tx) => {
await tx.unsafe(`CREATE TEMP TABLE _t (a text, b text) ON COMMIT DROP`);
const writable = await tx`COPY _t FROM STDIN WITH (FORMAT csv, HEADER true)`.writable();
await pipeline(Readable.from([badCsv]), writable); // <-- never settles
});
// never reached: begin() neither resolves nor rejects
Observed:
pipeline() never settles; sql.begin() never settles.
- The server-side session is left in
idle in transaction (aborted), wait_event = ClientRead, forever:
SELECT state, wait_event FROM pg_stat_activity WHERE application_name = 'postgres.js';
state | wait_event
-------------------------------+------------
idle in transaction (aborted) | ClientRead
- The pool slot is never returned. After
max occurrences, every subsequent query queues indefinitely.
- The
PostgresError ("extra data after last expected column") is never observable from application code.
Mechanism (traced in src/connection.js)
CopyInResponse resolves the query with a Writable whose final(callback) handler sends CopyDone and stashes the callback in the connection-scoped final variable:
final(callback) {
socket.write(b().c().end()) // CopyDone
final = callback
stream = null
}
- The stashed callback is only ever invoked in the CommandComplete handler:
final && (final(), final = null)
-
A failed COPY never sends CommandComplete — the server sends ErrorResponse, discards remaining CopyData, and (after receiving CopyDone) sends ReadyForQuery. Neither handler invokes final, so the writable's end() never completes and pipeline() hangs.
-
At ReadyForQuery, errorResponse is routed to errored() → queryError(query, err) → query.reject(err) — but the COPY query's promise was already resolved with the stream in CopyInResponse, so the rejection is a no-op. The error is silently dropped. (stream was already nulled by final, so stream.destroy(err) is skipped too.)
-
With the sql.begin() callback permanently blocked on the pipeline await, no ROLLBACK is ever issued. Server waits on ClientRead, client waits on the end-callback: a permanent deadlock holding an aborted transaction open.
Timing note: if ErrorResponse arrives while the client is still mid-write (large file, backpressure window), stream may still be set, in which case stream.destroy(err) does fire and the pipeline rejects with the real error — so the bug manifests intermittently depending on file size and timing. The small-file case above deadlocks deterministically.
Expected behavior
A failed COPY should settle the writable with the server's error (invoke/destroy on ErrorResponse/ReadyForQuery when a COPY final is pending), so pipeline() rejects with the real PostgresError, sql.begin() can roll back, and the connection is returned to the pool.
Workarounds we use
- Server-side
idle_in_transaction_session_timeout so leaked sessions are eventually killed instead of accumulating.
- App-side: after the source stream drains, issue a probe query on the same transaction (retrying while the client rejects with
COPY_IN_PROGRESS). In an aborted transaction the probe rejects with 25P02, which forces settlement and lets sql.begin() roll back. The real COPY error is still lost in the deadlock flavor (only 25P02 is observable), which is why a library-level fix is needed.
Happy to provide more detail or test a patch.
Investigated, reproduced, and written by Claude (Fable 5) while debugging the production outage; reviewed and verified by me.
Summary
When the server rejects a
COPY <table> FROM STDINissued viasql`COPY ...`.writable()(e.g. a malformed CSV row), the writable's end-callback is never invoked, sostream.pipeline()(and anyawaiton stream finish) hangs forever. If the COPY runs insidesql.begin(), the transaction callback never returns, no ROLLBACK is ever sent, and the connection leaks permanently asidle in transaction (aborted). One occurrence leaks one pool connection; repeated occurrences exhaust the pool and take the application down. The real Postgres error is also swallowed — it surfaces nowhere.This caused a production outage for us: an hourly CSV import job hit a column-count mismatch, leaked one connection per run, and after ~15 hours had consumed the entire pool (
max: 10). Every request needing the DB then queued forever.This is related to, but distinct from, the known COPY-hang reports (#499, #917) and PR #1016 — that PR only patches the success path (socket left paused after
CopyDone); this issue is about the error path, which is unfixed on currentmaster.postgresversion: 3.4.9 (also traced on currentmaster)Reproduction
Observed:
pipeline()never settles;sql.begin()never settles.idle in transaction (aborted),wait_event = ClientRead, forever:maxoccurrences, every subsequent query queues indefinitely.PostgresError("extra data after last expected column") is never observable from application code.Mechanism (traced in
src/connection.js)CopyInResponseresolves the query with aWritablewhosefinal(callback)handler sendsCopyDoneand stashes the callback in the connection-scopedfinalvariable:A failed COPY never sends
CommandComplete— the server sendsErrorResponse, discards remainingCopyData, and (after receivingCopyDone) sendsReadyForQuery. Neither handler invokesfinal, so the writable'send()never completes andpipeline()hangs.At
ReadyForQuery,errorResponseis routed toerrored()→queryError(query, err)→query.reject(err)— but the COPY query's promise was already resolved with the stream inCopyInResponse, so the rejection is a no-op. The error is silently dropped. (streamwas already nulled byfinal, sostream.destroy(err)is skipped too.)With the
sql.begin()callback permanently blocked on the pipeline await, noROLLBACKis ever issued. Server waits onClientRead, client waits on the end-callback: a permanent deadlock holding an aborted transaction open.Timing note: if
ErrorResponsearrives while the client is still mid-write (large file, backpressure window),streammay still be set, in which casestream.destroy(err)does fire and the pipeline rejects with the real error — so the bug manifests intermittently depending on file size and timing. The small-file case above deadlocks deterministically.Expected behavior
A failed COPY should settle the writable with the server's error (invoke/destroy on
ErrorResponse/ReadyForQuerywhen a COPYfinalis pending), sopipeline()rejects with the realPostgresError,sql.begin()can roll back, and the connection is returned to the pool.Workarounds we use
idle_in_transaction_session_timeoutso leaked sessions are eventually killed instead of accumulating.COPY_IN_PROGRESS). In an aborted transaction the probe rejects with25P02, which forces settlement and letssql.begin()roll back. The real COPY error is still lost in the deadlock flavor (only25P02is observable), which is why a library-level fix is needed.Happy to provide more detail or test a patch.
Investigated, reproduced, and written by Claude (Fable 5) while debugging the production outage; reviewed and verified by me.