Skip to content

Failed COPY FROM STDIN never settles the writable, skips ROLLBACK in sql.begin(), and permanently leaks the connection as 'idle in transaction (aborted)' #1173

Description

@jose-manuel-silva

Summary

When the server rejects a COPY <table> FROM STDIN issued via sql`COPY ...`.writable() (e.g. a malformed CSV row), the writable's end-callback is never invoked, so stream.pipeline() (and any await on stream finish) hangs forever. If the COPY runs inside sql.begin(), the transaction callback never returns, no ROLLBACK is ever sent, and the connection leaks permanently as idle in transaction (aborted). One occurrence leaks one pool connection; repeated occurrences exhaust the pool and take the application down. The real Postgres error is also swallowed — it surfaces nowhere.

This caused a production outage for us: an hourly CSV import job hit a column-count mismatch, leaked one connection per run, and after ~15 hours had consumed the entire pool (max: 10). Every request needing the DB then queued forever.

This is related to, but distinct from, the known COPY-hang reports (#499, #917) and PR #1016 — that PR only patches the success path (socket left paused after CopyDone); this issue is about the error path, which is unfixed on current master.

  • postgres version: 3.4.9 (also traced on current master)
  • Runtime: reproduced on both Bun 1.3 and Node 22
  • PostgreSQL: 17

Reproduction

import postgres from 'postgres';
import { pipeline } from 'node:stream/promises';
import { Readable } from 'node:stream';

const sql = postgres(process.env.DATABASE_URL, { max: 10 });

// 3 columns in the row, 2 in the table -> server errors with
// "extra data after last expected column"
const badCsv = 'a,b\n"x","y","EXTRA COLUMN"\n';

await sql.begin(async (tx) => {
	await tx.unsafe(`CREATE TEMP TABLE _t (a text, b text) ON COMMIT DROP`);
	const writable = await tx`COPY _t FROM STDIN WITH (FORMAT csv, HEADER true)`.writable();
	await pipeline(Readable.from([badCsv]), writable); // <-- never settles
});
// never reached: begin() neither resolves nor rejects

Observed:

  • pipeline() never settles; sql.begin() never settles.
  • The server-side session is left in idle in transaction (aborted), wait_event = ClientRead, forever:
SELECT state, wait_event FROM pg_stat_activity WHERE application_name = 'postgres.js';
            state              | wait_event
-------------------------------+------------
 idle in transaction (aborted) | ClientRead
  • The pool slot is never returned. After max occurrences, every subsequent query queues indefinitely.
  • The PostgresError ("extra data after last expected column") is never observable from application code.

Mechanism (traced in src/connection.js)

  1. CopyInResponse resolves the query with a Writable whose final(callback) handler sends CopyDone and stashes the callback in the connection-scoped final variable:
final(callback) {
  socket.write(b().c().end())   // CopyDone
  final = callback
  stream = null
}
  1. The stashed callback is only ever invoked in the CommandComplete handler:
final && (final(), final = null)
  1. A failed COPY never sends CommandComplete — the server sends ErrorResponse, discards remaining CopyData, and (after receiving CopyDone) sends ReadyForQuery. Neither handler invokes final, so the writable's end() never completes and pipeline() hangs.

  2. At ReadyForQuery, errorResponse is routed to errored()queryError(query, err)query.reject(err) — but the COPY query's promise was already resolved with the stream in CopyInResponse, so the rejection is a no-op. The error is silently dropped. (stream was already nulled by final, so stream.destroy(err) is skipped too.)

  3. With the sql.begin() callback permanently blocked on the pipeline await, no ROLLBACK is ever issued. Server waits on ClientRead, client waits on the end-callback: a permanent deadlock holding an aborted transaction open.

Timing note: if ErrorResponse arrives while the client is still mid-write (large file, backpressure window), stream may still be set, in which case stream.destroy(err) does fire and the pipeline rejects with the real error — so the bug manifests intermittently depending on file size and timing. The small-file case above deadlocks deterministically.

Expected behavior

A failed COPY should settle the writable with the server's error (invoke/destroy on ErrorResponse/ReadyForQuery when a COPY final is pending), so pipeline() rejects with the real PostgresError, sql.begin() can roll back, and the connection is returned to the pool.

Workarounds we use

  • Server-side idle_in_transaction_session_timeout so leaked sessions are eventually killed instead of accumulating.
  • App-side: after the source stream drains, issue a probe query on the same transaction (retrying while the client rejects with COPY_IN_PROGRESS). In an aborted transaction the probe rejects with 25P02, which forces settlement and lets sql.begin() roll back. The real COPY error is still lost in the deadlock flavor (only 25P02 is observable), which is why a library-level fix is needed.

Happy to provide more detail or test a patch.


Investigated, reproduced, and written by Claude (Fable 5) while debugging the production outage; reviewed and verified by me.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions