Summary
A pooler-side ErrorResponse followed by an immediate connection close (no trailing ReadyForQuery) can permanently poison a pooled connection. After the trigger, every subsequent query on that connection fails with: pq: there is already a query being processed on this connection.
The connection is never evicted from the pool because database/sql never sees driver.ErrBadConn. This is the same end state as #1298, but reached through a different code path that the merge of #1299 (commit 6d77ced) does not close.
Root cause
PR #1272 added an inProgress atomic flag that is set at the start of query()/Exec() and only cleared when ReadyForQuery is received from the server. If a network error prevents ReadyForQuery from arriving, the flag stays stuck at true.
Five sites in conn.go (readParseResponse, readStatementDescribeResponse, readPortalDescribeResponse, readBindResponse, postExecuteWorkaround) handle a mid-extended-protocol ErrorResponse by draining the trailing ReadyForQuery and discarding whatever it returns:
    case proto.ErrorResponse:
        err := parseError(r, "")
        _ = cn.readReadyForQuery()
        return err
When the peer closes mid-stream, readReadyForQuery() returns io.EOF. The `_ =` assignment discards it before handleError can classify it, so cn.err is never set, IsValid() returns true, and database/sql keeps handing out the broken connection. The CompareAndSwap guard on inProgress then rejects every subsequent query with errQueryInProgress, which is not driver.ErrBadConn, so (*DB).retry won't retry on a fresh connection either. The change merged via #1299 only addresses io.ErrUnexpectedEOF in handleError; on this path the EOF is dropped before handleError is reached.
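The failure mode described above can be modelled in isolation. The following is a minimal sketch, not pq's actual code: fakeConn, errServer, the propagate switch, and the poisoned helper are all hypothetical stand-ins, and the "record the drain error" branch is only one possible remedy, not necessarily what #1321 does.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"sync/atomic"
)

// Stand-ins for pq's errors; the names mirror the issue text only.
var (
	errQueryInProgress = errors.New("pq: there is already a query being processed on this connection")
	errServer          = errors.New("server ErrorResponse (hypothetical stand-in)")
)

// fakeConn is a hypothetical model of a driver connection.
type fakeConn struct {
	inProgress atomic.Bool // set when a query starts, cleared on ReadyForQuery
	err        error       // sticky error; non-nil marks the connection bad
	propagate  bool        // false: discard the drain error; true: record it
}

// readReadyForQuery simulates the peer closing before ReadyForQuery arrives.
func (c *fakeConn) readReadyForQuery() error { return io.EOF }

// IsValid is what the pool would consult before reusing the connection.
func (c *fakeConn) IsValid() bool { return c.err == nil }

// query simulates one round trip in which the server answers with an
// ErrorResponse and the driver tries to drain the trailing ReadyForQuery.
func (c *fakeConn) query() error {
	if !c.inProgress.CompareAndSwap(false, true) {
		return errQueryInProgress
	}
	if err := c.readReadyForQuery(); err != nil {
		if c.propagate {
			c.err = err // connection is now reported as invalid
		}
		// With propagate == false the error is dropped and inProgress stays true.
		return errServer
	}
	c.inProgress.Store(false)
	return errServer
}

// poisoned reports whether the connection still looks valid to the pool
// while rejecting the next query with errQueryInProgress.
func poisoned(propagate bool) bool {
	c := &fakeConn{propagate: propagate}
	_ = c.query() // the triggering query
	if !c.IsValid() {
		return false // the pool would evict this connection
	}
	return errors.Is(c.query(), errQueryInProgress)
}

func main() {
	fmt.Println("discarding drain error, poisoned:", poisoned(false))
	fmt.Println("recording drain error, poisoned:", poisoned(true))
}
```

In this model the connection is poisoned only when the drain error is discarded; once the error is recorded, IsValid() turns false and a pool would be free to drop the connection.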
How to reproduce
A reproducer is at: https://github.com/m1ralx/pq-bug-demo
It uses a TCP fault-injection proxy between a Go client and a real PostgreSQL 16 instance (via Docker). On a specific Parse, the proxy writes a hand-crafted ErrorResponse (severity ERROR, SQLSTATE 08P01) directly to the client and closes the connection before any ReadyForQuery reaches the client. This mirrors pgbouncer 1.15's disconnect_server(false, ...) -> send_pooler_error(client, false, ...) byte sequence.
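For readers unfamiliar with the wire format, here is a rough sketch of how such a hand-crafted ErrorResponse could be assembled. It follows the PostgreSQL v3 message layout (type byte 'E', a big-endian int32 length that counts itself, then NUL-terminated fields closed by a final zero byte). The function name buildErrorResponse and the message text are assumptions for illustration; the reproducer's actual bytes may differ.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// buildErrorResponse encodes an ErrorResponse with severity (S), SQLSTATE (C)
// and message (M) fields. Hypothetical helper, not taken from the reproducer.
func buildErrorResponse(severity, code, message string) []byte {
	var body bytes.Buffer
	fields := []struct {
		typ byte
		val string
	}{
		{'S', severity},
		{'C', code},
		{'M', message},
	}
	for _, f := range fields {
		body.WriteByte(f.typ)    // field type
		body.WriteString(f.val)  // field value
		body.WriteByte(0)        // NUL terminator for the value
	}
	body.WriteByte(0) // zero byte marks the end of the field list

	msg := make([]byte, 5, 5+body.Len())
	msg[0] = 'E' // ErrorResponse message type
	// The length covers itself (4 bytes) plus the body, but not the type byte.
	binary.BigEndian.PutUint32(msg[1:5], uint32(4+body.Len()))
	return append(msg, body.Bytes()...)
}

func main() {
	// Placeholder message text; only severity and SQLSTATE come from the issue.
	msg := buildErrorResponse("ERROR", "08P01", "placeholder pooler error")
	fmt.Printf("% x\n", msg)
}
```

A proxy would write these bytes to the client socket and then close it without sending a ReadyForQuery ('Z') message, which is the sequence that triggers the bug.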
git clone https://github.com/m1ralx/pq-bug-demo
cd pq-bug-demo
make up # start PostgreSQL 16 via Docker
make test-buggy # demonstrates the poisoning on v1.12.3
make test-fix # passes with the proposed patch
make down
Real-world impact
We hit this in a staging environment with services routed through pgbouncer ≥1.15.
PR Fix
#1321