pg0 0.14.1 start silently hangs (0% CPU, no output) on existing instance on Windows #16

@bchumerch

Summary

On Windows, pg0 start --name <existing-instance> silently hangs: the pg0 process is alive at ~0% CPU, never writes anything to stdout or stderr, never binds the port, never creates postmaster.pid. Fresh instances (data dir doesn't yet exist) work perfectly, so the binary itself is fine — only the existing-instance path hangs.

The workaround is to call pg_ctl start directly with the exact arguments that pg0 -v start uses internally (captured from a verbose run on a fresh instance). I'm filing this so the underlying behavior can be fixed.

Environment

  • pg0: 0.14.1 (pg0-embedded pip package), binary dated 2026-05-08
  • OS: Windows 11 Home 10.0.26200
  • PostgreSQL: 18.1.0 (bundled)
  • Installation root: C:\Users\<user>\.pg0\installation\18.1.0
  • Instance affected: any pre-existing instance with a populated data dir

Reproduction

  1. Create a fresh instance and let pg0 initialize it:

    pg0 -v start --name pg0-test --port 5444
    

    This works — full verbose output, postgres binds in ~7s, instance.json and postmaster.pid created.

  2. Stop the fresh instance, leaving the data directory intact:

    pg0 stop --name pg0-test
    
  3. Restart the existing instance:

    pg0 -v start --name pg0-test
    

    The pg0 process spawns, sits at 0% CPU for minutes, writes zero bytes to stdout/stderr, and never binds the port. It has to be killed manually, so there is no exit code or error message to report.
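
To make step 3 repeatable without watching the process by hand, the restart can be wrapped in a timeout. The following is a minimal sketch, not part of pg0: `run_with_timeout` is a hypothetical helper, and the log file names are borrowed from the Evidence section. It redirects output to files rather than pipes so that the byte counts can be read even after the process is killed.

```python
import subprocess
from pathlib import Path


def run_with_timeout(cmd, timeout_s, out_path="pg0.out.log", err_path="pg0.err.log"):
    """Run cmd with stdout/stderr redirected to files; kill it if it exceeds timeout_s.

    Hypothetical helper for reproducing the hang; returns a small summary dict.
    """
    out_path, err_path = Path(out_path), Path(err_path)
    with open(out_path, "wb") as out, open(err_path, "wb") as err:
        proc = subprocess.Popen(cmd, stdout=out, stderr=err)
        try:
            returncode = proc.wait(timeout=timeout_s)
            hung = False
        except subprocess.TimeoutExpired:
            # The process did not exit in time: treat it as hung and clean up.
            proc.kill()
            proc.wait()
            returncode, hung = None, True
    return {
        "hung": hung,
        "returncode": returncode,
        "stdout_bytes": out_path.stat().st_size,
        "stderr_bytes": err_path.stat().st_size,
    }
```

Calling `run_with_timeout(["pg0", "-v", "start", "--name", "pg0-test"], 60)` after step 2 would be expected to report `hung` as true with both byte counts at zero if the bug reproduces. The 60-second limit is an arbitrary choice, well above the ~7s a fresh start takes.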

Evidence

  • Get-Process pg0 shows the process alive with CPU=0.015 (total processor seconds, i.e. essentially nothing) after 3+ minutes.
  • Get-NetTCPConnection -LocalPort <port> returns nothing.
  • The redirected log files captured by the parent process (pg0.out.log, pg0.err.log) are 0 bytes at rest.
  • pg0 info --name <instance> returns "stopped" status correctly — so pg0 can read instance.json; it just hangs on the start path.
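
The port and postmaster.pid checks above can also be scripted, which is handy when polling during a long wait. This is a rough sketch using only the Python standard library. The function names are hypothetical, and it assumes postgres listens on 127.0.0.1 and that postmaster.pid sits directly in the data directory, as it does for a standard PostgreSQL data dir.

```python
import socket
from pathlib import Path


def port_is_listening(port, host="127.0.0.1", timeout_s=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout_s)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def postmaster_pid_exists(data_dir):
    """Return True if <data_dir>/postmaster.pid is present."""
    return (Path(data_dir) / "postmaster.pid").is_file()
```

In the hanging case described here, both functions would be expected to keep returning False for as long as pg0 is left running.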

For comparison, the fresh-instance verbose run logs everything:

Setting up PostgreSQL 18.1.0...
DEBUG pg0: PostgreSQL already extracted at C:\Users\<user>\.pg0\installation\18.1.0
DEBUG setup: initialize: postgresql_embedded::postgresql: Initializing database <path>
DEBUG setup: initialize: execute_command{program="initdb"}: ...
[... initdb output ...]
DEBUG start: postgresql_embedded::postgresql: Starting database <path> on port <port>
DEBUG start: execute_command{program="pg_ctl"}: ...
DEBUG start: postgresql_embedded::postgresql: Started database <path> on port <port>

For the hanging existing-instance case, none of these lines appear, not even the first Setting up PostgreSQL 18.1.0... line. So the hang occurs before the "setup" tracing span is entered.

What was tried

  • Pid value in instance.json does not matter — tested with pid: 0 (Windows System Idle Process, always alive) and pid: 999999 (definitely dead). Both hang identically.
  • Stale postmaster.pid removal doesn't help — hang occurs whether or not postmaster.pid is present.
  • Same hang reproduces against two different existing instances (different data dirs, different ports).
  • Killing any leftover pg0 processes from prior attempts before starting does not help either.
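
For anyone repeating the pid experiment, the edit to instance.json can be done with a few lines of Python. This is only a sketch: `set_instance_pid` is a hypothetical helper, the path to instance.json must be supplied by the caller, and the only thing assumed about the file is that it is a JSON object with a top-level `pid` key, as used in the test above.

```python
import json
from pathlib import Path


def set_instance_pid(instance_json, pid):
    """Overwrite the top-level "pid" field of an instance.json file.

    Returns the previous value so it can be restored afterwards.
    Assumes the file is a JSON object; all other keys are left untouched.
    """
    path = Path(instance_json)
    state = json.loads(path.read_text(encoding="utf-8"))
    previous = state.get("pid")
    state["pid"] = pid
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return previous
```

Setting the value to 0 and then to 999999 covers the "always alive" and "definitely dead" cases from the first bullet.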

Workaround that works

Run postgres directly via the bundled pg_ctl, using the exact args pg0 would have used (captured from a fresh-instance verbose run):

pg_ctl start --pgdata <data> --wait --log <data>/start.log \
  -o "-F -p <port>" \
  -o "-c log_directory=log" \
  -o "-c timezone=UTC" \
  -o "-c work_mem=64MB" \
  -o "-c log_rotation_size=100MB" \
  -o "-c maintenance_work_mem=512MB" \
  -o "-c shared_buffers=256MB" \
  -o "-c log_timezone=UTC" \
  -o "-c log_rotation_age=1d" \
  -o "-c effective_cache_size=1GB" \
  -o "-c max_parallel_maintenance_workers=4" \
  -o "-c logging_collector=on" \
  -o "-c log_filename=postgresql-%Y-%m-%d.log"

Crash recovery completes in <100ms, postgres binds the port, everything works.
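
The command above uses backslash line continuations, which neither PowerShell nor cmd.exe understands, so on Windows it may be easier to assemble the arguments as a list and pass them to `subprocess.run`. The sketch below builds the same argument list. `build_pg_ctl_args` is a hypothetical helper, and the `pg_ctl` default assumes the bundled binary is on PATH (otherwise pass its full path).

```python
from pathlib import Path


def build_pg_ctl_args(data_dir, port, settings, pg_ctl="pg_ctl"):
    """Build the argument list for `pg_ctl start` as used in the workaround.

    settings maps postgres setting names to values; each becomes one
    `-o "-c name=value"` pair, in the same form as the command above.
    """
    args = [
        pg_ctl, "start",
        "--pgdata", str(data_dir),
        "--wait",
        "--log", str(Path(data_dir) / "start.log"),
        "-o", f"-F -p {port}",
    ]
    for name, value in settings.items():
        args += ["-o", f"-c {name}={value}"]
    return args


# Settings copied from the workaround command above.
WORKAROUND_SETTINGS = {
    "log_directory": "log",
    "timezone": "UTC",
    "work_mem": "64MB",
    "log_rotation_size": "100MB",
    "maintenance_work_mem": "512MB",
    "shared_buffers": "256MB",
    "log_timezone": "UTC",
    "log_rotation_age": "1d",
    "effective_cache_size": "1GB",
    "max_parallel_maintenance_workers": "4",
    "logging_collector": "on",
    "log_filename": "postgresql-%Y-%m-%d.log",
}
```

The result can be passed to `subprocess.run(args, check=True)`. Because the arguments are passed as a list, no shell quoting of the `-o` values is needed.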

Suggested investigation

The hang occurs before any of pg0's tracing spans are entered, which suggests it happens while reading instance state, acquiring a lock, or making a Windows file-system call, rather than in the postgres invocation itself. Possible suspects:

  1. A blocking Windows process-introspection call (e.g. checking if an old pid is alive)
  2. A file lock or named-pipe wait inherited from a prior crash
  3. A network/DNS lookup somewhere in the supervisor

The fact that fresh instances work but any restart against a populated data dir hangs suggests the diverging code path is the "I already have an instance.json and a data dir — let me figure out what state postgres is in" branch.

Happy to capture additional traces or test a patch.
