pg0 0.14.1 start silently hangs (0% CPU, no output) on existing instance on Windows

## Summary

On Windows, `pg0 start --name <existing-instance>` silently hangs: the pg0 process is alive at ~0% CPU, never writes anything to stdout or stderr, never binds the port, never creates `postmaster.pid`. Fresh instances (data dir doesn't yet exist) work perfectly, so the binary itself is fine — only the existing-instance path hangs.

The workaround is to call `pg_ctl start` directly with the exact arguments `pg0 -v start` uses internally (captured from a verbose run on a fresh instance). I'm filing this so the underlying behavior can be fixed.

## Environment

- pg0: **0.14.1** (`pg0-embedded` pip package), binary dated 2026-05-08
- OS: Windows 11 Home 10.0.26200
- PostgreSQL: 18.1.0 (bundled)
- Installation root: `C:\Users\<user>\.pg0\installation\18.1.0`
- Instance affected: any pre-existing instance with a populated data dir

## Reproduction

1. Create a fresh instance and let pg0 initialize it:
   ```
   pg0 -v start --name pg0-test --port 5444
   ```
   This works — full verbose output, postgres binds in ~7s, `instance.json` and `postmaster.pid` created.

2. Stop the fresh instance, leaving the data directory intact:
   ```
   pg0 stop --name pg0-test
   ```

3. Restart the **existing** instance:
   ```
   pg0 -v start --name pg0-test
   ```
   The pg0 process spawns, sits at 0% CPU for minutes, writes **zero** bytes to stdout/stderr, and never binds the port. It can be killed, but it never reports an error or exit code of its own because nothing is emitted.

## Evidence

- `Get-Process pg0` shows the process alive with `CPU=0.015` (seconds of total processor time, i.e. essentially nothing) after 3+ minutes.
- `Get-NetTCPConnection -LocalPort <port>` returns nothing.
- The redirected log files captured by the parent process (`pg0.out.log`, `pg0.err.log`) are **0 bytes** at rest.
- `pg0 info --name <instance>` returns "stopped" status correctly — so pg0 *can* read `instance.json`; it just hangs on the start path.

For comparison, the fresh-instance verbose run logs everything:
```
Setting up PostgreSQL 18.1.0...
DEBUG pg0: PostgreSQL already extracted at C:\Users\<user>\.pg0\installation\18.1.0
DEBUG setup: initialize: postgresql_embedded::postgresql: Initializing database <path>
DEBUG setup: initialize: execute_command{program="initdb"}: ...
[... initdb output ...]
DEBUG start: postgresql_embedded::postgresql: Starting database <path> on port <port>
DEBUG start: execute_command{program="pg_ctl"}: ...
DEBUG start: postgresql_embedded::postgresql: Started database <path> on port <port>
```

For the hanging existing-instance case, **none** of these lines appear, not even the first `Setting up PostgreSQL 18.1.0...` line. So the hang occurs before the "setup" tracing span is entered.
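The observations above (process alive, zero bytes of output) can be collected repeatably with a bounded-wait wrapper. This is a minimal Python sketch, not part of pg0: `run_with_timeout` is a hypothetical helper name, and output is redirected to temporary files rather than pipes so that a long-lived child process cannot block the measurement.

```python
import subprocess
import tempfile


def run_with_timeout(cmd, timeout_s):
    """Run cmd with output redirected to temp files.

    Returns whether the process was still running after timeout_s seconds
    and how many bytes it wrote to stdout and stderr.
    """
    with tempfile.TemporaryFile() as out, tempfile.TemporaryFile() as err:
        proc = subprocess.Popen(cmd, stdout=out, stderr=err)
        try:
            proc.wait(timeout=timeout_s)
            hung = False
        except subprocess.TimeoutExpired:
            proc.kill()  # the child did not exit on its own
            proc.wait()
            hung = True
        # Seeking to the end returns the number of bytes written so far.
        out_size = out.seek(0, 2)
        err_size = err.seek(0, 2)
    return {
        "hung": hung,
        "returncode": None if hung else proc.returncode,
        "stdout_bytes": out_size,
        "stderr_bytes": err_size,
    }
```

For step 3 of the reproduction, calling `run_with_timeout(["pg0", "-v", "start", "--name", "pg0-test"], 180)` should report `hung` as true with both byte counts at 0 if the behavior described here reproduces.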

## What was tried

- Pid value in `instance.json` does not matter — tested with `pid: 0` (Windows System Idle Process, always alive) and `pid: 999999` (definitely dead). Both hang identically.
- Stale `postmaster.pid` removal doesn't help — hang occurs whether or not `postmaster.pid` is present.
- Same hang reproduces against two different existing instances (different data dirs, different ports).
- Killing any leftover pg0 processes from prior attempts before retrying does not help either.
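To check what a leftover `postmaster.pid` actually contains before and after each attempt, a small reader like the following can help. This is a sketch that assumes the standard PostgreSQL lock-file layout (one field per line, in the order listed in `FIELDS`); `read_postmaster_pid` is a hypothetical helper, not something pg0 provides.

```python
from pathlib import Path

# Line order of postmaster.pid as written by PostgreSQL; trailing lines
# may be missing if the server had not finished starting.
FIELDS = (
    "pid", "data_dir", "start_time", "port",
    "socket_dir", "listen_addr", "shmem_key", "status",
)


def read_postmaster_pid(data_dir):
    """Return the postmaster.pid fields as a dict, or None if the file is absent."""
    path = Path(data_dir) / "postmaster.pid"
    if not path.exists():
        return None
    lines = path.read_text().splitlines()
    # zip() stops at the shorter sequence, so a truncated file still parses.
    return {name: value.strip() for name, value in zip(FIELDS, lines)}
```

A return value of `None` corresponds to the "stale file removed" case above; a dict shows which pid and port the previous server recorded.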

## Workaround that works

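Run postgres directly via the bundled `pg_ctl`, using the exact args pg0 would have used (captured from a fresh-instance verbose run). The `\` line continuations below are POSIX-shell syntax; in PowerShell use a backtick instead, or put the command on one line: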

```
pg_ctl start --pgdata <data> --wait --log <data>/start.log \
  -o "-F -p <port>" \
  -o "-c log_directory=log" \
  -o "-c timezone=UTC" \
  -o "-c work_mem=64MB" \
  -o "-c log_rotation_size=100MB" \
  -o "-c maintenance_work_mem=512MB" \
  -o "-c shared_buffers=256MB" \
  -o "-c log_timezone=UTC" \
  -o "-c log_rotation_age=1d" \
  -o "-c effective_cache_size=1GB" \
  -o "-c max_parallel_maintenance_workers=4" \
  -o "-c logging_collector=on" \
  -o "-c log_filename=postgresql-%Y-%m-%d.log"
```

Crash recovery completes in <100ms, postgres binds the port, everything works.
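Because the argument list is long and shell quoting differs between cmd, PowerShell, and POSIX shells, it can be easier to build it programmatically. The sketch below is an assumption-laden convenience, not pg0's implementation: `build_pg_ctl_start` and `PG0_SETTINGS` are hypothetical names, and the settings are copied from the command above.

```python
# Settings copied from the workaround command above.
PG0_SETTINGS = {
    "log_directory": "log",
    "timezone": "UTC",
    "work_mem": "64MB",
    "log_rotation_size": "100MB",
    "maintenance_work_mem": "512MB",
    "shared_buffers": "256MB",
    "log_timezone": "UTC",
    "log_rotation_age": "1d",
    "effective_cache_size": "1GB",
    "max_parallel_maintenance_workers": "4",
    "logging_collector": "on",
    "log_filename": "postgresql-%Y-%m-%d.log",
}


def build_pg_ctl_start(data_dir, port, settings):
    """Build the pg_ctl argument list, one -o option per server setting."""
    args = [
        "pg_ctl", "start",
        "--pgdata", data_dir,
        "--wait",
        "--log", f"{data_dir}/start.log",
        "-o", f"-F -p {port}",
    ]
    for key, value in settings.items():
        args += ["-o", f"-c {key}={value}"]
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True)` without going through a shell, provided the bundled `pg_ctl` is on `PATH` or the first element is replaced with its full path.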

## Suggested investigation

The hang happens before any of pg0's tracing spans are entered, which suggests it occurs while reading instance state, acquiring a lock, or making a Windows file-system call, rather than in the postgres invocation itself. Possible suspects:

1. A blocking Windows process-introspection call (e.g. checking if an old pid is alive)
2. A file lock or named-pipe wait inherited from a prior crash
3. A network/DNS lookup somewhere in the supervisor

The fact that **fresh instances work** but **any restart against a populated data dir hangs** suggests the diverging code path is the "I already have an `instance.json` and a data dir — let me figure out what state postgres is in" branch.

## Related closed issues

- [#6](https://github.com/vectorize-io/pg0/issues/6) — stale postmaster.pid path (data deletion). Different symptom (data loss vs silent hang), but adjacent code area. The fix for #6 may have introduced this new branch.
- [#13](https://github.com/vectorize-io/pg0/issues/13) — localized PG error matching on restart. Same surface (start an existing instance on Windows), different failure mode.

Happy to capture additional traces or test a patch.

