fix(manager): detect existing server on startup, avoid port conflict#123
fix(manager): detect existing server on startup, avoid port conflict#123TimeToBuildBob wants to merge 5 commits intoActivityWatch:masterfrom
Conversation
Greptile SummaryThis PR fixes a startup port-conflict problem where aw-qt would attempt to launch The solution probes Key observations:
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["Module.start(testing)"] --> B{"name in\n'aw-server' /\n'aw-server-rust'?"}
B -- No --> F["Launch subprocess\nself._process = Popen(...)"]
B -- Yes --> C["_get_server_port(testing)\n(module-specific port from config)"]
C --> D["_probe_external_server(testing, timeout=1.0)\nGET /api/0/info"]
D -- "200 OK" --> E["_external_server = True\nstarted = True\nreturn (skip Popen)"]
D -- "Error / timeout" --> F
F --> G["started = True"]
H["Module.is_alive()"] --> I{"_external_server?"}
I -- Yes --> J["_probe_external_server_cached(testing, max_age=1.0)"]
J -- "alive=True" --> K["return True"]
J -- "alive=False" --> L["_external_server = False\nclear cache\nreturn False"]
I -- No --> M{"_process is None?"}
M -- Yes --> N["return False"]
M -- No --> O["poll() → returncode is None?\nreturn True/False"]
P["Module.stop()"] --> Q{"started?"}
Q -- No --> R["warn: not started"]
Q -- Yes --> S{"_external_server?"}
S -- Yes --> T["log: external, skip terminate\nreset all external-server state\nstarted = False\nreturn"]
S -- No --> U{"is_alive()?"}
U -- No --> V["warn: wasn't running"]
U -- Yes --> W["terminate() + wait()\nstarted = False"]
|
aw_qt/manager.py
Outdated
| if self.name in ("aw-server", "aw-server-rust"): | ||
| from .config import _read_server_port | ||
|
|
||
| port = _read_server_port(testing) |
There was a problem hiding this comment.
Wrong port probed when starting aw-server
_read_server_port() in config.py always reads the aw-server-rust config first and returns that port if set, regardless of which module is being started. This means that when self.name == "aw-server" (the Python server), if aw-server-rust has a custom port (e.g. 5700) set in its own config, the probe will hit port 5700 instead of aw-server's actual port (5600).
If aw-server-rust happens to be running on port 5700, aw-server will be incorrectly flagged as _external_server = True, skipping its own launch — even though nothing is bound to aw-server's real port. Watchers that connect to aw-server's port will then find nothing listening.
The module-specific helpers (_read_server_rust_port / _read_aw_server_port) should be used here instead:
from .config import _read_server_rust_port, _read_aw_server_port
default_port = 5666 if testing else 5600
if self.name == "aw-server-rust":
port = _read_server_rust_port(testing) or default_port
else: # aw-server (Python)
port = _read_aw_server_port(testing) or default_port
aw_qt/manager.py
Outdated
| if self.name in ("aw-server", "aw-server-rust"): | ||
| from .config import _read_server_port | ||
|
|
||
| port = _read_server_port(testing) |
There was a problem hiding this comment.
Wrong port probed when starting aw-server
_read_server_port() in config.py always reads the aw-server-rust config first and returns that port if set, regardless of which module is being started. This means that when self.name == "aw-server" (the Python server), if aw-server-rust has a custom port (e.g. 5700) set in its own config, the probe will hit port 5700 instead of aw-server's actual port (5600).
If aw-server-rust happens to be running on port 5700, aw-server will be incorrectly flagged as _external_server = True, skipping its own launch — even though nothing is bound to aw-server's real port. Watchers that connect to aw-server's port will then find nothing listening.
The module-specific helpers (_read_server_rust_port / _read_aw_server_port) should be used here instead:
from .config import _read_server_rust_port, _read_aw_server_port
default_port = 5666 if testing else 5600
if self.name == "aw-server-rust":
port = _read_server_rust_port(testing) or default_port
else: # aw-server (Python)
port = _read_aw_server_port(testing) or default_port
aw_qt/manager.py
Outdated
| if self._external_server: | ||
| # We don't own this process; assume it's alive (managed externally) | ||
| return True |
There was a problem hiding this comment.
Silent failure if external server crashes
is_alive() returns True unconditionally for external servers without actually re-probing the port. This has two downstream effects:
Manager.get_unexpected_stops()filters forx.started and not x.is_alive(). Sinceis_alive()never returnsFalsehere, a crashed external server is invisible to that check — no warning, no restart attempt, and the tray icon permanently shows the module as "running".stop_all()callsmodule.stop()onis_alive()modules. For an external server,stop()correctly skips termination, but the unconditionalTruefromis_alive()means this path is always taken, even after the server has already died on its own.
Consider probing the port in is_alive() for external servers (with a very short timeout and cached result) so that at least get_unexpected_stops() can surface the failure to the user:
def is_alive(self) -> bool:
if self._external_server:
try:
port = _read_server_port(self._testing) # cache testing flag on self
urllib.request.urlopen(f"http://localhost:{port}/api/0/info", timeout=0.5)
return True
except (urllib.error.URLError, OSError):
return False
...
aw_qt/manager.py
Outdated
| try: | ||
| urllib.request.urlopen( | ||
| f"http://localhost:{port}/api/0/info", timeout=1 | ||
| ) |
There was a problem hiding this comment.
Blocking HTTP call on startup thread may freeze UI
urllib.request.urlopen(..., timeout=1) is a synchronous call. If the TCP connection has to wait for the full 1-second timeout (e.g. the port is open but the process is slow to respond), this will block whichever thread calls autostart() — typically the main Qt thread — causing the tray icon to freeze during startup.
Consider running the probe in a threading.Thread or using a non-blocking socket.connect_ex() check first to gate the HTTP request, so the timeout only fires when truly needed:
import socket
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
s.settimeout(0.2)
if s.connect_ex(("localhost", port)) != 0:
pass # port not open, skip HTTP probe entirely
# else: proceed with urlopen|
@TimeToBuildBob Address comments |
When aw-server is already running (e.g. managed by systemd, Docker, or started manually for debugging), aw-qt previously failed to start its own instance with an "Address already in use" error, leaving the user confused and requiring a manual "Restart" click from the tray icon. This change adds a pre-startup check for server modules (aw-server and aw-server-rust): before launching a subprocess, aw-qt probes the expected port. If the server is already responding, it marks the module as started (external) and skips the subprocess launch entirely. - `Module._external_server` flag tracks this state - `is_alive()` returns True for external servers (not our process to track) - `stop()` skips termination for external servers (not ours to stop) Fixes the UX issue reported in ActivityWatch#113: users running aw-server via systemd or alongside another process no longer see startup failures.
aa51ef0 to
5af82d8
Compare
|
Addressed the review comments in Changes:
Validation:
@greptileai review |
aw_qt/manager.py
Outdated
| try: | ||
| urllib.request.urlopen(f"http://localhost:{port}/api/0/info", timeout=0.2) | ||
| except (urllib.error.URLError, OSError): | ||
| return False | ||
| return True |
There was a problem hiding this comment.
HTTP response not closed — resource leak
urllib.request.urlopen returns an http.client.HTTPResponse object that holds an open socket. Not closing it will emit ResourceWarning: unclosed socket in Python and can exhaust file descriptors if is_alive() is polled frequently (every N seconds by the tray). The response object supports the context-manager protocol; wrap it in a with statement:
| try: | |
| urllib.request.urlopen(f"http://localhost:{port}/api/0/info", timeout=0.2) | |
| except (urllib.error.URLError, OSError): | |
| return False | |
| return True | |
| try: | |
| with urllib.request.urlopen( | |
| f"http://localhost:{port}/api/0/info", timeout=0.2 | |
| ): | |
| pass | |
| except (urllib.error.URLError, OSError): | |
| return False | |
| return True |
| def is_alive(self) -> bool: | ||
| if self._external_server: | ||
| # We don't own this process, so re-probe the server instead. | ||
| alive = self._probe_external_server(self._external_server_testing) | ||
| if not alive: | ||
| self._external_server = False | ||
| self._external_server_testing = False | ||
| return alive |
There was a problem hiding this comment.
Repeated blocking HTTP probe inside is_alive() can stall the Qt event loop
is_alive() is called on every polling cycle (e.g. get_unexpected_stops() fires from the tray's timer). For each external-server module this now makes a blocking urlopen call with a 0.2 s timeout. When the external server is down every single check takes 0.2 s — and the call happens on the Qt main thread (same thread that drives autostart() and the tray timer).
Unlike the one-off probe in start(), this one is repeated on every liveness check. Consider caching the result with a short TTL (e.g. 5 s) or moving the probe off the Qt thread, so a dead external server doesn't impose a guaranteed 0.2 s stall per polling interval:
import time
_PROBE_CACHE_TTL = 5.0 # seconds
def is_alive(self) -> bool:
if self._external_server:
now = time.monotonic()
if now - getattr(self, "_last_probe_time", 0) >= _PROBE_CACHE_TTL:
alive = self._probe_external_server(self._external_server_testing)
self._last_probe_time = now
self._last_probe_result = alive
if not alive:
self._external_server = False
self._external_server_testing = False
return alive
return self._last_probe_result
...|
Addressed the latest Greptile follow-up in Changes:
Validation:
@greptileai review |
| with urllib.request.urlopen( | ||
| f"http://localhost:{port}/api/0/info", timeout=0.2 | ||
| ): |
There was a problem hiding this comment.
Startup probe timeout too short — may miss a loaded server
_probe_external_server uses timeout=0.2 for both the one-shot startup check (called from start()) and the repeated liveness probe (called from is_alive() via _probe_external_server_cached).
The 0.2 s limit makes sense for liveness polls to avoid stalling the Qt main thread. However, the startup probe in start() fires only once per session and blocks just as long either way. If the external server is momentarily loaded or slow (e.g., JVM / Python startup warmup), the probe can time out and return False, causing aw-qt to attempt launching a second instance — triggering the exact "address already in use" error this PR is intended to fix.
Consider using a separate, larger timeout for the startup case. One clean approach is a dedicated parameter on _probe_external_server:
def _probe_external_server(self, testing: bool, timeout: float = 0.2) -> bool:
port = self._get_server_port(testing)
if port is None:
return False
try:
with urllib.request.urlopen(
f"http://localhost:{port}/api/0/info", timeout=timeout
):
return True
except (urllib.error.URLError, OSError):
return FalseThen in start():
if self._probe_external_server(testing, timeout=1.0):This keeps the liveness path at 0.2 s while giving the startup probe a fighting chance on a loaded system.
|
Addressed the latest Greptile timeout follow-up in Changes:
Validation:
@greptileai review |
|
Addressed the latest review gaps in Changes:
Validation:
This should close the remaining Greptile concerns around determinism + external-server lifecycle coverage. |
|
@greptileai review |
Problem
When aw-server is already running (managed by systemd, Docker, or started manually for debugging), aw-qt fails to start its own instance with an "Address already in use" error. The tray icon then shows the module as failed, requiring the user to click "Restart" — which is confusing and non-obvious.
Reported in #113 (and related to #110).
Solution
Add a pre-startup check for server modules (
aw-serverandaw-server-rust): before launching a subprocess, aw-qt probes the configured port. If the server is already responding, it marks the module as started (external) and skips launching its own subprocess.Changes to
manager.pyModule._external_serverflag — tracks whether the module is using an already-running external server instanceModule.start()— probeshttp://localhost:{port}/api/0/infobefore starting; uses existing instance if already runningModule.is_alive()— returnsTruefor external servers (we don't own the process)Module.stop()— skips termination for external servers (not ours to stop)Uses
_read_server_port()fromconfig.pyto respect user-configured ports (not hardcoded 5600).Behavior
Testing
aw-client'swait_for_start()which probes the same endpointCloses #113