feat(sync): add AWSync client for pushing events to aw-sync-server (#105)
TimeToBuildBob wants to merge 1 commit into ActivityWatch:master
Conversation
Adds `aw_client/sync.py` with `AWSync`, a lightweight client that incrementally pushes local ActivityWatch bucket events to a self-hosted aw-sync-server (https://github.com/TimeToBuildBob/aw-sync-server).

Key design:

- Uses `ActivityWatchClient` for the local AW instance
- Talks to the sync server with raw `requests` + Bearer token auth
- Persists a per-bucket high-water mark to `~/.config/activitywatch/aw-sync-state.json` so that re-runs only upload new events (incremental sync)
- Handles bucket creation on the sync server automatically
- Errors per bucket are caught and returned as `-1` so one bad bucket doesn't abort the whole sync

9 unit tests cover: happy path, empty-bucket skip, prefix filter, state persistence, incremental `since=` arg, error handling, missing state file, existing remote bucket not recreated, auth header present.
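The "raw `requests` + Bearer token" pattern described above can be sketched as follows. This is an illustration, not the PR's code: the endpoint path mirrors the aw-server bucket API mentioned in the description, the function name is hypothetical, and `urllib` stands in for `requests` only to keep the sketch dependency-free.

```python
from urllib.request import Request


def build_bucket_list_request(server_url: str, api_key: str) -> Request:
    # Sketch of the auth pattern the PR describes: a plain HTTP GET to the
    # sync server's bucket endpoint, authenticated with a Bearer token header.
    return Request(
        f"{server_url}/api/0/buckets/",
        headers={"Authorization": f"Bearer {api_key}"},
    )


req = build_bucket_list_request("https://sync.example.com", "secret")
print(req.full_url)
print(req.get_header("Authorization"))
```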
@greptileai review
Greptile Summary

This PR introduces `AWSync`, a client for incrementally pushing local ActivityWatch bucket events to a self-hosted aw-sync-server.

Key findings:

- Zero-duration events can be re-uploaded on the next sync (high-water mark boundary issue)
- The remote bucket list is fetched once per bucket instead of once per `sync()` run
- The default state-file path hardcodes `~/.config`, which is not cross-platform
Confidence Score: 3/5
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant AWSync
    participant LocalAW as ActivityWatchClient (local)
    participant StateFile as aw-sync-state.json
    participant SyncServer as aw-sync-server
    Caller->>AWSync: sync(bucket_filter?)
    AWSync->>LocalAW: get_buckets()
    LocalAW-->>AWSync: {bucket_id: info, ...}
    loop for each matching bucket
        AWSync->>StateFile: _load_state() / read since timestamp
        AWSync->>LocalAW: get_events(bucket_id, start=since)
        LocalAW-->>AWSync: [Event, ...]
        AWSync->>SyncServer: GET /api/0/buckets/ (Auth: Bearer)
        SyncServer-->>AWSync: {existing buckets}
        alt bucket not on server
            AWSync->>SyncServer: POST /api/0/buckets/{id} (create)
            SyncServer-->>AWSync: 201 Created
        end
        AWSync->>SyncServer: POST /api/0/buckets/{id}/events (upload)
        SyncServer-->>AWSync: 200 OK
        AWSync->>StateFile: _save_state() / write new high-water mark
    end
    AWSync-->>Caller: {bucket_id: events_uploaded, ...}
```
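The flow in the diagram can be condensed into a runnable Python sketch. Class and method names follow the diagram, but the bodies and the in-memory stand-ins for the local client and sync server are simplified illustrations, not the PR's implementation:

```python
from datetime import datetime, timedelta, timezone


class AWSyncSketch:
    """Condensed version of the diagram's flow; not the PR's real code."""

    def __init__(self, local, server, state):
        self.local, self.server, self.state = local, server, state

    def sync(self, bucket_filter=None):
        results = {}
        remote = self.server.get_buckets()              # GET /api/0/buckets/
        for bucket_id in self.local.get_buckets():
            if bucket_filter and not bucket_id.startswith(bucket_filter):
                continue
            since = self.state.get(bucket_id)           # stored high-water mark
            events = self.local.get_events(bucket_id, start=since)
            if not events:
                results[bucket_id] = 0
                continue
            if bucket_id not in remote:
                self.server.create_bucket(bucket_id)    # POST /api/0/buckets/{id}
            self.server.post_events(bucket_id, events)  # POST .../{id}/events
            self.state[bucket_id] = max(                # advance the mark
                e["timestamp"] + (e["duration"] or timedelta(0)) for e in events
            )
            results[bucket_id] = len(events)
        return results


# Minimal in-memory stand-ins for the local client and sync server
class FakeLocal:
    def __init__(self, buckets):
        self.buckets = buckets

    def get_buckets(self):
        return self.buckets

    def get_events(self, bucket_id, start=None):
        return [e for e in self.buckets[bucket_id]
                if start is None or e["timestamp"] >= start]


class FakeServer:
    def __init__(self):
        self.buckets = {}

    def get_buckets(self):
        return self.buckets

    def create_bucket(self, bucket_id):
        self.buckets[bucket_id] = []

    def post_events(self, bucket_id, events):
        self.buckets[bucket_id].extend(events)


event = {"timestamp": datetime(2024, 1, 1, tzinfo=timezone.utc),
         "duration": timedelta(seconds=5)}
local = FakeLocal({"aw-watcher-window_x": [event]})
server = FakeServer()
syncer = AWSyncSketch(local, server, {})
first = syncer.sync()
print(first)  # {'aw-watcher-window_x': 1}
```

A second `syncer.sync()` call returns `{'aw-watcher-window_x': 0}`, since the stored high-water mark now excludes the already-uploaded event.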
Last reviewed commit: 5ca7e48
```python
latest: datetime = max(
    e.timestamp + (e.duration or timedelta(0)) for e in events
)
self._state[bucket_id] = latest.isoformat()
```
Duplicate uploads for zero-duration events
The high-water mark is computed as `e.timestamp + (e.duration or timedelta(0))`. Because `timedelta(0)` is falsy in Python, the `or` yields `timedelta(0)` whether `e.duration` is `None` or zero, so both cases behave identically: when the latest event has a zero (or absent) duration, `latest` equals that event's timestamp exactly.
On the next sync, get_events(bucket_id, start=since) fetches events with timestamp >= since. Since that zero-duration event's timestamp == since, it gets included in the response again and is re-uploaded, causing a duplicate on the server.
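The boundary condition is easy to reproduce standalone. In this minimal sketch, events are simplified to dicts rather than real aw-core `Event` objects:

```python
from datetime import datetime, timedelta, timezone

events = [
    {"timestamp": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
     "duration": timedelta(seconds=30)},
    # zero-duration event: the problematic case
    {"timestamp": datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc),
     "duration": timedelta(0)},
]

# High-water mark as currently computed in the PR
latest = max(e["timestamp"] + (e["duration"] or timedelta(0)) for e in events)

# Next sync: get_events(start=since) is inclusive, i.e. timestamp >= since,
# so the zero-duration event is fetched (and uploaded) again
refetched_before = [e for e in events if e["timestamp"] >= latest]

# With the 1 µs bump, only strictly-later events match
bumped = latest + timedelta(microseconds=1)
refetched_after = [e for e in events if e["timestamp"] >= bumped]

print(len(refetched_before), len(refetched_after))  # 1 0
```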
The fix is to advance the marker by one microsecond (the minimum representable tick) so that strictly-later events are fetched next time:
```diff
 latest: datetime = max(
     e.timestamp + (e.duration or timedelta(0)) for e in events
 )
+# Add 1 µs so a zero-duration event is not re-fetched on the next sync
+latest += timedelta(microseconds=1)
 self._state[bucket_id] = latest.isoformat()
+self._save_state()
```
Or more cleanly, keep the import at module level and apply the offset inline:

```python
latest: datetime = max(
    e.timestamp + (e.duration or timedelta(0)) for e in events
) + timedelta(microseconds=1)
self._state[bucket_id] = latest.isoformat()
self._save_state()
```

```python
remote_buckets = self._get_remote_buckets()
if bucket_id not in remote_buckets:
    self._ensure_remote_bucket(bucket_id, event_type, hostname)
```
Remote bucket list fetched once per bucket, not once per sync
`_get_remote_buckets()` is called inside `sync_bucket()`, which runs once for every bucket during a single `sync()` run. For N local buckets this sends N identical GET requests to the sync server, even though the list of remote buckets does not change between iterations of the same `sync()` call.
The fix is to fetch remote buckets once in sync() and pass the result down to sync_bucket():
```python
# In sync():
remote_buckets = self._get_remote_buckets()
for bucket_id, info in buckets.items():
    ...
    count = self.sync_bucket(bucket_id, info, remote_buckets=remote_buckets)
```

```python
# sync_bucket signature:
def sync_bucket(
    self,
    bucket_id: str,
    bucket_info: dict,
    remote_buckets: Optional[Dict[str, dict]] = None,
) -> int:
    ...
    if remote_buckets is None:
        remote_buckets = self._get_remote_buckets()
    if bucket_id not in remote_buckets:
        self._ensure_remote_bucket(bucket_id, event_type, hostname)
```

This also keeps `sync_bucket()` usable as a standalone method (with a fresh fetch) while being efficient when called from `sync()`.
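The saving is easy to verify with a counting stub. The classes below are hypothetical stand-ins for the sync server and the refactored loop, not the PR's real code:

```python
class CountingServer:
    """Stub sync server that counts bucket-list fetches."""

    def __init__(self):
        self.list_calls = 0
        self.buckets = {"existing": {}}

    def get_buckets(self):
        self.list_calls += 1
        return dict(self.buckets)

    def create_bucket(self, bucket_id):
        self.buckets[bucket_id] = {}


def sync_all(server, local_bucket_ids):
    # Fetch the remote bucket list once per sync() and reuse it per bucket
    remote = server.get_buckets()
    for bucket_id in local_bucket_ids:
        if bucket_id not in remote:
            server.create_bucket(bucket_id)
            remote[bucket_id] = {}


server = CountingServer()
sync_all(server, ["a", "b", "existing", "c"])
print(server.list_calls)  # 1, regardless of how many buckets were synced
```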
```python
logger = logging.getLogger(__name__)

_DEFAULT_STATE_FILE = Path.home() / ".config" / "activitywatch" / "aw-sync-state.json"
```
Hardcoded ~/.config path is not cross-platform
Path.home() / ".config" / "activitywatch" is XDG-style and works on Linux/macOS, but on Windows it resolves to C:\Users\<user>\.config\activitywatch\, which is not the conventional location for config files there (normally %APPDATA%\activitywatch\).
The existing client.py already uses from aw_core.dirs import get_data_dir for a platform-aware data path. If aw_core.dirs exposes a get_config_dir() (or a state-directory equivalent), it should be used here to stay consistent with how ActivityWatch manages paths across platforms.
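If `aw_core.dirs` turns out not to expose a config-dir helper, a stdlib-only fallback could look like the sketch below. The directory names follow common platform conventions and are assumptions, not ActivityWatch's canonical layout:

```python
import os
import sys
from pathlib import Path


def default_state_file(app: str = "activitywatch") -> Path:
    """Platform-aware location for aw-sync-state.json (sketch)."""
    if sys.platform == "win32":
        # %APPDATA%\activitywatch\ (falls back to the usual Roaming path)
        base = Path(os.environ.get("APPDATA", str(Path.home() / "AppData" / "Roaming")))
    elif sys.platform == "darwin":
        # ~/Library/Application Support/activitywatch/
        base = Path.home() / "Library" / "Application Support"
    else:
        # XDG: $XDG_CONFIG_HOME, defaulting to ~/.config/activitywatch/
        base = Path(os.environ.get("XDG_CONFIG_HOME", str(Path.home() / ".config")))
    return base / app / "aw-sync-state.json"


print(default_state_file())
```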
Suggested change:

```diff
-_DEFAULT_STATE_FILE = Path.home() / ".config" / "activitywatch" / "aw-sync-state.json"
+_DEFAULT_STATE_FILE = Path(get_data_dir("aw-client")) / "aw-sync-state.json"
```
(Adjust to the correct aw_core.dirs helper once confirmed; the important point is to avoid hardcoding ~/.config.)
No, we're not doing this "sync server" idea
Summary

Adds `aw_client/sync.py` with `AWSync`, a client that incrementally pushes local ActivityWatch bucket events to a self-hosted aw-sync-server.

This is the client-side complement to the aw-sync-server proof-of-concept.
Design
- Uses `ActivityWatchClient` for the local AW instance
- Talks to the sync server with raw `requests` + an `Authorization: Bearer <api_key>` header (the sync server exposes the same bucket + events API as aw-server)
- Persists a per-bucket high-water mark to `~/.config/activitywatch/aw-sync-state.json` so re-runs only upload new events (incremental sync)
- Errors per bucket are caught and returned as `-1` so one failure doesn't abort the whole sync

Usage
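The original usage snippet did not survive extraction. As a stand-in, here is a runnable sketch of the incremental-state mechanics described above; the flat bucket-id to ISO-timestamp JSON layout and the bucket id are illustrative assumptions, not the file's confirmed schema:

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "aw-sync-state.json"

    # First run: no state file yet, so every bucket syncs from the beginning
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    first_run_since = state.get("aw-watcher-window_myhost")  # None -> full sync

    # After uploading, persist the high-water mark (last event's end time)
    end = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc) + timedelta(seconds=30)
    state["aw-watcher-window_myhost"] = end.isoformat()
    state_file.write_text(json.dumps(state, indent=2))

    # Next run: read the mark back and request only newer events
    reloaded = json.loads(state_file.read_text())
    since = datetime.fromisoformat(reloaded["aw-watcher-window_myhost"])

print(first_run_since, since.isoformat())
```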
Files changed
- `aw_client/sync.py`: new `AWSync` class (~130 lines)
- `aw_client/__init__.py`: export `AWSync`
- `tests/test_sync.py`: 9 unit tests (all passing)

Privacy note
This sync is intended for self-hosted use only. The aw-sync-server README includes a clear Privacy / Design Philosophy section stating it is not intended for third-party hosting of raw user data.
Closes / relates to: ActivityWatch/activitywatch#35