
feat(sync): add AWSync client for pushing events to aw-sync-server#105

Closed
TimeToBuildBob wants to merge 1 commit into ActivityWatch:master from TimeToBuildBob:feat/aw-sync-client

Conversation

@TimeToBuildBob (Contributor)

Summary

Adds aw_client/sync.py with AWSync — a client that incrementally pushes local ActivityWatch bucket events to a self-hosted aw-sync-server.

This is the client-side complement to the aw-sync-server proof-of-concept.

Design

  • Uses the existing ActivityWatchClient for the local AW instance
  • Talks to the sync server with requests + Authorization: Bearer <api_key> header
    (the sync server exposes the same bucket+events API as aw-server)
  • Persists a per-bucket high-water mark to ~/.config/activitywatch/aw-sync-state.json
    so re-runs only upload new events (incremental sync)
  • Bucket creation on the sync server is automatic (idempotent)
  • Per-bucket errors are caught and returned as -1 so one failure doesn't abort the whole sync
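
The state-file mechanics described above might look roughly like this (an illustrative sketch, not the PR's actual implementation; the `load_state`/`save_state` names are assumptions):

```python
import json
from pathlib import Path
from typing import Dict

STATE_FILE = Path.home() / ".config" / "activitywatch" / "aw-sync-state.json"

def load_state(path: Path = STATE_FILE) -> Dict[str, str]:
    """Return the per-bucket high-water marks ({bucket_id: ISO timestamp})."""
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        return {}  # first run: no state yet, sync everything

def save_state(state: Dict[str, str], path: Path = STATE_FILE) -> None:
    """Persist the marks so the next run only uploads newer events."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2))
```

On each run, the stored timestamp for a bucket is passed as the `start=` argument when fetching local events, which is what makes the sync incremental.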

Usage

from aw_client.sync import AWSync

sync = AWSync(
    sync_url="http://localhost:5667",  # aw-sync-server base URL
    api_key="my-api-key",
)

# Sync all buckets
results = sync.sync()  # {"aw-watcher-window_host": 42, ...}

# Sync only window-activity buckets
results = sync.sync(bucket_filter="aw-watcher-window")

Files changed

  • aw_client/sync.py — new AWSync class (~130 lines)
  • aw_client/__init__.py — export AWSync
  • tests/test_sync.py — 9 unit tests (all passing)

Privacy note

This sync is intended for self-hosted use only. The aw-sync-server README includes a clear Privacy / Design Philosophy section stating it is not intended for third-party hosting of raw user data.

Closes / relates to: ActivityWatch/activitywatch#35

Commit message:

Adds `aw_client/sync.py` with `AWSync` — a lightweight client that
incrementally pushes local ActivityWatch bucket events to a self-hosted
aw-sync-server (https://github.com/TimeToBuildBob/aw-sync-server).

Key design:
- Uses `ActivityWatchClient` for the local AW instance
- Talks to the sync server with raw `requests` + Bearer token auth
- Persists a per-bucket high-water mark to `~/.config/activitywatch/aw-sync-state.json`
  so that re-runs only upload new events (incremental sync)
- Handles bucket creation on the sync server automatically
- Errors per bucket are caught and returned as -1 so one bad bucket
  doesn't abort the whole sync

9 unit tests cover: happy path, empty bucket skip, prefix filter,
state persistence, incremental since= arg, error handling, missing
state file, existing remote bucket not recreated, auth header present.
@TimeToBuildBob (Contributor, Author)

@greptileai review

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR introduces aw_client/sync.py with AWSync, a new client class that incrementally pushes local ActivityWatch bucket events to a self-hosted aw-sync-server. It integrates cleanly with the existing ActivityWatchClient, uses a JSON state file to track per-bucket high-water marks for incremental syncs, and is exported from the package __init__.py. Tests cover the core paths.

Key findings:

  • Duplicate-upload bug for zero-duration events (logic): The high-water mark is computed as max(e.timestamp + (e.duration or timedelta(0))). Because timedelta(0) is falsy in Python, a zero-duration or None-duration event produces latest = e.timestamp. On the next sync get_events(start=since) returns events with timestamp >= since, re-fetching (and re-uploading) that event.
  • _get_remote_buckets() called N times per sync (logic): The method is invoked inside sync_bucket(), so a single sync() call with N buckets issues N identical GET requests to the sync server. Remote bucket data should be fetched once in sync() and passed down.
  • Non-portable default state file path (style): _DEFAULT_STATE_FILE is hard-coded to ~/.config/activitywatch/…, which is not the conventional location on Windows. The existing codebase already uses aw_core.dirs.get_data_dir() for platform-aware paths and the same pattern should be followed here.
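
The zero-duration bug can be reproduced in isolation (a minimal sketch; plain dicts stand in for aw-core Event objects):

```python
from datetime import datetime, timedelta, timezone

# timedelta(0) is falsy, so `duration or timedelta(0)` collapses both the
# None and the zero-duration cases to a zero offset.
assert not timedelta(0)

ts = datetime(2026, 3, 11, 12, 0, tzinfo=timezone.utc)
events = [{"timestamp": ts, "duration": timedelta(0)}]

# High-water mark as computed in the PR: equals the event's own timestamp.
latest = max(e["timestamp"] + (e["duration"] or timedelta(0)) for e in events)
assert latest == ts

# A `timestamp >= since` query on the next run therefore re-fetches the event.
assert ts >= latest

# With the reviewer's one-microsecond advance, it no longer matches.
since = latest + timedelta(microseconds=1)
assert not (ts >= since)
```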

Confidence Score: 3/5

  • Safe to merge for an experimental/PoC feature, but the duplicate-upload bug and per-bucket redundant network calls should be addressed before production use.
  • The implementation is well-structured and has solid test coverage, but contains a confirmed logic bug (zero-duration events get uploaded twice due to the high-water mark equalling the event timestamp) and a performance issue (remote bucket list fetched N times instead of once per sync). The cross-platform state path is a secondary concern. None of these are security issues, but the duplicate-upload bug could silently corrupt data on the sync server.
  • aw_client/sync.py — specifically the high-water mark computation (lines 155–158) and the _get_remote_buckets() call inside sync_bucket() (line 148).

Important Files Changed

  • aw_client/sync.py — New AWSync class that pushes local AW events to a self-hosted sync server; has a duplicate-upload bug for zero-duration events and fetches remote buckets N times per sync call instead of once.
  • aw_client/__init__.py — Exports AWSync alongside ActivityWatchClient; change is minimal and correct.
  • tests/test_sync.py — 9 unit tests covering happy path, filtering, incremental sync, error handling, and auth; does not include a test for the zero-duration duplicate-upload edge case.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant AWSync
    participant LocalAW as ActivityWatchClient (local)
    participant StateFile as aw-sync-state.json
    participant SyncServer as aw-sync-server

    Caller->>AWSync: sync(bucket_filter?)
    AWSync->>LocalAW: get_buckets()
    LocalAW-->>AWSync: {bucket_id: info, ...}

    loop for each matching bucket
        AWSync->>StateFile: _load_state() / read since timestamp
        AWSync->>LocalAW: get_events(bucket_id, start=since)
        LocalAW-->>AWSync: [Event, ...]

        AWSync->>SyncServer: GET /api/0/buckets/ (Auth: Bearer)
        SyncServer-->>AWSync: {existing buckets}

        alt bucket not on server
            AWSync->>SyncServer: POST /api/0/buckets/{id} (create)
            SyncServer-->>AWSync: 201 Created
        end

        AWSync->>SyncServer: POST /api/0/buckets/{id}/events (upload)
        SyncServer-->>AWSync: 200 OK

        AWSync->>StateFile: _save_state() / write new high-water mark
    end

    AWSync-->>Caller: {bucket_id: events_uploaded, ...}

Last reviewed commit: 5ca7e48

Comment on lines +155 to +158:

    latest: datetime = max(
        e.timestamp + (e.duration or timedelta(0)) for e in events
    )
    self._state[bucket_id] = latest.isoformat()

Duplicate uploads for zero-duration events

The high-water mark is computed as e.timestamp + (e.duration or timedelta(0)). In Python, timedelta(0) is falsy, so both the None and the zero-duration cases collapse to a zero offset: when the latest event has a zero (or absent) duration, latest equals that event's timestamp.

On the next sync, get_events(bucket_id, start=since) fetches events with timestamp >= since. Since that zero-duration event's timestamp == since, it gets included in the response again and is re-uploaded, causing a duplicate on the server.

The fix is to advance the marker by one microsecond (the minimum representable tick) so that strictly-later events are fetched next time:

Suggested change

Before:

    latest: datetime = max(
        e.timestamp + (e.duration or timedelta(0)) for e in events
    )
    self._state[bucket_id] = latest.isoformat()

After:

    latest: datetime = max(
        e.timestamp + (e.duration or timedelta(0)) for e in events
    )
    # Add 1 µs so a zero-duration event is not re-fetched on the next sync
    from datetime import timedelta as _td
    latest += _td(microseconds=1)
    self._state[bucket_id] = latest.isoformat()
    self._save_state()

Or more cleanly, keep the import at module level and apply the offset:

latest: datetime = max(
    e.timestamp + (e.duration or timedelta(0)) for e in events
) + timedelta(microseconds=1)
self._state[bucket_id] = latest.isoformat()
self._save_state()

Comment on lines +148 to +150:

    remote_buckets = self._get_remote_buckets()
    if bucket_id not in remote_buckets:
        self._ensure_remote_bucket(bucket_id, event_type, hostname)

Remote bucket list fetched once per bucket, not once per sync

_get_remote_buckets() is called inside sync_bucket(), which is called for every bucket during a single sync() run. This means N identical GET requests hit the sync server for N local buckets — even though the list of remote buckets doesn't change between iterations within the same sync() call.

The fix is to fetch remote buckets once in sync() and pass the result down to sync_bucket():

# In sync():
remote_buckets = self._get_remote_buckets()
for bucket_id, info in buckets.items():
    ...
    count = self.sync_bucket(bucket_id, info, remote_buckets=remote_buckets)

# sync_bucket signature:
def sync_bucket(self, bucket_id: str, bucket_info: dict, remote_buckets: Optional[Dict[str, dict]] = None) -> int:
    ...
    if remote_buckets is None:
        remote_buckets = self._get_remote_buckets()
    if bucket_id not in remote_buckets:
        self._ensure_remote_bucket(bucket_id, event_type, hostname)

This also makes sync_bucket() still usable as a standalone method (with a fresh fetch) while being efficient when called from sync().


    logger = logging.getLogger(__name__)

    _DEFAULT_STATE_FILE = Path.home() / ".config" / "activitywatch" / "aw-sync-state.json"

Hardcoded ~/.config path is not cross-platform

Path.home() / ".config" / "activitywatch" is XDG-style and works on Linux/macOS, but on Windows it resolves to C:\Users\<user>\.config\activitywatch\, which is not the conventional location for config files there (normally %APPDATA%\activitywatch\).

The existing client.py already uses from aw_core.dirs import get_data_dir for a platform-aware data path. If aw_core.dirs exposes a get_config_dir() (or a state-directory equivalent), it should be used here to stay consistent with how ActivityWatch manages paths across platforms.

Suggested change

Before:

    _DEFAULT_STATE_FILE = Path.home() / ".config" / "activitywatch" / "aw-sync-state.json"

After:

    _DEFAULT_STATE_FILE = Path(get_data_dir("aw-client")) / "aw-sync-state.json"

(Adjust to the correct aw_core.dirs helper once confirmed; the important point is to avoid hardcoding ~/.config.)
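
For illustration only, a stdlib-only platform-aware fallback could look like this (a sketch; the PR should prefer whatever helper aw_core.dirs actually provides, and `default_state_file` is a hypothetical name):

```python
import os
from pathlib import Path

def default_state_file() -> Path:
    """Pick a conventional per-platform config location for the state file."""
    if os.name == "nt":
        # Windows: %APPDATA%\activitywatch\
        base = Path(os.environ.get("APPDATA", str(Path.home() / "AppData" / "Roaming")))
    else:
        # Linux/macOS: $XDG_CONFIG_HOME, falling back to ~/.config
        base = Path(os.environ.get("XDG_CONFIG_HOME", str(Path.home() / ".config")))
    return base / "activitywatch" / "aw-sync-state.json"
```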

@ErikBjare (Member)

No, we're not doing this "sync server" idea

@ErikBjare ErikBjare closed this Mar 11, 2026