Skip to content

[codex] Debounce usage API cache writes#503

Merged
robinebers merged 6 commits into
robinebers:mainfrom
zergzorg:codex/debounce-usage-cache
May 25, 2026
Merged

[codex] Debounce usage API cache writes#503
robinebers merged 6 commits into
robinebers:mainfrom
zergzorg:codex/debounce-usage-cache

Conversation

@zergzorg
Copy link
Copy Markdown
Contributor

@zergzorg zergzorg commented May 24, 2026

Summary

Part of #487.

This PR coalesces usage-api-cache.json persistence so a burst of successful probe results does not rewrite the full cache file once per provider result.

Changes:

  • keep local HTTP API snapshots updated in memory immediately;
  • mark cache state dirty using a generation counter;
  • start at most one debounced cache writer thread at a time;
  • flush the latest snapshot map after a short debounce window;
  • flush any pending cache write during normal Tauri app shutdown (ExitRequested / Exit);
  • keep failed disk writes pending so a temp write/rename failure does not lose the latest successful snapshot;
  • serialize cache writes with a small write lock so an exit-time flush cannot be overwritten by an older debounced writer snapshot;
  • add test coverage for non-synchronous writes, eventual persistence of close updates, exit-time flush, and failed-write retry.

This is based on current main. It is independent from #498/#499/#500/#501/#502 and can merge separately after review.

Verification

  • cargo test local_http_api::cache::tests
  • cargo test --lib -- --skip ccusage_timeout_kills_descendant_and_closes_pipes
  • bun run build
  • git diff --check

Note: the skipped Rust test is the existing unrelated flaky ccusage_timeout_kills_descendant_and_closes_pipes, handled separately in #501.

Part of #487.

@github-actions github-actions Bot added the rust Pull requests that update rust code label May 24, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 1 file

Re-trigger cubic

@zergzorg
Copy link
Copy Markdown
Contributor Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 353e347d23

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src-tauri/src/local_http_api/cache.rs
@zergzorg
Copy link
Copy Markdown
Contributor Author

zergzorg commented May 24, 2026

Addressed the Codex review feedback in the latest push:

  • added local_http_api::flush_cache() for an explicit synchronous pending-cache flush;
  • call it from the Tauri RunEvent::ExitRequested / RunEvent::Exit path so normal shutdown persists the latest successful probe results even inside the debounce window;
  • added a cache write mutex so a shutdown flush cannot be overwritten by an older in-flight debounced writer snapshot;
  • added a regression test for synchronous flush of a pending debounced write.

Verified locally with:

  • cargo test local_http_api::cache::tests
  • cargo test --lib -- --skip ccusage_timeout_kills_descendant_and_closes_pipes
  • bun run build
  • git diff --check

@zergzorg
Copy link
Copy Markdown
Contributor Author

zergzorg commented May 24, 2026

Follow-up after another pass over the debounce change:

  • fixed a remaining persistence edge case where a failed temp write/rename could mark the dirty generation as flushed before the cache was actually persisted;
  • save_cache() now returns a result, and flushed_generation only advances after a successful write;
  • failed writes stay pending so the debounced writer can retry instead of losing the last successful snapshot;
  • added failed_cache_write_stays_pending_for_retry as regression coverage.

Verified locally:

  • cargo test local_http_api::cache::tests
  • cargo test --lib -- --skip ccusage_timeout_kills_descendant_and_closes_pipes
  • bun run build
  • git diff --check

@codex review

@zergzorg
Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Hooray!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown
Owner

@robinebers robinebers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! 👋 This is Rob's AI reviewer. Thanks for the PR!

The concurrency design here is solid — the "one worker thread + write serializer lock + dirty/flushed generation counter" handoff is correct, the exit hook covers both ExitRequested and Exit, and the failed-write retry logic (keeping flushed_generation un-bumped) is the right call. Tests are well-scoped and use #[serial]. One thing to fix before merge:

  1. The retry loop has no backoff or log throttling. If a write fails permanently (disk full, app_data_dir removed mid-run, permissions broken), the worker will spin every 500ms forever and emit a log::warn! on every iteration — that's ~120 warn lines per minute, indefinitely. Worth either:

    • exponential backoff on consecutive failures (e.g. 500ms → 1s → 2s → cap at 30s), and/or
    • log only on transitions (first failure, then every Nth, then on recovery)

    Even something simple like a consecutive-failure counter that throttles the log would be enough.

Everything else (the exit-flush wiring, the generation counter semantics, test coverage including the failed-write retry test) looks good.

@zergzorg
Copy link
Copy Markdown
Contributor Author

Addressed the requested retry-loop fix in the latest push:

  • added exponential backoff for repeated cache-write failures, capped at 30s;
  • throttled warning logs to the first failure and power-of-two retries;
  • log an info recovery message after a later successful write;
  • kept explicit flush_cache() behavior unchanged for shutdown flushes;
  • added focused coverage for the retry delay/log-throttle helpers and updated the failed-write retry test.

Verified locally with:

  • rustfmt --check src/local_http_api/cache.rs
  • cargo test local_http_api::cache::tests
  • bun run build
  • git diff --check

Note: full cargo fmt --check still reports pre-existing formatting drift in unrelated files: config.rs, local_http_api/server.rs, webkit_config.rs. I only checked the touched Rust file to avoid unrelated churn.

@zergzorg
Copy link
Copy Markdown
Contributor Author

Follow-up: updated the branch with current origin/main using a normal merge commit, no force-push.

Re-verified after the merge:

  • cargo test local_http_api::cache::tests
  • bun run build

The Vite Node 22.0.0 warning is still present, but the build exits successfully.

Copy link
Copy Markdown
Owner

@robinebers robinebers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backoff + log throttling addressed cleanly. Tests cover the helpers and the retry path. LGTM.

@robinebers robinebers merged commit 810b122 into robinebers:main May 25, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants