perf: move inference logging to background writer thread #43

isaacbmiller wants to merge 1 commit into main from
Conversation
Replace synchronous per-request file I/O with a queue-based background writer that batches up to 50 entries per flush. Uses a dedicated sentinel object (not `None`) for clean shutdown. `log_inference()` now never blocks the caller.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 43d4161ec1
```python
key = (str(logs_dir), program_name)
if key not in grouped:
    grouped[key] = []
grouped[key].append(json.dumps(log_entry))
```
Catch serialization errors before they kill the log writer

`_flush_entries()` now calls `json.dumps(log_entry)` outside the try block, so any non-JSON-serializable value in `inputs`, `outputs`, or `lm_calls` raises `TypeError` and terminates the background thread. After that, all subsequent inference logs are silently dropped because nothing drains `_log_queue`. The previous synchronous path caught this exception and continued serving requests, so this is a regression: a single malformed payload can break metrics/training log collection for the rest of the process.
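One way to restore the old resilience, sketched under assumptions: the queue carries `(logs_dir, program_name, log_entry)` tuples, and each program logs to a `.jsonl` file (the file naming here is illustrative, not the PR's actual scheme). Serializing each entry inside its own try/except means a bad payload is skipped instead of killing the thread.

```python
import json
import os

def _flush_entries(entries):
    """Write a batch of (logs_dir, program_name, log_entry) tuples."""
    grouped = {}
    for logs_dir, program_name, log_entry in entries:
        try:
            line = json.dumps(log_entry)
        except (TypeError, ValueError):
            # Skip non-JSON-serializable payloads; keep the writer alive.
            continue
        grouped.setdefault((str(logs_dir), program_name), []).append(line)
    for (dir_str, program_name), lines in grouped.items():
        # File naming is an assumption for this sketch.
        path = os.path.join(dir_str, f"{program_name}.jsonl")
        with open(path, "a") as f:
            f.write("\n".join(lines) + "\n")
```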
Dropping -- `log_inference` does one sub-millisecond file append per multi-second LLM request, well under 0.1% of request latency. The background writer adds 130 lines of queue/thread/sentinel/batching complexity to save microseconds.
Split from #38 (4/5)
Changes
Replace synchronous per-request file I/O in `log_inference()` with a queue-based background writer thread. The writer batches up to 50 entries per flush and groups writes by program to minimize file opens.

Before
Every `log_inference()` call opened the log file, wrote one JSON line, and closed it -- on the request thread/event loop. A minimal sketch of this path follows.
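For contrast, a sketch of that synchronous path (the file naming is assumed; the per-call error handling reflects the reviewer's note above that this path caught serialization errors and kept serving):

```python
import json
import os

def log_inference(logs_dir, program_name, log_entry):
    # Open, append one JSON line, close -- all on the request thread.
    path = os.path.join(str(logs_dir), f"{program_name}.jsonl")
    try:
        with open(path, "a") as f:
            f.write(json.dumps(log_entry) + "\n")
    except (TypeError, ValueError):
        pass  # a bad payload was dropped, but requests kept flowing
```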
After

`log_inference()` enqueues a tuple and returns immediately. A daemon thread drains the queue and writes in batches, as in the sketch below.
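A minimal sketch of the enqueue/drain pattern, assuming the queued tuple is `(logs_dir, program_name, log_entry)` and reusing the `_flush_entries()` sketch above; this is illustrative, not the PR's exact code:

```python
import queue
import threading

_MAX_BATCH = 50               # up to 50 entries per flush
_log_queue: queue.Queue = queue.Queue()
_SENTINEL = object()          # shutdown marker; never a user value

def log_inference(logs_dir, program_name, log_entry):
    # Never blocks the caller: enqueue and return immediately.
    _log_queue.put((logs_dir, program_name, log_entry))

def _writer_loop():
    while True:
        item = _log_queue.get()          # block until work (or sentinel) arrives
        if item is _SENTINEL:
            _drain_remaining()           # see the shutdown sketch below
            break
        batch = [item]
        while len(batch) < _MAX_BATCH:   # opportunistically grow the batch
            try:
                nxt = _log_queue.get_nowait()
            except queue.Empty:
                break
            if nxt is _SENTINEL:
                _log_queue.put(_SENTINEL)  # re-queue so the outer loop exits
                break
            batch.append(nxt)
        _flush_entries(batch)

_writer_thread = threading.Thread(target=_writer_loop, daemon=True)
```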
Simplifications vs original PR

- `_SENTINEL = object()` instead of `None` -- prevents accidental shutdown if `None` is enqueued
- `_drain_remaining()` helper for clarity (see the sketch after this list)
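Continuing the sketch, one way the sentinel and `_drain_remaining()` might fit together (`start_writer()`/`stop_writer()` are hypothetical names; the real lifecycle hooks live in `src/dspy_cli/server/app.py`):

```python
def _drain_remaining():
    # After the sentinel is seen, flush whatever is still queued.
    leftover = []
    while True:
        try:
            item = _log_queue.get_nowait()
        except queue.Empty:
            break
        if item is not _SENTINEL:
            leftover.append(item)
    if leftover:
        _flush_entries(leftover)

def start_writer():
    _writer_thread.start()

def stop_writer(timeout: float = 5.0):
    # A dedicated object() means an accidentally enqueued None
    # can never be mistaken for the shutdown signal.
    _log_queue.put(_SENTINEL)
    _writer_thread.join(timeout)
```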
Files changed

- `src/dspy_cli/server/logging.py` -- background writer + queue
- `src/dspy_cli/server/app.py` -- start/stop writer in lifecycle

Testing
All 123 existing tests pass.