
perf: move inference logging to background writer thread #43

Closed
isaacbmiller wants to merge 1 commit into main from isaac/perf-background-log-writer

Conversation

@isaacbmiller
Contributor

Split from #38 (4/5)

Changes

Replace synchronous per-request file I/O in log_inference() with a queue-based background writer thread. The writer batches up to 50 entries per flush and groups writes by program to minimize file opens.

Before

Every log_inference() call opened the log file, wrote one JSON line, and closed it -- on the request thread/event loop.

After

log_inference() enqueues a tuple and returns immediately. A daemon thread drains the queue and writes in batches.
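A minimal sketch of the pattern (names such as _log_queue, _BATCH_SIZE, _flush_entries, start_writer, and the per-program .jsonl layout are illustrative placeholders, not necessarily the PR's exact identifiers):

import json
import queue
import threading
from pathlib import Path

_log_queue: queue.Queue = queue.Queue()
_BATCH_SIZE = 50

def log_inference(logs_dir: Path, program_name: str, log_entry: dict) -> None:
    # Enqueue and return immediately; the request thread never touches the filesystem.
    _log_queue.put((logs_dir, program_name, log_entry))

def _flush_entries(batch: list) -> None:
    # Group entries by (logs_dir, program) so each flush opens every log file at most once.
    grouped: dict = {}
    for logs_dir, program_name, entry in batch:
        grouped.setdefault((str(logs_dir), program_name), []).append(json.dumps(entry))
    for (logs_dir, program_name), lines in grouped.items():
        path = Path(logs_dir) / f"{program_name}.jsonl"
        with path.open("a") as f:
            f.write("\n".join(lines) + "\n")

def _writer_loop() -> None:
    while True:
        batch = [_log_queue.get()]        # block until at least one entry arrives
        while len(batch) < _BATCH_SIZE:   # then opportunistically fill the batch
            try:
                batch.append(_log_queue.get_nowait())
            except queue.Empty:
                break
        _flush_entries(batch)

def start_writer() -> threading.Thread:
    # Called from the app lifecycle; daemon=True so the thread never blocks interpreter exit.
    t = threading.Thread(target=_writer_loop, daemon=True, name="inference-log-writer")
    t.start()
    return t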

Simplifications vs original PR

  • Uses _SENTINEL = object() instead of None -- prevents accidental shutdown if None is enqueued
  • Extracted _drain_remaining() helper for clarity (both are sketched below)
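Continuing the sketch above: the writer loop treats a dequeued item that is _SENTINEL (compared by identity, not equality) as the exit signal, calls _drain_remaining(), and returns. stop_writer is a hypothetical helper name for the shutdown hook wired into the app lifecycle.

_SENTINEL = object()  # unique identity: a stray None on the queue can never trigger shutdown

def _drain_remaining() -> None:
    # Flush whatever is still queued so entries are not lost at shutdown.
    remaining = []
    while True:
        try:
            item = _log_queue.get_nowait()
        except queue.Empty:
            break
        if item is not _SENTINEL:
            remaining.append(item)
    if remaining:
        _flush_entries(remaining)

def stop_writer(writer_thread: threading.Thread) -> None:
    _log_queue.put(_SENTINEL)       # wake the writer and signal exit
    writer_thread.join(timeout=5)   # bounded wait so app shutdown cannot hang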

Files changed

  • src/dspy_cli/server/logging.py -- background writer + queue
  • src/dspy_cli/server/app.py -- start/stop writer in lifecycle

Testing

All 123 existing tests pass.

Replace synchronous per-request file I/O with a queue-based background
writer that batches up to 50 entries per flush. Uses a dedicated sentinel
object (not None) for clean shutdown. log_inference() now never blocks
the caller.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 43d4161ec1


key = (str(logs_dir), program_name)
if key not in grouped:
    grouped[key] = []
grouped[key].append(json.dumps(log_entry))

P1: Catch serialization errors before they kill the log writer

_flush_entries() now calls json.dumps(log_entry) outside the try block, so any non-JSON-serializable value in inputs, outputs, or lm_calls raises TypeError and terminates the background thread; after that, all subsequent inference logs are silently dropped because nothing drains _log_queue. The previous synchronous path caught this exception and continued serving requests, so this is a regression that can break metrics/training log collection for the rest of the process after a single malformed payload.
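One illustrative way to address this (not the PR's code): serialize each entry defensively so a single bad payload is degraded or dropped instead of raising out of the writer loop. _serialize_entry is a hypothetical helper name.

import json
import logging
from typing import Optional

logger = logging.getLogger(__name__)

def _serialize_entry(log_entry) -> Optional[str]:
    try:
        return json.dumps(log_entry)
    except TypeError:
        try:
            return json.dumps(log_entry, default=str)  # lossy but always serializable
        except Exception:
            logger.warning("Dropping non-serializable inference log entry")
            return None

The grouping loop would then append only non-None results, so the background thread keeps draining _log_queue after a malformed payload.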


@isaacbmiller
Contributor Author

Dropping -- log_inference does one sub-millisecond file append per multi-second LLM request. The background writer adds 130 lines of queue/thread/sentinel/batching complexity to save microseconds.
