codexUI can become extremely slow and grow to 6-10+ GB cgroup memory (mostly file cache) against large ~/.codex histories #46

@Jackten

Description

Summary

codexUI can become extremely slow over time while codexui.service grows to multi-GB cgroup memory on a host that is otherwise healthy.

In my case, the service repeatedly climbed into the ~6-10 GB range and the UI became sluggish enough that users complained it was barely usable. Restarting the service immediately restored responsiveness and dropped memory back to tens of MB.

The important part: this did not look like a classic anonymous-heap leak in Node or the Codex app-server process. The footprint was overwhelmingly file-backed cache inside the service cgroup.

Actual behavior

During bad windows, I observed:

  • codexUI becomes server-side slow / laggy
  • codexui.service memory grows to roughly:
    • 6.0-6.5 GiB with peak 7.6-8.2 GiB
    • later recurrence up to about 9.6 GiB with peak around 11.2 GiB
  • memory.stat showed mostly file cache, not anon heap
    • one sample: anon 389500928 bytes (~371 MiB), file 6010097664 bytes (~5.6 GiB)
    • later sample: anon ~253 MB, file ~9.27 GB
  • host itself stayed healthy
    • load average around 1
    • free -h still showed ~24 GiB available
    • no codexUI cgroup OOM events
  • local HTTP health still returned 200 OK even while the UI felt degraded
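The anon-vs-file split above came from the service cgroup's memory.stat. For anyone trying to confirm the same pattern, a minimal parser sketch (the cgroup path in the usage comment assumes a default system-level unit, which may differ per install):

```javascript
// Parse a cgroup v2 memory.stat blob into its anon vs file-backed totals;
// run against the service cgroup to check whether the footprint is page cache.
function splitMemoryStat(statText) {
  const fields = Object.fromEntries(
    statText.trim().split("\n").map((line) => {
      const [key, value] = line.split(" ");
      return [key, Number(value)];
    })
  );
  return { anon: fields.anon ?? 0, file: fields.file ?? 0 };
}

// On the affected host (path is an assumption for a system-level unit):
//   const text = require("node:fs").readFileSync(
//     "/sys/fs/cgroup/system.slice/codexui.service/memory.stat", "utf8");
//   console.log(splitMemoryStat(text));
```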

After restart:

  • responsiveness returned immediately
  • MemoryCurrent dropped to ~24-31 MB
  • TasksCurrent dropped to 11

Expected behavior

codexUI should stay responsive even with a large Codex history, and it should not accumulate multi-GB file-backed memory inside the service cgroup just from normal UI usage.

Why I think this is codexUI-side, not just “my host is low on RAM”

The host had plenty of free memory and low load during the incident. The memory growth was isolated to codexui.service, and restart of just that service reliably fixed the symptom.

Also, the visible long-lived processes did not account for the full footprint:

  • Node process was modest
  • Codex app-server RSS was only a few hundred MB
  • but service cgroup memory was 6-10+ GB

That points much more toward repeated large file reads / cache accumulation than a straightforward process heap leak.

Evidence from code: likely hot paths

I do not want to overclaim the exact single root cause, but there are several concrete code paths in 0.1.78 that look capable of causing this with a large ~/.codex history.

1) Thread search builds a full in-memory index by reading every thread with includeTurns: true

In the packaged backend bundle:

  • dist-cli/index.js:5402-5459
  • dist-cli/index.js:6085-6096

loadAllThreadsForSearch(appServer):

  • pages through thread/list
  • collects every thread
  • then calls thread/read with includeTurns: true for every thread
  • extracts message text into searchableText
  • stores the docs in a map for later search

Relevant shape:

const response = await appServer.rpc("thread/list", { archived: false, limit: 100, sortKey: "updated_at", cursor })
...
const readResponse = await appServer.rpc("thread/read", { threadId: thread.id, includeTurns: true })
const messageText = extractThreadMessageText(readResponse)

Then /codex-api/thread-search does:

const index = await getThreadSearchIndex()

which lazily builds that full index on first non-empty search.

If a user has a large history, this can become very expensive.
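To make the cost model concrete, here is a sketch of the pattern as I read it from the bundle. The names mirror the quoted code, but the control flow and response shapes are my reconstruction, not the exact source:

```javascript
// Stand-in for the real extractor: concatenate message text from turns.
// (The actual response shape in codexUI may differ; this is an assumption.)
function extractThreadMessageText(threadReadResponse) {
  return (threadReadResponse.turns ?? [])
    .map((turn) => turn.text ?? "")
    .join("\n");
}

let indexPromise = null;

// Lazily built once; the first non-empty search pays the full cost of
// reading every thread in the history, turns included.
function getThreadSearchIndex(appServer) {
  if (!indexPromise) indexPromise = loadAllThreadsForSearch(appServer);
  return indexPromise;
}

async function loadAllThreadsForSearch(appServer) {
  const docs = new Map();
  let cursor;
  do {
    const page = await appServer.rpc("thread/list", {
      archived: false, limit: 100, sortKey: "updated_at", cursor,
    });
    for (const thread of page.threads ?? []) {
      // One extra thread/read RPC per thread, with includeTurns: true —
      // the index ends up holding the message text of the whole history.
      const full = await appServer.rpc("thread/read", {
        threadId: thread.id, includeTurns: true,
      });
      docs.set(thread.id, { searchableText: extractThreadMessageText(full) });
    }
    cursor = page.nextCursor;
  } while (cursor);
  return docs;
}
```

With N threads this is N+1 RPCs and roughly the total message text of the history held resident, which is why the first search against a large ~/.codex is so expensive.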

2) Thread live-state / file-change endpoints read the session log file from disk as full UTF-8 text

In the packaged backend bundle:

  • dist-cli/index.js:5526-5545
  • dist-cli/index.js:5563-5584

Both endpoints call thread/read with includeTurns: true, then if a session path exists they do:

const sessionLogRaw = await readFile3(sessionPath, "utf8")

That means a large rollout/session log can be slurped into memory as a full string.

3) Account routes appear to trigger background refresh / inspection work

In the packaged backend bundle:

  • dist-cli/index.js:768-845

GET /codex-api/accounts calls:

const state = await scheduleAccountsBackgroundRefresh()

Separately, during one slowdown window, the service cgroup also contained hot Chrome/Playwright-like children and temp scripts such as:

  • /tmp/chatgpt-account-check.mjs
  • /tmp/chatgpt-assets.mjs

So there may also be account/inspection-related churn contributing to the bloat.

My environment had large history files, which likely amplified the problem

Examples from the same machine:

  • ~/.codex/sessions/.../rollout-...jsonl at 704,253,891 bytes
  • another rollout file at 79,693,927 bytes
  • ~/.codex/log/codex-tui.log at 76,790,328 bytes

With files of that size, endpoints that repeatedly call thread/read with includeTurns: true and/or readFile(sessionPath, "utf8") can plausibly generate a lot of cache churn.

Minimal reproduction direction

I do not yet have a fully minimal repro script, but the live pattern is repeatable enough that I believe the bug is real:

  1. Use codexUI on a machine with a large existing ~/.codex history.
  2. Browse threads / use the UI normally for a while.
  3. If thread search is used, trigger a non-empty /codex-api/thread-search query.
  4. Observe codexui.service memory over time.
  5. The service can become much slower while cgroup memory climbs into multi-GB territory, mostly as file, not anon.

Suggested fix directions

A few ideas that seem worth investigating:

  1. Avoid full-history thread search indexing
  • do not call thread/read(includeTurns=true) for every thread just to build the first search index
  • cap or page aggressively
  • consider incremental indexing or title/preview-only search by default
  2. Avoid reading huge session logs as one full string
  • stream, tail, or bound reads
  • put size limits around readFile(sessionPath, "utf8")
  3. Add backpressure / bounds for large-history installations
  • size-aware limits
  • cache eviction
  • guardrails when ~/.codex contains very large rollout files
  4. Add observability
  • log when full thread-index builds start / finish
  • log how many threads and total bytes were read
  • log when account inspection spawns helper processes
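As a rough illustration of directions 1, 3, and 4 combined, a size-aware index build might look like the following. All names, caps, and the title-only fallback are illustrative assumptions, not existing codexUI code:

```javascript
// Illustrative sketch only: caps, names, and the title-only fallback are
// assumptions, not code from the codexUI bundle.
async function buildBoundedIndex(
  appServer,
  { maxFullReads = 500, maxBytes = 64 * 1024 * 1024 } = {}
) {
  const docs = new Map();
  let cursor, fullReads = 0, bytes = 0;
  do {
    const page = await appServer.rpc("thread/list", {
      archived: false, limit: 100, sortKey: "updated_at", cursor,
    });
    for (const thread of page.threads ?? []) {
      if (fullReads >= maxFullReads || bytes >= maxBytes) {
        // Budget exhausted: index the title only instead of every turn.
        docs.set(thread.id, { searchableText: thread.title ?? "" });
        continue;
      }
      const full = await appServer.rpc("thread/read", {
        threadId: thread.id, includeTurns: true,
      });
      const text = (full.turns ?? []).map((t) => t.text ?? "").join("\n");
      fullReads += 1;
      bytes += text.length;
      docs.set(thread.id, { searchableText: text });
    }
    cursor = page.nextCursor;
  } while (cursor);
  // Observability: report how much work the build actually did.
  console.log(
    `thread-search index: ${docs.size} threads, ${fullReads} full reads, ${bytes} bytes`
  );
  return docs;
}
```

Threads past the budget still appear in the index (searchable by title), so results degrade gracefully instead of the first search stalling the service.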

Why this is not a duplicate

I searched the existing issues before filing. The only remotely adjacent one I found is about naming/title sources, not service slowness or multi-GB memory/file-cache growth.

I did not find an existing issue specifically covering:

  • codexUI slowing down badly over time
  • codexui.service climbing to 6-10+ GB
  • memory dominated by file rather than anon
  • likely interaction with large ~/.codex histories and full-thread reads

Environment

  • codexUI: 0.1.78
  • Codex CLI: 0.120.0
  • Node: v22.22.2
  • OS: Linux netcup-clawd 6.17.0-14-generic x86_64 (Ubuntu)
  • service start shape:
    • /usr/bin/node .../dist-cli/index.js --port 15999 --password <set> --no-open --no-tunnel --no-login /root/.hermes/workspace

Current healthy-ish post-restart state, for comparison:

  • MemoryCurrent=582430720
  • MemoryPeak=1503264768
  • TasksCurrent=43
  • systemctl --user status showed Memory: 555M (peak: 1.3G)

Bottom line

This looks like a real performance bug in codexUI when pointed at a large Codex history:

  • the service becomes perceptibly slow
  • cgroup memory can balloon into multi-GB territory
  • the footprint is mostly file-backed cache, not classic process heap
  • restart fixes it immediately, but the issue recurs

Happy to provide more data if helpful, but I wanted to get the core evidence and likely code hotspots upstream first.
