codexUI can become extremely slow and grow to 6-10+ GB cgroup memory (mostly file cache) against large ~/.codex histories #46

@Jackten

Description

Summary

codexUI can become extremely slow over time while codexui.service grows to multi-GB cgroup memory on a host that is otherwise healthy.

In my case, the service repeatedly climbed into the ~6-10 GB range and the UI became sluggish enough that users complained it was barely usable. Restarting the service immediately restored responsiveness and dropped memory back to tens of MB.

The important part: this did not look like a classic anonymous-heap leak in Node or the Codex app-server process. The footprint was overwhelmingly file-backed cache inside the service cgroup.

Actual behavior

During bad windows, I observed:

  • codexUI becomes server-side slow / laggy
  • codexui.service memory grows to roughly:
    • 6.0-6.5 GiB with peak 7.6-8.2 GiB
    • later recurrence up to about 9.6 GiB with peak around 11.2 GiB
  • memory.stat showed mostly file cache, not anon heap
    • one sample: anon 389500928 bytes (~371 MiB), file 6010097664 bytes (~5.6 GiB)
    • later sample: anon ~253 MB, file ~9.27 GB
  • host itself stayed healthy
    • load average around 1
    • free -h still showed ~24 GiB available
    • no codexUI cgroup OOM events
  • local HTTP health still returned 200 OK even while the UI felt degraded
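The anon-vs-file split above came from the service cgroup's memory.stat. For anyone trying to confirm the same pattern, a minimal parser sketch (the cgroup path in the usage comment assumes a default system-level unit, which may differ per install):

```javascript
// Parse a cgroup v2 memory.stat blob into its anon vs file-backed totals;
// run against the service cgroup to check whether the footprint is page cache.
function splitMemoryStat(statText) {
  const fields = Object.fromEntries(
    statText.trim().split("\n").map((line) => {
      const [key, value] = line.split(" ");
      return [key, Number(value)];
    })
  );
  return { anon: fields.anon ?? 0, file: fields.file ?? 0 };
}

// On the affected host (path is an assumption for a system-level unit):
//   const text = require("node:fs").readFileSync(
//     "/sys/fs/cgroup/system.slice/codexui.service/memory.stat", "utf8");
//   console.log(splitMemoryStat(text));
```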

After restart:

  • responsiveness returned immediately
  • MemoryCurrent dropped to ~24-31 MB
  • TasksCurrent dropped to 11

Expected behavior

codexUI should stay responsive even with a large Codex history, and it should not accumulate multi-GB file-backed memory inside the service cgroup just from normal UI usage.

Why I think this is codexUI-side, not just “my host is low on RAM”

The host had plenty of free memory and low load during the incident. The memory growth was isolated to codexui.service, and restart of just that service reliably fixed the symptom.

Also, the visible long-lived processes did not account for the full footprint:

  • Node process was modest
  • Codex app-server RSS was only a few hundred MB
  • but service cgroup memory was 6-10+ GB

That points much more toward repeated large file reads / cache accumulation than a straightforward process heap leak.

Evidence from code: likely hot paths

I do not want to overclaim the exact single root cause, but there are several concrete code paths in 0.1.78 that look capable of causing this with a large ~/.codex history.

1) Thread search builds a full in-memory index by reading every thread with includeTurns: true

In the packaged backend bundle:

  • dist-cli/index.js:5402-5459
  • dist-cli/index.js:6085-6096

loadAllThreadsForSearch(appServer):

  • pages through thread/list
  • collects every thread
  • then calls thread/read with includeTurns: true for every thread
  • extracts message text into searchableText
  • stores the docs in a map for later search

Relevant shape:

const response = await appServer.rpc("thread/list", { archived: false, limit: 100, sortKey: "updated_at", cursor })
...
const readResponse = await appServer.rpc("thread/read", { threadId: thread.id, includeTurns: true })
const messageText = extractThreadMessageText(readResponse)

Then /codex-api/thread-search does:

const index = await getThreadSearchIndex()

which lazily builds that full index on first non-empty search.

If a user has a large history, this can become very expensive.
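To make the cost model concrete, here is a sketch of the pattern as I read it from the bundle. The names mirror the quoted code, but the control flow and response shapes are my reconstruction, not the exact source:

```javascript
// Stand-in for the real extractor: concatenate message text from turns.
// (The actual response shape in codexUI may differ; this is an assumption.)
function extractThreadMessageText(threadReadResponse) {
  return (threadReadResponse.turns ?? [])
    .map((turn) => turn.text ?? "")
    .join("\n");
}

let indexPromise = null;

// Lazily built once; the first non-empty search pays the full cost of
// reading every thread in the history, turns included.
function getThreadSearchIndex(appServer) {
  if (!indexPromise) indexPromise = loadAllThreadsForSearch(appServer);
  return indexPromise;
}

async function loadAllThreadsForSearch(appServer) {
  const docs = new Map();
  let cursor;
  do {
    const page = await appServer.rpc("thread/list", {
      archived: false, limit: 100, sortKey: "updated_at", cursor,
    });
    for (const thread of page.threads ?? []) {
      // One extra thread/read RPC per thread, with includeTurns: true —
      // the index ends up holding the message text of the whole history.
      const full = await appServer.rpc("thread/read", {
        threadId: thread.id, includeTurns: true,
      });
      docs.set(thread.id, { searchableText: extractThreadMessageText(full) });
    }
    cursor = page.nextCursor;
  } while (cursor);
  return docs;
}
```

With N threads this is N+1 RPCs and roughly the total message text of the history held resident, which is why the first search against a large ~/.codex is so expensive.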

2) Thread live-state / file-change endpoints read the session log file from disk as full UTF-8 text

In the packaged backend bundle:

  • dist-cli/index.js:5526-5545
  • dist-cli/index.js:5563-5584

Both endpoints call thread/read with includeTurns: true, then if a session path exists they do:

const sessionLogRaw = await readFile3(sessionPath, "utf8")

That means a large rollout/session log can be slurped into memory as a full string.

3) Account routes appear to trigger background refresh / inspection work

In the packaged backend bundle:

  • dist-cli/index.js:768-845

GET /codex-api/accounts calls:

const state = await scheduleAccountsBackgroundRefresh()

Separately, during one slowdown window, the service cgroup also contained hot Chrome/Playwright-like children and temp scripts such as:

  • /tmp/chatgpt-account-check.mjs
  • /tmp/chatgpt-assets.mjs

So there may also be account/inspection-related churn contributing to the bloat.

My environment had large history files, which likely amplified the problem

Examples from the same machine:

  • ~/.codex/sessions/.../rollout-...jsonl at 704,253,891 bytes
  • another rollout file at 79,693,927 bytes
  • ~/.codex/log/codex-tui.log at 76,790,328 bytes

With files of that size, endpoints that repeatedly call thread/read with includeTurns: true and/or readFile(sessionPath, "utf8") can plausibly generate a lot of cache churn.

Minimal reproduction direction

I do not yet have a fully minimal repro script, but the live pattern is repeatable enough that I believe the bug is real:

  1. Use codexUI on a machine with a large existing ~/.codex history.
  2. Browse threads / use the UI normally for a while.
  3. If thread search is used, trigger a non-empty /codex-api/thread-search query.
  4. Observe codexui.service memory over time.
  5. The service can become much slower while cgroup memory climbs into multi-GB territory, mostly as file, not anon.

Suggested fix directions

A few ideas that seem worth investigating:

  1. Avoid full-history thread search indexing
  • do not call thread/read(includeTurns=true) for every thread just to build the first search index
  • cap or page aggressively
  • consider incremental indexing or title/preview-only search by default
  2. Avoid reading huge session logs as one full string
  • stream, tail, or bound reads
  • put size limits around readFile(sessionPath, "utf8")
  3. Add backpressure / bounds for large-history installations
  • size-aware limits
  • cache eviction
  • guardrails when ~/.codex contains very large rollout files
  4. Add observability
  • log when full thread-index builds start / finish
  • log how many threads and total bytes were read
  • log when account inspection spawns helper processes
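As a rough illustration of directions 1, 3, and 4 combined, a size-aware index build might look like the following. All names, caps, and the title-only fallback are illustrative assumptions, not existing codexUI code:

```javascript
// Illustrative sketch only: caps, names, and the title-only fallback are
// assumptions, not code from the codexUI bundle.
async function buildBoundedIndex(
  appServer,
  { maxFullReads = 500, maxBytes = 64 * 1024 * 1024 } = {}
) {
  const docs = new Map();
  let cursor, fullReads = 0, bytes = 0;
  do {
    const page = await appServer.rpc("thread/list", {
      archived: false, limit: 100, sortKey: "updated_at", cursor,
    });
    for (const thread of page.threads ?? []) {
      if (fullReads >= maxFullReads || bytes >= maxBytes) {
        // Budget exhausted: index the title only instead of every turn.
        docs.set(thread.id, { searchableText: thread.title ?? "" });
        continue;
      }
      const full = await appServer.rpc("thread/read", {
        threadId: thread.id, includeTurns: true,
      });
      const text = (full.turns ?? []).map((t) => t.text ?? "").join("\n");
      fullReads += 1;
      bytes += text.length;
      docs.set(thread.id, { searchableText: text });
    }
    cursor = page.nextCursor;
  } while (cursor);
  // Observability: report how much work the build actually did.
  console.log(
    `thread-search index: ${docs.size} threads, ${fullReads} full reads, ${bytes} bytes`
  );
  return docs;
}
```

Threads past the budget still appear in the index (searchable by title), so results degrade gracefully instead of the first search stalling the service.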

Why this is not a duplicate

I searched the existing issues before filing. The only remotely adjacent one I found is about naming/title sources, not service slowness or multi-GB memory/file-cache growth.

I did not find an existing issue specifically covering:

  • codexUI slowing down badly over time
  • codexui.service climbing to 6-10+ GB
  • memory dominated by file rather than anon
  • likely interaction with large ~/.codex histories and full-thread reads

Environment

  • codexUI: 0.1.78
  • Codex CLI: 0.120.0
  • Node: v22.22.2
  • OS: Linux netcup-clawd 6.17.0-14-generic x86_64 (Ubuntu)
  • service start shape:
    • /usr/bin/node .../dist-cli/index.js --port 15999 --password <set> --no-open --no-tunnel --no-login /root/.hermes/workspace

Current healthy-ish post-restart state, for comparison:

  • MemoryCurrent=582430720
  • MemoryPeak=1503264768
  • TasksCurrent=43
  • systemctl --user status showed Memory: 555M (peak: 1.3G)

Bottom line

This looks like a real performance bug in codexUI when pointed at a large Codex history:

  • the service becomes perceptibly slow
  • cgroup memory can balloon into multi-GB territory
  • the footprint is mostly file-backed cache, not classic process heap
  • restart fixes it immediately, but the issue recurs

Happy to provide more data if helpful, but I wanted to get the core evidence and likely code hotspots upstream first.
