Dev/ida streaming memory by buzzer-re · Pull Request #2 · buzzer-re/ToCode

buzzer-re · 2026-06-10T02:33:02Z

Reliable, faster IDA exports for large databases (kernels).

Fix OOM on multi-GB .i64: worker copies off tmpfs, worker count capped by real RAM + DB size (env-tunable).
No reanalysis of an already-analyzed .i64.
Xrefs now come from the decompiler (real read/write refs) instead of a slow source-text scan.
--entropy is now opt-in; the metadata, load, and DB-copy phases now show progress.

Replace the source-text scan for cross-references with authoritative decompiler data, speed up the metadata phase, and surface its progress.

- Collect data cross-references from the backend (IDA xrefblk / r2 axtj) during analysis while the session is open, store them on ProgramAnalysis, and resolve each reference to its containing function at metadata time. Removes the O(variables x source-lines) text scan in strings.json and variables.json; xrefs are now real read/write refs.
- Add a stepped progress bar and per-step logging to the metadata phase, and copy the published database in chunks with a byte progress bar.
- Add a --entropy flag (off by default); per-section Shannon entropy is skipped unless requested, and the entropy histogram now counts at C speed. Skips the costly per-byte scan over large segments by default.
- Recover argument/local counts from the rendered summary so functions.json reports real nargs/nlocals without re-decompiling.
- Skip the IDA string-list rebuild when a reused database already has one, and drop dead disasm/decompile/summary caches and _prime.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
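As an illustration of resolving each reference to its containing function, here is a minimal sketch using a sorted index and binary search. It is not the code from this PR: `FuncRange`, `build_index`, `containing_function`, and `resolve_xrefs` are hypothetical names, and it assumes function address ranges are half-open and do not overlap.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class FuncRange(NamedTuple):
    # Hypothetical record: half-open address range [start, end) of one function.
    start: int
    end: int
    name: str


def build_index(funcs):
    """Sort functions by start address so lookups can use binary search."""
    ordered = sorted(funcs, key=lambda f: f.start)
    starts = [f.start for f in ordered]
    return starts, ordered


def containing_function(index, addr) -> Optional[str]:
    """Return the name of the function whose range contains addr, or None."""
    starts, ordered = index
    # Rightmost function whose start address is <= addr.
    pos = bisect_right(starts, addr) - 1
    if pos < 0:
        return None
    func = ordered[pos]
    return func.name if func.start <= addr < func.end else None


def resolve_xrefs(index, xrefs):
    """Map (from_addr, kind) pairs to (function_name, kind).

    References that fall outside every known function are dropped.
    """
    resolved = []
    for from_addr, kind in xrefs:
        name = containing_function(index, from_addr)
        if name is not None:
            resolved.append((name, kind))
    return resolved
```

Each lookup costs O(log F) for F functions, so resolving R references is O(R log F) rather than a scan over every variable and source line.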

A requested `--jobs N` previously bypassed the memory budget. On a large database (e.g. a kernel) each worker loads the whole `.i64`, so N workers that cannot fit in RAM get OOM-killed mid-export; the streaming pool then breaks and falls back to the slow single-session path, discarding progress.

Apply the database-size-aware IDA memory ceiling to requested counts too, log a note when the count is reduced (with the TOCODE_IDA_WORKER_MEMORY_MB override), and reflect the cap in the worker summary line. Non-IDA backends still honor the requested count.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
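The capping behavior described in this commit can be sketched roughly as follows. This is an assumption-laden illustration, not the project's code: `cap_workers` and its parameters are hypothetical, and the real implementation may compute the budget differently.

```python
import logging

log = logging.getLogger("tocode.sketch")  # hypothetical logger name


def cap_workers(requested: int, available_mb: int, per_worker_mb: int):
    """Clamp a requested worker count to what fits in available memory.

    Returns (granted, reduced). At least one worker is always granted so the
    export can still run, even if a single worker exceeds the budget.
    """
    if requested < 1:
        raise ValueError("requested must be >= 1")
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be > 0")

    fit = max(1, available_mb // per_worker_mb)
    granted = min(requested, fit)
    reduced = granted < requested
    if reduced:
        # Mention the override named in the commit message so users can tune it.
        log.warning(
            "reducing workers from %d to %d (about %d MB per worker, %d MB available); "
            "set TOCODE_IDA_WORKER_MEMORY_MB to override",
            requested, granted, per_worker_mb, available_mb,
        )
    return granted, reduced
```

For example, `cap_workers(8, 16000, 6000)` grants 2 workers and reports the reduction, while `cap_workers(2, 16000, 6000)` leaves the request untouched.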

The per-worker memory ceiling was already derived at runtime from the host's available memory and the real database size, but two heuristic constants (base overhead and database resident factor) were hardcoded. Expose them as TOCODE_IDA_WORKER_BASE_MEMORY_MB and TOCODE_IDA_DB_RESIDENT_FACTOR (same defaults) and add coverage for tuning the model.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
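One plausible shape for such an env-tunable model is sketched below. The environment variable names come from the commit messages, but everything else is an assumption: the linear formula (base overhead plus factor times database size), the default values, and the helper names `_env_float` and `per_worker_memory_mb` are illustrative and are not taken from the PR.

```python
import os

# Placeholder defaults for illustration only; the PR does not state the real values.
DEFAULT_BASE_MB = 1024.0
DEFAULT_RESIDENT_FACTOR = 2.0


def _env_float(name, default, env):
    """Read a positive float from env, falling back to default on bad input."""
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        value = float(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def per_worker_memory_mb(db_size_mb, env=None):
    """Estimate the memory one worker needs for a database of db_size_mb.

    Assumed model: base overhead + resident factor * database size, with
    TOCODE_IDA_WORKER_MEMORY_MB treated as a full override of the estimate.
    """
    env = os.environ if env is None else env
    override = _env_float("TOCODE_IDA_WORKER_MEMORY_MB", 0.0, env)
    if override > 0:
        return override
    base = _env_float("TOCODE_IDA_WORKER_BASE_MEMORY_MB", DEFAULT_BASE_MB, env)
    factor = _env_float("TOCODE_IDA_DB_RESIDENT_FACTOR", DEFAULT_RESIDENT_FACTOR, env)
    return base + factor * db_size_mb
```

Passing `env` explicitly keeps the function easy to test without mutating the process environment.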

buzzer-re and others added 3 commits June 9, 2026 23:18

buzzer-re merged commit 10b6024 into main Jun 10, 2026
3 checks passed

buzzer-re merged 3 commits into main from dev/ida-streaming-memory
