
Dev/ida streaming memory #2

Merged
buzzer-re merged 3 commits into main from dev/ida-streaming-memory
Jun 10, 2026

@buzzer-re

Owner

More reliable and faster IDA exports for large databases (e.g. kernels).

  • Fix OOM on multi-GB .i64 databases: worker DB copies are kept off tmpfs, and the worker count is capped by real RAM and DB size (env-tunable).
  • No reanalysis of an already-analyzed .i64.
  • Xrefs now come from the decompiler (real read/write refs) instead of a slow source-text scan.
  • --entropy is now opt-in, and the metadata, load, and DB-copy phases show progress.
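
A minimal sketch of opt-in per-section entropy, assuming each section is available as raw bytes keyed by its name. `shannon_entropy` and `section_entropies` are hypothetical helpers, not the project's code. Entropy is computed as H = -Σ p_i log2 p_i over the 256 possible byte values, so the result lies between 0.0 and 8.0 bits per byte.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    # Counter builds the byte histogram in one pass; in CPython its counting loop runs in C.
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def section_entropies(sections: dict[str, bytes], entropy: bool = False) -> dict[str, float]:
    """Per-section entropy, computed only when the opt-in flag is set."""
    if not entropy:
        # Default path: skip the per-byte scan over (possibly huge) segments entirely.
        return {}
    return {name: shannon_entropy(blob) for name, blob in sections.items()}
```

With the flag off, the exporter pays nothing for large segments; with it on, uniformly random data approaches 8.0 and padding or zero-filled sections approach 0.0.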

buzzer-re and others added 3 commits June 9, 2026 23:18
Replace the source-text scan for cross-references with authoritative
decompiler data, speed up the metadata phase, and surface its progress.

- Collect data cross-references from the backend (IDA xrefblk / r2 axtj)
  during analysis while the session is open, store them on
  ProgramAnalysis, and resolve each reference to its containing function
  at metadata time. This removes the O(variables × source lines) text scan
  behind strings.json and variables.json; xrefs are now real read/write refs.
- Add a stepped progress bar and per-step logging to the metadata phase,
  and copy the published database in chunks with a byte progress bar.
- Add a --entropy flag (off by default): per-section Shannon entropy is
  skipped unless requested, and the entropy histogram now counts at C
  speed. This avoids the costly per-byte scan over large segments by default.
- Recover argument/local counts from the rendered summary so
  functions.json reports real nargs/nlocals without re-decompiling.
- Skip the IDA string-list rebuild when a reused database already has
  one, and drop dead disasm/decompile/summary caches and _prime.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
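
Resolving each collected reference to its containing function can be sketched with a sorted index and binary search. This assumes xrefs are stored as hypothetical `(from_address, target_address, kind)` tuples and that functions are contiguous, non-overlapping `[start, end)` ranges (real IDA functions can have tail chunks, which this ignores). `Func`, `FunctionIndex`, and `resolve_xrefs` are illustrative names, not the project's API.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Func:
    name: str
    start: int  # inclusive
    end: int    # exclusive


class FunctionIndex:
    """Maps an address to the function whose [start, end) range contains it."""

    def __init__(self, funcs: list[Func]) -> None:
        self._funcs = sorted(funcs, key=lambda f: f.start)
        self._starts = [f.start for f in self._funcs]

    def containing(self, addr: int) -> Func | None:
        # Find the rightmost function starting at or before addr, then bounds-check it.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0 and addr < self._funcs[i].end:
            return self._funcs[i]
        return None


def resolve_xrefs(
    xrefs: list[tuple[int, int, str]], index: FunctionIndex
) -> dict[int, list[tuple[str, str]]]:
    """Group (from_addr, target_addr, kind) xrefs by target as (function, kind) pairs."""
    out: dict[int, list[tuple[str, str]]] = {}
    for frm, target, kind in xrefs:
        func = index.containing(frm)
        # References from outside any known function fall back to the raw address.
        name = func.name if func else f"0x{frm:x}"
        out.setdefault(target, []).append((name, kind))
    return out
```

Each lookup costs O(log F) for F functions, so resolving X references is O((F + X) log F) including the sort, independent of how much decompiled source text exists.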
A requested `--jobs N` previously bypassed the memory budget. On a large
database (e.g. a kernel) each worker loads the whole `.i64`, so N workers
that cannot fit in RAM get OOM-killed mid-export; the streaming pool then
breaks and falls back to the slow single-session path, discarding
progress.

Apply the database-size-aware IDA memory ceiling to requested counts too,
log a note when the count is reduced (with the TOCODE_IDA_WORKER_MEMORY_MB
override), and reflect the cap in the worker summary line. Non-IDA
backends still honor the requested count.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
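
The clamping rule described above can be sketched as follows, assuming the memory ceiling has already been computed elsewhere. `effective_worker_count`, its parameters, the log wording, and the `os.cpu_count()` fallback for non-IDA backends are all hypothetical; only `--jobs` and `TOCODE_IDA_WORKER_MEMORY_MB` come from the commit message.

```python
from __future__ import annotations

import logging
import os

log = logging.getLogger(__name__)


def effective_worker_count(requested: int | None, ida_ceiling: int, backend: str) -> int:
    """Pick the number of export workers.

    requested: the --jobs value, or None if the user did not pass one.
    ida_ceiling: how many IDA workers fit in memory (computed elsewhere).
    """
    if backend != "ida":
        # Non-IDA backends honor an explicit request; the cpu_count fallback is an assumption.
        return max(1, requested) if requested is not None else (os.cpu_count() or 1)
    ceiling = max(1, ida_ceiling)
    if requested is None:
        return ceiling
    if requested > ceiling:
        # Each IDA worker loads the whole .i64, so an oversized pool risks OOM kills.
        log.info(
            "--jobs %d reduced to %d to fit the IDA memory budget "
            "(set TOCODE_IDA_WORKER_MEMORY_MB to override the estimate)",
            requested,
            ceiling,
        )
        return ceiling
    return max(1, requested)
```

Clamping up front trades some parallelism for a pool that stays alive; an OOM-killed worker would otherwise break the streaming pool and force the slow single-session fallback.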
The per-worker memory ceiling was already derived at runtime from the
host's available memory and the real database size, but two heuristic
constants (base overhead and database resident factor) were hardcoded.

Expose them as TOCODE_IDA_WORKER_BASE_MEMORY_MB and
TOCODE_IDA_DB_RESIDENT_FACTOR (same defaults) and add coverage for tuning
the model.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
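
A sketch of a database-size-aware ceiling with the three environment overrides named above. The cost model (base overhead plus resident factor times database size), the default values, and the assumption that `TOCODE_IDA_WORKER_MEMORY_MB` replaces the whole per-worker estimate are illustrative guesses, not the project's actual formula; `ida_worker_ceiling` is a hypothetical name.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Hypothetical defaults for illustration only; not the project's actual values.
DEFAULT_BASE_MEMORY_MB = 1024.0
DEFAULT_DB_RESIDENT_FACTOR = 1.5


def _env_positive_float(env: Mapping[str, str], name: str, default: float) -> float:
    """Read a positive float from the environment; unset, invalid, or <= 0 uses default."""
    try:
        value = float(env.get(name, ""))
    except ValueError:
        return default
    return value if value > 0 else default


def ida_worker_ceiling(
    available_mb: float, db_size_mb: float, env: Mapping[str, str] | None = None
) -> int:
    """Estimate how many IDA workers fit in the available memory (at least 1)."""
    env = os.environ if env is None else env
    # Assumed semantics: a set TOCODE_IDA_WORKER_MEMORY_MB replaces the per-worker estimate.
    per_worker = _env_positive_float(env, "TOCODE_IDA_WORKER_MEMORY_MB", 0.0)
    if per_worker <= 0:
        base = _env_positive_float(
            env, "TOCODE_IDA_WORKER_BASE_MEMORY_MB", DEFAULT_BASE_MEMORY_MB
        )
        factor = _env_positive_float(
            env, "TOCODE_IDA_DB_RESIDENT_FACTOR", DEFAULT_DB_RESIDENT_FACTOR
        )
        # Each worker loads the whole database, so its cost scales with DB size.
        per_worker = base + factor * db_size_mb
    return max(1, int(available_mb // per_worker))
```

With these placeholder defaults, 16000 MB available and a 4000 MB database give 1024 + 1.5 × 4000 = 7024 MB per worker, so two workers. On Linux, `available_mb` could come from the `MemAvailable` line of /proc/meminfo.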
@buzzer-re buzzer-re merged commit 10b6024 into main Jun 10, 2026
3 checks passed