fix(retriever): correctly cap PDF extract actors to avoid CPU oversubscription by charlesbluca · Pull Request #1780 · NVIDIA/NeMo-Retriever

charlesbluca · 2026-04-02T15:04:34Z

Description

Fixes a deadlock introduced when GraphBatchIngestor replaced BatchIngestor as the default batch runtime.
Unlike the legacy path, the graph executor always passes operator classes (not instances) to map_batches, which requires ActorPoolStrategy for all nodes. That includes PDFExtractionActor, which previously used TaskPoolStrategy.
Persistent actors hold their CPU allocations continuously. With the default 12 PDF extract actors × 2 CPUs each (24 CPUs), plus ~9 CPUs from the other actors in the pipeline (DocToPdf, PDFSplit, PageElements, OCR, UDF, Embed), a single-GPU/32-CPU machine requests about 33 CPUs at startup and deadlocks: ReadBinary is immediately backpressured and no data ever flows.
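
The CPU arithmetic above can be sketched as a small budget calculation. This is only an illustration of the idea of subtracting the non-PDF CPU overhead before sizing the PDF extract pool, not the code in this PR. The function name `cap_pdf_extract_actors`, its parameters, and the floor of one actor are assumptions made for the example.

```python
def cap_pdf_extract_actors(total_cpus, cpus_per_pdf_actor, other_actor_cpus, requested_actors):
    """Return how many PDF extract actors fit in the CPU budget.

    Hypothetical helper: the names and the floor of one actor are
    assumptions for illustration, not the NeMo-Retriever implementation.
    """
    if cpus_per_pdf_actor <= 0:
        raise ValueError("cpus_per_pdf_actor must be positive")
    # CPUs left once every other persistent actor has reserved its share.
    available = total_cpus - other_actor_cpus
    # Number of whole PDF extract actors that fit in what remains.
    fit = int(available // cpus_per_pdf_actor)
    # Never exceed the requested count; keep at least one actor so the
    # stage exists (if even one does not fit, the node is still too small).
    return max(1, min(requested_actors, fit))


# Numbers from the description: 32 CPUs, 12 actors x 2 CPUs, ~9 CPUs elsewhere.
uncapped_request = 12 * 2 + 9
capped_actors = cap_pdf_extract_actors(32, 2, 9, 12)
capped_request = capped_actors * 2 + 9

print(uncapped_request)  # 33, one more CPU than the machine has
print(capped_actors)     # 11
print(capped_request)    # 31, which fits in 32 CPUs
```

With the cap applied, the total reservation stays at or below the machine's CPU count, so the upstream read stage is not starved of resources at startup.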

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
If adjusting docker-compose.yaml environment variables, have you ensured those are mimicked in the Helm values.yaml file?

charlesbluca · 2026-04-02T20:20:17Z

Superseded by #1786

charlesbluca added 2 commits April 2, 2026 14:21

Avoid oversubscribing PDF extract actors

a6ec32c

New approach - use non-PDF CPU overhead to compute PDF extract tasks

3cd265a

charlesbluca changed the title ~~(retriever) Avoid oversubscription of PDF extraction actors~~ fix(retriever): correctly cap PDF extract actors to avoid CPU oversubscription Apr 2, 2026

Merge branch 'main' into bisect-oom

e0af6a5

charlesbluca marked this pull request as ready for review April 2, 2026 15:14

charlesbluca requested review from a team as code owners April 2, 2026 15:14

charlesbluca requested a review from jperez999 April 2, 2026 15:14

charlesbluca closed this Apr 2, 2026

charlesbluca wants to merge 3 commits into NVIDIA:main from charlesbluca:bisect-oom
