
fix(retriever): correctly cap PDF extract actors to avoid CPU oversubscription#1780

Closed
charlesbluca wants to merge 3 commits into NVIDIA:main from charlesbluca:bisect-oom

@charlesbluca (Collaborator) commented Apr 2, 2026

Description

Fixes a deadlock introduced when GraphBatchIngestor replaced BatchIngestor as the default batch runtime.
Unlike the legacy path, the graph executor always passes operator classes (not instances) to map_batches. This requires ActorPoolStrategy for every node, including PDFExtractionActor, which previously ran under TaskPoolStrategy.
Persistent actors hold their CPU allocations continuously. With the default of 12 PDF extract actors at 2 CPUs each (24 CPUs), plus ~9 CPUs for the other actors in the pipeline (DocToPdf, PDFSplit, PageElements, OCR, UDF, Embed), a single-GPU, 32-CPU machine requests ~33 CPUs at startup. That request cannot be satisfied, so the pipeline deadlocks: ReadBinary is immediately backpressured and no data ever flows.
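A minimal Python sketch of the CPU arithmetic above, plus one possible way to cap the PDF extract pool so that the persistent actors fit on the node. The helper names (requested_cpus, capped_pdf_actors) and the capping rule (size the pool to whatever CPUs the other actors leave free) are hypothetical illustrations, not the actual change in this PR; the ~9 CPUs for other actors is treated as exactly 9.

```python
# Hypothetical sketch: names and the capping rule are illustrative only,
# not the retriever's actual implementation.

def requested_cpus(pdf_actors: int, cpus_per_pdf_actor: int, other_actor_cpus: int) -> int:
    """Total CPUs reserved by persistent actors when the pipeline starts."""
    return pdf_actors * cpus_per_pdf_actor + other_actor_cpus


def capped_pdf_actors(default_actors: int, cpus_per_pdf_actor: int,
                      other_actor_cpus: int, total_cpus: int) -> int:
    """Largest PDF extract pool, up to the default, that fits in the CPUs
    left after the other actors are placed. Always returns at least 1 so the
    stage can still run; on a very small node that one actor may still not fit."""
    available = total_cpus - other_actor_cpus
    fit = available // cpus_per_pdf_actor
    return max(1, min(default_actors, fit))


if __name__ == "__main__":
    # Figures from the description: 12 actors x 2 CPUs, ~9 CPUs elsewhere, 32-CPU node.
    need = requested_cpus(12, 2, 9)
    print(need, need > 32)  # 33 True: more than the node has, so startup stalls
    capped = capped_pdf_actors(12, 2, 9, 32)
    print(capped, requested_cpus(capped, 2, 9))  # 11 31: fits within 32 CPUs
```

Capping the pool size, rather than lowering the per-actor CPU request, keeps each PDF extract actor's resources unchanged and only trades off parallelism on smaller machines.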

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

@charlesbluca changed the title from "(retriever) Avoid oversubscription of PDF extraction actors" to "fix(retriever): correctly cap PDF extract actors to avoid CPU oversubscription" Apr 2, 2026
@charlesbluca marked this pull request as ready for review April 2, 2026 15:14
@charlesbluca requested review from a team as code owners April 2, 2026 15:14
@charlesbluca requested a review from jperez999 April 2, 2026 15:14
@charlesbluca (Collaborator, Author) commented:

Superseded by #1786
