
Fix graph pipeline performance regressions vs batch_pipeline #1786

Merged
jperez999 merged 7 commits into NVIDIA:main from charlesbluca:graph-pipeline-pps-fixes on Apr 6, 2026

Conversation

Collaborator

@charlesbluca commented Apr 2, 2026

Description

The graph_pipeline entrypoint introduced in #1778 had several performance regressions compared to the old batch_pipeline entrypoint it replaced. This PR restores parity.

Root causes fixed:

1. BatchTuningParams were silently discarded.
The batch_tuning field on EmbedParams/ExtractParams was explicitly excluded when building actor kwargs, but nothing translated it into Ray-level scheduling config (batch_size, concurrency, num_gpus per node). GraphIngestor was creating RayDataExecutor with batch_size=1 and no node_overrides regardless of CLI flags. A new batch_tuning_to_node_overrides() function in ingestor_runtime.py performs this translation, and GraphIngestor.ingest() now merges the result with any explicit node_overrides passed at construction.

2. No heuristic defaults when CLI flags are absent.
The old pipeline scaled actor counts and batch sizes from cluster GPU count via resolve_requested_plan(). The new pipeline had no equivalent, so it fell back to batch_size=1 and a single embed actor at num_gpus=0.1. batch_tuning_to_node_overrides() now accepts cluster_resources and uses resolve_requested_plan() as a fallback for any field not explicitly set — matching the heuristic behaviour of batch_pipeline.
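A minimal sketch of this fallback behaviour, under stated assumptions: the scaling rule inside resolve_requested_plan() and the helper name with_heuristic_defaults are hypothetical, chosen only to show how unset fields fall back to a GPU-count heuristic while explicit CLI values take precedence.

```python
def resolve_requested_plan(cluster_resources):
    # Hypothetical GPU-count heuristic, standing in for the real
    # resolve_requested_plan(); the actual scaling factors differ.
    gpus = int(cluster_resources.get("GPU", 0))
    return {
        "batch_size": max(1, 16 * gpus),  # scale batch size with GPUs
        "concurrency": max(1, gpus),      # one embed actor per GPU
        "num_gpus": 1.0 if gpus else 0.0,
    }

def with_heuristic_defaults(explicit, cluster_resources):
    plan = resolve_requested_plan(cluster_resources)
    # Any field the user did not set (None) falls back to the plan;
    # explicitly set values override the heuristic.
    return {**plan, **{k: v for k, v in explicit.items() if v is not None}}

cfg = with_heuristic_defaults(
    {"batch_size": None, "concurrency": 2},
    {"GPU": 4, "CPU": 64},
)
# batch_size falls back to the heuristic; concurrency keeps the
# explicit value 2.
```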

3. PDF extract concurrency was not capped.
Without a CPU budget check, PDF extract actors could oversubscribe the cluster and cause downstream actors to deadlock waiting for CPU slots. The cap is now applied: pdf_extract_tasks = min(requested, max(1, (total_cpus - non_pdf_overhead) // cpus_per_task)), where overhead accounts for the 4 fixed pipeline tasks plus initial page-elements, OCR, and embed actors.
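A worked example of the cap formula above, with illustrative numbers (the overhead and per-task CPU figures are assumptions, not values from the PR):

```python
def cap_pdf_extract_tasks(requested, total_cpus, non_pdf_overhead, cpus_per_task):
    # The formula from the description: never exceed the CPU budget
    # left after fixed-pipeline overhead, but always allow at least one task.
    return min(requested, max(1, (total_cpus - non_pdf_overhead) // cpus_per_task))

# e.g. a 64-CPU cluster, 8 CPUs reserved for the 4 fixed pipeline tasks
# plus initial page-elements, OCR, and embed actors, 2 CPUs per extract task:
capped = cap_pdf_extract_tasks(requested=40, total_cpus=64,
                               non_pdf_overhead=8, cpus_per_task=2)
# (64 - 8) // 2 = 28, so a request for 40 tasks is capped to 28
```

The `max(1, ...)` floor matters on small clusters: even when the overhead exceeds total CPUs, one extract task still runs rather than zero.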

4. Ray's new progress UI was not enabled.
RayDataExecutor now opts in via DataContext.enable_rich_progress_bars = True / use_ray_tqdm = False, which also suppresses the hint log line.

Observed improvement: ingestion time on jp20 corpus dropped from ~206s back to ~74s (2.8×), matching batch_pipeline throughput.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

@charlesbluca changed the title from "graph pipeline pps fixes" to "Fix graph pipeline performance regressions vs batch_pipeline" Apr 2, 2026
Collaborator

@jperez999 left a comment


Great catch on the perf drop here. I was actually seeing it go faster for bo767 on my machine... I will run again after this is in, should see some more perf gains.

@charlesbluca marked this pull request as ready for review April 3, 2026 16:32
@charlesbluca requested review from a team as code owners April 3, 2026 16:32
@charlesbluca requested a review from nkmcalli April 3, 2026 16:32
@jperez999 jperez999 merged commit f116fc5 into NVIDIA:main Apr 6, 2026
6 checks passed