
Expand TRT decoder YAML config for composite decoding [depends on PR #524] #536

Open
wsttiger wants to merge 26 commits into NVIDIA:main from wsttiger:update_trt_decoder_yaml


wsttiger (Collaborator) commented May 8, 2026

Add YAML/config support for TRT decoder runtime options including batch size,
CUDA graph execution, global decoder selection, and PyMatching-specific global
decoder parameters. Wire realtime decoder construction so TRT configs receive
the top-level observable matrix from O_sparse, and pass the same O matrix into
PyMatching global decoder params for composite observable decoding.
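The O_sparse-to-O injection can be illustrated with a small Python model. This is a sketch under stated assumptions, not the PR's C++ code: it assumes O_sparse lists each row's nonzero column indices with every row terminated by -1, that block_size is known separately, and that the PyMatching global decoder params use the same "O" key as the TRT decoder. The helper names dense_from_sparse and inject_observables are hypothetical.

```python
def dense_from_sparse(o_sparse, block_size):
    """Expand a -1-terminated row list into a dense 0/1 matrix (list of rows).

    Assumption: every row, including the last, ends with -1.
    """
    rows, current = [], [0] * block_size
    for idx in o_sparse:
        if idx == -1:  # -1 closes the current row
            rows.append(current)
            current = [0] * block_size
        else:
            current[idx] = 1
    return rows


def inject_observables(trt_params, o_sparse, block_size):
    """Hypothetical: give the TRT decoder and its PyMatching global decoder the same O.

    Mutates and returns trt_params; the "O" key name for PyMatching is assumed.
    """
    O = dense_from_sparse(o_sparse, block_size)
    trt_params["O"] = O
    if trt_params.get("global_decoder") == "pymatching":
        trt_params.setdefault("global_decoder_params", {})["O"] = O
    return trt_params
```

Sharing one dense matrix keeps the TRT prefix (pre_L) and the global decoder's logical frame in the same observable basis, which is what makes XOR-combining them meaningful.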

Expose the new config fields through Python bindings and heterogeneous_map
round-tripping. Extend YAML tests for TRT config round-trip, runtime parameter
conversion, and O_sparse-to-O injection.

Update test_trt_decoder_composite to support an optional --config-yaml path,
allowing the existing composite demo to construct and run a real TRT+PyMatching
decoder directly from YAML while preserving the original manual CLI path.

bmhowe23 and others added 5 commits April 29, 2026 23:57
…output

Add a "predecoder" execution mode to the TensorRT decoder so it can be
chained with a second decoder (e.g. PyMatching) and return logical-frame
observables directly. The TRT model is assumed to emit a single output
that concatenates [pre_L (num_observables entries), residual_dets (rest)].

New constructor parameters:
- "batch_size": required when the ONNX model has a dynamic batch dim.
  Used to size the optimization profile and pre-allocate I/O buffers.
- "global_decoder" + "global_decoder_params": optional decoder name and
  params for a follow-up decoder run on the residual_dets portion of
  the TRT output. Created with the same H passed to trt_decoder.
- "O": observables matrix (num_observables x block_size). Enables
  decode()/decode_batch() to return the predicted logical frame.
  Number of observables is inferred from O.shape()[0].

Decode behavior matrix:
- no global_decoder, no O   -> raw TRT output (unchanged).
- no global_decoder, O      -> return the pre_L prefix only.
- global_decoder, no O      -> entire output -> global_decoder.result.
- global_decoder, O         -> residual -> global_decoder; return
                               pre_L XOR global_decoder.logical_frame.
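The four cases can be modeled as one dispatch function. This is an illustrative Python sketch, not the trt_decoder implementation: trt_out stands for one sample's TRT output as 0/1 ints laid out as [pre_L, residual_dets], num_obs stands for O.shape()[0] (None when no O is configured), and the global decoder is modeled as a hypothetical callable returning a (result, logical_frame) pair.

```python
def compose(trt_out, num_obs=None, global_decode=None):
    """Sketch of the decode behavior matrix for a single sample.

    num_obs: number of observables when O is configured, else None.
    global_decode: hypothetical callable returning (result, logical_frame), else None.
    """
    if global_decode is None and num_obs is None:
        return list(trt_out)                 # raw TRT output, unchanged
    if global_decode is None:
        return list(trt_out[:num_obs])       # pre_L prefix only
    if num_obs is None:
        result, _ = global_decode(trt_out)   # entire output -> global decoder
        return result
    pre_l, residual = trt_out[:num_obs], trt_out[num_obs:]
    _, frame = global_decode(residual)       # residual -> global decoder
    return [a ^ b for a, b in zip(pre_l, frame)]  # pre_L XOR logical_frame
```

For example, with a stand-in global decoder whose logical frame is the parity of the residual detectors, compose([0, 1, 0, 0], num_obs=1, global_decode=...) returns [1]: pre_L is [0] and the residual [1, 0, 0] has odd parity.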

Constructor validation when O is set:
- output_size_per_sample >= num_observables, and
- when global_decoder_ is set,
  output_size_per_sample == num_observables + global_decoder.syndrome_size.
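Both checks amount to a small predicate on sizes. The sketch below is a hypothetical Python rendering (validate_output_size is not a name from the PR), with global_syndrome_size set to None when no global decoder is attached.

```python
def validate_output_size(output_size_per_sample, num_observables,
                         global_syndrome_size=None):
    """Hypothetical mirror of the constructor checks applied when O is set."""
    # The TRT output must at least hold the pre_L prefix.
    if output_size_per_sample < num_observables:
        raise ValueError("TRT output is smaller than the number of observables")
    # With a global decoder, the rest of the output must be exactly its syndrome.
    if (global_syndrome_size is not None and
            output_size_per_sample != num_observables + global_syndrome_size):
        raise ValueError(
            "TRT output must equal num_observables + global decoder syndrome_size")
```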

Other changes:
- Dynamic batch support: setInputShape per call when the model's batch
  dim is -1; ONNX builder now installs a min/opt/max optimization
  profile when "batch_size" is provided.
- Split decode_batch into a typed decode_batch_impl<float|uint8_t> for
  cleaner dtype dispatch (engine I/O dtypes float32 / uint8 unchanged).
- Better INFO logging: each batch now reports the total number of non-zero
  input detectors versus residual detectors, to help diagnose predecoder behavior.
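Per-call shape resolution for a dynamic batch dimension might look like the sketch below. It assumes, as a simplification, that batch_size is the upper bound on samples per call because it sizes the pre-allocated I/O buffers; the helper name and that bound are assumptions, and the real code calls TensorRT's setInputShape rather than returning a tuple.

```python
def input_shape_for_call(model_shape, n_samples, batch_size=None):
    """Hypothetical: concrete input shape to set for one decode_batch call."""
    if model_shape[0] != -1:
        return tuple(model_shape)  # static batch dim: the shape is fixed
    if batch_size is None:
        raise ValueError("batch_size is required for a model with a dynamic batch dim")
    # Assumption: buffers were sized for batch_size samples, so it caps each call.
    if not 1 <= n_samples <= batch_size:
        raise ValueError("number of samples exceeds the configured batch_size")
    return (n_samples,) + tuple(model_shape[1:])
```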

Signed-off-by: Ben Howe <bhowe@nvidia.com>
Add a realtime test/demo that initializes the TensorRT decoder from an ONNX
predecoder model with PyMatching configured as the global decoder. The driver
loads detector, observable, parity-check, and prior data from the
Stim export bundle, decodes samples through the composite TRT+PyMatching path,
and reports latency, throughput, correctness, and residual-syndrome diagnostics.

Register the new test_trt_decoder_composite target when TensorRT, realtime,
and the TRT decoder plugin are available.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>

Signed-off-by: Scott Thornton <wsttiger@gmail.com>
…yaml

# Conflicts:
#	libs/qec/unittests/realtime/CMakeLists.txt
#	libs/qec/unittests/realtime/test_trt_decoder_composite.cpp
wsttiger marked this pull request as ready for review May 11, 2026 22:10
wsttiger added 5 commits May 12, 2026 00:45
Replace the TRT decoder's hardcoded optional PyMatching global decoder params
with a tagged global_decoder_config variant. Preserve PyMatching as the current
supported concrete config while using std::monostate for the unset case.

Update heterogeneous-map conversion, YAML mapping, and Python bindings so the
existing PyMatching YAML/Python surface continues to round-trip. Extend the YAML
unit test to verify the PyMatching variant arm is selected and still produces
the expected runtime parameter map.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>
…yaml

# Conflicts:
#	libs/qec/python/bindings/py_decoding_config.cpp
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
wsttiger force-pushed the update_trt_decoder_yaml branch from 6c2eefc to 26be6b4 May 29, 2026 17:15
wsttiger requested a review from melody-ren May 29, 2026 18:51
Comment thread test_surface_code_trt.py
@@ -0,0 +1,91 @@
# ============================================================================ #

Collaborator

This file sits at the root of the repo; I'm not sure that's intentional. It also uses several hardcoded paths, and from the comment it seems this file is meant to be a draft. Is it ready for review, or should it be removed from the PR?

Comment thread libs/qec/lib/realtime/config.cpp (outdated)
Comment thread libs/qec/lib/realtime/config.cpp
Comment thread libs/qec/lib/realtime/realtime_decoding.cpp
copy-pr-bot (bot) commented Jun 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Signed-off-by: Melody Ren <melodyr@nvidia.com>
The trt_decoder constructed with an "O" observable matrix projects to
observables internally, so it must report decode_result_type::decode_to_obs
to enqueue_syndrome(). Set the result type where decode_to_observables_ is
enabled, and assert it in the composite test.

Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
A monostate global_decoder_params (no global decoder attached) was being
mutated into a default pymatching_decoder_config across a serialize ->
deserialize cycle, through two independent serialization layers:

1. heterogeneous_map: to_heterogeneous_map() emitted an empty
   global_decoder_params map whenever global_decoder was set but the
   params were monostate, which read back as a pymatching config.

2. YAML MappingTraits (the path used by to_yaml_str/from_yaml_str, and
   thus by save_dem/load_dem): mapOptional emitted an empty
   'global_decoder_params: {}' for the monostate case, which read back
   into a default pymatching config.

Both layers now emit nothing for monostate. Any runtime need for an empty
params map is handled in prepare_decoder_params (realtime_decoding.cpp),
not in serialization. The heterogeneous_map path also rejects a params
map that carries global_decoder_params without a global_decoder.

Add regression tests: monostate round-trips unchanged through both YAML
and heterogeneous_map and emits no params key; params-without-decoder
throws.
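The fixed rules can be captured in a minimal Python model. Stand-ins are assumptions: None plays the role of std::monostate and a plain dict plays the role of pymatching_decoder_config, and the two real serialization layers (heterogeneous_map and YAML MappingTraits) are collapsed into one pair of hypothetical functions.

```python
def to_map(global_decoder, params):
    """Serialize; params is None (monostate) or a dict (PyMatching config stand-in)."""
    if params is not None and global_decoder is None:
        raise ValueError("global_decoder_params set without global_decoder")
    out = {}
    if global_decoder is not None:
        out["global_decoder"] = global_decoder
    if params is not None:  # monostate emits no key at all, not an empty map
        out["global_decoder_params"] = dict(params)
    return out


def from_map(m):
    """Deserialize; an absent params key reads back as None (monostate)."""
    if "global_decoder_params" in m and "global_decoder" not in m:
        raise ValueError("global_decoder_params present without global_decoder")
    params = m.get("global_decoder_params")
    return m.get("global_decoder"), (dict(params) if params is not None else None)
```

The key property is that an explicit empty config ({}) and monostate (None) stay distinct across a round trip, which is exactly what the old empty-map emission broke.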

Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Two follow-ups to the monostate round-trip fix:

1. prepare_decoder_params now synthesizes an empty global_decoder_params
   map for a pymatching global decoder before the O_sparse early return.
   Since serialization stopped emitting an empty params map for monostate,
   a global decoder configured to run on residual detectors without an O
   matrix was no longer attached by the plugin (which requires both
   global_decoder and global_decoder_params keys). This is a documented,
   valid configuration, so restore it in runtime prep where it belongs.

2. trt_decoder_config::to_heterogeneous_map now throws when
   global_decoder_params is set but global_decoder is not, matching the
   rejection already enforced by from_heterogeneous_map.

Add regression tests for both.
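The ordering fix in prepare_decoder_params can be sketched as follows. This is a hypothetical Python model, not the C++ function: it takes an already-dense O (None when the config has no observable matrix) and assumes the PyMatching params use an "O" key. The point it illustrates is that the empty-params synthesis happens before the early return.

```python
def prepare_trt_params(params, O=None):
    """Hypothetical model of runtime-only TRT parameter synthesis."""
    p = dict(params)  # never mutate the serialized config
    if p.get("global_decoder") == "pymatching":
        # The plugin attaches a global decoder only if both keys exist, so
        # synthesize an (empty) params map here, at runtime, not in serialization.
        p["global_decoder_params"] = dict(p.get("global_decoder_params", {}))
    if O is None:
        return p  # no observable matrix: global decoder runs on residual detectors
    p["O"] = O
    if "global_decoder_params" in p:
        p["global_decoder_params"]["O"] = O  # assumed key name
    return p
```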

Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
melody-ren (Collaborator) commented:

The following were added on top of 26be6b4. Each decoder now reports one of two decode_result_type values:

  • decode_to_errs: decode().result is an error vector; enqueue_syndrome() projects it through O_sparse.
  • decode_to_obs: decode().result is already an observable vector; enqueue_syndrome() applies it directly.

enqueue_syndrome() now validates result size against the selected result type, applies corrections through an explicit switch, and logs the result type plus either Errors or Observables. This avoids treating obs-frame decoder output as error indices.
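A Python model of that switch is sketched below. Assumptions, all hypothetical: a dense O stands in for O_sparse, corrections are XOR-accumulated into an observable frame, and the ResultType enum merely mirrors decode_to_errs / decode_to_obs.

```python
from enum import Enum


class ResultType(Enum):  # mirrors decode_to_errs / decode_to_obs
    DECODE_TO_ERRS = "errs"
    DECODE_TO_OBS = "obs"


def apply_result(obs_frame, result, result_type, O):
    """Hypothetical: XOR one decode() result into obs_frame, validating its size."""
    num_obs, block_size = len(O), len(O[0])
    if result_type is ResultType.DECODE_TO_ERRS:
        if len(result) != block_size:
            raise ValueError("error vector must have block_size entries")
        for i, row in enumerate(O):  # project errors through O, mod 2
            obs_frame[i] ^= sum(r & e for r, e in zip(row, result)) % 2
    elif result_type is ResultType.DECODE_TO_OBS:
        if len(result) != num_obs:
            raise ValueError("observable vector must have num_observables entries")
        for i, bit in enumerate(result):  # already in the observable basis
            obs_frame[i] ^= bit
    else:
        raise ValueError(f"unknown result type: {result_type}")
    return obs_frame
```

Dispatching on the declared type, rather than on len(result), matters because an error vector and an observable vector can have the same length.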

  • Added explicit decoder result-type handling for realtime replay/runtime paths.
    The TRT composite decoder returns observable-frame results, not error-frame results, so callers should not infer meaning from “not errors” or from vector shape. The runtime now uses the decoder result type to decide whether to project errors through O or apply observable results directly.

  • Expanded decoder stats logging to include ResultType, Errors, and Observables.
    This makes log replay able to tell what decode().result meant at the time it was produced, which is required for replaying obs-frame decoders correctly.

  • Fixed replay_decoder_logs.py to interpret logged result types explicitly.
    Replay now handles errs as an error vector and obs as an observable vector. It no longer assumes “non-errs means obs”; an unknown result type fails with a clear error instead of being silently misinterpreted.

  • Fixed TRT global_decoder_params round-tripping.
    global_decoder_params is now represented as variant<monostate, pymatching_decoder_config>. Serialization omits it for monostate, so a config with no params does not round-trip into global_decoder_params: {} and come back as default PyMatching params.

  • Moved runtime-only TRT parameter synthesis into prepare_decoder_params().
    Serialization stays faithful to the config, while runtime still synthesizes an empty global_decoder_params map for global_decoder: pymatching when the plugin needs that key to attach the global decoder. This also happens before the no-O_sparse early return, so the no-O path still attaches PyMatching correctly.

  • Added validation for heterogeneous-map global_decoder_params without global_decoder.
    That catches the write/read path where params are present but there is no decoder name to interpret them.

  • Added/consolidated tests around TRT YAML round-trip, heterogeneous-map conversion, observable matrix injection, monostate global decoder params, no-O runtime prep, and params-without-decoder errors.
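The explicit interpretation in replay_decoder_logs.py can be sketched as below. The script's actual record format isn't shown here, so this assumes each logged decode is a dict keyed by the logged field names (ResultType, Errors, Observables) and that a dense O is available; replay_record is a hypothetical name.

```python
def replay_record(record, O):
    """Hypothetical: observable flips implied by one logged decode."""
    rtype = record.get("ResultType")
    if rtype == "errs":  # error vector: project through O, mod 2
        errs = record["Errors"]
        return [sum(o & e for o, e in zip(row, errs)) % 2 for row in O]
    if rtype == "obs":  # already an observable vector
        return list(record["Observables"])
    raise ValueError(f"unrecognized ResultType {rtype!r}; refusing to guess")
```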

Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
melody-ren (Collaborator) commented:

/ok to test e157661

Resolve conflicts by keeping both PyMatching config structs for now:
- pymatching_decoder_config (trt nested global decoder, PR536) and
  Vedika's pymatching_config (standalone realtime decoder, NVIDIA#614) coexist.
- Kept the global_decoder_config variant + its serialization + YAML traits.
- realtime_decoding.cpp: merged both include lists; kept both prepare_decoder_params
  and Chuck's new get_realtime_session (NVIDIA#609).

Follow-up: unify on pymatching_config (drop pymatching_decoder_config).
Signed-off-by: Melody Ren <melodyr@nvidia.com>
Signed-off-by: Melody Ren <melodyr@nvidia.com>
melody-ren (Collaborator) commented:

/ok to test 33057f5


Labels: None yet
Projects: None yet
3 participants