Skip to content

Dem from stim text can use error decompositions#615

Open
eliotheinrich wants to merge 3 commits into
NVIDIA:mainfrom
eliotheinrich:pr-dem-stim-decompose
Open

Dem from stim text can use error decompositions#615
eliotheinrich wants to merge 3 commits into
NVIDIA:mainfrom
eliotheinrich:pr-dem-stim-decompose

Conversation

@eliotheinrich

@eliotheinrich eliotheinrich commented Jun 15, 2026

Copy link
Copy Markdown

Description

In the current implementation of dem_from_stim_text, the suggested error decomposition ^ characters are ignored, but it is sometimes useful to output graphlike edge check matrices. For example,

import numpy as np
import stim
import cudaq_qec as qec
import beliefmatching

dem_text = "error(0.1) D0 ^ D1\n"

check_matrices = beliefmatching.detector_error_model_to_check_matrices(stim.DetectorErrorModel(dem_text))

# Current behavior: D0 and D1 merged into one column
dem_old = qec.dem_from_stim_text(dem_text)
print("dem_from_stim_text:")
print(np.array(dem_old.detector_error_matrix))          # [[1], [1]]

# Sometimes desirable: accept the error decomposition indicated by ^ and split into two columns
print("beliefmatching's edge_check_matrix:")
print(check_matrices.edge_check_matrix.toarray())                   # [[1, 0], [0, 1]

This PR adds a flag to dem_from_stim_text which allows users to toggle if the ^ characters should be ignored (default) or should be used to generate an error-decomposed DEM. This is useful if the DEM will eventually be used for, i.e., pymatching.

Runtime / performance impact

N/A

Self-review checklist

Please confirm each item before requesting review. Check [x] or strike
through and explain.

Before requesting review

  • I reviewed my own full diff in GitHub or my editor.
  • PR is in Draft if it is not yet ready for review.
  • Temporary / debugging changes have been removed.
  • Local test logs reviewed; no unexplained warnings or errors.
  • CI logs reviewed; no unexplained warnings or errors.
  • Full CI has been run.

Scope and size

  • PR is under ~1000 lines, or an exception is justified in the description.
  • Refactoring-only changes are isolated in their own PR(s).
  • No existing tests were disabled or modified just to make this PR pass
    (if so, an issue has been raised).

Tests

  • New functionality has new tests.
  • Tests fail if the new functionality is broken (including crashes), not
    just when it is missing.
  • Negative tests added where exceptions are expected.
  • Truth data added where simple EXPECT_* / assert checks are
    insufficient for algorithmic correctness.
  • CI runtime impact considered; team notified if significant.

Documentation

  • Public-facing APIs have Doxygen docs.
  • User-visible behavior changes have public docs, or a follow-up is
    tracked.

Code style

  • Naming follows the existing convention (snake_case vs camelCase) for
    the area being modified.

Dependencies

  • No new third-party dependencies, or the team has been notified and
    OSRB tickets filed.

Signed-off-by: Eliot Heinrich <eheinrich@nvidia.com>
Signed-off-by: Eliot Heinrich <eheinrich@nvidia.com>
Signed-off-by: Eliot Heinrich <eheinrich@nvidia.com>
@eliotheinrich eliotheinrich force-pushed the pr-dem-stim-decompose branch from b8f5989 to acbd502 Compare June 15, 2026 19:28
@copy-pr-bot

copy-pr-bot Bot commented Jun 15, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@bmhowe23

Copy link
Copy Markdown
Collaborator

/ok to test acbd502

@eliotheinrich eliotheinrich marked this pull request as ready for review June 16, 2026 14:44
if (!dets.empty() || !obs.empty()) {
detector_hits.push_back(dets);
observable_hits.push_back(obs);
rates.push_back(prob);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the probability semantics for decomposed Stim errors. In Stim, error(p) A ^ B is still one correlated error mechanism with a suggested decomposition, but this creates multiple CUDA-Q columns each with probability p. Existing CUDA-Q consumers treat error_rates / error_rate_vec as per-column independent probabilities, so this would make the split components independently sampleable/weighted.

Can we avoid exposing decomposed columns as independent error mechanisms, or preserve the correlation explicitly, e.g. with shared error_ids plus clear consumer support? Otherwise decompose_errors=True should probably not populate ordinary error_rates in a way that looks probability-preserving.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Vedika's concern that this changes the semantics of the error rate vec. I do think that we need to have both meanings. Since error_rates has an established meaning in this codebase (physical error rates), I suggest that we add another way rather than overloading the existing meaning of error_rates

// Each segment delimited by '^' in the DEM text becomes its own column.
auto flush = [&]() {
if (!dets.empty() || !obs.empty()) {
detector_hits.push_back(dets);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stores the raw component before duplicate detector targets are XOR-cancelled. For example, D0 D0 ^ D1 later becomes an all-zero detector column with a nonzero rate. That is not graphlike and will break consumers such as the PyMatching plugin, which requires each column to have one or two detector endpoints.

Can we canonicalize/XOR-cancel each component before appending, then reject decomposed components with 0 or more than 2 detector endpoints instead of emitting them?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it probably makes the most sense to remove all-zero columns, but it looks like there is an existing test case that expects it as an output:

TEST(StimDemGetDecoder, RepeatedDetectorOrObservableTargetsXorFold) {
const std::string dem_text = R"(error(0.1) D0 D0
error(0.1) L0 L0
)";
auto dem = cudaq::qec::dem_from_stim_text(dem_text);
ASSERT_EQ(dem.num_detectors(), 1u);
ASSERT_EQ(dem.num_observables(), 1u);
ASSERT_EQ(dem.num_error_mechanisms(), 2u);
EXPECT_EQ(dem.detector_error_matrix.at({0u, 0u}), 0u)
<< "duplicate D0 in error 0 should XOR-cancel to 0";
EXPECT_EQ(dem.observables_flips_matrix.at({0u, 1u}), 0u)
<< "duplicate L0 in error 1 should XOR-cancel to 0";
}

Does it make sense for these all-zero columns to be removed only when decompose_errors=true, or to change this test?

/// graphlike decomposition (components separated by '^' in the DEM text) are
/// expanded into one column per component; otherwise the '^' separators are
/// ignored and each error instruction produces a single column.
detector_error_model dem_from_stim_text(const std::string &dem_text,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the only declaration from dem_from_stim_text(const std::string&) to dem_from_stim_text(const std::string&, bool) preserves source compatibility via the default argument, but removes the old exported C++ symbol. Any already-built downstream code/plugin linked against the one-argument symbol would fail to load.

Can we keep the one-argument overload and add a separate two-argument overload? The one-argument overload can simply delegate to dem_from_stim_text(dem_text, false).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is somewhat of a breaking change. However since we don't have a hard downstream dependency (we typically expect downstream to rebuild against the latest main/release), I would prefer that we don't add this overload just to preserve an old mangled symbol.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. If we expect downstream consumers to rebuild against the latest main/release, I’m fine not preserving the old one-arg mangled symbol here. Thanks for clarifying.

@vedika-saravanan

vedika-saravanan commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Thanks @eliotheinrich. I left a few more comments below.

One testing gap that I think should be fixed in this PR: this adds new behavior to the public C++ dem_from_stim_text API, but the new decompose_errors=true path is only covered through Python tests. Could we add C++ gtest coverage as well?

At minimum, I’d like a small test in libs/qec/unittests/test_decoders.cpp covering:

  • default / false behavior still ignores ^ and produces one column per Stim error instruction
  • true behavior splits ^ components into separate columns
  • edge behavior for duplicate detector targets after XOR cancellation, especially if we validate/reject empty decomposed components

Potential follow-up: docs/example clarity. The examples now show dem_from_stim_text(..., decompose_errors=True), but the decoder is still constructed from dem_text, which uses the default non-decomposed path. That can make readers think get_decoder(..., dem_text) is using the decomposed DEM when it is not.

Could we either restructure the example to actually use the decomposed matrix where relevant, or track a follow-up to explicitly separate this as parser inspection only and state that direct get_decoder(..., dem_text) uses the default non-decomposed parsing?

@melody-ren melody-ren left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Eliot! Could you please update the description to clarify that this PR doesn't decompose errors (already done by stim)?

/// expanded into one column per component; otherwise the '^' separators are
/// ignored and each error instruction produces a single column.
detector_error_model dem_from_stim_text(const std::string &dem_text,
bool decompose_errors = false);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want a refund on my high hopes due this name decompose_errors ;). I thought it meant that we're (cudaqx) doing the decomposition. Can we use something like use_stim_err_decomp or some other concise name to indicate that this flags toggles whether we'll ignore or use the stim decomposition? Feel free to suggest a better name. I don't quite like what I suggested but can't think of a better one

if (!dets.empty() || !obs.empty()) {
detector_hits.push_back(dets);
observable_hits.push_back(obs);
rates.push_back(prob);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Vedika's concern that this changes the semantics of the error rate vec. I do think that we need to have both meanings. Since error_rates has an established meaning in this codebase (physical error rates), I suggest that we add another way rather than overloading the existing meaning of error_rates

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants