Refined MDP for RL #449

Shaobo-Zhou · 2025-09-06T13:27:04Z

Description

This PR refactors the underlying MDP used by the RL-based predictor. It expands the action space, refines state transitions and observations, and adds support for stochastic transpiler passes.

🚀 Major Changes
Expanded Action Space

Added AIRouting as a routing/mapping option and wrapped in SafeAIRouting for stable integration into the RL pipeline.
Added tket layout actions: GraphPlacement and NoiseAwarePlacement (fidelity-aware).
Added optimization actions KAKDecomposition and ElidePermutations.

Support for Stochastic Passes

Wrapped stochastic actions (e.g., AIRouting, SabreLayout) in a multi-trial evaluation loop, optimizing for the figure of merit instead of the default gate count internally in Qiskit passes.
Introduced max_iterations as parameters to control trial counts, enabling improved predictor performance and stability.

Changes in determine_valid_actions_for_state

Refined the logic: After a circuit is mapped, only mapping-preserving optimization actions are allowed to prevent reusing outdated layouts when the circuit structure changes.

Expanded state observation vector

Added normalized 1q/2q gate counts for standard OpenQASM 2.0 quantum gates (as defined in https://github.com/Qiskit/qiskit-terra/blob/main/qiskit/qasm/libs/qelib1.inc).

Fixes and Enhancements

Fixed a bug in OptimizeCliffords by ensuring CollectCliffords runs beforehand.
Corrected computation of estimated success probability in reward.py
Fixed incorrect usage of GatesInBasis in rl/predictorenv.py
Adjusted benchmark level to INDEP in test_predictor_rl.py, since the current action space does not guarantee support for high-level gates.

Dependency Update

Added qiskit-ibm-ai-local-transpiler to the dependencies
Pinned networkx==2.8.5 to ensure compatibility with qiskit-ibm-ai-local-transpiler
Upgraded pytket_qiskit>=0.71.0 for compatibility with the new action space

Checklist:

The pull request only contains commits that are focused and relevant to this change.
I have added appropriate tests that cover the new/changed functionality.
I have updated the documentation to reflect these changes.
I have added entries to the changelog for any noteworthy additions, changes, fixes, or removals.
I have added migration instructions to the upgrade guide (if needed).
The changes follow the project's style guidelines and introduce no new warnings.
The changes are fully tested and pass the CI checks.
I have reviewed my own code changes.

Update action space and feature space Update actions Update action space

Fix: resolve pre-commit issues and add missing annotations Fix: resolve pre-commit issues and add missing annotations Remove example_test.py Remove example_test.py

Fix: resolve pre-commit issues and add missing annotations Fix: resolve pre-commit issues and add missing annotations Fix: resolve pre-commit issues and add missing annotations

Signed-off-by: Shaobo-Zhou <109073755+Shaobo-Zhou@users.noreply.github.com>

Fix bugs Fix bugs Fix bugs

…e gate check

Fix windows runtime warning problem Fix windows runtime warning issue

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (4)

src/mqt/predictor/rl/predictorenv.py (4)

240-247: Decompose both unitary and clifford gates and remove unused noqa.

Right now, the elif means circuits containing both unitary and clifford gates will only have unitary decomposed, potentially leaving clifford in place and still upsetting BasisTranslator. Also, # noqa: SLF001 is flagged by Ruff as an unused directive since SLF001 isn’t enabled.

You can address both points like this:

-        # in case the Qiskit.QuantumCircuit has unitary or u gates or clifford in it, decompose them (because otherwise qiskit will throw an error when applying the BasisTranslator
-        if self.state.count_ops().get("unitary"):
-            self.state = self.state.decompose(gates_to_decompose="unitary")
-        elif self.state.count_ops().get("clifford"):
-            self.state = self.state.decompose(gates_to_decompose="clifford")
-
-        self.state._layout = self.layout  # noqa: SLF001
+        # in case the Qiskit.QuantumCircuit has unitary or clifford gates, decompose them
+        if self.state.count_ops().get("unitary"):
+            self.state = self.state.decompose(gates_to_decompose="unitary")
+        if self.state.count_ops().get("clifford"):
+            self.state = self.state.decompose(gates_to_decompose="clifford")
+
+        self.state._layout = self.layout

If you still want to enforce “private attribute access” linting elsewhere, you can configure that in Ruff instead of keeping a now-ineffective noqa.

250-261: Update calculate_reward docstring and assert to reflect the new signature.

The method now accepts an optional qc, but the docstring still only mentions the “current state”, and the unreachable branch asserts on circuit instead of the discriminant (reward_function).

A small refactor keeps behavior but makes intent clearer:

-    def calculate_reward(self, qc: QuantumCircuit | None = None) -> float:
-        """Calculates and returns the reward for the current state."""
-        circuit = self.state if qc is None else qc
-        if self.reward_function == "expected_fidelity":
-            return expected_fidelity(circuit, self.device)
-        if self.reward_function == "estimated_success_probability":
-            return estimated_success_probability(circuit, self.device)
-        if self.reward_function == "estimated_hellinger_distance":
-            return estimated_hellinger_distance(circuit, self.device, self.hellinger_model)
-        if self.reward_function == "critical_depth":
-            return crit_depth(circuit)
-        assert_never(circuit)
+    def calculate_reward(self, qc: QuantumCircuit | None = None) -> float:
+        """Calculates and returns the reward for either the current state or a given circuit."""
+        circuit = self.state if qc is None else qc
+        if self.reward_function == "expected_fidelity":
+            return expected_fidelity(circuit, self.device)
+        if self.reward_function == "estimated_success_probability":
+            return estimated_success_probability(circuit, self.device)
+        if self.reward_function == "estimated_hellinger_distance":
+            return estimated_hellinger_distance(circuit, self.device, self.hellinger_model)
+        if self.reward_function == "critical_depth":
+            return crit_depth(circuit)
+        assert_never(self.reward_function)

This also makes the “implicit return None” path clearly unreachable to both humans and type-checkers.

362-424: Strengthen fom_aware_compile sentinel and logging/typing.

The overall multi-trial selection logic looks good, but a few tweaks would make it more robust and clearer:

best_fom = -1.0 assumes all reward values are ≥ 0; using float("-inf") avoids coupling this helper to the current set of metrics and is safer for future figures of merit.
best_property_set is always a plain dict (via dict(pm.property_set)), so the return type PropertySet | None is misleading and forces downstream casts.
Ruff flags blind exceptions and f-strings in logging here; parameterized logging plus an explicit comment/# noqa keeps intent while satisfying tooling.

Suggested changes:

-    def fom_aware_compile(
-        self, action: Action, device: Target, qc: QuantumCircuit, max_iteration: int = 4
-    ) -> tuple[QuantumCircuit, PropertySet | None]:
+    def fom_aware_compile(
+        self, action: Action, device: Target, qc: QuantumCircuit, max_iteration: int = 4
+    ) -> tuple[QuantumCircuit, dict[str, Any] | None]:
@@
-        best_result = None
-        best_property_set = None
-        best_fom = -1.0
+        best_result = None
+        best_property_set: dict[str, Any] | None = None
+        best_fom = float("-inf")
@@
-                except Exception as e:
-                    logger.warning(f"[Fallback to SWAP counts] Synthesis or fidelity computation failed: {e}")
+                except Exception as e:  # noqa: BLE001  # Broad catch is intentional to keep RL loop robust to arbitrary passes
+                    logger.warning(
+                        "[Fallback to SWAP counts] Synthesis or fidelity computation failed: %s",
+                        e,
+                    )
@@
-            except Exception:
-                logger.exception(f"[Error] Pass failed at iteration {i + 1}")
+            except Exception:  # noqa: BLE001
+                logger.exception("[Error] Pass failed at iteration %d", i + 1)

You’d also want _apply_qiskit_action and _handle_qiskit_layout_postprocessing to treat pm_property_set as dict[str, Any] | None for consistency (see next comment).

497-540: Make TKET placement→Qiskit layout mapping independent of dict key order.

The current mapping:

qiskit_mapping = {
    qc_tmp.qubits[i]: placement[list(placement.keys())[i]].index[0] for i in range(len(placement))
}

implicitly assumes that:

placement.keys() is ordered in a way that matches qc_tmp.qubits[i], and
TKET’s placement dict iteration order is stable and aligned with Qiskit’s qubit ordering.

That’s fragile and can lead to incorrect layouts if TKET ever changes insertion order or the mapping isn’t constructed in that exact pattern. It’s safer to use the TKET qubit’s own index to locate the corresponding Qiskit qubit explicitly.

For example:

-            try:
-                placement = transpile_pass[0].get_placement_map(tket_qc)
-            except Exception as e:
-                logger.warning("Placement failed (%s): %s. Falling back to original circuit.", action.name, e)
-                return tk_to_qiskit(tket_qc, replace_implicit_swaps=True)
-            else:
-                qc_tmp = tk_to_qiskit(tket_qc, replace_implicit_swaps=True)
-
-                qiskit_mapping = {
-                    qc_tmp.qubits[i]: placement[list(placement.keys())[i]].index[0] for i in range(len(placement))
-                }
+            try:
+                placement = transpile_pass[0].get_placement_map(tket_qc)
+            except Exception as e:  # noqa: BLE001  # TKET placement can fail in multiple ways; we fall back safely.
+                logger.warning("Placement failed (%s): %s. Falling back to original circuit.", action.name, e)
+                return tk_to_qiskit(tket_qc, replace_implicit_swaps=True)
+            else:
+                qc_tmp = tk_to_qiskit(tket_qc, replace_implicit_swaps=True)
+
+                qiskit_mapping = {}
+                for tket_qubit, node in placement.items():
+                    # Map TKET qubit index back to the corresponding Qiskit qubit
+                    qiskit_q = qc_tmp.qubits[tket_qubit.index[0]]
+                    qiskit_mapping[qiskit_q] = node.index[0]

This uses the Qubit.index[0] / Node.index[0] attributes directly and no longer depends on dict ordering, which should be more robust across pytket versions.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3d4baa and 914192c.

📒 Files selected for processing (3)

pyproject.toml (5 hunks)
src/mqt/predictor/rl/predictorenv.py (12 hunks)
tests/hellinger_distance/test_estimated_hellinger_distance.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/mqt/predictor/rl/predictorenv.py (5)

src/mqt/predictor/rl/helper.py (2)

create_feature_dict (84-110)

get_state_sample (32-69)

src/mqt/predictor/rl/parsing.py (2)

final_layout_bqskit_to_qiskit (186-226)

prepare_noise_data (253-278)

src/mqt/predictor/utils.py (1)

get_rl_openqasm_gates (148-185)

src/mqt/predictor/reward.py (3)

expected_fidelity (46-74)

estimated_success_probability (102-226)

estimated_hellinger_distance (327-349)

tests/compilation/test_predictor_rl.py (1)

get_placement_map (162-164)

🪛 Ruff (0.14.6)

src/mqt/predictor/rl/predictorenv.py

246-246: Unused noqa directive (non-enabled: SLF001)

Remove unused noqa directive

(RUF100)

409-409: Do not catch blind exception: Exception

(BLE001)

410-410: Logging statement uses f-string

(G004)

418-418: Logging statement uses f-string

(G004)

514-514: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: 🐍 Test (macos-14) / 🐍 macos-14
GitHub Check: 🐍 Test (windows-2022) / 🐍 windows-2022
GitHub Check: 🐍 Test (ubuntu-24.04) / 🐍 ubuntu-24.04

🔇 Additional comments (6)

tests/hellinger_distance/test_estimated_hellinger_distance.py (2)

224-224: Unable to verify — repository access failure prevents codebase inspection.

The review comment requests verification that generate_training_data calls don't pass num_workers as an external caller concern. However, the repository could not be cloned for inspection. Without access to:

The actual generate_training_data definition and its signature changes

All call sites of generate_training_data across the codebase

Whether this is part of a public API

I cannot confirm or refute the verification request.

219-221: The repository is not accessible in the sandbox environment. However, based on the review comment provided, I can see the key concerns that were raised:

The 6000-second (100-minute) timeout - flagged as exceptionally long

Internal NO_PARALLEL logic - mentioned in PR context as part of Windows/Python 3.13 parallelism fallback

Since I cannot access the codebase to verify these implementation details directly, I'll provide the rewritten review comment based on the available context:

Verify the exceptionally long 6000-second timeout and confirm NO_PARALLEL logic handles all platforms correctly.

The simplification to a unified code path (removing Windows-specific handling and num_workers parameter) is good. However, the 6000-second (100-minute) timeout for a test operation is exceptionally long and may mask performance issues or cause tests to hang indefinitely if something fails.

Please confirm:

Is the 6000s timeout required for CI stability on Windows/Python 3.13, or can it be reduced?

Verify that the internal NO_PARALLEL fallback logic correctly handles parallel execution across Windows, Linux, and macOS with Python 3.13+, since the parallelism control has moved from explicit num_workers parameters to internal boolean logic.

pyproject.toml (2)

51-66: Clarify Python 3.13 support: dependencies are gated <3.13 but classifiers claim 3.14 support.

Lines 51-52 gate qiskit-ibm-transpiler and qiskit-ibm-ai-local-transpiler to python_version < '3.13', yet the project's classifiers (line 66) declare support for Python 3.14. This creates an ambiguity: will Python 3.14 users have these transpiler dependencies available, or is 3.14 support aspirational pending upstream fixes? Clarify the intent and update either the classifiers or the version constraints accordingly.

40-40: I'm unable to access the repository directly due to clone failures. However, I can assess the review comment based on the provided context. Let me evaluate whether the verification requests are reasonable and well-formed:

The original review comment raises three concerns:

stable-baselines3>=2.7.0 version validation - requests confirmation this is the correct minimum version and that SB3 is used in codebase

Transpiler dependencies usage - requests verification that qiskit_ibm_transpiler and qiskit_ibm_ai_local_transpiler are actually used

Comment consistency - notes that line 52 lacks an explanatory comment (unlike line 51)

These are reasonable verification concerns for a dependency addition. Since I cannot access the codebase to verify these claims, I'll rewrite the review comment to be more focused and action-oriented while preserving the verification requests:

Verify stable-baselines3 minimum version and confirm transpiler dependencies are actively used.

The PR adds stable-baselines3>=2.7.0 (line 40) and gates two transpiler packages to Python <3.13 (lines 51-52). Please confirm: (1) that 2.7.0 is the correct minimum version for stable-baselines3, (2) that both qiskit_ibm_transpiler and qiskit_ibm_ai_local_transpiler are actually imported/used in the codebase, and (3) whether line 52 should include an explanatory comment for consistency with line 51.

src/mqt/predictor/rl/predictorenv.py (2)

181-199: Observation space extension looks consistent with feature extraction.

Using get_rl_openqasm_gates() to build gate_spaces and wiring those into spaces matches the structure of create_feature_dict (per helper.py) and keeps the observation space in sync with the RL gate list. This change looks solid.

592-625: determine_valid_actions_for_state logic is clear and matches the intended policy.

The refactored branching on (only_nat_gates, mapped, self.layout) is much easier to reason about and aligns with the described behavior:

Non-native + unmapped → synthesis/optimization only.

Non-native + mapped → synthesis + mapping-preserving optimizations.

Native + mapped + layout → terminate or mapping-preserving/final optimization.

Native + mapped w/o layout → layout/mapping improvements.

Layout-only → routing.

Otherwise → optimization + layout + mapping.

This is a nice improvement in readability and should make future adjustments to the action policy straightforward.

coderabbitai · 2025-11-27T15:19:04Z

pyproject.toml

+    # TODO(denialhaag): Remove once pytket_qiskit is updated
+    #   https://github.com/munich-quantum-toolkit/predictor/issues/471
+    "qiskit-ibm-runtime>=0.30.0,<0.42.0",


🧹 Nitpick | 🔵 Trivial

TODO comment references issue #471 for qiskit-ibm-runtime constraint.

The upper bound <0.42.0 on qiskit-ibm-runtime is intentional (with a TODO for issue #471). Ensure this constraint is tracked and revisited once the underlying issue is resolved to allow future versions.

If you'd like, I can help monitor or generate a reminder to revisit this constraint when issue #471 is addressed.

🤖 Prompt for AI Agents

In pyproject.toml around lines 36 to 38, the qiskit-ibm-runtime upper bound (<0.42.0) is intentionally pinned with a TODO referencing issue #471; keep the version constraint as-is but update the TODO to include the full issue URL and a clear action (e.g., "remove upper bound when issue #471 is resolved"), and add a tracking reminder (create a GitHub issue/board card or calendar reminder) so this constraint is revisited and removed/relaxed once the referenced issue is closed.

coderabbitai · 2025-11-27T15:19:04Z

pyproject.toml

+[tool.uv]
+override-dependencies = [
+  "networkx==2.8.5", # Required by `qiskit-ibm-transpiler`
 ]


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

uv override-dependencies syntax Python version constraints

💡 Result:

Short answer: override-dependencies entries use standard PEP 508 dependency specifiers (name, optional extras, version specifier, optional environment marker). To constrain by Python version put the marker after a semicolon and quote the version. Examples:

Force exact version for all Pythons:
override-dependencies = ["werkzeug==2.3.0"]. [1]

Only override for older Pythons:
override-dependencies = [
"numpy<1.24; python_version < "3.10"",
"numpy>=1.24; python_version >= "3.10"",
]. [1][2]

Notes:

Version specifiers (outside markers) follow PEP 440 and are unquoted; marker version literals (inside the marker) must be quoted strings. [2]

uv treats override-dependencies as absolute replacements during resolution and reads them from [tool.uv] in pyproject.toml at the workspace root. [1]

Sources:
[1] uv settings / override-dependencies docs.
[2] uv dependency specifiers / PEP 508 markers (packaging) docs.

🏁 Script executed:

# Check the pyproject.toml around lines 51-52 to see transpiler dependency format cat -n pyproject.toml | sed -n '40,65p'

Repository: munich-quantum-toolkit/predictor

Length of output: 208

🌐 Web query:

networkx 2.8.5 Python 3.13 compatibility constraint qiskit-ibm-transpiler

💡 Result:

Short answer: NetworkX 2.8.5 is not compatible with Python 3.13 (it supports Python 3.8–3.10), so if qiskit-ibm-transpiler requires or installs networkx==2.8.5 you cannot use that combo on Python 3.13. [1][2][3]

Details & sources:

NetworkX 2.8.5 supports Python 3.8–3.10 (release notes). [1]

NetworkX newer 3.x releases add support for Python 3.11–3.13 (e.g. NetworkX 3.4+ support 3.13). If you need Python 3.13 install a NetworkX 3.x release. [3]

qiskit-ibm-transpiler release notes indicate NetworkX was added to its requirements in recent releases (so the transpiler can pull in networkx from your environment). Check which networkx version qiskit-ibm-transpiler requests in your installed package/pyproject to confirm whether it pins 2.8.x. [2]

Recommendation: either use Python 3.10 (to keep networkx 2.8.5), or upgrade NetworkX to a 3.x release that supports Python 3.13 and ensure qiskit-ibm-transpiler is compatible with that NetworkX version (check the transpiler’s dependency metadata or test in a venv). [2][3]

Sources:
[1] NetworkX 2.8.5 release notes (supported Python versions).
[2] qiskit-ibm-transpiler release notes (networkx added to requirements).
[3] NetworkX 3.4/3.x release notes (Python 3.13 support).

🌐 Web query:

qiskit-ibm-transpiler networkx version Python 3.13

💡 Result:

Short answer: Yes — qiskit-ibm-transpiler depends on networkx (but does not pin a restrictive networkx version), and networkx releases (3.4+) support Python 3.13, while recent Qiskit releases add official Python 3.13 support. So you can run qiskit-ibm-transpiler on Python 3.13 after installing a networkx release that supports 3.13 (e.g. networkx 3.4+). [1][2][3]

Sources

qiskit-ibm-transpiler (package + release notes showing networkx listed as a dependency). [1]

Qiskit release notes stating Python 3.13 support added. [2]

NetworkX release notes / install docs showing support for Python 3.13 (networkx 3.4+). [3]

References

qiskit-ibm-transpiler package / PyPI & release notes.

Qiskit SDK / release notes (Python 3.13 support).

NetworkX release notes / install docs (supports Python 3.13).

🏁 Script executed:

# Try to fetch raw file from GitHub curl -s "https://raw.githubusercontent.com/munich-quantum-toolkit/predictor/HEAD/pyproject.toml" | sed -n '40,280p' | cat -n | grep -A 20 -B 5 "networkx\|qiskit-ibm-transpiler" | head -100

Repository: munich-quantum-toolkit/predictor

Length of output: 208

Based on the web search results, I can now provide a conclusive verification:

The review comment's concern is valid and the suggested fix is necessary.

Key findings:

uv DOES support Python version constraints via PEP 508 environment markers (syntax: "package==version; python_version < \"X.Y\"")

NetworkX 2.8.5 supports ONLY Python 3.8–3.10 — it is incompatible with Python 3.13

The current code lacks the version constraint, which means projects targeting Python 3.13 would fail when trying to use networkx==2.8.5

The correct implementation should be:

[tool.uv] override-dependencies = [ "networkx==2.8.5; python_version < \"3.13\"", # Required by `qiskit-ibm-transpiler` ]

Add a Python version constraint to the networkx override dependency.

NetworkX 2.8.5 supports only Python 3.8–3.10 and is incompatible with Python 3.13. Without the constraint, users on Python 3.13 will encounter resolution failures. Update the override to: "networkx==2.8.5; python_version < \"3.13\"".

🤖 Prompt for AI Agents

In pyproject.toml around lines 261 to 264, the override for networkx pins 2.8.5 but lacks a Python-version constraint so it will be applied on Python 3.13 (where that release is incompatible); update the override entry to use a PEP 508 environment marker restricting it to Python versions older than 3.13 (i.e., add `; python_version < "3.13"` to the networkx override) so the override only applies for Python < 3.13.

src/mqt/predictor/rl/predictorenv.py

Signed-off-by: Shaobo-Zhou <109073755+Shaobo-Zhou@users.noreply.github.com>

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/mqt/predictor/rl/actions.py (1)
168-191: remove_action docstring and raised exception type are inconsistent.

The docstring promises a ValueError when an action name is missing, but the code actually raises KeyError. This can surprise callers and tests.

Either change the exception type or the docstring; for consistency with the dictionary‑like semantics, I’d lean toward KeyError in both:
 def remove_action(name: str) -> None:
     """Removes an action from the global actions registry by name.

-    Raises:
-        ValueError: If no action with the given name is registered.
+    Raises:
+        KeyError: If no action with the given name is registered.
@@
     if name not in _ACTIONS:
         msg = f"No action with name {name} is registered."
-        raise KeyError(msg)
+        raise KeyError(msg)
(Or switch the raised type to ValueError if that better matches your public API.)

♻️ Duplicate comments (2)

pyproject.toml (2)
51-52: Track future upgrade of qiskit-ibm-transpiler and possible removal of the Python <3.13 guard.

You still pin qiskit-ibm-transpiler==0.13.1 and guard it with python_version < '3.13'. Earlier review already suggested upgrading to a newer release (and relaxing the Python guard) once you’ve validated compatibility. Consider adding a TODO/issue reference here so it doesn’t get forgotten when you move to a 0.14.x/1.x line with full 3.13 support.

265-269: Constrain networkx==2.8.5 override to Python <3.13 to avoid resolver failures.

The unconditional uv override:
override-dependencies = [
  "networkx==2.8.5", # Required by `qiskit-ibm-transpiler`
]
will force-install networkx==2.8.5 even on Python 3.13, where that version is not compatible. Since qiskit-ibm-transpiler itself is only installed for python_version < '3.13', the override should be similarly guarded to prevent 3.13 envs from breaking.

Suggested change:
[tool.uv]
 override-dependencies = [
-  "networkx==2.8.5", # Required by `qiskit-ibm-transpiler`
+  "networkx==2.8.5; python_version < \"3.13\"",  # Required by `qiskit-ibm-transpiler` (<3.13 only)
 ]

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4ec1c7 and 283345b.

📒 Files selected for processing (2)

pyproject.toml (5 hunks)
src/mqt/predictor/rl/actions.py (21 hunks)

🧰 Additional context used

🧠 Learnings (7)

📓 Common learnings

Learnt from: burgholzer
Repo: munich-quantum-toolkit/core PR: 1326
File: python/mqt/core/__init__.py:22-22
Timestamp: 2025-11-24T10:19:41.147Z
Learning: In the munich-quantum-toolkit/core repository, Ruff is configured with 'ALL' rules enabled by default, and only specific rules are selectively disabled. When reviewing changes that enable previously-disabled rules (like PLC0415), noqa directives for those rules become necessary and should be retained.

📚 Learning: 2025-11-04T15:22:19.558Z

Learnt from: marcelwa
Repo: munich-quantum-toolkit/core PR: 1243
File: test/python/qdmi/qiskit/conftest.py:155-157
Timestamp: 2025-11-04T15:22:19.558Z
Learning: The munich-quantum-toolkit/core repository requires Python 3.10 or later, so Python 3.10+ features (such as `zip(..., strict=...)`, pattern matching, etc.) are acceptable and should not be flagged as compatibility issues.

Applied to files:

pyproject.toml

📚 Learning: 2025-10-09T22:15:59.924Z

Learnt from: denialhaag
Repo: munich-quantum-toolkit/core PR: 1246
File: pyproject.toml:340-341
Timestamp: 2025-10-09T22:15:59.924Z
Learning: Qiskit publishes ABI3 wheels (e.g., cp39-abi3) that are forward-compatible with newer Python versions including Python 3.14, so no explicit Python 3.14 wheels are required for qiskit to work on Python 3.14.

Applied to files:

pyproject.toml

📚 Learning: 2025-10-11T19:39:32.050Z

Learnt from: denialhaag
Repo: munich-quantum-toolkit/debugger PR: 160
File: pyproject.toml:54-54
Timestamp: 2025-10-11T19:39:32.050Z
Learning: Qiskit packages use cp39-abi3 wheels (stable ABI) which are forward-compatible with Python 3.9+ including Python 3.14, even if the package classifiers don't explicitly list Python 3.14 support.

Applied to files:

pyproject.toml

📚 Learning: 2025-11-28T14:33:15.199Z

Learnt from: burgholzer
Repo: munich-quantum-toolkit/core PR: 1337
File: pyproject.toml:351-351
Timestamp: 2025-11-28T14:33:15.199Z
Learning: In the munich-quantum-toolkit/core repository, the dev dependency "ty==0.0.1a27" is intentionally pinned to an exact pre-release version. This dependency is used by the `ty-check` pre-commit hook (via `uv run --only-dev ty check`), and the exact pin ensures all developers run the same version, preventing unexpected issues from updates since ty is in alpha and changing rapidly.

Applied to files:

pyproject.toml

📚 Learning: 2025-10-13T00:03:08.078Z

Learnt from: denialhaag
Repo: munich-quantum-toolkit/debugger PR: 160
File: pyproject.toml:252-256
Timestamp: 2025-10-13T00:03:08.078Z
Learning: In uv's `override-dependencies`, multiple entries for the same package with different environment markers (or one without a marker and one with a specific marker) can coexist. uv correctly resolves these by applying the appropriate constraint based on the active Python version. For example, `["symengine>=0.11,<0.14", "symengine>=0.14.1; python_version >= '3.14'"]` is valid and will apply the second constraint on Python 3.14+.

Applied to files:

pyproject.toml

📚 Learning: 2025-11-04T14:26:25.420Z

Learnt from: marcelwa
Repo: munich-quantum-toolkit/core PR: 1243
File: test/python/qdmi/qiskit/conftest.py:11-19
Timestamp: 2025-11-04T14:26:25.420Z
Learning: In the munich-quantum-toolkit/core repository, Qiskit is always available as a dependency during testing, so import guards for qiskit-dependent imports in test files (e.g., test/python/qdmi/qiskit/*.py) are not necessary.

Applied to files:

pyproject.toml

🔇 Additional comments (6)

pyproject.toml (3)

35-40: Confirm new minimum versions for pytket_qiskit and stable-baselines3 across all supported Pythons.

The raised lower bounds for pytket_qiskit and stable-baselines3 look reasonable, but they tighten your dependency constraints. Please verify that your full test matrix (Python 3.10–3.13, including minimum-constraints runs) passes with these floors and that downstream consumers won’t be broken by the bump.

172-173: mypy override additions for stable_baselines3 and qiskit_ibm_transpiler look good.

Extending the ignore_missing_imports override to cover stable_baselines3.* and qiskit_ibm_transpiler.* is consistent with the rest of this block and will avoid extra stub noise in CI.

132-133: The current regex pattern is correct; the proposed change would cause a TOML parsing error.

The current code uses \\. in TOML, which correctly produces the regex pattern \. (escaped dot) after TOML parsing. This properly matches the literal period in the warning message "Timeout is not supported on Windows."

The suggested change to use \. in TOML is invalid—TOML does not recognize \. as a valid escape sequence and would fail to parse the file entirely. The correct TOML syntax for a single backslash is \\ (double backslash), which is already present in the current code.

Likely an incorrect or invalid review comment.

src/mqt/predictor/rl/actions.py (3)

124-161: Action/DeviceDependentAction typing and new flags look coherent with the registry usage.

The widened transpile_pass union and the new stochastic/preserve_layout fields align with how you instantiate actions below (static lists for device‑independent passes and lambdas for device‑dependent ones). This structure should give the RL env enough metadata to handle stochastic passes and layout‑preserving optimizations cleanly.

484-540: Conditional AIRouting/Sabre actions align with dependency guards; keep them in sync with pyproject.

The new SabreSwap/SabreMapping actions marked as stochastic=True and the conditional AIRouting / AIRouting_opt actions under sys.version_info < (3, 13) are consistent with the Python-version guards in pyproject.toml for qiskit-ibm-transpiler. This should prevent import/runtime errors on 3.13+ where those packages are not installed, while still exposing the richer routing options on older Pythons.

Just ensure tests cover both code paths (with and without AIRouting available) so the RL environment handles the differing action sets correctly.

610-712: Classical-register helpers and SafeAIRouting.run have object-identity issues; reuse original registers/bits and avoid remapping qubits.

Three interacting correctness issues exist:

Classical registers/bits rebuilt inconsistently — extract_cregs_and_measurements creates new ClassicalRegister instances but stores original Clbit references. Appending measurements with mismatched register objects causes failures.

remove_cregs creates new quantum registers, desynchronizing from final_layout — New qubits in qc_noclassical differ from qc_orig.qubits, so lookups in final_layout (keyed by the new qubits passed to AIRouting) fail when code iterates original qubits.

Type hints incorrect — Measurement cargs are list[Clbit], not list[ClassicalRegister].

Solution: Reuse original registers (list(qc.cregs) instead of recreating), avoid qubit remapping in remove_cregs by using QuantumCircuit(*qc.qregs), and correct type hints to list[Clbit]. This preserves object identity through extraction, routing, and restoration.

src/mqt/predictor/rl/actions.py

Shaobo-Zhou · 2025-12-03T17:46:20Z

Note that your CI will likely be failing until #516 is merged later today.

My CI is failing because uvx check-sdist --inject-junk reports that the sdist includes the generated file src/mqt/predictor/_version.py (coming from hatch-vcs).
Is there an existing issue about this, or do you have a recommended way to handle this cleanly?

denialhaag · 2025-12-03T17:58:30Z

Note that your CI will likely be failing until #516 is merged later today.

My CI is failing because uvx check-sdist --inject-junk reports that the sdist includes the generated file src/mqt/predictor/_version.py (coming from hatch-vcs). Is there an existing issue about this, or do you have a recommended way to handle this cleanly?

I'm a bit surprised that this is popping up now, but you can fix it by including the following in pyproject.toml:

[tool.check-sdist]
sdist-only = ["src/mqt/predictor/_version.py"]

We also do this in other projects: https://github.com/munich-quantum-toolkit/core/blob/e30566c236373c489cd546c54323f54b617c5c18/pyproject.toml#L110-L111

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c398d8 and 9211aab.

📒 Files selected for processing (1)

src/mqt/predictor/rl/actions.py (21 hunks)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: 🐍 Test (macos-14) / 🐍 macos-14
GitHub Check: 🐍 Test (windows-2022) / 🐍 windows-2022
GitHub Check: 🐍 Test (ubuntu-24.04) / 🐍 ubuntu-24.04

🔇 Additional comments (1)

src/mqt/predictor/rl/actions.py (1)

676-712: SafeAIRouting.run should use explicit error handling instead of asserts and document mid-circuit measurement limitations

SafeAIRouting.run has two areas that should be hardened:

Replace asserts with explicit RuntimeErrors (lines 696, 704, 706, 707): Asserts disappear under Python's -O flag, making failures silent. Replace with if checks raising RuntimeError.

Document and guard against mid-circuit measurements: The current implementation extracts all measure operations and restores them afterward. If measurements are not at the end of qc_routed, this may change program semantics. Either verify that the input guarantees trailing measurements only, or add an explicit guard to reject unsupported mid-circuit measurement patterns.

The final_layout mapping assumption (that virtual qubits match after removing classical bits) should hold as long as the helper functions preserve qubit identity, but the defensive checks above will catch any violations.

src/mqt/predictor/rl/actions.py

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (5)

src/mqt/predictor/rl/predictorenv.py (5)
427-427: Type annotation mismatch: PropertySet | None initialized to {}.

The type annotation says PropertySet | None but initializes to {} (empty dict). While PropertySet is a dict subclass, this is semantically confusing. Also, the actual assignments use dict(pm.property_set), making the type effectively dict[str, Any] | None.

This was flagged in a previous review. Consider:
-        pm_property_set: PropertySet | None = {}
+        pm_property_set: dict[str, Any] | None = None
459-460: Guard against missing final_layout key.

Direct dictionary access pm_property_set["final_layout"] can raise KeyError if the routing pass doesn't populate this key.

As flagged in previous review, use .get() with a guard:
-        elif action_index in self.actions_routing_indices and self.layout and pm_property_set is not None:
-            self.layout.final_layout = pm_property_set["final_layout"]
+        elif action_index in self.actions_routing_indices and self.layout and pm_property_set is not None:
+            final_layout = pm_property_set.get("final_layout")
+            if final_layout is not None:
+                self.layout.final_layout = final_layout
520-522: Dict key ordering assumption is fragile.

The comprehension relies on placement.keys() iteration order matching qc_tmp.qubits[i] indexing. This was flagged in a previous review.

Use explicit iteration over placement.items() with TKET's UnitID.index:
-                qiskit_mapping = {
-                    qc_tmp.qubits[i]: placement[list(placement.keys())[i]].index[0] for i in range(len(placement))
-                }
+                qiskit_mapping = {}
+                for tket_qubit, node in placement.items():
+                    qiskit_q = qc_tmp.qubits[tket_qubit.index[0]]
+                    qiskit_mapping[qiskit_q] = node.index[0]
250-261: Update docstring and fix assert_never argument.

Two issues flagged in a previous review:

Docstring says "for the current state" but the method now accepts an optional qc parameter

assert_never(circuit) should be assert_never(self.reward_function) since the discriminant is reward_function
     def calculate_reward(self, qc: QuantumCircuit | None = None) -> float:
-        """Calculates and returns the reward for the current state."""
+        """Calculates and returns the reward for the given circuit or current state."""
         circuit = self.state if qc is None else qc
         if self.reward_function == "expected_fidelity":
             return expected_fidelity(circuit, self.device)
         if self.reward_function == "estimated_success_probability":
             return estimated_success_probability(circuit, self.device)
         if self.reward_function == "estimated_hellinger_distance":
             return estimated_hellinger_distance(circuit, self.device, self.hellinger_model)
         if self.reward_function == "critical_depth":
             return crit_depth(circuit)
-        assert_never(circuit)
+        assert_never(self.reward_function)
240-244: Consider decomposing both unitary and clifford gates.

Using elif means if a circuit contains both unitary and clifford gates, only unitary will be decomposed. If both should be decomposed independently, use separate if statements.
-        if self.state.count_ops().get("unitary"):
-            self.state = self.state.decompose(gates_to_decompose="unitary")
-        elif self.state.count_ops().get("clifford"):
-            self.state = self.state.decompose(gates_to_decompose="clifford")
+        if self.state.count_ops().get("unitary"):
+            self.state = self.state.decompose(gates_to_decompose="unitary")
+        if self.state.count_ops().get("clifford"):
+            self.state = self.state.decompose(gates_to_decompose="clifford")

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9211aab and a32571b.

📒 Files selected for processing (2)

src/mqt/predictor/rl/actions.py (21 hunks)
src/mqt/predictor/rl/predictorenv.py (12 hunks)

🧰 Additional context used

🧠 Learnings (4)

📚 Learning: 2025-11-04T15:22:19.558Z

Learnt from: marcelwa
Repo: munich-quantum-toolkit/core PR: 1243
File: test/python/qdmi/qiskit/conftest.py:155-157
Timestamp: 2025-11-04T15:22:19.558Z
Learning: The munich-quantum-toolkit/core repository requires Python 3.10 or later, so Python 3.10+ features (such as `zip(..., strict=...)`, pattern matching, etc.) are acceptable and should not be flagged as compatibility issues.

Applied to files:

src/mqt/predictor/rl/actions.py

📚 Learning: 2025-12-04T06:59:40.294Z

Learnt from: MatthiasReumann
Repo: munich-quantum-toolkit/core PR: 1301
File: mlir/lib/Dialect/MQTOpt/Transforms/Transpilation/LayeredUnit.cpp:84-85
Timestamp: 2025-12-04T06:59:40.294Z
Learning: In the MQTOpt MLIR routing passes (NaiveRoutingPassSC, AStarRoutingPassSC), the input IR is guaranteed to contain only 1-qubit and 2-qubit gates. All 3+-qubit gates must be decomposed before routing; otherwise the input IR is invalid. This invariant allows skipTwoQubitBlock in LayeredUnit.cpp to safely assert wires.size() == 2.

Applied to files:

src/mqt/predictor/rl/actions.py
src/mqt/predictor/rl/predictorenv.py

📚 Learning: 2025-11-24T10:19:41.147Z

Learnt from: burgholzer
Repo: munich-quantum-toolkit/core PR: 1326
File: python/mqt/core/__init__.py:22-22
Timestamp: 2025-11-24T10:19:41.147Z
Learning: In the munich-quantum-toolkit/core repository, Ruff is configured with 'ALL' rules enabled by default, and only specific rules are selectively disabled. When reviewing changes that enable previously-disabled rules (like PLC0415), noqa directives for those rules become necessary and should be retained.

Applied to files:

src/mqt/predictor/rl/predictorenv.py

📚 Learning: 2025-10-07T15:30:42.946Z

Learnt from: MatthiasReumann
Repo: munich-quantum-toolkit/core PR: 1237
File: mlir/include/mlir/Dialect/MQTOpt/Transforms/Transpilation/Layout.h:219-231
Timestamp: 2025-10-07T15:30:42.946Z
Learning: In the Layout class for MLIR quantum routing (mlir/include/mlir/Dialect/MQTOpt/Transforms/Transpilation/Layout.h), the swap method intentionally does NOT swap the hw fields in QubitInfo. This is correct because SSA values represent quantum states at fixed hardware locations, and only their program index associations change during a SWAP gate. The hw field indicates where an SSA value physically resides and remains constant.

Applied to files:

src/mqt/predictor/rl/predictorenv.py

🧬 Code graph analysis (1)

src/mqt/predictor/rl/predictorenv.py (5)

src/mqt/predictor/rl/actions.py (5)

CompilationOrigin (103-109)

DeviceDependentAction (150-161)

PassType (112-121)

Action (125-141)

run (712-740)

src/mqt/predictor/rl/helper.py (2)

create_feature_dict (84-110)

get_path_training_circuits (123-125)

src/mqt/predictor/rl/parsing.py (5)

final_layout_bqskit_to_qiskit (186-226)

final_layout_pytket_to_qiskit (167-183)

postprocess_vf2postlayout (229-250)

prepare_noise_data (253-278)

apply (89-92)

src/mqt/predictor/utils.py (1)

get_rl_openqasm_gates (148-185)

src/mqt/predictor/reward.py (3)

expected_fidelity (46-74)

estimated_success_probability (102-226)

estimated_hellinger_distance (327-349)

🪛 Ruff (0.14.7)

src/mqt/predictor/rl/predictorenv.py

246-246: Unused noqa directive (non-enabled: SLF001)

Remove unused noqa directive

(RUF100)

409-409: Do not catch blind exception: Exception

(BLE001)

410-410: Logging statement uses f-string

(G004)

418-418: Logging statement uses f-string

(G004)

514-514: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: 🐍 Test (windows-2022) / 🐍 windows-2022
GitHub Check: 🐍 Test (macos-14) / 🐍 macos-14
GitHub Check: 🐍 Test (ubuntu-24.04) / 🐍 ubuntu-24.04

🔇 Additional comments (10)

src/mqt/predictor/rl/predictorenv.py (5)

362-424: LGTM: Stochastic compilation logic is correct.

The fom_aware_compile method correctly implements FOM-based selection with uniform maximization (fom > best_fom) for all reward functions. The fallback to SWAP-count optimization on synthesis failure is a sensible resilience pattern.

Two minor static analysis hints (optional to address):

Lines 410, 418: f-string logging could use parameterized logging (logger.warning("... %s", e)) per G004

Lines 409, 417: Broad Exception catches are acceptable here for robustness against arbitrary pass failures

592-625: Well-structured state machine logic.

The refactored determine_valid_actions_for_state method is clearly organized with helpful comments explaining each branch. The logic correctly handles the circuit state transitions:

Non-native gates → synthesis/optimization

Mapped circuits with layout → terminate or fine-tune

Layout chosen but unmapped → routing required

181-199: Observation space expansion and noise data caching look correct.

The observation space now includes normalized gate counts for OpenQASM 2.0 gates via get_rl_openqasm_gates(). The noise data caching (node_err, edge_err, readout_err) avoids redundant extraction from the device target.

137-142: LGTM: preserve_layout attribute check is correct.

The code correctly uses getattr(elem, "preserve_layout", False) to populate actions_mapping_preserving_indices. This was fixed from a previous issue where it incorrectly checked for "preserve".

499-508: LGTM: NoiseAwarePlacement handling with lazy noise data caching.

The noise data is only extracted once via prepare_noise_data() and cached in instance attributes. The action.transpile_pass is correctly called with the noise parameters.

src/mqt/predictor/rl/actions.py (5)

454-470: LGTM: NoiseAwarePlacement uses keyword arguments.

The noise parameters are now passed using keyword arguments (node_errors=, link_errors=, readout_errors=), which addresses the previous concern about positional argument ordering being error-prone.

610-699: LGTM: Helper functions correctly preserve register/bit identity.

The helper functions properly handle Qiskit circuit manipulation:

extract_cregs_and_measurements: Reuses original ClassicalRegister objects and correctly types Clbit

remove_cregs: Preserves Qubit identity by reusing original QuantumRegister objects

add_cregs_and_measurements: Uses qubit_map is not None for proper truthiness handling

All issues from previous reviews have been addressed.

530-540: Clarify: max_iterations=1 in SabreLayout vs outer stochastic loop.

The action is marked stochastic=True, but SabreLayout(..., max_iterations=1) makes each individual pass run deterministic. The stochasticity comes from fom_aware_compile running the entire pass multiple times. This is likely intentional but could be confusing—consider adding a comment.

Is the intent that fom_aware_compile provides the stochastic behavior by running the deterministic SabreLayout(max_iterations=1) multiple times? If so, a brief comment would help future maintainers.

276-283: Good fix: CollectCliffords now runs before OptimizeCliffords.

The PR objectives mentioned ensuring CollectCliffords runs before OptimizeCliffords, which is now correctly implemented. OptimizeCliffords requires Clifford gates to be collected first.

443-451: LGTM: GraphPlacement action registration.

The new GraphPlacement action is properly structured with appropriate timeout and maximum_matches parameters.

src/mqt/predictor/rl/actions.py

src/mqt/predictor/rl/predictorenv.py

Shaobo-Zhou · 2025-12-04T15:24:01Z

@coderabbitai review

coderabbitai · 2025-12-04T15:24:10Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

src/mqt/predictor/rl/actions.py (2)

140-141: Simplify type annotations for stochastic and preserve_layout.

The type bool | None = False is confusing—if False is a valid default, None likely isn't needed. Consider simplifying to bool = False unless there's a semantic distinction for None.

-    stochastic: bool | None = False
-    preserve_layout: bool | None = False
+    stochastic: bool = False
+    preserve_layout: bool = False

723-734: Replace assertions with defensive error handling.

The assertions on lines 724, 727, 730, and 733 will raise AssertionError in production if final_layout is unexpectedly None or malformed. Since AIRouting is an external pass, consider graceful degradation with logging:

             final_layout = getattr(self, "property_set", {}).get("final_layout", None)
-            assert final_layout is not None, "final_layout is None — cannot map virtual qubits"
+            if final_layout is None:
+                msg = "final_layout is None — cannot map virtual qubits"
+                raise ValueError(msg)
             qubit_map = {}
             for virt in qc_orig.qubits:
-                assert virt in final_layout, f"Virtual qubit {virt} not found in final layout!"
+                if virt not in final_layout:
+                    msg = f"Virtual qubit {virt} not found in final layout!"
+                    raise ValueError(msg)
                 phys = final_layout[virt]
                 if isinstance(phys, int):
-                    assert 0 <= phys < len(qc_routed.qubits), f"Physical index {phys} out of range in routed circuit!"
+                    if not (0 <= phys < len(qc_routed.qubits)):
+                        msg = f"Physical index {phys} out of range in routed circuit!"
+                        raise ValueError(msg)
                     qubit_map[virt] = qc_routed.qubits[phys]
                 else:
-                    assert phys in qc_routed.qubits, f"Physical qubit {phys} not found in output circuit!"
+                    if phys not in qc_routed.qubits:
+                        msg = f"Physical qubit {phys} not found in output circuit!"
+                        raise ValueError(msg)
                     qubit_map[virt] = qc_routed.qubits[qc_routed.qubits.index(phys)]

Using explicit exceptions instead of assertions ensures errors are raised even when Python runs with optimizations (-O flag), which strips assertions.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 323938b and 1c70b55.

📒 Files selected for processing (1)

src/mqt/predictor/rl/actions.py (21 hunks)

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-11-04T15:22:19.558Z

Learnt from: marcelwa
Repo: munich-quantum-toolkit/core PR: 1243
File: test/python/qdmi/qiskit/conftest.py:155-157
Timestamp: 2025-11-04T15:22:19.558Z
Learning: The munich-quantum-toolkit/core repository requires Python 3.10 or later, so Python 3.10+ features (such as `zip(..., strict=...)`, pattern matching, etc.) are acceptable and should not be flagged as compatibility issues.

Applied to files:

src/mqt/predictor/rl/actions.py

📚 Learning: 2025-12-04T06:59:40.294Z

Learnt from: MatthiasReumann
Repo: munich-quantum-toolkit/core PR: 1301
File: mlir/lib/Dialect/MQTOpt/Transforms/Transpilation/LayeredUnit.cpp:84-85
Timestamp: 2025-12-04T06:59:40.294Z
Learning: In the MQTOpt MLIR routing passes (NaiveRoutingPassSC, AStarRoutingPassSC), the input IR is guaranteed to contain only 1-qubit and 2-qubit gates. All 3+-qubit gates must be decomposed before routing; otherwise the input IR is invalid. This invariant allows skipTwoQubitBlock in LayeredUnit.cpp to safely assert wires.size() == 2.

Applied to files:

src/mqt/predictor/rl/actions.py

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: 🐍 Test (macos-14) / 🐍 macos-14
GitHub Check: 🐍 Test (windows-2022) / 🐍 windows-2022
GitHub Check: 🐍 Test (ubuntu-24.04) / 🐍 ubuntu-24.04

🔇 Additional comments (8)

src/mqt/predictor/rl/actions.py (8)
276-283: LGTM!

Adding CollectCliffords() before OptimizeCliffords() is correct—OptimizeCliffords requires Clifford gates to be collected first.

452-468: LGTM!

Using keyword arguments (node_errors=, link_errors=, readout_errors=) makes the API less error-prone compared to positional arguments.

528-538: Clarify interaction between max_iterations=1 and stochastic=True.

SabreMapping is marked stochastic=True but uses max_iterations=1. If the RL environment handles multiple trials externally (as mentioned in PR description), this is fine. However, if stochastic=True is meant to indicate internal randomness that benefits from multiple iterations, consider documenting this or increasing max_iterations.

608-697: LGTM!

The helper functions correctly preserve register and qubit identity by reusing original objects rather than cloning. The type annotations are accurate, and the qubit_map is not None check handles the edge case properly.

398-405: Verify time_limit=100 is appropriate for VF2PostLayout.

A 100-second timeout per layout optimization could significantly impact overall compilation time for complex circuits or when many actions are applied. Consider whether this value has been tuned for typical use cases or if it should be configurable.

573-588: Inconsistent CI environment handling for max_synthesis_size.

Other BQSKit parameters (optimization_level, synthesis_epsilon, num_workers) use os.getenv("GITHUB_ACTIONS") checks to reduce resource usage in CI, but max_synthesis_size=3 on line 583 is unconditional. Was this intentional, or should it also be reduced in CI?
-            max_synthesis_size=3,
+            max_synthesis_size=2 if os.getenv("GITHUB_ACTIONS") == "true" else 3,
294-306: Inconsistent lambda parameter conventions across device-dependent actions.

Optimize1qGatesDecomposition_preserve (line 208) takes device and accesses device.operation_names, while Opt2qBlocks_preserve takes native_gate, coupling_map as separate parameters. Verify whether these lambdas are called with compatible signatures and whether a normalization layer exists, or standardize the parameter passing approach across all device-dependent actions.

493-526: The local_mode=True parameter in AIRouting actions is intentional and well-supported. Setting local_mode=True enables local execution of the AI routing pass (when the optional ai-local-mode dependencies are installed) rather than relying on the Qiskit Transpiler Service. This is a documented feature that allows for faster iteration during development and avoids cloud service dependencies. The fallback to cloud execution when dependencies are unavailable ensures robustness. This parameter choice aligns with standard usage patterns in qiskit-ibm-transpiler.

coderabbitai · 2025-12-04T15:48:35Z

src/mqt/predictor/rl/actions.py

+if sys.version_info < (3, 13):
+    from qiskit_ibm_transpiler.ai.routing import AIRouting
+


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

qiskit-ibm-transpiler Python 3.13 support compatibility

💡 Result:

Short answer: Yes — current qiskit-ibm-transpiler releases are compatible with Python 3.13. The Qiskit SDK has official Python 3.13 support, and the qiskit-ibm-transpiler PyPI releases are published as universal "py3" wheels (recent uploads Nov 21, 2025). [1][2]

Sources:

qiskit-ibm-transpiler on PyPI (wheel metadata showing py3; upload Nov 21, 2025). [1]

Qiskit SDK release notes (notes that Python 3.13 support was added). [2]

🏁 Script executed:

# Search for related version checks or comments in the codebase rg "3\.13|AIRouting|qiskit_ibm_transpiler" --type py -B 2 -A 2

Repository: munich-quantum-toolkit/predictor

Length of output: 208

Remove unnecessary Python 3.13 version gate for AIRouting — it's compatible with Python 3.13+.

The condition if sys.version_info < (3, 13) excludes AIRouting from Python 3.13 and later, but qiskit-ibm-transpiler is officially compatible with Python 3.13+ (published with universal py3 wheels). Remove the version check or update it to allow Python 3.13+:

from qiskit_ibm_transpiler.ai.routing import AIRouting

Users on Python 3.13+ should have access to AIRouting without artificial restrictions.

🤖 Prompt for AI Agents

In src/mqt/predictor/rl/actions.py around lines 84 to 86, the current version check `if sys.version_info < (3, 13):` incorrectly blocks importing AIRouting on Python 3.13+, so remove that conditional and import AIRouting unconditionally; update the code to simply import AIRouting from qiskit_ibm_transpiler.ai.routing (or, if a conditional is still desired, invert/adjust it to include 3.13+), ensuring any linter imports/order remain correct.

@Shaobo-Zhou that is good news, so you can remove these checks (there might also be other files where we restricted these versions)

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

flowerthrower

Hi @Shaobo-Zhou this looks already much better! Thank you very much for the continued efforts. There are only a few final small remarks to be addressed now.

flowerthrower · 2025-12-05T12:33:42Z

src/mqt/predictor/rl/actions.py

+if sys.version_info < (3, 13):
+    from qiskit_ibm_transpiler.ai.routing import AIRouting
+


@Shaobo-Zhou that is good news, so you can remove these checks (there might also be other files where we restricted these versions)

flowerthrower · 2025-12-05T13:58:17Z

pyproject.toml

+    "qiskit-ibm-transpiler==0.13.1; python_version < '3.13'", # Do not support python 3.13 yet
+    "qiskit-ibm-ai-local-transpiler==0.4.2; python_version < '3.13'",


Sounds good.

flowerthrower · 2025-12-05T14:01:07Z

pyproject.toml

+ignore = ["GH200"]
+
+[tool.uv]
+override-dependencies = [
+  "networkx==2.8.5", # Required by `qiskit-ibm-transpiler`


@Shaobo-Zhou check after loosening the qiskit-ibm-transpiler dependency

flowerthrower · 2025-12-05T14:02:06Z

pyproject.toml

+[tool.uv]
+override-dependencies = [
+  "networkx==2.8.5", # Required by `qiskit-ibm-transpiler`
 ]


@Shaobo-Zhou can you verify this

Signed-off-by: Patrick Hopf <81010725+flowerthrower@users.noreply.github.com>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (4)

src/mqt/predictor/rl/predictorenv.py (4)

241-246: Remove unused noqa and consider decomposing both unitary and clifford independently.

Two minor points:

Line 246 uses # noqa: SLF001, but static analysis indicates this rule is not enabled in your configuration, making the directive unnecessary noise.
The elif at line 243 means if a circuit contains both unitary and clifford gates, only unitary will be decomposed. If both types should be decomposed when present, use two independent if statements.

Apply this diff:

-        if self.state.count_ops().get("unitary"):
+        if self.state.count_ops().get("unitary"):
             self.state = self.state.decompose(gates_to_decompose="unitary")
-        elif self.state.count_ops().get("clifford"):
+        if self.state.count_ops().get("clifford"):
             self.state = self.state.decompose(gates_to_decompose="clifford")
 
-        self.state._layout = self.layout  # noqa: SLF001
+        self.state._layout = self.layout
```<!-- review_comment_end -->

<!-- [scratchpad_start]
[task review_file_1/1 segment_4/9]
Review calculate_reward updates lines 250-261.
[observations]
- Line 250: Added optional qc parameter
- Line 251: Updated docstring
- Line 252: circuit = self.state if qc is None else qc
- Lines 254-260: Return statements for each reward type
- Line 261: assert_never(circuit)
[analysis]
- Past comment suggested docstring update and assert_never(self.reward_function) instead of circuit
- Marked as "Minor" in past comments
- The assert_never on circuit is weird, should be on reward_function
[actions]
Generate review comment (duplicate of past).
[scratchpad_end] -->

---

`250-261`: **Update docstring and fix `assert_never` argument.**

Two minor points:

1. Line 251: The docstring still says "for either the current state or a quantum circuit (if provided)" which is accurate, but could be more precise about the `qc` parameter's role.
2. Line 261: Using `assert_never(circuit)` is incorrect. The exhaustiveness check should be on the discriminant (`self.reward_function`), not on the circuit variable.




Apply this diff:

```diff
     def calculate_reward(self, qc: QuantumCircuit | None = None) -> float:
-        """Calculates and returns the reward for either the current state or a quantum circuit (if provided)."""
+        """Calculate the reward for the current state or an optionally provided circuit.
+
+        Args:
+            qc: Optional circuit to compute reward for. Defaults to self.state if None.
+
+        Returns:
+            The computed reward value.
+        """
         circuit = self.state if qc is None else qc
         if self.reward_function == "expected_fidelity":
             return expected_fidelity(circuit, self.device)
         if self.reward_function == "estimated_success_probability":
             return estimated_success_probability(circuit, self.device)
         if self.reward_function == "estimated_hellinger_distance":
             return estimated_hellinger_distance(circuit, self.device, self.hellinger_model)
         if self.reward_function == "critical_depth":
             return crit_depth(circuit)
-        assert_never(circuit)
+        assert_never(self.reward_function)

426-462: Fix type inconsistency and guard final_layout key access.

Three issues:

Line 427: Type annotation declares PropertySet | None but initializes to {}. The actual values assigned (lines 446, 453) are dict objects converted from pm.property_set, not PropertySet instances.
Lines 446, 453: Assignments use dict(pm.property_set) if hasattr(...) else {}, which returns dict or empty dict, contradicting the | None annotation.
Line 460: Accessing pm_property_set["final_layout"] will raise KeyError if routing passes don't populate this key. The check pm_property_set is not None only guards against None, not missing keys.

Apply this diff:

     def _apply_qiskit_action(self, action: Action, action_index: int) -> QuantumCircuit:
-        pm_property_set: PropertySet | None = {}
+        pm_property_set: dict[str, Any] | None = None
         if getattr(action, "stochastic", False):  # Wrap stochastic action to optimize for the used figure of merit
             altered_qc, pm_property_set = self.fom_aware_compile(
                 action,
                 self.device,
                 self.state,
                 max_iteration=self.max_iter,
             )
         else:
             if action.name in ["QiskitO3", "Opt2qBlocks_preserve"] and isinstance(action, DeviceDependentAction):
                 passes = action.transpile_pass(
                     self.device.operation_names,
                     CouplingMap(self.device.build_coupling_map()) if self.layout else None,
                 )
                 if action.name == "QiskitO3":
                     pm = PassManager([DoWhileController(passes, do_while=action.do_while)])
                 else:
                     pm = PassManager(passes)
                 altered_qc = pm.run(self.state)
-                pm_property_set = dict(pm.property_set) if hasattr(pm, "property_set") else {}
+                pm_property_set = dict(pm.property_set) if hasattr(pm, "property_set") else None
             else:
                 transpile_pass = (
                     action.transpile_pass(self.device) if callable(action.transpile_pass) else action.transpile_pass
                 )
                 pm = PassManager(transpile_pass)
                 altered_qc = pm.run(self.state)
-                pm_property_set = dict(pm.property_set) if hasattr(pm, "property_set") else {}
+                pm_property_set = dict(pm.property_set) if hasattr(pm, "property_set") else None
 
         if action_index in (
             self.actions_layout_indices + self.actions_mapping_indices + self.actions_final_optimization_indices
         ):
             altered_qc = self._handle_qiskit_layout_postprocessing(action, pm_property_set, altered_qc)
-        elif action_index in self.actions_routing_indices and self.layout and pm_property_set is not None:
-            self.layout.final_layout = pm_property_set["final_layout"]
+        elif (
+            action_index in self.actions_routing_indices
+            and self.layout is not None
+            and pm_property_set is not None
+            and "final_layout" in pm_property_set
+        ):
+            self.layout.final_layout = pm_property_set["final_layout"]
 
         return altered_qc

Also update the parameter type in _handle_qiskit_layout_postprocessing (line 467) to dict[str, Any] | None for consistency.

499-540: Fragile dict key ordering assumption in TKET placement mapping.

Two issues:

Lines 520-522: The dict comprehension assumes placement.keys() iteration order aligns with qc_tmp.qubits[i] indexing:
```
qiskit_mapping = {
    qc_tmp.qubits[i]: placement[list(placement.keys())[i]].index[0] for i in range(len(placement))
}
```
This is fragile. If get_placement_map changes iteration order, the layout will be incorrect. According to pytket documentation, get_placement_map() returns dict[Qubit, Node] where both types have .index attributes.
Line 514: Catching broad Exception is flagged by static analysis. If possible, catch the specific TKET exception raised by get_placement_map.

Apply this diff to use explicit iteration:

             try:
                 placement = transpile_pass[0].get_placement_map(tket_qc)
             except Exception as e:
-                logger.warning("Placement failed (%s): %s. Falling back to original circuit.", action.name, e)
+                logger.warning(
+                    "Placement failed (%s): %s. Falling back to original circuit.",
+                    action.name,
+                    e,
+                )
                 return tk_to_qiskit(tket_qc, replace_implicit_swaps=True)
             else:
                 qc_tmp = tk_to_qiskit(tket_qc, replace_implicit_swaps=True)
 
-                qiskit_mapping = {
-                    qc_tmp.qubits[i]: placement[list(placement.keys())[i]].index[0] for i in range(len(placement))
-                }
+                qiskit_mapping = {}
+                for tket_qubit, node in placement.items():
+                    # Map TKET qubit index to corresponding Qiskit qubit
+                    qiskit_q = qc_tmp.qubits[tket_qubit.index[0]]
+                    qiskit_mapping[qiskit_q] = node.index[0]
                 layout = Layout(qiskit_mapping)

This removes the ordering assumption and correctly uses the pytket UnitID.index attributes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c70b55 and 2153967.

📒 Files selected for processing (1)

src/mqt/predictor/rl/predictorenv.py (12 hunks)

🧰 Additional context used

🧠 Learnings (4)

📚 Learning: 2025-11-24T10:19:41.147Z

Learnt from: burgholzer
Repo: munich-quantum-toolkit/core PR: 1326
File: python/mqt/core/__init__.py:22-22
Timestamp: 2025-11-24T10:19:41.147Z
Learning: In the munich-quantum-toolkit/core repository, Ruff is configured with 'ALL' rules enabled by default, and only specific rules are selectively disabled. When reviewing changes that enable previously-disabled rules (like PLC0415), noqa directives for those rules become necessary and should be retained.

Applied to files:

src/mqt/predictor/rl/predictorenv.py

📚 Learning: 2025-12-04T06:59:40.314Z

Learnt from: MatthiasReumann
Repo: munich-quantum-toolkit/core PR: 1301
File: mlir/lib/Dialect/MQTOpt/Transforms/Transpilation/LayeredUnit.cpp:84-85
Timestamp: 2025-12-04T06:59:40.314Z
Learning: In the MQTOpt MLIR routing passes (NaiveRoutingPassSC, AStarRoutingPassSC), the input IR is guaranteed to contain only 1-qubit and 2-qubit gates. All 3+-qubit gates must be decomposed before routing; otherwise the input IR is invalid. This invariant allows skipTwoQubitBlock in LayeredUnit.cpp to safely assert wires.size() == 2.

Applied to files:

src/mqt/predictor/rl/predictorenv.py

📚 Learning: 2025-10-07T15:30:42.946Z

Learnt from: MatthiasReumann
Repo: munich-quantum-toolkit/core PR: 1237
File: mlir/include/mlir/Dialect/MQTOpt/Transforms/Transpilation/Layout.h:219-231
Timestamp: 2025-10-07T15:30:42.946Z
Learning: In the Layout class for MLIR quantum routing (mlir/include/mlir/Dialect/MQTOpt/Transforms/Transpilation/Layout.h), the swap method intentionally does NOT swap the hw fields in QubitInfo. This is correct because SSA values represent quantum states at fixed hardware locations, and only their program index associations change during a SWAP gate. The hw field indicates where an SSA value physically resides and remains constant.

Applied to files:

src/mqt/predictor/rl/predictorenv.py

📚 Learning: 2025-11-27T21:26:39.677Z

Learnt from: burgholzer
Repo: munich-quantum-toolkit/qmap PR: 846
File: python/mqt/qmap/plugins/qiskit/sc/load_calibration.py:34-34
Timestamp: 2025-11-27T21:26:39.677Z
Learning: In the qmap project, the Ruff linter has the "PL" (pylint) rule category enabled, which includes PLC0415 (import-outside-top-level). Therefore, `# noqa: PLC0415` directives on lazy imports are appropriate and necessary, not unused.

Applied to files:

src/mqt/predictor/rl/predictorenv.py

🧬 Code graph analysis (1)

src/mqt/predictor/rl/predictorenv.py (1)

src/mqt/predictor/utils.py (1)

get_rl_openqasm_gates (148-185)

🪛 Ruff (0.14.7)

src/mqt/predictor/rl/predictorenv.py

246-246: Unused noqa directive (non-enabled: SLF001)

Remove unused noqa directive

(RUF100)

409-409: Do not catch blind exception: Exception

(BLE001)

410-410: Logging statement uses f-string

(G004)

418-418: Logging statement uses f-string

(G004)

514-514: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (1)

src/mqt/predictor/rl/predictorenv.py (1)

592-625: Excellent refactoring of action selection logic.

The refactored determine_valid_actions_for_state method provides clear, well-structured logic for determining valid actions based on the circuit's state (native gates, mapping status, and layout presence). The multi-branch approach correctly handles:

Non-native circuits requiring synthesis or optimization

Mapped circuits needing only mapping-preserving actions

Circuits with layouts ready for fine-tuning or termination

Unmapped circuits needing routing or layout exploration

The comments make the logic easy to follow. Nice work!

coderabbitai · 2025-12-05T14:16:10Z

src/mqt/predictor/rl/predictorenv.py

+    def fom_aware_compile(
+        self, action: Action, device: Target, qc: QuantumCircuit, max_iteration: int = 20
+    ) -> tuple[QuantumCircuit, PropertySet | None]:
+        """Run a stochastic pass multiple times optimizing for the given figure of merit.
+
+        Args:
+            action: The action containing the transpile pass.
+            device: The compilation target device.
+            qc: The input quantum circuit.
+            max_iteration: Number of iterations to run the pass.
+
+        Returns:
+            A tuple of the best circuit found and its property set (if available).
+        """
+        best_result = None
+        best_property_set = None
+        best_fom = -1.0
+        best_swap_count = float("inf")  # for fallback
+
+        assert callable(action.transpile_pass), "Mapping action should be callable"
+        for i in range(max_iteration):
+            pm = PassManager(action.transpile_pass(device))
+            try:
+                out_circ = pm.run(qc)
+                prop_set = dict(pm.property_set)
+
+                try:
+                    # For fidelity-based metrics, do a cheap "lookahead" synthesis step:
+                    # routing may have introduced non-native SWAPs, so we translate the
+                    # circuit into the device's native basis before evaluating the metric.
+                    #
+                    # Note:
+                    # - BasisTranslator *only* performs basis conversion; it does not optimize.
+                    # - This isolates the effect of mapping (inserted SWAPs) on fidelity
+                    #   without conflating it with further optimizations.
+
+                    synth_pass = PassManager([
+                        BasisTranslator(StandardEquivalenceLibrary, target_basis=device.operation_names)
+                    ])
+                    synth_circ = synth_pass.run(out_circ.copy())
+                    fom = self.calculate_reward(synth_circ)
+
+                    if fom > best_fom:
+                        best_fom = fom
+                        best_result = out_circ
+                        best_property_set = prop_set
+
+                except Exception as e:
+                    logger.warning(f"[Fallback to SWAP counts] Synthesis or fidelity computation failed: {e}")
+                    swap_count = out_circ.count_ops().get("swap", 0)
+                    if best_result is None or swap_count < best_swap_count:
+                        best_swap_count = swap_count
+                        best_result = out_circ
+                        best_property_set = prop_set
+
+            except Exception:
+                logger.exception(f"[Error] Pass failed at iteration {i + 1}")
+                continue
+
+        if best_result is not None:
+            return best_result, best_property_set
+        logger.error("All attempts failed.")
+        return qc, None


⚠️ Potential issue | 🔴 Critical

Critical: fom_aware_compile only optimizes fidelity-based metrics; critical_depth is ignored.

The current implementation at lines 398-407 only handles synthesis + fidelity evaluation in a single code path, with no alternative branch for non-fidelity metrics like critical_depth. This means:

When reward_function == "critical_depth", the code never computes or compares FOM values.

best_result remains None throughout all iterations.

The method always falls through to line 424 returning (qc, None) with "All attempts failed" logged, even though all passes may have succeeded.

This breaks the intended multi-trial optimization for stochastic actions when using critical_depth as the reward function.

Additionally, address the static analysis hints:

Lines 409, 417: Catching broad Exception is flagged. If feasible, catch specific Qiskit exceptions, or document why blanket catches are required for RL robustness.

Lines 410, 418: Replace f-string logging with parameterized logging to avoid eager string formatting.

Apply this diff:

def fom_aware_compile( self, action: Action, device: Target, qc: QuantumCircuit, max_iteration: int = 20 ) -> tuple[QuantumCircuit, PropertySet | None]: """Run a stochastic pass multiple times optimizing for the given figure of merit. Args: action: The action containing the transpile pass. device: The compilation target device. qc: The input quantum circuit. max_iteration: Number of iterations to run the pass. Returns: A tuple of the best circuit found and its property set (if available). """ best_result = None best_property_set = None best_fom = -1.0 best_swap_count = float("inf") # for fallback assert callable(action.transpile_pass), "Mapping action should be callable" for i in range(max_iteration): pm = PassManager(action.transpile_pass(device)) try: out_circ = pm.run(qc) prop_set = dict(pm.property_set) try: - # For fidelity-based metrics, do a cheap "lookahead" synthesis step: - # routing may have introduced non-native SWAPs, so we translate the - # circuit into the device's native basis before evaluating the metric. - # - # Note: - # - BasisTranslator *only* performs basis conversion; it does not optimize. - # - This isolates the effect of mapping (inserted SWAPs) on fidelity - # without conflating it with further optimizations. - + # Synthesize for lookahead evaluation (routing may insert non-native SWAPs) synth_pass = PassManager([ BasisTranslator(StandardEquivalenceLibrary, target_basis=device.operation_names) ]) synth_circ = synth_pass.run(out_circ.copy()) fom = self.calculate_reward(synth_circ) if fom > best_fom: best_fom = fom best_result = out_circ best_property_set = prop_set except Exception as e: - logger.warning(f"[Fallback to SWAP counts] Synthesis or fidelity computation failed: {e}") + logger.warning( + "[Fallback to SWAP counts] Synthesis or fidelity computation failed: %s", + e, + ) swap_count = out_circ.count_ops().get("swap", 0) if best_result is None or swap_count < best_swap_count: best_swap_count = swap_count best_result = out_circ best_property_set = prop_set except Exception: - logger.exception(f"[Error] Pass failed at iteration {i + 1}") + logger.exception("[Error] Pass failed at iteration %d", i + 1) continue if best_result is not None: return best_result, best_property_set logger.error("All attempts failed.") return qc, None

Note: The current design assumes all reward functions should be maximized (higher is better). If you need to support minimization metrics in the future, initialize best_fom = float("-inf") and keep the > comparator universally, ensuring all reward functions return values where higher = better.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools

🪛 Ruff (0.14.7)

409-409: Do not catch blind exception: Exception

(BLE001)

410-410: Logging statement uses f-string

(G004)

418-418: Logging statement uses f-string

(G004)

flowerthrower

First review got lost - this is my attempt to restore the most important comments.

flowerthrower · 2025-12-05T14:09:04Z

src/mqt/predictor/rl/actions.py

            optimization_level=1 if os.getenv("GITHUB_ACTIONS") == "true" else 2,
            synthesis_epsilon=1e-1 if os.getenv("GITHUB_ACTIONS") == "true" else 1e-8,
-            max_synthesis_size=2 if os.getenv("GITHUB_ACTIONS") == "true" else 3,
+            max_synthesis_size=3,


flowerthrowerPending
Is there a specific reason for removing this (and not the others above)?

Not really, I think this is outdated

flowerthrower · 2025-12-05T14:13:53Z

src/mqt/predictor/rl/predictorenv.py

+            if not mapped:
+                # Non-native + unmapped → synthesis or optimization
+                return self.actions_synthesis_indices + self.actions_opt_indices
+            # Non-native + mapped → synthesis or mapping-preserving


I thought all post-synthesis passes are native-gate prerserving (e.g. by doing re-synthesis after routing) in order to keep the flow pictured in the MDP of your thesis?

Indeed after mapping the passes are native-gate preserving as well. I think the comments are outdated and I will update them.

flowerthrower · 2025-12-05T14:14:47Z

src/mqt/predictor/utils.py

    )
+
+
+def get_rl_openqasm_gates() -> list[str]:


I think we should not duplicate get_openqasm_gates here, but instead move the original function into the shared utils.py folder and only explicitly remove unnecessary gates here. Naming here could then be get_openqasm_gates_without_u.

flowerthrower · 2025-12-05T14:18:11Z

CHANGELOG.md


 ### Changed

+- ✨ Improved the MDP for RL ([#449]) ([**@Shaobo-Zhou**])


Very humble of you to list only these things, but you also added:
Expanded Action Space
Added AIRouting
Added optimization actions KAKDecomposition and ElidePermutations.
Support for Stochastic Passes
Wrapped stochastic actions
Expanded state observation vector

Shaobo-Zhou · 2025-12-05T14:33:21Z

First review got lost - this is my attempt to restore the most important comments

Regarding the versions, qiskit-ibm-transpiler has a new release 0.15.0 just 3 days ago, which supports Python 3.13. However, the requirement remains networkx==2.8.5 (https://github.com/Qiskit/qiskit-ibm-transpiler/blob/0.15.0/pyproject.toml). I will check if the new qiskit-ibm-transpiler version indeed work with Python 3.13 and other dependencies

Signed-off-by: Shaobo-Zhou <109073755+Shaobo-Zhou@users.noreply.github.com>

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (10)

src/mqt/predictor/rl/actions.py (3)
84-86: Remove the Python 3.13 version gate for AIRouting.

A previous review confirmed that qiskit-ibm-transpiler is compatible with Python 3.13+ (published with universal py3 wheels). This version check artificially excludes AIRouting from Python 3.13+ users.
qiskit-ibm-transpiler Python 3.13 compatibility December 2025
140-141: Simplify type annotations for stochastic and preserve_layout.

The type bool | None = False is semantically confusing—if False is a valid default, None likely has no meaningful distinction. Consider simplifying to bool = False unless there's a specific use case for None.
-    stochastic: bool | None = False
-    preserve_layout: bool | None = False
+    stochastic: bool = False
+    preserve_layout: bool = False
723-734: Consider defensive handling instead of assertions for final_layout.

The assertions in SafeAIRouting.run() will crash in production if final_layout is unexpectedly None or malformed. Since this wraps an external pass (AIRouting), consider logging warnings and returning gracefully instead of asserting.
             final_layout = getattr(self, "property_set", {}).get("final_layout", None)
-            assert final_layout is not None, "final_layout is None — cannot map virtual qubits"
+            if final_layout is None:
+                logger.warning("final_layout is None; returning routed circuit without measurement restoration")
+                return dag_routed
             qubit_map = {}
             for virt in qc_orig.qubits:
-                assert virt in final_layout, f"Virtual qubit {virt} not found in final layout!"
+                if virt not in final_layout:
+                    logger.warning("Virtual qubit %s not found in final layout; skipping", virt)
+                    continue
src/mqt/predictor/rl/predictorenv.py (7)
240-244: Handle both unitary and clifford gates if both are present.

The elif branch means only unitary is decomposed if both gate types exist in the circuit. If both should be decomposed when present, use two independent if statements.
-        if self.state.count_ops().get("unitary"):
-            self.state = self.state.decompose(gates_to_decompose="unitary")
-        elif self.state.count_ops().get("clifford"):
+        if self.state.count_ops().get("unitary"):
+            self.state = self.state.decompose(gates_to_decompose="unitary")
+        if self.state.count_ops().get("clifford"):
             self.state = self.state.decompose(gates_to_decompose="clifford")
246-246: Remove unused noqa directive.

Static analysis indicates SLF001 is not enabled, so the directive is unnecessary.
-        self.state._layout = self.layout  # noqa: SLF001
+        self.state._layout = self.layout
259-261: Use assert_never on the discriminant, not the circuit.

For exhaustiveness checking, assert_never(self.reward_function) is more idiomatic than assert_never(circuit) since the unreachable branch is determined by the reward function value.
         if self.reward_function == "critical_depth":
             return crit_depth(circuit)
-        assert_never(circuit)
+        assert_never(self.reward_function)
459-460: Guard against missing final_layout key.

Accessing pm_property_set["final_layout"] without checking key existence risks KeyError if routing passes don't populate it.
-        elif action_index in self.actions_routing_indices and self.layout and pm_property_set is not None:
-            self.layout.final_layout = pm_property_set["final_layout"]
+        elif (
+            action_index in self.actions_routing_indices
+            and self.layout is not None
+            and pm_property_set is not None
+            and "final_layout" in pm_property_set
+        ):
+            self.layout.final_layout = pm_property_set["final_layout"]
514-516: Improve exception handling for TKET placement.

The broad Exception catch hides potential programming errors. If feasible, catch the specific TKET exception type raised by get_placement_map.

520-522: Dict key ordering assumption is fragile.

The comprehension assumes placement.keys() iteration order matches qc_tmp.qubits[i] indexing, which is fragile if get_placement_map changes iteration order.

Use explicit iteration over placement.items() with the TKET qubit's index:
-                qiskit_mapping = {
-                    qc_tmp.qubits[i]: placement[list(placement.keys())[i]].index[0] for i in range(len(placement))
-                }
+                qiskit_mapping = {}
+                for tket_qubit, node in placement.items():
+                    qiskit_q = qc_tmp.qubits[tket_qubit.index[0]]
+                    qiskit_mapping[qiskit_q] = node.index[0]
This correctly uses the pytket UnitID.index attributes and eliminates the ordering assumption.

409-419: Use parameterized logging instead of f-strings.

Replace f-string logging with parameterized logging for consistency with Python logging best practices and to satisfy Ruff G004.
             except Exception as e:
-                logger.warning(f"[Fallback to SWAP counts] Synthesis or fidelity computation failed: {e}")
+                logger.warning("[Fallback to SWAP counts] Synthesis or fidelity computation failed: %s", e)
                 swap_count = out_circ.count_ops().get("swap", 0)
                 if best_result is None or swap_count < best_swap_count:
                     best_swap_count = swap_count
                     best_result = out_circ
                     best_property_set = prop_set

         except Exception:
-            logger.exception(f"[Error] Pass failed at iteration {i + 1}")
+            logger.exception("[Error] Pass failed at iteration %d", i + 1)
             continue

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2153967 and 4ff9196.

📒 Files selected for processing (7)

CHANGELOG.md (2 hunks)
src/mqt/predictor/ml/helper.py (1 hunks)
src/mqt/predictor/reward.py (1 hunks)
src/mqt/predictor/rl/actions.py (20 hunks)
src/mqt/predictor/rl/helper.py (3 hunks)
src/mqt/predictor/rl/predictorenv.py (12 hunks)
src/mqt/predictor/utils.py (2 hunks)

🧰 Additional context used

🧠 Learnings (10)

📚 Learning: 2025-10-14T14:37:38.047Z

Learnt from: burgholzer
Repo: munich-quantum-toolkit/yaqs PR: 212
File: CHANGELOG.md:12-15
Timestamp: 2025-10-14T14:37:38.047Z
Learning: In the munich-quantum-toolkit/yaqs project, changelog entries follow the template: "- $TITLE ([#$NUMBER]($URL)) ([**AUTHOR**](https://github.com/$AUTHOR))". Issue references should not be included in changelog entries; the PR number is sufficient for traceability.

Applied to files:

CHANGELOG.md

📚 Learning: 2025-11-01T15:57:31.153Z

Learnt from: burgholzer
Repo: munich-quantum-toolkit/core PR: 1283
File: src/qir/runtime/QIR.cpp:196-201
Timestamp: 2025-11-01T15:57:31.153Z
Learning: In the QIR runtime (src/qir/runtime/QIR.cpp), the PRX gate (__quantum__qis__prx__body) is an alias for the R gate (Phased X-Rotation) and should call runtime.apply<qc::R>(theta, phi, qubit), not runtime.apply<qc::RX>() which is a single-parameter rotation gate.

Applied to files:

src/mqt/predictor/utils.py

📚 Learning: 2025-10-09T13:20:11.483Z

Learnt from: DRovara
Repo: munich-quantum-toolkit/core PR: 1108
File: mlir/test/Dialect/MQTOpt/Transforms/lift-measurements.mlir:269-288
Timestamp: 2025-10-09T13:20:11.483Z
Learning: In the MQT MLIR dialect, the `rz` gate should not be included in the `DIAGONAL_GATES` set for the `ReplaceBasisStateControlsWithIfPattern` because its operator matrix does not have the required shape | 1 0 | / | 0 x | for the targets-as-controls optimization. It is only included in `LiftMeasurementsAboveGatesPatterns` where the matrix structure requirement differs.

Applied to files:

src/mqt/predictor/utils.py

📚 Learning: 2025-11-24T10:19:41.147Z

Learnt from: burgholzer
Repo: munich-quantum-toolkit/core PR: 1326
File: python/mqt/core/__init__.py:22-22
Timestamp: 2025-11-24T10:19:41.147Z
Learning: In the munich-quantum-toolkit/core repository, Ruff is configured with 'ALL' rules enabled by default, and only specific rules are selectively disabled. When reviewing changes that enable previously-disabled rules (like PLC0415), noqa directives for those rules become necessary and should be retained.

Applied to files:

src/mqt/predictor/rl/predictorenv.py

📚 Learning: 2025-12-04T06:59:40.314Z

Learnt from: MatthiasReumann
Repo: munich-quantum-toolkit/core PR: 1301
File: mlir/lib/Dialect/MQTOpt/Transforms/Transpilation/LayeredUnit.cpp:84-85
Timestamp: 2025-12-04T06:59:40.314Z
Learning: In the MQTOpt MLIR routing passes (NaiveRoutingPassSC, AStarRoutingPassSC), the input IR is guaranteed to contain only 1-qubit and 2-qubit gates. All 3+-qubit gates must be decomposed before routing; otherwise the input IR is invalid. This invariant allows skipTwoQubitBlock in LayeredUnit.cpp to safely assert wires.size() == 2.

Applied to files:

src/mqt/predictor/rl/predictorenv.py
src/mqt/predictor/rl/actions.py

📚 Learning: 2025-10-07T15:30:42.946Z

Learnt from: MatthiasReumann
Repo: munich-quantum-toolkit/core PR: 1237
File: mlir/include/mlir/Dialect/MQTOpt/Transforms/Transpilation/Layout.h:219-231
Timestamp: 2025-10-07T15:30:42.946Z
Learning: In the Layout class for MLIR quantum routing (mlir/include/mlir/Dialect/MQTOpt/Transforms/Transpilation/Layout.h), the swap method intentionally does NOT swap the hw fields in QubitInfo. This is correct because SSA values represent quantum states at fixed hardware locations, and only their program index associations change during a SWAP gate. The hw field indicates where an SSA value physically resides and remains constant.

Applied to files:

src/mqt/predictor/rl/predictorenv.py

📚 Learning: 2025-11-27T21:26:39.677Z

Learnt from: burgholzer
Repo: munich-quantum-toolkit/qmap PR: 846
File: python/mqt/qmap/plugins/qiskit/sc/load_calibration.py:34-34
Timestamp: 2025-11-27T21:26:39.677Z
Learning: In the qmap project, the Ruff linter has the "PL" (pylint) rule category enabled, which includes PLC0415 (import-outside-top-level). Therefore, `# noqa: PLC0415` directives on lazy imports are appropriate and necessary, not unused.

Applied to files:

src/mqt/predictor/rl/predictorenv.py

📚 Learning: 2025-11-04T15:22:19.558Z

Learnt from: marcelwa
Repo: munich-quantum-toolkit/core PR: 1243
File: test/python/qdmi/qiskit/conftest.py:155-157
Timestamp: 2025-11-04T15:22:19.558Z
Learning: The munich-quantum-toolkit/core repository requires Python 3.10 or later, so Python 3.10+ features (such as `zip(..., strict=...)`, pattern matching, etc.) are acceptable and should not be flagged as compatibility issues.

Applied to files:

src/mqt/predictor/rl/actions.py

📚 Learning: 2025-10-09T22:15:59.924Z

Learnt from: denialhaag
Repo: munich-quantum-toolkit/core PR: 1246
File: pyproject.toml:340-341
Timestamp: 2025-10-09T22:15:59.924Z
Learning: Qiskit publishes ABI3 wheels (e.g., cp39-abi3) that are forward-compatible with newer Python versions including Python 3.14, so no explicit Python 3.14 wheels are required for qiskit to work on Python 3.14.

Applied to files:

src/mqt/predictor/rl/actions.py

📚 Learning: 2025-10-11T19:39:32.050Z

Learnt from: denialhaag
Repo: munich-quantum-toolkit/debugger PR: 160
File: pyproject.toml:54-54
Timestamp: 2025-10-11T19:39:32.050Z
Learning: Qiskit packages use cp39-abi3 wheels (stable ABI) which are forward-compatible with Python 3.9+ including Python 3.14, even if the package classifiers don't explicitly list Python 3.14 support.

Applied to files:

src/mqt/predictor/rl/actions.py

🧬 Code graph analysis (3)

src/mqt/predictor/rl/helper.py (2)

src/mqt/predictor/utils.py (2)

calc_supermarq_features (93-145)

get_openqasm_gates_without_u (197-233)

src/mqt/predictor/ml/helper.py (1)

dict_to_featurevector (53-60)

src/mqt/predictor/ml/helper.py (1)

src/mqt/predictor/utils.py (2)

calc_supermarq_features (93-145)

get_openqasm_gates (148-194)

src/mqt/predictor/rl/predictorenv.py (3)

src/mqt/predictor/rl/actions.py (4)

CompilationOrigin (103-109)

DeviceDependentAction (150-161)

PassType (112-121)

Action (125-141)

src/mqt/predictor/rl/parsing.py (1)

prepare_noise_data (253-278)

src/mqt/predictor/utils.py (1)

get_openqasm_gates_without_u (197-233)

🪛 Ruff (0.14.7)

src/mqt/predictor/rl/predictorenv.py

246-246: Unused noqa directive (non-enabled: SLF001)

Remove unused noqa directive

(RUF100)

409-409: Do not catch blind exception: Exception

(BLE001)

410-410: Logging statement uses f-string

(G004)

418-418: Logging statement uses f-string

(G004)

514-514: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: 🐍 Test (macos-14) / 🐍 macos-14
GitHub Check: 🐍 Test (windows-2022) / 🐍 windows-2022
GitHub Check: 🐍 Test (ubuntu-24.04) / 🐍 ubuntu-24.04

🔇 Additional comments (12)

src/mqt/predictor/reward.py (1)

214-214: Verify that this formatting change alone is the "correction" referenced in the PR.

The PR summary mentions "corrected estimated success probability calculation in reward.py," but the provided diff shows only a blank line added at line 214 for readability between the single-qubit and multi-qubit branches. No functional logic changes are visible in this file snippet.

Please confirm:

Is this blank line the only change to the estimated_success_probability function, or are there other functional modifications not shown in the diff?

If this is purely a formatting change, the PR summary should clarify that the functional correction was made elsewhere or describe the specific logic that was corrected.

CHANGELOG.md (1)

51-51: Changelog references are properly formatted.

The PR link reference and contributor link have been correctly added and formatted in the reference sections.

Also applies to: 70-70

src/mqt/predictor/rl/helper.py (2)

72-81: LGTM!

The normalized feature vector implementation is correct. The exclusion of "barrier" from the total and the division-by-zero guard are properly handled. The return type dict[str, float] correctly reflects the normalized values.

93-99: LGTM!

The integration with dict_to_featurevector and separate handling of the "measure" feature is correct, since "measure" is not part of get_openqasm_gates_without_u() but still needs to be tracked as a normalized feature.

src/mqt/predictor/utils.py (2)

148-194: LGTM!

The centralized get_openqasm_gates() function properly consolidates the gate list that was previously duplicated across modules. The documentation correctly references the qelib1.inc source.

197-233: LGTM!

The RL-specific gate list correctly excludes u-family gates and multi-qubit gates with >2 qubits as documented. The "r" gate inclusion is valid as it's a parametric rotation gate supported by Qiskit.

src/mqt/predictor/ml/helper.py (1)

18-18: LGTM!

Clean refactor that centralizes the gate list definition in utils.py while preserving the existing behavior of the ML helper.

src/mqt/predictor/rl/actions.py (4)

276-282: LGTM!

Good fix: CollectCliffords() now runs before OptimizeCliffords(), which is required for the optimization pass to work correctly as noted in the PR objectives.

326-333: LGTM!

The KAKDecomposition action with allow_swaps=False is correctly configured for the optimization workflow.

441-468: LGTM!

The TKET placement actions (GraphPlacement and NoiseAwarePlacement) are correctly configured with appropriate timeouts and match limits. The use of keyword arguments for noise parameters (node_errors, link_errors, readout_errors) addresses the previous review concern about positional argument ordering.

608-636: LGTM!

The helper functions correctly preserve register and qubit identity by reusing original objects instead of cloning, addressing the issues from previous reviews. The type annotations are now correct with list[Clbit] for the measurements tuple.

src/mqt/predictor/rl/predictorenv.py (1)

592-627: LGTM!

The refactored determine_valid_actions_for_state logic is much clearer with explicit comments explaining each state transition. The use of actions_structure_preserving_indices correctly restricts post-mapping actions to those that preserve the layout and native gates.

CHANGELOG.md

coderabbitai · 2025-12-05T15:40:02Z

src/mqt/predictor/rl/predictorenv.py

-            passes = action.transpile_pass(
-                self.device.operation_names,
-                CouplingMap(self.device.build_coupling_map()) if self.layout else None,
+        pm_property_set: PropertySet | None = {}


⚠️ Potential issue | 🟡 Minor

Fix type annotation inconsistency.

The annotation PropertySet | None = {} has issues:

Initialized to {} but type includes | None

Actual values assigned are dict objects (from dict(pm.property_set))

Update to match actual usage:

- pm_property_set: PropertySet | None = {} + pm_property_set: dict[str, Any] | None = None

Also update the fallback assignments to use None:

- pm_property_set = dict(pm.property_set) if hasattr(pm, "property_set") else {} + pm_property_set = dict(pm.property_set) if hasattr(pm, "property_set") else None

🤖 Prompt for AI Agents

In src/mqt/predictor/rl/predictorenv.py around line 427, change the declaration pm_property_set: PropertySet | None = {} to a type that matches actual usage (e.g., pm_property_set: dict[str, Any] | None = None) and update any fallback assignments later in the file to assign None instead of {} so the annotation and runtime values are consistent.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Shaobo-Zhou <109073755+Shaobo-Zhou@users.noreply.github.com>

Shaobo Zhou and others added 30 commits March 29, 2025 19:20

Update predictor(adding callbacks)

129b60f

Update

08889bd

Restore helper.py and predictor.py to match upstream

e2ff3fe

Merge remote-tracking branch 'upstream/main'

1c32d15

Implement new mapping actions

78dc1aa

Update action space and feature space Update actions Update action space

Fix: resolve pre-commit issues and add missing annotations

a3ba836

Fix: resolve pre-commit issues and add missing annotations Fix: resolve pre-commit issues and add missing annotations Remove example_test.py Remove example_test.py

Fix: resolve pre-commit issues and add missing annotations

5935e6f

Fix: resolve pre-commit issues and add missing annotations Fix: resolve pre-commit issues and add missing annotations Fix: resolve pre-commit issues and add missing annotations

Fix: resolve pre-commit issues and add missing annotations

f71fb29

Fix: resolve pre-commit issues and add missing annotations

3c7592b

Fix mypy errors

6db5c27

Fix mypy errors

47841c5

Fix dependencies issues

b1ac8ce

Fix dependency issues

5f8473c

Add missing zip file

7491ec0

Fix issue with Python 3.13

3346842

Merge branch 'main' into hybrid-mapping

6f7a73c

Signed-off-by: Shaobo-Zhou <109073755+Shaobo-Zhou@users.noreply.github.com>

Remove Python 3.13 from noxfile.py due to compatibility issue

6c67349

Skip minimums session on Windows due to CI slowness

2692b96

Fix bugs

f4874e6

Fix bugs Fix bugs Fix bugs

Fix bugs

54eec91

Use default Qiskit settings for VF2Layout and add assertion for nativ…

845f7de

…e gate check

Debug

3418936

Fix missing argument

ae870cc

Fix warning issues

861bc62

Fix window runtime warning problem

fa989b6

Fix window runtime warning problem

405bd39

Add time limit for VF2PostLayout

7b2f321

Fix windows runtime warning problem

b67d0a6

Fix windows runtime warning problem Fix windows runtime warning issue

Add new actions

bf7c9ee

Add new actions

6d2733f

coderabbitai bot reviewed Nov 27, 2025

View reviewed changes

Update changelog

a4ec1c7

Shaobo-Zhou force-pushed the new_structure branch from 00ee1c3 to a4ec1c7 Compare December 3, 2025 15:21

Shaobo-Zhou and others added 4 commits December 3, 2025 16:23

Merge branch 'main' into new_structure

942ceff

Signed-off-by: Shaobo-Zhou <109073755+Shaobo-Zhou@users.noreply.github.com>

🎨 pre-commit fixes

0f5f636

Small fixes

283345b

Update uv.lock

108a326

coderabbitai bot requested changes Dec 3, 2025

View reviewed changes

src/mqt/predictor/rl/actions.py Show resolved Hide resolved

Shaobo-Zhou added 2 commits December 3, 2025 19:03

Fix CI issues

8c398d8

Slight Fix

9211aab

coderabbitai bot requested changes Dec 3, 2025

View reviewed changes

src/mqt/predictor/rl/actions.py Show resolved Hide resolved

src/mqt/predictor/rl/actions.py Outdated Show resolved Hide resolved

Shaobo-Zhou added 2 commits December 3, 2025 23:59

Update actions.py

632656f

Update predictorenv.py

a32571b

coderabbitai bot requested changes Dec 4, 2025

View reviewed changes

src/mqt/predictor/rl/actions.py Show resolved Hide resolved

src/mqt/predictor/rl/actions.py Outdated Show resolved Hide resolved

src/mqt/predictor/rl/actions.py Show resolved Hide resolved

src/mqt/predictor/rl/predictorenv.py Show resolved Hide resolved

Fix typos

323938b

Fix typos

1c70b55

coderabbitai bot requested changes Dec 4, 2025

View reviewed changes

flowerthrower requested changes Dec 5, 2025

View reviewed changes

Update src/mqt/predictor/rl/predictorenv.py

2153967

Signed-off-by: Patrick Hopf <81010725+flowerthrower@users.noreply.github.com>

coderabbitai bot requested changes Dec 5, 2025

View reviewed changes

flowerthrower requested changes Dec 5, 2025

View reviewed changes

Shaobo-Zhou and others added 2 commits December 5, 2025 15:50

Fixes

ea1e117

Merge branch 'main' into new_structure

4ff9196

Signed-off-by: Shaobo-Zhou <109073755+Shaobo-Zhou@users.noreply.github.com>

coderabbitai bot requested changes Dec 5, 2025

View reviewed changes

Update CHANGELOG.md

8532b74

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Shaobo-Zhou <109073755+Shaobo-Zhou@users.noreply.github.com>

		if sys.version_info < (3, 13):
		from qiskit_ibm_transpiler.ai.routing import AIRouting

		"qiskit-ibm-transpiler==0.13.1; python_version < '3.13'", # Do not support python 3.13 yet
		"qiskit-ibm-ai-local-transpiler==0.4.2; python_version < '3.13'",


		### Changed

		- ✨ Improved the MDP for RL ([#449]) ([@Shaobo-Zhou])

Uh oh!

Refined MDP for RL #449

Are you sure you want to change the base?

Refined MDP for RL #449

Uh oh!

Conversation

Shaobo-Zhou commented Sep 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist:

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Shaobo-Zhou commented Dec 3, 2025

Uh oh!

denialhaag commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Shaobo-Zhou commented Dec 4, 2025

Uh oh!

coderabbitai bot commented Dec 4, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

flowerthrower left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

flowerthrower left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Shaobo-Zhou commented Sep 6, 2025 •

edited

Loading

denialhaag commented Dec 3, 2025 •

edited

Loading

coderabbitai bot Dec 4, 2025 •

edited

Loading

flowerthrower left a comment •

edited

Loading

Shaobo-Zhou commented Dec 5, 2025 •

edited

Loading