
@antotu antotu commented Aug 19, 2025

Description

This PR introduces a Graph Neural Network (GNN) as an alternative to the Random Forest model for predicting the best device on which to run a quantum circuit.
To support this, the preprocessing pipeline was redesigned: instead of manually extracting features from the circuit, the model now operates directly on the circuit's Directed Acyclic Graph (DAG) representation (a minimal sketch of the idea follows below).
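
As a rough illustration of the new preprocessing idea, here is a minimal sketch that builds a graph edge list from a circuit's DAG using Qiskit's circuit_to_dag. It is illustrative only; the PR's actual create_dag in src/mqt/predictor/ml/helper.py additionally encodes each node as a numeric feature vector.

# Hedged sketch: derive a torch_geometric-style edge_index from a circuit DAG.
import torch
from qiskit import QuantumCircuit
from qiskit.converters import circuit_to_dag

qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)

dag = circuit_to_dag(qc)
op_nodes = list(dag.topological_op_nodes())
index = {node: i for i, node in enumerate(op_nodes)}

edges = [
    (index[node], index[succ])
    for node in op_nodes
    for succ in dag.successors(node)
    if succ in index  # skip the DAG's input/output wire nodes
]
edge_index = (
    torch.tensor(edges, dtype=torch.long).t().contiguous()
    if edges
    else torch.empty((2, 0), dtype=torch.long)  # guard for edge-less circuits
)
print(edge_index)  # tensor([[0], [1]]): the H gate feeds the CX gate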


🚀 Major Changes

Graph Neural Network Integration

  • Added a GNN model for predicting the target quantum device and estimating the Hellinger distance between output distributions (see the distance sketch after this list).
  • Added a preprocessing method to transform quantum circuits into DAGs.
  • DAG representation captures gate dependencies and circuit topology for improved graph-based learning.
  • Integrated automated hyperparameter search with Optuna for tuning GNN performance.
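
For reference, the sketch below shows the textbook Hellinger distance between two output distributions; the PR's actual implementation lives in src/mqt/predictor/hellinger/utils.py.

# Textbook definition H(P, Q) = (1/sqrt(2)) * ||sqrt(P) - sqrt(Q)||_2, in [0, 1].
# Illustrative only; see hellinger_distance() in the PR for the real code.
import numpy as np

def hellinger(p: np.ndarray, q: np.ndarray) -> float:
    return float(np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2))

p = np.array([0.5, 0.5, 0.0, 0.0])
q = np.array([0.25, 0.25, 0.25, 0.25])
print(hellinger(p, q))  # ~0.541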

🎯 Motivation

  • Previously, features were manually extracted from the quantum circuit, leading to loss of structural information.
  • This new method preserves the full circuit structure by representing it as a graph.
  • GNNs can exploit graph connectivity to make more accurate predictions.
  • Optuna ensures that GNN hyperparameters are optimized efficiently and reproducibly (a minimal sketch follows this list).
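
A minimal, self-contained Optuna sketch of the reproducible-search claim; the objective below is a stand-in, and the PR's real search space is defined in Predictor.objective.

# A seeded TPE sampler makes the search reproducible across runs.
import optuna

def objective(trial: optuna.Trial) -> float:
    hidden_dim = trial.suggest_int("hidden_dim", 16, 128)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    # A GNN would be trained here; this closed form merely stands in
    # for the validation loss returned after training.
    return (hidden_dim - 64) ** 2 * 1e-4 + dropout

study = optuna.create_study(direction="minimize", sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=20)
print(study.best_params)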

🔧 Fixes and Enhancements

  • Transform input quantum circuits into DAGs, where each node is encoded as a numeric vector.
  • Integrated GNNs as an additional predictor in the pipeline.

📦 Dependency Updates

  • optuna>=4.5.0
  • torch-geometric>=2.6.1

Checklist:

  • The pull request only contains commits that are focused and relevant to this change.
  • I have added appropriate tests that cover the new/changed functionality.
  • I have updated the documentation to reflect these changes.
  • I have added entries to the changelog for any noteworthy additions, changes, fixes, or removals.
  • I have added migration instructions to the upgrade guide (if needed).
  • The changes follow the project's style guidelines and introduce no new warnings.
  • The changes are fully tested and pass the CI checks.
  • I have reviewed my own code changes.

@antotu antotu marked this pull request as draft August 19, 2025 16:16

TYPE_CHECKING = False
if TYPE_CHECKING:
    VERSION_TUPLE = tuple[int | str, ...]

Check warning

Code scanning / CodeQL

Unreachable code Warning

This statement is unreachable.
@antotu antotu changed the title from "Gnn branch" to "Add GNN-Based Predictor with DAG Preprocessing" Aug 21, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (5)
tests/hellinger_distance/test_estimated_hellinger_distance.py (5)

186-192: num_nodes should be an integer, not a tensor.

PyTorch Geometric's Data class expects num_nodes to be a plain Python int. Wrapping it in torch.tensor([n_nodes]) creates a 1D tensor, which can cause unexpected behavior during batching or graph operations.

             gnn_training_sample = Data(
                 x=x,
                 y=torch.tensor(labels_list[i], dtype=torch.float32),
                 edge_index=edge_idx,
-                num_nodes=torch.tensor([n_nodes]),
+                num_nodes=n_nodes,
             )
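
A quick sketch of the expected behavior with a plain int (assumes torch and torch-geometric are installed; the graphs here are dummies):

import torch
from torch_geometric.data import Batch, Data

empty = torch.empty((2, 0), dtype=torch.long)
g1 = Data(x=torch.randn(3, 4), edge_index=empty, num_nodes=3)
g2 = Data(x=torch.randn(5, 4), edge_index=empty, num_nodes=5)

batch = Batch.from_data_list([g1, g2])
print(batch.num_nodes)       # 8: per-graph counts compose correctly
print(batch.batch.tolist())  # [0, 0, 0, 1, 1, 1, 1, 1]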

219-221: Docstring is outdated after parameterization.

The docstring still references "the Hellinger distance model that was trained in the previous test," but this test now trains its own model and is parameterized for both RF and GNN backends. Update to reflect the current behavior.

 def test_train_and_qcompile_with_hellinger_model(
     source_path: Path, target_path: Path, device: Target, gnn: bool
 ) -> None:
-    """Test the entire predictor toolchain with the Hellinger distance model that was trained in the previous test."""
+    """Test the entire predictor toolchain for estimated_hellinger_distance with both RF and GNN backends."""

307-317: Consolidate duplicate data_path computation.

data_path is computed twice with the same value. Merge the cleanup loops to avoid redundant computation.

     data_path = get_path_training_data() / "training_data_aggregated"
     if data_path.exists():
         for file in data_path.iterdir():
-            if file.suffix == ".npy":
+            if file.suffix in (".npy", ".pt"):
                 file.unlink()
-
-    data_path = get_path_training_data() / "training_data_aggregated"
-    if data_path.exists():
-        for file in data_path.iterdir():
-            if file.suffix == ".pt":
-                file.unlink()

319-323: Use tuple membership instead of chained or comparisons.

For readability and consistency, use in with a tuple.

-            if file.suffix == ".joblib" or file.suffix == ".pth" or file.suffix == ".json":
+            if file.suffix in (".joblib", ".pth", ".json"):

293-324: Consider converting cleanup test to a fixture.

test_remove_files acts as teardown for artifacts created by other tests. This couples correctness to test ordering and can break under parallel execution or when running a subset of tests with -k. Consider moving this logic to a module-scoped autouse fixture with yield to guarantee cleanup regardless of test selection or execution order.

Example fixture approach:

@pytest.fixture(scope="module", autouse=True)
def cleanup_test_artifacts(source_path: Path, target_path: Path):
    """Clean up test artifacts after all tests in the module complete."""
    yield
    # Cleanup logic here...
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64a94ff and c4cc835.

📒 Files selected for processing (1)
  • tests/hellinger_distance/test_estimated_hellinger_distance.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/hellinger_distance/test_estimated_hellinger_distance.py (3)
src/mqt/predictor/hellinger/utils.py (2)
  • calc_device_specific_features (37-132)
  • hellinger_distance (29-34)
src/mqt/predictor/ml/helper.py (3)
  • TrainingData (575-585)
  • create_dag (192-301)
  • get_path_training_data (36-38)
src/mqt/predictor/rl/helper.py (1)
  • get_path_training_data (97-99)
🪛 Ruff (0.14.6)
tests/hellinger_distance/test_estimated_hellinger_distance.py

158-158: Boolean-typed positional argument in function definition

(FBT001)


218-218: Boolean-typed positional argument in function definition

(FBT001)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: 🐍 Test (windows-2022) / 🐍 windows-2022
  • GitHub Check: 🐍 Test (ubuntu-24.04) / 🐍 ubuntu-24.04
  • GitHub Check: 🐍 Test (macos-14) / 🐍 macos-14
🔇 Additional comments (2)
tests/hellinger_distance/test_estimated_hellinger_distance.py (2)

21-31: LGTM!

The new imports for torch, torch_geometric.data (Batch, Data), and create_dag are appropriate for the GNN testing functionality.


254-265: Clarify the Windows + GNN timeout behavior.

The condition sys.platform == "win32" and not gnn means GNN tests on Windows skip the timeout warning path. If this is intentional (e.g., GNN compilation doesn't use the same timeout mechanism), a brief comment would help future maintainers understand this distinction.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4cc835 and 6ebded4.

📒 Files selected for processing (2)
  • tests/device_selection/test_predictor_ml.py (4 hunks)
  • tests/hellinger_distance/test_estimated_hellinger_distance.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/device_selection/test_predictor_ml.py (2)
src/mqt/predictor/ml/helper.py (1)
  • get_path_training_data (36-38)
src/mqt/predictor/ml/predictor.py (2)
  • predict_device_for_figure_of_merit (800-880)
  • Predictor (166-797)
🪛 Ruff (0.14.6)
tests/device_selection/test_predictor_ml.py

42-42: Boolean-typed positional argument in function definition

(FBT001)


42-42: Boolean-typed positional argument in function definition

(FBT001)


115-115: Boolean-typed positional argument in function definition

(FBT001)

tests/hellinger_distance/test_estimated_hellinger_distance.py

158-158: Boolean-typed positional argument in function definition

(FBT001)


219-219: Boolean-typed positional argument in function definition

(FBT001)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: 🐍 Test (macos-14) / 🐍 macos-14
  • GitHub Check: 🐍 Test (windows-2022) / 🐍 windows-2022
  • GitHub Check: 🐍 Test (ubuntu-24.04) / 🐍 ubuntu-24.04
🔇 Additional comments (6)
tests/device_selection/test_predictor_ml.py (3)

38-43: Parameterization covers key scenarios appropriately.

The test matrix exercises RF-only, GNN without verbose, and GNN with verbose modes. This provides good coverage for the new GNN path integration.


68-75: Artifact assertions correctly differentiate GNN and RF paths.

The conditional logic now properly validates:

  • GNN path: graph_dataset_expected_fidelity.pt plus the shared .npy files
  • RF path: training_data_expected_fidelity.npy plus the shared .npy files

This addresses the earlier feedback about asserting names_list and scores_list existence for the GNN path.


114-117: Parameterized test correctly exercises both predictor backends.

The GNN flag is properly passed to Predictor, ensuring both RF and GNN backends are tested for the _get_prepared_training_data error path.

tests/hellinger_distance/test_estimated_hellinger_distance.py (3)

166-193: GNN training data construction is correct.

The branching logic properly:

  • Uses calc_device_specific_features for RF and create_dag for GNN
  • Constructs appropriate TrainingData objects for each path
  • Correctly passes n_nodes as an integer to Data (addressing earlier feedback)

196-214: Model training and evaluation paths are properly separated.

The conditional training logic correctly dispatches to train_gnn_model vs train_random_forest_model, and the evaluation uses appropriate methods for each model type. The tolerance comment at line 213 explains the rationale for atol=2e-1.


255-266: Clarify whether Windows timeout warning should apply to GNN path.

The condition sys.platform == "win32" and not gnn means the timeout warning test is skipped when gnn=True on Windows. Is this intentional because the GNN path handles timeouts differently, or should both paths warn on Windows?

If GNN compilation also uses timeouts that aren't supported on Windows, consider:

-        if sys.platform == "win32" and not gnn:
+        if sys.platform == "win32":

Otherwise, add a comment explaining why GNN is exempt from this warning.

Comment on lines 157 to 159
@pytest.mark.parametrize("gnn", [False, True], ids=["rf", "gnn"])
def test_train_model_and_predict(device: Target, gnn: bool) -> None:
"""Test the training of the random forest regressor. The trained model is saved and used in the following tests."""


🧹 Nitpick | 🔵 Trivial

Docstring no longer accurately reflects parameterized test behavior.

The docstring still says "Test the training of the random forest regressor" but the test is now parameterized to cover both RF and GNN backends.

 @pytest.mark.parametrize("gnn", [False, True], ids=["rf", "gnn"])
 def test_train_model_and_predict(device: Target, gnn: bool) -> None:
-    """Test the training of the random forest regressor. The trained model is saved and used in the following tests."""
+    """Test model training and prediction for both RF and GNN backends."""
🧰 Tools
🪛 Ruff (0.14.6)

158-158: Boolean-typed positional argument in function definition

(FBT001)

🤖 Prompt for AI Agents
In tests/hellinger_distance/test_estimated_hellinger_distance.py around lines
157 to 159, the test docstring incorrectly states it only tests training of the
random forest regressor while the test is parameterized to run for both RF and
GNN; update the docstring to accurately describe that the test trains and saves
a model for both backends (random forest and GNN) and that the saved model is
used in subsequent tests, and briefly document the gnn parameter meaning (False
=> RF, True => GNN).

Comment on lines +294 to +318
def test_remove_files(source_path: Path, target_path: Path) -> None:
    """Remove files created during testing."""
    if source_path.exists():
        for file in source_path.iterdir():
            if file.suffix == ".qasm":
                file.unlink()
        source_path.rmdir()

    if target_path.exists():
        for file in target_path.iterdir():
            if file.suffix == ".qasm":
                file.unlink()
        target_path.rmdir()

    data_path = get_path_training_data() / "training_data_aggregated"
    if data_path.exists():
        for file in data_path.iterdir():
            if file.suffix in (".npy", ".pt"):
                file.unlink()

    model_path = get_path_training_data() / "trained_model"
    if model_path.exists():
        for file in model_path.iterdir():
            if file.suffix in (".joblib", ".pth", ".json"):
                file.unlink()


🧹 Nitpick | 🔵 Trivial

Consider converting cleanup to a fixture for test isolation.

test_remove_files acts as teardown but depends on test execution order. If tests are run in isolation or in parallel, cleanup may not occur. This was flagged in a previous review.

Convert to a session-scoped autouse fixture:

@pytest.fixture(scope="module", autouse=True)
def cleanup_test_artifacts(source_path: Path, target_path: Path):
    """Clean up test artifacts after all tests in the module complete."""
    yield  # Let tests run
    # Cleanup logic here...

This ensures cleanup runs regardless of test selection or ordering.

🤖 Prompt for AI Agents
In tests/hellinger_distance/test_estimated_hellinger_distance.py around lines
294 to 318, replace the standalone test_remove_files teardown function with a
pytest fixture that runs automatically for the module (or session) and performs
cleanup after tests by yielding control and then executing the same removal
logic; make the fixture use scope="module" (or "session") and autouse=True,
accept source_path and target_path as parameters, ensure the post-yield cleanup
is idempotent (check existence before unlink/rmdir or use safe removal) and
covers data_path and model_path cleanup so cleanup runs regardless of test
ordering or isolation.
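
One possible shape for such a fixture, sketched under the assumption that source_path and target_path are themselves module-scoped fixtures (pytest does not allow a module-scoped fixture to depend on function-scoped ones):

import pytest
from pathlib import Path

@pytest.fixture(scope="module", autouse=True)
def cleanup_test_artifacts(source_path: Path, target_path: Path):
    """Remove .qasm artifacts after all tests in the module have run."""
    yield  # let every test in the module execute first
    for directory in (source_path, target_path):
        if directory.exists():
            for file in directory.iterdir():
                if file.suffix == ".qasm":
                    file.unlink()
            directory.rmdir()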

@coderabbitai coderabbitai bot mentioned this pull request Dec 5, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (10)
src/mqt/predictor/ml/predictor.py (10)

108-109: Consider making gnn parameter keyword-only.

The gnn boolean parameter could be made keyword-only to prevent accidental positional usage and address Ruff's FBT001/FBT002 warnings consistently across the codebase.

Apply this diff:

     path_training_data: Path | None = None,
     timeout: int = 600,
+    *,
     gnn: bool = False,
     **gnn_kwargs: Unpack[TrainGNNKwargs],
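
For reference, the effect of the bare * in a signature (generic sketch, not the PR's code):

# After the asterisk, gnn can only be passed by name.
def setup_like(timeout: int = 600, *, gnn: bool = False) -> None:
    print(timeout, gnn)

setup_like(300, gnn=True)   # OK
# setup_like(300, True)     # TypeError: takes from 0 to 1 positional arguments but 2 were given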

328-338: Device score ordering may diverge from self.devices canonical order.

Line 330 uses sorted(scores.keys()) to order device scores, but self.devices is sorted by dev.description at initialization (lines 190-192). If the sorted order of scores.keys() differs from the sorted order of device descriptions, the y tensor indices won't match the model's expected output ordering during training and inference.

Use the canonical self.devices ordering:

             if self.gnn:
                 x, _y, edge_idx, n_nodes, target_label = training_sample
-                name_device = sorted(scores.keys())
-                value_device = [scores[i] for i in name_device]
+                # Use canonical device ordering from self.devices
+                value_device = [scores.get(dev.description, -1.0) for dev in self.devices]
                 gnn_training_sample = Data(
                     x=x,
                     y=torch.tensor(value_device, dtype=torch.float32),

422-422: Remove or clarify the commented assignment.

Line 422 assigns scores_list = scores with a commented-out alternative # list(scores.values()). Since scores is already a dict and is used directly, remove the comment to avoid confusion.

Apply this diff:

-        scores_list = scores  # list(scores.values())
+        scores_list = scores

427-430: Clarify or remove the intermediate y tensor that gets discarded.

The y tensor created at line 429 (device index) is immediately discarded in generate_training_data (line 329: x, _y, edge_idx, ...) and replaced with the full device scores tensor. This intermediate value adds confusion without serving a purpose.

Consider either:

  1. Not computing y here and returning a 4-tuple (x, edge_index, number_of_gates, target_label), or
  2. Adding a comment explaining why this value exists but is unused.
         if self.gnn:
             x, edge_index, number_of_gates = create_dag(qc)
-            y = torch.tensor([[dev.description for dev in self.devices].index(target_label)], dtype=torch.float)
-            training_sample = (x, y, edge_index, number_of_gates, target_label)
+            # Note: y placeholder - actual targets are computed in generate_training_data
+            training_sample = (x, None, edge_index, number_of_gates, target_label)

504-507: Optuna objective has unused parameter and potential KFold failure.

Two issues in the objective method:

  1. Line 504: num_outputs is immediately overwritten by max(1, len(self.devices)), making the parameter unused. Either remove the parameter or honor the passed value.

  2. Line 507: KFold(n_splits=k_folds) will fail if k_folds < 2. For small datasets, guard this:

    k_folds = max(2, k_folds)

Apply these fixes:

-        num_outputs = max(1, len(self.devices))
+        # num_outputs already passed as parameter, use it directly
 
         # Split into k-folds
+        k_folds = max(2, k_folds)  # KFold requires at least 2 splits
         kf = KFold(n_splits=k_folds, shuffle=True)

If you want to keep the overwriting behavior, then remove num_outputs from the function signature and docstring.
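
The failure mode the guard prevents is easy to reproduce:

# scikit-learn validates n_splits at construction time and rejects 1 outright.
from sklearn.model_selection import KFold

try:
    KFold(n_splits=1)
except ValueError as err:
    print(err)  # "k-fold cross-validation requires at least one train/test split ..."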


859-859: Security risk: torch.load without weights_only=True can execute arbitrary code.

torch.load(path) without weights_only=True can execute arbitrary code if the model file is malicious. Since you're loading a state dict:

Apply this diff:

-        gnn_model.load_state_dict(torch.load(path))
+        gnn_model.load_state_dict(torch.load(path, weights_only=True))
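
For context, a safe state-dict round trip looks roughly like this (sketch; weights_only restricts unpickling to tensor data and is available in recent PyTorch releases):

import torch

model = torch.nn.Linear(4, 2)  # stand-in for the GNN
torch.save(model.state_dict(), "gnn_model.pth")

state = torch.load("gnn_model.pth", weights_only=True)
model.load_state_dict(state)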

627-633: Add parsed mlp_units to JSON metadata for inference.

Line 631 parses mlp_units from the string, but this parsed list is not stored in json_dict. While the string "mlp" is saved, prediction code may expect the parsed list. Add it to the JSON before saving:

Apply this diff:

         mlp_str = dict_best_hyper["mlp"]
         mlp_units = [] if mlp_str == "none" else [int(x) for x in mlp_str.split(",")]
 
         json_dict["num_outputs"] = len(self.devices) if self.figure_of_merit != "hellinger_distance" else 1
+        json_dict["mlp_units"] = mlp_units  # Store parsed list for inference

607-607: Remove dead code: list comprehension result not assigned.

Line 607 computes [dev.description for dev in self.devices] but doesn't assign the result. This line should be removed as the class labels are correctly stored at line 666.

Apply this diff:

             else:
                 # task = "binary"
                 task = "classification"
-            [dev.description for dev in self.devices]
         sampler_obj = TPESampler(n_startup_trials=10)

861-863: Use standard PyTorch Geometric attribute num_nodes instead of custom num_gates.

Line 861 creates a Data object with num_gates=number_of_gates, but PyTorch Geometric's Data class expects num_nodes as the standard attribute name. While custom attributes are permitted, using the standard name ensures compatibility with GNN operations that rely on this attribute.

Apply this diff:

-        feature_vector = Data(x=x, edge_index=edge_index, num_gates=number_of_gates).to(
+        feature_vector = Data(x=x, edge_index=edge_index, num_nodes=number_of_gates).to(
             "cuda" if torch.cuda.is_available() else "cpu"
         )

696-701: Task handling in verbose evaluation always uses classification, ignoring regression tasks.

Line 698 unconditionally calls evaluate_classification_model even when task == "regression" (for Hellinger distance). This produces incorrect metrics for regression tasks.

Branch on the task variable:

         if verbose:
             test_loader = DataLoader(training_data.X_test, batch_size=16, shuffle=False)
-            avg_loss_test, dict_results, _ = evaluate_classification_model(
-                model, test_loader, loss_fn=loss_fn, device=device, verbose=verbose
-            )
+            if task == "regression":
+                avg_loss_test, dict_results, _ = evaluate_regression_model(
+                    model, test_loader, loss_fn=loss_fn, device=device, verbose=verbose
+                )
+            else:
+                avg_loss_test, dict_results, _ = evaluate_classification_model(
+                    model, test_loader, loss_fn=loss_fn, device=device, verbose=verbose
+                )
             print(f"Test loss: {avg_loss_test:.4f}, {dict_results}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a09b7dc and fa7c1f7.

📒 Files selected for processing (1)
  • src/mqt/predictor/ml/predictor.py (12 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/mqt/predictor/ml/predictor.py (2)
src/mqt/predictor/hellinger/utils.py (1)
  • get_hellinger_model_path (135-144)
src/mqt/predictor/ml/helper.py (9)
  • TrainingData (575-585)
  • create_dag (192-301)
  • create_feature_vector (164-189)
  • evaluate_classification_model (323-402)
  • evaluate_regression_model (405-469)
  • get_openqasm3_gates (119-151)
  • get_path_trained_model (48-50)
  • get_path_trained_model_gnn (53-57)
  • get_path_training_data (36-38)
🪛 Ruff (0.14.7)
src/mqt/predictor/ml/predictor.py

108-108: Boolean-typed positional argument in function definition

(FBT001)


108-108: Boolean default positional argument in function definition

(FBT002)


147-147: Logging statement uses f-string

(G004)


150-150: Logging statement uses f-string

(G004)


174-174: Boolean-typed positional argument in function definition

(FBT001)


174-174: Boolean default positional argument in function definition

(FBT002)


449-449: Boolean-typed positional argument in function definition

(FBT001)


449-449: Boolean default positional argument in function definition

(FBT002)


571-571: Boolean-typed positional argument in function definition

(FBT001)


571-571: Boolean default positional argument in function definition

(FBT002)


802-802: Boolean-typed positional argument in function definition

(FBT001)


802-802: Boolean default positional argument in function definition

(FBT002)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/mqt/predictor/ml/predictor.py (1)

171-190: Make gnn parameter keyword-only in Predictor.__init__.

The gnn parameter is currently positional, which triggers Ruff FBT001/FBT002. Making it keyword-only prevents accidental positional usage and maintains consistency with setup_device_predictor.

Apply this diff:

     def __init__(
         self,
         devices: list[Target],
         figure_of_merit: figure_of_merit = "expected_fidelity",
-        gnn: bool = False,
+        *,
+        gnn: bool = False,
         logger_level: int = logging.INFO,
     ) -> None:
♻️ Duplicate comments (8)
src/mqt/predictor/_version.py (2)

9-11: Autogenerated _version.py should not be tracked in git (per header and prior review).

The header explicitly states “file generated by setuptools-scm” and “don’t change, don’t track in version control”, but this file is committed and contains a concrete dev version and None commit id. This will constantly drift from the real SCM state and cause noisy diffs and scanner findings.

Strongly recommend:

  • Remove src/mqt/predictor/_version.py from version control.
  • Add this path to .gitignore (and/or configure setuptools-scm’s write_to as intended) so the file is only generated into sdists/wheels, not tracked in the repo.

This aligns with the earlier reviewer note that the file should not be tracked.

Also applies to: 36-39


21-27: Fix misuse of TYPE_CHECKING and unreachable type-alias branch.

TYPE_CHECKING = False makes the if TYPE_CHECKING: block unreachable at runtime, and it is not the standard sentinel (from typing) that type checkers special‑case. This is what the “unreachable code” scanner warning is pointing at, and it adds indirection without benefit for such a small module.

A simpler and clearer approach is to drop the conditional entirely and use regular runtime type aliases:

-from typing import TYPE_CHECKING
-
-if TYPE_CHECKING:
-    VERSION_TUPLE = tuple[int | str, ...]
-    COMMIT_ID = str | None
-else:
-    VERSION_TUPLE = object
-    COMMIT_ID = object
+from typing import Optional, Tuple, Union
+
+VERSION_TUPLE = Tuple[Union[int, str], ...]
+COMMIT_ID = Optional[str]

This removes unreachable code and keeps the annotations straightforward. Also ensure your python_requires matches the syntax level you rely on (PEP 604 unions etc.), or adjust the annotations accordingly.

Also applies to: 29-34

src/mqt/predictor/ml/helper.py (3)

306-320: get_results_classes still doesn't handle integer class label targets.

When targets are 1D or [N, 1] shaped integer class indices, argmax(targets, dim=1) will always return 0, corrupting accuracy metrics. This was flagged in a previous review.

Apply this diff to handle both one-hot and integer encodings:

 def get_results_classes(preds: torch.Tensor, targets: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
     """Return predicted and target class indices.

     Arguments:
         preds: model predictions
-        targets: ground truth targets
+        targets: ground truth targets (one-hot encoded or integer class labels)
     Returns:
         pred_idx: predicted class indices
         targets_idx: target class indices
     """
     pred_idx = torch.argmax(preds, dim=1)
-    targets_idx = torch.argmax(targets, dim=1)
+    if targets.dim() == 1:
+        targets_idx = targets.long()
+    elif targets.size(1) == 1:
+        targets_idx = targets.view(-1).long()
+    else:
+        targets_idx = torch.argmax(targets, dim=1)

     return pred_idx, targets_idx
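
The corruption is easy to demonstrate:

# argmax over dim=1 of an [N, 1] integer-label tensor is always 0.
import torch

targets = torch.tensor([[2], [0], [1]])  # integer class labels, shape [3, 1]
print(torch.argmax(targets, dim=1))      # tensor([0, 0, 0])  <- corrupted
print(targets.view(-1).long())           # tensor([2, 0, 1])  <- correct indices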

325-405: Return type annotation mismatch.

Line 333 declares dict[str, float] but line 391 assigns a string to metrics["classification_report"]. This was flagged in a previous review.

Apply this diff:

-) -> tuple[float, dict[str, float], tuple[np.ndarray, np.ndarray] | None]:
+) -> tuple[float, dict[str, float | str], tuple[np.ndarray, np.ndarray] | None]:

536-544: Validation evaluation calls hardcode verbose=True, causing noisy output.

Lines 539 and 543 pass verbose=True regardless of the outer function's verbose parameter. This was flagged in a previous review.

Apply this diff:

             if task == "classification":
                 val_loss, val_metrics, _ = evaluate_classification_model(
-                    model, val_loader, loss_fn, device=str(device), verbose=True
+                    model, val_loader, loss_fn, device=str(device), verbose=False
                 )
             elif task == "regression":
                 val_loss, val_metrics, _ = evaluate_regression_model(
-                    model, val_loader, loss_fn, device=str(device), verbose=True
+                    model, val_loader, loss_fn, device=str(device), verbose=False
                 )
src/mqt/predictor/ml/predictor.py (3)

668-674: Redundant train/val split after data preparation already performed 70/30 split.

_get_prepared_training_data() at line 596 already performs a 70/30 train/test split. Lines 668-670 split X_train again (80/20), resulting in effective proportions of 56%/14%/30%. Consider using training_data.X_test as the validation set instead. This was flagged in a previous review.

Apply this diff:

         optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
-        x_train, x_val, _y_train, _y_val = train_test_split(
-            training_data.X_train, training_data.y_train, test_size=0.2, random_state=5
-        )
         # Dataloader
-        train_loader = DataLoader(x_train, batch_size=16, shuffle=True)
-
-        val_loader = DataLoader(x_val, batch_size=16, shuffle=False)
+        train_loader = DataLoader(training_data.X_train, batch_size=16, shuffle=True)
+        val_loader = DataLoader(training_data.X_test, batch_size=16, shuffle=False)

680-688: Type mismatch: device is torch.device but train_model expects str | None.

Line 682 passes device (a torch.device object) to train_model, which expects device: str | None. This works via duck typing but is a type annotation mismatch. This was flagged in a previous review.

Apply this diff:

         train_model(
             model,
             train_loader,
             optimizer,
             loss_fn,
             task=task,
             num_epochs=num_epochs,
-            device=device,
+            device=str(device),
             verbose=verbose,
             val_loader=val_loader,
             patience=30,
             min_delta=0.0,
             restore_best=True,
         )

626-655: Reduce code duplication in GNN instantiation.

The two GNN instantiation blocks are nearly identical, differing only in output_dim. This can be simplified.

Apply this diff:

-        json_dict["num_outputs"] = len(self.devices) if self.figure_of_merit != "hellinger_distance" else 1
-        if self.figure_of_merit != "hellinger_distance":
-            model = GNN(
-                in_feats=int(len(get_openqasm3_gates()) + 1 + 6 + 3 + 1 + 1 + 1),
-                num_conv_wo_resnet=dict_best_hyper["num_conv_wo_resnet"],
-                hidden_dim=dict_best_hyper["hidden_dim"],
-                num_resnet_layers=dict_best_hyper["num_resnet_layers"],
-                mlp_units=mlp_units,
-                output_dim=len(self.devices),
-                dropout_p=dict_best_hyper["dropout"],
-                bidirectional=dict_best_hyper["bidirectional"],
-                use_sag_pool=dict_best_hyper["sag_pool"],
-                sag_ratio=0.7,
-                conv_activation=torch.nn.functional.leaky_relu,
-                mlp_activation=torch.nn.functional.leaky_relu,
-            ).to("cuda" if torch.cuda.is_available() else "cpu")
-        else:
-            model = GNN(
-                in_feats=int(len(get_openqasm3_gates()) + 1 + 6 + 3 + 1 + 1 + 1),
-                num_conv_wo_resnet=dict_best_hyper["num_conv_wo_resnet"],
-                hidden_dim=dict_best_hyper["hidden_dim"],
-                num_resnet_layers=dict_best_hyper["num_resnet_layers"],
-                mlp_units=mlp_units,
-                output_dim=1,
-                dropout_p=dict_best_hyper["dropout"],
-                bidirectional=dict_best_hyper["bidirectional"],
-                use_sag_pool=dict_best_hyper["sag_pool"],
-                sag_ratio=0.7,
-                conv_activation=torch.nn.functional.leaky_relu,
-                mlp_activation=torch.nn.functional.leaky_relu,
-            ).to("cuda" if torch.cuda.is_available() else "cpu")
+        num_outputs = 1 if self.figure_of_merit == "hellinger_distance" else len(self.devices)
+        json_dict["num_outputs"] = num_outputs
+        device_str = "cuda" if torch.cuda.is_available() else "cpu"
+        model = GNN(
+            in_feats=int(len(get_openqasm3_gates()) + 1 + 6 + 3 + 1 + 1 + 1),
+            num_conv_wo_resnet=dict_best_hyper["num_conv_wo_resnet"],
+            hidden_dim=dict_best_hyper["hidden_dim"],
+            num_resnet_layers=dict_best_hyper["num_resnet_layers"],
+            mlp_units=mlp_units,
+            output_dim=num_outputs,
+            dropout_p=dict_best_hyper["dropout"],
+            bidirectional=dict_best_hyper["bidirectional"],
+            use_sag_pool=dict_best_hyper["sag_pool"],
+            sag_ratio=0.7,
+            conv_activation=torch.nn.functional.leaky_relu,
+            mlp_activation=torch.nn.functional.leaky_relu,
+        ).to(device_str)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f3d7c55 and 77f4889.

📒 Files selected for processing (4)
  • src/mqt/predictor/_version.py (1 hunks)
  • src/mqt/predictor/ml/helper.py (4 hunks)
  • src/mqt/predictor/ml/predictor.py (12 hunks)
  • src/mqt/predictor/rl/predictorenv.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-04T06:59:40.314Z
Learnt from: MatthiasReumann
Repo: munich-quantum-toolkit/core PR: 1301
File: mlir/lib/Dialect/MQTOpt/Transforms/Transpilation/LayeredUnit.cpp:84-85
Timestamp: 2025-12-04T06:59:40.314Z
Learning: In the MQTOpt MLIR routing passes (NaiveRoutingPassSC, AStarRoutingPassSC), the input IR is guaranteed to contain only 1-qubit and 2-qubit gates. All 3+-qubit gates must be decomposed before routing; otherwise the input IR is invalid. This invariant allows skipTwoQubitBlock in LayeredUnit.cpp to safely assert wires.size() == 2.

Applied to files:

  • src/mqt/predictor/ml/helper.py
🧬 Code graph analysis (2)
src/mqt/predictor/ml/helper.py (1)
src/mqt/predictor/utils.py (1)
  • calc_supermarq_features (94-146)
src/mqt/predictor/ml/predictor.py (1)
src/mqt/predictor/ml/helper.py (12)
  • TrainingData (577-587)
  • create_dag (192-303)
  • create_feature_vector (164-189)
  • evaluate_classification_model (325-404)
  • evaluate_regression_model (407-471)
  • get_openqasm3_gates (119-151)
  • get_path_trained_model (48-50)
  • get_path_trained_model_gnn (53-57)
  • get_path_training_circuits (60-62)
  • get_path_training_circuits_compiled (65-67)
  • get_path_training_data (36-38)
  • train_model (474-573)
🪛 Ruff (0.14.8)
src/mqt/predictor/ml/predictor.py

148-148: Logging statement uses f-string

(G004)


151-151: Logging statement uses f-string

(G004)


175-175: Boolean-typed positional argument in function definition

(FBT001)


175-175: Boolean default positional argument in function definition

(FBT002)


448-448: Boolean-typed positional argument in function definition

(FBT001)


448-448: Boolean default positional argument in function definition

(FBT002)


569-569: Boolean-typed positional argument in function definition

(FBT001)


569-569: Boolean default positional argument in function definition

(FBT002)


800-800: Boolean-typed positional argument in function definition

(FBT001)


800-800: Boolean default positional argument in function definition

(FBT002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: 🐍 Test (ubuntu-24.04) / 🐍 ubuntu-24.04
  • GitHub Check: 🐍 Test (macos-14) / 🐍 macos-14
  • GitHub Check: 🐍 Test (windows-2022) / 🐍 windows-2022
🔇 Additional comments (13)
src/mqt/predictor/ml/helper.py (6)

18-26: LGTM on new imports.

The added imports for numpy, torch, qiskit utilities, and sklearn metrics are appropriate for the new GNN and evaluation functionality.


53-58: LGTM on GNN path helper.

Clean implementation using f-string and proper Path handling for the GNN checkpoint file.


119-152: LGTM on OpenQASM 3.0 gate list.

The function includes a helpful reference to the specification URL and a maintenance note.


192-304: DAG featurization handles edge cases well.

The implementation correctly guards against:

  • Empty edges (lines 267-270)
  • Empty topo_nodes (lines 274-278)

One observation: nodes (line 234) and topo_nodes (line 273) come from different DAG methods, but critical path lookup at line 297 assumes all nodes in nodes exist in dist_in/dist_out (keyed by topo_nodes). This should be safe since both return operation nodes from the same DAG, but verify this assumption holds if Qiskit's DAG API changes.


407-472: LGTM on regression evaluator.

Proper handling of edge cases including empty arrays and constant targets for R2 calculation.


576-587: LGTM on TrainingData type widening.

The updated types properly support both classical numpy arrays and GNN graph data.

src/mqt/predictor/ml/predictor.py (7)

18-64: LGTM on new imports and type definitions.

The imports and type aliases (GNNSample, FeatureSample, TrainingSample) are well-organized and support the new GNN functionality.


92-99: LGTM on TrainGNNKwargs TypedDict.

Good use of TypedDict with total=False for optional GNN training configuration.


329-342: LGTM on GNN training data generation.

The use of canonical device ordering via [scores.get(dev.description, -1.0) for dev in self.devices] ensures consistent label ordering between training and inference.


502-506: LGTM on k_folds clamping.

The k_folds = max(2, k_folds) guard prevents KFold(n_splits=1) which would raise an error.


657-660: LGTM on class labels persistence.

Storing json_dict["class_labels"] ensures device ordering is preserved for inference.


689-699: LGTM on task-aware verbose evaluation.

The branching on task for evaluation (regression vs classification) is correct.


835-873: LGTM on GNN prediction path.

Key improvements addressed from previous reviews:

  • mlp_units parsed from json_dict["mlp"] (line 841)
  • class_labels loaded from json_dict["class_labels"] (line 863)
  • torch.load(path, weights_only=True) for security (line 857)
  • Batch dimension squeezed before length check (line 866)

Comment on lines +436 to +450
    def objective(
        self,
        trial: optuna.Trial,
        dataset: NDArray[np.float64] | list[torch_geometric.data.Data],
        task: str,
        in_feats: int,
        num_outputs: int,
        loss_fn: nn.Module,
        k_folds: int,
        batch_size: int = 32,
        num_epochs: int = 10,
        patience: int = 10,
        verbose: bool = False,
        device: str | None = None,
    ) -> float:

🧹 Nitpick | 🔵 Trivial

Make verbose parameter keyword-only.

The verbose: bool = False parameter at line 448 triggers Ruff FBT001/FBT002. Consider making it keyword-only for consistency.

Apply this diff:

     def objective(
         self,
         trial: optuna.Trial,
         dataset: NDArray[np.float64] | list[torch_geometric.data.Data],
         task: str,
         in_feats: int,
         num_outputs: int,
         loss_fn: nn.Module,
         k_folds: int,
         batch_size: int = 32,
         num_epochs: int = 10,
         patience: int = 10,
-        verbose: bool = False,
         device: str | None = None,
+        *,
+        verbose: bool = False,
     ) -> float:
🧰 Tools
🪛 Ruff (0.14.8)

448-448: Boolean-typed positional argument in function definition

(FBT001)


448-448: Boolean default positional argument in function definition

(FBT002)

🤖 Prompt for AI Agents
In src/mqt/predictor/ml/predictor.py around lines 436 to 450, make the verbose
parameter keyword-only by inserting a lone asterisk (*) immediately before the
verbose parameter in the objective function signature; this ensures verbose
cannot be passed positionally while leaving the other parameters and defaults
unchanged.

Comment on lines 799 to 801
 def predict_device_for_figure_of_merit(
-    qc: Path | QuantumCircuit, figure_of_merit: figure_of_merit = "expected_fidelity"
+    qc: Path | QuantumCircuit, figure_of_merit: figure_of_merit = "expected_fidelity", gnn: bool = False
 ) -> Target:

🧹 Nitpick | 🔵 Trivial

Consider making gnn parameter keyword-only.

The gnn: bool = False parameter triggers Ruff FBT001/FBT002. Making it keyword-only maintains consistency with setup_device_predictor.

Apply this diff:

 def predict_device_for_figure_of_merit(
-    qc: Path | QuantumCircuit, figure_of_merit: figure_of_merit = "expected_fidelity", gnn: bool = False
+    qc: Path | QuantumCircuit, figure_of_merit: figure_of_merit = "expected_fidelity", *, gnn: bool = False
 ) -> Target:
🧰 Tools
🪛 Ruff (0.14.8)

800-800: Boolean-typed positional argument in function definition

(FBT001)


800-800: Boolean default positional argument in function definition

(FBT002)

🤖 Prompt for AI Agents
In src/mqt/predictor/ml/predictor.py around lines 799-801, the gnn parameter
should be keyword-only to satisfy Ruff FBT001/FBT002; update the function
signature to place a bare * before gnn (e.g., def
predict_device_for_figure_of_merit(qc: Path | QuantumCircuit, figure_of_merit:
figure_of_merit = "expected_fidelity", *, gnn: bool = False) -> Target) and then
update any call sites that pass gnn positionally to use the gnn=... keyword.

Comment on lines +382 to +383
# added replace_implicit_swaps=True for a warning to make sure that implicit swaps are also replaced
altered_qc = tk_to_qiskit(tket_qc, replace_implicit_swaps=True)

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🌐 Web query:

pytket-qiskit tk_to_qiskit replace_implicit_swaps parameter

💡 Result:

The tk_to_qiskit(..., replace_implicit_swaps=False) parameter (default False) tells the converter whether to replace pytket "implicit" qubit permutations with explicit SWAP gates before converting to Qiskit. If True, tk_to_qiskit calls Circuit.replace_implicit_wire_swaps() (so implicit wire-swaps are implemented as SWAP gates); if False a warning may be emitted when implicit permutations are present (controlled by the perm_warning flag). [1][2]

References:

  • pytket-qiskit tk_to_qiskit docstring. [1]
  • Implementation / Circuit.replace_implicit_wire_swaps() in pytket docs. [2]

🏁 Script executed:

# Examine the actual code context around lines 382-383
cat -n src/mqt/predictor/rl/predictorenv.py | sed -n '375,395p'

Repository: munich-quantum-toolkit/predictor

Length of output: 311


🏁 Script executed:

# Check if there's any documentation or comments nearby that provide context
cat -n src/mqt/predictor/rl/predictorenv.py | sed -n '370,400p'

Repository: munich-quantum-toolkit/predictor

Length of output: 311


🏁 Script executed:

# Look for imports to understand the pytket-qiskit version and setup
head -50 src/mqt/predictor/rl/predictorenv.py | grep -E "import|from.*pytket|from.*qiskit"

Repository: munich-quantum-toolkit/predictor

Length of output: 311


Improve comment clarity for better code documentation.

The parameter replace_implicit_swaps=True is valid and well-documented in pytket-qiskit. When set to True, it converts implicit qubit permutations to explicit SWAP gates before converting to Qiskit. The current comment could be clearer about this intent. Consider revising to:

-        # added replace_implicit_swaps=True for a warning to make sure that implicit swaps are also replaced
+        # Replace implicit wire swaps with explicit SWAP gates during conversion to Qiskit
         altered_qc = tk_to_qiskit(tket_qc, replace_implicit_swaps=True)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        # added replace_implicit_swaps=True for a warning to make sure that implicit swaps are also replaced
+        # Replace implicit wire swaps with explicit SWAP gates during conversion to Qiskit
         altered_qc = tk_to_qiskit(tket_qc, replace_implicit_swaps=True)
🤖 Prompt for AI Agents
In src/mqt/predictor/rl/predictorenv.py around lines 382-383, the inline comment
for replace_implicit_swaps=True is vague; update it to explicitly state that
setting replace_implicit_swaps=True tells pytket-qiskit to convert implicit
qubit permutations into explicit SWAP gates before conversion to Qiskit so those
qubit mappings are preserved in the resulting Qiskit circuit, and mention that
this parameter is provided by pytket-qiskit for that purpose.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (5)
src/mqt/predictor/_version.py (1)

1-39: This auto-generated file should not be tracked in version control.

The file header explicitly states "don't change, don't track in version control" (line 10). This file is generated by setuptools-scm at build time and should be added to .gitignore rather than committed.

Additionally, lines 21-24 contain unreachable code: TYPE_CHECKING = False is hardcoded, making the if TYPE_CHECKING: block dead code.

src/mqt/predictor/ml/helper.py (2)

306-319: get_results_classes will corrupt target labels when targets are 1D or [N, 1] shaped.

The function applies argmax(targets, dim=1) unconditionally, which works for one-hot encoded targets but silently corrupts integer class labels. When targets are [N, 1] shaped (as done at line 362-363 in evaluate_classification_model), argmax on dimension 1 returns all zeros regardless of actual values.

Make the function robust to both encodings:

 def get_results_classes(preds: torch.Tensor, targets: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
     """Return predicted and target class indices.

     Arguments:
         preds: model predictions
-        targets: ground truth targets
+        targets: ground truth targets (one-hot encoded or integer class labels)
     Returns:
         pred_idx: predicted class indices
         targets_idx: target class indices
     """
     pred_idx = torch.argmax(preds, dim=1)
-    targets_idx = torch.argmax(targets, dim=1)
+    if targets.dim() == 1:
+        targets_idx = targets.long()
+    elif targets.size(1) == 1:
+        targets_idx = targets.view(-1).long()
+    else:
+        targets_idx = torch.argmax(targets, dim=1)

     return pred_idx, targets_idx

536-544: Validation evaluation hardcodes verbose=True, causing noisy output.

Lines 539 and 543 pass verbose=True to the evaluation functions regardless of the outer verbose parameter. This produces excessive output during training when the caller intended quiet operation.

             if task == "classification":
                 val_loss, val_metrics, _ = evaluate_classification_model(
-                    model, val_loader, loss_fn, device=str(device), verbose=True
+                    model, val_loader, loss_fn, device=str(device), verbose=False
                 )
             elif task == "regression":
                 val_loss, val_metrics, _ = evaluate_regression_model(
-                    model, val_loader, loss_fn, device=str(device), verbose=True
+                    model, val_loader, loss_fn, device=str(device), verbose=False
                 )
src/mqt/predictor/ml/predictor.py (2)

626-655: GNN instantiation is duplicated; consolidate using output_dim variable.

The two GNN instantiation blocks (lines 627-640 and 642-655) are nearly identical, differing only in output_dim. This was flagged in previous reviews but remains unaddressed.

         json_dict["num_outputs"] = len(self.devices) if self.figure_of_merit != "hellinger_distance" else 1
-        if self.figure_of_merit != "hellinger_distance":
-            model = GNN(
-                in_feats=int(len(get_openqasm3_gates()) + 1 + 6 + 3 + 1 + 1 + 1),
-                ...
-                output_dim=len(self.devices),
-                ...
-            ).to("cuda" if torch.cuda.is_available() else "cpu")
-        else:
-            model = GNN(
-                ...
-                output_dim=1,
-                ...
-            ).to("cuda" if torch.cuda.is_available() else "cpu")
+        output_dim = 1 if self.figure_of_merit == "hellinger_distance" else len(self.devices)
+        model = GNN(
+            in_feats=int(len(get_openqasm3_gates()) + 1 + 6 + 3 + 1 + 1 + 1),
+            num_conv_wo_resnet=dict_best_hyper["num_conv_wo_resnet"],
+            hidden_dim=dict_best_hyper["hidden_dim"],
+            num_resnet_layers=dict_best_hyper["num_resnet_layers"],
+            mlp_units=mlp_units,
+            output_dim=output_dim,
+            dropout_p=dict_best_hyper["dropout"],
+            bidirectional=dict_best_hyper["bidirectional"],
+            use_sag_pool=dict_best_hyper["sag_pool"],
+            sag_ratio=0.7,
+            conv_activation=torch.nn.functional.leaky_relu,
+            mlp_activation=torch.nn.functional.leaky_relu,
+        ).to("cuda" if torch.cuda.is_available() else "cpu")

668-688: Redundant train/val split and device type mismatch.

  1. Double split: _get_prepared_training_data() already performs a 70/30 train/test split (line 783-785). Lines 668-670 then split X_train again (80/20), resulting in effective 56%/14%/30% train/val/test proportions. Consider using training_data.X_test as the validation set instead.

  2. Type mismatch: Line 682 passes device (a torch.device object) to train_model, but train_model expects device: str | None. This works due to duck typing but is technically incorrect.

         optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
-        x_train, x_val, _y_train, _y_val = train_test_split(
-            training_data.X_train, training_data.y_train, test_size=0.2, random_state=5
-        )
-        train_loader = DataLoader(x_train, batch_size=16, shuffle=True)
-        val_loader = DataLoader(x_val, batch_size=16, shuffle=False)
+        train_loader = DataLoader(training_data.X_train, batch_size=16, shuffle=True)
+        val_loader = DataLoader(training_data.X_test, batch_size=16, shuffle=False)
         train_model(
             model,
             train_loader,
             optimizer,
             loss_fn,
             task=task,
             num_epochs=num_epochs,
-            device=device,
+            device=str(device),
             verbose=verbose,
             ...
         )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f3d7c55 and 77f4889.

📒 Files selected for processing (4)
  • src/mqt/predictor/_version.py (1 hunks)
  • src/mqt/predictor/ml/helper.py (4 hunks)
  • src/mqt/predictor/ml/predictor.py (12 hunks)
  • src/mqt/predictor/rl/predictorenv.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-04T06:59:40.314Z
Learnt from: MatthiasReumann
Repo: munich-quantum-toolkit/core PR: 1301
File: mlir/lib/Dialect/MQTOpt/Transforms/Transpilation/LayeredUnit.cpp:84-85
Timestamp: 2025-12-04T06:59:40.314Z
Learning: In the MQTOpt MLIR routing passes (NaiveRoutingPassSC, AStarRoutingPassSC), the input IR is guaranteed to contain only 1-qubit and 2-qubit gates. All 3+-qubit gates must be decomposed before routing; otherwise the input IR is invalid. This invariant allows skipTwoQubitBlock in LayeredUnit.cpp to safely assert wires.size() == 2.

Applied to files:

  • src/mqt/predictor/ml/helper.py
🧬 Code graph analysis (1)
src/mqt/predictor/ml/helper.py (1)
src/mqt/predictor/utils.py (1)
  • calc_supermarq_features (94-146)
🪛 Ruff (0.14.8)
src/mqt/predictor/ml/predictor.py

148-148: Logging statement uses f-string

(G004)


151-151: Logging statement uses f-string

(G004)


175-175: Boolean-typed positional argument in function definition

(FBT001)


175-175: Boolean default positional argument in function definition

(FBT002)


448-448: Boolean-typed positional argument in function definition

(FBT001)


448-448: Boolean default positional argument in function definition

(FBT002)


569-569: Boolean-typed positional argument in function definition

(FBT001)


569-569: Boolean default positional argument in function definition

(FBT002)


800-800: Boolean-typed positional argument in function definition

(FBT001)


800-800: Boolean default positional argument in function definition

(FBT002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: 🐍 Test (ubuntu-24.04) / 🐍 ubuntu-24.04
  • GitHub Check: 🐍 Test (macos-14) / 🐍 macos-14
  • GitHub Check: 🐍 Test (windows-2022) / 🐍 windows-2022
🔇 Additional comments (6)
src/mqt/predictor/rl/predictorenv.py (1)

382-383: LGTM! Explicit implicit swap handling is appropriate.

The replace_implicit_swaps=True parameter ensures that implicit swaps from Tket circuits are explicitly materialized in the Qiskit circuit, preventing potential layout/routing inconsistencies downstream.

Minor note: The comment mentions "for a warning" but no warning is actually emitted. Consider either removing that phrase or adding a warnings.warn() call if you want to notify users when implicit swaps are replaced.

src/mqt/predictor/ml/helper.py (2)

192-303: LGTM! DAG featurization handles edge cases correctly.

The create_dag function properly handles:

  • Empty edges (lines 267-270): Creates torch.empty((2, 0), dtype=torch.long) for edge-less graphs
  • Empty topo_nodes (lines 274-278): Returns node features with zero critical flags
  • Unknown gates: Raises a clear ValueError with the gate name

The feature layout (one-hot gate, sin/cos params, arity, controls, num_params, critical_flag, fan_in, fan_out) is well-documented.


576-587: LGTM! TrainingData type widening supports both classical and GNN paths.

The widened types for X_train/X_test to support both numpy arrays and torch_geometric.data.Data lists, along with y_train/y_test supporting both numpy arrays and torch Tensors, correctly accommodates the dual RF/GNN training pipeline.

src/mqt/predictor/ml/predictor.py (3)

83-86: LGTM! Type aliases improve code readability.

The GNNSample, FeatureSample, and TrainingSample union types clearly document the expected data structures for both training paths.


835-873: LGTM! GNN prediction path handles model loading and inference correctly.

The implementation properly:

  • Loads hyperparameters from JSON metadata (lines 837-838)
  • Parses mlp_units from the stored string format (lines 840-841)
  • Uses weights_only=True for secure model loading (line 857)
  • Removes the batch dimension with squeeze(0) (line 866)
  • Retrieves class labels from JSON metadata (line 863)
  • Validates output/label length consistency (lines 868-870)

329-342: LGTM! GNN training sample uses canonical device ordering.

The code correctly uses [scores.get(dev.description, -1.0) for dev in self.devices] (line 331) to ensure device score ordering matches self.devices, which is sorted at initialization. This addresses the previous concern about ordering consistency between training and inference.

        self,
        devices: list[Target],
        figure_of_merit: figure_of_merit = "expected_fidelity",
        gnn: bool = False,

🧹 Nitpick | 🔵 Trivial

Consider making gnn parameter keyword-only to avoid accidental positional usage.

The gnn boolean parameter at line 175 (and similarly at lines 448, 569, 800) is a positional argument, which Ruff flags with FBT001/FBT002. While setup_device_predictor correctly uses keyword-only (*, at line 107), the Predictor.__init__ and other methods do not.

     def __init__(
         self,
         devices: list[Target],
         figure_of_merit: figure_of_merit = "expected_fidelity",
+        *,
         gnn: bool = False,
         logger_level: int = logging.INFO,
     ) -> None:

Similarly for predict_device_for_figure_of_merit:

 def predict_device_for_figure_of_merit(
-    qc: Path | QuantumCircuit, figure_of_merit: figure_of_merit = "expected_fidelity", gnn: bool = False
+    qc: Path | QuantumCircuit, figure_of_merit: figure_of_merit = "expected_fidelity", *, gnn: bool = False
 ) -> Target:
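After such a change, call sites must pass the flag by name, e.g. predict_device_for_figure_of_merit(qc, gnn=True); a positional True/False at that slot then raises a TypeError, which is exactly what makes accidental positional usage impossible.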
🧰 Tools
🪛 Ruff (0.14.8)

175-175: Boolean-typed positional argument in function definition

(FBT001)


175-175: Boolean default positional argument in function definition

(FBT002)

🤖 Prompt for AI Agents
In src/mqt/predictor/ml/predictor.py around line 175 (and also update signatures
at lines 448, 569, 800 and for predict_device_for_figure_of_merit), the boolean
parameter `gnn` is currently positional; change the function/method signatures
so `gnn` becomes a keyword-only parameter (e.g., introduce a `*` marker before
the keyword-only params or move `gnn` after an existing `*`), and update any
callers that relied on positional ordering to pass `gnn=` explicitly; ensure
signature consistency across Predictor.__init__ and the other mentioned methods
to satisfy Ruff FBT001/FBT002.

        # Prepare data
        if training_data is None:
            training_data = self._get_prepared_training_data()
        number_in_features = int(len(get_openqasm3_gates()) + 1 + 6 + 3 + 1 + 1 + 1)

🧹 Nitpick | 🔵 Trivial

Magic number for in_feats is error-prone and duplicated.

The formula int(len(get_openqasm3_gates()) + 1 + 6 + 3 + 1 + 1 + 1) appears multiple times (lines 597, 628, 643, 844) with slight variations and no documentation of what each component represents.

Consider extracting this to a helper function or constant:

def get_gnn_input_features() -> int:
    """Calculate GNN input feature dimension.
    
    Components:
    - len(get_openqasm3_gates()) + 1: one-hot gate encoding (including 'measure')
    - 6: sin/cos for up to 3 parameters
    - 1: arity
    - 1: controls
    - 1: num_params
    - 1: critical_flag
    - 2: fan_in, fan_out
    """
    return len(get_openqasm3_gates()) + 1 + 6 + 1 + 1 + 1 + 1 + 2

Note: The current formula sums to 40, and create_dag concatenates [onehots(28), params(6), arity(1), controls(1), num_params(1), critical_flag(1), fan_in(1), fan_out(1)] = 40 features, so the totals agree; verify the helper preserves this breakdown.
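With such a helper in place, each call site would shrink to, e.g., in_feats=get_gnn_input_features(), so a future feature-layout change only touches one definition.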

🤖 Prompt for AI Agents
In src/mqt/predictor/ml/predictor.py around line 597 (and also update
occurrences at ~628, ~643, ~844), the input-feature dimension is expressed as a
hard-to-read duplicated magic-number expression int(len(get_openqasm3_gates()) +
1 + 6 + 3 + 1 + 1 + 1); replace these with a single well-documented helper
function or constant (e.g. get_gnn_input_features()) that computes the sum with
named components (one-hot gate size, sin/cos params, arity, controls,
num_params, critical_flag, fan_in, fan_out), add a docstring listing each
component and its value, use that helper everywhere the magic expression
appears, and update any off-by-one discrepancies by verifying the helper result
matches the actual concatenation order in create_dag (onehots, params, arity,
controls, num_params, critical_flag, fan_in, fan_out).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (7)
tests/hellinger_distance/test_estimated_hellinger_distance.py (3)

294-319: Cleanup test still exists instead of using fixture teardown.

Past reviews recommended converting test_remove_files to a module- or session-scoped autouse fixture with yield-based teardown to ensure cleanup runs regardless of test ordering or parallelization. This suggestion was previously raised but the cleanup is still implemented as a test function.
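A sketch of that yield-based teardown (the scope and artifact paths are placeholders):

from pathlib import Path

import pytest


@pytest.fixture(autouse=True, scope="module")
def cleanup_artifacts():
    yield  # every test in the module runs here, pass or fail
    for artifact in Path("trained_model").glob("*"):  # assumed output location
        if artifact.is_file():
            artifact.unlink(missing_ok=True)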


217-220: Same FBT001 issue as previous test.

Apply the same parametrization fix here for consistency:

-@pytest.mark.parametrize("gnn", [False, True], ids=["rf", "gnn"])
-def test_train_and_qcompile_with_hellinger_model(
-    source_path: Path, target_path: Path, device: Target, gnn: bool
-) -> None:
+@pytest.mark.parametrize("model_type", ["rf", "gnn"])
+def test_train_and_qcompile_with_hellinger_model(
+    source_path: Path, target_path: Path, device: Target, model_type: str
+) -> None:
+    gnn = model_type == "gnn"

157-159: Boolean parameter remains despite past review addressing FBT001.

Past reviews suggested changing the gnn: bool parameter to a string-based parametrization (e.g., model_type: str with values "rf" and "gnn"), and this was marked as addressed in commit 1acc0cb. However, the current code still uses gnn: bool, triggering Ruff FBT001.

If the team decided to keep the boolean parameter, consider silencing the warning in the linter configuration. Otherwise, apply the previously suggested fix:

-@pytest.mark.parametrize("gnn", [False, True], ids=["rf", "gnn"])
-def test_train_model_and_predict(device: Target, gnn: bool) -> None:
+@pytest.mark.parametrize("model_type", ["rf", "gnn"])
+def test_train_model_and_predict(device: Target, model_type: str) -> None:
+    gnn = model_type == "gnn"
src/mqt/predictor/ml/predictor.py (4)

436-450: Make verbose parameter keyword-only for consistency.

The verbose: bool = False parameter triggers Ruff FBT001/FBT002. A past review suggested making it keyword-only, which would align with the rest of the codebase and prevent accidental positional usage.

     def objective(
         self,
         trial: optuna.Trial,
         dataset: NDArray[np.float64] | list[torch_geometric.data.Data],
         task: str,
         in_feats: int,
         num_outputs: int,
         loss_fn: nn.Module,
         k_folds: int,
         batch_size: int = 32,
         num_epochs: int = 10,
         patience: int = 10,
-        verbose: bool = False,
         device: str | None = None,
+        *,
+        verbose: bool = False,
     ) -> float:

598-598: Magic number for GNN input features still duplicated.

Past reviews identified that the expression int(len(get_openqasm3_gates()) + 1 + 6 + 3 + 1 + 1 + 1) appears at lines 598, 629, 644, and 845 without documentation and suggested extracting it to a helper function like get_gnn_input_features() with a docstring explaining each component (one-hot gates, sin/cos params, arity, controls, num_params, critical_flag, fan_in, fan_out).


175-175: Consider keyword-only boolean parameters for consistency.

The module has inconsistent handling of the gnn boolean parameter:

  • setup_device_predictor (line 107) correctly uses keyword-only (*, gnn: bool = False)
  • Predictor.__init__ (line 175), objective (line 448), train_gnn_model (line 570), and predict_device_for_figure_of_merit (line 801) use positional boolean parameters

For consistency and to address Ruff FBT001/FBT002, consider making all boolean parameters keyword-only:

 def __init__(
     self,
     devices: list[Target],
     figure_of_merit: figure_of_merit = "expected_fidelity",
+    *,
     gnn: bool = False,
     logger_level: int = logging.INFO,
 ) -> None:

Apply similarly to other methods.


669-675: Redundant train/validation split persists after previous review.

Past reviews (marked as addressed in commits 0992bd0 to 77f4889) identified that _get_prepared_training_data already performs a 70/30 train/test split, and the additional 80/20 split at lines 669-671 results in unintended effective proportions (56%/14%/30%). The recommendation was to use training_data.X_test as the validation set.

         optimizer = torch.optim.Adam(model.parameters(), lr=dict_best_hyper["lr"])
-        # Train-validation split (needed for early stopping and hyperparameter optimization)
-        x_train, x_val, _y_train, _y_val = train_test_split(
-            training_data.X_train, training_data.y_train, test_size=0.2, random_state=5
-        )
         # Dataloader
-        train_loader = DataLoader(x_train, batch_size=16, shuffle=True)
-
-        val_loader = DataLoader(x_val, batch_size=16, shuffle=False)
+        train_loader = DataLoader(training_data.X_train, batch_size=16, shuffle=True)
+        val_loader = DataLoader(training_data.X_test, batch_size=16, shuffle=False)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77f4889 and a6ada4d.

📒 Files selected for processing (2)
  • src/mqt/predictor/ml/predictor.py (12 hunks)
  • tests/hellinger_distance/test_estimated_hellinger_distance.py (5 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
tests/hellinger_distance/test_estimated_hellinger_distance.py

158-158: Boolean-typed positional argument in function definition

(FBT001)


219-219: Boolean-typed positional argument in function definition

(FBT001)

src/mqt/predictor/ml/predictor.py

148-148: Logging statement uses f-string

(G004)


151-151: Logging statement uses f-string

(G004)


175-175: Boolean-typed positional argument in function definition

(FBT001)


175-175: Boolean default positional argument in function definition

(FBT002)


448-448: Boolean-typed positional argument in function definition

(FBT001)


448-448: Boolean default positional argument in function definition

(FBT002)


570-570: Boolean-typed positional argument in function definition

(FBT001)


570-570: Boolean default positional argument in function definition

(FBT002)


801-801: Boolean-typed positional argument in function definition

(FBT001)


801-801: Boolean default positional argument in function definition

(FBT002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: 🐍 Test (macos-14) / 🐍 macos-14
  • GitHub Check: 🐍 Test (windows-2022) / 🐍 windows-2022
  • GitHub Check: 🐍 Test (ubuntu-24.04) / 🐍 ubuntu-24.04
🔇 Additional comments (1)
tests/hellinger_distance/test_estimated_hellinger_distance.py (1)

255-266: Windows timeout warning logic appears inconsistent.

The test expects a Windows timeout warning only when gnn=False (line 255: if sys.platform == "win32" and not gnn). However, both branches pass the same timeout=600 parameter to compile_training_circuits. Why would the warning behavior differ based on the GNN flag?

If this is intentional (e.g., GNN compilation doesn't use timeouts), please add a comment explaining the rationale. Otherwise, consider whether the condition should be if sys.platform == "win32" without the and not gnn clause.
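A self-contained toy showing the uniform expectation the comment suggests (the stand-in function and warning text are invented):

import sys
import warnings

import pytest


def compile_stub(timeout: int) -> None:
    """Toy stand-in: warns on Windows when the per-circuit timeout mechanism kicks in."""
    if sys.platform == "win32":
        warnings.warn("compilation timed out", RuntimeWarning, stacklevel=2)


def test_timeout_warning_uniform() -> None:
    if sys.platform == "win32":  # same expectation for both backends
        with pytest.warns(RuntimeWarning, match="timed out"):
            compile_stub(timeout=600)
    else:
        compile_stub(timeout=600)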

Comment on lines +627 to +656
        if self.figure_of_merit != "hellinger_distance":
            model = GNN(
                in_feats=int(len(get_openqasm3_gates()) + 1 + 6 + 3 + 1 + 1 + 1),
                num_conv_wo_resnet=dict_best_hyper["num_conv_wo_resnet"],
                hidden_dim=dict_best_hyper["hidden_dim"],
                num_resnet_layers=dict_best_hyper["num_resnet_layers"],
                mlp_units=mlp_units,
                output_dim=len(self.devices),
                dropout_p=dict_best_hyper["dropout"],
                bidirectional=dict_best_hyper["bidirectional"],
                use_sag_pool=dict_best_hyper["sag_pool"],
                sag_ratio=0.7,
                conv_activation=torch.nn.functional.leaky_relu,
                mlp_activation=torch.nn.functional.leaky_relu,
            ).to("cuda" if torch.cuda.is_available() else "cpu")
        else:
            model = GNN(
                in_feats=int(len(get_openqasm3_gates()) + 1 + 6 + 3 + 1 + 1 + 1),
                num_conv_wo_resnet=dict_best_hyper["num_conv_wo_resnet"],
                hidden_dim=dict_best_hyper["hidden_dim"],
                num_resnet_layers=dict_best_hyper["num_resnet_layers"],
                mlp_units=mlp_units,
                output_dim=1,
                dropout_p=dict_best_hyper["dropout"],
                bidirectional=dict_best_hyper["bidirectional"],
                use_sag_pool=dict_best_hyper["sag_pool"],
                sag_ratio=0.7,
                conv_activation=torch.nn.functional.leaky_relu,
                mlp_activation=torch.nn.functional.leaky_relu,
            ).to("cuda" if torch.cuda.is_available() else "cpu")

⚠️ Potential issue | 🟡 Minor

GNN instantiation duplication persists despite past review.

Past reviews (marked as addressed in commits 2a9457d to 77cca31) suggested consolidating the two GNN instantiation blocks that differ only in output_dim. However, the duplication remains in the current code.

-        if self.figure_of_merit != "hellinger_distance":
-            model = GNN(
-                in_feats=int(len(get_openqasm3_gates()) + 1 + 6 + 3 + 1 + 1 + 1),
-                num_conv_wo_resnet=dict_best_hyper["num_conv_wo_resnet"],
-                ...
-                output_dim=len(self.devices),
-                ...
-            ).to("cuda" if torch.cuda.is_available() else "cpu")
-        else:
-            model = GNN(
-                ...
-                output_dim=1,
-                ...
-            ).to("cuda" if torch.cuda.is_available() else "cpu")
+        output_dim = 1 if self.figure_of_merit == "hellinger_distance" else len(self.devices)
+        device_str = "cuda" if torch.cuda.is_available() else "cpu"
+        model = GNN(
+            in_feats=int(len(get_openqasm3_gates()) + 1 + 6 + 3 + 1 + 1 + 1),
+            num_conv_wo_resnet=dict_best_hyper["num_conv_wo_resnet"],
+            hidden_dim=dict_best_hyper["hidden_dim"],
+            num_resnet_layers=dict_best_hyper["num_resnet_layers"],
+            mlp_units=mlp_units,
+            output_dim=output_dim,
+            dropout_p=dict_best_hyper["dropout"],
+            bidirectional=dict_best_hyper["bidirectional"],
+            use_sag_pool=dict_best_hyper["sag_pool"],
+            sag_ratio=0.7,
+            conv_activation=torch.nn.functional.leaky_relu,
+            mlp_activation=torch.nn.functional.leaky_relu,
+        ).to(device_str)
