
Graph provenance #841

Open
calvinp0 wants to merge 15 commits into main from graph_provenance

Conversation

@calvinp0
Member

This pull request introduces a provenance tracking and visualization system to the ARC workflow, enabling detailed recording and rendering of the sequence of computational events (such as job launches, completions, troubleshooting, and decision points) in each run. The provenance data is saved in YAML format and, if Graphviz is available, also rendered as a graph (DOT and SVG). The scheduler now records all relevant events and generates these artifacts at the end of a run. Comprehensive tests are included to validate the new functionality.

Key changes include:

Provenance tracking and event recording:

  • Added a provenance dictionary to the Scheduler class to track run metadata and a list of events, with initialization and persistence logic. Events such as species initialization, job start, job finish, troubleshooting, and TS guess selection are now recorded via the new record_provenance_event method. [1] [2] [3] [4]
  • On restart, previous provenance logs are loaded, and new events are appended, ensuring continuity across interrupted runs. [1] [2]
  • The scheduler finalizes provenance at the end of a run, generating all artifacts. [1] [2]
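A minimal sketch of the event-recording mechanism described above. The method names `record_provenance_event` and `save_provenance` follow the PR text, but the constructor signature and the exact event fields shown here are illustrative assumptions, not the PR's actual code:

```python
import datetime
import os

import yaml


class Scheduler:
    """Sketch of the provenance bookkeeping added to the real Scheduler class."""

    def __init__(self, project_directory: str):
        self.provenance_path = os.path.join(project_directory, 'provenance.yml')
        # Run metadata plus an append-only list of events.
        self.provenance = {'run_started': datetime.datetime.now().isoformat(),
                           'events': []}

    def record_provenance_event(self, event_type: str, label: str, **details):
        """Append a timestamped event (job start/finish, troubleshooting, etc.)."""
        event = {'type': event_type,
                 'label': label,
                 'timestamp': datetime.datetime.now().isoformat(),
                 **details}
        self.provenance['events'].append(event)
        self.save_provenance()

    def save_provenance(self):
        """Persist the provenance dictionary to YAML."""
        with open(self.provenance_path, 'w') as f:
            yaml.safe_dump(self.provenance, f)
```

On restart, loading the existing `provenance.yml` back into `self.provenance` before recording new events would give the append-across-runs continuity the description mentions.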

Provenance artifact generation and visualization:

  • Implemented save_provenance_artifacts in arc/plotter.py to save the provenance event log as YAML and, if possible, render the event graph using Graphviz (DOT and SVG). The graph visualizes the relationships between species, jobs, troubleshooting decisions, and TS guess selections. Helper functions ensure graph labels are readable and node IDs are safe.
  • Added logic to handle missing Graphviz gracefully, falling back to YAML-only output.
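The YAML-always, graph-only-if-possible behavior can be sketched as follows. The function and file names mirror the PR text (`save_provenance_artifacts`, DOT output); the internals and node-naming scheme are assumptions for illustration:

```python
import os

import yaml


def save_provenance_artifacts(provenance: dict, output_dir: str) -> None:
    """Always write the YAML log; render a graph only if graphviz imports."""
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, 'provenance.yml'), 'w') as f:
        yaml.safe_dump(provenance, f)
    try:
        import graphviz
    except ImportError:
        return  # Graceful fallback: YAML-only output when Graphviz is missing.
    dot = graphviz.Digraph(name='arc_provenance')
    for i, event in enumerate(provenance.get('events', [])):
        # Safe node IDs: index-prefixed and stripped of spaces.
        node_id = f"e{i}_{event.get('type', 'event')}".replace(' ', '_')
        dot.node(node_id, label=event.get('type', 'event'))
    dot.save(os.path.join(output_dir, 'provenance.dot'))
```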

Testing and validation:

  • Added comprehensive tests for label wrapping and for the full provenance artifact generation pipeline, verifying that all key node types and relationships are rendered in the output graph.

API and typing improvements:

  • Updated function signatures and docstrings to support provenance tracking, including new parameters for parent job and reason in run_job. [1] [2]
  • Minor typing and import cleanups in arc/scheduler.py.

Utility and robustness:

  • Ensured output directories are created as needed and that provenance logs are robust to parsing errors or missing files. [1] [2]

These changes lay the foundation for reproducible, auditable ARC runs and provide a clear visual summary of complex computational workflows.


Provenance tracking and event recording

  • Added a provenance dictionary and event recording methods to the Scheduler class, capturing all key events during a run and persisting them to YAML. [1] [2] [3] [4]
  • Implemented logic to load previous provenance logs on restart and ensure continuity of event tracking. [1] [2]
  • Scheduler now finalizes provenance and generates artifacts at the end of a run. [1] [2]

Provenance artifact generation and visualization

  • Added save_provenance_artifacts in arc/plotter.py to render provenance graphs (DOT/SVG) and YAML logs, with readable labels and safe node IDs. Handles missing Graphviz gracefully.

Testing

  • Added robust tests for label wrapping and for provenance artifact generation, ensuring all key events and relationships are rendered and validated.

API and typing improvements

  • Updated run_job and related methods to accept provenance-related parameters and improved docstrings and typing. [1] [2] [3]

Utility and robustness

  • Ensured output directories are created as needed and made provenance log handling robust to errors and missing files. [1] [2]

- Improve provenance logging by avoiding duplicate initialization events and handling potentially corrupted provenance files.
- Ensure internal consistency on restart by verifying that species marked as converged have all required output paths, resetting their status otherwise.
- Fix job key generation for reactions (lists of labels) and improve tracking for running conformer jobs.
- Defer TS switching during conformer optimization batches to avoid unnecessary job deletions.
- Ensure that successful and unsuccessful transition state generation methods are listed uniquely and joined with commas to avoid a trailing comma in the species report.
- Update graph logic to correctly link jobs to parent jobs, troubleshooting diamonds, or TS selection decisions instead of always defaulting to the last node.
- Preserve intentional newlines in wrapped labels to improve node readability.
- Ensure the provenance YAML file is saved with an updated timestamp even when the graphviz package is unavailable.
- Add support for visualizing TS guess selection failure events as decision nodes.
- Use stable indices for TS guesses to ensure correct mapping between jobs and guess objects during conformer optimization.
- Add unit tests for provenance deduplication, restart output sanitization, and multi-species label handling in the Scheduler.
- Correct "unsuccessfully" to "unsuccessful" in the transition state report string.
- Update unit tests to reflect the deduplication of generation methods and the removal of trailing commas in the report output.

Copilot AI left a comment


Pull request overview

Adds provenance tracking to ARC runs, persisting an event log to YAML and optionally rendering a Graphviz (DOT/SVG) visualization at the end of scheduling.

Changes:

  • Introduces scheduler-side provenance event recording (job start/finish, troubleshooting, TS guess selection) with persistence and restart behavior.
  • Adds plotter support to save provenance artifacts (YAML + Graphviz DOT/SVG) with label wrapping and safe node IDs.
  • Updates/extends unit tests to validate provenance logging/rendering and improves TS report formatting (deduped method lists).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
environment.yml Adds conda package for the Python Graphviz bindings used for rendering provenance graphs.
arc/species/species.py Deduplicates TS report method lists and fixes wording for unsuccessful methods.
arc/species/species_test.py Updates expected TS report string to match new formatting.
arc/scheduler.py Implements provenance state/events, restart sanitization for missing paths, and records key scheduling events.
arc/scheduler_test.py Adds tests for provenance restart dedup, restart sanitization, delete-all-jobs reset behavior, and multi-label provenance.
arc/plotter.py Adds provenance artifact generation (YAML + optional DOT/SVG) and helper functions for Graphviz output.
arc/plotter_test.py Adds tests for graph label wrapping and provenance artifact generation/graph structure.


Comment on lines +556 to +557
self.provenance['events'].append(event)
self.save_provenance()

Copilot AI Mar 28, 2026


record_provenance_event() persists provenance.yml on every event. In real runs this could be thousands of events (job starts/finishes, troubleshooting, etc.) and may noticeably slow scheduling due to synchronous disk I/O. Consider buffering events in memory and flushing periodically (e.g., every N events / every M seconds) and/or only persisting on key milestones + finalize, while still ensuring durability on restart.
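The buffering the reviewer suggests could look like the following. Everything here (class name, thresholds, `flush` semantics) is hypothetical, shown only to make the N-events / M-seconds flush policy concrete:

```python
import time


class BufferedProvenance:
    """Flush the event buffer every N events or M seconds, not on every append."""

    def __init__(self, flush_every_n: int = 100, flush_every_s: float = 30.0):
        self.events = []
        self.flush_every_n = flush_every_n
        self.flush_every_s = flush_every_s
        self.flush_count = 0          # How many disk writes have happened.
        self._pending = 0             # Events appended since the last flush.
        self._last_flush = time.monotonic()

    def record(self, event: dict) -> None:
        self.events.append(event)
        self._pending += 1
        now = time.monotonic()
        if (self._pending >= self.flush_every_n
                or now - self._last_flush >= self.flush_every_s):
            self.flush()

    def flush(self) -> None:
        # In the real scheduler this is where provenance.yml would be written.
        self._pending = 0
        self._last_flush = time.monotonic()
        self.flush_count += 1
```

A `flush()` call in the scheduler's finalize step would still guarantee the last partial batch reaches disk.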


Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.



Comment on lines 307 to +312
self.species_dict, self.rxn_dict = dict(), dict()
for species in self.species_list:
self.species_dict[species.label] = species
for rxn in self.rxn_list:
self.rxn_dict[rxn.index] = rxn
self._initialize_provenance()

Copilot AI Mar 28, 2026


_initialize_provenance() is called before TS species are created/added from rxn_list, so those TS labels never get a species_initialized event and the provenance graph/log will be incomplete for reaction runs. Consider moving _initialize_provenance() to after the reaction/TS-species construction block, or explicitly recording species_initialized when a TS species is created and appended to species_list.

@alongd
Member

alongd commented Mar 28, 2026

Thanks for this awesome addition! For a while we have wanted a way to visualize ARC's progress. Is this meant to be live, or static at the end of the run? Eventually we want a live HTML portal to track ARC/T3 progress; it would be great to keep that in mind while developing the feature in the present PR so we can build on top of it.

@codecov

codecov bot commented Mar 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.58%. Comparing base (960197e) to head (272fc55).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #841      +/-   ##
==========================================
+ Coverage   60.09%   60.58%   +0.48%     
==========================================
  Files         102      105       +3     
  Lines       31045    31825     +780     
  Branches     8087     8236     +149     
==========================================
+ Hits        18658    19282     +624     
- Misses      10071    10169      +98     
- Partials     2316     2374      +58     
Flag Coverage Δ
functionaltests 60.58% <ø> (+0.48%) ⬆️
unittests 60.58% <ø> (+0.48%) ⬆️

Flags with carried forward coverage won't be shown.



Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.



Comment on lines 1426 to 1450
successful_tsgs = [tsg for tsg in self.species_dict[label].ts_guesses if tsg.success]
if len(successful_tsgs) > 1:
xyzs = [tsg.initial_xyz for tsg in successful_tsgs]
piped_indices = self.pipe_planner.try_pipe_ts_opt(label, xyzs, self.ts_guess_level)
if not piped_indices:
self.job_dict[label]['conf_opt'] = dict()
for i, tsg in enumerate(successful_tsgs):
tsg.conformer_index = i # Store the conformer index to match them later.
if i in piped_indices:
continue
if 'conf_opt' not in self.job_dict[label]:
self.job_dict[label]['conf_opt'] = dict()
self.job_dict[label]['conf_opt'] = dict()
for tsg in successful_tsgs:
if tsg.index is None:
existing_indices = [guess.index for guess in self.species_dict[label].ts_guesses
if guess.index is not None]
tsg.index = max(existing_indices or [-1]) + 1
tsg.conformer_index = tsg.index # Set before run_job so restart state is consistent.
self.run_job(label=label,
xyz=tsg.initial_xyz,
level_of_theory=self.ts_guess_level,
job_type='conf_opt',
conformer=i,
conformer=tsg.index,
)

Copilot AI Apr 11, 2026


In run_ts_conformer_jobs(), the new block resets self.job_dict[label]['conf_opt'] and then loops over successful_tsgs a second time, which effectively ignores piped_indices and will spawn conf_opt jobs even for TS guesses that were supposed to be piped (and also overwrites the earlier tsg.conformer_index assignment). This looks like an indentation/logic error: the job_dict reset and run_job calls should respect piped_indices and avoid wiping any already-planned piped work.

arc/scheduler.py Outdated
Comment on lines +3473 to +3484
for job_type, spawn_job_type in self.job_types.items():
if spawn_job_type and not self.output[label]['job_types'][job_type] \
and not ((self.species_dict[label].is_ts and job_type in ['scan', 'conf_opt'])
or (self.species_dict[label].number_of_atoms == 1
and job_type in ['conf_opt', 'opt', 'fine', 'freq', 'rotors', 'bde'])
or job_type == 'bde' and self.species_dict[label].bdes is None
or job_type == 'conf_opt'
or job_type == 'irc'
or job_type == 'tsg'):
logger.debug(f'Species {label} did not converge.')
all_converged = False
break

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_all_done() now iterates over self.job_types twice with the same condition block (the new loop repeats the existing convergence check). This duplication is easy to miss and can complicate future edits; it should be removed or consolidated so the convergence logic is only evaluated once (and the TS E0 special-case remains intact).

Suggested change
for job_type, spawn_job_type in self.job_types.items():
if spawn_job_type and not self.output[label]['job_types'][job_type] \
and not ((self.species_dict[label].is_ts and job_type in ['scan', 'conf_opt'])
or (self.species_dict[label].number_of_atoms == 1
and job_type in ['conf_opt', 'opt', 'fine', 'freq', 'rotors', 'bde'])
or job_type == 'bde' and self.species_dict[label].bdes is None
or job_type == 'conf_opt'
or job_type == 'irc'
or job_type == 'tsg'):
logger.debug(f'Species {label} did not converge.')
all_converged = False
break

return val.value if isinstance(val, Enum) else val


# ── Enums ───────────────────────────────��──────────────────────────────────���─

Copilot AI Apr 11, 2026


The section header comment contains corrupted/unprintable characters ("��"/"���"), which will show up in diffs and can cause encoding noise in editors. Please replace this with plain ASCII/UTF-8 characters so the file remains clean and searchable.

Suggested change
# ── Enums ───────────────────────────────��──────────────────────────────────���─
# -- Enums ------------------------------------------------------------------

Comment on lines +77 to +107
def render_provenance_graph(prov_graph, run_label: str = 'ARC run') -> 'graphviz.Digraph':
"""
Render a :class:`ProvenanceGraph` as a Graphviz directed graph.

Node styling by type:
- **species**: box / aliceblue
- **calculation**: box / color by status (honeydew=done, mistyrose=errored, white=pending)
- **data**: note / cornsilk
- **decision**: diamond / color by kind (lavender, moccasin, mistyrose)

Edge styling by type:
- ``selected_by``: solid green
- ``rejected_by``: dashed red
- ``troubleshot_by``: dashed orange
- ``retried_as`` / ``fine_of``: dotted gray
- others: solid black

Args:
prov_graph: A :class:`ProvenanceGraph` instance.
run_label (str): Label for the root run node.

Returns:
graphviz.Digraph: The rendered graph object.
"""
gv = graphviz.Digraph(
name='arc_provenance',
comment=f'ARC provenance for {run_label}',
graph_attr={'rankdir': 'LR', 'splines': 'true', 'overlap': 'false'},
node_attr={'shape': 'box', 'style': 'rounded,filled', 'fillcolor': 'white', 'fontname': 'Helvetica'},
edge_attr={'fontname': 'Helvetica'},
)

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

render_provenance_graph() assumes the optional dependency graphviz is available; if the import failed earlier, this will raise an AttributeError when trying to access graphviz.Digraph. Since graphviz is treated as optional elsewhere, consider adding an explicit guard at the top of this function (raise a clear ImportError/RuntimeError) so callers get a helpful message.

arc/plotter.py Outdated
)
from arc.species.perceive import perceive_molecule_from_xyz
from arc.species.species import ARCSpecies, rmg_mol_to_dict_repr
from arc.provenance.nodes import _enum_val, NodeType, EdgeType, DecisionKind

Copilot AI Apr 11, 2026


Unused imports were added from arc.provenance.nodes (_enum_val, NodeType, EdgeType, DecisionKind) but they are not referenced anywhere in this module. Please remove them (or use them) to avoid lint/static-analysis failures and keep the dependency surface minimal.

Suggested change
from arc.provenance.nodes import _enum_val, NodeType, EdgeType, DecisionKind

Comment on lines +82 to +88
def add_species_node(self, label: str, is_ts: bool = False,
timestamp: Optional[str] = None) -> str:
"""
Convenience method to add a species node.

Args:
label: Species label.

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProvenanceGraph.add_species_node() is annotated as taking label: str, but the new test suite exercises label=None (e.g., to ensure rendering falls back to node_id). To keep typing consistent with actual supported inputs, consider changing the signature to label: Optional[str] (and similarly for other node-creation helpers if None is allowed).

Suggested change
def add_species_node(self, label: str, is_ts: bool = False,
timestamp: Optional[str] = None) -> str:
"""
Convenience method to add a species node.
Args:
label: Species label.
def add_species_node(self, label: Optional[str] = None, is_ts: bool = False,
timestamp: Optional[str] = None) -> str:
"""
Convenience method to add a species node.
Args:
label: Optional species label.

Comment on lines +1558 to +1560
Optional[dict]: A summary dict with keys ``n_before``, ``n_after``, and
``merged`` (list of lists of merged indices), or ``None`` if clustering
was skipped.

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster_tsgs() now always returns a summary dict for TS species, even when no clustering actually occurred (n_before == n_after). The docstring still says it returns None when clustering was skipped, which is no longer accurate. Please update the docstring to match the behavior, or return None when nothing was clustered so callers can rely on the Optional[dict] contract.

Suggested change
Optional[dict]: A summary dict with keys ``n_before``, ``n_after``, and
``merged`` (list of lists of merged indices), or ``None`` if clustering
was skipped.
Optional[dict]: ``None`` if this species is not a TS or has no TS guesses.
Otherwise, returns a summary dict with keys ``n_before``, ``n_after``,
and ``merged`` (a list of lists of merged indices), even if no TS
guesses were merged.

Added also TS troubleshoots