Bug Description
You can see in the regression tests that we are getting non deterministic decompositions of the fused ring systems:
How To Reproduce
Just making documentation changes in a PR will show this. Could also probably run locally, but I haven't tried this yet.
Expected Behavior
Both ring decompositions are valid, but we just want to always get the same one.
Installation Information
Referring to the CI in particular here - if running locally, I'll update this.
Additional Context
We traded one non-determinism for another in #2796
RDKit is deterministic in the GetSSSR step, but that step is based on the atom identifiers, which are not sorted deterministically when converting from RMG to RDKit (I think). This is some AI-assisted code which I think might solve the problem, and I will (eventually) open a PR to test:
def get_smallest_set_of_smallest_rings(self, symmetrized=False):
"""
Returns the smallest set of smallest rings (SSSR) as a list of lists of Atom objects, optionally with symmetrization.
Uses RDKit's built-in ring perception (GetSSSR) by default.
Note that this is not the same as the Symmetrized SSSR (GetSymmSSSR) with symmetrized=True.
The symmetrized SSSR is at least as large as the SSSR for a molecule.
In certain highly-symmetric cases (e.g. cubane), the symmetrized SSSR
can be a bit larger (i.e. the number of symmetrized rings is >= NumBonds-NumAtoms+1).
It is usually more chemically meaningful, and is less random/arbitrary than the SSSR,
though RMG uses SSSR for historical reasons.
"""
if symmetrized:
if self._symm_sssr is not None:
return list(self._symm_sssr)
else:
if self._sssr is not None:
return list(self._sssr)
# RDKit does not support electron
if self.is_electron():
return []
from rdkit import Chem
sssr = []
# Get the symmetric SSSR using RDKit
rdkit_mol = self.to_rdkit_mol(remove_h=False,
sanitize=False,
return_mapping=False,
save_order=True,
ignore_bond_orders=True)
if symmetrized:
ring_info = Chem.GetSymmSSSR(rdkit_mol)
else:
# Force deterministic SSSR by canonicalizing the atom numbering first.
# This prevents RDKit's arbitrary graph traversal from selecting
# different sets of equivalent rings across different runs/platforms.
ranks = list(Chem.CanonicalRankAtoms(rdkit_mol, breakTies=True))
rank_to_idx = {rank: idx for idx, rank in enumerate(ranks)}
# new_order maps the new atom index to the original atom index
new_order = [rank_to_idx[i] for i in range(rdkit_mol.GetNumAtoms())]
canonical_mol = Chem.RenumberAtoms(rdkit_mol, new_order)
canonical_rings = Chem.GetSSSR(canonical_mol)
# Map the resulting ring indices back to the original rdkit_mol numbering
ring_info = []
for ring in canonical_rings:
orig_ring = tuple([new_order[idx] for idx in ring])
ring_info.append(orig_ring)
for ring in ring_info:
atom_ring = [self.atoms[idx] for idx in ring]
sorted_ring = self.sort_cyclic_vertices(atom_ring)
sssr.append(sorted_ring)
if symmetrized:
self._symm_sssr = tuple(sssr)
else:
self._sssr = tuple(sssr)
return sssr
Bug Description
You can see in the regression tests that we are getting non deterministic decompositions of the fused ring systems:
How To Reproduce
Just making documentation changes in a PR will show this. Could also probably run locally, but I haven't tried this yet.
Expected Behavior
Both ring decompositions are valid, but we just want to always get the same one.
Installation Information
Referring to the CI in particular here - if running locally, I'll update this.
Additional Context
We traded one non-determinism for another in #2796
RDKit is deterministic in the
GetSSSRstep, but that step is based on the atom identifiers, which are not sorted deterministically when converting from RMG to RDKit (I think). This is some AI-assisted code which I think might solve the problem, and I will (eventually) open a PR to test: