Skip to content

RDKit GetSSSR Nondeterminism #2912

@JacksonBurns

Description

@JacksonBurns

Bug Description

You can see in the regression tests that we are getting non deterministic decompositions of the fused ring systems:

Image

How To Reproduce

Just making documentation changes in a PR will show this. Could also probably run locally, but I haven't tried this yet.

Expected Behavior

Both ring decompositions are valid, but we just want to always get the same one.

Installation Information

Referring to the CI in particular here - if running locally, I'll update this.

Additional Context

We traded one non-determinism for another in #2796

RDKit is deterministic in the GetSSSR step, but that step is based on the atom identifiers, which are not sorted deterministically when converting from RMG to RDKit (I think). This is some AI-assisted code which I think might solve the problem, and I will (eventually) open a PR to test:

def get_smallest_set_of_smallest_rings(self, symmetrized=False):
        """
        Returns the smallest set of smallest rings (SSSR) as a list of lists of Atom objects, optionally with symmetrization.

        Uses RDKit's built-in ring perception (GetSSSR) by default.
        Note that this is not the same as the Symmetrized SSSR (GetSymmSSSR) with symmetrized=True.
        The symmetrized SSSR is at least as large as the SSSR for a molecule. 
        In certain highly-symmetric cases (e.g. cubane), the symmetrized SSSR 
        can be a bit larger (i.e. the number of symmetrized rings is >= NumBonds-NumAtoms+1).
        It is usually more chemically meaningful, and is less random/arbitrary than the SSSR,
        though RMG uses SSSR for historical reasons.
        """
        if symmetrized:
            if self._symm_sssr is not None:
                return list(self._symm_sssr)
        else:
            if self._sssr is not None:
                return list(self._sssr)

        # RDKit does not support electron
        if self.is_electron():
            return []
        
        from rdkit import Chem
        
        sssr = []
        # Get the symmetric SSSR using RDKit
        rdkit_mol = self.to_rdkit_mol(remove_h=False,
                                      sanitize=False,
                                      return_mapping=False,
                                      save_order=True,
                                      ignore_bond_orders=True)

        if symmetrized:
            ring_info = Chem.GetSymmSSSR(rdkit_mol)
        else:
            # Force deterministic SSSR by canonicalizing the atom numbering first.
            # This prevents RDKit's arbitrary graph traversal from selecting 
            # different sets of equivalent rings across different runs/platforms.
            ranks = list(Chem.CanonicalRankAtoms(rdkit_mol, breakTies=True))
            rank_to_idx = {rank: idx for idx, rank in enumerate(ranks)}
            
            # new_order maps the new atom index to the original atom index
            new_order = [rank_to_idx[i] for i in range(rdkit_mol.GetNumAtoms())]
            
            canonical_mol = Chem.RenumberAtoms(rdkit_mol, new_order)
            canonical_rings = Chem.GetSSSR(canonical_mol)
            
            # Map the resulting ring indices back to the original rdkit_mol numbering
            ring_info = []
            for ring in canonical_rings:
                orig_ring = tuple([new_order[idx] for idx in ring])
                ring_info.append(orig_ring)

        for ring in ring_info:
            atom_ring = [self.atoms[idx] for idx in ring]
            sorted_ring = self.sort_cyclic_vertices(atom_ring)
            sssr.append(sorted_ring)
            
        if symmetrized:
            self._symm_sssr = tuple(sssr)
        else:
            self._sssr = tuple(sssr)
            
        return sssr

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions