[Code scan] pymatgen Molecule export can assign species to the wrong coordinates #994

Description

@njzjz

This issue is part of a Codex global repository code scan.

PyMatgenMoleculeFormat.to_system() builds the species list from grouped atom_names/atom_numbs, but it passes coordinates in their current atom_types order. If atoms are not grouped by type, the resulting pymatgen.core.Molecule assigns species to the wrong coordinates.

Affected code:

def to_system(self, data, **kwargs):
    """Convert System to Pymatgen Molecule obj."""
    molecules = []
    try:
        from pymatgen.core import Molecule
    except ModuleNotFoundError as e:
        raise ImportError("No module pymatgen.Molecule") from e
    species = []
    for name, numb in zip(data["atom_names"], data["atom_numbs"]):
        species.extend([name] * numb)
    data = dpdata.system.remove_pbc(data)
    for ii in range(np.array(data["coords"]).shape[0]):
        molecule = Molecule(species, data["coords"][ii])
        molecules.append(molecule)

Minimal example:

import numpy as np
import dpdata

s = dpdata.System(data={
    "atom_names": ["H", "O"],
    "atom_numbs": [2, 1],
    "atom_types": np.array([0, 1, 0]),
    "orig": np.zeros(3),
    "cells": np.eye(3).reshape(1, 3, 3) * 20,
    "coords": np.array([[[0.0, 0.0, 0.0], [9.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]),
})

mol = s.to("pymatgen/molecule")[0]
print([str(sp) for sp in mol.species])

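The current species construction produces the grouped list H, H, O, while the coordinates are ordered H, O, H. As a result, the second H is attached to the oxygen position (9.0, 0.0, 0.0) and O is attached to the hydrogen position (1.0, 0.0, 0.0).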

The species list should be built per atom, e.g. from data["atom_names"][tt] for tt in data["atom_types"], matching the coordinate order.
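
A minimal sketch of the per-atom construction, compared against the grouped one quoted above. It runs on plain lists and NumPy arrays, so it needs neither dpdata nor pymatgen. The helper names species_grouped and species_in_coord_order are hypothetical and exist only for illustration; this is one possible shape for the fix, not necessarily how it should land in dpdata.

```python
import numpy as np


def species_grouped(atom_names, atom_numbs):
    """Reproduce the grouped construction quoted in the issue, for comparison."""
    species = []
    for name, numb in zip(atom_names, atom_numbs):
        species.extend([name] * numb)
    return species


def species_in_coord_order(atom_names, atom_types):
    """Return one element symbol per atom, in the same order as the coordinates.

    Hypothetical helper, not part of dpdata. Assumes each entry of
    atom_types is an index into atom_names.
    """
    return [atom_names[tt] for tt in atom_types]


# Same values as the minimal example in this issue.
atom_names = ["H", "O"]
atom_numbs = [2, 1]
atom_types = np.array([0, 1, 0])

grouped = species_grouped(atom_names, atom_numbs)
per_atom = species_in_coord_order(atom_names, atom_types)

print(grouped)   # ['H', 'H', 'O']  (does not match the coordinate order)
print(per_atom)  # ['H', 'O', 'H']  (matches the coordinate order)
```

In to_system(), the per-atom list would take the place of the grouped species list passed to Molecule(species, data["coords"][ii]). For systems whose atoms are already grouped by type, both constructions give the same list, so existing sorted inputs should be unaffected.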
