This issue is part of a Codex global repository code scan.
PyMatgenMoleculeFormat.to_system() builds the species list from grouped atom_names/atom_numbs, but it passes coordinates in their current atom_types order. If atoms are not grouped by type, the resulting pymatgen.core.Molecule assigns species to the wrong coordinates.
Affected code:
def to_system(self, data, **kwargs):
    """Convert System to Pymatgen Molecule obj."""
    molecules = []
    try:
        from pymatgen.core import Molecule
    except ModuleNotFoundError as e:
        raise ImportError("No module pymatgen.Molecule") from e

    species = []
    for name, numb in zip(data["atom_names"], data["atom_numbs"]):
        species.extend([name] * numb)
    data = dpdata.system.remove_pbc(data)
    for ii in range(np.array(data["coords"]).shape[0]):
        molecule = Molecule(species, data["coords"][ii])
        molecules.append(molecule)
Minimal example:
import numpy as np
import dpdata
s = dpdata.System(data={
    "atom_names": ["H", "O"],
    "atom_numbs": [2, 1],
    "atom_types": np.array([0, 1, 0]),
    "orig": np.zeros(3),
    "cells": np.eye(3).reshape(1, 3, 3) * 20,
    "coords": np.array([[[0.0, 0.0, 0.0], [9.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]),
})
mol = s.to("pymatgen/molecule")[0]
print([str(sp) for sp in mol.species])
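The current species construction produces the grouped list H, H, O, while the coordinates are in the order H, O, H. As a result, the oxygen site at x = 9.0 is labeled H and the hydrogen site at x = 1.0 is labeled O.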
The species list should be built per atom, e.g. from data["atom_names"][tt] for tt in data["atom_types"], matching the coordinate order.
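A minimal sketch of the suggested per-atom construction, shown next to the current grouped construction for comparison. The helper names `species_per_atom` and `species_grouped` are hypothetical and not part of dpdata. Plain lists stand in for the arrays so the snippet runs without dpdata or pymatgen.

```python
def species_grouped(atom_names, atom_numbs):
    """Reproduce the current grouped construction (one run of labels per type)."""
    species = []
    for name, numb in zip(atom_names, atom_numbs):
        species.extend([name] * numb)
    return species


def species_per_atom(atom_names, atom_types):
    """Build one label per atom, following the order of atom_types."""
    return [atom_names[tt] for tt in atom_types]


# Same values as the minimal example above.
atom_names = ["H", "O"]
atom_numbs = [2, 1]
atom_types = [0, 1, 0]

print(species_grouped(atom_names, atom_numbs))   # ['H', 'H', 'O']
print(species_per_atom(atom_names, atom_types))  # ['H', 'O', 'H']
```

In `to_system()`, the same list comprehension could replace the `zip(data["atom_names"], data["atom_numbs"])` loop, assuming `data["atom_types"]` holds integer indices into `data["atom_names"]`, as it does in the minimal example.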
Affected code location: dpdata/dpdata/plugins/pymatgen.py, lines 61 to 75 in a7a50bf.