This issue is part of a Codex global repository code scan.
The ABACUS MD dump reader initializes `forces` as a zero array for all frames and atoms, fills it only when the dump header contains `FORCE`, and later always assigns `data["forces"] = force` for `LabeledSystem` output. If an ABACUS MD dump contains positions and energies but no force columns, dpdata can return valid-looking zero forces instead of omitting forces or reporting that force labels are unavailable.
Affected code:
Header check and zero-initialized arrays:

    check_line = 10
    assert "POSITION" in dumplines[check_line], (
        "keywords 'POSITION' cannot be found in the 6th line. Please check."
    )
    if "FORCE" in dumplines[check_line]:
        calc_force = True

    nframes_dump = -1
    if calc_stress:
        nframes_dump = int(nlines / (total_natoms + 13))
    else:
        nframes_dump = int(nlines / (total_natoms + 9))
    assert nframes_dump > 0, (
        "Number of lines in MD_dump file = %d. Number of atoms = %d. The MD_dump file is incomplete."  # noqa: UP031
        % (nlines, total_natoms)
    )
    cells = np.zeros([nframes_dump, 3, 3])
    stresses = np.zeros([nframes_dump, 3, 3])
    forces = np.zeros([nframes_dump, total_natoms, 3])
Per-atom parsing loop:

    for iat in range(total_natoms):
        # INDEX LABEL POSITION (Angstrom) FORCE (eV/Angstrom) VELOCITY (Angstrom/fs)
        # 0 Sn 0.000000000000 0.000000000000 0.000000000000 -0.000000000000 -0.000000000001 -0.000000000001 0.001244557166 -0.000346684288 0.000768457739
        # 1 Sn 0.000000000000 3.102800034079 3.102800034079 -0.000186795145 -0.000453823768 -0.000453823768 0.000550996187 -0.000886442775 0.001579501983
        # for abacus version >= v3.1.4, the value of POSITION is the real cartessian position, and unit is angstrom, and if cal_force the VELOCITY is added at the end.
        # for abacus version < v3.1.4, the real position = POSITION * celldm
        coords[iframe, iat] = np.array(
            [float(i) for i in dumplines[iline + skipline + iat].split()[2:5]]
        )

        if not newversion:
            coords[iframe, iat] *= celldm

        if calc_force:
            forces[iframe, iat] = np.array(
                [
                    float(i)
                    for i in dumplines[iline + skipline + iat].split()[5:8]
                ]
            )
Assembly of the labeled data dict:

    for iframe in range(ndump):
        stress[iframe] *= np.linalg.det(cells[iframe, :, :].reshape([3, 3]))
    if np.sum(np.abs(stress[0])) < 1e-10:
        stress = None

    magmom, magforce = get_mag_force(outlines)

    data["cells"] = cells
    # for idx in range(ndump):
    #     data['cells'][:, :, :] = cell
    data["coords"] = coords
    data["energies"] = energy
    data["forces"] = force
    data["virials"] = stress
    if not isinstance(data["virials"], np.ndarray):
        del data["virials"]
    data["orig"] = np.zeros(3)

Problematic flow:
- `calc_force` defaults to `False`.
- `forces = np.zeros([nframes_dump, total_natoms, 3])` is allocated before parsing.
- The parser only overwrites `forces[...]` inside `if calc_force:`.
- The labeled data dict always receives `data["forces"] = force`.
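
The flow above can be reproduced in isolation. The following is a minimal sketch, not the dpdata code: `parse_dump_buggy` and the three-line sample dump are hypothetical, the header is assumed to sit on the first line, and plain lists stand in for the NumPy arrays so the snippet has no dependencies.

```python
def parse_dump_buggy(lines, natoms):
    """Hypothetical, simplified stand-in for the reader's force handling."""
    calc_force = "FORCE" in lines[0]  # stays False when the keyword is absent
    coords = [[0.0, 0.0, 0.0] for _ in range(natoms)]
    forces = [[0.0, 0.0, 0.0] for _ in range(natoms)]  # allocated before parsing
    for iat in range(natoms):
        fields = lines[1 + iat].split()
        coords[iat] = [float(x) for x in fields[2:5]]
        if calc_force:  # the only place forces are overwritten
            forces[iat] = [float(x) for x in fields[5:8]]
    # forces is returned unconditionally, mirroring data["forces"] = force
    return {"coords": coords, "forces": forces}


# Assumed sample: a dump header with POSITION only, no FORCE columns.
no_force_dump = [
    "INDEX LABEL POSITION (Angstrom)",
    "0 Sn 0.0 0.0 0.0",
    "1 Sn 0.0 3.1028 3.1028",
]

buggy = parse_dump_buggy(no_force_dump, natoms=2)
print(buggy["forces"])  # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```

The returned zeros cannot be distinguished from a structure whose forces really are zero, which is why they read as valid labels downstream.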
Expected behavior: if the dump does not contain force columns, the parser should not create zero force labels. It should either omit `forces` when returning unlabeled-compatible data, or fail clearly when the caller requests a `LabeledSystem` that requires forces for this format.
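
One possible shape for that behavior is sketched below. It is an illustration under stated assumptions rather than a proposed patch: `parse_forces`, `build_data`, and the `require_forces` flag are hypothetical names, the header is assumed to be the first line, and plain lists replace the NumPy arrays.

```python
def parse_forces(lines, natoms):
    """Return per-atom forces, or None when the header has no FORCE keyword."""
    if "FORCE" not in lines[0]:
        return None  # no sentinel zeros: absence is reported as None
    return [
        [float(x) for x in lines[1 + iat].split()[5:8]] for iat in range(natoms)
    ]


def build_data(lines, natoms, require_forces):
    """Hypothetical assembly step: omit forces, or fail if they are required."""
    forces = parse_forces(lines, natoms)
    if forces is None and require_forces:
        raise ValueError(
            "MD dump has no FORCE columns; force labels are unavailable."
        )
    data = {
        "coords": [
            [float(x) for x in lines[1 + iat].split()[2:5]]
            for iat in range(natoms)
        ]
    }
    if forces is not None:
        data["forces"] = forces  # only set when real values were parsed
    return data


# Assumed sample dumps, one without and one with force columns.
dump_without_forces = ["INDEX LABEL POSITION (Angstrom)", "0 Sn 0.0 0.0 0.0"]
dump_with_forces = [
    "INDEX LABEL POSITION (Angstrom) FORCE (eV/Angstrom)",
    "0 Sn 0.0 0.0 0.0 0.1 -0.2 0.3",
]

unlabeled = build_data(dump_without_forces, natoms=1, require_forces=False)
labeled = build_data(dump_with_forces, natoms=1, require_forces=True)
print("forces" in unlabeled)  # False
print(labeled["forces"])      # [[0.1, -0.2, 0.3]]
```

Calling `build_data(dump_without_forces, natoms=1, require_forces=True)` raises `ValueError` in this sketch, which corresponds to the "fail clearly" option.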
Affected code locations (commit a7a50bf):
- dpdata/dpdata/formats/abacus/md.py, lines 57 to 75
- dpdata/dpdata/formats/abacus/md.py, lines 105 to 124
- dpdata/dpdata/formats/abacus/md.py, lines 198 to 214