[Code scan] ABACUS MD can fabricate zero force labels when FORCE is absent #1000

Description

@njzjz

This issue is part of a Codex global repository code scan.

The ABACUS MD dump reader initializes forces as a zero array for all frames and atoms, fills it only when the dump header contains FORCE, and later always assigns data["forces"] = force for LabeledSystem output. If an ABACUS MD dump contains positions and energies but no force columns, dpdata can return valid-looking zero forces instead of omitting forces or reporting that force labels are unavailable.

Affected code:

check_line = 10
assert "POSITION" in dumplines[check_line], (
    "keywords 'POSITION' cannot be found in the 6th line. Please check."
)
if "FORCE" in dumplines[check_line]:
    calc_force = True
nframes_dump = -1
if calc_stress:
    nframes_dump = int(nlines / (total_natoms + 13))
else:
    nframes_dump = int(nlines / (total_natoms + 9))
assert nframes_dump > 0, (
    "Number of lines in MD_dump file = %d. Number of atoms = %d. The MD_dump file is incomplete."  # noqa: UP031
    % (nlines, total_natoms)
)
cells = np.zeros([nframes_dump, 3, 3])
stresses = np.zeros([nframes_dump, 3, 3])
forces = np.zeros([nframes_dump, total_natoms, 3])

for iat in range(total_natoms):
    # INDEX LABEL POSITION (Angstrom) FORCE (eV/Angstrom) VELOCITY (Angstrom/fs)
    # 0 Sn 0.000000000000 0.000000000000 0.000000000000 -0.000000000000 -0.000000000001 -0.000000000001 0.001244557166 -0.000346684288 0.000768457739
    # 1 Sn 0.000000000000 3.102800034079 3.102800034079 -0.000186795145 -0.000453823768 -0.000453823768 0.000550996187 -0.000886442775 0.001579501983
    # for abacus version >= v3.1.4, the value of POSITION is the real cartessian position, and unit is angstrom, and if cal_force the VELOCITY is added at the end.
    # for abacus version < v3.1.4, the real position = POSITION * celldm
    coords[iframe, iat] = np.array(
        [float(i) for i in dumplines[iline + skipline + iat].split()[2:5]]
    )
    if not newversion:
        coords[iframe, iat] *= celldm
    if calc_force:
        forces[iframe, iat] = np.array(
            [
                float(i)
                for i in dumplines[iline + skipline + iat].split()[5:8]
            ]
        )

for iframe in range(ndump):
    stress[iframe] *= np.linalg.det(cells[iframe, :, :].reshape([3, 3]))
if np.sum(np.abs(stress[0])) < 1e-10:
    stress = None
magmom, magforce = get_mag_force(outlines)
data["cells"] = cells
# for idx in range(ndump):
#     data['cells'][:, :, :] = cell
data["coords"] = coords
data["energies"] = energy
data["forces"] = force
data["virials"] = stress
if not isinstance(data["virials"], np.ndarray):
    del data["virials"]
data["orig"] = np.zeros(3)

Problematic flow:

  • calc_force defaults to False.
  • forces = np.zeros([nframes_dump, total_natoms, 3]) is allocated before parsing.
  • The parser only overwrites forces[...] inside if calc_force:.
  • The labeled data dict always receives data["forces"] = force.
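The flow above can be reproduced in isolation with a small sketch. This is not dpdata's actual parser: `parse_atoms_buggy`, the header string, and the two atom lines are hypothetical stand-ins, and plain lists replace NumPy arrays so the example has no dependencies. It only illustrates how pre-allocating zeros and filling them conditionally yields force labels that look valid.

```python
def parse_atoms_buggy(header, atom_lines):
    """Hypothetical stand-in that mirrors the flow described above."""
    calc_force = False  # defaults to False, as in the issue
    if "FORCE" in header:
        calc_force = True
    natoms = len(atom_lines)
    coords = [[0.0, 0.0, 0.0] for _ in range(natoms)]
    # Allocated before we know whether the dump has force columns.
    forces = [[0.0, 0.0, 0.0] for _ in range(natoms)]
    for iat, line in enumerate(atom_lines):
        fields = line.split()
        coords[iat] = [float(x) for x in fields[2:5]]
        if calc_force:
            forces[iat] = [float(x) for x in fields[5:8]]
    # Forces are always returned, whether or not they were parsed.
    return {"coords": coords, "forces": forces}


# Assumed example input: a dump header with POSITION but no FORCE column.
header = "INDEX LABEL POSITION (Angstrom)"
atom_lines = [
    "0 Sn 0.0 0.0 0.0",
    "1 Sn 0.0 3.1028 3.1028",
]
data = parse_atoms_buggy(header, atom_lines)
```

A caller inspecting `data["forces"]` here has no way to tell that the zeros were never read from the file.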

Expected behavior: if the dump does not contain force columns, the parser should not create zero force labels. It should either omit forces when returning unlabeled-compatible data, or fail clearly when the caller requests a LabeledSystem that requires forces for this format.
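One possible shape for the expected behavior is sketched below. It is an illustration under assumptions, not a proposed patch: `parse_atoms`, `build_data`, and the `require_forces` flag are hypothetical names, and plain lists stand in for NumPy arrays. The idea is to represent missing forces as `None` rather than zeros, then either omit the key or raise when forces are required.

```python
def parse_atoms(header, atom_lines):
    """Hypothetical parser: returns forces=None when the header lacks FORCE."""
    has_force = "FORCE" in header
    coords = []
    forces = [] if has_force else None
    for line in atom_lines:
        fields = line.split()
        coords.append([float(x) for x in fields[2:5]])
        if has_force:
            forces.append([float(x) for x in fields[5:8]])
    return coords, forces


def build_data(header, atom_lines, require_forces=False):
    """Hypothetical assembly step: omit forces, or fail if they are required."""
    coords, forces = parse_atoms(header, atom_lines)
    data = {"coords": coords}
    if forces is not None:
        data["forces"] = forces
    elif require_forces:
        raise ValueError(
            "MD dump has no FORCE columns; force labels are unavailable."
        )
    return data


# Assumed example inputs, with and without force columns.
no_force = build_data("INDEX LABEL POSITION (Angstrom)", ["0 Sn 0.0 0.0 0.0"])
with_force = build_data(
    "INDEX LABEL POSITION (Angstrom) FORCE (eV/Angstrom)",
    ["0 Sn 0.0 0.0 0.0 0.1 -0.2 0.3"],
)
```

With this arrangement, `no_force` has no `"forces"` key at all, and passing `require_forces=True` for the same input raises `ValueError` instead of returning zeros.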
