[Code scan] LAMMPS atom-style parsing misreads neutral charge and dipole data #995

Description

@njzjz

This issue is part of a Codex global repository code scan.

The LAMMPS data parser has two atom-style issues:

  1. atom_style="auto" misdetects a 6-column neutral charge-style file when the section header lacks an Atoms # charge comment. A charge of 0.0 satisfies val == int(val), so the heuristic falls through to bond.
  2. get_spins() ignores its atom_style argument and treats the Atoms section as spin data whenever the first row has at least 8 columns. For atom_style="dipole", those columns are dipole fields, so the parser registers bogus spins.

Affected code:

# If no explicit style found, try to infer from first data line
if atom_lines:
    first_line = atom_lines[0].split()
    num_cols = len(first_line)
    # Try to match based on number of columns and content patterns
    # This is a heuristic approach
    if num_cols == 5:
        # Could be atomic style: atom-ID atom-type x y z
        return "atomic"
    elif num_cols == 6:
        # Could be charge or bond/molecular style
        # Try to determine if column 2 (index 2) looks like a charge (float) or type (int)
        try:
            val = float(first_line[2])
            # If it's a small float, likely a charge
            if abs(val) < 10 and val != int(val):
                return "charge"
            else:
                # Likely molecule ID (integer), so bond/molecular style
                return "bond"
        except ValueError:
            return "atomic"  # fallback
    elif num_cols == 7:
        # Could be full style: atom-ID molecule-ID atom-type charge x y z
        return "full"
    elif num_cols >= 8:
        # Could be dipole or sphere style
        # For now, default to dipole if we have enough columns
        return "dipole"

def get_spins(lines: list[str], atom_style: str = "atomic") -> np.ndarray | None:
    atom_lines = get_atoms(lines)
    if len(atom_lines[0].split()) < 8:
        return None
    spins_ori = []
    spins_norm = []
    for ii in atom_lines:
        iis = ii.split()
        spins_ori.append([float(jj) for jj in iis[5:8]])
        spins_norm.append([float(iis[-1])])
    return np.array(spins_ori) * np.array(spins_norm)

# Add charges if the atom style supports them
charges = get_charges(lines, atom_style=atom_style)
if charges is not None:
    system["charges"] = np.array([charges])
spins = get_spins(lines, atom_style=atom_style)
if spins is not None:
    system["spins"] = np.array([spins])

Neutral charge-style reproducer:

import tempfile
import dpdata

content = """2 atoms
2 atom types
0.0 2.0 xlo xhi
0.0 2.0 ylo yhi
0.0 2.0 zlo zhi

Atoms

1 1 0.0 0.0 0.0 0.0
2 2 0.0 1.0 1.0 1.0
"""

with tempfile.NamedTemporaryFile("w", suffix=".lmp") as f:
    f.write(content)
    f.flush()
    dpdata.System(f.name, fmt="lammps/lmp", type_map=["O", "H"])

Current behavior:

ValueError: invalid literal for int() with base 10: '0.0'
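
The error is consistent with the heuristic returning "bond": the third token ('0.0') would then be parsed as an integer atom type. One possible direction, offered only as a sketch and not as the project's fix, is to classify the column by how its token is written rather than by its numeric value. The helper name infer_six_column_style is hypothetical and does not exist in dpdata.

```python
def infer_six_column_style(atom_lines):
    """Guess 'charge' vs 'bond' for 6-column Atoms rows.

    Hypothetical helper for illustration; not part of dpdata.
    charge style: atom-ID atom-type q x y z            (token 2 is a float charge)
    bond style:   atom-ID molecule-ID atom-type x y z  (token 2 is an integer type)
    """
    for line in atom_lines:
        token = line.split()[2]
        try:
            int(token)
        except ValueError:
            # Tokens such as '0.0', '-0.5' or '1e-3' are not integer literals,
            # so this column cannot hold an atom type. float() still raises
            # ValueError if the token is not numeric at all.
            float(token)
            return "charge"
    # Every row had an integer literal in this column.
    return "bond"


neutral = ["1 1 0.0 0.0 0.0 0.0", "2 2 0.0 1.0 1.0 1.0"]
print(infer_six_column_style(neutral))
```

A charge-style file that writes its charges as bare integers (for example 0 instead of 0.0) remains ambiguous under this check, so an explicit atom_style or an Atoms # charge comment would still be needed in that case.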

Dipole-style reproducer:

content = """1 atoms
1 atom types
0.0 10.0 xlo xhi
0.0 10.0 ylo yhi
0.0 10.0 zlo zhi

Atoms # dipole

1 1 0.0 1.0 2.0 3.0 0.1 0.2 0.3
"""

Loading this with atom_style="dipole" currently produces both charges and a bogus spins array. Dipole fields should not be registered as spins.
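
A minimal sketch of one way to avoid this, assuming the fix is to consult atom_style before reading spin columns. The names get_spins_guarded and NON_SPIN_STYLES are hypothetical, the function takes already-extracted Atoms rows, and it uses plain lists instead of NumPy to stay self-contained. The column layout (direction in fields 5 to 7, magnitude in the last field) is copied from the get_spins() excerpt above.

```python
# Styles assumed, for this sketch only, to have 8+ columns that are not spin data.
NON_SPIN_STYLES = {"dipole"}


def get_spins_guarded(atom_lines, atom_style="atomic"):
    """Return per-atom spin vectors, or None when the rows carry no spin data.

    Hypothetical variant of get_spins(); not part of dpdata.
    """
    if atom_style in NON_SPIN_STYLES:
        # Dipole rows are atom-ID atom-type q x y z mux muy muz: no spin fields.
        return None
    if not atom_lines or len(atom_lines[0].split()) < 8:
        return None
    spins = []
    for line in atom_lines:
        fields = line.split()
        norm = float(fields[-1])  # spin magnitude
        # Scale the direction components by the magnitude.
        spins.append([float(v) * norm for v in fields[5:8]])
    return spins


dipole_rows = ["1 1 0.0 1.0 2.0 3.0 0.1 0.2 0.3"]
print(get_spins_guarded(dipole_rows, atom_style="dipole"))
```

An allow-list (parse spins only for styles known to carry them) would be stricter than this deny-list, at the cost of changing behavior for callers that load spin files without naming the style.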
