Skip to content

fix(spectrum): use 'wavelength' key in pad_spectrum sentinels#69

Merged
tom-hehir merged 1 commit into
mainfrom
fix/pad-spectrum-wavelength-sentinel
May 27, 2026
Merged

fix(spectrum): use 'wavelength' key in pad_spectrum sentinels#69
tom-hehir merged 1 commit into
mainfrom
fix/pad-spectrum-wavelength-sentinel

Conversation

@tom-hehir
Copy link
Copy Markdown
Contributor

Summary

One-character key-mismatch fix in aion/codecs/preprocessing/spectrum.py. padding_values declared "lambda" as the wavelength sentinel key, but the loop iterates over the Spectrum attribute names (flux, ivar, mask, wavelength), so the wavelength lookup misses and falls through to the default 0.

-    padding_values = {"lambda": 99999, "mask": True, "ivar": 0}
+    padding_values = {"wavelength": 99999, "mask": True, "ivar": 0}

Impact

This makes pad_spectrum produce a non-monotonic wavelength array whenever it actually appends elements (i.e. when the input is shorter than pad_length):

[real_min, …, real_max, 0, 0, …, 0]

LatentSpectralGrid.to_latent passes this straight into interp1d, whose out-of-range mask is:

mask = (xnew < x[..., 0]) | (xnew > x[..., -1])
ynew[mask] = mask_value  # default 0.0

With x[..., -1] == 0, every position of the latent grid satisfies xnew > 0, so the entire latent representation is overwritten with mask_value. The encoder then receives an all-zeros input and every spectrum maps to (essentially) the same null embedding.

The upstream test (tests/codecs/test_spectrum_codec.py) uses the abstract Spectrum base class — which has no pad_length ClassVar — so the hasattr(x, "pad_length") guard in _encode short-circuits and pad_spectrum is never exercised in CI. The bug only surfaces in downstream consumers that pass DESISpectrum / SDSSSpectrum instances of length less than 7808 / 4800 respectively. We hit it in ContrAst with DESI inputs padded to 7800 (so F.pad appended 8 elements).

Why this key

aion.modalities.Spectrum declares the field as wavelength, not lambda. Matching the attribute name is the minimal fix; the "lambda" key was dead before this change.

Test plan

  • Add a regression test that round-trips a DESISpectrum of length < 7808 through pad_spectrum and asserts the wavelength tail is 99999, not 0.
  • Optionally exercise SpectrumCodec.encode with a DESISpectrum instance to confirm the latent representation is not uniformly zeroed.

🤖 Generated with Claude Code

`pad_spectrum` looks up sentinel values in `padding_values` while
iterating over the attribute names of `Spectrum` (`flux`, `ivar`,
`mask`, `wavelength`), but the dict key is `"lambda"`, so the
`wavelength` lookup misses and the wavelength tensor is padded with
the default `0` instead of the intended `99999`.

That makes the wavelength array non-monotonic
(`[real_min, …, real_max, 0, 0, …, 0]`). `LatentSpectralGrid.to_latent`
feeds it to `interp1d`, whose out-of-range mask is
`(xnew < x[..., 0]) | (xnew > x[..., -1])`. With `x[..., -1] == 0`,
every latent grid position satisfies `xnew > 0` and the whole latent
representation is replaced with `mask_value = 0.0`. Every spectrum then
encodes to (essentially) the same null embedding.

This shows up in downstream consumers that call the spectrum codec on
inputs whose length is less than `pad_length` (so `F.pad` actually
appends elements). Switching the dict key to `"wavelength"` so it
matches the attribute name fixes the sentinel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tom-hehir tom-hehir merged commit 3434e31 into main May 27, 2026
1 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant