fix(spectrum): use 'wavelength' key in pad_spectrum sentinels#69
Merged
Conversation
`pad_spectrum` looks up sentinel values in `padding_values` while iterating over the attribute names of `Spectrum` (`flux`, `ivar`, `mask`, `wavelength`), but the dict key is `"lambda"`, so the `wavelength` lookup misses and the wavelength tensor is padded with the default `0` instead of the intended `99999`. That makes the wavelength array non-monotonic (`[real_min, …, real_max, 0, 0, …, 0]`). `LatentSpectralGrid.to_latent` feeds it to `interp1d`, whose out-of-range mask is `(xnew < x[..., 0]) | (xnew > x[..., -1])`. With `x[..., -1] == 0`, every latent grid position satisfies `xnew > 0` and the whole latent representation is replaced with `mask_value = 0.0`. Every spectrum then encodes to (essentially) the same null embedding. This shows up in downstream consumers that call the spectrum codec on inputs whose length is less than `pad_length` (so `F.pad` actually appends elements). Switching the dict key to `"wavelength"` so it matches the attribute name fixes the sentinel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
One-character key-mismatch fix in
aion/codecs/preprocessing/spectrum.py.padding_valuesdeclared"lambda"as the wavelength sentinel key, but the loop iterates over theSpectrumattribute names (flux,ivar,mask,wavelength), so the wavelength lookup misses and falls through to the default0.Impact
This makes
pad_spectrumproduce a non-monotonic wavelength array whenever it actually appends elements (i.e. when the input is shorter thanpad_length):LatentSpectralGrid.to_latentpasses this straight intointerp1d, whose out-of-range mask is:With
x[..., -1] == 0, every position of the latent grid satisfiesxnew > 0, so the entire latent representation is overwritten withmask_value. The encoder then receives an all-zeros input and every spectrum maps to (essentially) the same null embedding.The upstream test (
tests/codecs/test_spectrum_codec.py) uses the abstractSpectrumbase class — which has nopad_lengthClassVar — so thehasattr(x, "pad_length")guard in_encodeshort-circuits andpad_spectrumis never exercised in CI. The bug only surfaces in downstream consumers that passDESISpectrum/SDSSSpectruminstances of length less than 7808 / 4800 respectively. We hit it in ContrAst with DESI inputs padded to 7800 (soF.padappended 8 elements).Why this key
aion.modalities.Spectrumdeclares the field aswavelength, notlambda. Matching the attribute name is the minimal fix; the"lambda"key was dead before this change.Test plan
DESISpectrumof length < 7808 throughpad_spectrumand asserts the wavelength tail is99999, not0.SpectrumCodec.encodewith aDESISpectruminstance to confirm the latent representation is not uniformly zeroed.🤖 Generated with Claude Code