feat(imaging): 2D resonance imaging pipeline (#172)#238
Draft
feat(imaging): 2D resonance imaging pipeline (#172)#238
Conversation
* feat(imaging): implement hyperspectral data loader (Issue #175) Implemented Phase 1 of 2D resonance imaging: hyperspectral data loader for batch SAMMY fitting across detector pixels. Changes: - Created pleiades.imaging package - Implemented PixelSpectrum pydantic model with CSV export - Implemented HyperspectralData model for 3D transmission data - Implemented HyperspectralLoader using tifffile (scitiff) - Added ImagingConfig for material properties - Comprehensive unit tests (23 tests, all passing) Features: - Load multi-page TIFF with tifffile.imread() - Pixel iteration: iter_pixels(roi=None) - Individual pixel extraction: get_pixel(row, col) - Energy axis handling (provided or placeholder) - Uncertainty estimation (Poisson approximation) Testing: - Unit tests for models (validation, array checks) - Unit tests for loader (TIFF loading, pixel iteration, ROI) - Integration test with LANL-ORNL_example.tif (500×256×256) - All 23 tests passing Physics validation: - Transmission values validated (0-1 range) - Array shapes enforced via pydantic validators - Uncertainty generation for missing data Next phase: Batch fitting orchestrator (Issue #173) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * feat(imaging): add support for directory and NeXus data loading Extends HyperspectralLoader to support three data formats: 1. Single multi-page TIFF (original) 2. Multiple sorted TIFFs from directory (new) 3. NeXus/HDF5 histogram data (new) Changes: - Renamed 'filepath' attribute to 'source' for generality - Added from_directory() classmethod with sorted file loading - Added from_nexus() classmethod with HDF5 support - Implemented mode-based loading dispatch (_load_mode) - Added comprehensive tests for all three formats - Updated existing tests for renamed attribute Testing: - 30/30 tests passing - Added TestLoaderFromDirectory (4 tests) - Added TestLoaderFromNexus (3 tests) - Ruff checks and formatting passed Addresses requirement to support all common neutron imaging data formats for 2D resonance imaging workflow. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(imaging): correct ImagingConfig dictionary keys and natural abundances Fixes two P1 priority issues in ImagingConfig: Issue 1: Emit correct minimum-energy key for INP generation - Changed "min_energy" to "min_energy_eV" in get_material_properties() - InpManager expects "min_energy_eV" key (verified at inp_manager.py:225) - Previously, configured lower energy bound was silently ignored, defaulting to 0.001 eV Issue 2: Return real natural abundances instead of all ones - get_abundances() now queries IsotopeManager for actual natural abundance data - Converts from percent (database format) to fractions for JsonManager - Previously returned [1.0, ...] for multi-isotope cases, creating non-physical compositions (sum > 1) instead of natural isotope fractions Testing: - Added comprehensive test suite (7 tests) for ImagingConfig - Tests verify correct dictionary keys - Tests verify natural abundance lookup for single and multi-isotope cases - Tests verify abundances sum to ≤ 1.0 (physically meaningful) - Tests verify custom abundance handling - 37/37 imaging tests passing Both issues would have caused incorrect SAMMY fits for 2D imaging workflow. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(imaging): resolve FitResults import and add P0 critical validators * fix(imaging): resolve uncertainty floor, frame sorting, and test path issues * fix: remove CLAUDE.md from tracking and relax transmission validation CRITICAL: Remove CLAUDE.md from git tracking - CLAUDE.md is local user configuration and should not be in repo - Added to .gitignore to prevent future tracking - Local file remains for current developer use P1: Accept non-ideal normalized transmission values - Relaxed PixelSpectrum.transmission validation to [-0.05, 1.05] - Accommodates real neutron transmission data with normalization artifacts - Added tests for tolerance acceptance and rejection P2: Enforce consistent spatial shapes in Imaging2DResults - Added validation that abundance_maps, chi_squared_map, success_mask, and source_hyperspectral all have matching (height, width) dimensions - Prevents internally inconsistent result objects - Added comprehensive tests for shape validation All 47 imaging tests pass. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
…n (Issue #173) (#224) * feat(imaging): implement batch fitting orchestrator for parallel SAMMY execution Add BatchFittingOrchestrator for parallelized resonance fitting across hyperspectral imaging pixels (262,144+ pixels for 512×512 images). Features: - Parallel execution via ProcessPoolExecutor with configurable workers - Checkpoint/resume capability for long-running jobs - Comprehensive validation of physics parameters - Progress tracking and detailed logging - Robust error handling for failed pixels Implementation: - _fit_pixel_worker: Complete SAMMY workflow per pixel in subprocess (CSV export → .twenty conversion → JSON config → .inp creation → SAMMY execution → results parsing) - BatchFittingOrchestrator: Queue management, checkpointing, aggregation - CheckpointData: Serializable checkpoint state with config validation Validation: - Validates all physics parameters on checkpoint resume (isotopes, density, thickness, atomic_mass, temperature) to prevent scientifically invalid results - Validates file existence (SAMMY executable, resolution file) - Handles edge cases (empty pixel lists, missing files) Testing: - 15 unit tests passing (4 skipped due to ProcessPoolExecutor pickling) - Worker function tested with mocked SAMMY components - Comprehensive checkpoint validation tests - Edge case coverage (empty lists, file validation, config mismatches) Review: - Iteration 1: C+ grade (10 critical issues identified) - Iteration 2: A- grade (all issues resolved, production-ready) - Ruff check/format: passed Closes #173 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(imaging): add comprehensive checkpoint validation and resume safety checks Address P1 and P2 issues identified in code review: P2 (High Priority): Reject resume mode without checkpoint path - Add validation to fail fast when resume=True but checkpoint_file is None - Previously silently performed full fresh run, wasting compute time - Now raises ValueError with clear message - Test: test_fit_pixels_resume_without_checkpoint P1 (Critical): Validate all fit-driving config fields on checkpoint resume - Add validation for energy bounds (min_energy_eV, max_energy_eV) - Add validation for abundance settings (natural_abundances, custom_abundances) - Previously only validated subset of ImagingConfig fields - Resuming with different energy/abundance would mix old pixels (fit under prior settings) with new pixels (fit under new settings), yielding scientifically inconsistent batch results - Tests: - test_load_checkpoint_min_energy_mismatch - test_load_checkpoint_max_energy_mismatch - test_load_checkpoint_natural_abundances_mismatch - test_load_checkpoint_custom_abundances_mismatch Impact: Prevents silent data corruption in long-running imaging jobs Testing: 20 tests passing (24 total, 4 skipped) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(imaging): address Copilot feedback on orchestrator Fixes two inconsistencies identified in code review: 1. Test Consistency (test_orchestrator.py:183) - Mark test_fit_pixels_checkpoint_save as skipped with same reason as other ProcessPoolExecutor tests - ProcessPoolExecutor cannot pickle mocked _fit_pixel_worker function - Now 5 skipped tests instead of 4, all consistently documented 2. Exception Handling Checkpoint Tracking (orchestrator.py:314-320) - Increment iterations_since_checkpoint in exception handler - Apply checkpoint logic even for failed pixels - Ensures consistent checkpoint timing regardless of success/failure - Prevents checkpoint starvation when consecutive failures occur Impact: More consistent test suite and reliable checkpoint behavior under exception conditions. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * feat(imaging): enhance checkpoint validation for pixel fitting resume functionality * feat(imaging): enhance batch fitting orchestrator with shared input staging and duplicate coordinate checks --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
…rchestrator (#178) Add ProgressReporter (tqdm-based), GracefulShutdownHandler (two-stage SIGINT/SIGTERM), and _worker_initializer for subprocess signal isolation. Key changes: - ProgressReporter: wraps tqdm with success/failure counts in postfix, context manager support, tqdm.write() for clean log output - GracefulShutdownHandler: first signal sets flag for graceful shutdown, second signal raises SystemExit for emergency exit - _worker_initializer: sets SIG_IGN in child processes so only parent handles signals (prevents BrokenProcessPool on Ctrl+C) - Atomic checkpoint writes: write to .tmp, then Path.replace() for crash-safe saves - Shutdown check placed AFTER processing current future to avoid dropping already-completed results - Simplified config validation: single Pydantic equality check with detailed mismatch reporting replaces 9 field-by-field comparisons - Type validation on checkpoint load (isinstance check) - Partial results handling: missing pixels filled with shutdown failure placeholders Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- P1: Add tqdm as runtime dependency in pyproject.toml - P1: Drain completed futures before breaking on shutdown to avoid losing already-finished results that haven't been yielded yet - P2: Guard signal handler registration to main thread to prevent ValueError when fit_pixels() is called from non-main threads Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Running futures cannot be cancelled by Future.cancel(), so they complete during ProcessPoolExecutor.__exit__. The drain loop now waits for all non-cancelled futures (including still-running ones) instead of only checking already-done futures with timeout=0. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the pixi-version: v0.41.4 pin from unittest.yaml and package.yaml. setup-pixi defaults to latest when omitted, which prevents the lock file format from drifting between local dev and CI. Dependabot cannot update with: input parameters, so the pin was stuck at v0.41.4 for 10 months. Also regenerates pixi.lock for the tqdm dependency added to pyproject.toml. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Shutdown drain only collects already-done futures (no blocking on running ones), so graceful shutdown returns promptly - ProgressReporter docstring no longer overclaims about routing output through tqdm.write() - Removed dead code from checkpoint test (unused spy function and variables) and renamed to match what it actually verifies Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On shutdown, the loop now only cancels pending futures and breaks. Running futures complete during ProcessPoolExecutor.__exit__ (which blocks regardless). A post-executor drain pass then collects their results into the completed dict before the final checkpoint is saved. This avoids both blocking inside the signal handler and losing results from futures that were running when Ctrl+C arrived. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat(imaging): add progress tracking and graceful shutdown to batch orchestrator (#178)
…ng results (#176) Add ResultsAggregator class that builds spatially-resolved isotope abundance maps from per-pixel SAMMY fit results, and load_hdf5() classmethod on Imaging2DResults for round-trip HDF5 persistence. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dances Map abundances by isotope name rather than positional index to prevent silent mislabeling when SAMMY returns isotopes in a different order. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use assert/assert not instead of identity checks against np.True_/np.False_ - Fix metadata bytes decoding: check bytes/np.bytes_ before .item() to ensure string attrs are consistently decoded to str Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat(imaging): ResultsAggregator and HDF5 round-trip (#176)
…gic to orchestrator (#177) Add three features to BatchFittingOrchestrator for robust parallel SAMMY execution: - Backend abstraction: `backend` and `backend_kwargs` params allow switching between local and Docker SAMMY backends via lightweight `_create_sammy_runner()` factory - Per-job timeout: `timeout_per_job` param marks slow pixels as failed with best-effort future cancellation; uses sequential iteration (documented performance trade-off) - Retry logic: `max_retries` param re-submits failed pixels with exhausted-retry annotation; shutdown-aware with proper post-executor drain of all future maps Also extracts `_maybe_checkpoint()` helper to eliminate duplicated checkpoint code, adds `checkpoint_interval` validation, and improves type hints throughout. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r for retries
Address two Copilot review findings:
P1: Replace sequential future iteration with wait(FIRST_COMPLETED) loop.
Only futures that are actively running() past the deadline are marked
as timed out. Queued futures that have not started yet are NOT falsely
timed out due to queue delay. Includes a stall safety net to bail out
if workers are all occupied by timed-out zombie tasks.
P2: Each retry round now uses a fresh ProcessPoolExecutor so that
timed-out workers from the previous round (which cannot be killed)
do not occupy worker slots and starve retry submissions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add missing `import time` for `time.monotonic()` in timeout tracking - Track per-future start times via `running_start_times` dict so only actively running futures are timed out (not queued ones) - Replace executor context manager with explicit `shutdown(wait=False, cancel_futures=True)` to avoid blocking on timed-out zombie workers - Validate `exe=None` in `_create_sammy_runner` for local backend - Update `_create_sammy_runner` docstring to match actual behavior - Track actual attempt counts per pixel for accurate retry annotation - Remove unused `test_pixel` fixture and update test module docstring - Fix test mock setup for explicit executor (no context manager) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
P1: Remove Docker/Nova backend support from orchestrator - Docker backend currently incompatible with SammyFilesMultiMode (expects files.parameter_file which JSON mode doesn't provide) - Remove backend/backend_kwargs params from orchestrator and worker - Remove _create_sammy_runner factory; use LocalSammyRunner directly - Make sammy_executable required (non-Optional) - Remove 14 Docker/backend-specific tests P2: Fix timeout clock to start when futures begin running - Record running_start_times for already-running futures BEFORE the first wait() call, so their timeout clock starts immediately rather than being deferred by one timeout window Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat(imaging): backend abstraction, per-job timeout, and retry logic (#177)
Implement centralized temp file management for 262K+ SAMMY pixel jobs with workspace isolation, disk monitoring, and configurable cleanup. - TempFileManager with immediate/batch/manual cleanup policies - Disk usage enforcement via consumed-space tracking (max_disk_usage_gb) - Thread-safe active workspace tracking, path traversal protection - Integrated into BatchFittingOrchestrator (backward compatible) - 73 tests (67 unit + 6 integration), all 220 imaging tests pass Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
P1: Include attempt_id in worker workspace directory names
(pixel_{row}_{col}_a{attempt_id}) so retries don't collide with
timed-out zombie workers from previous attempts.
P2: Add _check_worker_disk_space() to enforce configured disk limits
during worker workspace creation. Workers now receive max_disk_usage_gb
and initial_free_gb from the orchestrator and check before creating
directories, preventing bypass of TempFileManager limits.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…p and cleanup fit_pixels now wraps batch execution in the TempFileManager context via ExitStack. This fixes two issues: P1: _initial_free_gb is now recorded by __enter__ before workers are submitted, so the consumed-space disk cap (max_disk_usage_gb) is actually enforced in worker subprocesses. P2: TempFileManager.__exit__ now runs after the batch completes, so non-manual cleanup policies (immediate, batch) properly remove per-pixel workspaces and the base directory instead of leaking them. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
C1: Expose public `initial_free_gb` property on TempFileManager so
orchestrator doesn't access private `_initial_free_gb`.
C2: Harden job_workspace to reject '.' and '..' job IDs (path traversal).
C3: Harden shared_workspace with the same single-component validation.
C4: Use tempfile.mkdtemp for default base_dir to avoid cross-run
collisions and stale directory reuse.
C5: Log immediate cleanup failures instead of silently ignoring via
ignore_errors=True.
C6: Import _BYTES_PER_GB from temp_manager in _check_worker_disk_space
instead of duplicating 1024**3.
C7: Create temp_base_dir before calling shutil.disk_usage in the worker
so the check doesn't fail with FileNotFoundError.
C8: Remove stale workspace directories before creating new ones to
prevent contamination from crashed runs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat(imaging): add TempFileManager for batch temp file lifecycle
…179) Add generator for extracting individual isotope abundance maps and quality metric maps from Imaging2DResults, and visualizer for matplotlib plotting of single/multi-isotope maps and quality overlays. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat(imaging): add AbundanceMapGenerator and AbundanceMapVisualizer (#179)
#180) 58 validation tests covering uniform/gradient/multi-isotope/boundary samples, failed pixel handling, HDF5 round-trip, and full pipeline integration (TIFF → loader → aggregator → generator → visualizer). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move matplotlib.use("Agg") before importing pyplot to ensure backend
is set before any GUI initialization
- Remove misleading patch context in test_pipeline_mocked_orchestrator_fit_pixels
that wrapped code never calling the patched method
- Remove unused unittest.mock.patch import
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
test(imaging): add end-to-end validation suite for 2D imaging pipeline (#180)
…ing (#181) (#232) * feat(imaging): add high-level analyze_imaging API (#181) Single-function entry point that wires together all imaging pipeline components: loader → orchestrator → aggregator → optional HDF5 save. Includes input validation, parameter forwarding, and 43 unit tests covering wiring, parameter passing, pipeline ordering, and edge cases. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs(imaging): add 2D imaging tutorial notebook Demonstrates the full pipeline using the test TIFF stack with synthetic fit results: data loading, pixel iteration, configuration, aggregation, abundance map visualization, quality overlays, HDF5 round-trip, and the high-level analyze_imaging() API reference. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs(imaging): rewrite 2D imaging notebook with real SAMMY fitting Replace synthetic-only demo with actual pipeline demonstration: - Locate SAMMY executable (PATH or known locations) - Load hyperspectral TIFF and visualize raw data - Configure ImagingConfig for Ta-181 - Run BatchFittingOrchestrator on a 4x4 ROI with real SAMMY execution - Inspect per-pixel fit results (abundances, chi-squared) - Aggregate into 2D maps with ResultsAggregator - Visualize abundance maps and quality overlays - HDF5 save/load round-trip - Demonstrate analyze_imaging() high-level API on a second ROI Requires SAMMY installation to execute. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(imaging): fix LPT parser typo and inject isotope names in worker Two bugs prevented end-to-end 2D imaging with real SAMMY fitting: 1. Typo in lpt_manager.py: `isotope_infomation` (missing 'r') caused pydantic to silently ignore the kwarg, leaving isotope_information=None on all parsed IsotopeParameters. 2. The LPT parser extracts abundances and masses but not isotope names. The aggregator needs names to build per-isotope maps. Added name injection in the orchestrator worker using ImagingConfig.isotopes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs(imaging): correct energy grid in 2D imaging notebook Reverse-engineered the energy axis for the LANL-ORNL test TIFF by matching 16 observed resonance dip positions (bin indices) to known Ta-181 ENDF resonance energies. Linear regression gives: E(eV) = 0.6613 * bin + 13.005 (R^2 = 0.99998) Equivalent to np.linspace(13.0, 343.0, 500). Validation: SAMMY chi-squared drops from 10,112,470 (wrong axis) to 25,424 (correct axis) — a 400x improvement confirming the calibration. Also corrected thickness_mm from 0.127 to 0.025 and energy range in ImagingConfig to match the calibrated grid. Cleared stale outputs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(imaging): handle single-isotope SAMMY LPT output in worker For single-isotope fits, SAMMY does not output the "Isotopic abundance and mass for each nuclide" section in the LPT file, leaving the parsed isotopes list empty. This caused the aggregator to fail with "Isotope count mismatch". The worker now builds IsotopeParameters from ImagingConfig when the LPT parser returns no isotopes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs(imaging): add energy calibration analysis to 2D imaging notebook Added reverse-engineered energy grid (13-343 eV) derived by matching resonance dip positions to known Ta-181 ENDF energies. Updated material properties (thickness 0.025 mm) for better SAMMY fit quality. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs(imaging): update notebook to two-isotope Ta-181 + U-235 config Identified U-235 as the second isotope in the LANL-ORNL test data by matching all 28 observed resonance dips against ENDF databases. Ta-181 alone explains 22 dips; the remaining 6 (at 53, 71, 73, 154, 299, 320 eV) uniquely match U-235 resonances. Natural W and U-238 leave 3-5 dips unexplained. Updated ImagingConfig to use ["Ta-181", "U-235"] with 50/50 starting abundances and all visualization cells for multi-isotope display. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(imaging): enable SAMMY abundance adjustment for isotope fitting The SAMMY JSON config was created with adjust="false" for all isotopes, meaning SAMMY treated abundances as fixed and never optimized them. Added _enable_abundance_adjustment() helper that patches the JSON config to set adjust="true" so SAMMY fits the abundance ratio at each pixel. Applied in both the shared staging path and worker fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(imaging): fix plot_multi_isotope API call in notebook Removed invalid cmap kwarg from plot_multi_isotope() calls — this parameter is only available on plot_single_isotope(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * revert(imaging): remove incorrect abundance adjust flag change The JSON config "adjust" flag controls whether SAMMY modifies ENDF resonance parameters, NOT isotopic abundances. For resonance imaging we must never adjust nuclear cross-sections from ENDF. Reverts the _enable_abundance_adjustment() helper added in 95effb2. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(sammy): add two-pass abundance fitting strategy Implements a two-pass SAMMY execution strategy for accurate isotope abundance fitting. Pass 1 (JSON mode) fits global thickness with fixed abundances; Pass 2 (traditional mode with IFLISO=1) fits per-isotope abundances using the thickness from Pass 1. Key changes: - Add _move_broadening_inp_to_par() to resolve SAMMY broadening parameter conflicts between INP and PAR files - Add _enable_abundance_fitting_in_par() to set IFLISO flags - Wire fit_abundances flag through ImagingConfig, SammyFilesMultiMode, and BatchFittingOrchestrator - Add unit tests for new helper functions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(imaging): add stride parameter and update notebook for larger regions Add spatial stride to iter_pixels() and analyze_imaging() for downsampled iteration over large images. stride=8 on a 256x256 image fits a 32x32 grid (1024 pixels) instead of all 65536. Notebook changes: - Increase n_workers to min(8, cpu_count) for better parallelism - Use full-image with stride=8 instead of small 4x4 ROI - Fix axes iteration bug (axes.flat instead of np.atleast_1d) - Add stride parameter documentation and performance tips Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Implement feature X to enhance user experience and optimize performance * fix(sammy): handle broadening EOF and restrict IFLISO to isotope lines Two fixes for the two-pass SAMMY helper functions: 1. _move_broadening_inp_to_par: capture broadening blocks that terminate at EOF without a trailing blank line. Previously the parser only recorded a section when it encountered a blank terminator, silently dropping data if the INP file ended mid-block. 2. _enable_abundance_fitting_in_par: positively identify isotope data lines by parsing atomic mass in cols 1-10 instead of relying on line length. This prevents overwriting columns 31-32 on spin-group continuation lines (after -1 markers) that happen to be >= 32 chars. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(sammy): fail on skipped pass-2 and reject fabricated abundances Two P1 review fixes: 1. local.py: When fit_abundances=True but SAMNDF.PAR/INP are missing after pass 1, mark the run as failed (success=False) instead of logging a warning and returning success from pass 1 alone. This prevents callers from silently getting fixed-abundance results under a mode documented to vary abundances. 2. orchestrator.py: Only use config-based isotope fallback for genuinely single-isotope fits (where SAMMY omits the isotope section in LPT). For multi-isotope fits, an empty parse result is now surfaced as a PixelFitResult failure instead of being papered over with config defaults, which would produce scientifically incorrect abundance maps. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tests): mock disk usage in disk space sufficiency test The test_returns_none_when_space_sufficient test used real disk usage without mocking shutil.disk_usage, causing it to fail on CI runners where consumed disk exceeds the 50 GB test limit. Now mocks disk usage consistently like all other tests in the class. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(imaging): stream pixels to orchestrator and expose resolution_file Two review fixes: P1: Stream pixel spectra instead of materializing a giant list. - analyze_imaging() now passes loader.iter_pixels() directly to fit_pixels() instead of wrapping it in list(). - fit_pixels() accepts Iterable[PixelSpectrum] and materializes internally. - iter_pixels() shares the energy array across all PixelSpectrum objects (no .copy()) since workers get their own copy via pickle. This avoids ~1 GB of duplicated energy arrays for 512x512 images. P2: Expose resolution_file on analyze_imaging(). - Add resolution_file parameter to analyze_imaging() signature. - Forward it to BatchFittingOrchestrator so instrument broadening is available through the high-level API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(imaging): use numpy views instead of copies in pixel iteration Remove .copy() from transmission and uncertainty in iter_pixels() and get_pixel(). Each PixelSpectrum now holds a lightweight numpy view (~48 bytes) into the shared hyperspectral array instead of an independent copy (~4KB per array). For a 512x512 image with 500 energy bins, this reduces the materialized pixel list from ~2 GB (262K copies x 8KB) to ~52 MB (262K view headers), eliminating OOM risk on 32 GB nodes. Views are safe here because: - The base 3D array stays in memory (referenced by HyperspectralLoader and the hyperspectral variable in analyze_imaging) - No code in the parent process mutates pixel data - Workers receive pixels via pickle, which serializes just the viewed 1D slice regardless of whether it is a view or a copy Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(imaging): stream pixels via factory pattern instead of materializing Replace list(pixels) materialization in BatchFittingOrchestrator.fit_pixels() with a callable factory pattern. The factory is invoked twice: a lightweight first pass collects only (row, col) coordinates for validation/deduplication, and a second pass streams PixelSpectrum objects directly to the executor. This avoids holding the entire hyperspectral cube in memory simultaneously. - fit_pixels() now accepts Union[Callable, Iterable] for backwards compat - _submit_pixels stores (row, col) tuples instead of PixelSpectrum objects - _collect_results/_collect_results_with_timeout use coord tuples - Retry rounds re-iterate the factory for only failed coordinates - api.py passes a lambda factory instead of a materialized list - All 44 API tests and 421 imaging tests pass Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(imaging): copy shared ENDF files locally so timed-out workers survive teardown When timeout_per_job is set, timed-out workers keep running after executor.shutdown(wait=False), but the ExitStack immediately tears down the shared workspace. Workers using symlinks back into that directory would get ENOENT. Fix: copy shared ENDF files and JSON config into each worker's local temp directory so workers are fully self-contained. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(imaging): defer default TempFileManager creation until after data loading Move TempFileManager() instantiation after HyperspectralLoader.load() so that early failures (missing TIFF, bad config, etc.) don't leave orphaned temp directories on disk. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(imaging): improve stride docstring and add explicit UTF-8 encoding - Expand stride parameter docstring to clarify that output maps retain full (height, width) shape with NaN for unevaluated pixels - Add explicit encoding="utf-8" to all open() calls in local.py helper functions to avoid platform-dependent encoding issues Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(imaging): clean up default TempFileManager on downstream failure Wrap the default TempFileManager in try/except so its base_dir is removed if the orchestrator constructor, fit_pixels, or any later step raises before the manager's __exit__ runs. User-provided managers are left untouched. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(imaging): normalize multi-isotope abundances to sum to 1.0 in ResultsAggregator * chore: update test data submodule with TIFF metadata Update submodule reference to include documented composition and energy axis for LANL-ORNL_example.tif (U-235 + Pu-241, 1-50 eV). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: pin pixi dependency versions and add nova-galaxy Pin all pixi feature dependencies to explicit version ranges instead of wildcards (*). Also adds nova-galaxy to project dependencies and pins astropy and scikit-image versions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(imaging): update notebook with correct U-235/Pu-241 metadata Update the 2D imaging notebook based on collaborator metadata: - Correct isotopes: U-235 (ORNL logo) + Pu-241 (LANL logo) - Correct energy range: 1-50 eV (was 13-343 eV) - Add configure_logger(console_level="WARNING") to suppress per-pixel noise - Set stride=2 for practical runtime (~7h for 256x256) - Clear bloated cell outputs (was 1M+ log lines) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Validate ROI coordinates in recover_image() before pixel iteration: reject negative coords, reversed ranges, and out-of-bounds values. Matches the validation in the main analyze_imaging() API. - Skip pixels with NaN or Inf in transmission/uncertainty instead of letting nnls crash with ValueError. Marks them as failed (NaN + success_mask=False) so the rest of the image still recovers. - Add 5 new tests: 3 for invalid ROI, 2 for NaN/Inf resilience. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Relax ROI validation from strict (x1 < x2) to permissive (x1 <= x2), matching the standard fitting path in _iter_hyperspectral_pixels. When bin_size > 1, analyze_imaging can remap a valid original ROI to x1 == x2 or y1 == y2 if edge pixels are cropped. The standard path treats this as an empty selection (zero pixels processed, all-NaN output). The physics recovery path now does the same instead of raising ValueError. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The function lives in pleiades.sammy.io.data_manager, not the non-existent pleiades.sammy.io.twenty_manager. Without this fix, generate_reference_spectra() raises ModuleNotFoundError at runtime. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Validate stride >= 1 in recover_image() to match analyze_imaging() - Clarify recover_pixel() docstring: coefficients are scale factors relative to reference areal density, not absolute areal densities; rename variables in docs accordingly - Wrap nnls() in try/except so solver failures mark the pixel as failed (success=False) instead of crashing the entire image recovery - Sanitize isotope_name for filesystem paths (replace /, \, space) to prevent directory traversal from user-provided config - Fix np.bool_ identity assertion in test (use `not bool(...)`) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, test naming - Enforce dictionary must be 2-D in recover_pixel(); raise ValueError if not, preventing silent broadcast of 1-D arrays into (n, n) matrices - Validate dictionary rows match transmission length - Split bare except into ValueError (expected NNLS failures) and Exception (unexpected); log shapes at debug/warning level for diagnosis - Rename test_open_beam_pixel_skipped → test_open_beam_returns_near_zero_coefficients and clarify docstring: recover_pixel converges on open-beam with near-zero coeffs; open-beam skipping happens in recover_image Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Physics recovery uses fast per-pixel NNLS (seconds, not hours) so checkpointing is not applicable. Raise ValueError when resume=True or checkpoint_file is set with physics_recovery=True, so callers don't silently get a full recomputation expecting a resumed run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat(imaging): TRINIDI-style physics recovery via NNLS (#184)
…189) Add new sections to the 2D imaging notebook demonstrating the sparsity-aware analysis pipeline: - Section 8: SparsityAssessor — assess data quality (L0-L4 severity) - Section 9: DataDegrader — degrade clean data to L2/L3 with side-by-side transmission plots - Section 10: SpatialBinner — 2x2 binning on L2 data with SNR comparison before/after - Section 11: PhysicsRecovery — TRINIDI-style NNLS on L3 data with synthetic reference spectra, abundance map visualization - Updated imports, parameter reference table (bin_size, physics_recovery), and summary Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…pendent seeds The preset degrade_to_level() is too mild for the high-SNR LANL-ORNL synthetic data — the assessor still reports L0 after L2/L3 presets. Use direct add_poisson_noise() with n_incident=3 (→ L2) and n_incident=2 (→ L3) to actually reach the advertised severity levels. Use different RNG seeds (42 and 123) for L2 and L3 so the noise realisations are truly independent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… param table - Use PhysicsRecovery(imaging_config=config, sammy_executable=...) instead of __new__() to avoid relying on internal implementation - Define STRIDE_NNLS once and share between recover_image() call and visualization to prevent desync - Add missing parameters to analyze_imaging() reference table: directory_pattern, resolution_file, checkpoint_interval Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat(notebook): sparsity-aware demo sections (#189)
In forward-model mode (DO NOT SOLVE BAYES EQUATIONS), SAMMY populates the "Zeroth-order theoretical transmission" column with the model prediction but leaves the "Final theoretical transmission" column as all zeros (since no Bayes solving occurred). _run_sammy_forward_model() was reading the "Final" column, producing all-zero reference spectra that made NNLS recovery produce garbage. Fix: read the "Zeroth-order" column (which has the actual R-matrix model prediction) with fallback to "Final" for non-forward-model use. Also update notebook Section 11 to use real SAMMY-generated reference spectra via generate_reference_spectra() instead of synthetic Gaussians. The demo now produces correct abundance maps matching the per-pixel SAMMY results in ~2s total (vs ~1h for per-pixel fitting). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ry (#185) Implements NMFRecovery class that decomposes hyperspectral transmission data via non-negative matrix factorisation in the attenuation domain A = -ln(T). Following the Fast Hyperspectral Reconstruction approach (Venkatakrishnan et al., IEEE 2025), NMF simultaneously denoises and decomposes the image into isotopic abundance maps. Key advantages over per-pixel NNLS: - No SAMMY executable required (learns basis from data) - Processes full image at once (no stride needed) - Low-rank constraint filters spectral noise automatically - Works on severely degraded data (L2-L4) where NNLS fails New files: - nmf_recovery.py: NMFRecovery class with decompose(), denoise(), and label_components() methods - test_nmf_recovery.py: 17 unit tests Updated: - __init__.py: export NMFRecovery - Notebook: add Section 12 with NMF demo on clean and degraded data, denoising comparison, method comparison table in summary Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e, use Hungarian matching - Replace non-finite attenuation values with zero before NMF init so one bad pixel doesn't collapse the entire decomposition (nanmean for init, zero-fill for updates) - Move spectral_basis into result.metadata["spectral_basis"] so label_components reads basis tied to the specific result, not mutable instance state. Make label_components a @staticmethod. - Replace greedy component-to-reference assignment with scipy.optimize.linear_sum_assignment (Hungarian algorithm) for globally optimal matching with 3+ components - Update tests and notebook cells to use new API Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… in HDF5 - Replace NaN correlation values (from zero-variance basis vectors) with -1.0 before linear_sum_assignment so label_components doesn't crash on degenerate/open-beam components - Save numpy arrays in metadata to HDF5 as datasets (alongside scalar attrs) so spectral_basis survives save/reload round-trip - Load array datasets back from metadata group on HDF5 reload Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat(imaging): NMF spectral decomposition for noise-robust recovery (#185)
Add denoise_nmf parameter to PhysicsRecovery.recover_image() and analyze_imaging() that applies NMF low-rank denoising before per-pixel NNLS recovery. This exploits spatial redundancy across all pixels to filter spectral noise while preserving resonance structure, combining NMF's noise robustness with NNLS's physical accuracy from SAMMY reference spectra. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ill timed-out slots Two bugs in _collect_results_with_timeout: 1. The stall_rounds bailout cancelled all pending futures after 2 timeout windows without progress, even when no futures had started running yet. During slow ProcessPoolExecutor startup, this killed pixels that were never attempted. Fix: only increment stall_rounds when at least one future is in running_start_times. 2. Timed-out futures were removed from pending but submit_fn() was never called to replace the freed slot. With bounded submission, each timeout permanently shrank the in-flight window, causing remaining pixels to never be submitted. Fix: call submit_fn() after each timed-out future, mirroring the existing refill logic for completed futures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lock queue When all running futures time out, their worker processes keep running (ProcessPoolExecutor cannot kill in-flight tasks). Replacement futures stay queued forever because no worker slot is available. The previous fix (only advance stall_rounds when futures are currently running) prevented this case from ever bailing out, causing fit_pixels() to hang. Fix: track whether any worker has ever been observed running. Advance the stall counter when workers_have_started is True even if no pending future is currently running — this correctly identifies zombie-occupied pools while still protecting against spurious cancellation during genuine slow pool startup (where workers_have_started remains False). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NB0 (pleiades_noise_recovery_research.ipynb): Documents progressive failure of NNLS, SG denoising, blind NMF, and Semi-NMF at low photon counts. Establishes binning sweep, SAMMY strided fitting, and representative pixel benchmarks as the testing framework. NB1 (pleiades_noise_recovery_nb1_transmission_domain.ipynb): Tests fitting in transmission domain to avoid log-clipping catastrophe, plus TV spatial regularization. Key findings: - T-domain+TV(w=0.1) achieves MAE 0.108 at n=2 (30% better than raw NNLS) - T-domain has systematic bias vs NNLS on clean data (MAE 0.036) - Iterative alternating provides no benefit (converges in 1 iteration) - At n=2, per-pixel information is fundamentally inadequate; methods that borrow across similar spatial regions are needed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…PM visualization
Add configurable vary flags (normalization, background, TZERO, thickness)
to ImagingConfig and InpDatasetMetadata, allowing users to fix nuisance
parameters for clean synthetic data while keeping them free for real data.
Defaults are True (vary all) for backwards compatibility.
Add units parameter ("fraction", "percent", "ppm") to all
AbundanceMapVisualizer plot methods, enabling instrument scientists to
view composition maps in parts-per-million as requested by VENUS team.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…onclusions Switch all composition metrics from fractional (0-1) to ppm scale. Normalize SAMMY validation tables. Add verified conclusions based on actual notebook output: T-domain raw is the clear win for visualization (19-43% at n=10-50), TV regularization is questionable due to blurring, and SAMMY is robust to noise — preprocessing hurts characterization by erasing resonance fine structure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NB2 tests Gaussian, Non-Local Means, and Wavelet BayesShrink spatial denoising on the raw transmission cube before T-domain fitting. All three methods reduce composition map MAE by 27-29% at n=2. NLM wins composition maps (59K ppm at n=10, +27%), Wavelet wins SAMMY fidelity (mean |δ|=116K ppm). T-domain reconstruction universally hurts SAMMY regardless of prior denoising — confirmed across all experiments. Adds pywavelets dependency required by skimage wavelet denoising. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 3 research notebooks (NB0-baseline, NB1-tdomain, NB2-spatial) with 2 consolidated notebooks using correct physical ground truth from sample metadata instead of NNLS-on-clean reference. NB0 (ground truth): Derives truth from calculate_number_density(), discovers 4 spectral classes, quantifies NNLS normalization artifact (~250K ppm crosstalk), characterizes clipping catastrophe, and maps the accuracy-resolution trade-off via binning sweep. NB1 (methods): Compares T-domain fitting, spatial denoising (NLM, wavelet, Gaussian), NMF neighborhood prototype, and hierarchical subdivision against physical truth. Includes honest SAMMY assessment. Key finding: blind NMF does not beat informed NNLS; spatial denoising helps ~10-15% at moderate noise; hierarchical bin=2 is optimal. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tightened all observation and conclusion cells against executed output: - NB0 §3: precise crosstalk range (63-218K ppm) - NB0 §5: per-class gate check numbers, model bias range (100-335K ppm) - NB0 §6: binning only helps at n=5-10, not n>=50 - NB1 §2: exact spatial denoising improvement percentages - NB1 §7: re-ranked methods table by n=10 avg MAE (Wavelet 1st, Hier 2nd) - NB1 recap: updated to match verified NB0 findings Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address PR #249 comment: the noise simulation already uses the physically correct count-domain approach (DataDegrader generates Poisson counts then normalizes to T), but the notebook text was misleading. Changes: - NB0 §4: explicitly document count-domain Poisson model in code comments - NB0 §4: reframe "clipping catastrophe" as "boundary values" — T=0 and T=1 are real Poisson outcomes, not simulation artifacts - NB0: fix column headers and variable names (unphysical → boundary) - NB1: update compute_poisson_sigma docstring to show equivalence: σ_T = √(counts)/I₀ = √(T_obs/I₀) - Both: add DataDegrader source reference in code comments Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
research: noise recovery notebooks with physical ground truth
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
pleiades.imagingpackage for 2D neutron resonance imagingProcessPoolExecutor, per-job timeouts, retry logic, and graceful shutdownanalyze_imaging()API for simple usageNew Modules
imaging.loaderHyperspectralLoader— loads multi-page TIFF with energy axisimaging.orchestratorBatchFittingOrchestrator— parallel SAMMY execution per pixelimaging.aggregatorResultsAggregator— builds 2D abundance/chi-squared mapsimaging.generatorAbundanceMapGenerator— synthetic test data generationimaging.visualizerAbundanceMapVisualizer— matplotlib visualizationimaging.temp_managerTempFileManager— disk-aware temp file lifecycleimaging.apianalyze_imaging()— high-level entry pointimaging.configImagingConfig— material/isotope configurationimaging.modelsSub-issues Completed
Other Changes
nova-galaxyto project dependenciesinp05_broadening→inp03_constants,inp07_density→inp03_densitypleiades_2d_imaging.ipynbTest Plan
🤖 Generated with Claude Code