Add example 06: spectral analysis with chirp signal inference #42
Open
cweniger wants to merge 35 commits into
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #42 +/- ##
=====================================
Coverage 9.57% 9.57%
=====================================
Files 32 32
Lines 3803 3803
=====================================
Hits 364 364
Misses 3439 3439
cweniger commented Feb 22, 2026
validation_samples: 256 # Size of validation split (for early stopping)
simulate_count: 128 # Number of new samples drawn per simulation step
simulate_when_full: true # Continue simulating after reaching max samples
simulate_interval: 10 # Simulate every N training epochs
I thought this interval was in seconds, not epochs. Check this in the code and correct the comment if necessary.
cweniger commented Feb 22, 2026
TRUE_CHIRP_MASS = 1.0
TRUE_HARMONIC_DECAY = 1.5

signal = emri_signal(
The entire thing should not be framed around EMRIs, but around general spectral signals. Don't mention EMRI here or anywhere else, including the name of the pull request. It is confusing for non-GW experts. Also, CHIRP_MASS should be replaced; HARMONIC_DECAY is fine.
c72362b to 4dfc737
4dfc737 to 7544d4e
Demonstrates falcon + fuge library integration for EMRI gravitational wave parameter inference using a Gaussian posterior estimator and a nested embedding pipeline (ToneTokenizer → ToneTokenEmbedding → TransformerEmbedding) configured declaratively via YAML. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without normalization, raw frequency bin indices (0-512) dominated the embedding features, preventing the transformer from learning. Now lazily calls compute_normalization() to produce zero-mean, unit-variance features before passing to the transformer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
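The scale problem described above can be sketched in a few lines of plain Python. This is only an illustration of per-feature standardization; `compute_stats` and `standardize` are hypothetical helpers, not the `compute_normalization()` method from the commit, and the toy token layout (bin index, amplitude) is an assumption.

```python
import math

def compute_stats(rows):
    """Per-feature mean and standard deviation over a list of feature rows."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 for constant features
    return means, stds

def standardize(rows, means, stds):
    """Shift and scale every feature column to zero mean and unit variance."""
    return [[(r[j] - means[j]) / stds[j] for j in range(len(r))] for r in rows]

# Toy tokens (assumed layout): [frequency bin index, amplitude].
# The bin index spans 0-512 and dwarfs the amplitude column when left raw.
tokens = [[0.0, 0.10], [128.0, 0.30], [256.0, 0.20], [512.0, 0.40]]
means, stds = compute_stats(tokens)
normalized = standardize(tokens, means, stds)
```

After standardization both columns have comparable magnitude, so neither dominates the input to a downstream network.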
Ray sets CUDA_VISIBLE_DEVICES="" for actors without GPU allocation, causing JAX to crash. Signal generation runs fine on CPU. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use GPU when available (e.g. when Ray allocates one), fall back to CPU only when CUDA_VISIBLE_DEVICES is empty (Ray no-GPU workers). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
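A minimal sketch of the fallback described in the two commits above, assuming the decision is made from the environment before JAX is imported. `pick_jax_platform` is a hypothetical helper; `JAX_PLATFORMS` is the environment variable JAX reads to select its backend.

```python
import os

def pick_jax_platform(environ):
    """Return "cpu" when CUDA devices are explicitly hidden, else None.

    An empty CUDA_VISIBLE_DEVICES is what Ray sets for actors without a
    GPU allocation; an unset variable or a device list leaves JAX's
    default backend selection untouched.
    """
    if environ.get("CUDA_VISIBLE_DEVICES") == "":
        return "cpu"
    return None

platform = pick_jax_platform(os.environ)
if platform is not None:
    # Has to be set before the first `import jax` to take effect.
    os.environ["JAX_PLATFORMS"] = platform
```

Keeping the decision in a pure function of the environment makes it easy to check both the Ray no-GPU case and the normal GPU case without needing a GPU.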
Computes the full 3-parameter Fisher information matrix using JAX autodiff on the EMRI signal model, then overlays the Cramér-Rao Gaussian on a corner plot alongside falcon posterior samples. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
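For readers unfamiliar with the Fisher/Cramér-Rao construction, here is a dependency-free sketch. It swaps the JAX autodiff of the commit for hand-written derivatives and uses a two-parameter single-tone model s(t) = A sin(2πft) in white Gaussian noise, which is an assumption for illustration and not the 3-parameter model used in the example.

```python
import math

def fisher_matrix(amplitude, freq, times, noise_sigma):
    """2x2 Fisher matrix F_ij = sum_t (ds/dtheta_i)(ds/dtheta_j) / sigma^2."""
    f_aa = f_af = f_ff = 0.0
    for t in times:
        phase = 2.0 * math.pi * freq * t
        d_amp = math.sin(phase)                                   # ds/dA
        d_freq = amplitude * 2.0 * math.pi * t * math.cos(phase)  # ds/df
        f_aa += d_amp * d_amp
        f_af += d_amp * d_freq
        f_ff += d_freq * d_freq
    scale = 1.0 / noise_sigma ** 2
    return [[f_aa * scale, f_af * scale], [f_af * scale, f_ff * scale]]

def invert_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

times = [float(i) for i in range(256)]
fisher = fisher_matrix(amplitude=1.0, freq=0.05, times=times, noise_sigma=0.5)
crb_cov = invert_2x2(fisher)  # Cramér-Rao lower bound on the covariance
crb_sigma = [math.sqrt(crb_cov[0][0]), math.sqrt(crb_cov[1][1])]
```

The inverse Fisher matrix `crb_cov` is the covariance of the Gaussian that would be overlaid on a corner plot; with autodiff the two derivative lines are replaced by a Jacobian of the signal model.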
The ToneTokenizer STFT requires ~16GB per 1000 signals. Without chunking, all posterior samples go through the embedding at once. chunk_size: 64 processes them in manageable batches. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
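The chunking idea can be sketched as follows. `iter_chunks`, `embed_in_chunks`, and `toy_embed` are hypothetical names for illustration; only the `chunk_size: 64` value comes from the commit.

```python
def iter_chunks(items, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]

def embed_in_chunks(signals, embed_fn, chunk_size=64):
    """Apply embed_fn batch by batch so peak memory scales with chunk_size."""
    outputs = []
    for chunk in iter_chunks(signals, chunk_size):
        outputs.extend(embed_fn(chunk))
    return outputs

seen_batch_sizes = []

def toy_embed(batch):
    """Stand-in for an expensive embedding; records the batch size it sees."""
    seen_batch_sizes.append(len(batch))
    return [2 * s for s in batch]

signals = list(range(1000))
embedded = embed_in_chunks(signals, toy_embed, chunk_size=64)
```

The result is identical to a single full-batch call, but the embedding never sees more than 64 signals at once.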
…imation script Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…okens caching config
Expand the inference graph to include deterministic ancestor nodes reachable via evidence references (BFS expansion). This enables graph structures like theta -> x -> tokens where tokens is a cacheable intermediate without its own estimator. Add fallback to _simulate for nodes without estimators during posterior/proposal sampling.
Rename internal graph attributes for symmetry: parents_dict -> forward_deps, sorted_node_names -> forward_order, sorted_inference_node_names -> backward_order. Add backward_deps with merged dependency dict for the inference direction.
Add config2.yml with separate tokens node for STFT caching, Tokenizer class in model.py for float64 signal processing, and force JAX to CPU mode unconditionally since EMRI signal generation only needs CPU.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
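The BFS expansion mentioned in this commit could look roughly like the sketch below. The function name and the shape of the `evidence_refs` dictionary are assumptions for illustration, not falcon's internal data structures.

```python
from collections import deque

def expand_with_ancestors(start_nodes, evidence_refs):
    """Breadth-first walk over evidence references.

    Returns the start nodes followed by every node reachable from them,
    in the order they are first discovered.
    """
    visited = list(start_nodes)
    seen = set(start_nodes)
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        for ref in evidence_refs.get(node, []):
            if ref not in seen:
                seen.add(ref)
                visited.append(ref)
                queue.append(ref)
    return visited

# Assumed toy graph: theta is inferred from tokens, tokens is derived from x.
evidence_refs = {"theta": ["tokens"], "tokens": ["x"], "x": []}
inference_nodes = expand_with_ancestors(["theta"], evidence_refs)
```

Here `tokens` and `x` enter the inference graph even though neither has an estimator of its own, which is the situation the `_simulate` fallback covers.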
Embedding pipeline and posterior no longer inherit float64 from numpy theta. Neural network layers (TransformerEmbedding, MLP) run in float32 for tensor-core speed; parameter-space operations (covariance, sampling) stay float64 for precision. TokenEmbed normalization replaced with explicit RunningNorm pipeline layer that supports EMA across SBI rounds.
- Rename LazyOnlineNorm → RunningNorm with 3D reduce_dims and output_dtype
- GaussianPosterior: override to() to protect float64 buffers, cast MLP output to float64 before de-whitening
- base.py: remove dtype forcing, cast conditions to float32 for embeddings
- TokenEmbed: remove one-shot z-score, use _embed() directly
- Configs: insert RunningNorm(output_dtype=float32) between TokenEmbed and TransformerEmbedding
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
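To illustrate what "EMA across SBI rounds" means for a normalization layer, here is a scalar sketch in plain Python. `RunningStats` and its `momentum` parameter are hypothetical; the real RunningNorm operates on tensors and its exact update rule is not shown in this thread.

```python
import math

class RunningStats:
    """Exponential moving average of batch mean and variance."""

    def __init__(self, momentum=0.1):
        self.momentum = momentum
        self.mean = None
        self.var = None

    def update(self, batch):
        batch_mean = sum(batch) / len(batch)
        batch_var = sum((v - batch_mean) ** 2 for v in batch) / len(batch)
        if self.mean is None:
            # First batch initializes the statistics directly.
            self.mean, self.var = batch_mean, batch_var
        else:
            m = self.momentum
            self.mean = (1 - m) * self.mean + m * batch_mean
            self.var = (1 - m) * self.var + m * batch_var

    def normalize(self, batch, eps=1e-12):
        return [(v - self.mean) / math.sqrt(self.var + eps) for v in batch]

stats = RunningStats(momentum=0.5)
stats.update([0.0, 2.0])  # batch mean 1, batch variance 1
stats.update([4.0, 6.0])  # batch mean 5, batch variance 1
normalized = stats.normalize([3.0, 4.0])
```

Unlike a one-shot z-score, the statistics keep tracking the data as the training distribution shifts from round to round.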
7544d4e to 12fda24
During proposal re-simulation, observation-based values for deterministic intermediate nodes (e.g. tokens) were overwriting freshly re-simulated values, producing inconsistent training triples. Filter both condition_refs and the final merge step to only propagate latent node (estimator) values from the proposal, ensuring intermediates are correctly re-derived from the forward simulation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
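The fix amounts to a filtered merge. A minimal sketch, assuming samples are plain dictionaries keyed by node name; `merge_proposal` and the node names are hypothetical.

```python
def merge_proposal(resimulated, proposal, latent_nodes):
    """Overlay only latent-node values from the proposal.

    Deterministic intermediates (for example tokens) keep the value that
    was freshly derived by the forward simulation.
    """
    merged = dict(resimulated)
    merged.update({k: v for k, v in proposal.items() if k in latent_nodes})
    return merged

resimulated = {"theta": 0.9, "x": "x(theta=1.1)", "tokens": "tokens(x(theta=1.1))"}
proposal = {"theta": 1.1, "tokens": "tokens(observation)"}
sample = merge_proposal(resimulated, proposal, latent_nodes={"theta"})
```

Without the `latent_nodes` filter, the observation-based `tokens` entry would overwrite the re-simulated one and the training triple would no longer be self-consistent.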
Shows noise-free power spectra evolving as training progresses, with true signal as reference. Bottom panels show 2D parameter scatter accumulating over time with CRB error ellipses. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of hardcoding N, t_c, A0, n_harmonics, noise_sigma, read them from the saved config.yml in the run directory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace CPU Python loop with vectorized JAX vmap + JIT on GPU (9x faster)
- Move noise generation to JAX (avoids slow np.random.randn at scale)
- Remove hardcoded seq_len from config (now lazy in TransformerEmbedding)
- Scale to N=1M bins with k=20240 STFT windows
- Widen priors and tune training hyperparameters
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…N=100k
Replace fuge.emri dependency with local chirp.py signal generator. Restructure graph to Signal+Noise+Data decomposition for both configs. Set uniform priors to ±10σ CRB at N=100k for all parameters. Fix GPU allocation (0.2 per node) to avoid deadlock on single-GPU machines. Add Fisher/CRB estimation script and prior sample visualization.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
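The Signal+Noise+Data decomposition can be pictured with a small pure-Python sketch. The waveform below (a linear chirp with power-law harmonic amplitudes) and the parameter `fdot` are assumptions for illustration only; the actual model lives in chirp.py and is not reproduced in this thread.

```python
import math
import random

def chirp_signal(f0, fdot, harmonic_decay, n_samples, a0=1.0, n_harmonics=3):
    """Assumed toy model: sum of harmonics k with amplitude a0 * k**(-decay)."""
    signal = []
    for i in range(n_samples):
        t = float(i)
        phase = 2.0 * math.pi * (f0 * t + 0.5 * fdot * t * t)
        value = sum(a0 * k ** (-harmonic_decay) * math.sin(k * phase)
                    for k in range(1, n_harmonics + 1))
        signal.append(value)
    return signal

def simulate(theta, n_samples, noise_sigma, rng):
    """Return the three graph nodes: clean signal y, noise n, data x = y + n."""
    y = chirp_signal(*theta, n_samples=n_samples)
    n = [rng.gauss(0.0, noise_sigma) for _ in range(n_samples)]
    x = [yi + ni for yi, ni in zip(y, n)]
    return y, n, x

rng = random.Random(0)
y, n, x = simulate(theta=(2.75e-3, 1e-6, 1.5), n_samples=256,
                   noise_sigma=0.1, rng=rng)
```

Keeping `y` and `n` as separate nodes means the noise-free signal stays available for diagnostics such as Fisher estimates, while only `x` is treated as observed.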
Output buffers (_output_mean, _output_std, _residual_cov, _residual_eigvals, _residual_eigvecs) were initialized as float32 by default. For parameters like f0 ~ 2.75e-3, the float32 ULP (3.28e-10) is comparable to the CRB (4.89e-10), leaving only ~30 distinct representable values in the prior range — making learning impossible. Fix: always initialize output buffers as float64. Cast sample() output to conditions' dtype and log_prob() output to theta's dtype. Override to() to preserve float64 buffers when moving to device. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
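The precision argument can be checked with the standard library alone. This sketch computes the float32 spacing near f0 and counts how many float32 values fit in a ±10σ prior window; the window width is an assumption taken from the prior setting mentioned in an earlier commit. For values between 2^-9 and 2^-8 the float32 spacing is 2^-32 ≈ 2.3e-10, the same order of magnitude as the figures quoted above.

```python
import struct

def float32_spacing(value):
    """Gap between value rounded to float32 and the next larger float32."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    as_f32 = struct.unpack("<f", struct.pack("<I", bits))[0]
    next_f32 = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_f32 - as_f32

f0 = 2.75e-3                   # parameter scale quoted in the commit
crb_sigma = 4.89e-10           # CRB quoted in the commit
prior_width = 20 * crb_sigma   # assumed prior window of +/- 10 sigma
spacing = float32_spacing(f0)
representable = int(prior_width / spacing)
```

A few dozen grid points across the whole prior is far too coarse to resolve a posterior whose width is a single CRB, which is why the output buffers are kept in float64.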
When training with zooming (simulate_when_full), the Gaussian estimator trains on proposal data, biasing the residual covariance toward the proposal distribution rather than the true posterior. Apply analytical gamma correction: gamma_correct = (1+gamma)/gamma, which inverts the tempering used for proposal generation.
Also:
- Fix completion message to show best val loss instead of last
- Add N=256 configs (config3: no embedding, config4: SVD embedding) with normal priors (5σ CRB) for A/B testing
- Add obs_256.npz and obs_1M.npz observation files
- Update SVDEmbedding for StreamingPCA register_buffer fix (numel>0)
- Add make_plots2.py (Fisher comparison) and summarize_posterior.py
- Update make_spectrum_animation.py to use local chirp module
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r design doc
- SpectralTokenEmbed: multi-scale sin/cos encoding for frequencies, log-scaled amplitude, sin/cos phase boundary encoding, explicit time coordinate. Replaces TokenEmbed for richer frequency discrimination.
- config.yml: updated to use SpectralTokenEmbed with RunningNorm, normal priors, larger batch/network, proposal-based training (k=10000)
- PhaseFormer.md: design document for dual-stream hierarchical transformer with structural coherent phase accumulation via complex multiplication gated by frequency-matched attention. Includes Debye-Waller damping concept for uncertainty-aware spectral modes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
StreamingPCA.forward() now returns zeros before initialization, removing the need for manual component checks in the caller. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Matches the double precision used by SVDEmbedding downstream. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The old flags still work but emit a FutureWarning deprecation message. Updated all docs, examples, and tests to use the new flags. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
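The deprecation pattern described here, where an old flag keeps working but emits a FutureWarning, can be sketched with argparse and the warnings module. The flag names `--old-name` and `--new-name` are placeholders; the thread does not say which flags were renamed.

```python
import argparse
import warnings

def parse_args(argv):
    """Accept both spellings; warn when the deprecated one is used."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--new-name", dest="value", default=None)
    parser.add_argument("--old-name", dest="deprecated_value", default=None)
    args = parser.parse_args(argv)
    if args.deprecated_value is not None:
        warnings.warn("--old-name is deprecated; use --new-name instead",
                      FutureWarning, stacklevel=2)
        if args.value is None:
            # The new flag wins when both are given.
            args.value = args.deprecated_value
    return args.value

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = parse_args(["--old-name", "outputs/run"])
```

FutureWarning is shown to end users by default, unlike DeprecationWarning, which makes it the usual choice for command-line flags.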
Adds config_combined.yml for dual-stream embedding (SVD on raw signal + spectral tokens through transformer). Adds Concat helper module and detaches batch_mean in RunningNorm to prevent gradient flow through normalization statistics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- examples/06_spectral_analysis demonstrating falcon + fuge integration for multi-harmonic chirp signal parameter inference
- theta → y (clean signal) + n (noise) → x (observed = y + n) → tokens (STFT)
- config.yml: tokenized pipeline with TransformerEmbedding (via fuge)
- config2.yml: scaffolded SVD embedding on raw signals
- falcon.estimators.Gaussian posterior with uniform priors set to ±10σ CRB at N=100k
- chirp.py signal generator (JAX autodiff-compatible) replaces external dependency for waveform generation

Test plan
- cd examples/06_spectral_analysis && python data/generate_obs.py
- falcon launch --run-dir outputs/run to verify training converges
- falcon sample prior --run-dir outputs/run and python make_prior_samples_plot.py
- python make_fisher_estimate.py --obs data/obs.npz for both configs
- python make_plots.py outputs/run to compare posterior with Fisher/CRB bounds

🤖 Generated with Claude Code