
Multipoles analysis #157

Open

p-slash wants to merge 46 commits into master from r-mu-binning-merge-master
Conversation


p-slash (Collaborator) commented Mar 24, 2026

This PR adds a version of the multipoles analysis code. I resolved some heavy merge conflicts, and I can reproduce my analysis with this code; however, the baseline analysis should still be checked. One notable shortcoming: small scales cannot be marginalized while also applying multipole compression.

p-slash added 30 commits June 23, 2025 14:04

codecov bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 28.34138% with 445 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.59%. Comparing base (88b62bd) to head (c86d244).

Files with missing lines Patch % Lines
vega/metals.py 30.37% 187 Missing and 17 partials ⚠️
vega/data.py 19.31% 65 Missing and 6 partials ⚠️
vega/coordinates.py 31.64% 53 Missing and 1 partial ⚠️
vega/output.py 9.37% 29 Missing ⚠️
vega/vega_interface.py 30.00% 20 Missing and 1 partial ⚠️
vega/postprocess/fit_results.py 13.63% 19 Missing ⚠️
vega/utils.py 17.64% 14 Missing ⚠️
vega/power_spectrum.py 50.00% 4 Missing and 4 partials ⚠️
vega/scale_parameters.py 27.27% 7 Missing and 1 partial ⚠️
vega/analysis.py 0.00% 7 Missing ⚠️
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #157      +/-   ##
==========================================
- Coverage   36.61%   35.59%   -1.03%     
==========================================
  Files          30       30              
  Lines        4296     4557     +261     
  Branches      815      854      +39     
==========================================
+ Hits         1573     1622      +49     
- Misses       2555     2755     +200     
- Partials      168      180      +12     


Copilot AI (Contributor) left a comment

Pull request overview

This PR integrates a “multipoles analysis” workflow into Vega, adding support for r–μ binning, multipole compression, and covariance corrections (Hartlap/Percival), plus updating outputs/readers to carry the extra metadata needed to reproduce the analysis.

Changes:

  • Add multipole (ℓ) support end-to-end: r–μ coordinates, Legendre bin averaging, P(k)→ξ transforms, and model/data/covariance conversions.
  • Add global-covariance Hartlap scaling and a Percival correction factor (propagated into FITS bestfit covariance/errors).
  • Update FITS output and postprocessing readers to include correlation sizes and multipole metadata.
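As a rough illustration of the Legendre bin-averaging / compression step described above (a minimal sketch with hypothetical names, not Vega's actual API), bin-averaged ξ(r, μ) on a μ ∈ [0, 1] grid can be projected onto even multipoles like this:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def compress_to_multipoles(xi_rmu, mu_edges, ells=(0, 2, 4)):
    """Project xi(r, mu) onto Legendre multipoles via bin-averaged quadrature.

    xi_rmu has shape (n_r, n_mu); mu_edges has length n_mu + 1.
    Assumes mu in [0, 1] and symmetry in mu (even ells only), so the
    normalisation is (2*ell + 1) * integral over [0, 1].
    Returns an array of shape (len(ells), n_r).
    """
    mu = 0.5 * (mu_edges[:-1] + mu_edges[1:])  # bin centres
    dmu = np.diff(mu_edges)                    # bin widths
    out = np.empty((len(ells), xi_rmu.shape[0]))
    for i, ell in enumerate(ells):
        leg = Legendre.basis(ell)(mu)          # P_ell at bin centres
        out[i] = (2 * ell + 1) * np.sum(xi_rmu * leg * dmu, axis=1)
    return out
```

The compression is linear in ξ, which is why the same operation can be written as a matrix acting on the flattened r–μ vector (and hence applied to covariances and distortion matrices, as the changes below do).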

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.

File Description
vega/vega_interface.py Adds global Hartlap/Percival handling and global-cov multipole conversion.
vega/utils.py Adds Legendre bin helpers and Percival correction function.
vega/templates/parameters.ini Updates default BAO/AP parameter defaults (epsilon, aap).
vega/scale_parameters.py Adds aiso_aap parametrisation and centralises allowed parametrisations.
vega/power_spectrum.py Adds optional r–μ binning mode for binning kernel G(k).
vega/postprocess/fit_results.py Extends correlation output metadata and adjusts legacy FITS reading.
vega/pktoxi.py Adds optional μ-smoothing for ξℓ Legendre evaluation.
vega/parameters/latex_names.txt Adds latex name for aap and adjusts aiso latex.
vega/parameters/default_values.txt Adjusts epsilon range and adds aap defaults.
vega/output.py Writes multipole-aware model grids, stores extra header metadata, applies Percival to bestfit cov/errors.
vega/model.py Threads r–μ binning/multipole logic into model evaluation and post-distortion transforms.
vega/minimizer.py Stores a p-value on the minimizer for output.
vega/metals.py Refactors metals computation and adds interpolation support for r–μ binning.
vega/data.py Adds r–μ coordinate mode, multipole compression, and (optional) Hartlap scaling on covariance.
vega/correlation_item.py Plumbs multipole configuration into correlation items.
vega/coordinates.py Introduces RMuCoordinates and consolidates common masking logic in base Coordinates.
vega/analysis.py Updates global-mock generation to use model/data masks in the global-cov pathway.


else:
    G[i, i] = eye_array(self.data[name].full_data_size)
G = block_array(G, format='csr')
self.global_cov = G.dot(G.dot(self.global_cov).T).T
Copilot AI commented Mar 24, 2026

Global covariance transformation for multipoles is currently G.dot(G.dot(C).T).T, which is a non-obvious way to compute G @ C @ G.T and can be slower / easier to get wrong. Prefer the explicit and standard form self.global_cov = G @ self.global_cov @ G.T (or G.dot(self.global_cov).dot(G.T)) for clarity and to avoid unnecessary transposes.

Suggested change
self.global_cov = G.dot(G.dot(self.global_cov).T).T
# Apply standard covariance transformation: C -> G C G^T
self.global_cov = G.dot(self.global_cov).dot(G.T)

p-slash (Collaborator, Author) replied:

Not sure if scipy sparse arrays support this. Needs testing
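For what it's worth, a quick standalone check (a sketch; assumes scipy ≥ 1.8 for `csr_array`) suggests the existing transpose form is algebraically identical to G C Gᵀ for any C, while keeping the sparse operand on the left of every product, which matters because `ndarray.dot(sparse)` can misbehave:

```python
import numpy as np
from scipy.sparse import csr_array

rng = np.random.default_rng(0)
# Random sparse compression matrix and a symmetric "covariance"
G = csr_array(rng.normal(size=(6, 6)) * (rng.random((6, 6)) < 0.4))
C = rng.normal(size=(6, 6))
C = C + C.T

# Form currently in the PR: the sparse G always appears on the left
old = G.dot(G.dot(C).T).T

# Dense reference: explicit G C G^T
Gd = G.toarray()
ref = Gd @ C @ Gd.T

# (G (G C)^T)^T = (G C^T G^T)^T = G C G^T for any C, so these agree
assert np.allclose(old, ref)
```

So the two forms should give the same result; whether `dense @ sparse` in the suggested `G @ C @ G.T` behaves correctly depends on the scipy/numpy versions in use, as the reply notes.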

Comment on lines +66 to +73
def percival_correction(nsamples, nbins, nparams):
    """Percival 2014 correction of the estimated parameter covariance.

    MNRAS, Volume 439, Issue 3, p.2531-2541
    """
    a = nsamples - nbins
    denom = (a - 1) * (a - 4)
    A, B = 2.0 / denom, (a - 2.0) / denom
    return (1.0 + B * (nbins - nparams)) / (1.0 + A + B * (nparams - 1))
Copilot AI commented Mar 24, 2026

percival_correction() can divide by zero or return nonsensical values when nsamples - nbins is ≤ 4 because denom = (a-1)*(a-4). Since this is used alongside Hartlap (which only enforces nsamples > nbins + 2), consider validating nsamples > nbins + 4 here and raising a clear ValueError when the correction is undefined.
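A guarded version along these lines might look like the following sketch (hypothetical names `hartlap_factor` / `percival_correction_checked`; the Hartlap factor is the standard (Nₛ − N_b − 2)/(Nₛ − 1)):

```python
import numpy as np

def hartlap_factor(nsamples, nbins):
    """Hartlap et al. 2007 debiasing factor for an inverse sample covariance."""
    if nsamples <= nbins + 2:
        raise ValueError(
            f"Hartlap factor undefined for nsamples={nsamples}, nbins={nbins}")
    return (nsamples - nbins - 2) / (nsamples - 1)

def percival_correction_checked(nsamples, nbins, nparams):
    """Percival et al. 2014 parameter-covariance correction, with the
    validity check suggested above (requires nsamples - nbins > 4)."""
    a = nsamples - nbins
    if a <= 4:
        raise ValueError(
            f"Percival correction undefined: nsamples - nbins = {a} must exceed 4")
    denom = (a - 1) * (a - 4)
    A, B = 2.0 / denom, (a - 2.0) / denom
    return (1.0 + B * (nbins - nparams)) / (1.0 + A + B * (nparams - 1))
```

With many more mocks than bins both factors approach 1, so the checks only bite in the regime where the corrections are genuinely ill-defined.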

Comment on lines +801 to +803
mult_matrix = np.zeros((n_out, nr * nmu))
for i, x in enumerate(np.nonzero(data_mask_ell)[0]):
    mult_matrix[mell[i], self.model_mask] = M[x, self.data_mask]
Copilot AI commented Mar 24, 2026

In _convert_to_multipoles(), the distortion-matrix remapping assumes self.data_mask and self.model_mask select the same number of elements (it assigns M[x, self.data_mask] into mult_matrix[..., self.model_mask]). If these masks differ, this will raise a shape error or silently mis-map bins. Add an explicit check that the masked sizes match (and/or build an index mapping between model/data coordinates) before doing this assignment.

Suggested change
mult_matrix = np.zeros((n_out, nr * nmu))
for i, x in enumerate(np.nonzero(data_mask_ell)[0]):
    mult_matrix[mell[i], self.model_mask] = M[x, self.data_mask]
# Build explicit index mappings between data/model coordinates and validate sizes
data_mask_indices = np.flatnonzero(self.data_mask)
model_mask_indices = np.flatnonzero(self.model_mask)
if data_mask_indices.size != model_mask_indices.size:
    raise ValueError(
        "In _convert_to_multipoles, data_mask and model_mask select different "
        f"numbers of elements (data: {data_mask_indices.size}, "
        f"model: {model_mask_indices.size}). "
        "Cannot safely remap distortion matrix."
    )
data_ell_indices = np.flatnonzero(data_mask_ell)
if data_ell_indices.size != mell.size:
    raise ValueError(
        "In _convert_to_multipoles, data and model ell masks select different "
        f"numbers of elements (data_ell: {data_ell_indices.size}, "
        f"model_ell: {mell.size}). "
        "Cannot safely remap distortion matrix."
    )
mult_matrix = np.zeros((n_out, nr * nmu))
for row_idx, x in zip(mell, data_ell_indices):
    mult_matrix[row_idx, model_mask_indices] = M[x, data_mask_indices]

Comment on lines +23 to +25
ndata: int
is_multipoles: bool
nell: int
Copilot AI commented Mar 24, 2026

CorrelationOutput now has new required fields (ndata, is_multipoles, nell) but not all constructor call sites were updated. In this file, read_correlations() still instantiates CorrelationOutput(...) without these args, which will raise TypeError at runtime. Either give these new fields defaults in the dataclass, or pass them everywhere CorrelationOutput is constructed (including the non-legacy reader).

Suggested change
ndata: int
is_multipoles: bool
nell: int
ndata: int = 0
is_multipoles: bool = False
nell: int = 0

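For illustration, a minimal stand-in dataclass (hypothetical; not the real `CorrelationOutput`, whose other fields are elided here) shows why defaults keep older call sites working:

```python
from dataclasses import dataclass

@dataclass
class CorrelationOutputSketch:
    # Hypothetical minimal stand-in for CorrelationOutput.
    # Note: in a dataclass, fields with defaults must come after
    # fields without defaults.
    name: str
    ndata: int = 0              # defaults keep older call sites working
    is_multipoles: bool = False
    nell: int = 0

# An older call site that only passes the original fields still constructs:
legacy = CorrelationOutputSketch("lyalya_lyalya")

# A new call site can supply the multipole metadata explicitly:
full = CorrelationOutputSketch("lyalya_lyalya", ndata=120,
                               is_multipoles=True, nell=3)
```

Without the defaults, any constructor call that omits the new fields raises `TypeError` at runtime, which is exactly the failure mode flagged for `read_correlations()`.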
Comment on lines 201 to 224
    # TODO The corr items need to have an imposed order
-   masked_fiducial = []
-   for name, data in self._data.items():
-       mask = data.dist_model_coordinates.get_mask_to_other(data.data_coordinates)
-       if data.data_mask.size == fiducial_model[name].size:
-           masked_fiducial.append(fiducial_model[name])
-       elif mask.size == fiducial_model[name].size:
-           masked_fiducial.append(fiducial_model[name][mask])
-       else:
-           raise ValueError('Input fiducial has unknown size. '
-                            'It must match the data or the model.')
-   masked_fiducial = np.concatenate(masked_fiducial)
+   full_fiducial_model = np.concatenate([fiducial_model[name] for name in self._data])
+   masked_fiducial = full_fiducial_model[full_model_mask]
+   # Naim comment: unsure about the purpose of the below code
+   # masked_fiducial = []
+   # for name, data in self._data.items():
+   #     mask = data.dist_model_coordinates.get_mask_to_other(data.data_coordinates)
+   #     if data.data_mask.size == fiducial_model[name].size:
+   #         masked_fiducial.append(fiducial_model[name])
+   #     elif mask.size == fiducial_model[name].size:
+   #         masked_fiducial.append(fiducial_model[name][mask])
+   #     else:
+   #         raise ValueError('Input fiducial has unknown size. '
+   #                          'It must match the data or the model.')
+   # masked_fiducial = np.concatenate(masked_fiducial)

    if forecast:
-       self.current_mc_mock = masked_fiducial[full_data_mask]
+       self.current_mc_mock = masked_fiducial
+       # self.current_mc_mock = masked_fiducial[full_data_mask]
    else:
        ran_vec = np.random.randn(full_data_mask.sum())
-       self.current_mc_mock = masked_fiducial[full_data_mask] + self._cholesky_global_cov.dot(
+       # self.current_mc_mock = masked_fiducial[full_data_mask] + self._cholesky_global_cov.dot(
+       self.current_mc_mock = masked_fiducial + self._cholesky_global_cov.dot(
            ran_vec)
Copilot AI commented Mar 24, 2026

masked_fiducial is built using full_model_mask, but the global covariance Cholesky and noise vector are built using full_data_mask. If model_mask and data_mask differ (e.g., due to distortion/model coordinate differences), this leads to shape mismatches when adding the noise term. Use full_data_mask consistently for the masked mean, or explicitly map the fiducial model from model-space to data-space before applying the global covariance.

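The mismatch this comment describes can be reproduced in isolation (a toy sketch with hypothetical masks and sizes): a mean vector masked in model space cannot be added to a noise vector sized by the data mask.

```python
import numpy as np

# Toy masks: model grid has 5 bins, data grid has 3, and they select
# different numbers of elements
full_model_mask = np.array([True, True, False, True, True])  # 4 selected
full_data_mask = np.array([True, False, True])               # 2 selected

fiducial = np.arange(5, dtype=float)
masked_fiducial = fiducial[full_model_mask]   # model-space mean, length 4

noise = np.zeros(full_data_mask.sum())        # data-space noise, length 2
try:
    mock = masked_fiducial + noise            # (4,) + (2,) cannot broadcast
    shapes_compatible = True
except ValueError:
    shapes_compatible = False
```

When the masks happen to select equal counts the code runs but may silently mis-map bins, which is the harder failure mode to catch; an explicit model-to-data mapping avoids both.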
p-slash and others added 3 commits March 24, 2026 17:59
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
