
Conversation

@axiomcura (Member) commented Jan 16, 2026

This PR sets up the analysis of outputs generated by the buscar pipeline. It introduces three notebooks that together document the full buscar workflow, the generation of aggregate profiles, and the computation of centroids for the clusters identified by buscar.

These notebooks produce intermediate outputs that enable inspection of how buscar processes the data and derives its scoring. Plots and final figures will be added in a subsequent PR.

Notebooks

  • 1.cfret-pilot-buscar-analysis.ipynb: Runs the full buscar pipeline.
  • 2.2.generate-aggregate-profiles.ipynb: Generates aggregate profiles at the replicate level and as a treatment-level consensus (see the sketch after this list).
  • 3.generate-centroid.ipynb: Computes a centroid for each cluster identified by the buscar module.
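
A minimal sketch of the two aggregation levels, assuming pycytominer's aggregate() with hypothetical metadata column names and input path (the notebook's actual I/O and feature columns may differ):

```python
# A minimal sketch, assuming pycytominer's aggregate() and hypothetical
# metadata column names / input path; the notebook's actual I/O may differ.
import pandas as pd
from pycytominer import aggregate

# hypothetical single-cell profile table with CellProfiler-style feature columns
profiles = pd.read_parquet("single_cell_profiles.parquet")

# replicate-level profiles: one median profile per well
replicate_profiles = aggregate(
    population_df=profiles,
    strata=["Metadata_Plate", "Metadata_Well"],
    features="infer",          # infers CellProfiler-style feature columns
    operation="median",
)

# treatment-level consensus: one median profile per treatment
consensus_profiles = aggregate(
    population_df=profiles,
    strata=["Metadata_treatment"],
    features="infer",
    operation="median",
)
```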

Other changes

  • 2.preprocessing.ipynb: Removed drug_x (data cannot be published).
  • Updated the clustering parameter grid to allow a broader search of the parameter space (see the hypothetical sketch after this list).
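
Purely as an illustration of what a broader grid could look like (the actual clusterer and parameter names used in this repo are not shown in this PR description), using scikit-learn's ParameterGrid:

```python
# A hypothetical illustration of a broader clustering parameter grid; the
# actual clusterer and parameter names in the repo may differ.
from sklearn.model_selection import ParameterGrid

cluster_param_grid = ParameterGrid({
    "n_neighbors": [15, 30, 50],
    "min_cluster_size": [25, 50, 100, 200],
    "min_samples": [5, 10, 25],
})

for params in cluster_param_grid:
    # fit the clustering model with `params` here and record a quality metric
    ...
```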


@axiomcura marked this pull request as ready for review January 16, 2026 16:15
@axiomcura requested a review from wli51 January 16, 2026 20:45
@wli51 (Contributor) left a comment


LGTM! Great work, and happy to see things coming together. I have a question about the median aggregation step used to generate the centroids; if you could clarify things a little more, that would be great.


```python
# calculate phenotypic distance scores
if treatment_dist_scores_outpath.exists():
    print("Treatment phenotypic distance scores already exist, skipping this step.")
```

You might want to perform a content check to ensure everything needed is present, so that if samples are included or excluded earlier on, the pipeline knows whether this output needs to be regenerated. Alternatively, a force-rerun flag would be handy.
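
A minimal sketch of that suggestion, assuming a hypothetical FORCE_RERUN flag, a parquet output file, and a Metadata_treatment column (none of which are confirmed by the notebook under review):

```python
# A minimal sketch of the suggestion; the flag, file format, and column names
# below are hypothetical, not the notebook's actual code.
import polars as pl

FORCE_RERUN = False                                  # hypothetical user-facing flag
REQUIRED_TREATMENTS = {"DMSO", "drug_a", "drug_b"}   # hypothetical expected contents

def needs_regeneration(outpath) -> bool:
    """Return True if the distance scores are missing, incomplete, or forced to rerun."""
    if FORCE_RERUN or not outpath.exists():
        return True
    # content check: confirm every expected treatment is present in the output
    present = set(pl.read_parquet(outpath)["Metadata_treatment"].unique().to_list())
    return not REQUIRED_TREATMENTS.issubset(present)

if needs_regeneration(treatment_dist_scores_outpath):
    ...  # (re)compute phenotypic distance scores here
else:
    print("Treatment phenotypic distance scores already exist, skipping this step.")
```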

```python
cfret_df.select(cfret_meta + cfret_feats).head()
```


# We use **median aggregation** to generate centroid profiles for each cluster. For each cluster, we calculate the component-wise median across all cells to create a synthetic representative profile that captures the central tendency. This approach is robust to outliers, consistent with replicate and consensus profile generation workflow, and works well for high-dimensional morphological features.
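
For concreteness, the component-wise median described above amounts to something like the following (a minimal sketch, assuming a polars DataFrame and a hypothetical Metadata_cluster_id column; the notebook's actual column names may differ):

```python
# A minimal sketch of the described aggregation, assuming a polars DataFrame
# and a hypothetical Metadata_cluster_id column; actual column names may differ.
import polars as pl

cluster_centroids = (
    cfret_df
    .group_by("Metadata_cluster_id")
    # component-wise median: each feature is median-ed independently per cluster
    .agg([pl.col(feat).median() for feat in cfret_feats])
)
```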

Maybe I am misunderstanding your goals or the pycytominer aggregation, so please let me know if so. You said here in the documentation that the data was aggregated by component-wise median, which I assume just involves median-ing every feature separately (or is this a typo and you meant the geometric median?). I have read that component-wise median operations generalize poorly from 1D to higher-dimensional data and can produce unrealistic centroid profiles.

In the extreme case, if we have very anti-correlated or correlated features, the median profile can be far from anything that is typical or realistic, unless this type of correlation structure has been dealt with previously:

| Sample | Feature X | Feature Y |
|--------|-----------|-----------|
| S1     |    1      |    10     |
| S2     |    2      |     9     |
| S3     |    9      |     2     |
| S4     |   10      |     1     |
| Median    |   5.5     |   5.5     |
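
To make the concern concrete, here is a small standalone check (not notebook code) showing that the component-wise median of the table above lies far from every observed sample:

```python
# A small standalone check using the table's values: the component-wise
# median lands far from every observed sample.
import numpy as np

X = np.array([[1, 10], [2, 9], [9, 2], [10, 1]], dtype=float)

centroid = np.median(X, axis=0)               # -> array([5.5, 5.5])
dists = np.linalg.norm(X - centroid, axis=1)  # distance of each sample to the centroid

print(centroid)      # [5.5 5.5]
print(dists.min())   # ~4.95, vs. ~1.41 between the two closest actual samples
```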
