Butina clustering bench improvements #188

Merged
scal444 merged 3 commits into NVIDIA-BioNeMo:main from scal444:split/butina-cleanup
Jun 1, 2026

@scal444 (Collaborator) commented Jun 1, 2026

  • Fixes an error that occurred in some code paths when a Tanimoto similarity of 1.1 was offered as an option. A value of 1.0 exercises the same behavior without the error.

  • Standardizes measurements across modes. When we run the fused or low-memory RDKit/nvMolKit versions, the proper comparison for the non-fused versions is Tanimoto matrix construction THEN Butina clustering. When comparing the non-fused versions directly against each other, it makes more sense to leave out the Tanimoto step.
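
To make the two measurements concrete, here is a minimal pure-Python sketch of timing "cluster only" versus "Tanimoto matrix then Butina". It is not the benchmark's code: the function names, the integer bit-vector fingerprints, and the simplified Butina loop are all assumptions for illustration, and no RDKit or nvMolKit call is used.

```python
import random
import time


def tanimoto_distance(a: int, b: int) -> float:
    """Tanimoto distance between two bit-vector fingerprints stored as ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return 1.0 - bin(a & b).count("1") / union


def build_distance_matrix(fps):
    """Full symmetric distance matrix built in a pure-Python O(n^2) loop."""
    n = len(fps)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            d = tanimoto_distance(fps[i], fps[j])
            dist[i][j] = dist[j][i] = d
    return dist


def butina(dist, cutoff):
    """Simplified Butina: points with the most neighbors seed clusters first."""
    n = len(dist)
    neighbors = [
        [j for j in range(n) if j != i and dist[i][j] <= cutoff]
        for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    assigned = [False] * n
    clusters = []
    for i in order:
        if assigned[i]:
            continue
        members = [i] + [j for j in neighbors[i] if not assigned[j]]
        for m in members:
            assigned[m] = True
        clusters.append(members)
    return clusters


def time_both(fps, cutoff):
    """Report the clustering-only time and the time including the matrix build."""
    t0 = time.perf_counter()
    dist = build_distance_matrix(fps)
    t1 = time.perf_counter()
    clusters = butina(dist, cutoff)
    t2 = time.perf_counter()
    return {
        "cluster_only_s": t2 - t1,    # inner loop only
        "with_tanimoto_s": t2 - t0,   # distance build + clustering
        "n_clusters": len(clusters),
    }


rng = random.Random(0)
fps = [rng.getrandbits(64) for _ in range(200)]  # synthetic fingerprints
result = time_both(fps, cutoff=0.6)
```

In this sketch the distance never exceeds 1.0, so a cutoff of 1.0 already makes every pair a neighbor and yields a single cluster; a larger value such as 1.1 would not change the result here.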

scal444 added 3 commits May 29, 2026 14:16
- Flip `--no-rdkit-lowmem` to opt-in `--rdkit-lowmem`. The lowmem backend
  builds its distance matrix in a pure-Python O(n^2) loop that doesn't
  finish in reasonable time at sizes >= 40k, so default off.
- Drop `--include-tanimoto-matrix` and always report both rdkit variants
  (`rdkit_cluster_only_*` and `rdkit_with_tanimoto_*`) so a single run
  characterises both the inner-loop and the with-distance-build timings.
- Allow rdkit_cluster_only / fused / nvmolkit to run on synthetic-fill
  fingerprints when the input has fewer mols than `size`; with_tanimoto
  and lowmem rows are skipped in that case (they need real fingerprints).
- Edge-case cutoff changes from 1.1 to 1.0.
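
The flag and row-selection behavior described in the commit message could look roughly like the sketch below. Only `--rdkit-lowmem` and the `rdkit_cluster_only` / `rdkit_with_tanimoto` / fused / nvmolkit names come from the text above; `--size` as a CLI flag, the `select_rows` helper, and the `rdkit_lowmem` row name are hypothetical and may not match the actual script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Butina bench (illustrative sketch)")
    # Opt-in: the low-memory RDKit backend stays off unless explicitly requested.
    parser.add_argument("--rdkit-lowmem", action="store_true")
    # Hypothetical flag standing in for the benchmark's `size` setting.
    parser.add_argument("--size", type=int, default=1000)
    return parser


def select_rows(n_real_mols: int, size: int, rdkit_lowmem: bool) -> list[str]:
    """Pick which benchmark rows to run (hypothetical helper)."""
    # These rows can run on synthetic-fill fingerprints.
    rows = ["rdkit_cluster_only", "fused", "nvmolkit"]
    synthetic_fill = n_real_mols < size
    if not synthetic_fill:
        # These rows need real fingerprints, so they are skipped under synthetic fill.
        rows.append("rdkit_with_tanimoto")
        if rdkit_lowmem:
            rows.append("rdkit_lowmem")
    return rows


args = build_parser().parse_args(["--rdkit-lowmem", "--size", "2000"])
rows = select_rows(n_real_mols=500, size=args.size, rdkit_lowmem=args.rdkit_lowmem)
```

With 500 real molecules and a requested size of 2000, this sketch keeps only the rows that tolerate synthetic fill, even though `--rdkit-lowmem` was passed.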
@scal444 scal444 requested a review from evasnow1992 June 1, 2026 13:20
@evasnow1992 (Collaborator) left a comment

Changes look good to me. Thanks!

@scal444 scal444 merged commit de2cc58 into NVIDIA-BioNeMo:main Jun 1, 2026
8 checks passed