Skip to content

Conversation

@jthorton
Copy link
Contributor

Fixes #60 adding a new script which takes an input SDF and generates the requested charges. Provenance info on the charge generation method is also included in the output SDF file as an SD tag formatted as:

>  <charge_provenance>  (1) 
{"openfe_version": "1.8.0", "openff_toolkit_version": "0.18.0", "rdkit_version": "2025.09.4", "charge_method": "am1bccelf10", "oeomega": "20251120", "oequacpac": "2111"}

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of quick first pass comments.

@click.command()
@click.option("--input-path", type=click.Path(exists=True, dir_okay=False, path_type=pathlib.Path), required=True, help="Path to the input SDF file containing the molecules to be charged.")
@click.option("--output-dir", type=click.Path(dir_okay=True, file_okay=False, exists=True, path_type=pathlib.Path), required=True, help="Path to the output folder the SDF file with charged molecules will be saved.")
@click.option("--charge-method", type=click.Choice(["am1bcc", "am1bccelf10"]), default="am1bcc", help="The method to use for charge assignment.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No NAGL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yeah can add that in. I also wanted Gasteiger charges but thats blocked by our charge API maybe we can look at adding that later though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How unwieldy will this become to update as we add more, e.g. nagl models, etc..?
If it's easier to make one script per charge model or something like that, then please go with the easiest solution!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding from our meetings is that we will have a few based charge models, but since it's trivial to apply NAGL when we apply the forcefield, that it may not be worth it.

Since AM1BCC can take a non insignificant amount of time, it makes sense to have that already done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added NAGL for now but as @jaclark5 points out these are very quick to apply and so might not be needed, the only benifit is that if you want to do some analysis on the charges, they are all stored here rather than needing to be generated again.

Copy link
Member

@IAlibay IAlibay Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it's trivial to apply NAGL when we apply the forcefield,

Given the nature of this repo, (and for many other reasons) I would prefer it if the partial charges were always stored here rather than generating at execution time.

Edit: unless the experiment asks for execution time charges - less variability == better benchmarks.


# construct the toolkit backend
method_to_backend = {
"am1bcc": "ambertools",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future planning a little bit, it might be good to also have am1bcc from openeye here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

)
# for each ligand stamp the provenance info as sdf property and write to output sdf
# generate the provenance info
provenance = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want our charge API to just do this then this could all be replaced by using the openfe CLI

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rdkit version may not be relevant if openeye is used, can this have some logic, if rdkit is available add version, if openeye is available add version, if amber tools is available, add version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats true for the charges but OpenFE uses rdkit to convert to and from OpenFF so if there are any conversion issues we know which version it was done by, hopefully the provanence stamp makes it clear which backend was used though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want our charge API to just do this then this could all be replaced by using the openfe CLI

Yeah that's a good idea, at least it gets the ball moving towards the provenance thing you wanted to do - can you open an issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"am1bccelf10": "openeye_am1bccelf10",
}
output_path = output_dir / f"{input_path.stem}_{method_to_name[charge_method]}.sdf"
with Chem.SDWriter(str(output_path)) as writer:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, this might yield charged ligand files with different ordering to the input - that might be a headache when storing things. Could you resort?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added!

jthorton and others added 3 commits January 23, 2026 14:41
Co-authored-by: Hannah Baumann <43765638+hannahbaumann@users.noreply.github.com>
…lecules

# Conflicts:
#	openfe_benchmarks/data/data_generation/charge_molecules.py
Copy link
Collaborator

@jaclark5 jaclark5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job, thanks! I made some suggestions that I think are good for clarity and consistency but I'm approving because I don't think I need to review again. Do what you want though.

openff_charge_method = charge_method.replace("_at", "").replace("_oe", "")

# we need to generate conformers for am1bccelf10 or use the input conformer for other methods which is the None case
generate_n_conformers = None if charge_method != "am1bccelf10" else 500
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't conformers needed for am1bcc?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the input conformer for single conformer charge methods - that way it's more reproducible (i.e. you know exactly what conformer you used, not whatever the set seed gives on that architecture / compiler toolchain).


Parameters
----------
input_path : pathlib.Path
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we propose that the original SDFs will sit? Will they be ligands.sdf in the data directory and these new files will also be added? Then the factory method can ignore the old one?

or do we not care if the original SDF is in this repo?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure I fully understand the question so apologies if I'm off the mark.

The "uncharged" SDF files should exist in this repo in the data directory where other things like the PDB files should exist. Pretty much all the SDF files need to be in lockstep since they need to have the same coordinates (i.e. if we change coordinates, we have to change the initial SDF file and then regenerate partial charges).

the factory method can ignore the old one

There will be experiments that will call for generating networks without partial charges, they should be accessible by folks if they want to do these experiments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the uncharged SDFs will be in this repo as well with the name ligands.sdf .

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple small things, please do disregard my "changes requested" once addressed / answered - it looks good to me otherwise!

elif backend == "rdkit" and charge_method == "nagl":
from openff import nagl
from openff.nagl_models import get_models_by_type
if nagl_model is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're generating things for benchmarks, how about just not allowing None? Folks just set their model or it crashes (we don't need to take about general use, be restrictive :P )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point but I also want to make life easier and just leave this as None to get the newest charges!

"am1bcc_oe": "openeye_am1bcc"
}

output_path = output_dir / f"{input_path.stem}_{method_to_name[charge_method]}.sdf"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you encode the backend in the name too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that should already be in the output they have names like ligands_openeye_am1bcc.sdf & ligands_nagl_openff-gnn-am1bcc-1.0.0.pt.sdf

openff_charge_method = charge_method.replace("_at", "").replace("_oe", "")

# we need to generate conformers for am1bccelf10 or use the input conformer for other methods which is the None case
generate_n_conformers = None if charge_method != "am1bccelf10" else 500
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the input conformer for single conformer charge methods - that way it's more reproducible (i.e. you know exactly what conformer you used, not whatever the set seed gives on that architecture / compiler toolchain).

jthorton and others added 6 commits January 26, 2026 10:53
Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>
Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>
Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>
Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>
Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>
Copy link
Contributor

@hannahbaumann hannahbaumann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jthorton , looks great!

@jthorton jthorton merged commit 07e9240 into main Jan 26, 2026
@jthorton jthorton deleted the charge_molecules branch January 26, 2026 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

data: Create scripts to generate partial charge specific sdfs

5 participants