Charge molecules script #70

jthorton · 2026-01-23T12:09:57Z

Fixes #60 adding a new script which takes an input SDF and generates the requested charges. Provenance info on the charge generation method is also included in the output SDF file as an SD tag formatted as:

>  <charge_provenance>  (1) 
{"openfe_version": "1.8.0", "openff_toolkit_version": "0.18.0", "rdkit_version": "2025.09.4", "charge_method": "am1bccelf10", "oeomega": "20251120", "oequacpac": "2111"}

IAlibay

Couple of quick first pass comments.

IAlibay · 2026-01-23T12:36:17Z

openfe_benchmarks/data/data_generation/charge_molecules.py

+@click.command()
+@click.option("--input-path", type=click.Path(exists=True, dir_okay=False, path_type=pathlib.Path), required=True, help="Path to the input SDF file containing the molecules to be charged.")
+@click.option("--output-dir", type=click.Path(dir_okay=True, file_okay=False, exists=True, path_type=pathlib.Path), required=True, help="Path to the output folder the SDF file with charged molecules will be saved.")
+@click.option("--charge-method", type=click.Choice(["am1bcc", "am1bccelf10"]), default="am1bcc", help="The method to use for charge assignment.")


Ahh yeah can add that in. I also wanted Gasteiger charges but thats blocked by our charge API maybe we can look at adding that later though.

How unwieldy will this become to update as we add more, e.g. nagl models, etc..?
If it's easier to make one script per charge model or something like that, then please go with the easiest solution!

My understanding from our meetings is that we will have a few based charge models, but since it's trivial to apply NAGL when we apply the forcefield, that it may not be worth it.

Since AM1BCC can take a non insignificant amount of time, it makes sense to have that already done.

I have added NAGL for now but as @jaclark5 points out these are very quick to apply and so might not be needed, the only benifit is that if you want to do some analysis on the charges, they are all stored here rather than needing to be generated again.

since it's trivial to apply NAGL when we apply the forcefield,

Given the nature of this repo, (and for many other reasons) I would prefer it if the partial charges were always stored here rather than generating at execution time.

Edit: unless the experiment asks for execution time charges - less variability == better benchmarks.

IAlibay · 2026-01-23T12:36:49Z

openfe_benchmarks/data/data_generation/charge_molecules.py

+
+        # construct the toolkit backend
+        method_to_backend = {
+            "am1bcc": "ambertools",


Future planning a little bit, it might be good to also have am1bcc from openeye here.

IAlibay · 2026-01-23T12:38:37Z

openfe_benchmarks/data/data_generation/charge_molecules.py

+        )
+        # for each ligand stamp the provenance info as sdf property and write to output sdf
+        # generate the provenance info
+        provenance = {


I like this.

We may want our charge API to just do this then this could all be replaced by using the openfe CLI

rdkit version may not be relevant if openeye is used, can this have some logic, if rdkit is available add version, if openeye is available add version, if amber tools is available, add version.

Thats true for the charges but OpenFE uses rdkit to convert to and from OpenFF so if there are any conversion issues we know which version it was done by, hopefully the provanence stamp makes it clear which backend was used though.

We may want our charge API to just do this then this could all be replaced by using the openfe CLI

Yeah that's a good idea, at least it gets the ball moving towards the provenance thing you wanted to do - can you open an issue?

Sure see OpenFreeEnergy/openfe#1810

IAlibay · 2026-01-23T12:39:29Z

openfe_benchmarks/data/data_generation/charge_molecules.py

+            "am1bccelf10": "openeye_am1bccelf10",
+        }
+        output_path = output_dir / f"{input_path.stem}_{method_to_name[charge_method]}.sdf"
+        with Chem.SDWriter(str(output_path)) as writer:


If I remember correctly, this might yield charged ligand files with different ordering to the input - that might be a headache when storing things. Could you resort?

openfe_benchmarks/data/data_generation/charge_molecules.py

Co-authored-by: Hannah Baumann <43765638+hannahbaumann@users.noreply.github.com>

…lecules # Conflicts: # openfe_benchmarks/data/data_generation/charge_molecules.py

jaclark5

Nice job, thanks! I made some suggestions that I think are good for clarity and consistency but I'm approving because I don't think I need to review again. Do what you want though.

openfe_benchmarks/data/data_generation/charge_molecules.py

jaclark5 · 2026-01-23T16:40:54Z

openfe_benchmarks/data/data_generation/charge_molecules.py

+        openff_charge_method = charge_method.replace("_at", "").replace("_oe", "")
+
+        # we need to generate conformers for am1bccelf10 or use the input conformer for other methods which is the None case
+        generate_n_conformers = None if charge_method != "am1bccelf10" else 500


Why aren't conformers needed for am1bcc?

We use the input conformer for single conformer charge methods - that way it's more reproducible (i.e. you know exactly what conformer you used, not whatever the set seed gives on that architecture / compiler toolchain).

openfe_benchmarks/data/data_generation/charge_molecules.py

jaclark5 · 2026-01-23T16:50:42Z

openfe_benchmarks/data/data_generation/charge_molecules.py

+
+    Parameters
+    ----------
+    input_path : pathlib.Path


Where do we propose that the original SDFs will sit? Will they be ligands.sdf in the data directory and these new files will also be added? Then the factory method can ignore the old one?

or do we not care if the original SDF is in this repo?

I'm not 100% sure I fully understand the question so apologies if I'm off the mark.

The "uncharged" SDF files should exist in this repo in the data directory where other things like the PDB files should exist. Pretty much all the SDF files need to be in lockstep since they need to have the same coordinates (i.e. if we change coordinates, we have to change the initial SDF file and then regenerate partial charges).

the factory method can ignore the old one

There will be experiments that will call for generating networks without partial charges, they should be accessible by folks if they want to do these experiments.

Yes the uncharged SDFs will be in this repo as well with the name ligands.sdf .

IAlibay

Couple small things, please do disregard my "changes requested" once addressed / answered - it looks good to me otherwise!

IAlibay · 2026-01-25T23:27:43Z

openfe_benchmarks/data/data_generation/charge_molecules.py

+        elif backend == "rdkit" and charge_method == "nagl":
+            from openff import nagl
+            from openff.nagl_models import get_models_by_type
+            if nagl_model is None:


Since we're generating things for benchmarks, how about just not allowing None? Folks just set their model or it crashes (we don't need to take about general use, be restrictive :P )

Good point but I also want to make life easier and just leave this as None to get the newest charges!

IAlibay · 2026-01-25T23:28:05Z

openfe_benchmarks/data/data_generation/charge_molecules.py

+            "am1bcc_oe": "openeye_am1bcc"
+        }
+
+        output_path = output_dir / f"{input_path.stem}_{method_to_name[charge_method]}.sdf"


Could you encode the backend in the name too?

yeah that should already be in the output they have names like ligands_openeye_am1bcc.sdf & ligands_nagl_openff-gnn-am1bcc-1.0.0.pt.sdf

IAlibay · 2026-01-25T23:29:18Z

openfe_benchmarks/data/data_generation/charge_molecules.py

+        openff_charge_method = charge_method.replace("_at", "").replace("_oe", "")
+
+        # we need to generate conformers for am1bccelf10 or use the input conformer for other methods which is the None case
+        generate_n_conformers = None if charge_method != "am1bccelf10" else 500


We use the input conformer for single conformer charge methods - that way it's more reproducible (i.e. you know exactly what conformer you used, not whatever the set seed gives on that architecture / compiler toolchain).

Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>

hannahbaumann

Thanks @jthorton , looks great!

add charge molecules script

9fcd60b

jthorton requested review from IAlibay, hannahbaumann and jaclark5 January 23, 2026 12:10

update the environment for charge generation

8475db5

IAlibay reviewed Jan 23, 2026

View reviewed changes

hannahbaumann reviewed Jan 23, 2026

View reviewed changes

openfe_benchmarks/data/data_generation/charge_molecules.py Outdated Show resolved Hide resolved

jthorton and others added 3 commits January 23, 2026 14:41

Update openfe_benchmarks/data/data_generation/charge_molecules.py

5d34187

Co-authored-by: Hannah Baumann <43765638+hannahbaumann@users.noreply.github.com>

apply PR feedback

c662ae5

Merge remote-tracking branch 'origin/charge_molecules' into charge_mo…

6b64222

…lecules # Conflicts: # openfe_benchmarks/data/data_generation/charge_molecules.py

jaclark5 approved these changes Jan 23, 2026

View reviewed changes

IAlibay reviewed Jan 25, 2026

View reviewed changes

jthorton and others added 6 commits January 26, 2026 10:53

Update openfe_benchmarks/data/data_generation/charge_molecules.py

c0c627c

Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>

Update openfe_benchmarks/data/data_generation/charge_molecules.py

fecf9ce

Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>

Update openfe_benchmarks/data/data_generation/charge_molecules.py

79d3e6c

Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>

Update openfe_benchmarks/data/data_generation/charge_molecules.py

dbd8407

Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>

Update openfe_benchmarks/data/data_generation/charge_molecules.py

aa0ad4b

Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>

PR feedback

fb3baf0

hannahbaumann approved these changes Jan 26, 2026

View reviewed changes

jthorton mentioned this pull request Jan 26, 2026

Add charge generation provenance OpenFreeEnergy/openfe#1810

Open

jthorton merged commit 07e9240 into main Jan 26, 2026

jthorton deleted the charge_molecules branch January 26, 2026 11:56

Charge molecules script #70

Charge molecules script #70

Uh oh!

Conversation

jthorton commented Jan 23, 2026

Uh oh!

IAlibay left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

IAlibay Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jaclark5 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

IAlibay left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hannahbaumann left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

IAlibay Jan 23, 2026 •

edited

Loading