-
Notifications
You must be signed in to change notification settings - Fork 1
Charge molecules script #70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
IAlibay
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of quick first pass comments.
| @click.command() | ||
| @click.option("--input-path", type=click.Path(exists=True, dir_okay=False, path_type=pathlib.Path), required=True, help="Path to the input SDF file containing the molecules to be charged.") | ||
| @click.option("--output-dir", type=click.Path(dir_okay=True, file_okay=False, exists=True, path_type=pathlib.Path), required=True, help="Path to the output folder the SDF file with charged molecules will be saved.") | ||
| @click.option("--charge-method", type=click.Choice(["am1bcc", "am1bccelf10"]), default="am1bcc", help="The method to use for charge assignment.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No NAGL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh yeah can add that in. I also wanted Gasteiger charges but thats blocked by our charge API maybe we can look at adding that later though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How unwieldy will this become to update as we add more, e.g. nagl models, etc..?
If it's easier to make one script per charge model or something like that, then please go with the easiest solution!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding from our meetings is that we will have a few based charge models, but since it's trivial to apply NAGL when we apply the forcefield, that it may not be worth it.
Since AM1BCC can take a non insignificant amount of time, it makes sense to have that already done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added NAGL for now but as @jaclark5 points out these are very quick to apply and so might not be needed, the only benifit is that if you want to do some analysis on the charges, they are all stored here rather than needing to be generated again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since it's trivial to apply NAGL when we apply the forcefield,
Given the nature of this repo, (and for many other reasons) I would prefer it if the partial charges were always stored here rather than generating at execution time.
Edit: unless the experiment asks for execution time charges - less variability == better benchmarks.
|
|
||
| # construct the toolkit backend | ||
| method_to_backend = { | ||
| "am1bcc": "ambertools", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future planning a little bit, it might be good to also have am1bcc from openeye here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added.
| ) | ||
| # for each ligand stamp the provenance info as sdf property and write to output sdf | ||
| # generate the provenance info | ||
| provenance = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want our charge API to just do this then this could all be replaced by using the openfe CLI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rdkit version may not be relevant if openeye is used, can this have some logic, if rdkit is available add version, if openeye is available add version, if amber tools is available, add version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats true for the charges but OpenFE uses rdkit to convert to and from OpenFF so if there are any conversion issues we know which version it was done by, hopefully the provanence stamp makes it clear which backend was used though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want our charge API to just do this then this could all be replaced by using the openfe CLI
Yeah that's a good idea, at least it gets the ball moving towards the provenance thing you wanted to do - can you open an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure see OpenFreeEnergy/openfe#1810
| "am1bccelf10": "openeye_am1bccelf10", | ||
| } | ||
| output_path = output_dir / f"{input_path.stem}_{method_to_name[charge_method]}.sdf" | ||
| with Chem.SDWriter(str(output_path)) as writer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly, this might yield charged ligand files with different ordering to the input - that might be a headache when storing things. Could you resort?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added!
Co-authored-by: Hannah Baumann <43765638+hannahbaumann@users.noreply.github.com>
…lecules # Conflicts: # openfe_benchmarks/data/data_generation/charge_molecules.py
jaclark5
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job, thanks! I made some suggestions that I think are good for clarity and consistency but I'm approving because I don't think I need to review again. Do what you want though.
| openff_charge_method = charge_method.replace("_at", "").replace("_oe", "") | ||
|
|
||
| # we need to generate conformers for am1bccelf10 or use the input conformer for other methods which is the None case | ||
| generate_n_conformers = None if charge_method != "am1bccelf10" else 500 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't conformers needed for am1bcc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the input conformer for single conformer charge methods - that way it's more reproducible (i.e. you know exactly what conformer you used, not whatever the set seed gives on that architecture / compiler toolchain).
|
|
||
| Parameters | ||
| ---------- | ||
| input_path : pathlib.Path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do we propose that the original SDFs will sit? Will they be ligands.sdf in the data directory and these new files will also be added? Then the factory method can ignore the old one?
or do we not care if the original SDF is in this repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure I fully understand the question so apologies if I'm off the mark.
The "uncharged" SDF files should exist in this repo in the data directory where other things like the PDB files should exist. Pretty much all the SDF files need to be in lockstep since they need to have the same coordinates (i.e. if we change coordinates, we have to change the initial SDF file and then regenerate partial charges).
the factory method can ignore the old one
There will be experiments that will call for generating networks without partial charges, they should be accessible by folks if they want to do these experiments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the uncharged SDFs will be in this repo as well with the name ligands.sdf .
IAlibay
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple small things, please do disregard my "changes requested" once addressed / answered - it looks good to me otherwise!
| elif backend == "rdkit" and charge_method == "nagl": | ||
| from openff import nagl | ||
| from openff.nagl_models import get_models_by_type | ||
| if nagl_model is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're generating things for benchmarks, how about just not allowing None? Folks just set their model or it crashes (we don't need to take about general use, be restrictive :P )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point but I also want to make life easier and just leave this as None to get the newest charges!
| "am1bcc_oe": "openeye_am1bcc" | ||
| } | ||
|
|
||
| output_path = output_dir / f"{input_path.stem}_{method_to_name[charge_method]}.sdf" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you encode the backend in the name too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that should already be in the output they have names like ligands_openeye_am1bcc.sdf & ligands_nagl_openff-gnn-am1bcc-1.0.0.pt.sdf
| openff_charge_method = charge_method.replace("_at", "").replace("_oe", "") | ||
|
|
||
| # we need to generate conformers for am1bccelf10 or use the input conformer for other methods which is the None case | ||
| generate_n_conformers = None if charge_method != "am1bccelf10" else 500 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the input conformer for single conformer charge methods - that way it's more reproducible (i.e. you know exactly what conformer you used, not whatever the set seed gives on that architecture / compiler toolchain).
Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>
Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>
Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>
Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>
Co-authored-by: Jennifer A Clark <jennifer.a.clark13@gmail.com>
hannahbaumann
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jthorton , looks great!
Fixes #60 adding a new script which takes an input SDF and generates the requested charges. Provenance info on the charge generation method is also included in the output SDF file as an SD tag formatted as: