-
Notifications
You must be signed in to change notification settings - Fork 1
Industry benchmark network generation script #71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Could you expand on your thoughts here - is the idea that you want to store charge-less networks that would be used in conjunction with any ligands SDF file? I.e. you have the one ligand network and you just use it in composition? |
|
Thank you @jthorton , this looks great!
|
Exactly this, then our planning scripts and CI can check for charges on the ligands and make some noise if they are missing and not using nagl as this is likely a mistake?
So for benchmarks it would be to use these stored network files, the case I mean is the rare case when you want to generate inputs for the data folder again. In that case it might make sense to store the ligands without charges as well in each folder to make it easier to do? |
Yes, storing a charge-less ligand file could make sense for that case! |
|
|
||
|
|
||
| @click.command() | ||
| @click.option("--system-group", type=str, required=True, help="The industry system group ie JACS/MERCK used to determine the edges.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @click.option("--system-group", type=str, required=True, help="The industry system group ie JACS/MERCK used to determine the edges.") | |
| @click.option("--system-group", type=str, required=True, help="The industry system group of system names used to determine the edges, i.e., JACS/MERCK.") |
|
|
||
| @click.command() | ||
| @click.option("--system-group", type=str, required=True, help="The industry system group ie JACS/MERCK used to determine the edges.") | ||
| @click.option("--system-name", type=str, required=True, help="The industry system name ie TYK2 used to determine the edges.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @click.option("--system-name", type=str, required=True, help="The industry system name ie TYK2 used to determine the edges.") | |
| @click.option("--system-name", type=str, required=True, help="The industry system name used to determine the edges, i.e., TYK2.") |
| If the ligands in the input SDF do not match the ligands in the reference edges, this is checked by name. | ||
| """ | ||
| # load the ref data stored on github | ||
| all_edges = pd.read_csv("https://raw.githubusercontent.com/OpenFreeEnergy/IndustryBenchmarks2024/refs/heads/main/industry_benchmarks/analysis/processed_results/combined_pymbar3_edge_data.csv", dtype={"ligand_A": str, "ligand_B": str, "system group": str, "system name": str}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect that this file will never change or does it make sense to put a copy in this repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest using the release's content here, that will be stable. Please don't make a copy of that csv here.
| raise RuntimeError(f"Ligands in edges do not match ligands in input sdf. Edge ligands: {edge_ligand_names}, Input ligands: {input_ligand_names}") | ||
|
|
||
| # generate the mappings using kartograf | ||
| mapper = KartografAtomMapper(map_hydrogens_on_hydrogens_only=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the previous version of this repo, the LomapAtomMapper, are we changing the standard and suggesting that other mappings won't be used?
| # save the network | ||
| out_path = out_dir / "lomap_network.json" | ||
| network.to_json(out_path) | ||
| print(f"LOMAP network saved to {out_path}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, the mapping can have an effect on the results of a network?
| print(f"LOMAP network saved to {out_path}") | |
| print(f"LOMAP network with kartograf mapping saved to {out_path}") |
Fixes #62 by adding a script to generate a network containing the edges from the industry benchmarking study for the given system.
Questions