@jthorton

Fixes #62 by adding a script to generate a network containing the edges from the industry benchmarking study for the given system.

Questions

  • should we make sure there are no charges on the network to ensure that the correct charges are used when building the benchmark network?
  • I think some networks might use different names in the SDF files and the edge CSV. Do we encode the mapping in the script?
  • Do we want to assume the ligand SDF is local, or pull it from GitHub each time?

@IAlibay
Member

IAlibay commented Jan 23, 2026

should we make sure there are no charges on the network to ensure that the correct charges are used when building the benchmark network?

Could you expand on your thoughts here - is the idea that you want to store charge-less networks that would be used in conjunction with any ligands SDF file? I.e. you have the one ligand network and you just use it in composition?

@hannahbaumann
Contributor

Thank you @jthorton , this looks great!
Regarding your questions:

  1. I would say yes if the plan is to store a single lomap_network.json file per system for now.
  2. I thought we renamed them, but it could be that we missed some. Maybe this would also be a good time to align things?
  3. I think I'm a bit unclear on what you mean here. Would we expect people to re-use a stored network for most benchmarking exercises (e.g. FF benchmarks), or would the plan be to not store the network and instead run the script each time?

@jthorton
Author

@IAlibay

Could you expand on your thoughts here - is the idea that you want to store charge-less networks that would be used in conjunction with any ligands SDF file? I.e. you have the one ligand network and you just use it in composition?

Exactly this. Our planning scripts and CI can then check for charges on the ligands and make some noise if they are missing and NAGL is not being used, as this is likely a mistake.

@hannahbaumann

I think I'm a bit unclear on what you mean here. Would we expect people to re-use a stored network for most benchmarking exercises (e.g. FF benchmarks), or would the plan be to not store the network and instead run the script each time?

For benchmarks, the plan would be to use these stored network files. The case I mean is the rare one where you want to regenerate the inputs for the data folder. In that case, it might make sense to also store the ligands without charges in each folder to make that easier.

@hannahbaumann
Contributor

For benchmarks, the plan would be to use these stored network files. The case I mean is the rare one where you want to regenerate the inputs for the data folder. In that case, it might make sense to also store the ligands without charges in each folder to make that easier.

Yes, storing a charge-less ligand file could make sense for that case!
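
The charge check discussed in this thread could look roughly like the sketch below. It is only an illustration, not the script's actual behaviour. It assumes partial charges, when present, are stored in the SDF under the `atom.dprop.PartialCharge` data tag. The function names and the `using_nagl` flag are hypothetical.

```python
import re
import warnings

# Assumption: charged ligands carry this per-atom property tag in the SDF.
CHARGE_TAG = re.compile(r"^>.*<atom\.dprop\.PartialCharge>", re.MULTILINE)


def ligands_missing_charges(sdf_text: str) -> list[str]:
    """Return the names of SDF records that have no partial-charge tag."""
    missing = []
    for record in sdf_text.split("$$$$"):
        if not record.strip():
            continue  # skip the empty chunk after the final delimiter
        # The first line of each record is the molecule name.
        name = record.removeprefix("\n").splitlines()[0].strip()
        if CHARGE_TAG.search(record) is None:
            missing.append(name)
    return missing


def warn_if_uncharged(sdf_text: str, using_nagl: bool) -> list[str]:
    """Make some noise when charges are absent and NAGL is not in use."""
    missing = ligands_missing_charges(sdf_text)
    if missing and not using_nagl:
        warnings.warn(
            f"Ligands without partial charges: {missing}. "
            "This is likely a mistake unless NAGL charges are generated later."
        )
    return missing
```

The same helper could also be inverted when generating a stored network, failing if any ligand does carry charges, so that the charge-less network can be combined with any ligand SDF file.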



@click.command()
@click.option("--system-group", type=str, required=True, help="The industry system group ie JACS/MERCK used to determine the edges.")
Collaborator

Suggested change
@click.option("--system-group", type=str, required=True, help="The industry system group ie JACS/MERCK used to determine the edges.")
@click.option("--system-group", type=str, required=True, help="The industry system group of system names used to determine the edges, i.e., JACS/MERCK.")


@click.command()
@click.option("--system-group", type=str, required=True, help="The industry system group ie JACS/MERCK used to determine the edges.")
@click.option("--system-name", type=str, required=True, help="The industry system name ie TYK2 used to determine the edges.")
Collaborator

Suggested change
@click.option("--system-name", type=str, required=True, help="The industry system name ie TYK2 used to determine the edges.")
@click.option("--system-name", type=str, required=True, help="The industry system name used to determine the edges, i.e., TYK2.")

If the ligands in the input SDF do not match the ligands in the reference edges, this is checked by name.
"""
# load the ref data stored on github
all_edges = pd.read_csv("https://raw.githubusercontent.com/OpenFreeEnergy/IndustryBenchmarks2024/refs/heads/main/industry_benchmarks/analysis/processed_results/combined_pymbar3_edge_data.csv", dtype={"ligand_A": str, "ligand_B": str, "system group": str, "system name": str})
Collaborator

Do we expect that this file will never change or does it make sense to put a copy in this repo?

Member

I would suggest using the release's content here, as that will be stable. Please don't make a copy of that CSV here.
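
One way to follow this suggestion is to pin the reference data to a release tag instead of `refs/heads/main`, then select the edges for one system and compare ligand names. The sketch below is a rough illustration, not the script's actual code. The release tag is left as a parameter because the thread does not name one. The helper names are hypothetical. The exact-match rule in `check_ligand_names` is an assumption, since the diff only shows that names are compared. The column names are taken from the `dtype` mapping in the hunk above.

```python
import csv
import io

REPO_RAW = "https://raw.githubusercontent.com/OpenFreeEnergy/IndustryBenchmarks2024"
CSV_PATH = (
    "industry_benchmarks/analysis/processed_results/"
    "combined_pymbar3_edge_data.csv"
)


def edge_data_url(ref: str) -> str:
    """Build the raw URL for a given git ref (e.g. a release tag)."""
    return f"{REPO_RAW}/{ref}/{CSV_PATH}"


def select_edges(csv_text: str, system_group: str, system_name: str):
    """Return (ligand_A, ligand_B) pairs for one system, all as strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["ligand_A"], row["ligand_B"])
        for row in reader
        if row["system group"] == system_group
        and row["system name"] == system_name
    ]


def check_ligand_names(edges, input_ligand_names) -> None:
    """Raise if edge ligands and input ligands differ (assumed exact match)."""
    edge_ligand_names = {name for edge in edges for name in edge}
    input_names = set(input_ligand_names)
    if edge_ligand_names != input_names:
        raise RuntimeError(
            "Ligands in edges do not match ligands in input sdf. "
            f"Edge ligands: {sorted(edge_ligand_names)}, "
            f"Input ligands: {sorted(input_names)}"
        )
```

Reading every column as a string mirrors the `dtype` argument in the original `pd.read_csv` call, which keeps numeric-looking ligand names from being converted to integers.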

raise RuntimeError(f"Ligands in edges do not match ligands in input sdf. Edge ligands: {edge_ligand_names}, Input ligands: {input_ligand_names}")

# generate the mappings using kartograf
mapper = KartografAtomMapper(map_hydrogens_on_hydrogens_only=True)
Collaborator

The previous version of this repo used the LomapAtomMapper. Are we changing the standard and suggesting that other mappers won't be used?

# save the network
out_path = out_dir / "lomap_network.json"
network.to_json(out_path)
print(f"LOMAP network saved to {out_path}")
Collaborator

If I understand correctly, the mapping can have an effect on the results of a network?

Suggested change
print(f"LOMAP network saved to {out_path}")
print(f"LOMAP network with kartograf mapping saved to {out_path}")


Development

Successfully merging this pull request may close these issues.

data: Create scripts to generate mapping json files

5 participants