data/tcr_epitope_binding/meta.yaml
Outdated
- tcr binding affinity
- binding affinity
- binding
I feel it would be better if all the "synonyms" also named the binding site, e.g., "epitope binding affinity".
@kjappelbaum I included "epitope binding affinity" and also "epitope binding" as synonyms.
data/tcr_epitope_binding/meta.yaml
Outdated
- id: epitope_smiles
  type: SMILES
  description: 'epitope smiles '
- id: epitope_aa
  type: amino acid
  description: epitope amino acid sequence
- id: tcr_aa
  type: amino acid
  description: tcr amino acid sequence
Do I understand the dataset correctly that the binding label only makes sense if we specify both the TCR and the epitope?
That is right. Given the epitope and TCR, predict if the pair binds.
In this case, we will need to add templates to sample this data correctly. There are examples of the templates in the Contribution Guide. Let me know if you want a hand with this.
@kjappelbaum Thanks for the feedback. I attempted to add a template, but I am not sure I fully understand what to do here. Could you please have a look and help me with this?
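To make the point in this thread concrete, here is a minimal Python sketch of why a sampled sentence needs both identifiers: the binding label belongs to the (epitope, TCR) pair, so the text has to name both. Everything in it is an assumption for illustration only: the rows are placeholder sequences rather than dataset entries, `TEMPLATE`, `render`, and `sample` are hypothetical names, and the placeholders use plain `str.format`, not the template syntax described in the Contribution Guide.

```python
import random

# Hypothetical rows using the identifiers from meta.yaml.
# The sequences are placeholders, not taken from the dataset.
rows = [
    {"epitope_aa": "AAAA", "tcr_aa": "CCCC", "binding": 1},
    {"epitope_aa": "GGGG", "tcr_aa": "DDDD", "binding": 0},
]

# Hypothetical template text with plain str.format placeholders
# (an assumption; this is not the project's template syntax).
TEMPLATE = (
    "The epitope with amino acid sequence {epitope_aa} {verb} "
    "the TCR with amino acid sequence {tcr_aa}."
)


def render(row):
    """Fill the template with both identifiers and the pair's binary label."""
    verb = "binds to" if row["binding"] == 1 else "does not bind to"
    return TEMPLATE.format(
        epitope_aa=row["epitope_aa"], tcr_aa=row["tcr_aa"], verb=verb
    )


def sample(rows, k, seed=0):
    """Draw k rows without replacement and render each one as a sentence."""
    rng = random.Random(seed)
    return [render(row) for row in rng.sample(rows, k)]


if __name__ == "__main__":
    for sentence in sample(rows, 2):
        print(sentence)
```

A template that mentioned only the TCR or only the epitope would produce sentences whose label cannot be interpreted, which is the situation the templates are meant to avoid.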
kjappelbaum left a comment
Thanks for your contribution 💯
I think I do not fully understand the dataset yet. Perhaps you can help me?
…strubeyj/chemnlp into Add-TCR-epitope-binding-dataset
kjappelbaum left a comment
Thanks again for your contribution. Before we merge, we should add the templates for sampling, as mentioned in one of my comments.
…strubeyj/chemnlp into Add-TCR-epitope-binding-dataset
I added a meta.yaml, transform.py, and example_processing_and_templates.ipynb for the TCR epitope binding data found at the Therapeutics Data Commons (TDC). The dataset contains pairs of an epitope (SMILES and amino acid sequence) and a TCR (amino acid sequence). Each pair has a binary binding label. The data is used in the Weber et al. paper.