[Rabies] Initialize Lyssavirus rabies all-clades community dataset #333
base: master
Thanks! It seems to be working. As a dev, I can only review the technical side; I'll let our scientists check the sciency bits :) The virus seems to be quite diverse, with lots of mutations, but this is probably expected. If you have a public repo where you prepare the trees and other data for the dataset, it would be a great help to users of your dataset if you linked it in the readme. We typically use a boilerplate like this in Nextstrain datasets:
But that's not mandatory. All looks good to me. Smooth work!
Thank you. Rabies is indeed very diverse. I considered creating independent datasets for each clade, but the genotyping from this "all-clades" dataset has been sufficient for our SME partners. Additionally, issues with sub-clade metadata quality limit the improvement that more refined datasets could provide. I don't have a repository for tree building: I used the Nextstrain rabies build as a template for the Nextclade dataset, though I ended up deviating from it in tree-building methodology and metadata acquisition. The methodology is hopefully documented adequately for users in this PR's README.
Thanks a lot for contributing this dataset! Overall, this looks very good. But I have a few suggestions to make it better.
Hey @rneher, I just wanted to let you know that I can't return to this to address your concerns until a later date. I'm not sure when, but hopefully within the next several weeks. Thanks for your suggestions, and my apologies for my unfamiliarity with some of the standardized procedures.

RE: alignment parameters: I'm not certain how to adjust these parameters systematically. Do you have specific recommendations or procedures for determining which parameters are better suited, or do you suggest simply dropping in the linked pathogen.json you sent?

RE: Apart from tree building, the workflow was performed with AUGUR. With this in mind, do these steps deviate from Nextclade in the way you're suggesting?

- Alignment
- Tree building: performed as discussed in the README
- Refinement
- Trait application
- Nucleotide mutation calling
- Translation
- Clade mutation extraction (non-AUGUR)
- Clade mutation application
- Export
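The non-AUGUR "Clade mutation extraction" step isn't spelled out in this thread. As a minimal sketch only, assuming that clade-defining mutations are taken to be the nucleotide substitutions shared by every sequence assigned to a clade, and that sequences are already aligned to the reference, such a step could look like the following. The function names and the `clade_of` mapping are hypothetical, and this is not the method used for this dataset.

```python
from collections import defaultdict


def mutations(ref: str, seq: str) -> set[str]:
    """Nucleotide substitutions of an aligned sequence vs. the reference.

    Returned in REF-POS-ALT notation (e.g. "T8A") with 1-based positions.
    Gaps ('-') and ambiguous 'N' calls are skipped, not counted as mutations.
    """
    if len(ref) != len(seq):
        raise ValueError("sequence must be aligned to the reference")
    return {
        f"{r}{i + 1}{s}"
        for i, (r, s) in enumerate(zip(ref.upper(), seq.upper()))
        if s != r and s not in "-N"
    }


def clade_defining_mutations(
    ref: str, sequences: dict[str, str], clade_of: dict[str, str]
) -> dict[str, set[str]]:
    """Per clade, the substitutions carried by *every* member sequence.

    Conservative by design: if one member has an N or gap at a site,
    that site's mutation is dropped for the whole clade.
    """
    by_clade: dict[str, list[set[str]]] = defaultdict(list)
    for name, seq in sequences.items():
        by_clade[clade_of[name]].append(mutations(ref, seq))
    # Intersect the mutation sets of all members within each clade.
    return {clade: set.intersection(*muts) for clade, muts in by_clade.items()}


if __name__ == "__main__":
    ref = "ACGTACGT"
    seqs = {"a": "ACGTACGA", "b": "TCGTACGA", "c": "ACGAACGT"}
    clades = {"a": "A", "b": "A", "c": "B"}
    print(clade_defining_mutations(ref, seqs, clades))
```

Taking the strict intersection is one possible choice; a real workflow might instead use ancestral-state reconstruction on the tree (as AUGUR's mutation calling does) or a frequency threshold below 100% to tolerate sequencing gaps.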
This pull request initializes a Nextclade dataset for Lyssavirus rabies (rabies virus) with clade and subclade resolution. It was created in collaboration with @kimandrews, with subject matter expertise and user input from the Massachusetts Department of Public Health. Please review the README.md for information on dataset creation and citations.