Questions about creating a custom RSVA nextclade dataset with NC_038235 reference #237

I'm trying to create a custom RSV nextclade dataset following the tutorials from https://docs.nextstrain.org/en/latest/tutorials/creating-a-phylogenetic-workflow.html#annotate-the-phylogeny and https://github.com/nextstrain/nextclade_data/blob/master/docs/dataset-creation-guide.md.

I have two questions:

1. For the reference tree, I used the same sequences as provided in the "Underlying data" from https://nextstrain.org/rsv/a/genome/6y. I also used the same parameters in pathogen.json as the official dataset. However, my QC results differ significantly from the official Nextclade results: many samples that pass QC with the official dataset are marked as "bad" with my custom dataset. What could be causing this discrepancy?
2. I noticed that the official Nextclade datasets use EPI_ISL_412866 (for RSVA) and OP975389 (for RSVB) as references, while many academic publications, such as the Nature Communications paper "Distinct patterns of within-host virus populations between two subgroups of human respiratory syncytial virus", use NC_038235 (RSVA) and NC_001781 (RSVB) as references. What is the rationale behind these different reference choices?

I would appreciate any insights into these questions. Thank you!
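
To make the QC discrepancy described above easier to pin down, here is a minimal sketch of how the two runs could be compared sample by sample. It assumes both runs were exported as Nextclade TSV output and that the columns `seqName` and `qc.overallStatus` are present; the sample names and the helper functions `load_qc_status` and `qc_disagreements` are hypothetical and only for illustration.

```python
import csv
import io


def load_qc_status(tsv_text):
    """Map each sequence name to its overall QC status.

    Assumes tab-separated text with 'seqName' and 'qc.overallStatus' columns.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["seqName"]: row["qc.overallStatus"] for row in reader}


def qc_disagreements(official, custom):
    """Return sorted (name, official_status, custom_status) tuples that differ."""
    shared = official.keys() & custom.keys()
    return sorted(
        (name, official[name], custom[name])
        for name in shared
        if official[name] != custom[name]
    )


# Made-up example data standing in for the two TSV files.
official_tsv = "seqName\tqc.overallStatus\nsampleA\tgood\nsampleB\tgood\n"
custom_tsv = "seqName\tqc.overallStatus\nsampleA\tgood\nsampleB\tbad\n"

diffs = qc_disagreements(load_qc_status(official_tsv), load_qc_status(custom_tsv))
for name, official_status, custom_status in diffs:
    print(f"{name}: official={official_status} custom={custom_status}")
```

With real files, the text passed to `load_qc_status` would come from reading each TSV from disk. Listing the samples that flip from "good" to "bad" may help narrow down which QC rule is responsible.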
