Config improvements #102
Copied from nextstrain/WNV and removed the config validation part. https://github.com/nextstrain/WNV/blob/0c32f4d71f585c1924fa0c360e67643594795ace/phylogenetic/rules/config.smk
augur filter has never taken a reference file as input, so my guess is this was originally copied by mistake.
Preparing to make changes to the config that would have broken this. It seems fine to hardcode along with the already hardcoded sample size.
This will make it easier to compare changes in the switch to augur subsample, which will define all parameters in config.
Prevents unintentional file pattern matching by rule inputs/outputs.
This shouldn't rely on config from another rule.
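To illustrate the wildcard constraint commit: Snakemake wildcards match the regex `.+` by default, so an unconstrained pattern can absorb extra path segments. The sketch below emulates that matching with Python's `re` module. It is not Snakemake's own implementation, and the `results/.../tree.nwk` paths are hypothetical; only the constraint values `genome|G|F` and `all-time|6y|3y` come from this PR.

```python
import re

def pattern_to_regex(pattern, constraints=None):
    """Turn a path pattern with {wildcards} into a compiled regex.

    Wildcards without a constraint fall back to ".+", which is
    Snakemake's documented default. Each wildcard may appear only once
    in this simplified sketch.
    """
    constraints = constraints or {}
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        name = m.group(1)
        parts.append(f"(?P<{name}>{constraints.get(name, '.+')})")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))

# Hypothetical output pattern and an unrelated path with an extra directory.
PATTERN = "results/{build_name}/{resolution}/tree.nwk"
STRAY_PATH = "results/genome/extra/6y/tree.nwk"

# Constraint values as hardcoded in this PR.
CONSTRAINTS = {"build_name": r"genome|G|F", "resolution": r"all-time|6y|3y"}

loose = pattern_to_regex(PATTERN).fullmatch(STRAY_PATH)
strict = pattern_to_regex(PATTERN, CONSTRAINTS).fullmatch(STRAY_PATH)
valid = pattern_to_regex(PATTERN, CONSTRAINTS).fullmatch("results/G/3y/tree.nwk")
```

Without constraints, `build_name` silently captures `genome/extra`; with them, the stray path is rejected and only the expected build and resolution names match.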
joverlee521
left a comment
General config improvements make sense to me 👍
Looks like these changes were made before you added the shared write_config function. Update to use shared/vendored?
    """
    input:
        sequences="data/{a_or_b}/sequences.fasta",
        reference="config/{a_or_b}reference.gbk",
> augur filter has never taken a reference file as input, so my guess is this was originally copied by mistake.
I wondered if this was added to enforce the order in which rules are run. However, based on the initial workflow, which just uses config directly, I agree this was probably copied by mistake.
    --group-by {params.group_by} \
    --subsample-max-sequences {params.subsample_max_sequences} \
    --query '({params.min_coverage}) & missing_data<1000'
    --query '({params.min_coverage}) & missing_data<{params.missing_data_threshold}'
non-blocking
I was going to suggest combining these two params into a single query param, but maybe not necessary to do here since we'll switch to augur subsample ~soon.
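For reference, one way to read the "single query param" idea is to compose the full `--query` string in one place and pass it to the rule as a single value. This is only a sketch: `build_query` is a hypothetical helper, and the `genome_coverage>0.3` expression is an assumed placeholder for whatever `min_coverage` holds in the config.

```python
import shlex

def build_query(min_coverage, missing_data_threshold):
    """Combine the coverage expression and missing-data cutoff into one query."""
    return f"({min_coverage}) & missing_data<{missing_data_threshold}"

# Assumed example values; the real ones live in the workflow config.
query = build_query("genome_coverage>0.3", 1000)

# Quote once so the shell passes the whole expression to --query intact.
quoted_query = shlex.quote(query)
```

The rule's shell block would then need only `--query {params.query}` with the value already quoted.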
    build_name=r"genome|G|F",
    resolution=r"all-time|6y|3y",
I was going to suggest these should be dynamic
    - build_name=r"genome|G|F",
    - resolution=r"all-time|6y|3y",
    + build_name="|".join(config["builds_to_run"]),
    + resolution="|".join(config["resolutions_to_run"]),
However, I then realized that these are already hardcoded due to the values in config/distance_maps.tsv. It's probably good to add a note here and add this to config validation in the future.
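A rough sketch of what that future validation could look like, combining the dynamic constraint with a check against the hardcoded names. The `constraint_from_config` helper and the `ALLOWED` table are hypothetical; the config keys are the ones used in the suggestion above, and the allowed values mirror the constraints currently hardcoded in this PR.

```python
import re

# Values the rest of the workflow currently supports (mirrors the
# hardcoded wildcard constraints in this PR).
ALLOWED = {
    "build_name": {"genome", "G", "F"},
    "resolution": {"all-time", "6y", "3y"},
}

def constraint_from_config(config, key, wildcard):
    """Build a wildcard constraint regex from a config list.

    Raises ValueError if the config asks for a value that the
    hardcoded parts of the workflow do not know about.
    """
    values = config[key]
    unknown = sorted(set(values) - ALLOWED[wildcard])
    if unknown:
        raise ValueError(f"Unsupported {wildcard} value(s) in config['{key}']: {unknown}")
    # Escape each value so characters like "-" are matched literally.
    return "|".join(re.escape(value) for value in values)

example_config = {
    "builds_to_run": ["genome", "G"],
    "resolutions_to_run": ["all-time", "3y"],
}

build_constraint = constraint_from_config(example_config, "builds_to_run", "build_name")
resolution_constraint = constraint_from_config(example_config, "resolutions_to_run", "resolution")
```

Failing early here would surface a mismatch between the config and the distance maps before any rule runs.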
Description of proposed changes
Small adjustments to config, particularly around filtering and subsampling.
Related issue(s)
Preparing for #103
Checklist