Config improvements #102

victorlin · 2025-09-22T22:00:35Z

Description of proposed changes

Small adjustments to config, particularly around filtering and subsampling.

Related issue(s)

Preparing for #103

Checklist

Checks pass
Update changelog


joverlee521

General config improvements make sense to me 👍

joverlee521 · 2025-10-06T18:29:34Z

workflow/snakemake_rules/config.smk

Looks like these changes were made before you added the shared write_config function. Update to use shared/vendored?
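For readers who have not seen the helper being discussed, here is a minimal sketch of what a "write the resolved config to a file" function could look like. It is not the shared/vendored `write_config`; that function's signature and output format are not shown in this thread. The name, arguments, and the choice of JSON (picked only to stay within the standard library) are assumptions for illustration.

```python
import json
import os


def write_config(config, path):
    """Hypothetical helper: dump the resolved config dict to `path`.

    Assumption: JSON output; the real shared function may use another format.
    """
    # Create the parent directory if the path has one.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as handle:
        # sort_keys keeps the file stable, so two runs are easy to diff.
        json.dump(config, handle, indent=2, sort_keys=True)
        handle.write("\n")
```

Writing the merged config next to the results makes it possible to see afterwards which parameters a run actually used.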

joverlee521 · 2025-10-06T18:39:13Z

workflow/snakemake_rules/core.smk

        """
    input:
        sequences="data/{a_or_b}/sequences.fasta",
-        reference="config/{a_or_b}reference.gbk",


augur filter has never taken a reference file as input, so my guess is
this was originally copied by mistake.

I wondered if this was added to enforce the order in which rules are run. However, based on the initial workflow, which just uses config directly, I agree this was probably copied by mistake.

joverlee521 · 2025-10-06T18:43:24Z

workflow/snakemake_rules/core.smk

            --group-by {params.group_by} \
            --subsample-max-sequences {params.subsample_max_sequences} \
-            --query '({params.min_coverage}) & missing_data<1000'
+            --query '({params.min_coverage}) & missing_data<{params.missing_data_threshold}'


non-blocking

I was going to suggest combining these two params into a single query param, but maybe not necessary to do here since we'll switch to augur subsample ~soon.
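To illustrate the two options, here is a small sketch of how the `--query` string in the diff above is assembled from the two params, and how a single combined param would carry the same string. The function name and the example coverage expression are hypothetical; only the `({min_coverage}) & missing_data<{threshold}` template comes from the diff.

```python
def build_query(min_coverage, missing_data_threshold):
    """Assemble the filter query from the two separate params in the diff."""
    return f"({min_coverage}) & missing_data<{missing_data_threshold}"


# Hypothetical config values; the real coverage expression lives in the config.
two_params = build_query("genome_coverage>0.3", 1000)

# The alternative: one config entry that already holds the full query.
single_param = "(genome_coverage>0.3) & missing_data<1000"
```

Both approaches pass the same string to `augur filter`; the single-param form just moves the template out of the rule and into the config.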

joverlee521 · 2025-10-06T18:59:15Z

Snakefile

+    build_name=r"genome|G|F",
+    resolution=r"all-time|6y|3y",


I was going to suggest that these be dynamic:

Suggested change

-    build_name=r"genome|G|F",
-    resolution=r"all-time|6y|3y",
+    build_name="|".join(config["builds_to_run"]),
+    resolution="|".join(config["resolutions_to_run"]),

However, I then realized that these are already hardcoded due to the values in config/distance_maps.tsv. It would probably be good to add a note here and add this to config validation in the future.
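As a rough illustration of what the constraints buy us, the sketch below mimics wildcard matching with plain `re`. Snakemake's default wildcard regex is `.+`, so an unconstrained wildcard will happily swallow extra path segments. The path pattern and the `pattern_to_regex` helper are hypothetical and only approximate Snakemake's behavior; the constraint values are the ones from this diff, built with the `"|".join(...)` approach suggested above.

```python
import re


def pattern_to_regex(pattern, constraints=None):
    """Turn a Snakemake-style path pattern into a regex (illustration only)."""
    constraints = constraints or {}
    pieces = []
    for part in re.split(r"(\{\w+\})", pattern):
        wildcard = re.fullmatch(r"\{(\w+)\}", part)
        if wildcard:
            name = wildcard.group(1)
            # Unconstrained wildcards fall back to ".+", as in Snakemake.
            pieces.append(f"(?P<{name}>{constraints.get(name, '.+')})")
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


PATTERN = "results/{build_name}/{resolution}/tree.nwk"  # hypothetical path

# Assumes the values contain no regex metacharacters; otherwise re.escape them.
builds_to_run = ["genome", "G", "F"]
resolutions_to_run = ["all-time", "6y", "3y"]
constraints = {
    "build_name": "|".join(builds_to_run),
    "resolution": "|".join(resolutions_to_run),
}

loose = pattern_to_regex(PATTERN)
strict = pattern_to_regex(PATTERN, constraints)

stray = "results/genome/extra/6y/tree.nwk"
print(loose.fullmatch(stray).group("build_name"))  # wildcard absorbs "genome/extra"
print(strict.fullmatch(stray))                      # no match once constrained
```

With the constraints in place, only the listed build names and resolutions can satisfy the pattern, which is the "unintentional file pattern matching" the commit aims to prevent.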

victorlin added 4 commits September 22, 2025 14:59

Write config to a file

c47a680

Copied from nextstrain/WNV and removed the config validation part. https://github.com/nextstrain/WNV/blob/0c32f4d71f585c1924fa0c360e67643594795ace/phylogenetic/rules/config.smk

Remove unused input

1b5a278

augur filter has never taken a reference file as input, so my guess is this was originally copied by mistake.

Decouple example data grouping columns from config

e6ebe66

Preparing to make changes to the config that would have broken this. It seems fine to hardcode along with the already hardcoded sample size.

Move all filter parameters to config

9c7c6e2

This will make it easier to compare changes in the switch to augur subsample which will define all parameters in config.

victorlin self-assigned this Sep 22, 2025

victorlin added 2 commits September 22, 2025 15:24

Add wildcard constraints for build_name and resolution

01b4ae8

Prevents unintentional file pattern matching by rule inputs/outputs.

Add separate frequencies config

3340439

This shouldn't rely on config from another rule.

victorlin mentioned this pull request Sep 22, 2025

Use augur subsample #103

Draft

3 tasks

joverlee521 approved these changes Oct 6, 2025
