-
Notifications
You must be signed in to change notification settings - Fork 10
Make AWS usage optional #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Makes the workflow more accessible for external users by not requiring AWS credentials or S3 bucket setup to run. Users can opt-in to S3 caching by editing defaults/config.yaml. S3 caching remains enabled for Nextstrain's automated builds. Changes: - Set s3_bucket to empty string in defaults/config.yaml - Add s3_bucket setting to build-configs/nextstrain-automation/config.yaml
63bc62d to
01edf78
Compare
| #### To enable AWS S3 caching: | ||
| 1. Edit `defaults/config.yaml` and set `s3_bucket` to your bucket name: | ||
| ```yaml | ||
| s3_bucket: "your-bucket-name" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in person, so this is brief!
Consider moving from
s3_bucket=nextstrain-data
s3_path="files/workflows/tb/${snippy_output_path}"
to
s3_bucket=nextstrain-data/files/workflows/tb/
s3_path="${snippy_output_path}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for making the S3 path part of the configurable param.
| conda activate tb-profiler | ||
| scripts/run_tbprofiler.sh {wildcards.sample} \ | ||
| {params.s3_bucket} \ | ||
| "{params.s3_bucket}" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the quotes were added to make sure the positional arguments still match even if the s3_bucket is an empty string. I'd be worried about someone tripping over this later...
I'd update the script to move the s3_bucket to be a last optional argument:
sample="$1"
tb_output_path="$2"
fastq_outdir="$3"
tb_outdir="$4"
threads="$5"
s3_bucket="${6:-}"Then you wouldn't need quotes here
scripts/run_tbprofiler.sh {wildcards.sample} \
{params.tb_output_path} \
{params.fastq_outdir} \
{params.tb_outdir} \
{threads} \
{params.s3_bucket} \
Description of proposed changes
This PR makes AWS S3 caching optional rather than required, allowing users to run the TB workflow locally without AWS setup. Previously, the workflow would fail if AWS credentials weren't configured or the S3 bucket wasn't accessible.
Summary:
s3_bucketindefaults/config.yamls3_bucketinbuild-configs/nextstrain-automation/config.yamlKey modifications:
s3_bucketto empty string indefaults/config.yaml(disables S3 by default)s3_bucket: "nextstrain-data"tobuild-configs/nextstrain-automation/config.yaml(preserves S3 caching for Nextstrain automated builds)scripts/run_tbprofiler.shandscripts/run_snippy.shto check if S3 bucket is configured and accessible before attempting S3 operationsThe user will receive these messages when running the workflow under different S3 bucket configurations:
S3 bucket accessible. Will use S3 caching.S3 bucket not specified. Running without S3 caching.Warning: Cannot access s3://test. Running without S3 caching.I tested that the workflow performed as expected by:
augur filterparameters to select just 5 samples) with and without AWS Batch under the three S3 bucket configurations listed above.Related issue(s)
Closes #39
Checklist