Skip to content

Conversation

@kimandrews
Copy link
Contributor

Description of proposed changes

This PR makes AWS S3 caching optional rather than required, allowing users to run the TB workflow locally without AWS setup. Previously, the workflow would fail if AWS credentials weren't configured or the S3 bucket wasn't accessible.

Summary:

  • Makes the workflow more accessible for external users by not requiring AWS credentials or S3 bucket setup to run
  • Disables S3 caching by default; users can opt-in by configuring s3_bucket in defaults/config.yaml
  • Nextstrain's automated builds continue to use S3 caching by configuring s3_bucket in build-configs/nextstrain-automation/config.yaml

Key modifications:

  • Set s3_bucket to empty string in defaults/config.yaml (disables S3 by default)
  • Added s3_bucket: "nextstrain-data" to build-configs/nextstrain-automation/config.yaml (preserves S3 caching for Nextstrain automated builds)
  • Updated scripts/run_tbprofiler.sh and scripts/run_snippy.sh to check if S3 bucket is configured and accessible before attempting S3 operations
  • Added documentation to README explaining how to enable S3 caching

The user will receive these messages when running the workflow under different S3 bucket configurations:

  • WITH S3 bucket specified: S3 bucket accessible. Will use S3 caching.
  • WITHOUT an S3 bucket specified: S3 bucket not specified. Running without S3 caching.
  • With an INACCESSIBLE S3 bucket specified (e.g. an inaccessible bucket called "test"): Warning: Cannot access s3://test. Running without S3 caching.

I tested that the workflow performed as expected by:

  • Running the workflow with Github Actions (see resulting output)
  • Running the workflow from my local computer (after temporarily modifying the augur filter parameters to select just 5 samples) with and without AWS Batch under the three S3 bucket configurations listed above.

Related issue(s)

Closes #39

Checklist

  • Checks pass
  • Update changelog

Makes the workflow more accessible for external users by not requiring
AWS credentials or S3 bucket setup to run. Users can opt-in to S3
caching by editing defaults/config.yaml. S3 caching remains enabled
for Nextstrain's automated builds.

Changes:
- Set s3_bucket to empty string in defaults/config.yaml
- Add s3_bucket setting to build-configs/nextstrain-automation/config.yaml
@kimandrews kimandrews force-pushed the make-s3-bucket-optional branch from 63bc62d to 01edf78 Compare December 3, 2025 05:05
#### To enable AWS S3 caching:
1. Edit `defaults/config.yaml` and set `s3_bucket` to your bucket name:
```yaml
s3_bucket: "your-bucket-name"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in person, so this is brief!

Consider moving from

s3_bucket=nextstrain-data
s3_path="files/workflows/tb/${snippy_output_path}"

to

s3_bucket=nextstrain-data/files/workflows/tb/
s3_path="${snippy_output_path}"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for making the S3 path part of the configurable param.

conda activate tb-profiler
scripts/run_tbprofiler.sh {wildcards.sample} \
{params.s3_bucket} \
"{params.s3_bucket}" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the quotes were added to make sure the positional arguments still match even if the s3_bucket is an empty string. I'd be worried about someone tripping over this later...

I'd update the script to move the s3_bucket to be a last optional argument:

sample="$1"
tb_output_path="$2"
fastq_outdir="$3"
tb_outdir="$4"
threads="$5"
s3_bucket="${6:-}"

Then you wouldn't need quotes here

scripts/run_tbprofiler.sh {wildcards.sample} \
{params.tb_output_path} \
{params.fastq_outdir} \
{params.tb_outdir} \
{threads} \
{params.s3_bucket} \

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make AWS credentials and S3 caching optional

4 participants