
03. Offline Setup



This setup is required to successfully execute the metagenomics pipeline on an air-gapped system. It is assumed that you have completed the Install directions, including the creation and activation of your metag environment.


Offline Setup Process

When running the offline setup, you may specify which workflow files and dependencies to download, or you may download all of the workflow files and dependencies at once (see the Offline Setup Options table below).

NOTE: The offline setup command must be executed from the metagenomics/workflows/ directory.

[user@localhost ~]$ source activate metag 

(metag)[user@localhost ~]$ cd metagenomics/workflows

Offline Setup Command

python download_offline_files.py --workflow {workflow_setup_options}

Offline Setup Options

Setup Option | Description
test_files | Downloads the Shakya subset 10 datasets
read_filtering | Copies the adapter file to the data directory, downloads the biocontainers needed for the read filtering workflow, and creates Singularity images
assembly | Downloads the biocontainers needed for the assembly workflow and creates Singularity images
comparison | Downloads the biocontainer needed for the metagenome comparison workflow and creates a Singularity image
taxonomic_classification | Downloads all databases and biocontainers needed for tools within the taxonomic classification workflow and creates Singularity images
sourmash | Downloads only the sourmash databases and biocontainer, and creates the sourmash Singularity image
kaiju | Downloads only the kaiju database and biocontainer, and creates the kaiju Singularity image
functional_inference | Downloads the databases and biocontainers needed for the functional inference workflow and creates Singularity images
all | Downloads all files and biocontainers needed for all workflows and creates all Singularity images
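
For example, to prepare an offline system for the Read Filtering workflow and the example data, you could run the setup once for each option (a minimal sketch, assuming the script is called once per --workflow option; passing all instead downloads everything in a single run):

python download_offline_files.py --workflow test_files
python download_offline_files.py --workflow read_filtering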

Once you have successfully finished the setup, you will have the files and images needed to proceed with executing each workflow offline.

Quick Check Before Proceeding

To proceed to the Read Filtering workflow in an offline environment, you should have run the offline setup with either 1) the test_files and read_filtering options or 2) the all option.

IMPORTANT: If you did not download all of the workflow files and dependencies, please keep in mind that the workflows are executed in a specific order (i.e., read filtering -> assembly -> comparison -> taxonomic classification -> functional inference). It is recommended that users run the example dataset through the workflows in that order to learn how everything operates, and the workflows are described in that order throughout the subsequent pages of this wiki, which walk through each workflow step by step using the example dataset and default config files.

Skipping Ahead

If you are ready to skip ahead to run your own samples, it is recommended that you review the Workflow Architecture page to decide if you would like to use the default config or a custom config to process your samples. The Workflow Architecture page also contains an example of a custom config file that you can copy, edit, and save with the name of your specific samples and parameters. One advantage of custom config files is that they can be uniquely named, which may be helpful for organizing and keeping a record of the analyses that you run.

You can skip ahead to processing your own samples by following these steps:

  • Complete the installation and offline setup for v1.1
  • Check to make sure you have all the container images you need in the metagenomics/container_images directory
  • Check to make sure you have the Trimmomatic adapter file and all of the taxonomic and functional databases you need in the metagenomics/workflows/data directory (a quick check for both directories is sketched after this list)
  • Move your input files to the metagenomics/workflows/data directory
  • Set up your default or custom config file as needed (i.e., change file names and parameters throughout the config to process your sample)
  • Save your updated config file in the metagenomics/workflows/config directory
  • Activate your metag environment
  • Run screen or something similar, since these workflows can run for a while
  • Navigate to the metagenomics/workflows directory
  • Set the Singularity bind path
  • Run a command to execute the snakemake rules
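
As a quick sanity check for the container image and data directory steps above, you can simply list both directories and confirm the expected files are present (paths as given in the steps above):

ls metagenomics/container_images   # Singularity images built during offline setup
ls metagenomics/workflows/data     # adapter file, databases, and your input files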

Helpful Commands to Skip Ahead

These commands will activate your metag environment, run screen, navigate to the metagenomics/workflows directory, and set the Singularity bind path:

source activate metag 
screen
cd metagenomics/workflows 
export SINGULARITY_BINDPATH="data:/tmp"
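
Here, "data:/tmp" uses Singularity's host_path:container_path bind syntax, so the contents of the host data directory are available at /tmp inside each container while the workflow rules run.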

The following command will run all available rules using the default config file, metagenomics/workflows/config/default_workflowconfig.settings (i.e., if you directly edited the default config with the name of your sample and parameters, then run this command):

snakemake --use-singularity read_filtering_pretrim_workflow read_filtering_posttrim_workflow read_filtering_multiqc_workflow read_filtering_khmer_interleave_reads_workflow read_filtering_khmer_count_unique_kmers_workflow read_filtering_khmer_subsample_interleaved_reads_workflow read_filtering_khmer_split_interleaved_reads_workflow read_filtering_fastq_to_fasta_workflow assembly_all_workflow assembly_quast_workflow assembly_multiqc_workflow comparison_reads_assembly_workflow taxclass_signatures_workflow taxclass_gather_workflow taxclass_kaijureport_workflow taxclass_kaijureport_filtered_workflow taxclass_kaijureport_filteredclass_workflow taxclass_add_taxonnames_workflow taxclass_kaiju_species_summary_workflow taxclass_visualize_krona_kaijureport_workflow taxclass_visualize_krona_kaijureport_filtered_workflow taxclass_visualize_krona_kaijureport_filteredclass_workflow taxclass_visualize_krona_species_summary_workflow functional_with_srst2_workflow functional_prokka_with_megahit_workflow functional_prokka_with_metaspades_workflow functional_abricate_with_megahit_workflow functional_abricate_with_metaspades_workflow
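
If you only want to run a subset of the workflows, you can list just those target rules instead of all of them; for example, this sketch (using rule names taken from the full command above) runs only the first few read filtering steps against the default config:

snakemake --use-singularity read_filtering_pretrim_workflow read_filtering_posttrim_workflow read_filtering_multiqc_workflow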

The following command will run all available rules within a custom config, provided that all of these rules are included in the custom config (i.e., if you created a custom config file and saved it in the metagenomics/workflows/config/ directory with a unique name, then run this command):

snakemake --use-singularity --configfile=config/custom_config_name.json read_filtering_pretrim_workflow read_filtering_posttrim_workflow read_filtering_multiqc_workflow read_filtering_khmer_interleave_reads_workflow read_filtering_khmer_count_unique_kmers_workflow read_filtering_khmer_subsample_interleaved_reads_workflow read_filtering_khmer_split_interleaved_reads_workflow read_filtering_fastq_to_fasta_workflow assembly_all_workflow assembly_quast_workflow assembly_multiqc_workflow comparison_reads_assembly_workflow taxclass_signatures_workflow taxclass_gather_workflow taxclass_kaijureport_workflow taxclass_kaijureport_filtered_workflow taxclass_kaijureport_filteredclass_workflow taxclass_add_taxonnames_workflow taxclass_kaiju_species_summary_workflow taxclass_visualize_krona_kaijureport_workflow taxclass_visualize_krona_kaijureport_filtered_workflow taxclass_visualize_krona_kaijureport_filteredclass_workflow taxclass_visualize_krona_species_summary_workflow functional_with_srst2_workflow functional_prokka_with_megahit_workflow functional_prokka_with_metaspades_workflow functional_abricate_with_megahit_workflow functional_abricate_with_metaspades_workflow

Note that --configfile=config/custom_config_name.json is used to specify the name of the custom config in the above command.
