diff --git a/README.md b/README.md
index fb20f490..d7d9d094 100644
--- a/README.md
+++ b/README.md
@@ -25,21 +25,21 @@
## Table of Contents
- [Introduction](#overview)
- [Quick Install](#usage)
- - [Using ROADIES Bioconda package](#conda)
- - [Using DockerHub](#dockerhub)
- - [Using Docker locally](#docker)
- - [Using Installation Script](#script)
+ - [Option 1: Install via Bioconda (Recommended)](#conda)
+ - [Option 2: Install via DockerHub](#dockerhub)
+ - [Option 3: Install via Local Docker Build](#docker)
+ - [Option 4: Install via Source Script](#script)
- [Quick Start](#start)
-- [Run ROADIES with your own datasets](#runpipeline)
+- [Running ROADIES on your own data](#runpipeline)
- [Citing ROADIES](#citation)
## Introduction
-Welcome to the official repository of ROADIES, a novel pipeline designed for phylogenetic tree inference of the species directly from their raw genomic assemblies. ROADIES offers a fully automated, easy-to-use, scalable solution, eliminating any manual steps and providing unique flexibility in adjusting the tradeoff between accuracy and runtime.
+Welcome to the official repository of ROADIES, a novel pipeline for inferring phylogenetic species trees directly from raw genomic assemblies. ROADIES offers a fully automated, scalable, and easy-to-use solution, eliminating manual steps and allowing flexible control over the trade-off between accuracy and runtime.
-**For more detailed information on all the features and settings of ROADIES, please refer to our [Wiki](https://turakhialab.github.io/ROADIES/).**
+**For a detailed overview of ROADIES' features and configuration options, please visit our [Wiki](https://turakhialab.github.io/ROADIES/).**
@@ -56,11 +56,11 @@ Welcome to the official repository of ROADIES, a novel pipeline designed for phy
## Quick Install
-### Using ROADIES Bioconda package (recommended)
+Choose any one of the options below to install ROADIES on your system.
-To run ROADIES using Bioconda package, follow these steps:
+### Option 1: Install via Bioconda (Recommended)
-To install and use conda in Ubuntu machine, execute the set of commands below:
+1. Install Conda (if not installed):
```
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
@@ -69,81 +69,79 @@ chmod +x Miniconda3-latest-Linux-x86_64.sh
export PATH="$HOME/miniconda3/bin:$PATH"
source ~/.bashrc
+```
+
+2. Configure Conda channels:
+```
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
```
-After this, try running `conda` in your terminal to check if conda is properly installed. Once it is installed, follow the steps below:
+Verify the installation by running `conda` in your terminal.
-1. Create and activate custom conda environment with Python version 3.9, ETE3 and Seaborn.
+3. Create and activate a custom environment:
```
-conda create -n myenv python=3.9 ete3 seaborn
-conda activate myenv
+conda create -n roadies_env python=3.9 ete3 seaborn
+conda activate roadies_env
```
-2. Install ROADIES bioconda package
+4. Install ROADIES:
```
conda install roadies
```
-All files of ROADIES along with dependencies will be found in `/miniconda3/envs/myenv/ROADIES`.
+5. Locate the installed files:
+
+```
+cd $HOME/miniconda3/envs/roadies_env/ROADIES
+
+```
+
+Now you are ready to follow the [Quick Start](#start) section to run the pipeline.
-### Using DockerHub
+### Option 2: Install via DockerHub
-To run ROADIES using DockerHub, follow these steps:
+If you would like to install ROADIES using DockerHub, follow these steps:
-1. Pull the ROADIES Docker image from DockerHub:
+1. Pull the ROADIES image from DockerHub:
```
docker pull ang037/roadies:latest
```
-2. Run the Docker container:
+2. Launch a container:
```
docker run -it ang037/roadies:latest
```
-### Using Docker locally
+Once the container is running and you can access the ROADIES repository, refer to the [Quick Start](#start) section to run the pipeline.
+
+### Option 3: Install via Local Docker Build
-First, clone the repository (requires `git` to be installed in the system):
+1. Clone the ROADIES repository:
```
git clone https://github.com/TurakhiaLab/ROADIES.git
cd ROADIES
```
-Then build and run the Docker container:
+2. Build and run the Docker container:
```
docker build -t roadies_image .
docker run -it roadies_image
```
-### Using installation script (requires sudo access)
-
-First clone the repository:
-
-```
-git clone https://github.com/TurakhiaLab/ROADIES.git
-cd ROADIES
-```
-
-Then, execute the installation script:
-
-```
-chmod +x roadies_env.sh
-source roadies_env.sh
-```
+Once the container is running and you can access the ROADIES repository, refer to the [Quick Start](#start) instructions to run the pipeline.
-This will install and build all tools and dependencies. Once the setup is complete, it will print `Setup complete` in the terminal and activate the `roadies_env` environment with all Conda packages installed.
+### Option 4: Install via Source Script
-#### Required dependencies
+1. Install the following dependencies (**requires sudo access**):
-To run this script, ensure the following dependencies are installed:
- Java Runtime Environment (Version 1.7 or higher)
- Python (Version 3.9 or higher)
- `wget` and `unzip` commands
@@ -151,7 +149,6 @@ To run this script, ensure the following dependencies are installed:
- cmake (Download here: https://cmake.org/download/)
- Boost library (Download here: https://boostorg.jfrog.io/artifactory/main/release/1.82.0/source/)
- zlib (Download here: http://www.zlib.net/)
-- GLIBC (Version 2.29 or higher)
For Ubuntu, you can install these dependencies with:
@@ -159,62 +156,86 @@ For Ubuntu, you can install these dependencies with:
sudo apt-get install -y wget unzip make g++ python3 python3-pip python3-setuptools git default-jre libgomp1 libboost-all-dev cmake
```
+2. Clone the repository:
+
+```
+git clone https://github.com/TurakhiaLab/ROADIES.git
+cd ROADIES
+```
+
+3. Run the installation script:
+
+```
+chmod +x roadies_env.sh
+source roadies_env.sh
+```
+
+After a successful setup (indicated by the `Setup complete` message), the `roadies_env` environment will be activated. Proceed to [Quick Start](#start).
+
**Note:** If you encounter issues with the Boost library, add its path to `$CPLUS_LIBRARY_PATH` and save it in `~/.bashrc`.
## Quick Start
-Once setup is done, you can run the ROADIES pipeline using the provided test dataset. Follow these steps for a 16-core machine:
+After installing ROADIES using one of the options in [Quick Install](#usage), you're ready to run it. To get started:
-1. Go to ROADIES repository directory if not there:
+1. Download the test dataset (11 Drosophila genomes):
```
-cd ROADIES
+mkdir -p test/test_data && cat test/input_genome_links.txt | xargs -I {} sh -c 'wget -O test/test_data/$(basename {}) {}'
```
-2. Create a directory for the test data and download the test datasets (using the following one line command):
+This will save the datasets in a separate `test/test_data` folder within the repository.
-```
-mkdir -p test/test_data && cat test/input_genome_links.txt | xargs -I {} sh -c 'wget -O test/test_data/$(basename {}) {}'
-```
-3. Run the pipeline with the following command (from ROADIES directory):
+2. Run the pipeline:
-#### NOTE: By default, ROADIES run multiple iterations to get you the most accurate tree. --noconverge is the recommended option if you want to only test the pipeline or if you know optimal gene count to get the accurate tree.
+#### IMPORTANT: By default, ROADIES runs multiple iterations to generate highly accurate trees. For quick testing, use `--noconverge` to run a single iteration.
```
-python run_roadies.py --cores 16 (# for actual run)
+python run_roadies.py --cores 16 # Full run (multiple iterations)
```
```
-python run_roadies.py --cores 16 --noconverge (# for test run)
+python run_roadies.py --cores 16 --noconverge # Quick test run (one iteration)
```
-These commands will download the 11 Drosophila genomic datasets (links provided in `test/input_genome_links.txt`) and save them in the `test/test_data` directory. Then it will run ROADIES pipeline for those 11 Drosophila genomes and save the final **UNROOTED** newick tree as `roadies.nwk` in a separate `output_files` folder upon completion. If `--noconverge` flag is not set, ROADIES saves the output of all other iterations in a separate `converge_files` folder.
+3. Output:
+
+ - The final **UNROOTED** Newick tree is saved as `roadies.nwk` in a separate `output_files` folder.
+ - Outputs of the other iterations (if `--noconverge` is not used) are saved in a separate `converge_files` folder.
-#### NOTE: The final newick tree is unrooted by default. User needs to reroot the tree appropriately on their own. We provide a script saved in `ROADIES/workflow/scripts/reroot.py` which lets you reroot the tree given a reference rooted species tree as input.
+#### NOTE: ROADIES outputs unrooted trees by default. You can reroot trees on your own or use the provided `reroot.py` script in `workflow/scripts/` (given a reference rooted species tree as input).
-## Run ROADIES with your own datasets
+## Running ROADIES on your own data
+
+If you want to run ROADIES with your own datasets, follow these steps:
+
+1. Specify Input Dataset:
+
+- Edit the `config.yaml` file (found in the `config` folder of the ROADIES directory).
+- Update the `GENOMES` field with the path to your genome assemblies. Ensure all input assemblies are in `.fa` or `.fa.gz` format and named after the species (e.g., `Aardvark.fa`).
-To run ROADIES with your own datasets, follow these steps:
+**IMPORTANT**: Each file must contain only one species. If needed, split multi-species files with:
-1. **Specify Input Genomic Dataset**: Update the `config.yaml` file (found in the ROADIES directory - `config` folder) to include the path to your input datasets under the `GENOMES` parameter. Ensure all input genomic assemblies are in `.fa` or `.fa.gz` format and named according to the species' name (e.g., `Aardvark.fa`).
+```
+faSplit byname
+```
-**Note**: Each file should contain the genome assembly of one unique species. If a file contains multiple species, split it into individual genome files (`fasplit` can be used: `faSplit byname `).
+2. Configure Other Parameters:
-2. **Configure Other Parameters**: Adjust other parameters in `config.yaml` as needed. Detailed information on each parameter is available in the [`Usage` section](https://turakhialab.github.io/ROADIES/).
+- Modify other parameters in `config.yaml` as needed.
+- Refer to the [Wiki](https://turakhialab.github.io/ROADIES/) for details on each parameter.
-3. **Run the Pipeline**: Execute the pipeline with the following command (example for 16 cores):
+3. Run the Pipeline:
```
python run_roadies.py --cores 16
```
-The output species tree (unrooted) in Newick format will be saved as `roadies.nwk` in the `output_files` folder.
-
-4. **Modes of operation**: ROADIES supports multiple modes of operation (`fast`, `balanced`, `accurate`) by controlling the accuracy-runtime tradeoff. Use any one of the following commands to select a mode (`accurate` mode is the default):
+**Modes of operation**: ROADIES supports multiple modes of operation (`fast`, `balanced`, `accurate`) that control the accuracy-runtime trade-off. Use one of the following commands to select a mode (`accurate` is the default):
```
@@ -225,7 +246,9 @@ python run_roadies.py --cores 16 --mode balanced
python run_roadies.py --cores 16 --mode fast
```
-### For troubleshooting and contribution details (also to know the steps of running ROADIES in a multi-node SLURM based cluster), refer to [Wiki](https://turakhialab.github.io/ROADIES/)
+The output species tree (unrooted) in Newick format will be saved as `roadies.nwk` in the `output_files` folder.
+
+### For troubleshooting, contributing, or SLURM cluster usage, refer to the [Wiki](https://turakhialab.github.io/ROADIES/)
diff --git a/docs/install.md b/docs/install.md
index 03b5aaf5..11aa8f3a 100644
--- a/docs/install.md
+++ b/docs/install.md
@@ -1,12 +1,10 @@
# Installation Methods
-## Using ROADIES Bioconda package (Recommended)
+Choose any one of the options below to install ROADIES on your system.
-To run ROADIES using Bioconda package, follow these steps:
+## Option 1: Install via Bioconda (Recommended)
-**Note:** You need to have conda installed in your system.
-
-To install and use conda in Ubuntu machine, execute the set of commands below:
+1. Install Conda (if not installed):
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
@@ -15,92 +13,86 @@ chmod +x Miniconda3-latest-Linux-x86_64.sh
export PATH="$HOME/miniconda3/bin:$PATH"
source ~/.bashrc
+```
+
+2. Configure Conda channels:
+```bash
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
```
-After this, try running `conda` in your terminal to check if conda is properly installed. Once it is installed, follow the steps below:
+Verify the installation by running `conda` in your terminal.
-1. Create and activate custom conda environment with Python version 3.9
+3. Create and activate a custom environment:
```bash
-conda create -n myenv python=3.9
-conda activate myenv
+conda create -n roadies_env python=3.9 ete3 seaborn
+conda activate roadies_env
```
-2. Install ROADIES bioconda package
+4. Install ROADIES:
-```
+```bash
conda install roadies
```
-All files of ROADIES along with dependencies will be found in `/miniconda3/envs/myenv/ROADIES`.
+5. Locate the installed files:
+
+```bash
+cd $HOME/miniconda3/envs/roadies_env/ROADIES
+
+```
-## Using DockerHub
+Now you are ready to follow the Quick Start section to run the pipeline.
-To run ROADIES using DockerHub, follow these steps:
+## Option 2: Install via DockerHub
-1. Pull the ROADIES Docker image from DockerHub:
+If you would like to install ROADIES using DockerHub, follow these steps:
+
+1. Pull the ROADIES image from DockerHub:
```bash
docker pull ang037/roadies:latest
```
-2. Run the Docker container:
+2. Launch a container:
```bash
docker run -it ang037/roadies:latest
```
-## Using Docker locally
+Once the container is running and you can access the ROADIES repository, refer to the Quick Start section to run the pipeline.
+
+## Option 3: Install via Local Docker Build
-First, clone the repository (requires `git` to be installed in the system):
+1. Clone the ROADIES repository:
```bash
git clone https://github.com/TurakhiaLab/ROADIES.git
cd ROADIES
```
-Then build and run the Docker container:
+2. Build and run the Docker container:
```bash
docker build -t roadies_image .
docker run -it roadies_image
```
-## Using installation script (requires sudo access)
-
-First clone the repository:
-
-```bash
-git clone https://github.com/TurakhiaLab/ROADIES.git
-cd ROADIES
-```
-
-Then, execute the installation script:
+Once the container is running and you can access the ROADIES repository, refer to the Quick Start instructions to run the pipeline.
-```bash
-chmod +x roadies_env.sh
-source roadies_env.sh
-```
-
-This will install and build all tools and dependencies. Once the setup is complete, it will print `Setup complete` in the terminal and activate the `roadies_env` environment with all Conda packages installed.
+## Option 4: Install via Source Script
-!!! Note
- ROADIES is built on [Snakemake (workflow parallelization tool)](https://snakemake.readthedocs.io/en/stable/). It also requires various tools (PASTA, LASTZ, RAxML-NG, MashTree, FastTree, ASTRAL-Pro3) to be installed before performing the analysis. To ease the process, instead of individually installing the tools, we provide `roadies_env.sh` script to automatically download all dependencies into the user system.
+1. Install the following dependencies (**requires sudo access**):
-### Required dependencies
-
-To run this script, ensure the following dependencies are installed:
-- Java Runtime Environment (version 1.7 or higher)
-- Python (version 3 or higher)
+- Java Runtime Environment (Version 1.7 or higher)
+- Python (Version 3.9 or higher)
- `wget` and `unzip` commands
-- GCC (version 11.4 or higher)
+- GCC (Version 11.4 or higher)
- cmake (Download here: https://cmake.org/download/)
- Boost library (Download here: https://boostorg.jfrog.io/artifactory/main/release/1.82.0/source/)
- zlib (Download here: http://www.zlib.net/)
-- GLIBC (Version 2.29 or higher)
For Ubuntu, you can install these dependencies with:
@@ -108,5 +100,20 @@ For Ubuntu, you can install these dependencies with:
sudo apt-get install -y wget unzip make g++ python3 python3-pip python3-setuptools git default-jre libgomp1 libboost-all-dev cmake
```
-!!! Warning
- If you encounter issues with the Boost library, add its path to `$CPLUS_LIBRARY_PATH` and save it in `~/.bashrc`.
+2. Clone the repository:
+
+```bash
+git clone https://github.com/TurakhiaLab/ROADIES.git
+cd ROADIES
+```
+
+3. Run the installation script:
+
+```bash
+chmod +x roadies_env.sh
+source roadies_env.sh
+```
+
+After a successful setup (indicated by the `Setup complete` message), the `roadies_env` environment will be activated. Proceed to Quick Start.
+
+**Note:** If you encounter issues with the Boost library, add its path to `$CPLUS_LIBRARY_PATH` and save it in `~/.bashrc`.
\ No newline at end of file
diff --git a/docs/quickstart.md b/docs/quickstart.md
index a0c9cece..7a14a631 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -1,59 +1,31 @@
-# Quick start (with provided test dataset)
+# Quick start
-Once setup is done, you can run the ROADIES pipeline using the provided test dataset. Follow these steps for a 16-core machine:
+After installing ROADIES using one of the options in Quick Install, you're ready to run it. To get started:
-**Step 1:** Go to ROADIES repository directory if not there:
-
-```bash
-cd ROADIES
-```
-
-**Step 2:** Create a directory for the test data and download the test datasets (using the following one line command):
+1. Download the test dataset (11 Drosophila genomes):
```bash
mkdir -p test/test_data && cat test/input_genome_links.txt | xargs -I {} sh -c 'wget -O test/test_data/$(basename {}) {}'
```
-**Step 3:** Run the pipeline with the following command (from ROADIES directory):
-```bash
-python run_roadies.py --cores 16
-```
-
-Step 2 will download the 11 Drosophila genomic datasets (links provided in `test/input_genome_links.txt`) and save them in the `test/test_data` directory. Step 3 will run ROADIES for those 11 Drosophila genomes and save the final newick tree as `roadies.nwk` in a separate `output_files` folder for the current iteration. The final output files for all iterations will be saved in `converge_files` folder upon completion.
+This will save the datasets in a separate `test/test_data` folder within the repository.
-## Running ROADIES with different modes of operation
+2. Run the pipeline:
-To run ROADIES in various other modes of operation (fast, balanced, accurate) (description of these modes are mentioned in [Modes of operation](index.md#modes-of-operation) section), try the following commands:
+#### IMPORTANT: By default, ROADIES runs multiple iterations to generate highly accurate trees. For quick testing, use `--noconverge` to run a single iteration.
```bash
-python run_roadies.py --cores 16 --mode accurate
+python run_roadies.py --cores 16 # Full run (multiple iterations)
```
-
```bash
-python run_roadies.py --cores 16 --mode balanced
+python run_roadies.py --cores 16 --noconverge # Quick test run (one iteration)
```
-```bash
-python run_roadies.py --cores 16 --mode fast
-```
-!!! Note
- Accurate mode is the default mode of operation. If you don't specify any particular mode using `--mode` argument, default mode will run.
-
-For each modes, the output files for all iterations will be saved in a separate `converge_files` folder. `output_files` will save the results of the last iteration. Species tree for all iterations will be saved in `converge_files` folder with the nomenclature `iteration_.nwk`.
+3. Output:
-## Running ROADIES in non converge mode (single iteration mode)
+ - The final **UNROOTED** Newick tree is saved as `roadies.nwk` in a separate `output_files` folder.
+ - Outputs of the other iterations (if `--noconverge` is not used) are saved in a separate `converge_files` folder.
-By default, ROADIES will run for multiple iteration until it gets a stable tree at the end (details mentioned in [convergence mechanism](index.md#convergence-mechanism) section). To run ROADIES with non converge mode (only for one iteration), execute the following command (notice the addition of `--noconverge` argument):
-
-```bash
-python run_roadies.py --cores 16 --noconverge
-```
-Try following commands for other modes:
+#### NOTE: ROADIES outputs unrooted trees by default. You can reroot trees on your own or use the provided `reroot.py` script in `workflow/scripts/` (given a reference rooted species tree as input).
-```bash
-python run_roadies.py --cores 16 --mode balanced --noconverge
-```
-```bash
-python run_roadies.py --cores 16 --mode fast --noconverge
-```
\ No newline at end of file
diff --git a/docs/usage.md b/docs/usage.md
index 069b9137..be6bf85e 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -142,7 +142,7 @@ Replace below lines:
]
```
-With below lines (you can change the value of `--jobs` based on your cluster configuration):
+With the lines below (you can change the value of `--jobs` and the other account details based on your cluster configuration):
```
cmd = [
@@ -162,9 +162,9 @@ With below lines (you can change the value of `--jobs` based on your cluster con
"--cluster",
(
"sbatch "
- "--job-name=ROADIES_run "
- "--partition=vgl_a "
- "--account=jarv_condo_bank "
+ "--job-name=XXX "
+ "--partition=XXX "
+ "--account=XXX "
"--nodes=1 "
"--ntasks-per-node=4 "
"--cpus-per-task=8 "
@@ -172,7 +172,7 @@ With below lines (you can change the value of `--jobs` based on your cluster con
"--mem-per-cpu=11G "
"--output=%x_%j.out "
"--error=%x_%j.err "
- "--mail-user=agupta02@rockefeller.edu "
+ "--mail-user=XXX "
"--mail-type=ALL"
)
]
@@ -182,7 +182,7 @@ After the above changes, save the following lines of code as separate file calle
```
#! /bin/bash
#SBATCH -J ROADIES_XXX
-#SBATCH -p vgl_a
+#SBATCH -p XXX
#SBATCH --account=XXX
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1