CBIcall

CNAG Biomedical Informatics framework for variant calling

License: GPL v3


📘 Documentation: https://cnag-biomedical-informatics.github.io/cbicall

🐳 Docker Hub Image: https://hub.docker.com/r/manuelrueda/cbicall/tags

Name

CBIcall: CNAG Biomedical Informatics Framework for Variant Calling on Illumina DNA-seq (germline) NGS Data.

Synopsis

cbicall -p <parameters_file.yaml> -t <n_threads> [options]

Arguments:
  -p|param          Parameters input file (YAML)
  -t|threads        Number of CPUs/Cores/Threads

Options:
  -debug            Debugging level (from 1 to 5; 5 is maximum verbosity)
  -h|help           Brief help message
  -man              Full documentation
  -v                Show version information
  -verbose          Enable verbose output
  -nc|no-color      Do not print colors to STDOUT
  -ne|no-emoji      Do not print emojis to STDOUT

Summary

CBIcall (CNAG Biomedical Informatics framework for variant calling) is a computational framework designed for variant calling analysis using Illumina Next-Generation Sequencing (NGS) data.

How to run CBIcall

CBIcall execution requires:

  • Input Files

    A folder containing Paired-End FASTQ files (e.g., MA00001_exome/MA0000101P_ex/*{R1,R2}*fastq.gz).

    The examples/input/ directory contains input data that you can use for testing.

  • Parameters File

    A YAML-formatted parameters file controlling pipeline execution.
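Before launching a run, it can be useful to confirm that the sample folder really holds paired files. The following is a minimal sketch, not part of CBIcall: the function name find_fastq_pairs is hypothetical, and it assumes that the R2 file name is obtained by replacing the first "R1" in the R1 file name, which may not hold for every naming scheme.

```python
import tempfile
from pathlib import Path


def find_fastq_pairs(sample_dir):
    """Return (R1, R2) path pairs; raise if an R1 file has no R2 mate."""
    sample_dir = Path(sample_dir)
    pairs = []
    for r1 in sorted(sample_dir.glob("*R1*fastq.gz")):
        # Assumption: the mate's name differs only in the first "R1" token.
        r2 = r1.with_name(r1.name.replace("R1", "R2", 1))
        if not r2.is_file():
            raise FileNotFoundError(f"no R2 mate for {r1.name}")
        pairs.append((r1, r2))
    return pairs


# Small demonstration on empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("demo_R1_001.fastq.gz", "demo_R2_001.fastq.gz"):
        (Path(tmp) / name).touch()
    demo_names = [(r1.name, r2.name) for r1, r2 in find_fastq_pairs(tmp)]

print(demo_names)
```

The check only looks at file names; it does not open the files or verify that read counts match.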

The parameters that can be customized are listed below with their default values. Each parameter name must be separated from its value by spaces or tabs.

Essential Parameters

mode:             single
pipeline:         wes
sample:           undef
sample_map:       undef
workflow_engine:  bash
gatk_version:     gatk-4.6
genome:           b37
cleanup_bam:      false

Optional Parameters (Currently Unused)

organism:        Homo Sapiens        
technology:      Illumina HiSeq      
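A parameters file can also be generated from a script, which helps when preparing many samples. The sketch below is only an illustration: the helper names render_params and write_params are hypothetical, the defaults are copied from the listing above, and the file is written as plain "key: value" lines without a YAML library.

```python
from pathlib import Path

# Defaults copied from the "Essential Parameters" listing.
DEFAULTS = {
    "mode": "single",
    "pipeline": "wes",
    "sample": "undef",
    "sample_map": "undef",
    "workflow_engine": "bash",
    "gatk_version": "gatk-4.6",
    "genome": "b37",
    "cleanup_bam": "false",
}


def render_params(overrides):
    """Return parameters-file text: the defaults updated with overrides."""
    params = {**DEFAULTS, **overrides}
    width = max(len(key) for key in params) + 1  # +1 leaves room for the colon
    return "".join(
        f"{(key + ':').ljust(width)} {value}\n" for key, value in params.items()
    )


def write_params(path, overrides):
    """Write the rendered parameters to path."""
    Path(path).write_text(render_params(overrides))


text = render_params({"sample": "examples/input/CNAG999_exome/CNAG99901P_ex"})
print(text)
```

Calling write_params("param_file.yaml", {...}) would then produce a file that can be passed to cbicall with -p.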

CBIcall will create a dedicated project directory (cbicall_*) to store analysis outputs. This design allows multiple independent runs to proceed concurrently without modifying the original input files.

Below is a detailed description of key parameters:

  • cleanup_bam

    Set to true to delete 01_bam/*.{bam,bai}. The default is false.

  • gatk_version

    Supported values: gatk-3.5 or gatk-4.6.

  • genome

    Supported values: b37 (default), hg38 or rsrs (mtDNA).

  • mode

    Two modes are supported: single (default, for individual samples) and cohort (for family/cohort-based analyses).

  • pipeline

    Specifies the analysis pipeline. Currently available options: wes (whole-exome sequencing), wgs (whole-genome sequencing) and mit (mitochondrial DNA analysis). Note: to run cohort analysis, first complete a single analysis for each sample.

  • projectdir

    The prefix for the project directory name (e.g., 'cancer_sample_001'). It can also contain a path (e.g., foo/cancer_sample_001).

    Note: This directory is always created below the sample directory. The script automatically adds a unique identifier to each job.

  • sample

    Path (relative or absolute) to the directory containing FASTQ files for analysis. See the examples directory for detailed guidance.

    Example:

    examples/input/CNAG999_exome/CNAG99901P_ex

  • sample_map (cohort-mode only)

    Path (relative or absolute) to the file containing the sample IDs and the paths to the GVCF files.

    See example here

  • workflow_engine

    Supported workflow engines: bash or snakemake.
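The supported values described above can be checked before a run starts. The following sketch is not CBIcall's own validation: check_params is a hypothetical helper, and the rules that cohort mode needs sample_map and single mode needs sample are assumptions inferred from the parameter descriptions. It does not check whether every combination of values is supported.

```python
# Supported values as listed in the parameter descriptions.
SUPPORTED = {
    "mode": {"single", "cohort"},
    "pipeline": {"wes", "wgs", "mit"},
    "genome": {"b37", "hg38", "rsrs"},
    "gatk_version": {"gatk-3.5", "gatk-4.6"},
    "workflow_engine": {"bash", "snakemake"},
}


def check_params(params):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for key, allowed in SUPPORTED.items():
        value = params.get(key)
        if value is not None and value not in allowed:
            problems.append(f"{key}: {value!r} is not one of {sorted(allowed)}")
    # Assumption: cohort mode needs sample_map, single mode needs sample.
    mode = params.get("mode", "single")
    if mode == "cohort" and params.get("sample_map", "undef") == "undef":
        problems.append("sample_map: required when mode is cohort")
    if mode == "single" and params.get("sample", "undef") == "undef":
        problems.append("sample: required when mode is single")
    return problems


for problem in check_params({"mode": "cohort", "genome": "hg19"}):
    print(problem)
```

Parameters that are absent from the input are treated as taking their defaults, so only explicitly set values are compared against the supported sets.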

Example Commands

$ bin/cbicall -p param_file.yaml -t 8
$ bin/cbicall -p param_file.yaml -t 4 -verbose
$ bin/cbicall -p param_file.yaml -t 16 > log 2>&1
$ $path_to_cbicall/bin/cbicall -p param_file.yaml -t 8 -debug 5
$ nohup bin/cbicall -p param_file.yaml -t 4 &
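The same commands can be launched from a script, for example when looping over several parameters files. This is a minimal sketch: build_command and run_logged are hypothetical helpers, the executable path bin/cbicall is taken from the examples above, and only the -p, -t, -verbose and -debug flags from the synopsis are used.

```python
import subprocess


def build_command(param_file, threads, verbose=False, debug=None, exe="bin/cbicall"):
    """Assemble a cbicall command line as a list of arguments."""
    cmd = [exe, "-p", str(param_file), "-t", str(threads)]
    if verbose:
        cmd.append("-verbose")
    if debug is not None:
        if not 1 <= debug <= 5:
            raise ValueError("debug level must be between 1 and 5")
        cmd += ["-debug", str(debug)]
    return cmd


def run_logged(cmd, log_path):
    """Run cmd with stdout and stderr sent to log_path (like '> log 2>&1')."""
    with open(log_path, "w") as log:
        return subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT).returncode


print(" ".join(build_command("param_file.yaml", 8, debug=5)))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.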

Recommended specifications:

CBIcall is optimized for multi-core Linux desktop, workstation, or server environments.

* Works on amd64 and arm64 architectures (e.g., M-series Macs).
* Ideally a Debian-based distribution (Ubuntu or Mint); other distributions (e.g., CentOS, openSUSE) should also work but are untested.
* >= 8 GB RAM.
* >= 4 CPU cores (Intel i7 or Xeon preferred).
* >= 250 GB HDD space.
* Python >= 3.8
* Java 8 (install via: sudo apt install openjdk-8-jdk).
* Snakemake (install via: pip3 install -r requirements.txt).
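Some of these recommendations can be checked from Python before installing anything else. The sketch below is an illustration only: check_host is a hypothetical helper, the thresholds are the ones listed above, and RAM is not checked because the standard library has no portable call for it. It also only tests that a java executable is on the PATH, not that it is version 8.

```python
import os
import shutil
import sys


def check_host(path=".", min_cores=4, min_free_gb=250):
    """Return a dict of pass/fail flags for the recommended specifications."""
    cores = os.cpu_count() or 0
    free_gb = shutil.disk_usage(path).free / 1e9  # free space on path's volume
    return {
        "cpu_cores": cores >= min_cores,
        "free_disk": free_gb >= min_free_gb,
        "python": sys.version_info >= (3, 8),
        "java_on_path": shutil.which("java") is not None,
    }


result = check_host()
for name, ok in result.items():
    print(f"{name}: {'ok' if ok else 'below recommendation'}")
```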

Citation

CBIcall: a configuration-driven framework for variant calling in large DNA-seq cohorts. Manuscript in preparation.

Author

Written by Manuel Rueda (mrueda). GitHub repository: https://github.com/CNAG-Biomedical-Informatics/cbicall.

Copyright and license

Please see the included LICENSE file for distribution and usage terms.
