Pipeline for calling Single Nucleotide Polymorphisms (SNPs). The pipeline is written in bash.
- Align FASTQ reads to a reference genome to create an alignment file (mapping step)
- Process the alignment file: file format conversion, sorting, alignment improvement (improvement step)
- Call the variants (variant calling step)
- bwa for the alignment
- samtools/HTS package for processing and calling variants
- GATK for improving the alignment. GATK v3.7.0 is required; it is available on the GATK archived versions page
- -a Input reads file, pair 1
- -b Input reads file, pair 2
- -r Reference genome file
- -e Perform read re-alignment
- -o Output VCF file name
- -f Mills file location
- -z Output VCF file should be gzipped (*.vcf.gz)
- -v Verbose mode; print each command as it runs so the user can see what the script is doing
- -i Index the output BAM file (using samtools index)
- -h Print usage information (how to run your script and the arguments it takes in) and exit
- Input reads file, pair 1
- Input reads file, pair 2
- Reference genome file
- Mills file
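Before any step runs, it can help to confirm that the four input files listed above exist and are not empty. The following is a minimal sketch of such a check; the function name `check_inputs` is hypothetical and not part of the original pipeline.

```bash
#!/usr/bin/env bash
# Sketch only: check_inputs is an illustrative helper, not the pipeline's own code.

# Return 0 when every given path is a readable, non-empty file.
# Report each problem on stderr and return 1 otherwise.
check_inputs() {
    local f status=0
    for f in "$@"; do
        if [ ! -r "$f" ] || [ ! -s "$f" ]; then
            echo "Missing or empty input file: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

A script might call it as `check_inputs "$reads1" "$reads2" "$ref" "$mills" || exit 1`, where the variable names are again assumptions.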
./snp_pipeline.bash -a <input reads file, pair 1> -b <input reads file, pair 2> -r <reference genome file> -f <Mills file> -o <output VCF file name>
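The options listed above map naturally onto the bash `getopts` builtin. The sketch below shows one way the parsing could look; the variable names (`reads1`, `gzip_out`, and so on) and the `parse_args` and `usage` functions are assumptions made for illustration, not the script's actual internals.

```bash
#!/usr/bin/env bash
# Sketch only: names below are illustrative assumptions.

usage() {
    echo "Usage: $0 -a <reads1> -b <reads2> -r <reference> -f <mills> -o <output> [-e] [-z] [-v] [-i] [-h]"
}

parse_args() {
    # File arguments start empty; switches start off.
    reads1="" reads2="" ref="" mills="" output=""
    realign=0 gzip_out=0 verbose=0 index_bam=0

    OPTIND=1   # reset getopts state so the function can be called more than once
    local opt
    # A colon after a letter means that option takes an argument.
    while getopts "a:b:r:f:o:ezvih" opt; do
        case "$opt" in
            a) reads1=$OPTARG ;;
            b) reads2=$OPTARG ;;
            r) ref=$OPTARG ;;
            f) mills=$OPTARG ;;
            o) output=$OPTARG ;;
            e) realign=1 ;;
            z) gzip_out=1 ;;
            v) verbose=1 ;;
            i) index_bam=1 ;;
            h) usage; exit 0 ;;
            *) usage >&2; return 1 ;;
        esac
    done
}
```

A script would typically call `parse_args "$@"` once at startup and then branch on the switch variables, for example running `samtools index` only when `index_bam` is 1.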
VCF File
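The three steps described at the top (mapping, improvement, variant calling) can be sketched as a dry run that prints one plausible command per step without executing anything. This is not the pipeline's actual command list: the intermediate file names, the read-group string, the `GenomeAnalysisTK.jar` location, and the choice of `bcftools mpileup` with `bcftools call` for the calling step are all assumptions. The function `print_plan` is hypothetical.

```bash
#!/usr/bin/env bash
# Dry-run sketch: prints commands, runs nothing. All file names are assumptions.

say() { printf '%s\n' "$*"; }

# Arguments: reads1 reads2 reference mills output [realign: 0 or 1]
print_plan() {
    local reads1=$1 reads2=$2 ref=$3 mills=$4 out=$5 realign=${6:-0}
    local bam="lane_sorted.bam"

    # Mapping step: index the reference, then align the paired reads.
    # A read group is attached because GATK expects one in the BAM header.
    say "bwa index $ref"
    say "bwa mem -R '@RG\\tID:sample\\tSM:sample\\tPL:illumina' $ref $reads1 $reads2 > lane.sam"

    # Improvement step: convert to sorted BAM and index it.
    say "samtools faidx $ref"
    say "samtools sort -O bam -o $bam lane.sam"
    say "samtools index $bam"

    # Optional re-alignment around known indels (GATK 3.x walkers).
    if [ "$realign" -eq 1 ]; then
        say "samtools dict -o ${ref%.*}.dict $ref"
        say "java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R $ref -I $bam -known $mills -o lane.intervals"
        say "java -jar GenomeAnalysisTK.jar -T IndelRealigner -R $ref -I $bam -targetIntervals lane.intervals -known $mills -o lane_realigned.bam"
        bam="lane_realigned.bam"
    fi

    # Variant calling step: pileup piped into the caller, uncompressed VCF out.
    say "bcftools mpileup -Ou -f $ref $bam | bcftools call -vmO v -o $out"
}

print_plan reads_1.fq reads_2.fq ref.fa mills.vcf out.vcf 1
```

Printing the commands first mirrors what the verbose option is meant to show, and makes it easy to review the plan before committing to a long alignment run.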