---
title: "Another bioinformatic tool: PyANI"
objectives:
- Calculate the average nucleotide identity (ANI) between a genome and other related genomes.
---

In the regular lessons, we used three bioinformatic tools:
`blast` for homology search,
`mafft` for sequence alignment, and
`raxml` for phylogenetic analysis.

In this section, we will discuss one more tool, **PyANI**.

## PyANI
PyANI is an open-source, Python-based tool for calculating
the average nucleotide identity (ANI) between two or more genome sequences.
When two genomes are compared, homologous (syntenic) regions are first identified
by aligning the genomes with a tool such as `mummer` or `blast`.
The nucleotide identity is then calculated over these aligned regions.
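
The calculation in the last step can be sketched as follows. Assume each aligned region is summarized by two numbers: its aligned length and its number of mismatched bases (similarity errors). ANI over all aligned regions is then 1 - (total errors / total aligned length), reported as a fraction. This appears to be how the `ANIm_percentage_identity.tab` table relates to the `ANIm_alignment_lengths.tab` and `ANIm_similarity_errors.tab` tables in the output directory. The helper below is only a hypothetical illustration, not PyANI's own code, and its two-column input format is an assumption.

~~~
# ani_from_alignments: hypothetical helper, not part of PyANI.
# Reads one aligned region per line from standard input as
#   <aligned_length> <similarity_errors>
# and prints ANI = 1 - (total errors / total aligned length) as a fraction.
ani_from_alignments() {
  awk '
    NF >= 2 { total_len += $1; total_err += $2 }   # accumulate both totals
    END {
      if (total_len == 0) { print "no aligned bases" > "/dev/stderr"; exit 1 }
      printf "%.4f\n", 1 - total_err / total_len
    }'
}
~~~
{: .language-bash}

For example, take two regions of 1000 and 3000 aligned bases with 10 and 30 errors. The result is 1 - 40/4000, so `printf '1000 10\n3000 30\n' | ani_from_alignments` prints `0.9900`. Pooling the totals weights each region by its length, so long alignments dominate the result.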

The source code for **PyANI** is available at
[widdowquinn/pyani](https://github.com/widdowquinn/pyani){: target="_blank"}.
The documentation for basic usage is available
[here](https://github.com/widdowquinn/pyani/blob/master/README_v_0_2_x.md){: target="_blank"}.

**PyANI** v0.2 is available on HiPerGator as a module, but it has to be loaded first.
Its dependencies, `mummer` and `blast+`, are loaded together with `pyani`.

~~~
$ ml pyani
~~~
{: .language-bash}

~~~
Lmod is automatically replacing "python/3.8" with "pyani/0.2.10".
~~~
{: .output}
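
A small check can confirm that loading the module put the expected programs on your `PATH`. This sketch assumes the executables are named:

- `average_nucleotide_identity.py`;
- `nucmer`, the MUMmer aligner used for `ANIm`;
- `blastn`, from `blast+`, used for `ANIb`.

The function name `check_tools` is hypothetical.

~~~
# check_tools: hypothetical helper that reports whether each named
# program can be found on PATH; returns non-zero if any is missing.
check_tools() {
  local missing=0 tool path
  for tool in "$@"; do
    if path=$(command -v "$tool"); then
      echo "found:   $tool -> $path"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
~~~
{: .language-bash}

After `ml pyani`, run `check_tools average_nucleotide_identity.py nucmer blastn`. Any line starting with `missing:` means that program is not available in the current session.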

We will use the genomes in `files/ani` to compute ANI.
The file `UXhortspp.fasta` contains the genome of an unknown *X. hortorum* strain.
The other files contain genomes of *X. hortorum* pathovars
downloaded from NCBI.
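
Before running the comparison, it can help to check how many sequences and bases each genome file contains. Below is a minimal sketch. It assumes the genomes are FASTA files with the `.fasta` extension, as `UXhortspp.fasta` is. The helper name `genome_stats` is hypothetical.

~~~
# genome_stats: hypothetical helper; for every *.fasta file in a directory,
# print the file name, the number of FASTA records and the total sequence length.
genome_stats() {
  local dir="$1" f
  for f in "$dir"/*.fasta; do
    [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
    awk -v name="$(basename "$f")" '
      /^>/ { records++; next }                        # header line: one more record
      { gsub(/[ \t\r]/, ""); bases += length($0) }    # sequence line: count bases
      END { printf "%s\t%d\t%d\n", name, records, bases }' "$f"
  done
}
~~~
{: .language-bash}

`genome_stats files/ani` prints one tab-separated line per genome. The record count shows how many contigs (or chromosomes and plasmids) each file contains.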

> ## Getting genome sequences from NCBI
> PyANI has a script called `genbank_get_genomes_by_taxon.py` to download
> all genomes for a taxon from NCBI.
> For usage, check the documentation linked above.
{: .tips}

The next step is to perform pairwise comparisons of all genomes in `files/ani`
and calculate ANI for each pair. This can be done with the following command:

~~~
$ average_nucleotide_identity.py -i files/ani -o ani -m ANIm -g --gformat png,pdf
~~~
{: .language-bash}

> - `average_nucleotide_identity.py` is the name of the script.
> - `-i` specifies the directory containing the input genome sequences.
> - `-o` specifies the output directory.
> Note that the program exits if this directory already exists.
> - `-m` specifies the method used to align the syntenic regions:
> `ANIm` uses `mummer` and `ANIb` uses `blast+`.
> - `-g` generates graphical output, i.e., heatmaps.
> `--gformat` specifies the graphics output formats.
{: .notes}
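
The script exits when the output directory already exists, so re-running the analysis means removing or renaming `ani` first. The sketch below renames an existing directory with a timestamp suffix instead of deleting it. The function name `fresh_outdir` is hypothetical and not part of PyANI.

~~~
# fresh_outdir: hypothetical helper; if DIR exists, rename it to
# DIR.bak.<timestamp> so that a new run can create DIR from scratch.
fresh_outdir() {
  local dir="$1" backup
  if [ -e "$dir" ]; then
    backup="${dir}.bak.$(date +%Y%m%d-%H%M%S)"
    if [ -e "$backup" ]; then
      echo "backup $backup already exists" >&2
      return 1
    fi
    mv -- "$dir" "$backup" || return 1
    echo "moved existing $dir to $backup" >&2
  fi
  return 0   # nothing to do, or the rename succeeded
}
~~~
{: .language-bash}

For example: `fresh_outdir ani && average_nucleotide_identity.py -i files/ani -o ani -m ANIm -g --gformat png,pdf`. Renaming rather than deleting keeps the previous run's results available for comparison.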

~~~
$ ls ani
~~~
{: .language-bash}

~~~
ANIm_alignment_coverage.pdf  ANIm_hadamard.pdf             ANIm_similarity_errors.pdf
ANIm_alignment_coverage.png  ANIm_hadamard.png             ANIm_similarity_errors.png
ANIm_alignment_coverage.tab  ANIm_hadamard.tab             ANIm_similarity_errors.tab
ANIm_alignment_lengths.pdf   ANIm_percentage_identity.pdf  nucmer_output.tar.gz
ANIm_alignment_lengths.png   ANIm_percentage_identity.png
ANIm_alignment_lengths.tab   ANIm_percentage_identity.tab
~~~
{: .output}

You can now transfer `ANIm_percentage_identity.png`
to your computer to view the heatmap.
For numeric values, use the `ANIm_percentage_identity.tab` table.
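
You can also read the closest relative directly from the table instead of the heatmap. To do this, scan the unknown genome's row for its highest value outside the diagonal. Below is a minimal sketch, which assumes:

- the `.tab` file is a tab-separated matrix whose first row holds the genome labels (with an empty first cell);
- its first column holds the same labels;
- PyANI labels each genome by its file name without the extension (e.g. `UXhortspp`).

The function name `best_ani_hit` is hypothetical.

~~~
# best_ani_hit: hypothetical helper; prints the genome with the highest
# identity to QUERY (excluding QUERY itself) and that identity value.
# Usage: best_ani_hit TABLE QUERY
best_ani_hit() {
  awk -F'\t' -v query="$2" '
    NR == 1 { for (i = 2; i <= NF; i++) label[i] = $i; next }   # header row
    $1 == query {
      for (i = 2; i <= NF; i++) {
        if (label[i] == query) continue                          # skip self-comparison
        if (!found || $i + 0 > best + 0) { best = $i; hit = label[i]; found = 1 }
      }
    }
    END {
      if (!found) { print "query not found: " query > "/dev/stderr"; exit 1 }
      printf "%s\t%s\n", hit, best
    }' "$1"
}
~~~
{: .language-bash}

For example, `best_ani_hit ani/ANIm_percentage_identity.tab UXhortspp` prints the closest genome and its identity. If the labels in your table differ, pass the label shown in the table's first column instead.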

<img src="/fig/ANIm_percentage_identity.png" alt="Heatmap of ANIm percentage identity between the genomes" height="500px">

Based on ANI, the unknown strain appears to be *X. hortorum* pv. *gardneri*.