HowToConvertRefDB.txt
The converted BLASTDB and TAXDB files must be placed in the appropriate directories.
From SINTAX format
==================
gzip -dc input.gz | \
perl -npe 'if(/^>/){s/^>.*tax=/>/;s/\;$//;s/([>,])d:/${1}superkingdom:/;s/([>,])k:/${1}kingdom:/;s/([>,])p:/${1}phylum:/;s/([>,])c:/${1}class:/;s/([>,])o:/${1}order:/;s/([>,])f:/${1}family:/;s/([>,])g:/${1}genus:/;s/([>,])s:/${1}species:/}' \
> temporary.fasta
clconvrefdb \
--format="superkingdom,kingdom,phylum,class,order,family,genus,species" \
--separator="," \
temporary.fasta \
outputprefix
From UNITE format
=================
gzip -dc input.gz | \
perl -npe 'if(/^>/){s/^>[A-Z0-9_.]+\|/>/;s/\|[A-Z0-9]+\.?[A-Z0-9]*$//;s/^>k__/>kingdom:/;s/\;p__/\;phylum:/;s/\;c__/\;class:/;s/\;o__/\;order:/;s/\;f__/\;family:/;s/\;g__/\;genus:/;s/\;s__/\;species:/}' \
> temporary.fasta
clconvrefdb \
--format="kingdom;phylum;class;order;family;genus;species" \
temporary.fasta \
outputprefix
Recommended databases
=====================
MIDORI Reference https://www.reference-midori.info/ (Download the SINTAX_sp/uniq files, which are SINTAX format FASTA files)
DAIRYdb https://github.com/marcomeola/DAIRYdb (Download the usearch/*.udb file and convert it from UDB to FASTA using vsearch to obtain a SINTAX format FASTA file)
UNITE https://unite.ut.ee/ (Download the Full dataset, which is a UNITE format FASTA file)
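
For the DAIRYdb entry, the UDB to FASTA step might look like the sketch below. It uses the --udb2fasta command of vsearch; the file name DAIRYdb.udb is a placeholder, not the actual name of the downloaded file.

```shell
# Placeholder file name; replace with the .udb file downloaded from usearch/.
udb_file="DAIRYdb.udb"
fasta_file="${udb_file%.udb}.fasta"

# Convert only when vsearch is installed and the input file is present.
if command -v vsearch >/dev/null 2>&1 && [ -f "$udb_file" ]; then
    vsearch --udb2fasta "$udb_file" --output "$fasta_file"
else
    echo "vsearch or $udb_file not found; nothing converted" >&2
fi
```

The resulting FASTA file can then be compressed with gzip and used as input.gz in the "From SINTAX format" commands, or passed to perl directly in place of the gzip -dc step.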