TAFFISH wrapper for EDTA, the Extensive de novo TE Annotator. EDTA builds high-quality transposable element libraries and can perform whole-genome repeat annotation.
This repository packages EDTA 2.3.0 as a TAFFISH tool app. The image is based
on the BioContainers/Bioconda EDTA environment
quay.io/biocontainers/edta:2.3.0--hdfd78af_0, pinned by digest in the
Dockerfile for reproducibility.
Install from the public TAFFISH Hub index:
taf update
taf install edta

Install the exact release:
taf install edta 2.3.0-r2

For local testing before the app is published to the public index:
taf install --from .

Show TAFFISH app help:
taf-edta --help

Show the TAFFISH package version:
taf-edta --version

Show upstream EDTA help:
taf-edta EDTA.pl -h
taf-edta -- -h

Check that the bundled EDTA runtime dependencies are available:
taf-edta EDTA.pl --check_dependencies
taf-edta -- --check_dependencies

Run the main EDTA pipeline:
taf-edta EDTA.pl --genome genome.fa --threads 10

Run a fuller official-style annotation command:
taf-edta EDTA.pl \
  --genome genome.fa \
  --cds genome.cds.fa \
  --curatedlib curated.fa \
  --exclude exclude.bed \
  --overwrite 1 \
  --sensitive 1 \
  --anno 1 \
  --threads 10

Option-leading shorthand also works because EDTA.pl is the default command:
taf-edta -- --genome genome.fa --threads 10

This is a command-mode TAFFISH tool. The first non-option argument is treated as an executable inside the container, so the clearest form is to name the upstream EDTA command explicitly:
taf-edta EDTA.pl --genome genome.fa --threads 10
taf-edta EDTA_raw.pl --genome genome.fa --type ltr --threads 10
taf-edta panEDTA.sh
taf-edta panEDTA.sh -g genome_list.txt -c cds.fa -t 10

Do not assume taf-edta raw ... or taf-edta -- raw ... means
EDTA.pl raw ...; EDTA does not expose a single subcommand-style CLI. Its
public interfaces are separate scripts such as EDTA.pl, EDTA_raw.pl,
EDTA_processK.pl, lib-test.pl, and panEDTA.sh.
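The dispatch rule described above can be modeled in a few lines of shell. This is only an illustration of the documented behavior, not TAFFISH's implementation: the function name resolve_edta_command and the "wrapper option" and "unspecified" labels are invented for the example, and cases the text does not describe are reported as unspecified rather than guessed.

```shell
# Illustrative model of the documented rule; NOT how TAFFISH is implemented.
# Prints the in-container command line implied by a given taf-edta argument list.
resolve_edta_command() (
    case "${1-}" in
        --)
            shift
            case "${1-}" in
                # Option-leading arguments after "--" go to the default command.
                -*) printf 'EDTA.pl %s\n' "$*" ;;
                # Nothing after "--": not described in this README.
                "") printf 'unspecified\n' ;;
                # A non-option word is an executable name, not an EDTA.pl subcommand.
                *)  printf '%s\n' "$*" ;;
            esac
            ;;
        # Options before any "--" (such as --help, --version) belong to the wrapper.
        -*) printf 'wrapper option: %s\n' "$*" ;;
        "") printf 'unspecified\n' ;;
        # The first non-option argument is the executable to run in the container.
        *)  printf '%s\n' "$*" ;;
    esac
)
```

Under this model, resolve_edta_command raw --type ltr prints raw --type ltr, which shows why raw is looked up as an executable instead of being passed to EDTA.pl.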
Access bundled helper executables directly:
taf-edta RepeatMasker -h
taf-edta RepeatModeler -h
taf-edta BuildDatabase -h
taf-edta LTR_retriever -h
taf-edta TEsorter -h
taf-edta makeblastdb -version
taf-edta samtools --version
taf-edta Rscript --version
taf-edta python3 --version

EDTA is sensitive to paths. The upstream Docker documentation recommends running with all needed inputs in the current working directory and using plain local filenames. In practice, prefer:
cd my-edta-run
taf-edta EDTA.pl --genome genome.fa --cds genome.cds.fa --threads 10

Avoid absolute paths and symlink-heavy inputs unless you have tested that EDTA resolves them correctly in the container. This matters because EDTA creates many working directories, symlinks, and intermediate files while coordinating RepeatMasker, RepeatModeler, BLAST+, LTR_retriever, TIR-Learner, and related tools.
EDTA is also I/O intensive. Use a fast local working directory for real genomes, and reserve enough disk space for intermediate files.
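One way to follow the path advice is to copy inputs into a fresh run directory before starting, so EDTA only ever sees plain local filenames. The sketch below is a minimal helper under that assumption; stage_edta_inputs is a hypothetical name, not part of TAFFISH or EDTA, and the /data/asm paths in the commented example are placeholders.

```shell
# Copy input files into a run directory as regular files with plain basenames.
# stage_edta_inputs is a hypothetical helper, not part of TAFFISH or EDTA.
stage_edta_inputs() (
    run_dir=$1
    shift
    mkdir -p "$run_dir" || exit 1
    for src in "$@"; do
        # cp -L follows symlinks, so the run directory holds real copies.
        cp -L "$src" "$run_dir/$(basename "$src")" || exit 1
    done
)

# Example (commented out; needs taf-edta and real inputs, paths are placeholders):
# stage_edta_inputs my-edta-run /data/asm/genome.fa /data/asm/genome.cds.fa
# cd my-edta-run
# taf-edta EDTA.pl --genome genome.fa --cds genome.cds.fa --threads 10
```

Copying costs extra disk space compared with symlinking, but it avoids the symlink resolution problems described above.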
Common EDTA outputs include:
genome.fa.mod.EDTA.TElib.fa     Final non-redundant TE library
genome.fa.mod.EDTA.intact.gff3  Structurally intact TE annotation
genome.fa.mod.EDTA.TEanno.gff3  Whole-genome TE annotation, with --anno 1
genome.fa.mod.EDTA.TEanno.gtf   Whole-genome TE annotation in GTF, with --anno 1
genome.fa.mod.EDTA.TEanno.sum   Whole-genome TE annotation summary
genome.fa.mod.MAKER.masked      Low-threshold masked genome, with --anno 1
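After a run, the filenames listed above can be checked quickly from the working directory. The sketch below is a hypothetical helper (check_edta_outputs is not an EDTA or TAFFISH command). It treats the TE library and intact annotation as required, and reports the remaining files as annotation outputs without failing. Grouping TEanno.sum with the --anno 1 outputs is an assumption made for the example.

```shell
# Report which common EDTA outputs exist for a genome file in the current directory.
# check_edta_outputs is a hypothetical helper; filenames come from the list above.
check_edta_outputs() (
    prefix="$1.mod"
    status=0
    # Outputs treated as required in this sketch.
    for suffix in EDTA.TElib.fa EDTA.intact.gff3; do
        if [ -e "$prefix.$suffix" ]; then
            printf 'found   %s\n' "$prefix.$suffix"
        else
            printf 'MISSING %s\n' "$prefix.$suffix"
            status=1
        fi
    done
    # Annotation outputs (assumption: TEanno.sum is grouped with the --anno 1 files).
    for suffix in EDTA.TEanno.gff3 EDTA.TEanno.gtf EDTA.TEanno.sum MAKER.masked; do
        if [ -e "$prefix.$suffix" ]; then
            printf 'found   %s\n' "$prefix.$suffix"
        else
            printf 'absent  %s (annotation output)\n' "$prefix.$suffix"
        fi
    done
    exit "$status"
)

# Example (commented out): check_edta_outputs genome.fa
```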
For pan-genome annotation, use panEDTA.sh:
taf-edta panEDTA.sh -g genome_list.txt -c cds.fa -t 10

The genome list should use paths accessible from the working directory. For the most predictable TAFFISH/container behavior, keep the listed genomes and CDS files in or under the current working directory.
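A genome list can be checked against this advice before a long run. The sketch below assumes a whitespace-separated list with one genome path per line and an optional CDS path in a second column; that layout is an assumption here, so confirm it against the upstream panEDTA documentation. check_genome_list is a hypothetical helper name.

```shell
# Flag genome-list entries that are absolute paths or that do not exist
# relative to the current directory. check_genome_list is a hypothetical helper.
# Assumed format: "<genome path> [<cds path>]" per line.
check_genome_list() (
    status=0
    # The "|| [ -n ... ]" clause keeps a final line that lacks a trailing newline.
    while read -r genome cds rest || [ -n "$genome" ]; do
        [ -n "$genome" ] || continue          # skip blank lines
        for path in "$genome" "$cds"; do
            [ -n "$path" ] || continue        # no CDS column on this line
            case "$path" in
                /*) printf 'absolute path: %s\n' "$path"; status=1 ;;
                *)  [ -e "$path" ] || { printf 'not found: %s\n' "$path"; status=1; } ;;
            esac
        done
    done < "$1"
    exit "$status"
)

# Example (commented out): check_genome_list genome_list.txt
```

The helper exits non-zero if any entry is flagged, so it can gate the panEDTA.sh call in a script.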
name: edta
command: taf-edta
version: 2.3.0-r2
kind: tool
image: ghcr.io/taffish/edta:2.3.0-r2
The container image starts from the official BioContainers EDTA image:
quay.io/biocontainers/edta:2.3.0--hdfd78af_0
digest: sha256:6dfb5313b05caf4d6cafa724d6c5a95365e0471adee29005c42a338dfdf358c5
EDTA has a large dependency set. The image intentionally keeps the full BioContainers/Bioconda runtime rather than rebuilding a partial custom environment. It includes EDTA plus the major tools EDTA calls internally:
RepeatMasker, RepeatModeler, BuildDatabase, BLAST+, LTR_retriever,
LTR_FINDER_parallel, LTR_HARVEST_parallel, TIR-Learner, HelitronScanner,
AnnoSINE, TEsorter, GenomeTools, TRF, GRF, CD-HIT, SAMtools, BEDTools,
R, Python, Java, and Perl modules
The image is large, but this is the reliable choice for EDTA. Rebuilding EDTA manually from Debian would still require installing and validating this whole toolchain, and would likely recreate most of the BioContainers image while increasing maintenance risk.
The TAFFISH Dockerfile adds only:
- TAFFISH environment metadata
- PYTHONNOUSERSITE=1
- a corrected panEDTA.sh bash launcher
- build-time dependency/help checks
The current release is built for:
linux/amd64
The BioContainers tag is a single linux/amd64 manifest, not a native
multi-architecture image. For Docker and Podman, src/main.taf declares
--platform linux/amd64, so arm64 machines such as Apple Silicon Macs can run
it through normal amd64 emulation:
TAFFISH_CONTAINER_BACKEND=docker \
  taf-edta EDTA.pl --check_dependencies

This does not mean the image contains a native arm64 build. Apptainer compatibility depends on the host and site configuration.
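Because emulated runs are typically much slower than native ones, it can help to detect the situation before starting a long job. The sketch below is a hypothetical pre-flight check (needs_amd64_emulation is an invented name). It only inspects the architecture string from uname -m and does not query Docker or Podman.

```shell
# Succeed when the given host architecture would need emulation to run a
# linux/amd64 image. needs_amd64_emulation is a hypothetical helper name.
needs_amd64_emulation() {
    case "$1" in
        x86_64|amd64) return 1 ;;   # native amd64 host
        *)            return 0 ;;   # anything else, e.g. arm64 or aarch64
    esac
}

# Warn on stderr when the current host is not amd64.
if needs_amd64_emulation "$(uname -m)"; then
    echo "Host is $(uname -m): the linux/amd64 image will run under emulation." >&2
fi
```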
The TAFFISH metadata declares a Docker smoke check:
exist:
  EDTA.pl, EDTA_raw.pl, EDTA_processK.pl, lib-test.pl, panEDTA.sh
  RepeatMasker, RepeatModeler, BuildDatabase, LTR_retriever
  LTR_FINDER_parallel, LTR_HARVEST_parallel, TIR-Learner, HelitronScanner
  AnnoSINE_v2, TEsorter, makeblastdb, blastn, blastx, gt, grf-main, trf
  mdust, cd-hit-est, samtools, bedtools, Rscript, python3, perl, java
test:
  EDTA.pl help is available
  EDTA.pl --check_dependencies reports "All passed"
  EDTA_raw.pl help is available
  EDTA_processK.pl help is available
  lib-test.pl help is available
  panEDTA.sh usage is available through the corrected bash launcher
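The two kinds of check above (executables exist, dependency check reports "All passed") can be approximated by hand with a few lines of shell. This is a rough sketch, not the smoke check TAFFISH actually runs; deps_all_passed and missing_tools are hypothetical helper names, and missing_tools only gives a meaningful answer where the tools are on PATH, for example inside the container.

```shell
# Succeed when dependency-check output read from stdin contains "All passed".
deps_all_passed() {
    grep -q 'All passed'
}

# Print each named tool that is not found on PATH (one per line).
missing_tools() (
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
)

# Example (commented out; requires taf-edta):
# taf-edta EDTA.pl --check_dependencies 2>&1 | deps_all_passed && echo "dependencies OK"
```

Merging stderr into stdout in the example avoids having to know which stream EDTA.pl writes its report to.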
The smoke check deliberately does not run a full genome annotation. Upstream's own toy genome test takes minutes and produces many intermediate files, so it is better kept as a manual functional test when needed:
taf-edta EDTA.pl \
  --genome genome.fa \
  --cds genome.cds.fa \
  --curatedlib curated.fa \
  --exclude exclude.bed \
  --overwrite 1 \
  --sensitive 1 \
  --anno 1 \
  --threads 10

- Project: EDTA
- Repository: https://github.com/oushujun/EDTA
- Wiki: https://github.com/oushujun/EDTA/wiki
- Bioconda recipe: https://bioconda.github.io/recipes/edta/README.html
- Container base: https://quay.io/repository/biocontainers/edta
- Upstream license: GPL-3.0-only
- Primary citation: Ou et al. 2019, doi:10.1186/s13059-019-1905-y, PMID:31843001
- panEDTA citation: Ou et al. 2024, doi:10.1101/gr.278131.123, PMID:39251347
Useful checks before publishing:
taf check
taf publish --release --dry-run
docker build --platform linux/amd64 -t ghcr.io/taffish/edta:2.3.0-r2 -f docker/Dockerfile .
docker run --rm --platform linux/amd64 ghcr.io/taffish/edta:2.3.0-r2 EDTA.pl --check_dependencies
docker run --rm --platform linux/amd64 ghcr.io/taffish/edta:2.3.0-r2 EDTA_raw.pl -h
docker run --rm --platform linux/amd64 ghcr.io/taffish/edta:2.3.0-r2 panEDTA.sh

The repository wrapper files are licensed under Apache-2.0. Upstream EDTA is GPL-3.0-only, and third-party runtime components are distributed under their own upstream licenses.