TAFFISH wrapper for DIAMOND, a fast protein sequence aligner for BLAST-like protein search, translated DNA-to-protein search, DIAMOND database creation, DAA archive handling, and protein clustering.
This repository packages upstream DIAMOND release 2.2.1 as the TAFFISH
package 2.2.1-r1.
Install from the public TAFFISH Hub index:
taf update
taf install diamond
Install the exact release:
taf install diamond 2.2.1-r1
For local testing before the app is published to the public index:
taf install --from .
Show TAFFISH app help:
taf-diamond --help
Show the TAFFISH package version:
taf-diamond --version
Show upstream DIAMOND version:
taf-diamond diamond version
Show upstream command help:
taf-diamond diamond help
taf-diamond diamond blastp
taf-diamond diamond makedb
Create a DIAMOND protein database:
taf-diamond diamond makedb --in proteins.faa -d proteins --threads 8
Search protein queries against a DIAMOND database:
taf-diamond diamond blastp \
-d proteins \
-q query.faa \
-o blastp.tsv \
--threads 8 \
--outfmt 6 qseqid sseqid pident length evalue bitscore
Search translated nucleotide queries against a protein database:
taf-diamond diamond blastx \
-d proteins \
-q transcripts.fa \
-o blastx.tsv \
--threads 8 \
--outfmt 6 qseqid sseqid pident length evalue bitscore
Write a DIAMOND alignment archive and convert it later:
taf-diamond diamond blastp -d proteins -q query.faa -a matches --threads 8
taf-diamond diamond view -a matches.daa -o matches.tsv
For portable, simple workflows, direct tabular output with --outfmt 6 is the
recommended default. The upstream view command is included, but in local
Docker validation of DIAMOND 2.2.1 on arm64 and amd64 emulation, converting
tiny synthetic DAA files triggered an upstream segmentation fault. The smoke
tests therefore verify the view command surface but do not assert DAA
conversion. If your workflow depends on .daa archives, validate that path on
your target runtime before using it in production.
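Because the tabular route is the recommended one, downstream filtering can stay in plain shell. The following is a minimal sketch, assuming the six-column layout used in the examples above (qseqid sseqid pident length evalue bitscore). The function name filter_hits and the sample rows are hypothetical and exist only for illustration; the rows are synthetic, not real DIAMOND output.

```shell
# filter_hits MAX_EVALUE MIN_PIDENT
# Reads tab-separated hits on stdin and prints the rows that pass both cutoffs.
# Assumed column order: qseqid sseqid pident length evalue bitscore.
filter_hits() {
    awk -F '\t' -v max_e="$1" -v min_id="$2" \
        '($5 + 0) <= (max_e + 0) && ($3 + 0) >= (min_id + 0)'
}

# Synthetic rows for illustration only.
printf 'q1\ts1\t98.5\t120\t1e-50\t230\nq1\ts2\t45.0\t80\t1e-3\t40\nq2\ts3\t91.2\t200\t2e-80\t400\n' |
    filter_hits 1e-10 90
```

On a real result file the same function would be used as filter_hits 1e-10 90 < blastp.tsv. If you change the field list passed to --outfmt 6, adjust the column numbers in the awk expression to match.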
Inspect or extract sequences from a DIAMOND database:
taf-diamond diamond dbinfo -d proteins.dmnd
taf-diamond diamond getseq -d proteins.dmnd -o proteins.from-db.faa
Cluster protein sequences:
taf-diamond diamond linclust -d proteins.faa -o clusters.tsv --threads 8 --approx-id 90
taf-diamond diamond cluster -d proteins.faa -o clusters.tsv --threads 8
Because this is a command-mode TAFFISH tool, the first non-option argument is
the in-container command. DIAMOND's actual executable is named diamond, so
the clearest form is:
taf-diamond diamond blastp ...
taf-diamond diamond blastx ...
taf-diamond diamond makedb ...
taf-diamond diamond linclust ...
Do not use taf-diamond blastp ... as the normal form. In command mode,
TAFFISH will interpret blastp as an executable inside the container, not as a
subcommand of diamond.
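If typing the doubled name becomes tedious, a small shell function can supply the in-container executable for you. This is only a convenience sketch: the helper name dmnd is hypothetical and is not provided by TAFFISH or DIAMOND.

```shell
# Hypothetical convenience wrapper (not part of TAFFISH or DIAMOND).
# Always passes "diamond" as the in-container command, then forwards
# every remaining argument unchanged.
dmnd() {
    taf-diamond diamond "$@"
}
```

With this defined in your shell profile, dmnd blastp -d proteins -q query.faa -o blastp.tsv runs taf-diamond diamond blastp with the same arguments, so the subcommand can never be mistaken for the executable.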
The -- separator is mainly useful for option-leading arguments to the
default diamond command:
taf-diamond -- --help
taf-diamond -- --version
For DIAMOND commands such as blastp, makedb, view, or linclust, use
the explicit taf-diamond diamond <command> ... form.
This README lists common usage patterns, not the full upstream manual. The
TAFFISH wrapper calls upstream diamond directly, so official DIAMOND commands
and options are available as upstream implements them. Use
taf-diamond diamond help, taf-diamond diamond <command>, or the upstream
manual for the complete option list.
name: diamond
command: taf-diamond
version: 2.2.1-r1
kind: tool
image: ghcr.io/taffish/diamond:2.2.1-r1
upstream release: v2.2.1
The container image is built from docker/Dockerfile. It starts from
debian:12-slim, downloads the official upstream v2.2.1 diamond-linux64
binary archive from GitHub, verifies it with a pinned sha256 checksum, and
keeps the upstream binary plus documentation/license files extracted from the
matching source tarball.
The runtime image includes these user-facing commands and runtime tools:
diamond
sh
The official Linux binary used here is dynamically linked against the standard glibc runtime libraries present in Debian. It does not require Python, R, Perl, or external bioinformatics helper executables for the core workflows tested here.
The official diamond-linux64 binary is used as released by upstream. In local
validation, its --compress zstd path reported that the executable was not
compiled with ZStd support, so do not rely on ZStd-compressed DIAMOND output in
this package unless you validate that path on your target runtime.
This package intentionally provides the normal upstream DIAMOND executable and
does not add a custom TAFFISH entrypoint. The upstream command surface includes
makedb, blastp, blastx, view, merge-daa, getseq, dbinfo, cluster, linclust,
and the other commands that the official binary provides.
The image is the official upstream CPU linux64 build. DIAMOND does not
require GPU runtime options for the workflows packaged here. This release is
native linux/amd64 only because upstream does not publish a matching official
Linux arm64 binary for v2.2.1. The TAFFISH wrapper requests
--platform linux/amd64 for Docker and Podman, so Apple Silicon and other
arm64 hosts can still use the package through Docker/Podman amd64 emulation.
Do not interpret that as native arm64 support.
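To make the emulation case visible in scripts, the host architecture can be checked before a long run. The sketch below is an assumption-laden illustration: the function name describe_arch is hypothetical, and it only classifies the common uname -m values rather than querying Docker or Podman.

```shell
# describe_arch MACHINE
# Classifies a `uname -m` value relative to this amd64-only image.
# Hypothetical helper; it does not inspect the container runtime.
describe_arch() {
    case "$1" in
        x86_64|amd64)  echo "native" ;;
        aarch64|arm64) echo "emulated" ;;
        *)             echo "unknown" ;;
    esac
}

# Report how the amd64 image would run on the current host.
describe_arch "$(uname -m)"
```

A result of emulated means the package should still work through amd64 emulation, but expect slower runs than on a native amd64 host.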
The official binary does not enable the upstream EXTRA development command
set, so development-only commands such as blastn, random-seqs, or
smith-waterman are outside this TAFFISH package. In upstream DIAMOND
2.2.1, some clustering maintenance commands are still listed but report that
they were temporarily removed; this package preserves that upstream behavior
rather than working around it.
The image is built and validated for:
linux/amd64
The TAFFISH metadata declares a Docker smoke check:
exist: diamond, sh
test: diamond version reports 2.2.1
test: top-level help plus blastp, linclust, and view option surfaces are available
test: makedb, dbinfo, and a tiny blastp search produce a tabular hit
test: a tiny blastx translated search produces a tabular hit
test: a tiny linclust clustering run produces a cluster table
During TAFFISH Hub indexing, this smoke metadata verifies that the published image exposes the expected command surface, reports the pinned upstream version, includes the runtime pieces used by common workflows, and can run representative local database, search, translated search, and clustering tasks. It does not download remote databases, assert DAA conversion, or exhaustively validate every DIAMOND option.
Each smoke command is self-contained because the public index runs every
[smoke].test entry in a fresh temporary container. No smoke entry depends on
files created by a previous entry.
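A self-contained check of this kind might look like the sketch below. It is not the actual [smoke].test entry from this package's metadata. The function name smoke_blastp and the placeholder sequence are assumptions for illustration, and whether a given tiny input yields a hit depends on DIAMOND's defaults. It calls diamond directly, as a command would inside the container.

```shell
# Hypothetical self-contained smoke check: builds its own input, database,
# and output in a fresh temporary directory and leaves nothing behind.
smoke_blastp() {
    tmp=$(mktemp -d) || return 1
    # Placeholder protein sequence used as both database and query.
    printf '>p1\n%s\n' \
        'MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV' \
        > "$tmp/p.faa"
    if diamond makedb --in "$tmp/p.faa" -d "$tmp/p" >/dev/null 2>&1 &&
       diamond blastp -d "$tmp/p" -q "$tmp/p.faa" -o "$tmp/hits.tsv" \
           --outfmt 6 qseqid sseqid pident length evalue bitscore >/dev/null 2>&1 &&
       [ -s "$tmp/hits.tsv" ]; then
        status=0   # both steps ran and the hit table is non-empty
    else
        status=1
    fi
    rm -rf "$tmp"
    return "$status"
}
```

Because every file lives under the temporary directory created by the function itself, the check gives the same result whether or not any other smoke entry ran first.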
source: https://github.com/bbuchfink/diamond
release: https://github.com/bbuchfink/diamond/releases/tag/v2.2.1
license: GPL-3.0-or-later
The TAFFISH app wrapper, Dockerfile, and documentation in this repository are released under Apache-2.0. The bundled upstream DIAMOND software is distributed under the upstream GPL-3.0-or-later license; keep that distinction in mind when redistributing derived images.
Please cite DIAMOND when using this package in scientific work:
Buchfink B, Reuter K, Drost HG. Sensitive protein alignments at tree-of-life
scale using DIAMOND. Nature Methods. 2021. doi:10.1038/s41592-021-01101-x.
PMID: 33828273.
For DIAMOND DeepClust workflows, upstream also lists:
Buchfink B, Xie C, Huson DH. Ultrafast and sensitive protein clustering using
DIAMOND DeepClust. Nature Methods. 2026. doi:10.1038/s41592-026-03030-z.
PMID: 41876643.