taffish/diamond

taf-diamond

TAFFISH wrapper for DIAMOND, a fast protein sequence aligner for BLAST-like protein search, translated DNA-to-protein search, DIAMOND database creation, DAA archive handling, and protein clustering.

This repository packages upstream DIAMOND release 2.2.1 as the TAFFISH package 2.2.1-r1.

Installation

Install from the public TAFFISH Hub index:

taf update
taf install diamond

Install the exact release:

taf install diamond 2.2.1-r1

For local testing before the app is published to the public index:

taf install --from .

Usage

Show TAFFISH app help:

taf-diamond --help

Show the TAFFISH package version:

taf-diamond --version

Show upstream DIAMOND version:

taf-diamond diamond version

Show upstream command help:

taf-diamond diamond help
taf-diamond diamond blastp
taf-diamond diamond makedb

Create a DIAMOND protein database:

taf-diamond diamond makedb --in proteins.faa -d proteins --threads 8

Search protein queries against a DIAMOND database:

taf-diamond diamond blastp \
  -d proteins \
  -q query.faa \
  -o blastp.tsv \
  --threads 8 \
  --outfmt 6 qseqid sseqid pident length evalue bitscore

Search translated nucleotide queries against a protein database:

taf-diamond diamond blastx \
  -d proteins \
  -q transcripts.fa \
  -o blastx.tsv \
  --threads 8 \
  --outfmt 6 qseqid sseqid pident length evalue bitscore
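Both searches above request the same six tab-separated columns, so their output can be post-processed with standard tools. The following is a minimal sketch, assuming the column order given in the --outfmt 6 field list; filter_hits is a hypothetical helper, and the thresholds and sample rows are made up for illustration:

```shell
#!/bin/sh
# Column order assumed from the --outfmt 6 field list used above:
#   1 qseqid, 2 sseqid, 3 pident, 4 length, 5 evalue, 6 bitscore

# filter_hits (hypothetical helper): read TSV rows on stdin and keep those
# with percent identity >= $1 and e-value <= $2.
filter_hits() {
  min_pident=$1
  max_evalue=$2
  awk -F '\t' -v p="$min_pident" -v e="$max_evalue" \
    '($3 + 0) >= (p + 0) && ($5 + 0) <= (e + 0)'
}

# Made-up rows for illustration only; real input would be blastp.tsv or blastx.tsv.
printf 'q1\ts1\t95.0\t120\t1e-50\t230\nq1\ts2\t40.0\t80\t0.5\t30\nq2\ts3\t88.2\t200\t3e-20\t150\n' |
  filter_hits 80 1e-5
```

With real results the call would look like filter_hits 80 1e-5 < blastp.tsv. In the sample above, only the two rows with at least 80% identity and an e-value of at most 1e-5 pass the filter.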

Write a DIAMOND alignment archive and convert it later:

taf-diamond diamond blastp -d proteins -q query.faa -a matches --threads 8
taf-diamond diamond view -a matches.daa -o matches.tsv

For portable, simple workflows, direct tabular output with --outfmt 6 is the recommended default. The upstream view command is included, but in local Docker validation of DIAMOND 2.2.1 on arm64 and amd64 emulation, converting tiny synthetic DAA files triggered an upstream segmentation fault. The smoke tests therefore verify the view command surface but do not assert DAA conversion. If your workflow depends on .daa archives, validate that path on your target runtime before using it in production.
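One way to validate the DAA path on a target runtime is to attempt a conversion and check both the exit status and the output file. This is only a sketch: run_diamond and check_daa_view are hypothetical helper names, and it assumes that a crash inside the container surfaces as a non-zero exit status from taf-diamond.

```shell
#!/bin/sh
# run_diamond (hypothetical helper): the single place that knows how DIAMOND
# is launched. Uses the explicit command-mode form from this README.
run_diamond() {
  taf-diamond diamond "$@"
}

# check_daa_view (hypothetical helper): convert a DAA archive to tabular
# output and report whether the conversion produced a non-empty file.
check_daa_view() {
  daa=$1
  out=$2
  if run_diamond view -a "$daa" -o "$out" && [ -s "$out" ]; then
    echo "DAA conversion OK: $out"
    return 0
  fi
  echo "DAA conversion failed for $daa; prefer --outfmt 6 on this runtime" >&2
  return 1
}
```

After writing an archive with -a matches, a call such as check_daa_view matches.daa matches.tsv would report whether conversion works on that host.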

Inspect or extract sequences from a DIAMOND database:

taf-diamond diamond dbinfo -d proteins.dmnd
taf-diamond diamond getseq -d proteins.dmnd -o proteins.from-db.faa

Cluster protein sequences:

taf-diamond diamond linclust -d proteins.faa -o clusters.tsv --threads 8 --approx-id 90
taf-diamond diamond cluster -d proteins.faa -o clusters.tsv --threads 8
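The cluster table can be summarized with standard tools. The sketch below assumes the table has two tab-separated columns, representative ID followed by member ID, with one member per line; verify that layout against the upstream documentation for your DIAMOND version. cluster_sizes is a hypothetical helper and the sample rows are made up:

```shell
#!/bin/sh
# Assumed layout of the cluster table: two tab-separated columns,
# representative ID then member ID, one member per line.

# cluster_sizes (hypothetical helper): print "<representative><TAB><member count>"
# for each cluster, sorted by representative ID. Reads files given as
# arguments, or stdin when no file is given.
cluster_sizes() {
  awk -F '\t' '{ n[$1]++ } END { for (r in n) printf "%s\t%d\n", r, n[r] }' "$@" | sort
}

# Made-up rows for illustration only; real input would be clusters.tsv.
printf 'seqA\tseqA\nseqA\tseqB\nseqC\tseqC\n' | cluster_sizes
```

With real results the call would be cluster_sizes clusters.tsv.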

Because this is a command-mode TAFFISH tool, the first non-option argument is the in-container command. DIAMOND's actual executable is named diamond, so the clearest form is:

taf-diamond diamond blastp ...
taf-diamond diamond blastx ...
taf-diamond diamond makedb ...
taf-diamond diamond linclust ...

Do not use taf-diamond blastp ... as the normal form. In command mode, TAFFISH will interpret blastp as an executable inside the container, not as a subcommand of diamond.
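If typing the executable name each time is error-prone, a small shell function can insert it. This is a sketch only: dmnd and DMND_DRY_RUN are hypothetical names and are not part of TAFFISH or DIAMOND.

```shell
#!/bin/sh
# dmnd (hypothetical convenience function): always inserts the in-container
# executable name "diamond" before the DIAMOND subcommand.
# DMND_DRY_RUN=1 (also hypothetical) prints the command instead of running it.
dmnd() {
  if [ "${DMND_DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "taf-diamond diamond $*"
  else
    taf-diamond diamond "$@"
  fi
}
```

With this function defined, dmnd blastp -d proteins -q query.faa runs taf-diamond diamond blastp -d proteins -q query.faa.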

The -- separator is mainly useful for option-leading arguments to the default diamond command:

taf-diamond -- --help
taf-diamond -- --version

For DIAMOND commands such as blastp, makedb, view, or linclust, use the explicit taf-diamond diamond <command> ... form.

This README lists common usage patterns, not the full upstream manual. The TAFFISH wrapper calls upstream diamond directly, so official DIAMOND commands and options are available as upstream implements them. Use taf-diamond diamond help, taf-diamond diamond <command>, or the upstream manual for the complete option list.

Package

name: diamond
command: taf-diamond
version: 2.2.1-r1
kind: tool
image: ghcr.io/taffish/diamond:2.2.1-r1
upstream release: v2.2.1

Container

The container image is built from docker/Dockerfile. It starts from debian:12-slim, downloads the official upstream v2.2.1 diamond-linux64 binary archive from GitHub, verifies it with a pinned sha256 checksum, and keeps the upstream binary plus documentation/license files extracted from the matching source tarball.

The runtime image includes these user-facing commands and runtime tools:

diamond
sh

The official Linux binary used here is dynamically linked against the standard glibc runtime libraries present in Debian. It does not require Python, R, Perl, or external bioinformatics helper executables for the core workflows tested here.

The official diamond-linux64 binary is used as released by upstream. In local validation, its --compress zstd path reported that the executable was not compiled with ZStd support, so do not rely on ZStd-compressed DIAMOND output in this package unless you validate that path on your target runtime.

This package intentionally provides the normal upstream DIAMOND executable and does not add a custom TAFFISH entrypoint. The upstream command surface includes commands such as makedb, blastp, blastx, view, merge-daa, getseq, dbinfo, cluster, and linclust, along with the other commands that upstream provides.

The image is the official upstream CPU linux64 build. DIAMOND does not require GPU runtime options for the workflows packaged here. This release is native linux/amd64 only because upstream does not publish a matching official Linux arm64 binary for v2.2.1. The TAFFISH wrapper requests --platform linux/amd64 for Docker and Podman, so Apple Silicon and other arm64 hosts can still use the package through Docker/Podman amd64 emulation. Do not interpret that as native arm64 support.
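A quick host check can tell whether the image will run natively or under emulation. The sketch below is illustrative: platform_note is a hypothetical helper, and it assumes that uname -m reports x86_64 or amd64 on amd64 hosts and aarch64 or arm64 on arm64 hosts.

```shell
#!/bin/sh
# platform_note (hypothetical helper): classify a machine string against the
# linux/amd64-only image. Defaults to the output of `uname -m`.
platform_note() {
  machine=${1:-$(uname -m)}
  case $machine in
    x86_64|amd64)  echo "native: $machine matches the linux/amd64 image" ;;
    aarch64|arm64) echo "emulated: $machine runs the linux/amd64 image through emulation" ;;
    *)             echo "unknown: $machine; check amd64 emulation support" ;;
  esac
}

platform_note
```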

The official binary does not enable the upstream EXTRA development command set, so development-only commands such as blastn, random-seqs, or smith-waterman are outside this TAFFISH package. In upstream DIAMOND 2.2.1, some clustering maintenance commands are still listed but report that they were temporarily removed; this package preserves that upstream behavior rather than working around it.

The image is built and validated for:

linux/amd64

The TAFFISH metadata declares a Docker smoke check:

exist: diamond, sh
test:  diamond version reports 2.2.1
test:  top-level help plus blastp, linclust, and view option surfaces are available
test:  makedb, dbinfo, and a tiny blastp search produce a tabular hit
test:  a tiny blastx translated search produces a tabular hit
test:  a tiny linclust clustering run produces a cluster table

During TAFFISH Hub indexing, this smoke metadata verifies that the published image exposes the expected command surface, reports the pinned upstream version, includes the runtime pieces used by common workflows, and can run representative local database, search, translated search, and clustering tasks. It does not download remote databases, assert DAA conversion, or exhaustively validate every DIAMOND option.

Each smoke command is self-contained because the public index runs every [smoke].test entry in a fresh temporary container. No smoke entry depends on files created by a previous entry.
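A self-contained check in that spirit might look like the sketch below. It is not the package's actual smoke metadata: smoke_blastp and DIAMOND_BIN are hypothetical names, the protein sequence is an arbitrary placeholder, and whether such a tiny input yields a hit depends on DIAMOND's defaults.

```shell
#!/bin/sh
# DIAMOND_BIN (hypothetical override): inside the image the executable is `diamond`.
DIAMOND_BIN=${DIAMOND_BIN:-diamond}

# smoke_blastp (hypothetical helper): build a database and run a search
# entirely inside a fresh temporary directory, so nothing is inherited from
# an earlier test. Succeeds only if the search writes a non-empty table.
smoke_blastp() {
  work=$(mktemp -d) || return 1
  # Arbitrary example protein sequence, used only as placeholder input.
  printf '>p1\nMKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV\n' \
    > "$work/p.faa"
  "$DIAMOND_BIN" makedb --in "$work/p.faa" -d "$work/p" >/dev/null 2>&1 &&
    "$DIAMOND_BIN" blastp -d "$work/p" -q "$work/p.faa" \
      -o "$work/hits.tsv" --outfmt 6 >/dev/null 2>&1 &&
    [ -s "$work/hits.tsv" ]
  status=$?
  rm -rf "$work"
  return "$status"
}
```

Calling smoke_blastp inside the container returns success only if both steps run and the search writes at least one row. Because inputs are created and removed within the function, it can run in any order relative to other checks.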

Upstream

source:  https://github.com/bbuchfink/diamond
release: https://github.com/bbuchfink/diamond/releases/tag/v2.2.1
license: GPL-3.0-or-later

The TAFFISH app wrapper, Dockerfile, and documentation in this repository are released under Apache-2.0. The bundled upstream DIAMOND software is distributed under the upstream GPL-3.0-or-later license; keep that distinction in mind when redistributing derived images.

Please cite DIAMOND when using this package in scientific work:

Buchfink B, Reuter K, Drost HG. Sensitive protein alignments at tree-of-life
scale using DIAMOND. Nature Methods. 2021. doi:10.1038/s41592-021-01101-x.
PMID: 33828273.

For DIAMOND DeepClust workflows, upstream also lists:

Buchfink B, Xie C, Huson DH. Ultrafast and sensitive protein clustering using
DIAMOND DeepClust. Nature Methods. 2026. doi:10.1038/s41592-026-03030-z.
PMID: 41876643.
