TAFFISH wrapper for Merfin, a k-mer based tool for improved variant filtering, assembly evaluation, and polishing through k-mer validation.
This repository packages upstream Merfin 1.1 as a TAFFISH tool app. The image
builds Merfin from the upstream v1.1 source tag and vendors the exact meryl
submodule commits recorded by that tag, so the container exposes both merfin
and meryl through the versioned taf-merfin wrapper.
Install from the public TAFFISH Hub index:
taf update
taf install merfin
Install the exact release:
taf install merfin 1.1-r2
For local testing before the app is published to the public index:
taf install --from .
Show TAFFISH app help:
taf-merfin --help
Show the TAFFISH package version:
taf-merfin --version
Show upstream Merfin usage:
taf-merfin merfin
taf-merfin --
Show upstream build version strings:
taf-merfin merfin --version
taf-merfin meryl --version
Show meryl help:
taf-merfin meryl -h
Build a tiny read k-mer database with meryl:
taf-merfin meryl count k=21 reads.fa output reads.meryl
Run common Merfin assembly evaluation modes:
taf-merfin merfin \
-hist \
-sequence assembly.fa \
-readmers reads.meryl \
-peak 30 \
-output assembly.kstar.hist \
-threads 8
taf-merfin merfin \
-dump \
-sequence assembly.fa \
-readmers reads.meryl \
-peak 30 \
-output assembly.kstar.dump \
-threads 8
taf-merfin merfin \
-completeness \
-sequence assembly.fa \
-readmers reads.meryl \
-peak 30
Run a variant polishing screen:
taf-merfin merfin \
-polish \
-sequence assembly.fa \
-readmers reads.meryl \
-peak 30 \
-vcf calls.vcf \
-output merfin-polish \
-threads 8
Option-leading calls can also use the default command with --:
taf-merfin -- -hist -sequence assembly.fa -readmers reads.meryl -peak 30 -output assembly.kstar.hist
Because this is a command-mode TAFFISH tool, the first non-option argument is treated as a command available inside the container image. For normal Merfin workflows, the clearest form is:
taf-merfin merfin -hist ...
taf-merfin meryl count ...
Merfin modes included in this package:
-filter        Filter variants by missing k-mers
-polish        Score variants for polishing with k*
-loose         Polishing mode without k*, least conservative
-strict        Polishing mode without k*, most conservative
-better        Legacy polishing mode without k*
-hist          Generate a 0-centered k* histogram
-dump          Dump readK, asmK, and k* per base
-completeness  Compute expected-copy-number k-mer completeness
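Every mode in this list starts with a dash, so a call that begins with a mode has no leading command name; that is the case the `--` form covers. The sketch below is an illustrative model of that dispatch rule as this README describes it, not TAFFISH's actual implementation. The function name `dispatch` and its output labels are hypothetical, and the model only inspects the first argument.

```shell
# Illustrative model only: classify a taf-merfin argument list.
# Prints which path the call would take and the arguments passed along.
dispatch() {
    case ${1:-} in
        --)
            # Bare "--": hand everything after it to the default command (merfin).
            shift
            printf '%s\n' "default: merfin $*"
            ;;
        -*|'')
            # Leading option (or no arguments): handled by the wrapper itself.
            printf '%s\n' "wrapper: $*"
            ;;
        *)
            # First argument is a command name inside the image (merfin, meryl, ...).
            printf '%s\n' "command: $*"
            ;;
    esac
}

dispatch merfin -hist -peak 30
dispatch -- -hist -peak 30
dispatch --help
```

Under this model, the explicit `taf-merfin merfin ...` form and the `taf-merfin -- ...` form reach the same Merfin call, while `--help` and `--version` stay with the wrapper.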
Common inputs:
-sequence PATH   Assembly or consensus FASTA/FASTQ
-readmers DB     Meryl database built from reads
-seqmers DB      Optional pre-built meryl database for the sequence
-peak N          Haploid peak estimate, required outside -filter mode
-prob PATH       Optional lookup table of fitted copy probabilities
-vcf PATH        Variant calls for variant filtering or polishing modes
-output PREFIX   Output path or prefix, depending on mode
-threads N       Number of worker threads
-memory N        Memory limit in GB for k-mer lookup tables
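The `-peak` rule above is easy to miss when scripting several modes. As a minimal sketch, the hypothetical helper `merfin_call` below refuses to build a call without a peak unless the mode is `-filter`, and by default only prints the command it would run. The helper name and the `DRY_RUN` variable are assumptions for illustration and are not part of TAFFISH or Merfin.

```shell
# Hypothetical helper: merfin_call MODE PEAK [merfin args...]
# PEAK may be an empty string, which is only accepted for -filter.
# With DRY_RUN=1 (the default here) the command is printed, not executed.
merfin_call() {
    mode=$1
    peak=$2
    shift 2
    if [ "$mode" != "-filter" ] && [ -z "$peak" ]; then
        echo "error: -peak is required for mode $mode" >&2
        return 2
    fi
    # Rebuild the argument list as: MODE, caller args, then -peak if given.
    if [ -n "$peak" ]; then
        set -- "$mode" "$@" -peak "$peak"
    else
        set -- "$mode" "$@"
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "taf-merfin merfin $*"
    else
        taf-merfin merfin "$@"
    fi
}

# Print the histogram and per-base dump calls for the same inputs.
for mode in -hist -dump; do
    merfin_call "$mode" 30 \
        -sequence assembly.fa \
        -readmers reads.meryl \
        -output "assembly.kstar${mode#-}" \
        -threads 8
done
```

Setting `DRY_RUN=0` runs the printed commands through the wrapper; the output names used in the loop are placeholders.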
name: merfin
command: taf-merfin
version: 1.1-r2
kind: tool
image: ghcr.io/taffish/merfin:1.1-r2
The container image is built from docker/Dockerfile. It starts from
debian:12-slim, then downloads and verifies:
- Merfin v1.1 source archive
- marbl/meryl commit d8f336f791b92c2a265e70bba9f995621b67d195
- marbl/meryl-utility commit 4f84efbb0c907b14f5381fca2356bb11a0a12f53
It compiles Merfin and meryl from source, then installs:
- merfin
- meryl
- gzip, bzip2, xz for compressed FASTA/FASTQ/VCF input and output
- upstream Merfin README, LICENSE, and scripts under /usr/local/share/doc/merfin
- meryl and meryl-utility README.licenses files for bundled source attribution
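This README does not say how the downloaded sources are verified. One common approach is to compare each archive against a pinned SHA-256 digest, sketched below under that assumption. The function name `verify_sha256` is hypothetical, and the demo runs on a local stand-in file because the real digests are not listed here.

```shell
# Compare a file against an expected SHA-256 digest.
# Returns 0 on match and 1 on mismatch.
verify_sha256() {
    file=$1
    expected=$2
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $file" >&2
        return 1
    fi
}

# Demo on a stand-in file; a real build would pin the digest in the Dockerfile.
demo=$(mktemp)
printf 'stand-in archive contents\n' > "$demo"
digest=$(sha256sum "$demo" | cut -d ' ' -f 1)
verify_sha256 "$demo" "$digest" && echo "verified"
```

In a build, a failed check should stop the image from being produced, which is why the function returns a non-zero status rather than only printing a warning.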
The upstream release archive does not include .git metadata. Upstream's
version-generation script otherwise falls back to the stale string
merfin snapshot (v1.0) during tarball builds. This 1.1-r2 package patches
that fallback at build time so merfin --version and the vendored
meryl --version report the packaged Merfin release as merfin 1.1. The
bundled meryl provenance remains the upstream Merfin v1.1 submodule commit
listed above, not a standalone Meryl release version.
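The exact patch lives in docker/Dockerfile and is not reproduced here. As a rough sketch of the idea only, the snippet below rewrites the stale fallback string in a stand-in file with `sed`. The stand-in file and its contents are assumptions; the upstream version-generation script may look quite different.

```shell
# Stand-in for a version script that still carries the stale fallback string.
stub=$(mktemp)
printf '%s\n' 'echo "merfin snapshot (v1.0)"' > "$stub"

# Replace the stale fallback with the packaged release string.
# In a basic regular expression the parentheses are literal; the dot is escaped.
sed 's/merfin snapshot (v1\.0)/merfin 1.1/' "$stub" > "$stub.patched"

cat "$stub.patched"
```

Writing to a second file instead of editing in place keeps the sketch portable across `sed` implementations.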
The image is built and validated for:
- linux/amd64
- linux/arm64
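To check whether a host falls under one of these platforms, the machine name reported by `uname -m` can be mapped to a platform string. This is a small sketch; the function name `platform_for` is hypothetical and the mapping covers only the common spellings of the two validated architectures.

```shell
# Map a machine name (as printed by `uname -m`) to a container platform string.
# Returns 1 for anything outside the two validated platforms.
platform_for() {
    case $1 in
        x86_64|amd64)  echo "linux/amd64" ;;
        aarch64|arm64) echo "linux/arm64" ;;
        *)             echo "unsupported"; return 1 ;;
    esac
}

platform_for "$(uname -m)" || echo "this host is outside the validated platforms" >&2
```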
The image intentionally does not bundle GenomeScope 2.0, Merqury, bcftools, tabix, wigToBigWig, R plotting dependencies, aligners, or variant callers. Those are separate workflow steps and should be composed with other TAFFISH apps or local tools when needed.
The TAFFISH smoke metadata checks:
- exist: merfin, meryl, gzip, bzip2, xz
- test: packaged Merfin version file reports 1.1
- test: packaged meryl submodule commit matches the upstream v1.1 gitlink
- test: merfin --version and vendored meryl --version report merfin 1.1
- test: Merfin usage exposes v1.1 polishing modes such as -loose and -strict
- test: meryl help is available
- test: merfin links against libgomp for OpenMP runtime support
- test: a tiny offline FASTA can be counted with meryl and analyzed by merfin -hist, producing a non-empty histogram and Merfin QV* summary
This smoke test validates the container, command wiring, meryl database creation, and a minimal Merfin histogram path. It does not validate biological polishing accuracy, large-genome memory behavior, fitted probability tables, or variant filtering on real VCF inputs.
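The existence and linkage checks can be approximated by hand from a shell inside the container. The sketch below is not the TAFFISH smoke metadata itself; the function name `check_exist` is hypothetical, and the libgomp check assumes `ldd` is available in the image.

```shell
# Return 0 if every named tool is on PATH; otherwise report the missing ones.
check_exist() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Tools named by the smoke metadata; meaningful only inside the image.
check_exist merfin meryl gzip bzip2 xz || echo "some tools are not on PATH" >&2

# OpenMP linkage check, skipped when merfin or ldd is unavailable.
if command -v merfin >/dev/null 2>&1 && command -v ldd >/dev/null 2>&1; then
    ldd "$(command -v merfin)" | grep -q libgomp && echo "merfin links libgomp"
fi
```

Run outside the container, the sketch simply reports the Merfin tools as missing, which is expected.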
- Project: Merfin
- Source: https://github.com/arangrhie/merfin
- Release: https://github.com/arangrhie/merfin/releases/tag/v1.1
- Wiki: https://github.com/arangrhie/merfin/wiki/Best-practices-for-Merfin
- License: Apache-2.0
- Citation: Formenti et al. 2022, doi:10.1038/s41592-022-01445-y, PMID:35361932
The repository wrapper files are licensed under Apache-2.0. Upstream Merfin, meryl, meryl-utility, bundled source files, and related scripts are distributed under their own upstream terms.