taffish/merfin

taf-merfin

TAFFISH wrapper for Merfin, a k-mer based tool for improved variant filtering, assembly evaluation, and polishing through k-mer validation.

This repository packages upstream Merfin 1.1 as a TAFFISH tool app. The image builds Merfin from the upstream v1.1 source tag and vendors the exact meryl submodule commits recorded by that tag, so the container exposes both merfin and meryl through the versioned taf-merfin wrapper.

Installation

Install from the public TAFFISH Hub index:

taf update
taf install merfin

Install the exact release:

taf install merfin 1.1-r2

For local testing before the app is published to the public index:

taf install --from .

Usage

Show TAFFISH app help:

taf-merfin --help

Show the TAFFISH package version:

taf-merfin --version

Show upstream Merfin usage:

taf-merfin merfin
taf-merfin --

Show upstream build version strings:

taf-merfin merfin --version
taf-merfin meryl --version

Show meryl help:

taf-merfin meryl -h

Build a read k-mer database with meryl (k=21 in this example):

taf-merfin meryl count k=21 reads.fa output reads.meryl

Run common Merfin assembly evaluation modes:

# k* histogram
taf-merfin merfin \
  -hist \
  -sequence assembly.fa \
  -readmers reads.meryl \
  -peak 30 \
  -output assembly.kstar.hist \
  -threads 8

# per-base dump of readK, asmK, and k*
taf-merfin merfin \
  -dump \
  -sequence assembly.fa \
  -readmers reads.meryl \
  -peak 30 \
  -output assembly.kstar.dump \
  -threads 8

# k-mer completeness
taf-merfin merfin \
  -completeness \
  -sequence assembly.fa \
  -readmers reads.meryl \
  -peak 30
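
The counting and histogram steps can be chained in a small script. The following is a minimal sketch, not part of this package: the helper names run and evaluate_assembly, the DRY_RUN variable, and the fixed output names are all illustrative assumptions. With DRY_RUN=1 the commands are printed rather than executed, which makes it easy to review them first.

```shell
# Sketch only: helper names, DRY_RUN, and output file names are assumptions.
run() {
  # With DRY_RUN=1, print the command line instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

evaluate_assembly() {
  # Usage: evaluate_assembly READS ASSEMBLY PEAK [THREADS]
  reads=$1
  asm=$2
  peak=$3
  threads=${4:-8}

  # Step 1: count read k-mers; stop if counting fails.
  run taf-merfin meryl count k=21 "$reads" output reads.meryl || return 1

  # Step 2: k* histogram of the assembly against the read k-mers.
  run taf-merfin merfin -hist \
    -sequence "$asm" \
    -readmers reads.meryl \
    -peak "$peak" \
    -output assembly.kstar.hist \
    -threads "$threads"
}
```

For example, `DRY_RUN=1; evaluate_assembly reads.fa assembly.fa 30` prints the two command lines without touching the container.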

Run a variant polishing screen:

taf-merfin merfin \
  -polish \
  -sequence assembly.fa \
  -readmers reads.meryl \
  -peak 30 \
  -vcf calls.vcf \
  -output merfin-polish \
  -threads 8

Calls that begin with an option can use the default command, merfin, by placing -- first:

taf-merfin -- -hist -sequence assembly.fa -readmers reads.meryl -peak 30 -output assembly.kstar.hist

Because this is a command-mode TAFFISH tool, the first non-option argument is treated as a command available inside the container image. For normal Merfin workflows, the clearest form is:

taf-merfin merfin -hist ...
taf-merfin meryl count ...
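
The argument rule described above can be pictured with a short shell function. This is an illustration of the rule as this README states it, not the TAFFISH implementation: the function name resolve_command and the DEFAULT_COMMAND variable are hypothetical, and the default of merfin is inferred from taf-merfin -- showing Merfin usage.

```shell
# Illustration only: mimics the documented rule, not the TAFFISH source.
# DEFAULT_COMMAND=merfin is an assumption inferred from this README.
DEFAULT_COMMAND=merfin

resolve_command() {
  case "${1-}" in
    --)
      # Leading "--": the remaining arguments go to the default command.
      shift
      set -- "$DEFAULT_COMMAND" "$@"
      printf '%s\n' "$*"
      ;;
    -*|"")
      # Other leading options (such as --help) belong to the wrapper itself.
      printf 'wrapper: %s\n' "$*"
      ;;
    *)
      # The first non-option argument names a command inside the image.
      printf '%s\n' "$*"
      ;;
  esac
}
```

Under these assumptions, `resolve_command -- -hist -peak 30` and `resolve_command merfin -hist -peak 30` both print `merfin -hist -peak 30`, which is why the explicit form is usually easier to read.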

Modes

Merfin modes included in this package:

-filter         Filter variants by missing k-mers
-polish         Score variants for polishing with k*
-loose          Polishing mode without k*, least conservative
-strict         Polishing mode without k*, most conservative
-better         Legacy polishing mode without k*
-hist           Generate a 0-centered k* histogram
-dump           Dump readK, asmK, and k* per base
-completeness   Compute expected-copy-number k-mer completeness

Common inputs:

-sequence PATH   Assembly or consensus FASTA/FASTQ
-readmers DB     Meryl database built from reads
-seqmers DB      Optional pre-built meryl database for the sequence
-peak N          Haploid peak estimate, required outside -filter mode
-prob PATH       Optional lookup table of fitted copy probabilities
-vcf PATH        Variant calls for variant filtering or polishing modes
-output PREFIX   Output path or prefix, depending on mode
-threads N       Number of worker threads
-memory N        Memory limit in GB for k-mer lookup tables
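
Since -peak is required in every mode except -filter, a wrapper script may want to check for it before starting a long run. The following is a minimal sketch of such a pre-flight check. The function name check_peak and its error message are hypothetical, and the numeric test is deliberately loose (digits and dots only).

```shell
# Hypothetical pre-flight check; name and message are illustrative.
check_peak() {
  # Usage: check_peak MODE [PEAK]
  mode=$1
  peak=${2-}

  # -filter is the one mode that does not need a peak estimate.
  if [ "$mode" = "-filter" ]; then
    return 0
  fi

  case "$peak" in
    ""|*[!0-9.]*)
      # Empty, or contains something other than digits and dots.
      echo "error: $mode needs a numeric -peak value" >&2
      return 1
      ;;
  esac
  return 0
}
```

A caller might use it as `check_peak -hist "$PEAK" && taf-merfin merfin -hist ... -peak "$PEAK"`.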

Package

name: merfin
command: taf-merfin
version: 1.1-r2
kind: tool
image: ghcr.io/taffish/merfin:1.1-r2

Container

The container image is built from docker/Dockerfile. It starts from debian:12-slim, then downloads and verifies:

Merfin v1.1 source archive
marbl/meryl commit d8f336f791b92c2a265e70bba9f995621b67d195
marbl/meryl-utility commit 4f84efbb0c907b14f5381fca2356bb11a0a12f53

It compiles Merfin and meryl from source, then installs:

merfin
meryl
gzip, bzip2, xz for compressed FASTA/FASTQ/VCF input and output
upstream Merfin README, LICENSE, and scripts under /usr/local/share/doc/merfin
meryl and meryl-utility README.licenses files for bundled source attribution

The upstream release archive does not include .git metadata. Without it, upstream's version-generation script falls back to the stale string merfin snapshot (v1.0) during tarball builds. This 1.1-r2 package patches that fallback at build time so that merfin --version and the vendored meryl --version both report the packaged Merfin release as merfin 1.1. The bundled meryl is still built from the upstream Merfin v1.1 submodule commit listed above; the string it reports is not a standalone Meryl release version.
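
To confirm that an installed image carries the patched version string, the output of --version can be matched against the two strings named above. This is a small sketch. The function name is_packaged_version is hypothetical, and it only inspects text, so it makes no assumption about whether the tool prints to stdout or stderr.

```shell
# Sketch: classify a version string such as the output of
# `taf-merfin merfin --version`. The function name is illustrative.
is_packaged_version() {
  case "$1" in
    *"snapshot (v1.0)"*) return 1 ;;  # stale upstream fallback string
    *"merfin 1.1"*)      return 0 ;;  # string this package is documented to report
    *)                   return 1 ;;  # anything else is unexpected
  esac
}
```

For example, `is_packaged_version "$(taf-merfin merfin --version 2>&1)" && echo "version ok"` captures both output streams before checking.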

The image is built and validated for:

linux/amd64
linux/arm64

The image intentionally does not bundle GenomeScope 2.0, Merqury, bcftools, tabix, wigToBigWig, R plotting dependencies, aligners, or variant callers. Those are separate workflow steps and should be composed with other TAFFISH apps or local tools when needed.

Smoke Test

The TAFFISH smoke metadata checks:

exist: merfin, meryl, gzip, bzip2, xz
test:  packaged Merfin version file reports 1.1
test:  packaged meryl submodule commit matches the upstream v1.1 gitlink
test:  merfin --version and vendored meryl --version report merfin 1.1
test:  Merfin usage exposes v1.1 polishing modes such as -loose and -strict
test:  meryl help is available
test:  merfin links against libgomp for OpenMP runtime support
test:  a tiny offline FASTA can be counted with meryl and analyzed by
       merfin -hist, producing a non-empty histogram and Merfin QV* summary

This smoke test validates the container, command wiring, meryl database creation, and a minimal Merfin histogram path. It does not validate biological polishing accuracy, large-genome memory behavior, fitted probability tables, or variant filtering on real VCF inputs.
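
The histogram path of the smoke test can be approximated by hand. The sketch below is not the packaged smoke metadata: the function names, the made-up sequence, and -peak 1 are assumptions (the peak is set to 1 because the same tiny FASTA stands in for both reads and assembly), and it assumes -hist writes its histogram to the exact path given to -output.

```shell
# Sketch only: names, the sequence, and -peak 1 are assumptions.
make_tiny_fasta() {
  # Write one short, arbitrary sequence (longer than k=21) to the given path.
  printf '>tiny\n%s\n' "ACGTTGCAAGGCTTAACCGGTTAACGTACGATCGATCGGCTA" > "$1"
}

smoke_hist() {
  # Usage: smoke_hist WORKDIR
  workdir=$1
  make_tiny_fasta "$workdir/tiny.fa"

  # Count k-mers from the tiny FASTA, then run the histogram mode on it.
  taf-merfin meryl count k=21 "$workdir/tiny.fa" \
    output "$workdir/tiny.meryl" || return 1
  taf-merfin merfin -hist \
    -sequence "$workdir/tiny.fa" \
    -readmers "$workdir/tiny.meryl" \
    -peak 1 \
    -output "$workdir/tiny.hist" || return 1

  # Succeed only if the histogram file exists and is non-empty.
  [ -s "$workdir/tiny.hist" ]
}
```

Running `smoke_hist "$(mktemp -d)"` requires an installed taf-merfin; like the packaged smoke test, it says nothing about biological accuracy.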

Upstream

The repository wrapper files are licensed under Apache-2.0. Upstream Merfin, meryl, meryl-utility, bundled source files, and related scripts are distributed under their own upstream terms.
