Question: modkit dmr pair throws invalid-bedmethyl-data when BedMethyl comes from pileup --partition-tag/--phased (HP-tagged BAM); splitting BAM by HP avoids the crash

Hi modkit authors,

I’m running haplotype-aware DMR analysis and I’m seeing a reproducible crash in modkit dmr pair when the BedMethyl inputs were produced by haplotype-aware pileup (--partition-tag in v0.4.4 or --phased in v0.6.1). I’d like to ask whether this is expected / a known limitation, and what the recommended workflow is.

**My workflow:**
Basecalling model: hac@v5.0.0 with modified base model 5mCG_5hmCG
Reference: GRCh38/hg38 (e.g. GCA_000001405.15_GRCh38_no_alt_analysis_set.fa)
BAM is HP-tagged (e.g. from WhatsHap phasing)
Option A (modkit v0.4.4)
I run pileup with --partition-tag HP to produce haplotype-partitioned BedMethyl:
modkit pileup phased.bam out.bed.gz \
  --ref /path/to/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
  --base C --threads 16 \
  --partition-tag HP

Option B (modkit v0.6.1)
I run pileup with --phased:
modkit pileup phased.bam out.bed.gz \
  --ref /path/to/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
  --base C --threads 16 \
  --phased

Then I run dmr pair on the haplotype BedMethyl files:

modkit dmr pair \
  -a hp1.bed.gz \
  -b hp2.bed.gz \
  -o test.dmr \
  --segment test.segment \
  --header \
  --ref /path/to/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
  --base C \
  --threads 16

Observed error
dmr pair fails with:
"batch failed: invalid data, valid coverage (62) is not equal to the sum of canonical and modified counts (61),
[BedMethylLine { chrom: "chr1", interval: Interval { start: 94017, stop: 94018, ... },
mod_code: Code('m'), strand: Both,
count_methylated: 48, valid_coverage: 62,
count_canonical: 13, count_other: 1, count_fail: 2, count_nocall: 2 }]
Error! invalid-bedmethyl-data"

From the failing line, it looks like:
valid_coverage = 62
count_canonical + count_methylated = 61
but count_other = 1 (so 13 + 48 + 1 = 62)
So it seems dmr pair is validating coverage using only canonical + methylated and ignoring count_other.

**However, if I do not use --partition-tag/--phased in pileup, but instead split the BAM by HP tag first, then run pileup separately on each haplotype BAM, dmr pair completes and does not crash (it just reports regions with fail counts).**
Example:
**split BAM by HP tag**
samtools view -h phased.bam | awk '/^@/ || $0 ~ /HP:i:1/' | samtools view -b -o hap1.bam -
samtools view -h phased.bam | awk '/^@/ || $0 ~ /HP:i:2/' | samtools view -b -o hap2.bam -
samtools index hap1.bam
samtools index hap2.bam

pileup per haplotype (no --partition-tag/--phased)
modkit pileup hap1.bam hap1.bed.gz --ref /path/to/ref.fa --base C --threads 16
modkit pileup hap2.bam hap2.bed.gz --ref /path/to/ref.fa --base C --threads 16

**dmr pair works**
modkit dmr pair -a hap1.bed.gz -b hap2.bed.gz -o test.dmr --segment test.segment --header --ref /path/to/ref.fa --base C --threads 16

**So the Questions below for the authors:**
1:Is dmr pair supposed to accept BedMethyl generated by pileup --partition-tag / pileup --phased? If not, is the recommended approach to always split BAMs by HP first?
2:Is count_other expected when using the 5mCG_5hmCG model (e.g. 5hmC contributing to count_other)? If yes, should dmr pair include count_other in the “modified counts” sum for coverage validation, or should we filter BedMethyl to 5mC-only before running dmr pair?
3:If the correct workflow is to run DMR only for one mod type (5mC), what is the recommended modkit command/flag to ensure pileup outputs only that type (and avoids non-zero count_other)?

Looking forward to your feedback,Thanks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question: modkit dmr pair throws invalid-bedmethyl-data when BedMethyl comes from pileup --partition-tag/--phased (HP-tagged BAM); splitting BAM by HP avoids the crash #579

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Question: modkit dmr pair throws invalid-bedmethyl-data when BedMethyl comes from pileup --partition-tag/--phased (HP-tagged BAM); splitting BAM by HP avoids the crash #579

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions