Different allele counts for vcf and tree file. #41

@jodyhey

I did a test run on 100 kb of VCF with 98 genomes, using -polar 0.9. I scanned the tree file (actually codex did) for 0 and 1 counts at each SNP and compared them to the biallelic VCF, expecting either identical or complementary counts. However, about 3% of sites showed a mismatch, for example 162 1's in the VCF and 153 in the tree file. I asked codex to look into this, focusing on position 5039, which has 4 1's in the VCF file and 196 1's in the tree file.
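For reference, the per-site check amounts to something like the sketch below. This is not the script codex actually ran; the function name `classify_counts` is made up for illustration, and I am assuming 98 diploid genomes means 196 haplotypes with no missing calls.

```python
def classify_counts(vcf_ones, tree_ones, n_haplotypes):
    """Compare the number of 1 alleles at one biallelic site.

    Hypothetical helper: assumes every haplotype is called (no missing data).
    """
    if tree_ones == vcf_ones:
        return "identical"
    # Polarisation may swap which allele is labelled 1, so the tree count
    # can legitimately be the complement of the VCF count.
    if tree_ones == n_haplotypes - vcf_ones:
        return "complementary"
    return "mismatch"


N_HAPLOTYPES = 196  # assumption: 98 diploid genomes

# Position 5039 as observed: 4 ones in the VCF, 196 ones in the tree file.
observed = classify_counts(4, 196, N_HAPLOTYPES)
# What was expected instead: 192 ones in the tree file.
expected = classify_counts(4, 192, N_HAPLOTYPES)
print(observed, expected)
```

Sites classified as "mismatch" are the ones that made up the roughly 3% reported above.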

Here is the explanation from codex:

"
- At chr2L:5039 (relative 233), the tree has two top-level mutations (parent = -1):
  - root node 2955: derived 1
  - descendant node 1943: derived 0
- With both marked top-level, tskit applies them in table order; the root 1 overwrites everything, yielding all 1's in genotypes.
- The 0 on node 1943 is a back-mutation and must be a child of the root 1 mutation (its parent should be the 1-mutation's ID). Then tskit would produce 192 ones and 4 zeros, matching the VCF.
"
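The overwrite effect in that explanation can be illustrated with a toy model. This is only a sketch of "apply mutations in table order", not tskit's decoder. The function `apply_in_table_order` is hypothetical, the ancestral state of 0 is an assumption, and so is the choice of which 4 samples sit below node 1943.

```python
def apply_in_table_order(n_samples, mutations, ancestral_state=0):
    """Toy genotype decoder: each mutation sets the state of every sample
    below its node, and later rows overwrite earlier ones."""
    genotypes = [ancestral_state] * n_samples
    for samples_below, derived_state in mutations:
        for sample in samples_below:
            genotypes[sample] = derived_state
    return genotypes


N_SAMPLES = 196  # assumption: 98 diploid genomes

# Assumed layout: the root (node 2955) sits above all samples, and the
# descendant (node 1943) sits above 4 of them (indices chosen arbitrarily).
root_mutation = (range(N_SAMPLES), 1)
back_mutation = (range(4), 0)

# Root row applied last: the 1 overwrites the back-mutation everywhere.
root_last = apply_in_table_order(N_SAMPLES, [back_mutation, root_mutation])

# Root row applied first, back-mutation after it: the 4 zeros survive.
root_first = apply_in_table_order(N_SAMPLES, [root_mutation, back_mutation])

print(sum(root_last), sum(root_first), root_first.count(0))
```

In this toy model the first ordering gives the 196 ones seen in the tree file, and the second gives 192 ones and 4 zeros, the complement of the 4 ones in the VCF.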

So it looks like I can still work with this, but I thought it was worth mentioning.
