DOH-JDJ0303/polycore

PolyCore

PolyCore is a Python-based tool for core genome analysis in polyploid organisms.
It loads reference and sample FASTA files, collapses identical sequences, filters samples by genome fraction and sites by core fraction, and produces:

  • Core / Full alignment FASTA files
  • Pairwise distance matrices (wide and long formats)
  • Per-sample summary table
  • Progressive core fraction plot (HTML)
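To make the wide and long distance formats concrete, here is a minimal sketch of how both could be derived from an alignment. It is not PolyCore's implementation: the function names `snp_distance` and `distance_tables` are hypothetical, the distance is a plain count of differing called bases, and treating `-` and `N` as missing is an assumption. It also ignores ploidy entirely.

```python
from itertools import combinations


def snp_distance(a: str, b: str, missing: str = "-N") -> int:
    """Count sites where both sequences have a called base and the bases differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in missing and y not in missing
    )


def distance_tables(alignment: dict[str, str]):
    """Return (wide, long) views of the same pairwise distances.

    wide: nested dict, wide[s1][s2] -> distance (symmetric, zero diagonal)
    long: list of (s1, s2, distance) rows, one row per unordered pair
    """
    names = sorted(alignment)
    wide = {n: {m: 0 for m in names} for n in names}
    long_rows = []
    for s1, s2 in combinations(names, 2):
        d = snp_distance(alignment[s1], alignment[s2])
        wide[s1][s2] = wide[s2][s1] = d
        long_rows.append((s1, s2, d))
    return wide, long_rows


# Toy alignment (hypothetical sample names and sequences)
alignment = {"A": "ACGT", "B": "ACGA", "C": "AC-A"}
wide, long_rows = distance_tables(alignment)
```

The wide form is convenient for clustering or heatmaps, while the long form is easier to filter and join in tabular tools.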

Features

  • Handles haploid and polyploid genomes (auto-detects ploidy if not specified)
  • Soft-core / progressive core fraction calculation
  • Collapsing and re-expansion of identical sequences
  • Distance matrices with efficient chunking (auto memory-aware)
  • Output in CSV, FASTA, and VCF formats
  • Interactive visualization with Plotly
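The collapse and re-expand idea can be illustrated as follows. This is a sketch under assumptions, not PolyCore's code: `collapse` and `expand` are hypothetical helpers, and "identical" here means exact string equality of the sequences.

```python
from collections import defaultdict


def collapse(seqs: dict[str, str]):
    """Group samples with identical sequences.

    Returns:
      unique:  representative name -> sequence (one entry per distinct sequence)
      members: representative name -> list of all sample names sharing it
    """
    groups = defaultdict(list)
    for name, seq in seqs.items():
        groups[seq].append(name)
    unique = {names[0]: seq for seq, names in groups.items()}
    members = {names[0]: names for names in groups.values()}
    return unique, members


def expand(results: dict, members: dict) -> dict:
    """Copy each representative's result back to every sample it stands for."""
    return {name: results[rep] for rep, names in members.items() for name in names}


# Toy input (hypothetical sample names)
seqs = {"s1": "ACGT", "s2": "ACGT", "s3": "ACGA"}
unique, members = collapse(seqs)

# Do the expensive work once per distinct sequence (here: a trivial stand-in)
per_rep = {rep: seq.count("A") for rep, seq in unique.items()}
per_sample = expand(per_rep, members)
```

Work that depends only on the sequence is done once per distinct sequence and then copied back to every original sample, which is where the savings come from when many samples are identical.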

Installation

Option 1 - Use the container (Preferred Method):

docker pull public.ecr.aws/o8h2f0o1/polycore:1.0.0

Option 2 - Clone the repository and install:

git clone https://github.com/WA-DOH/polycore.git
cd polycore
pip install -e .

PolyCore requires Python 3.10+. Dependencies (numpy, screed, psutil, plotly) are installed automatically.


Usage

Run PolyCore from the command line:

polycore \
    --progressive \
    --ref reference.fasta \
    sample1.fasta sample2.fasta

Common options

  • --min-gf : Minimum genome fraction per sample (default: 0.9)
  • --min-cf : Minimum fraction of population required per site (default: 0.95)
  • --ploidy : Force ploidy (otherwise auto-detected)
  • --progressive : Enable soft-core (progressive) calculation
  • --split : Treat each contig in each assembly as a separate sample
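As a rough illustration of how the two thresholds interact, the sketch below first drops samples whose genome fraction falls below a `min_gf` value, then keeps only sites called in at least a `min_cf` fraction of the remaining samples. The definitions used here are assumptions (genome fraction as the share of positions with a called base, `-` and `N` as missing), and the function names are hypothetical; PolyCore's actual calculation may differ.

```python
def genome_fraction(seq: str, missing: str = "-N") -> float:
    """Fraction of aligned positions with a called (non-missing) base."""
    return sum(1 for b in seq.upper() if b not in missing) / len(seq)


def filter_samples_and_sites(
    alignment: dict[str, str],
    min_gf: float = 0.9,
    min_cf: float = 0.95,
    missing: str = "-N",
):
    """Apply a per-sample genome-fraction filter, then a per-site core filter."""
    # Step 1: keep samples that cover enough of the reference
    passing = {
        name: seq
        for name, seq in alignment.items()
        if genome_fraction(seq, missing) >= min_gf
    }
    if not passing:
        return passing, []

    # Step 2: keep sites called in a large enough share of passing samples
    length = len(next(iter(passing.values())))
    core_sites = []
    for i in range(length):
        called = sum(1 for seq in passing.values() if seq[i].upper() not in missing)
        if called / len(passing) >= min_cf:
            core_sites.append(i)
    return passing, core_sites


# Toy alignment (hypothetical): C covers too little and is dropped
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTA-",
    "C": "AC------AC",
}
passing, core_sites = filter_samples_and_sites(alignment)
```

In this toy case the last site is called in only one of the two passing samples, so it falls out of the core at the default `min_cf`.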

For full options:

polycore --help

Outputs

  • core.aln : Core alignment (variants only, FASTA)
  • core.full.aln : Full core alignment (FASTA)
  • full.csv : Per-site summary for all passing samples
  • dist_wide.csv : Pairwise distance matrix (wide)
  • dist_long.csv : Pairwise distance matrix (long/tidy)
  • summary.csv : Per-sample statistics
  • core_fraction_plot.html : Interactive visualization of soft-core genome fraction
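The relationship between a wide and a long (tidy) distance table can be shown with a small conversion sketch. The CSV layout assumed here (a header row of sample names and a first column of row labels) and the function name `wide_to_long` are hypothetical; check the actual headers in dist_wide.csv and dist_long.csv before relying on it.

```python
import csv
import io


def wide_to_long(wide_csv_text: str) -> list[tuple[str, str, float]]:
    """Convert a square wide matrix (assumed layout) into (row, column, value) rows."""
    rows = list(csv.reader(io.StringIO(wide_csv_text)))
    column_names = rows[0][1:]  # skip the corner cell above the row labels
    long_rows = []
    for row in rows[1:]:
        row_name, values = row[0], row[1:]
        for col_name, value in zip(column_names, values):
            long_rows.append((row_name, col_name, float(value)))
    return long_rows


# Toy wide matrix (hypothetical sample names and distances)
example = "sample,A,B\nA,0,3\nB,3,0\n"
long_rows = wide_to_long(example)
```

To use this on a real file, read its contents with `open(path, newline="").read()` and pass the text in.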
