AI4S-YB/HiPMCL

HiPMCL

High-throughput PMCL/Pore-C Mapping & Contact Ligation analysis toolkit.

A high-performance Rust pipeline for processing Nanopore Pore-C data — from raw FASTQ alignment to JuiceBox-compatible contact matrices and fragment-level annotation.

Features

  • align — Primary minimap2 alignment from FASTQ (replaces Align.sh). Handles gzipped inputs, generates reads-info FAI, and filters mapped PAF output.
  • hugread — HugRead fragment annotation pipeline. Merges overlapping alignment fragments, detects subread gaps, remaps gap regions, and positions virtual fragments against a reference genome binning.
  • generate — Paired-end contact matrix generation from PAF alignment files. Outputs JuiceBox-compatible contact matrices with optional adjacent/non-adjacent separation.
  • sort — Parallel external merge-sort for large contact matrix files by (chrom1, chrom2, pos1, pos2).

Installation

Requires Rust 1.70+. External dependency: minimap2 must be available on $PATH.

git clone https://github.com/<user>/HiPMCLv1.git
cd HiPMCLv1
cargo build --release

The binary will be at target/release/porecxl.

Usage

porecxl <COMMAND>

Commands:
  align     Run minimap2 primary alignment
  hugread   HugRead Fragment Annotation
  generate  Generate contact matrix from PAF mapping files
  sort      Sort contact matrix by (chrom1, chrom2, pos1, pos2)

align — Primary alignment

porecxl align \
  -f reads.fastq.gz \
  -r reference.fa \
  -o ./output \
  -t 8

Generates Mapping/<sample>.reads_map.paf and <sample>.fai in the output directory.
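For readers who want to inspect the PAF output themselves, the following is a minimal sketch of parsing one PAF line using the 12 mandatory columns of the PAF format that minimap2 writes. The `PafRecord` struct, the `parse_paf_line` and `passes_filter` functions, and the MAPQ threshold are illustrative assumptions; the README does not document the filter criteria that `porecxl align` actually applies.

```rust
/// A subset of the 12 mandatory PAF columns (hypothetical struct for illustration).
#[derive(Debug, PartialEq)]
pub struct PafRecord {
    pub query_name: String,
    pub query_len: u64,
    pub query_start: u64, // 0-based, inclusive
    pub query_end: u64,   // 0-based, exclusive
    pub strand: char,
    pub target_name: String,
    pub target_start: u64,
    pub target_end: u64,
    pub mapq: u8,
}

/// Parse one tab-separated PAF line; returns None if it is malformed.
pub fn parse_paf_line(line: &str) -> Option<PafRecord> {
    let f: Vec<&str> = line.trim_end().split('\t').collect();
    if f.len() < 12 {
        return None;
    }
    Some(PafRecord {
        query_name: f[0].to_string(),
        query_len: f[1].parse().ok()?,
        query_start: f[2].parse().ok()?,
        query_end: f[3].parse().ok()?,
        strand: f[4].chars().next()?,
        target_name: f[5].to_string(),
        // f[6] is the target sequence length, not needed here
        target_start: f[7].parse().ok()?,
        target_end: f[8].parse().ok()?,
        mapq: f[11].parse().ok()?,
    })
}

/// Assumed filter: keep records at or above a mapping-quality threshold.
pub fn passes_filter(rec: &PafRecord, min_mapq: u8) -> bool {
    rec.mapq >= min_mapq
}
```

Coordinates are kept as the half-open, 0-based intervals that PAF uses, so a fragment's aligned length on the read is simply `query_end - query_start`.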

hugread — Fragment annotation

porecxl hugread \
  <raw.fastq> \
  <reads_map.paf> \
  <workdir> \
  <reference.fa> \
  <genome_vd_fragments.csv> \
  <realign.sh> \
  <chromcheck.sh> \
  <reads.info>

Runs the full fragment annotation pipeline:

  1. Loads and filters primary alignment
  2. Detects tail and internal gaps; exports subread BED
  3. Remaps subreads via minimap2
  4. Merges primary + remapped alignments
  5. Positions virtual fragments against genome binning
  6. Outputs Read_Align_Fragment_RvdF.csv
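Steps 1 and 2 can be pictured with a small sketch: merge the overlapping aligned intervals on a read, then report the uncovered stretches as tail or internal gaps. This is an illustration under stated assumptions, not the tool's implementation. The names `GapKind`, `merge_intervals`, `detect_gaps` and the `min_gap` threshold are hypothetical, and intervals are assumed to be half-open, 0-based read coordinates as in PAF and BED.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum GapKind {
    Tail,     // unaligned stretch at either end of the read
    Internal, // unaligned stretch between two aligned fragments
}

/// Merge overlapping or touching half-open intervals (start, end).
pub fn merge_intervals(mut iv: Vec<(u64, u64)>) -> Vec<(u64, u64)> {
    iv.sort_unstable();
    let mut merged: Vec<(u64, u64)> = Vec::new();
    for (s, e) in iv {
        if let Some(last) = merged.last_mut() {
            if s <= last.1 {
                last.1 = last.1.max(e);
                continue;
            }
        }
        merged.push((s, e));
    }
    merged
}

/// Return uncovered regions of the read that are at least `min_gap` long.
pub fn detect_gaps(
    read_len: u64,
    aligned: Vec<(u64, u64)>,
    min_gap: u64,
) -> Vec<(u64, u64, GapKind)> {
    let merged = merge_intervals(aligned);
    let mut gaps = Vec::new();
    let mut cursor = 0u64; // end of the region covered so far
    for (i, &(s, e)) in merged.iter().enumerate() {
        if s > cursor && s - cursor >= min_gap {
            let kind = if i == 0 { GapKind::Tail } else { GapKind::Internal };
            gaps.push((cursor, s, kind));
        }
        cursor = cursor.max(e);
    }
    if read_len > cursor && read_len - cursor >= min_gap {
        gaps.push((cursor, read_len, GapKind::Tail));
    }
    gaps
}
```

For a 1000-base read aligned at 100-300, 250-400 and 600-900 with `min_gap = 50`, this returns the gaps 0-100 (tail), 400-600 (internal) and 900-1000 (tail). Regions like these are what a subread BED export would list for remapping.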

generate — Contact matrix

porecxl generate \
  -p reads_map.paf \
  -o ./contacts \
  --prefix sample \
  -c 500000 \
  -t 8

Outputs contacts/sample_contact_matrix.txt, or separate Adj_ and Nonadj_ files when -s is given.
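As a rough illustration of the adjacent/non-adjacent distinction, the sketch below enumerates every pair of fragments on a single read and flags pairs that are consecutive along the read as adjacent. The `Fragment` and `Contact` types and the `pairwise_contacts` function are hypothetical, and defining "adjacent" as "consecutive on the read" is an assumption; the README does not spell out the rule `porecxl generate` uses.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Fragment {
    pub chrom: String,
    pub pos: u64,
}

#[derive(Debug, PartialEq)]
pub struct Contact {
    pub a: Fragment,
    pub b: Fragment,
    pub adjacent: bool, // true when the two fragments are neighbours on the read
}

/// Emit all pairwise contacts for one read.
/// `frags` is assumed to be ordered by position along the read.
pub fn pairwise_contacts(frags: &[Fragment]) -> Vec<Contact> {
    let mut out = Vec::new();
    for i in 0..frags.len() {
        for j in (i + 1)..frags.len() {
            out.push(Contact {
                a: frags[i].clone(),
                b: frags[j].clone(),
                adjacent: j == i + 1,
            });
        }
    }
    out
}
```

A read with n fragments produces n(n-1)/2 contacts under this scheme, of which n-1 are adjacent, so a three-fragment read yields three contacts, two of them adjacent.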

sort — Sort contact matrix

porecxl sort \
  -i unsorted_contacts.txt \
  -o sorted_contacts.txt \
  -c 536870912 \
  -t 16

Memory-efficient external sort for files larger than available RAM.
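The general idea behind an external merge sort is to sort fixed-size chunks independently and then combine the sorted runs with a k-way merge. The sketch below shows that structure using only the standard library, with in-memory vectors standing in for the on-disk run files a real external sort would write. The names `Key`, `sorted_runs` and `merge_runs` are hypothetical, chunks are sorted sequentially rather than in parallel, and chromosome names are compared as plain strings, which may differ from the ordering the tool uses.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Sort key: (chrom1, chrom2, pos1, pos2).
pub type Key = (String, String, u64, u64);

/// Phase 1: split the input into chunks of at most `chunk_size` records
/// and sort each chunk on its own. In an external sort each run would be
/// written to a temporary file instead of being kept in memory.
pub fn sorted_runs(records: Vec<Key>, chunk_size: usize) -> Vec<Vec<Key>> {
    records
        .chunks(chunk_size.max(1))
        .map(|chunk| {
            let mut run = chunk.to_vec();
            run.sort();
            run
        })
        .collect()
}

/// Phase 2: k-way merge. A min-heap holds the current head of every run,
/// so only one record per run needs to be resident at a time.
pub fn merge_runs(runs: Vec<Vec<Key>>) -> Vec<Key> {
    let mut iters: Vec<_> = runs.into_iter().map(|r| r.into_iter()).collect();
    let mut heap = BinaryHeap::new();
    for (idx, it) in iters.iter_mut().enumerate() {
        if let Some(key) = it.next() {
            heap.push(Reverse((key, idx)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((key, idx))) = heap.pop() {
        out.push(key);
        // Refill the heap from the run the smallest record came from.
        if let Some(next) = iters[idx].next() {
            heap.push(Reverse((next, idx)));
        }
    }
    out
}
```

Because the merge only ever holds one record per run in the heap, peak memory is governed by the chunk size rather than by the total file size.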

Pipeline Overview

FASTQ ──[align]──► reads_map.paf ──[hugread]──► Read_Align_Fragment_RvdF.csv
                           │
                           └──[generate]──► contact_matrix.txt ──[sort]──► sorted_contacts.txt

Dependencies

  • minimap2 (external, must be on PATH)
  • Rust crates: clap, csv, rayon, serde, flate2, anyhow, parking_lot, chrono, tempfile, indexmap

License

MIT

About

The analysis of PMCL data
