Merge VastDB input mode into rna_maps.py with minus-strand fix #12

Open
Bear-Bee89 wants to merge 5 commits into ulelab:main from Bear-Bee89:VastDB_input

@Bear-Bee89
Contributor

Summary

Merges the standalone rna_maps_vastdb_only.py into rna_maps.py as a
second input mode, so a single script handles both rMATS and VastDB inputs
with a shared downstream pipeline.

What changed

Two input tracks, one shared pipeline

  • rMATS mode (-i): unchanged interface, all original arguments preserved
  • VastDB mode (--vastdb_mode): takes --vastdb_enhanced, --vastdb_silenced,
    --vastdb_control, --vastdb_constitutive ID list files + --vastdb_annotation
  • Both tracks produce a DataFrame with identical columns, which feeds into
    shared BED creation, coverage calculation, plotting, and heatmap generation
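The split between the two input tracks could be wired up roughly as follows. This is only a sketch using the flag names listed above; the `dest` name `rmats_input`, the `select_mode` helper, and the rule that the two modes are mutually exclusive are assumptions for illustration, not taken from the merged script.

```python
import argparse


def build_parser():
    """Build a parser exposing both input tracks (flag names from this PR)."""
    parser = argparse.ArgumentParser(description="RNA maps (sketch)")
    # rMATS track; the dest name is hypothetical.
    parser.add_argument("-i", dest="rmats_input", default=None)
    # VastDB track.
    parser.add_argument("--vastdb_mode", action="store_true")
    for category in ("enhanced", "silenced", "control", "constitutive"):
        parser.add_argument(f"--vastdb_{category}", default=None)
    parser.add_argument("--vastdb_annotation", default=None)
    return parser


def select_mode(args):
    """Return 'rmats' or 'vastdb'; assumes exactly one track may be chosen."""
    if args.vastdb_mode and args.rmats_input:
        raise ValueError("choose either -i or --vastdb_mode, not both")
    if args.vastdb_mode:
        return "vastdb"
    if args.rmats_input:
        return "rmats"
    raise ValueError("no input given: pass -i or --vastdb_mode")


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    print(select_mode(cli_args))
```

Each loader would then return a DataFrame with the same columns, so everything after this dispatch is mode-agnostic.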

Minus-strand fix for rMATS flanking exon coordinates

rMATS labels upstreamES/EE and downstreamES/EE by genomic position (lower
coords = "upstream"), which is inverted for minus-strand genes. The original
script compensated with cross-exon column pairing in get_ss_bed() calls.
This merge instead swaps the columns at load time in load_rmats_data() so
that upstream/downstream always mean transcript direction after loading. Both
modes then use identical same-exon get_ss_bed() calls.
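The load-time swap can be illustrated with a small pandas sketch. It assumes the rMATS table has a `strand` column alongside `upstreamES`, `upstreamEE`, `downstreamES`, and `downstreamEE`; the function name `swap_minus_strand_flanks` is hypothetical, and the actual code in `load_rmats_data()` may differ.

```python
import pandas as pd

UPSTREAM = ["upstreamES", "upstreamEE"]
DOWNSTREAM = ["downstreamES", "downstreamEE"]


def swap_minus_strand_flanks(df):
    """Return a copy where upstream/downstream follow transcript direction.

    On the minus strand the exon with lower genomic coordinates is
    downstream in the transcript, so the two flanking exons are exchanged.
    Plus-strand rows are left untouched.
    """
    out = df.copy()
    minus = out["strand"] == "-"
    # to_numpy() avoids label alignment, so values are assigned by position.
    out.loc[minus, UPSTREAM + DOWNSTREAM] = out.loc[
        minus, DOWNSTREAM + UPSTREAM
    ].to_numpy()
    return out


events = pd.DataFrame(
    {
        "strand": ["+", "-"],
        "upstreamES": [100, 100],
        "upstreamEE": [200, 200],
        "downstreamES": [500, 500],
        "downstreamEE": [600, 600],
    }
)
fixed = swap_minus_strand_flanks(events)
```

After the swap, the minus-strand row has its upstream exon at 500 to 600 and its downstream exon at 100 to 200, so downstream code can treat both strands the same way.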

New features

  • --seed flag (default 42) for reproducible control/constitutive subsetting
  • VastDB mode now gets heatmaps, exon length plots, and total-exons-covered
    tables (these were missing from the standalone script)
  • Categorised exon output named by mode: _RMATS_with_categories.tsv vs
    _VastDB_with_categories.tsv
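As an illustration of what a seeded subset buys, here is a minimal standard-library sketch. The helper name `subset_ids` and the cap `n` are hypothetical; the script itself may subset with pandas or NumPy instead.

```python
import random


def subset_ids(ids, n, seed=42):
    """Return at most n IDs, chosen reproducibly for a given seed."""
    # Sort first so the result depends only on the seed and the set of IDs,
    # not on the order in which they were read from disk.
    ordered = sorted(ids)
    if len(ordered) <= n:
        return ordered
    # A private Random instance leaves the global random state untouched.
    return random.Random(seed).sample(ordered, n)


control_ids = [f"EVENT_{i}" for i in range(1000)]  # made-up IDs
first = subset_ids(control_ids, 50, seed=42)
second = subset_ids(reversed(control_ids), 50, seed=42)
```

With the same seed, `first` and `second` contain the same 50 IDs in the same order, even though the input order differed.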

Files changed

  • rna_maps.py — replaced with merged version
  • README.md — rewritten to document both modes
  • rna_maps_vastdb_only.py — superseded (can be removed)

Testing

Tested on KCL CREATE HPC:

  • VastDB track: PTBP1_2_DKD event lists + PTBP1 iCLIP (hg38) —
    ran to completion, outputs match standalone script
  • rMATS track: Gueroussov2015 SE.MATS.JCEC + PTBP1 iCLIP (hg19) —
    ran to completion with correct category counts
    (4448 enhanced, 2256 silenced, 52158 constitutive, 8354 control)

Breaking changes

  • The minus-strand flanking exon swap will change rMATS mode outputs for
    minus-strand genes compared to the original script. This is a bug fix —
    the original cross-exon pairing produced correct results through
    compensating logic, but the new approach is clearer and consistent with
    VastDB mode.
  • rna_maps_vastdb_only.py is no longer needed as a separate script.

Bear-Bee89 and others added 5 commits March 13, 2026 13:32
New standalone script that accepts VastDB EVENT ID lists with pre-assigned categories (enhanced, silenced, control, constitutive) instead of rMATS output, parsing coordinates directly from a VastDB EVENT_INFO annotation file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ranch

Adds dedicated VastDB section at the top covering inputs, usage, outputs, and a comparison table vs the original rMATS-based script. Original rna_maps.py docs preserved below.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>