Releases: Ensembl/plant-scripts

20250904

12 Sep 14:39

Main changes in GET_PANGENES:

06032025: get_pangenes.pl: sort & concat alignment results using tempfile with filenames to sort to avoid "Argument list too long"
24032025: BED matrix produced by _cluster_analysis.pl is 0-based 
25032025: match_cluster.pl: added -i to control sequence identity of matches
25032025: match_cluster.pl: added -F to produce a FASTA file with sequence index that can be exported as gene-based pangenome for mapping, with <global pangenome positions> estimated from reference genome
25032025: updated Makefiles and documentation
08042025: match_cluster.pl TSV output updated, tested with barley
08042025: add pangenome coords example to documentation
14052025: added POCS to troubleshooting to explain small cores
19052025: check_quality.pl does not assume gff files are available
27052025: _cluster_analysis.pl -t now affects pangene set growth simulation
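
For readers consuming the BED matrix mentioned in the 24032025 entry, the 0-based convention differs from the 1-based, inclusive coordinates used in GFF files. The following is a minimal Python sketch of that conversion, assuming standard BED semantics (0-based start, exclusive end). The function name gff_to_bed_interval is hypothetical and is not part of _cluster_analysis.pl, whose exact column layout is not shown in these notes.

```python
from typing import Tuple


def gff_to_bed_interval(gff_start: int, gff_end: int) -> Tuple[int, int]:
    """Convert a 1-based, inclusive GFF interval to a 0-based, half-open BED interval.

    Hypothetical helper for illustration; assumes standard BED semantics.
    """
    if gff_start < 1 or gff_end < gff_start:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts: the inclusive 1-based end equals the exclusive 0-based end.
    return gff_start - 1, gff_end


# A gene spanning bases 1001..2000 in GFF; the interval length is unchanged in BED.
bed_start, bed_end = gff_to_bed_interval(1001, 2000)
print(bed_start, bed_end, bed_end - bed_start)
```

Because only the start coordinate shifts, feature lengths computed as end minus start in BED match the inclusive lengths computed as end minus start plus one in GFF.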

Plus changes to phylogenomics scripts described in #16

Finally, tag format was changed to 1.3 for conda compatibility

20250123

23 Jan 10:34

This release

  • adapts 04102024 for bioconda
  • adopts ISO date formats for version numbers.

04102024

25 Oct 07:44

This release ships with get_pangenes.pl version 04102024.

Main changes are:
25092024: added section 'Example 6: estimation of haplotype diversity'
03102024: get_pangenes.pl expects min 95% sequence identity for WGA-based gene alignments, as in GET_HOMOLOGUES-EST, to help avoid diverged tandem copies
04102024: get_pangenes.pl now sets MAXDISTNEIGHBORS=2, so neighbor genes in a cluster cannot be more than 2 genes away
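
The two thresholds in the 03102024 and 04102024 entries can be pictured as simple filters. The sketch below is illustrative Python, not the Perl code in get_pangenes.pl: the function names, the use of gene ranks along a chromosome, and the percent scale for identity are all assumptions made for this example.

```python
MIN_IDENTITY = 95.0    # percent; value taken from the 03102024 entry
MAXDISTNEIGHBORS = 2   # value taken from the 04102024 entry


def passes_identity(identity_pct: float, min_identity: float = MIN_IDENTITY) -> bool:
    """Keep a WGA-based gene alignment only if it reaches the minimum identity."""
    return identity_pct >= min_identity


def are_neighbors(rank_a: int, rank_b: int, max_dist: int = MAXDISTNEIGHBORS) -> bool:
    """Assumed reading: ranks are gene positions in chromosome order, and two
    genes are acceptable neighbors if they are at most max_dist ranks apart."""
    return abs(rank_a - rank_b) <= max_dist


# Hypothetical values: one alignment above and one below the identity cutoff,
# then one gene pair within and one beyond the neighbor distance.
print(passes_identity(96.2), passes_identity(91.0))
print(are_neighbors(10, 12), are_neighbors(10, 13))
```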

11012024

08 Feb 16:22

This release ships with updates to GET_PANGENES: code changes since the publication of the manuscript, involving:

15112023

16 Nov 07:31

This release ships with updates to:

  • GET_PANGENES: code and documentation changes since the publication of the manuscript, involving improved handling of input GFF files and calculation of overlap coordinates from WGA segments in different strands.

  • REST-based recipes.

pangenes_benchmark

03 Apr 14:40

Pangene sets of Arabidopsis (ACK), rice, wheat and barley datasets produced while benchmarking get_pangenes as described at https://doi.org/10.1186/s13059-023-03071-z and https://www.biorxiv.org/content/10.1101/2023.01.03.520531v2

The HOWTO* files contain the actual commands required to produce these results with the input FASTA & GFF files (32GB), which should first be downloaded from DOI

The source code was archived as DOI but has been updated since.

test_rice

04 Jan 20:12

Toy dataset to test the scripts for pan-gene analysis.

nrTEplants

16 Feb 12:23

Release 0.3 (Jun2020) of the nrTEplants library of plant transposable elements, which minimizes overlap with sequences containing protein domains known to be part of NLR genes. This sequence set was computed by combining TREP, SINEbase, REdat, RepetDB, EDTArice, EDTAmaize, SoyBaseTE, TAIR10TE, SunflowerTE, MelonTE, RosaTE and SUNREP and then obtaining a non-redundant collection with GET_HOMOLOGUES-EST.

Check the code and documentation at https://github.com/Ensembl/plant_tools/tree/master/bench/repeat_libs

Citation: Contreras-Moreira,B., Filippi,C.V., Naamati,G., Girón,C.G., Allen,J.E. and Flicek,P. (2021) Efficient masking of plant genomes by combining kmer counting and curated repeats Genomics. Plant Genome https://doi.org/10.1002/tpg2.20143

23102020

23 Oct 13:11

This release was created to obtain a DOI from Zenodo