
Blacklist from genomes #288

Open
suhrig wants to merge 7 commits into nf-core:dev from suhrig:blacklist-from-genomes



suhrig commented Jun 8, 2025

The two parameters params.gene_bed and params.blacklist are set to null in nextflow.config (see here). Parameters can only be set once and all subsequent assignments are ignored, so the only way a user can override these null values is by passing --gene_bed and --blacklist as arguments to the nextflow run command. Usually, these parameters are defined in the igenomes.config file (see here). The values in this config file are effectively ignored, however, because the initialization in nextflow.config takes precedence. These two lines in the file main.nf should load the values from igenomes.config, but they have no effect.

This PR removes the initialization of gene_bed and blacklist from nextflow.config. This way, the parameters are taken from igenomes.config by default, or from the command line if the user passes them as arguments.
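To make the precedence concrete, here is a toy Python model. It is not Nextflow's implementation. It only assumes, as described above, that the first assignment to a parameter wins and that a later script-level assignment in main.nf is silently ignored, even when the existing value is null (None here). The names FirstWinsParams and resolve are hypothetical.

```python
# Toy model (NOT Nextflow's implementation) of the behaviour described in this PR:
# the first assignment to a parameter wins; later assignments are ignored,
# even if the value already present is None (null).


class FirstWinsParams(dict):
    """A dict whose assign_default() only takes effect for keys not yet present."""

    def assign_default(self, key, value):
        # Mirrors a later `params.x = value`: ignored when x is already defined.
        if key not in self:
            self[key] = value


def resolve(cli, config, genome_attrs):
    """Resolve parameters from three hypothetical layers, highest precedence first."""
    params = FirstWinsParams()
    params.update(cli)                        # e.g. --blacklist on the command line
    for key, value in config.items():         # assignments in nextflow.config
        params.assign_default(key, value)
    for key, value in genome_attrs.items():   # values main.nf looks up from igenomes.config
        params.assign_default(key, value)
    return params
```

With `config={"blacklist": None}`, the value from igenomes.config never takes effect. Once the key is removed from the config layer, the igenomes value is used unless `--blacklist` is passed on the command line.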

It looks like the initialization in nextflow.config is not really needed. I believe the only reason it is there is that the two parameters are used in modules.config (here and here). These uses serve no purpose, however: they only decide whether a set of publishDir attributes is set. Setting these attributes does no harm when no blacklist is used, so we might as well remove the uses of the variables. That in turn allows us to remove the initialization from nextflow.config.

Author

suhrig commented Jun 17, 2025

Hi @chris-cheshire,

Can you kindly advise on what to do about the failed automatic checks? The failing check (linting) is not caused by any modification I made. It complains about a missing file, which I did not remove. I tried moving the file to the expected position (from tests/config/nextflow.config to test/nextflow.config), but this only led to another error about a missing file.

Many thanks in advance for your advice,
Sebastian

Author

suhrig commented Jun 22, 2025

I noticed two more problems with the blacklist, which I have fixed in the latest commits.

  1. The coordinates of some (but not all) blacklists had off-by-one/two errors. According to Suppl. File 2 from the source of the blacklists, the coordinates should be 0-based and the end coordinate should be exclusive, as evidenced by the regions for chrM, which start at 0 and are one base higher than the size of chrM. However, in the cutandrun pipeline some of the blacklists seem to use 1-based coordinates, such as the mm10 blacklist. The mm39 blacklist even seems to use 2-based coordinates. I have harmonized the blacklist coordinates to 0-based ones.
  2. The blacklists for Ensembl assemblies (GRCh38, GRCh37, GRCm39, GRCm38) use UCSC-style identifiers for the contigs, i.e., with a chr prefix. They should use identifiers without the prefix, however. The current behavior is also inconsistent with related pipelines, namely nf-core/chipseq and nf-core/atacseq, whose Ensembl blacklists use Ensembl contig names (example from the atacseq pipeline and example from the chipseq pipeline). To make these pipelines consistent, I have changed the contig names in the Ensembl blacklists to omit the chr prefix.
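Both fixes can be sketched for a single BED line as follows. This is a minimal illustration, not the code in this PR. The helper name harmonize_bed_line and its parameters are hypothetical, and the per-file offsets are an assumption you would need to verify. For example, a file that is 1-based with an inclusive end needs only its start shifted, whereas a file in which both coordinates are off by a constant needs both shifted. Mapping chrM to MT follows Ensembl's name for the mitochondrial contig; this is an assumption beyond what is described above.

```python
def harmonize_bed_line(line, start_offset=0, end_offset=0, strip_chr=False):
    """Shift BED coordinates to 0-based, end-exclusive form and optionally rename contigs.

    start_offset/end_offset are subtracted from columns 2 and 3. The right values
    for a given blacklist (e.g. (1, 1), (1, 0) or (2, 2)) are an assumption to check
    per file, for instance against a region that should start at 0.
    """
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    start = int(fields[1]) - start_offset
    end = int(fields[2]) - end_offset
    # A valid 0-based, end-exclusive interval has start >= 0 and end > start.
    if start < 0 or end <= start:
        raise ValueError(f"invalid interval after shift: {chrom}:{start}-{end}")
    if strip_chr and chrom.startswith("chr"):
        # Assumption: Ensembl names the mitochondrial contig "MT", not "M".
        chrom = "MT" if chrom == "chrM" else chrom[len("chr"):]
    fields[0], fields[1], fields[2] = chrom, str(start), str(end)
    return "\t".join(fields)
```

With made-up input values, `harmonize_bed_line("chr1\t3002\t5002\tHigh Signal Region", 2, 2, strip_chr=True)` returns `"1\t3000\t5000\tHigh Signal Region"`. Raising on a negative start catches an offset that is too large for the file instead of silently writing an invalid interval.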


2 participants