A multi-level integration algorithm for sensitive integration, reference-mapping, and cell state identification in single-cell data.

elolab/Coralysis

Coralysis

📖 Overview

Coralysis is an R package featuring a multi-level integration algorithm for sensitive integration, reference-mapping, and cell state identification in single-cell data, described in the paper “Coralysis enables sensitive identification of imbalanced cell types and states in single-cell data via multi-level integration”.

Coralysis applications

Coralysis relies on an adapted version of our previously introduced Iterative Clustering Projection (ICP) algorithm (Smolander et al., 2021) to identify shared cell clusters across heterogeneous datasets by leveraging multiple rounds of divisive clustering.

Inspired by the process of assembling a puzzle, where one begins by grouping pieces based on low- to high-level features, such as color and shading, before looking into shape and patterns, this multi-level integration algorithm progressively blends the batch effects while separating cell types across multiple runs of divisive clustering. The trained ICP models can then be used for various purposes, including prediction of the cluster identities of related, unannotated single-cell datasets through reference-mapping, and inference of cell states and their differential expression programs using the cell cluster probabilities, which represent the likelihood of each cell belonging to each cluster.

While state-of-the-art single-cell integration methods often struggle with imbalanced cell types across heterogeneous datasets, Coralysis effectively differentiates similar yet unshared cell types across batches.

Coralysis integration flowchart. (A) Heterogeneous single-cell datasets given as input are overclustered batch-wise into a training set, which is modelled through the Iterative Clustering Projection (ICP) algorithm to predict the cell cluster probabilities and obtain an integrated embedding. Adaptations to the original ICP algorithm (Smolander et al., 2021): (B) batch-wise cluster assignment at the start, dependent on the cell distribution across Principal Component 1 (median as cutoff); (C) training cells selected from the batch k-nearest neighbours of the cell with the highest probability for every batch per cluster; and (D) upon ICP clustering convergence, each cluster is further divided into two for the next clustering round, dependent on the batch-wise cluster probability distribution (median as cutoff). (E) Multi-level integration is achieved through multiple divisive clustering rounds, blending the batch effect and highlighting the biological signal incrementally. Shapes represent cell types and colours represent batches.
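
Step (B) of the flowchart can be pictured with a few lines of base R. The snippet below is only an illustrative sketch on simulated data, not the Coralysis implementation: the helper `split_by_pc1_median()` and the toy matrix are hypothetical, and the package may compute and orient the principal component differently.

```r
# Illustrative sketch only: NOT the Coralysis implementation.
# Toy data: 200 cells x 50 features, two batches of 100 cells each.
set.seed(1)
n_cells <- 200
n_genes <- 50
expr <- matrix(rnorm(n_cells * n_genes), nrow = n_cells, ncol = n_genes)
batch <- rep(c("A", "B"), each = n_cells / 2)

# Hypothetical helper: assign the cells of ONE batch to cluster 1 or 2,
# using the median of Principal Component 1 as the cutoff.
split_by_pc1_median <- function(x) {
  pc1 <- prcomp(x, center = TRUE, scale. = FALSE)$x[, 1]
  ifelse(pc1 > median(pc1), 2L, 1L)
}

# Apply the split separately within each batch (batch-wise assignment).
init_cluster <- integer(n_cells)
for (b in unique(batch)) {
  idx <- batch == b
  init_cluster[idx] <- split_by_pc1_median(expr[idx, , drop = FALSE])
}

# Every batch contributes cells to both starting clusters.
table(batch, init_cluster)
```

Because the cutoff is computed within each batch, both starting clusters contain cells from every batch, which is the property the batch-wise assignment is meant to guarantee.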



📦 Installation

Coralysis can be installed from the development version of Bioconductor.

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# The following initializes usage of Bioc devel
BiocManager::install(version="devel")

BiocManager::install("Coralysis")

Alternatively, the latest version of Coralysis can be installed from GitHub using the devtools R package.

devtools::install_github("elolab/Coralysis")


🛠️ Usage

Coralysis requires as input a SingleCellExperiment object containing the log-normalized single-cell (gene or protein) expression matrix (available in the logcounts assay) and the corresponding batch label identities (stored in the colData of the SingleCellExperiment object).
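
As a rough illustration of the expected input, the sketch below builds a toy SingleCellExperiment object with a logcounts assay and a batch column in colData. The simulated counts, the simple library-size normalization, and the helper `check_coralysis_input()` are assumptions made for this example; they are not part of Coralysis.

```r
library("SingleCellExperiment")

# Toy count matrix: 20 genes x 10 cells (simulated, for illustration only).
set.seed(42)
counts <- matrix(rpois(20 * 10, lambda = 5), nrow = 20, ncol = 10,
                 dimnames = list(paste0("gene", 1:20), paste0("cell", 1:10)))

# Simple library-size normalization followed by log1p.
# This is an assumption; any standard log-normalization could be used instead.
size_factors <- colSums(counts) / mean(colSums(counts))
logcounts <- log1p(sweep(counts, 2, size_factors, "/"))

# Batch labels go into 'colData'; the column name ("batch") is arbitrary.
sce <- SingleCellExperiment(
  assays = list(counts = counts, logcounts = logcounts),
  colData = DataFrame(batch = rep(c("batch1", "batch2"), each = 5))
)

# Hypothetical helper (not a Coralysis function): check the two requirements
# stated above before running the integration.
check_coralysis_input <- function(object, batch.label) {
  stopifnot(
    is(object, "SingleCellExperiment"),
    "logcounts" %in% assayNames(object),
    batch.label %in% colnames(colData(object))
  )
  invisible(TRUE)
}

check_coralysis_input(sce, batch.label = "batch")
```

The column name passed as `batch.label` here is the same one that would later be given to RunParallelDivisiveICP().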

The output consists of a set of ICP (Iterative Clustering Projection) models along with the associated cell cluster probability matrices. These results are used to compute a Principal Component Analysis (PCA)-based integrated embedding that represents the integration outcome.
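
To make the idea of a PCA-based embedding built from cluster probabilities more concrete, here is a conceptual base R sketch. It assumes that the probability matrices from several ICP runs are column-bound before PCA; the simulated matrices and the helper `simulate_probs()` are hypothetical, and the way Coralysis actually selects and combines the probabilities may differ.

```r
# Conceptual sketch only: simulated probabilities, NOT output from Coralysis.
set.seed(7)
n_cells <- 100

# Hypothetical helper: a random cell x cluster matrix whose rows sum to 1,
# standing in for the cluster probabilities predicted by one ICP model.
simulate_probs <- function(n_cells, n_clusters) {
  p <- matrix(runif(n_cells * n_clusters), nrow = n_cells)
  p / rowSums(p)
}

# Three toy "ICP runs" with different numbers of clusters.
prob_list <- list(
  simulate_probs(n_cells, 4),
  simulate_probs(n_cells, 8),
  simulate_probs(n_cells, 16)
)

# Assumption: probabilities are concatenated column-wise (cells stay in rows).
joint_probs <- do.call(cbind, prob_list)

# PCA on the joint probability matrix; keep the first 10 components
# as the low-dimensional embedding.
pca <- prcomp(joint_probs, center = TRUE, scale. = FALSE)
embedding <- pca$x[, 1:10]
dim(embedding)
```

Each row of `embedding` corresponds to one cell, which is the same layout as the matrix stored in reducedDims() after calling RunPCA() on a real object.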

The following code snippet highlights the basic function calls to perform integration with Coralysis. See the Vignettes section below for fully reproducible examples.

# Import packages
library("Coralysis")
suppressPackageStartupMessages(library("SingleCellExperiment"))

# Perform multi-level integration
set.seed(123)
sce <- RunParallelDivisiveICP(
  object = sce, # 'SingleCellExperiment' object w/ 'logcounts' & 'colData(sce)'
  batch.label = "batch", # column in 'colData(sce)' w/ batch label identity
  threads = 2 # no. of threads to parallelize ICP runs
)

# Obtain the integrated embedding
set.seed(39)
sce <- RunPCA(object = sce) # stored in 'reducedDims(sce)' (by default named 'PCA')

As an alternative to the Bioconductor ecosystem, the Coralysis integration algorithm can be called directly on Seurat objects after installing the SeuratWrappers R package.

This feature is not yet available in the official repository (satijalab/seurat-wrappers) as our pull request is still under review (see pull request).

In the meantime, users can install the SeuratWrappers package from the CoralysisIntegration branch of our repository, elolab/seurat-wrappers.

Below is a minimal reproducible example adapted from the Seurat vignette Introduction to scRNA-seq integration, demonstrating the use of the Coralysis method (CoralysisIntegration).

See the Vignettes section below for additional use cases.

# Install 'SeuratWrappers'
devtools::install_github("elolab/seurat-wrappers@CoralysisIntegration")

# Import packages 
library("Seurat")
library("SeuratData")
library("SeuratWrappers")

# Import single-cell data
InstallData("ifnb")
ifnb <- LoadData("ifnb")
ifnb <- UpdateSeuratObject(ifnb)

# Run basic Seurat workflow
ifnb[["RNA"]] <- split(ifnb[["RNA"]], f = ifnb$stim)
ifnb <- NormalizeData(ifnb)
ifnb <- FindVariableFeatures(ifnb)
ifnb <- ScaleData(ifnb)
ifnb <- RunPCA(ifnb)

# Perform Coralysis integration: 'method = CoralysisIntegration'
set.seed(45)
ifnb <- IntegrateLayers(
  object = ifnb, 
  method = CoralysisIntegration, 
  new.reduction = "integrated.coralysis",
  batch = "stim", 
  threads = 4 # this function accepts any ?Coralysis::RunParallelDivisiveICP specific parameter 
)

# Perform UMAP & clustering on the Coralysis integrated embedding w/ Seurat
ifnb <- RunUMAP(ifnb, reduction = "integrated.coralysis", dims = 1:30)
ifnb <- FindNeighbors(ifnb, reduction = "integrated.coralysis", dims = 1:30)
ifnb <- FindClusters(ifnb)

In the absence of the SeuratWrappers package, users can still interoperate between Coralysis and Seurat by using the Seurat functions as.SingleCellExperiment() and as.Seurat(), which enable conversion between the native Seurat object format (SeuratObject) and the SingleCellExperiment format used by Coralysis, with a few minor adjustments.

The example above is reproduced below without using the SeuratWrappers package.

Only the section for converting between object formats and running Coralysis is highlighted here. Click on Details after the code snippet to view the full minimal reproducible example.

## Import packages
# 'Coralysis' must be installed, but it does not need to be loaded
library("Seurat")

# Convert SeuratObject to SingleCellExperiment
ifnb[["RNA"]] <- JoinLayers(ifnb[["RNA"]])
ifnb.sce <- as.SingleCellExperiment(ifnb)

## Use the same HVG used in Seurat
# Create an alternative experiment (equivalent to 'assays' in Seurat)
seurat.hvg <- VariableFeatures(ifnb)
SingleCellExperiment::altExp(x = ifnb.sce, e = "int") <- ifnb.sce[seurat.hvg,] # creating 'int' assay
ifnb.sce <- SingleCellExperiment::swapAltExp(x = ifnb.sce, # switch to 'int' assay
                                             name = "int", 
                                             withColData = FALSE) 

## Coralysis specific functions
set.seed(129)
ifnb.sce <- Coralysis::RunParallelDivisiveICP(
  object = ifnb.sce, # it took ~5 min.
  batch.label = "stim", 
  threads = 4)
set.seed(75)
ifnb.sce <- Coralysis::RunPCA(object = ifnb.sce, 
                              dimred.name = "integrated.coralysis") # integrated output

# Convert SingleCellExperiment to Seurat & copy integrated embedding to SeuratObject
ifnb.sce <- SingleCellExperiment::swapAltExp(x = ifnb.sce, 
                                             name = "RNA", 
                                             withColData = FALSE) 
SingleCellExperiment::reducedDims(ifnb.sce) <- SingleCellExperiment::reducedDims(SingleCellExperiment::altExp(ifnb.sce))
SingleCellExperiment::altExp(ifnb.sce) <- NULL
ifnb <- as.Seurat(ifnb.sce) 
Details
## Import packages
# 'Coralysis' must be installed, but it does not need to be loaded
library("Seurat")
library("SeuratData")

## Import single-cell data
InstallData("ifnb")
ifnb <- LoadData("ifnb")
ifnb <- UpdateSeuratObject(ifnb)

## Run basic Seurat workflow
ifnb[["RNA"]] <- split(ifnb[["RNA"]], f = ifnb$stim)
ifnb <- NormalizeData(ifnb)
ifnb <- FindVariableFeatures(ifnb)
ifnb <- ScaleData(ifnb)
ifnb <- RunPCA(ifnb)

#-----------------------------------------------------------------------------------------------#
#
## Convert between SeuratObject-SingleCellExperiment-SeuratObject; 
## perform Coralysis integration & embedding 

# Convert SeuratObject to SingleCellExperiment
ifnb[["RNA"]] <- JoinLayers(ifnb[["RNA"]])
ifnb.sce <- as.SingleCellExperiment(ifnb)

## Use the same HVG used in Seurat
# Create an alternative experiment (equivalent to 'assays' in Seurat)
seurat.hvg <- VariableFeatures(ifnb)
SingleCellExperiment::altExp(x = ifnb.sce, e = "int") <- ifnb.sce[seurat.hvg,] # creating 'int' assay
ifnb.sce <- SingleCellExperiment::swapAltExp(x = ifnb.sce, # switch to 'int' assay
                                             name = "int", 
                                             withColData = FALSE) 

## Coralysis specific functions
set.seed(129)
ifnb.sce <- Coralysis::RunParallelDivisiveICP(
  object = ifnb.sce, # it took ~5 min.
  batch.label = "stim", 
  threads = 4)
set.seed(75)
ifnb.sce <- Coralysis::RunPCA(object = ifnb.sce, 
                              dimred.name = "integrated.coralysis") # integrated output

# Convert SingleCellExperiment to Seurat & copy integrated embedding to SeuratObject
ifnb.sce <- SingleCellExperiment::swapAltExp(x = ifnb.sce, 
                                             name = "RNA", 
                                             withColData = FALSE) 
SingleCellExperiment::reducedDims(ifnb.sce) <- SingleCellExperiment::reducedDims(SingleCellExperiment::altExp(ifnb.sce))
SingleCellExperiment::altExp(ifnb.sce) <- NULL
ifnb <- as.Seurat(ifnb.sce) 
#
#-----------------------------------------------------------------------------------------------#

# Continue w/ Seurat workflow: UMAP & graph-based clustering on the integrated embedding
ifnb <- FindNeighbors(ifnb, reduction = "integrated.coralysis", dims = 1:30)
ifnb <- FindClusters(ifnb, resolution = 1)
ifnb <- RunUMAP(ifnb, dims = 1:30, reduction = "integrated.coralysis", 
                reduction.name = "umap.Coralysis")


📑 Vignettes



❓ Getting help

Check the reference manual or website.

If you have questions related to Coralysis, please contact us here.



📝 Citation

If you use Coralysis in your work, please cite the following preprint:

António GG Sousa, Johannes Smolander, Sini Junttila, Laura L Elo (2025).
Coralysis enables sensitive identification of imbalanced cell types and states in single-cell data via multi-level integration. bioRxiv. https://doi.org/10.1101/2025.02.07.637023



🎉 Acknowledgements

A special thanks to Paulina Frolovaitė for the beautiful logo design.



🏛️ Funding

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no.: 955321.




📚 References

  1. Smolander J, Junttila S, Venäläinen MS, Elo LL (2021). “ILoReg: a tool for high-resolution cell population identification from single-cell RNA-seq data”. Bioinformatics, 37(8), 1107-1114, https://doi.org/10.1093/bioinformatics/btaa919.

  2. Sousa AGG, Smolander J, Junttila S, Elo LL (2025). “Coralysis enables sensitive identification of imbalanced cell types and states in single-cell data via multi-level integration”. bioRxiv, https://doi.org/10.1101/2025.02.07.637023.
