The usefuldata package provides useful datasets for analysis. It currently includes pre-processed mapping datasets for Olink and SomaLogic proteomics platforms, but will include other datasets in the future.
You can install the package from GitHub using the remotes package:
# install.packages("remotes")
remotes::install_github("mattlee821/usefuldata")

The package contains the following datasets:
The Olink mapping dataset (mapping_GRCh38_p14_olink) maps Olink proteins, identified by UniProt ID, to gene identifiers and genomic positions.
Columns include:
- UNIPROT: UniProt ID (Olink identifier)
- Target: Gene Symbol
- ensembl_gene_id: Ensembl Gene ID
- hgnc_id/hgnc_symbol: HGNC info
- START_hg19/END_hg19: Genomic coordinates (hg19)
- START_hg38/END_hg38: Genomic coordinates (hg38)
The SomaLogic mapping dataset (mapping_GRCh38_p14_somalogic) maps SomaLogic aptamers, identified by SeqId/SomaId, to gene identifiers and genomic positions.
Columns include:
- SeqId: Sequence ID
- SomaId: SomaLogic ID
- UNIPROT: UniProt ID
- Target: Gene Symbol
- ensembl_gene_id: Ensembl Gene ID
- START_hg19/END_hg19: Genomic coordinates (hg19)
- START_hg38/END_hg38: Genomic coordinates (hg38)
You can load the datasets directly into your R session:
library(usefuldata)
# Preview the first rows of the Olink mapping
head(mapping_GRCh38_p14_olink)
# Preview the first rows of the SomaLogic mapping
head(mapping_GRCh38_p14_somalogic)
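
A common use of a mapping table like these is to annotate analysis results with gene symbols and coordinates. The sketch below shows one way this might look in base R. It runs without the package installed by using a small stand-in data frame, mapping_toy, that carries a few of the column names listed above. The IDs, symbols, coordinates, the results table, and the 500 kb flank are all placeholders invented for illustration, not values from the package.

```r
# Toy stand-in for mapping_GRCh38_p14_olink.
# All IDs, symbols, and coordinates below are placeholders, not real data.
mapping_toy <- data.frame(
  UNIPROT = c("X00001", "X00002", "X00003"),
  hgnc_symbol = c("GENE1", "GENE2", "GENE3"),
  START_hg38 = c(1000000, 2500000, 100000),
  END_hg38 = c(1050000, 2600000, 150000),
  stringsAsFactors = FALSE
)

# Hypothetical analysis results keyed by UniProt ID.
results <- data.frame(
  UNIPROT = c("X00001", "X00003", "X00009"),
  estimate = c(0.12, -0.30, 0.05),
  stringsAsFactors = FALSE
)

# Left join: keep every result row; unmapped IDs get NA in the mapping columns.
annotated <- merge(results, mapping_toy, by = "UNIPROT", all.x = TRUE)

# Add a window of +/- 500 kb around each gene, with the start floored at 1.
# The flank size is an arbitrary choice for this example.
flank <- 500000
annotated$window_start <- pmax(annotated$START_hg38 - flank, 1)
annotated$window_end <- annotated$END_hg38 + flank

print(annotated)
```

With the package installed, replacing mapping_toy with mapping_GRCh38_p14_olink should work the same way, provided the column names match those listed above. Rows whose UniProt ID is absent from the mapping keep NA in the annotation columns, which makes unmapped proteins easy to spot.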