Gu-Lab-RBL-NCI/oligo-tail-miRNA

miRNA trimming and oligo-tailing analysis

Description of the methods and R scripts used to analyze miRNA isoform trimming and tailing, as well as the nucleotide composition of non-templated tails.

Dataset availability

  • GEO accession GSE139567
    • Currently only available under reviewer's token (10/29/2019)
  • The Cancer Genome Atlas (TCGA) miRNA-seq datasets used to analyze the impact of tumor mutations can be retrieved under dbGaP license (phs000178).

System requirements

This code was tested under:

  • MacBook Pro (15-inch, 2016)
  • Processor: 2.7 GHz Intel Core i7
  • Memory: 16 GB 2133 MHz LPDDR3

R and RStudio

Cloud computing tools

All the cloud computing tools can be found in the Cancer Genomics Cloud (CGC).

The Cancer Genomics Cloud (CGC), powered by Seven Bridges, is one of three systems funded by the National Cancer Institute to explore the paradigm of colocalizing massive public datasets, like The Cancer Genome Atlas (TCGA), alongside secure and scalable computational resources to analyze them.

“The Seven Bridges Cancer Genomics Cloud has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Contract No. HHSN261201400008C and ID/IQ Agreement No. 17X146 under Contract No. HHSN261201500003I.”

Analysis of miRNA sequencing data and tail composition:

The small RNA sequencing data were analyzed using an in-house pipeline. Briefly, adaptors were removed, and reads were mapped using Bowtie and visualized using IGV. The isomiR profile was studied in more detail using QuagmiR. This software uses a unique algorithm to pull specific reads and align them against a consensus sequence in the middle of a miRNA, allowing mismatches at the ends to capture 3’ isomiRs. The reports include tabulated analyses of miRNA expression, length, number of nucleotides trimmed, and tail composition at the individual-read level.

In this manuscript, QuagmiR's "Levenshtein or edit distance" parameters for the 5' and 3' segments were set to 2 and -1 (no restriction), respectively. This setting gave high stringency in identifying the miRNA while leaving the 3' end of the miRNA unrestrained, so that any trimming and/or tailing event could be detected.
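To make the meaning of these two settings concrete, here is a minimal Python sketch of an edit-distance filter in which a limit of 2 is enforced and a limit of -1 disables the check. It is not QuagmiR's code: the functions `levenshtein` and `within_limit` are hypothetical, and treating the first 8 nt of hsa-miR-7-5p as the 5' segment is an assumption made only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def within_limit(observed: str, expected: str, max_distance: int) -> bool:
    """Return True when the edit distance is allowed; -1 means no restriction."""
    if max_distance == -1:
        return True
    return levenshtein(observed, expected) <= max_distance


# Assumption: the first 8 nt of hsa-miR-7-5p stand in for the 5' segment.
REFERENCE_5P = "UGGAAGAC"

print(within_limit("UGGAAGAC", REFERENCE_5P, 2))   # exact match
print(within_limit("UGCAAGAC", REFERENCE_5P, 2))   # one substitution
print(within_limit("ACCUUGAC", REFERENCE_5P, 2))   # too many differences
print(within_limit("UUUUUUUAAUUUU", "", -1))       # 3' end: unrestricted
```

With the 3' limit at -1, a read carrying a long non-templated tail is not rejected on account of its 3' end.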

Customized R scripts were used to calculate percentages of canonical miRNA (defined as the most abundant templated read) and 3’ isomiRs, as well as percentages of tailing and trimming. Long tail composition was calculated by counting the number of non-templated nucleotides present in the tail of each isomiR read. Reads with an equal number of non-templated nucleotides in the tail were added together, and the cumulative distribution was calculated for all the oligo-tailed isomiRs, going from those with longer tails to those with shorter tails.
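The pooling and cumulative-distribution step described above can be sketched as follows. This is an illustrative Python sketch, not the repository's R scripts; the function name `cumulative_tail_distribution` and the input layout (pairs of tail length and read count) are assumptions.

```python
from collections import Counter


def cumulative_tail_distribution(tails):
    """tails: iterable of (non_templated_nt, read_count) pairs.

    Returns (tail_length, cumulative_fraction) pairs, longest tails first.
    """
    totals = Counter()
    for tail_length, count in tails:
        totals[tail_length] += count  # pool reads with equal tail length
    grand_total = sum(totals.values())

    running = 0
    distribution = []
    for tail_length in sorted(totals, reverse=True):  # longer to shorter tails
        running += totals[tail_length]
        distribution.append((tail_length, running / grand_total))
    return distribution


# Toy input: two distinct isomiRs share a 7-nt tail and are pooled.
example = [(19, 100), (7, 100), (7, 100), (3, 100), (2, 100)]
for tail_length, fraction in cumulative_tail_distribution(example):
    print(tail_length, fraction)
```

Each cumulative fraction is the share of oligo-tailed reads whose tail is at least that long.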

Analysis of isomiR profiles on AGO1 and AGO2 from TCGA:

Tumor samples from TCGA bearing genomic mutations in either AGO1 or AGO2 that lead to missense or synonymous changes were identified from the Genomic Data Commons (GDC) Data Portal (https://portal.gdc.cancer.gov/, accessed during May 2019). GDC uses combined reports from several variant callers (MuTect2, VarScan, MuSE and SomaticSniper).

The selected case IDs were: P295L TCGA-53-A4EZ, R315M TCGA-HU-A4G8 and E299K TCGA-Z6-A8JE (AGO2), and F310L TCGA-94-7033 (AGO1). The selected patient samples were also analyzed using QuagmiR, after converting the BAM files to FASTQ files with Picard SamToFastq, using Amazon cloud instances through the Seven Bridges Genomics implementation of the NCI Cancer Genomics Cloud. Mutations were mapped onto the PDB structures of AGO1 and AGO2 using PyMOL.

Descriptive example of the analysis performed

The examples shown here only illustrate the logic and calculations implemented in the R scripts.

miRBase reference

>hsa-miR-7-5p MIMAT0000252 (mature miRNA)
UGGAAGACUAGUGAUUUUGUUGUU

>hsa-mir-7-1 MI0000263 (pri-miRNA paralog 1)
<--mature-miRNA--------><---------templated (genomic reference)---------------------->
UGGAAGACUAGUGAUUUUGUUGUUUUUAGAUAACUAAAUCGACAACAAAUCACAGUCUGCCAUAUGGCACAGGCCAUGCCUCUACAG

>hsa-mir-7-2 MI0000264 (pri-miRNA paralog 2)
<--mature-miRNA--------><---------templated (genomic reference)--------------->
UGGAAGACUAGUGAUUUUGUUGUUGUCUUACUGCGCUCAACAACAAAUCCCAGUCUACCUAAUGGUGCCAGCCAUCGCA

>hsa-mir-7-3 MI0000265 (pri-miRNA paralog 3)
<--mature-miRNA--------><---------templated (genomic reference)---------------->
UGGAAGACUAGUGAUUUUGUUGUUCUGAUGUACUACGACAACAAGUCACAGCCGGCCUCAUAGCGCAGACUCCCUUCGAC

Minimum number of "N" nucleotides in tail

Example long-tailed read:
<--templated-----------><--non-templated-->
UGGAAGACUAGUGAUUUUGUUGUUUUUUUUUAAUUUUGUCUUU
........................UUUUUUUAAUUUUGUCUUU

Number of U in tail: 15
Number of A in tail: 2
Number of G in tail: 1
Number of C in tail: 1
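The counts above can be reproduced with a short Python sketch. It is not the repository's R code: the function name `tail_composition` is hypothetical, and the sketch assumes, as in this example, that the read begins with the full templated sequence so that everything after it is tail.

```python
from collections import Counter

MATURE = "UGGAAGACUAGUGAUUUUGUUGUU"  # hsa-miR-7-5p, MIMAT0000252
TAIL = "UUUUUUUAAUUUUGUCUUU"         # non-templated part of the example read
READ = MATURE + TAIL


def tail_composition(read: str, templated: str) -> Counter:
    """Count each nucleotide in the part of the read after the templated prefix."""
    if not read.startswith(templated):
        # Trimmed reads are outside the scope of this sketch.
        raise ValueError("read does not start with the templated sequence")
    return Counter(read[len(templated):])


composition = tail_composition(READ, MATURE)
for nucleotide in "UAGC":
    print(f"Number of {nucleotide} in tail: {composition[nucleotide]}")
```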

Weighted Average of the Minimum number of U in oligo-tail

Example long-tailed reads:
<--templated-----------><--non-templated-->  U_in_tail  Counts  Fraction  Weighted_U_in_tail
UGGAAGACUAGUGAUUUUGUUGUU                     
........................UUUUUUUAAUUUUGUCUUU  15         100     0.2       3
........................UUUAUUU              6          100     0.2       1.2
........................UUUUUUU              7          100     0.2       1.4
........................UUU                  3          100     0.2       0.6
........................UU                   2          100     0.2       0.4

Weighted Average of the Minimum number of U in oligo-tail: 6.6
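The same arithmetic, written out as a Python sketch (illustrative only; the repository's scripts are in R, and the function name `weighted_average_u` is an assumption). Each tail's U count is weighted by its share of the total read count.

```python
# (tail sequence, read count) pairs taken from the table above
TAILED_READS = [
    ("UUUUUUUAAUUUUGUCUUU", 100),
    ("UUUAUUU", 100),
    ("UUUUUUU", 100),
    ("UUU", 100),
    ("UU", 100),
]


def weighted_average_u(tailed_reads) -> float:
    """Average number of U per tail, weighted by read count."""
    total_reads = sum(count for _, count in tailed_reads)
    total_u = sum(tail.count("U") * count for tail, count in tailed_reads)
    return total_u / total_reads


print(round(weighted_average_u(TAILED_READS), 2))
```

Summing the U counts before dividing is equivalent to summing the per-row `Weighted_U_in_tail` values (3 + 1.2 + 1.4 + 0.6 + 0.4) and avoids accumulating rounding error.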

Bioinformatic identification of RNAs with extensive pairing with miRNAs:

Target RNAs with extensive 3' pairing to a miRNA could dislocate the 3' end of the miRNA from the PAZ domain and thereby induce trimming-tailing decay. Such targets were predicted bioinformatically with the following algorithm:

  1. RNAs with a 7mer seed match were selected from the TargetScan 7.2 list of human 3'UTRs.
  2. RNAduplex from the ViennaRNA Package 2.0 was used to calculate the minimum free energy (MFE) of hybridization between each miRNA and target RNA.
  3. The MFE of each miRNA-RNA hybrid was plotted against the abundance of the target RNA in HEK293 cells, as previously reported by Yang et al., Mol Cell (2019); data are available at GEO: GSE121327.
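As an illustration of what a 7mer seed match in step 1 looks like, the Python sketch below derives the site complementary to miRNA nucleotides 2-8 (the 7mer-m8 site type) and locates it in a sequence. In the analysis itself the sites were taken from the TargetScan list rather than recomputed, and the source does not say which 7mer site type was used, so the choice of 7mer-m8 is an assumption. The function names and the toy 3'UTR are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_match_7mer_m8(mirna: str) -> str:
    """Target site (5'->3') complementary to miRNA nucleotides 2-8."""
    seed = mirna[1:8]                        # positions 2-8, 1-based
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def find_sites(utr: str, mirna: str) -> list:
    """Return 0-based start positions of every seed match in the UTR."""
    site = seed_match_7mer_m8(mirna)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)    # allow overlapping matches
    return positions


MIR_7_5P = "UGGAAGACUAGUGAUUUUGUUGUU"  # hsa-miR-7-5p
TOY_UTR = "AAAGUCUUCCAAAGUCUUCCA"      # made-up sequence with two sites

print(seed_match_7mer_m8(MIR_7_5P))
print(find_sites(TOY_UTR, MIR_7_5P))
```

Step 2 would then pass each miRNA and selected target to RNAduplex to obtain the hybridization MFE; that call is not shown here because it requires a ViennaRNA installation.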
