If using the software in a publication, please cite the following:
Yanqiang Li et al. 2′-O-methylation at internal sites on mRNA promotes mRNA stability. Molecular Cell (2024). https://www.cell.com/molecular-cell/abstract/S1097-2765(24)00326-5
Please download the necessary datasets from 10.5281/zenodo.14632831: example_data.zip, which contains test fast5 files for the human rRNA and mRNA datasets, and Yeast_Fly_feature_dataset.zip, which contains the feature files for yeast and fly (wget commands are given below). The folders are described below:
./genome: The folder used to store genome and annotation files for the Nm detection in mRNA.
./raw_data: The folder used to store the raw data of fast5 files downloaded for the processing.
./example_data: The toy fast5 files of human mRNA and rRNA used for the STAR Protocol.
./rRNA: The rRNA information for human, yeast and fly.
./scripts: Custom scripts for the STAR Protocol from https://github.com/kaifuchenlab/STAR_Protocol/.
./Yeast_Fly_feature_dataset: The processed feature dataset for yeast and fly.
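Since the later commands use relative paths such as ../rRNA and ../genome, it can help to confirm the folder layout before starting. The following is a minimal sketch, assuming all folders sit side by side under one working directory; `check_layout` and `EXPECTED_DIRS` are hypothetical names introduced only for this illustration.

```shell
#!/bin/sh
# Folder names are taken from the list above.
# check_layout and EXPECTED_DIRS are hypothetical names, not part of the released scripts.
EXPECTED_DIRS="genome raw_data example_data rRNA scripts Yeast_Fly_feature_dataset"

# Print each expected folder that is missing under the given root
# (defaults to the current directory).
check_layout() {
    root="${1:-.}"
    for d in $EXPECTED_DIRS; do
        [ -d "$root/$d" ] || echo "$d"
    done
}

# Example: create any missing folders, then confirm that nothing is missing.
# for d in $(check_layout .); do mkdir -p "$d"; done
# check_layout .
```

An empty result means every folder in the list is present.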
wget -O rRNA.zip "https://zenodo.org/records/14632831/files/rRNA.zip?download=1"
wget -O example_data.zip "https://zenodo.org/records/14632831/files/example_data.zip?download=1"
wget -O Yeast_Fly_feature_dataset.zip "https://zenodo.org/records/14632831/files/Yeast_Fly_feature_dataset.zip?download=1"
The Nanopore raw datasets are listed below (these datasets are large and optional for the STAR Protocol):
cd raw_data
wget https://sra-pub-src-1.s3.amazonaws.com/SRZ190740/HS_rRNA_dRNA_fast5.tar.gz.1
Human IVT:
wget https://sra-pub-src-1.s3.amazonaws.com/SRZ190744/IVT_Hs_mRNA_fast5.tar.gz.1
WT:
wget https://sra-pub-src-1.s3.amazonaws.com/SRZ190768/SC_rRNA_dRNA_fast5.tar.gz.1
snR60 KO:
wget https://sra-pub-src-2.s3.amazonaws.com/ERR5317712/RNA475632.fast5.tar.gz.1
snR61 KO:
wget https://sra-pub-src-1.s3.amazonaws.com/ERR5317713/RNA475633.fast5.tar.gz.1
snR62 KO:
wget https://sra-pub-src-2.s3.amazonaws.com/ERR5317714/RNA475634.fast5.tar.gz.1
wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR606/ERR6066014/FBL_KD_rep1.tar.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR606/ERR6065782/NT_rep1.tar.gz
wget https://sra-pub-src-1.s3.amazonaws.com/SRR20374459/nanopore_siCTRL_rep1.tar.gz.2
wget https://sra-pub-src-1.s3.amazonaws.com/SRR20374458/nanopore_siCTRL_rep2.tar.gz.2
wget https://sra-pub-src-1.s3.amazonaws.com/SRR20374457/nanopore_siFBL_rep1.tar.gz.2
- Begik, O., Lucas, M.C., Pryszcz, L.P., Ramirez, J.M., Medina, R., Milenkovic, I., Cruciani, S., Liu, H., Vieira, H.G.S., Sas-Chen, A., et al. (2021). Quantitative profiling of pseudouridylation dynamics in native RNAs with nanopore sequencing. Nat Biotechnol 39, 1278-1291. 10.1038/s41587-021-00915-6
- Sklias, A., Cruciani, S., Marchand, V., Spagnuolo, M., Lavergne, G., Bourguignon, V., Brambilla, A., Dreos, R., Marygold, S.J., Novoa, E.M., et al. (2024). Comprehensive map of ribosomal 2'-O-methylation and C/D box snoRNAs in Drosophila melanogaster. Nucleic Acids Res. 10.1093/nar/gkae139
- Jenjaroenpun, P., Wongsurawat, T., Wadley, T.D., Wassenaar, T.M., Liu, J., Dai, Q., Wanchai, V., Akel, N.S., Jamshidi-Parsian, A., Franco, A.T., et al. (2021). Decoding the epitranscriptional landscape from native RNA sequences. Nucleic Acids Res 49, e7. 10.1093/nar/gkaa620
Flowchart of a machine learning model to detect Nm based on Nanopore direct RNA-seq.
conda env create -f Nanopore.environment.yml   # create the conda environment
conda install -c bioconda ont-fast5-api         # install ont-fast5-api, which provides multi_to_single_fast5
# Install guppy_basecaller: download the software from the Nanopore Community.
conda install -c bioconda ont-tombo             # install Tombo
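Before running the pipeline, it may be worth checking that every command-line tool it calls is actually on the PATH. A minimal sketch follows; `missing_tools` is a hypothetical helper written for this illustration, and the tool names in the usage comment are the ones invoked elsewhere in this README.

```shell
#!/bin/sh
# missing_tools is a hypothetical helper: it prints every named command
# that cannot be found on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example with the tools called in this README:
# missing_tools multi_to_single_fast5 guppy_basecaller tombo minimap2 samtools sam2tsv
```

If the function prints nothing, all listed tools were found.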
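# Split multi-read fast5 files into single-read fast5 files
multi_to_single_fast5 -i "${id}.fast5" -s "$id" --recursive -t 40
# Basecall with Guppy and write the basecalls back into the fast5 files
guppy_basecaller --input_path "$id" --save_path "${id}_guppy" --num_callers 40 --recursive --fast5_out --config rna_r9.4.1_70bps_hac.cfg --cpu_threads_per_caller 10
# Resquiggle the raw signal against the rRNA reference with Tombo
tombo resquiggle --rna --overwrite "${id}_guppy/workspace/" ../rRNA/human_uniq.rRNA.fa --processes 40 --fit-global-scale --include-event-stdev
# List the resquiggled fast5 files
find "${id}_guppy/workspace/" -name "*.fast5" > "${id}_guppy.list"
# Extract the raw signal and features
python extract_raw_and_feature_fast_AUCG.py --cpu=30 --fl="${id}_guppy.list" -o "${id}_guppy.feature" --clip=5
# Align the extracted reads to the rRNA reference
minimap2 -ax map-ont -uf -k14 -x splice -t 20 ../rRNA/human_uniq.rRNA.fa "${id}_guppy.feature.feature.fa" | samtools view -@ 20 -bS - | samtools sort -@ 20 - > "${id}.rRNA.sort.bam"
# Convert the alignments to a per-position TSV
sam2tsv -r ../rRNA/human_uniq.rRNA.fa "${id}.rRNA.sort.bam" > "${id}.rRNA.sort.bam.tsv"
# Filter the features using the rRNA Nm site annotation (rRNA.Nm1.bed)
python filter_get.fast5.py -i "${id}.rRNA.sort.bam.tsv" -b ../rRNA/rRNA.Nm1.bed -f "${id}_guppy.feature.feature.tsv" -o "${id}.fast5.rRNA.signal.txt" > "${id}.rRNA.feature.anno.txt"
# Train one model per entry in kmer.txt (all jobs are started in the background)
cat kmer.txt | xargs -I {} echo "python train_model_scale_pos_weight_Nm.py {} >>Auc.scale1.txt &" | sh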
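The training command above starts one background job for every line of kmer.txt at the same time, which may oversubscribe a smaller machine. A possible bounded variant is sketched below, assuming kmer.txt holds one k-mer per line and that an `xargs` with the `-P` option (GNU or BSD) is available; `run_per_kmer` and `JOBS` are hypothetical names introduced for this sketch.

```shell
#!/bin/sh
# JOBS and run_per_kmer are hypothetical names used only in this sketch.
# JOBS caps how many training jobs run at the same time.
JOBS="${JOBS:-4}"

# Read one k-mer per line from stdin and run the given command once per k-mer,
# passing the k-mer as the last argument, with at most $JOBS jobs in parallel.
run_per_kmer() {
    xargs -P "$JOBS" -I {} "$@" {}
}

# Example (script and file names as used above):
# run_per_kmer python train_model_scale_pos_weight_Nm.py < kmer.txt >> Auc.scale1.txt
```

As in the original command, the order of lines in Auc.scale1.txt depends on which job finishes first.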
# Download gencode.v27.transcripts.fa from https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/gencode.v27.transcripts.fa.gz and place the decompressed file in the genome folder.
cd example_data
multi_to_single_fast5 -i "${id}.fast5" -s "$id" --recursive -t 40
guppy_basecaller --input_path "$id" --save_path "${id}_guppy" --num_callers 40 --recursive --fast5_out --config rna_r9.4.1_70bps_hac.cfg --cpu_threads_per_caller 10
tombo resquiggle --rna --overwrite "${id}_guppy/workspace/" ../genome/gencode.v27.transcripts.fa --processes 40 --fit-global-scale --include-event-stdev
find "${id}_guppy/workspace/" -name "*.fast5" > "${id}_guppy.list"
python extract_raw_and_feature_fast_AUCG.py --cpu=30 --fl="${id}_guppy.list" -o "${id}_guppy.feature" --clip=5
cat *.feature.feature.fa > all.feature.fa
cat *.feature.feature.tsv > all.feature.tsv
python ../NanoNm/predict_sites_Nm.final.py --model ../NanoNm/model --cpu 20 -i all -o all_Nm_model -r ../genome/gencode.v27.transcripts.fa -g ../genome/GRCh38.p13.genome.fa -b ../genome/hg38.gene2transcripts.txt
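The commands in this README build directory and file names by appending a suffix such as _guppy to the sample name held in `id`. The short sketch below illustrates why the braces in `${id}_guppy` matter in the shell; the sample name is only an example value.

```shell
#!/bin/sh
id="siCTRL_mRNA"          # example sample name

# Without braces the shell looks up a variable called id_guppy, which is unset.
# "${id_guppy:-}" performs the same lookup as "$id_guppy", written with an empty
# default so that this line also runs under `set -u`.
wrong="${id_guppy:-}"

# With braces the shell expands id and then appends the literal suffix.
right="${id}_guppy"

printf 'wrong=[%s] right=[%s]\n' "$wrong" "$right"
# prints: wrong=[] right=[siCTRL_mRNA_guppy]
```

An empty save path or input path is easy to miss, so writing the braces everywhere is the safer habit.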
#!/bin/bash
cd Yeast_Fly_feature_dataset
samples=("WT" "snoR60" "snoR61" "sno62")
for id in "${samples[@]}"; do
  # Keep the wildcard outside the quotes so that the shell expands it
  gunzip "${id}"_guppy.feature.feature.*.gz
  python ../NanoNm/predict_sites_Nm.final.py --model ../NanoNm/model --cpu 20 -i "${id}_guppy.feature" -o "${id}_Nm_model" -r ../rRNA/yeast.rRNA.fa -g ../rRNA/yeast.rRNA.fa -b ../rRNA/yeast_rRNA.list
  python ../scripts/ratio2bed.py "${id}_Nm_model/ratio.0.5.tsv" "${id}_yeast"
done
#!/bin/bash
cd Yeast_Fly_feature_dataset
samples=("Fly_NT_rep1" "Fly_NT_rep2" "Fly_KD_rep1" "Fly_KD_rep2")
for id in "${samples[@]}"; do
  # Keep the wildcard outside the quotes so that the shell expands it
  gunzip "${id}"_guppy.feature.feature.*.gz
  python ../NanoNm/predict_sites_Nm.final.py --model ../NanoNm/model --cpu 20 -i "${id}_guppy.feature" -o "${id}_Nm_model" -r ../rRNA/fly.rRNA.fa -g ../rRNA/fly.rRNA.fa -b ../rRNA/fly_rRNA.list
  python ../scripts/ratio2bed.py "${id}_Nm_model/ratio.0.5.tsv" "${id}_fly"
done
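The yeast and fly loops decompress the feature files with a wildcard pattern. The shell expands a wildcard only when it is outside quotes, which the following sketch demonstrates in a throwaway directory. The file names are assumptions modelled on the pattern used above, and `count_matches` is a hypothetical helper.

```shell
#!/bin/sh
# Throwaway directory with two empty files named after the pattern used above
# (the exact names inside the Zenodo archive are an assumption).
demo=$(mktemp -d)
touch "$demo/WT_guppy.feature.feature.fa.gz" "$demo/WT_guppy.feature.feature.tsv.gz"
id="WT"

# count_matches (hypothetical helper): count how many arguments are existing files.
count_matches() {
    n=0
    for f in "$@"; do
        if [ -e "$f" ]; then n=$((n + 1)); fi
    done
    echo "$n"
}

# Quoted pattern: the literal string, '*' included, is passed on; no such file exists.
quoted_count=$(count_matches "$demo/${id}_guppy.feature.feature.*.gz")
# Wildcard left unquoted (the variable part stays quoted): expands to both files.
unquoted_count=$(count_matches "$demo/${id}"_guppy.feature.feature.*.gz)

echo "quoted: $quoted_count, unquoted: $unquoted_count"
# prints: quoted: 0, unquoted: 2
rm -rf "$demo"
```

Quoting only the variable part keeps the command safe for sample names with unusual characters while still letting the wildcard match.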
#!/bin/bash
cd example_data
samples=("siCTRL_rRNA" "siFBL_rRNA")
conda activate NanoNm
for id in "${samples[@]}"; do
  multi_to_single_fast5 -i "${id}.fast5" -s "$id" --recursive -t 40
  guppy_basecaller --input_path "$id" --save_path "${id}_guppy" --num_callers 40 --recursive --fast5_out --config rna_r9.4.1_70bps_hac.cfg --cpu_threads_per_caller 10
  tombo resquiggle --rna --overwrite "${id}_guppy/workspace/" ../rRNA/human.rRNA.fa --processes 40 --fit-global-scale --include-event-stdev
  find "${id}_guppy/workspace/" -name "*.fast5" > "${id}_guppy.list"
  python ../NanoNm/extract_raw_and_feature_fast_AUCG.py --cpu=30 --fl="${id}_guppy.list" -o "${id}_guppy.feature" --clip=5
  python ../NanoNm/predict_sites_Nm.final.py --model ../NanoNm/model --cpu 10 -i "${id}_guppy.feature" -o rRNA_QC7 -r ../rRNA/human.rRNA.fa -g ../rRNA/human.rRNA.fa -b ../rRNA/rRNA_gene2transcript.txt 2> rRNA.QC7.err
  python ../scripts/ratio2bed.py rRNA_QC7/ratio.0.5.tsv "$id"
done
#!/bin/bash
cd example_data
samples=("siCTRL_mRNA" "siFBL_mRNA")
conda activate NanoNm
for id in "${samples[@]}"; do
  multi_to_single_fast5 -i "${id}.fast5" -s "$id" --recursive -t 40
  guppy_basecaller --input_path "$id" --save_path "${id}_guppy" --num_callers 40 --recursive --fast5_out --config rna_r9.4.1_70bps_hac.cfg --cpu_threads_per_caller 10
  tombo resquiggle --rna --overwrite "${id}_guppy/workspace/" ../genome/gencode.v27.transcripts.fa --processes 40 --fit-global-scale --include-event-stdev
  find "${id}_guppy/workspace/" -name "*.fast5" > "${id}_guppy.list"
  python ../NanoNm/extract_raw_and_feature_fast_AUCG.py --cpu=30 --fl="${id}_guppy.list" -o "$id" --clip=5
  python ../NanoNm/predict_sites_Nm.final.py --model ../NanoNm/model -i "$id" -o "${id}_Nm_model" -r ../genome/gencode.v27.transcripts.fa -g ../genome/GRCh38.p13.genome.fa -b ../genome/hg38.gene2transcripts1.txt
  python ../scripts/ratio2bed.py "${id}_Nm_model/ratio.0.5.tsv" "$id"
done
python ../scripts/merge_mRNA.py siCTRL_mRNA.filter.bed siFBL_mRNA.filter.bed Human_mRNA.ratio.txt
python ../scripts/merge_human.rRNA.py human.rRNA.Nm.bed siCTRL_rRNA.filter.bed siFBL_rRNA.filter.bed Human_rRNA.ratio.txt
Rscript ../scripts/Human_siFBL_siCTRL_rRNA_mRNA.plot.R
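The merge commands above take the per-sample .filter.bed files as arguments, so a failed earlier step only shows up at this point. A minimal sketch of a pre-flight check is given below; `require_files` is a hypothetical helper, and the file names in the usage comment are the ones passed to the merge scripts above.

```shell
#!/bin/sh
# require_files is a hypothetical helper: it reports every file that is missing
# or empty on stderr and returns non-zero if any such file was found.
require_files() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example with the inputs of the mRNA merge step:
# require_files siCTRL_mRNA.filter.bed siFBL_mRNA.filter.bed &&
#     python ../scripts/merge_mRNA.py siCTRL_mRNA.filter.bed siFBL_mRNA.filter.bed Human_mRNA.ratio.txt
```

Chaining the check with `&&` stops the merge from running on incomplete inputs.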
Yanqiang.Li@childrens.harvard.edu
or
Kaifu.Chen@childrens.harvard.edu
Copyright © Kaifu Chen Lab, Boston Children's Hospital / Harvard Medical School
