Machine learning algorithm for precise prediction of 2′-O-methylation (Nm) sites from experimental RiboMethSeq datasets

Bibliographic Details
Published in: Methods (San Diego, Calif.) Vol. 203; pp. 311-321
Main Authors: Pichot, Florian; Marchand, Virginie; Helm, Mark; Motorin, Yuri
Format: Journal Article
Language: English
Published: United States, Elsevier Inc, 1 July 2022
Elsevier
ISSN: 1046-2023, 1095-9130
Description
Summary:
• Improved mapping of 2′-O-methylated (Nm) and pseudouridine (ψ) residues by RiboMethSeq.
• Random Forest (RF) model including scores for neighboring positions gives the best results.
• The model was trained using human rRNA datasets and successfully validated on other rRNAs.

Analysis of epitranscriptomic RNA modifications by deep sequencing-based approaches contributes substantially to knowledge of their precise locations and relative stoichiometry in cellular RNAs. To reveal RNA modifications, several analytical approaches have been proposed, including antibody-driven enrichment, analysis of RT signatures, and specific chemical treatments. However, analysis and interpretation of these massive datasets, especially for low-abundance cellular RNAs (e.g. mRNA and lncRNA), is neither easy nor straightforward, since insufficient specificity and selectivity lead to numerous false-positive and false-negative identifications. The main issue in applying these methods lies in the subjective classification of potentially modified positions, which is mostly based on arbitrarily defined threshold values for different scores. Such an approach using predefined score values has proved appropriate for datasets of limited complexity (tRNA and/or rRNA analysis), but application to longer reference sequences requires much better classification algorithms. In this work we applied a machine learning algorithm (Random Forest, RF) to create a predictive model for analysis of 2′-O-methylated sites in RNA using RiboMethSeq datasets. Model training was performed on a large collection of human rRNA datasets with well-known modification profiles, and prediction performance was assessed using experimentally defined profiles for other eukaryotic rRNAs (S. cerevisiae and A. thaliana). Application of this Random Forest prediction model to the detection of other RNA modifications and to more complex datasets is discussed.
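
The summary states that the best-performing model uses scores for neighboring positions as well as the score at the candidate site. A minimal sketch of how such a feature row might be assembled is given below. It is an illustration under stated assumptions, not the authors' implementation: the function name `window_features`, the flank width, the zero padding at sequence ends, and the example score values are all hypothetical, and the record does not say which RiboMethSeq scores or window sizes were actually used.

```python
from typing import List, Sequence


def window_features(
    scores: Sequence[float], flank: int = 2, pad: float = 0.0
) -> List[List[float]]:
    """Build one feature row per position.

    Each row holds the score at the position itself plus the scores of
    `flank` neighbors on each side. Positions near the ends are padded
    with `pad` (an assumption made here for simplicity).
    """
    if flank < 0:
        raise ValueError("flank must be non-negative")
    padded = [pad] * flank + list(scores) + [pad] * flank
    width = 2 * flank + 1
    # Row i covers original positions i - flank .. i + flank.
    return [padded[i:i + width] for i in range(len(scores))]


# Hypothetical per-position scores for a short stretch of rRNA.
scores = [0.10, 0.05, 0.92, 0.08, 0.12, 0.88]
rows = window_features(scores, flank=1)

for position, row in enumerate(rows, start=1):
    print(position, row)
```

Rows built this way, paired with known modified/unmodified labels, could then be passed to a Random Forest implementation (for example scikit-learn's `RandomForestClassifier`) in place of a single fixed threshold applied to each score.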
DOI: 10.1016/j.ymeth.2022.03.007