RiNALMo: general-purpose RNA language models can generalize well on structure prediction tasks
| Title: | RiNALMo: general-purpose RNA language models can generalize well on structure prediction tasks |
|---|---|
| Authors: | Rafael Josip Penić, Tin Vlašić, Roland G. Huber, Yue Wan, Mile Šikić |
| Source: | Nature Communications (Nat Commun), Volume 16 |
| Publication Status: | Preprint |
| Publisher information: | Springer Science and Business Media LLC, 2025. |
| Year of publication: | 2025 |
| Subjects: | Biomolecules, Machine Learning, FOS: Computer and information sciences, TECHNICAL SCIENCES. Computing. Artificial Intelligence, FOS: Biological sciences, language model, RNA, Biomolecules (q-bio.BM), structure prediction, Article, Machine Learning (cs.LG) |
| Description: | While RNA has recently been recognized as an interesting small-molecule drug target, many challenges remain to be addressed before we take full advantage of it. This emphasizes the necessity to improve our understanding of its structures and functions. Over the years, sequencing technologies have produced an enormous amount of unlabeled RNA data, which hides a huge potential. Motivated by the successes of protein language models, we introduce RiboNucleic Acid Language Model (RiNALMo) to unveil the hidden code of RNA. RiNALMo is the largest RNA language model to date, with 650M parameters pre-trained on 36M non-coding RNA sequences from several databases. It can extract hidden knowledge and capture the underlying structure information implicitly embedded within the RNA sequences. RiNALMo achieves state-of-the-art results on several downstream tasks. Notably, we show that its generalization capabilities overcome the inability of other deep learning methods for secondary structure prediction to generalize on unseen RNA families. (31 pages, 9 figures) |
| Document type: | Article; Other literature type |
| File description: | application/pdf |
| Language: | English |
| ISSN: | 2041-1723 |
| DOI: | 10.1038/s41467-025-60872-5 |
| DOI: | 10.48550/arxiv.2403.00043 |
| Access URLs: | http://arxiv.org/abs/2403.00043; https://www.nature.com/articles/s41467-025-60872-5.pdf; https://doi.org/10.1038/s41467-025-60872-5; https://repozitorij.fer.unizg.hr/islandora/object/fer:13488; https://urn.nsk.hr/urn:nbn:hr:168:753801; https://repozitorij.fer.unizg.hr/islandora/object/fer:13488/datastream/FILE0 |
| Rights: | CC BY-NC-ND; CC BY; http://rightsstatements.org/vocab/InC/1.0/ |
| Accession number: | edsair.doi.dedup.....eee92525bf7622ddb206b3484cb13a06 |
| Database: | OpenAIRE |