Nucleotide dependency analysis of genomic language models detects functional elements
| Title: | Nucleotide dependency analysis of genomic language models detects functional elements |
|---|---|
| Authors: | Tomaz da Silva, Pedro; Karollus, Alexander; Hingerl, Johannes; Galindez, Gihanna Sta Teresa; Wagner, Nils; Hernandez-Alias, Xavier; Incarnato, Danny; Gagneur, Julien |
| Source: | Nature Genetics. 57(10):2589-2602 |
| Publisher information: | Nature Publishing Group, 2025. |
| Publication year: | 2025 |
| Keywords: | Models, Genetic; Humans; Nucleic Acid Conformation; Genome/genetics; RNA/genetics; Nucleotides/genetics; Genomics/methods |
| Description: | Deciphering how nucleotides in genomes encode regulatory instructions and molecular machines is a long-standing goal. Genomic language models (gLMs) implicitly capture functional elements and their organization from genomic sequences alone by modeling probabilities of each nucleotide given its sequence context. However, discovering functional genomic elements from gLMs has been challenging due to the lack of interpretable methods. Here we introduce nucleotide dependencies, which quantify how nucleotide substitutions at one genomic position affect the probabilities of nucleotides at other positions. We demonstrate that nucleotide dependencies are more effective at indicating the deleteriousness of genetic variants than alignment-based conservation and gLM reconstruction. Dependency analysis accurately detects regulatory motifs and highlights bases in contact within RNAs, including pseudoknots and tertiary structure contacts, revealing new, experimentally validated RNA structures. Finally, we leverage dependency maps to reveal critical limitations of several gLM architectures and training strategies. Altogether, nucleotide dependency analysis opens a new avenue for discovering and studying functional elements and their interactions in genomes. |
| Publication type: | Article |
| Language: | English |
| ISSN: | 1061-4036 |
| DOI: | 10.1038/s41588-025-02347-3 |
| Access URL: | https://research.rug.nl/en/publications/3d5add9d-35f2-4c78-a3dc-7f9d53852abb https://hdl.handle.net/11370/3d5add9d-35f2-4c78-a3dc-7f9d53852abb |
| Rights: | CC BY |
| Document code: | edsair.dris...01423..007b4be2ec1068d0860634f57a886354 |
| Database: | OpenAIRE |