Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models
| Published in: | Briefings in bioinformatics, Volume 24, Issue 5 |
|---|---|
| Main authors: | Qiu, Yuchi; Wei, Guo-Wei |
| Format: | Journal Article |
| Language: | English |
| Published: | England; Oxford University Press; 20 September 2023; Oxford Publishing Limited (England) |
| Subjects: | |
| ISSN: | 1467-5463, 1477-4054 |
| Online access: | Get full text |
| Abstract | Protein engineering is an emerging field in biotechnology that has the potential to revolutionize various areas, such as antibody design, drug discovery, food security, ecology, and more. However, the mutational space involved is too vast to be handled through experimental means alone. Leveraging accumulative protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably expedited protein engineering. Moreover, advances in topological data analysis (TDA) and artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive, systematic, and indispensable set of methodological components, including TDA and NLP, for protein engineering and to facilitate their future development. |
|---|---|
| Author | Wei, Guo-Wei; Qiu, Yuchi |
| Author_xml | 1. Qiu, Yuchi; 2. Wei, Guo-Wei (email: weig@msu.edu) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/37580175 (View this record in MEDLINE/PubMed) |
| CitedBy_id | crossref_primary_10_1002_qub2_70013 crossref_primary_10_3390_ijms241612703 crossref_primary_10_7554_eLife_102788_3 crossref_primary_10_1016_j_isci_2025_113324 crossref_primary_10_1016_j_jmb_2024_168715 crossref_primary_10_1002_chem_202303889 crossref_primary_10_1021_acs_chemrev_4c00595 crossref_primary_10_1093_pnasnexus_pgae158 crossref_primary_10_1002_cctc_202401542 crossref_primary_10_1007_s00018_025_05770_1 crossref_primary_10_1016_j_jpha_2024_101081 crossref_primary_10_1016_j_tem_2024_01_011 crossref_primary_10_1007_s11274_025_04475_8 crossref_primary_10_1038_s43588_024_00724_2 crossref_primary_10_1007_s10462_024_10710_9 crossref_primary_10_1021_acs_langmuir_4c04140 crossref_primary_10_3390_molecules29194626 crossref_primary_10_1016_j_tibtech_2024_04_003 crossref_primary_10_1007_s12672_025_03395_1 crossref_primary_10_2174_0113816128349577240927071706 crossref_primary_10_1016_j_biotechadv_2025_108601 crossref_primary_10_7554_eLife_102788 crossref_primary_10_1016_j_biotechadv_2024_108459 crossref_primary_10_1038_s41598_025_90828_0 crossref_primary_10_1063_5_0280985 crossref_primary_10_59717_j_xinn_life_2024_100105 crossref_primary_10_1002_mlf2_70009 crossref_primary_10_3390_a16100465 |
| Cites_doi | 10.1038/s41467-022-29874-5 10.1002/humu.22225 10.1038/s43588-022-00373-3 10.1371/journal.pcbi.1005786 10.1093/nar/gki387 10.1093/nar/28.1.235 10.1016/j.isci.2020.100939 10.1016/j.compbiomed.2022.106262 10.1038/s41586-018-0337-2 10.1145/1064092.1064133 10.1038/s41592-019-0496-6 10.1063/1674-0068/cjcp2109150 10.1038/s41592-019-0598-1 10.1007/s10822-018-0146-6 10.1063/1.4978997 10.1093/nar/gkn159 10.1038/s41467-021-25831-w 10.1016/bs.mie.2020.05.005 10.1038/nature17995 10.1371/journal.pcbi.1002195 10.1038/s42256-019-0017-4 10.1016/j.sbi.2021.01.008 10.1093/nar/gky995 10.3934/fods.2023010 10.1007/s41468-019-00038-7 10.1073/pnas.2016239118 10.1016/j.copbio.2022.102713 10.1007/s11042-022-13428-4 10.1038/s41586-021-03819-2 10.1021/acscatal.9b04321 10.1038/nbt.3769 10.1016/j.sbi.2023.102627 10.1126/science.ade2574 10.1038/s41467-023-36048-4 10.1038/s41587-020-00793-4 10.1073/pnas.1215251110 10.1515/mlbmb-2015-0009 10.1038/s41592-019-0583-8 10.1371/journal.pcbi.1009284 10.1021/acs.jcim.0c01415 10.3934/fods.2022015 10.1090/S0273-0979-07-01191-3 10.1038/s41467-022-33004-6 10.1093/bioinformatics/bty862 10.21203/rs.3.rs-1969991/v1 10.1126/science.1257360 10.1038/nature19791 10.1038/s42003-023-04866-3 10.1126/science.abj8754 10.1038/s41592-020-0848-2 10.1093/nar/gkaa1100 10.1002/cnm.2914 10.1038/s41467-021-22732-w 10.1038/s41587-022-01432-w 10.1007/s10958-020-04897-9 10.1016/S0969-2126(97)00260-8 10.1145/997817.997870 10.1021/acs.jcim.2c01046 10.1038/s41598-019-55660-3 10.3390/a13010019 10.1038/s43588-021-00168-y 10.1021/acssynbio.8b00155 10.1126/science.aad8865 10.1016/j.sbi.2022.102518 10.1038/s41467-022-29443-w 10.1109/TPAMI.2020.3013679 10.1093/nar/gkr1178 10.1016/j.cels.2021.07.008 10.1090/conm/453/08802 10.1093/bib/bbab127 10.1021/acs.jcim.9b00334 10.1145/3447548.3467311 10.1038/nrm2805 10.1038/nbt1286 10.1109/5.18626 10.1073/pnas.2122954119 10.1137/19M1272226 10.1038/nmeth1156 10.1038/s41592-021-01100-y 10.2139/ssrn.3275996 10.1137/21M1435471 
10.1038/s41467-021-23303-9 10.1103/PhysRevA.100.022512 10.1016/j.tips.2020.12.004 10.1021/ar960017f 10.1038/s42256-022-00532-1 10.7554/eLife.16965 10.1007/b97315 10.3934/dcdsb.2020257 10.1038/s41587-021-01146-5 10.1146/annurev-statistics-031017-100045 10.1093/bioinformatics/btac020 10.1126/sciadv.abc5329 10.1007/s00454-004-1146-y 10.1371/journal.pcbi.1005690 10.1162/neco.1997.9.8.1735 10.1073/pnas.0408930102 10.1002/cnm.3179 10.1038/s41467-021-25976-8 10.3115/v1/D14-1181 10.1038/s41586-021-04043-8 10.1002/cnm.3376 10.15252/msb.20199380 10.1038/s41467-021-25371-3 10.1093/protein/15.10.779 10.1089/cmb.2008.0173 10.1109/MSP.2017.2765202 10.1038/s43588-022-00394-y 10.1038/s41592-018-0138-4 10.1007/s41468-020-00057-9 10.1016/j.sbi.2021.11.002 |
| ContentType | Journal Article |
| Copyright | The Author(s) 2023. Published by Oxford University Press. |
| Copyright_xml | notice: The Author(s) 2023. Published by Oxford University Press. |
| DOI | 10.1093/bib/bbad289 |
| DatabaseName | Oxford Journals Open Access Collection; CrossRef; Medline; MEDLINE; MEDLINE (Ovid); PubMed; Biotechnology Research Abstracts; Computer and Information Systems Abstracts; Technology Research Database; Engineering Research Database; ProQuest Computer Science Collection; ProQuest Health & Medical Complete (Alumni); Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts - Academic; Computer and Information Systems Abstracts Professional; Biotechnology and BioEngineering Abstracts; Genetics Abstracts; MEDLINE - Academic; PubMed Central (Full Participant titles) |
| DatabaseTitle | CrossRef; MEDLINE; Medline Complete; MEDLINE with Full Text; PubMed; MEDLINE (Ovid); Genetics Abstracts; Biotechnology Research Abstracts; Technology Research Database; Computer and Information Systems Abstracts - Academic; ProQuest Computer Science Collection; Computer and Information Systems Abstracts; ProQuest Health & Medical Complete (Alumni); Engineering Research Database; Advanced Technologies Database with Aerospace; Biotechnology and BioEngineering Abstracts; Computer and Information Systems Abstracts Professional; MEDLINE - Academic |
| DatabaseTitleList | MEDLINE; CrossRef; MEDLINE - Academic; Genetics Abstracts |
| Database_xml | 1. PubMed (dbid: NPM; url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed; source type: Index Database); 2. Oxford Journals Open Access Collection (dbid: TOX; url: https://academic.oup.com/journals/; source type: Publisher); 3. MEDLINE - Academic (dbid: 7X8; url: https://search.proquest.com/medline; source type: Aggregation Database) |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Biology |
| EISSN | 1477-4054 |
| ExternalDocumentID | PMC10516362 37580175 10_1093_bib_bbad289 10.1093/bib/bbad289 |
| Genre | Research Support, U.S. Gov't, Non-P.H.S; Review; Research Support, Non-U.S. Gov't; Journal Article; Research Support, N.I.H., Extramural |
| GrantInformation_xml | NIAID NIH HHS: R01 AI164266; NIGMS NIH HHS: R35 GM148196; NIGMS NIH HHS: R01 GM126189; NIH HHS: R01GM126189; funder not named: DMS-2052983, DMS-1761320, IIS-1900473; funder not named: 80NSSC21M0023; funder not named: R01GM126189, R35GM148196, R01AI164266; funder not named: 65109 |
| ID | FETCH-LOGICAL-c441t-249d9db059f2cc2a44bbd6e52007f50f7f0e616c167061e39afec0ed71fc89863 |
| IEDL.DBID | TOX |
| ISICitedReferencesCount | 29 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001047878100001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1467-5463 1477-4054 |
| IngestDate | Tue Sep 30 17:12:52 EDT 2025 Thu Oct 02 10:22:25 EDT 2025 Fri Oct 03 04:10:57 EDT 2025 Tue Oct 21 01:40:08 EDT 2025 Sat Nov 29 05:43:38 EST 2025 Tue Nov 18 22:33:08 EST 2025 Wed Apr 02 07:05:26 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 5 |
| Keywords | protein engineering; deep learning and machine learning; protein language models; topological data analysis |
| Language | English |
| License | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c441t-249d9db059f2cc2a44bbd6e52007f50f7f0e616c167061e39afec0ed71fc89863 |
| OpenAccessLink | https://dx.doi.org/10.1093/bib/bbad289 |
| PMID | 37580175 |
| PQID | 3049109858 |
| PQPubID | 26846 |
| ParticipantIDs | pubmedcentral_primary_oai_pubmedcentral_nih_gov_10516362 proquest_miscellaneous_2851141982 proquest_journals_3049109858 pubmed_primary_37580175 crossref_citationtrail_10_1093_bib_bbad289 crossref_primary_10_1093_bib_bbad289 oup_primary_10_1093_bib_bbad289 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-09-20 |
| PublicationDateYYYYMMDD | 2023-09-20 |
| PublicationDate_xml | – month: 09 year: 2023 text: 2023-09-20 day: 20 |
| PublicationDecade | 2020 |
| PublicationPlace | England |
| PublicationPlace_xml | – name: England – name: Oxford |
| PublicationTitle | Briefings in bioinformatics |
| PublicationTitleAlternate | Brief Bioinform |
| PublicationYear | 2023 |
| Publisher | Oxford University Press; Oxford Publishing Limited (England) |
| Publisher_xml | – name: Oxford University Press – name: Oxford Publishing Limited (England) |
| References_xml | – volume: 13 start-page: 2219 issue: 1 year: 2022 ident: 2023092216513424700_ref126 article-title: Machine learning-coupled combinatorial mutagenesis enables resource-efficient engineering of CRISPR-Cas9 genome editor activities publication-title: Nat Commun doi: 10.1038/s41467-022-29874-5 – volume: 34 start-page: 57 issue: 1 year: 2013 ident: 2023092216513424700_ref37 article-title: Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models publication-title: Hum Mutat doi: 10.1002/humu.22225 – year: 2022 ident: 2023092216513424700_ref107 article-title: Orientation-aware graph neural networks for protein structure representation learning – volume: 2 start-page: 804 issue: 12 year: 2022 ident: 2023092216513424700_ref147 article-title: Single-sequence protein structure prediction using supervised transformer protein language models publication-title: Nat Comput Sci doi: 10.1038/s43588-022-00373-3 – year: 2023 ident: 2023092216513424700_ref86 article-title: Topological data analysis hearing the shapes of drums and bells – volume: 13 start-page: e1005786 issue: 10 year: 2017 ident: 2023092216513424700_ref133 article-title: Machine learning to design integral membrane channelrhodopsins for efficient eukaryotic expression and plasma membrane localization publication-title: PLoS Comput Biol doi: 10.1371/journal.pcbi.1005786 – start-page: 7133 volume-title: Proceedings of the AAAI Conference on Artificial Intelligence year: 2022 ident: 2023092216513424700_ref155 article-title: Dist2Cycle: a simplicial neural network for homology localization – volume: 33 start-page: W382 issue: suppl_2 year: 2005 ident: 2023092216513424700_ref18 article-title: The FoldX web server: an online force field publication-title: Nucleic Acids Res doi: 10.1093/nar/gki387 – volume: 28 start-page: 235 issue: 1 year: 2000 ident: 2023092216513424700_ref13 article-title: The protein data bank publication-title: Nucleic 
Acids Res doi: 10.1093/nar/28.1.235 – volume: 18 year: 2017 ident: 2023092216513424700_ref81 article-title: Persistence images: a stable vector representation of persistent homology publication-title: J Mach Learn Res – start-page: 8844 volume-title: International Conference on Machine Learning year: 2021 ident: 2023092216513424700_ref39 article-title: MSA transformer – volume: 23 start-page: 100939 issue: 3 year: 2020 ident: 2023092216513424700_ref119 article-title: MutaBind2: predicting the impacts of single and multiple mutations on protein-protein interactions publication-title: iScience doi: 10.1016/j.isci.2020.100939 – volume: 151 start-page: 106262 year: 2022 ident: 2023092216513424700_ref88 article-title: Persistent Laplacian projected Omicron BA.4 and BA.5 to become new dominating variants publication-title: Comput Biol Med doi: 10.1016/j.compbiomed.2022.106262 – volume: 559 start-page: 547 issue: 7715 year: 2018 ident: 2023092216513424700_ref31 article-title: Machine learning for molecular and materials science publication-title: Nature doi: 10.1038/s41586-018-0337-2 – start-page: 263 volume-title: Proceedings of the Twenty-First Annual Symposium on Computational Geometry year: 2005 ident: 2023092216513424700_ref79 article-title: Stability of persistence diagrams doi: 10.1145/1064092.1064133 – volume: 33 start-page: 3549 issue: 22 year: 2017 ident: 2023092216513424700_ref120 article-title: Analysis and prediction of protein folding energy changes upon mutation by element specific persistent homology publication-title: Bioinformatics – year: 2018 ident: 2023092216513424700_ref98 article-title: Deep graph infomax – volume: 16 start-page: 687 issue: 8 year: 2019 ident: 2023092216513424700_ref12 article-title: Machine-learning-guided directed evolution for protein engineering publication-title: Nat Methods doi: 10.1038/s41592-019-0496-6 – volume: 32 start-page: 9689 year: 2019 ident: 2023092216513424700_ref41 article-title: Evaluating protein transfer 
learning with TAPE publication-title: Adv Neural Inf Process – start-page: 8946 volume-title: International Conference on Machine Learning year: 2022 ident: 2023092216513424700_ref45 article-title: Learning inverse folding from millions of predicted structures – volume: 34 start-page: 683 issue: 6 year: 2021 ident: 2023092216513424700_ref32 article-title: MLIMC: machine learning-based implicit-solvent Monte Carlo publication-title: Chin J Chem Phys doi: 10.1063/1674-0068/cjcp2109150 – volume: 16 start-page: 1315 issue: 12 year: 2019 ident: 2023092216513424700_ref21 article-title: Unified rational protein engineering with sequence-based deep representation learning publication-title: Nat Methods doi: 10.1038/s41592-019-0598-1 – volume: 33 start-page: 71 year: 2019 ident: 2023092216513424700_ref87 article-title: Mathematical deep learning for pose and binding affinity prediction and ranking in D3R Grand Challenges publication-title: J Comput Aided Mol Des doi: 10.1007/s10822-018-0146-6 – volume: 27 start-page: 047410 issue: 4 year: 2017 ident: 2023092216513424700_ref85 article-title: Persistent homology of time-dependent functional networks constructed from coupled time series publication-title: Chaos doi: 10.1063/1.4978997 – volume: 36 start-page: 3025 issue: 9 year: 2008 ident: 2023092216513424700_ref118 article-title: Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences publication-title: Nucleic Acids Res doi: 10.1093/nar/gkn159 – volume: 12 start-page: 5825 issue: 1 year: 2021 ident: 2023092216513424700_ref135 article-title: Machine learning-guided acyl-ACP reductase engineering for improved in vivo fatty alcohol production publication-title: Nat Commun doi: 10.1038/s41467-021-25831-w – volume: 643 start-page: 281 year: 2020 ident: 2023092216513424700_ref8 article-title: Machine learning-assisted enzyme engineering publication-title: Meth Enzymol doi: 10.1016/bs.mie.2020.05.005 – year: 2017 
– reference: 37547662 - ArXiv. 2023 Jul 27:arXiv:2307.14587v1. |
| SubjectTerms | Antibodies; Artificial Intelligence; Biotechnology; Data Analysis; Drug development; Food security; Machine learning; Natural Language Processing; Protein Engineering; Protein structure; Proteins; Review; Topology |
| Title | Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/37580175 https://www.proquest.com/docview/3049109858 https://www.proquest.com/docview/2851141982 https://pubmed.ncbi.nlm.nih.gov/PMC10516362 |
| Volume | 24 |
| Citation | Qiu, Yuchi; Wei, Guo-Wei. Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models. Briefings in bioinformatics, vol. 24, issue 5, 2023-09-20. ISSN 1467-5463, eISSN 1477-4054. doi:10.1093/bib/bbad289 |