Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models

Detailed bibliography
Published in: Briefings in bioinformatics, Volume 24, Issue 5
Main authors: Qiu, Yuchi; Wei, Guo-Wei
Format: Journal Article
Language: English
Published: England, Oxford University Press, 20 September 2023
Oxford Publishing Limited (England)
ISSN: 1467-5463, 1477-4054
Abstract Protein engineering is an emerging field in biotechnology that has the potential to revolutionize various areas, such as antibody design, drug discovery, food security, ecology, and more. However, the mutational space involved is too vast to be handled through experimental means alone. Leveraging accumulative protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably expedited protein engineering. Moreover, advances in topological data analysis (TDA) and artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive, systematic, and indispensable set of methodological components, including TDA and NLP, for protein engineering and to facilitate their future development.
Author Wei, Guo-Wei
Qiu, Yuchi
Author_xml – sequence: 1
  givenname: Yuchi
  surname: Qiu
  fullname: Qiu, Yuchi
– sequence: 2
  givenname: Guo-Wei
  surname: Wei
  fullname: Wei, Guo-Wei
  email: weig@msu.edu
BackLink https://www.ncbi.nlm.nih.gov/pubmed/37580175 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1002_qub2_70013
crossref_primary_10_3390_ijms241612703
crossref_primary_10_7554_eLife_102788_3
crossref_primary_10_1016_j_isci_2025_113324
crossref_primary_10_1016_j_jmb_2024_168715
crossref_primary_10_1002_chem_202303889
crossref_primary_10_1021_acs_chemrev_4c00595
crossref_primary_10_1093_pnasnexus_pgae158
crossref_primary_10_1002_cctc_202401542
crossref_primary_10_1007_s00018_025_05770_1
crossref_primary_10_1016_j_jpha_2024_101081
crossref_primary_10_1016_j_tem_2024_01_011
crossref_primary_10_1007_s11274_025_04475_8
crossref_primary_10_1038_s43588_024_00724_2
crossref_primary_10_1007_s10462_024_10710_9
crossref_primary_10_1021_acs_langmuir_4c04140
crossref_primary_10_3390_molecules29194626
crossref_primary_10_1016_j_tibtech_2024_04_003
crossref_primary_10_1007_s12672_025_03395_1
crossref_primary_10_2174_0113816128349577240927071706
crossref_primary_10_1016_j_biotechadv_2025_108601
crossref_primary_10_7554_eLife_102788
crossref_primary_10_1016_j_biotechadv_2024_108459
crossref_primary_10_1038_s41598_025_90828_0
crossref_primary_10_1063_5_0280985
crossref_primary_10_59717_j_xinn_life_2024_100105
crossref_primary_10_1002_mlf2_70009
crossref_primary_10_3390_a16100465
Cites_doi 10.1038/s41467-022-29874-5
10.1002/humu.22225
10.1038/s43588-022-00373-3
10.1371/journal.pcbi.1005786
10.1093/nar/gki387
10.1093/nar/28.1.235
10.1016/j.isci.2020.100939
10.1016/j.compbiomed.2022.106262
10.1038/s41586-018-0337-2
10.1145/1064092.1064133
10.1038/s41592-019-0496-6
10.1063/1674-0068/cjcp2109150
10.1038/s41592-019-0598-1
10.1007/s10822-018-0146-6
10.1063/1.4978997
10.1093/nar/gkn159
10.1038/s41467-021-25831-w
10.1016/bs.mie.2020.05.005
10.1038/nature17995
10.1371/journal.pcbi.1002195
10.1038/s42256-019-0017-4
10.1016/j.sbi.2021.01.008
10.1093/nar/gky995
10.3934/fods.2023010
10.1007/s41468-019-00038-7
10.1073/pnas.2016239118
10.1016/j.copbio.2022.102713
10.1007/s11042-022-13428-4
10.1038/s41586-021-03819-2
10.1021/acscatal.9b04321
10.1038/nbt.3769
10.1016/j.sbi.2023.102627
10.1126/science.ade2574
10.1038/s41467-023-36048-4
10.1038/s41587-020-00793-4
10.1073/pnas.1215251110
10.1515/mlbmb-2015-0009
10.1038/s41592-019-0583-8
10.1371/journal.pcbi.1009284
10.1021/acs.jcim.0c01415
10.3934/fods.2022015
10.1090/S0273-0979-07-01191-3
10.1038/s41467-022-33004-6
10.1093/bioinformatics/bty862
10.21203/rs.3.rs-1969991/v1
10.1126/science.1257360
10.1038/nature19791
10.1038/s42003-023-04866-3
10.1126/science.abj8754
10.1038/s41592-020-0848-2
10.1093/nar/gkaa1100
10.1002/cnm.2914
10.1038/s41467-021-22732-w
10.1038/s41587-022-01432-w
10.1007/s10958-020-04897-9
10.1016/S0969-2126(97)00260-8
10.1145/997817.997870
10.1021/acs.jcim.2c01046
10.1038/s41598-019-55660-3
10.3390/a13010019
10.1038/s43588-021-00168-y
10.1021/acssynbio.8b00155
10.1126/science.aad8865
10.1016/j.sbi.2022.102518
10.1038/s41467-022-29443-w
10.1109/TPAMI.2020.3013679
10.1093/nar/gkr1178
10.1016/j.cels.2021.07.008
10.1090/conm/453/08802
10.1093/bib/bbab127
10.1021/acs.jcim.9b00334
10.1145/3447548.3467311
10.1038/nrm2805
10.1038/nbt1286
10.1109/5.18626
10.1073/pnas.2122954119
10.1137/19M1272226
10.1038/nmeth1156
10.1038/s41592-021-01100-y
10.2139/ssrn.3275996
10.1137/21M1435471
10.1038/s41467-021-23303-9
10.1103/PhysRevA.100.022512
10.1016/j.tips.2020.12.004
10.1021/ar960017f
10.1038/s42256-022-00532-1
10.7554/eLife.16965
10.1007/b97315
10.3934/dcdsb.2020257
10.1038/s41587-021-01146-5
10.1146/annurev-statistics-031017-100045
10.1093/bioinformatics/btac020
10.1126/sciadv.abc5329
10.1007/s00454-004-1146-y
10.1371/journal.pcbi.1005690
10.1162/neco.1997.9.8.1735
10.1073/pnas.0408930102
10.1002/cnm.3179
10.1038/s41467-021-25976-8
10.3115/v1/D14-1181
10.1038/s41586-021-04043-8
10.1002/cnm.3376
10.15252/msb.20199380
10.1038/s41467-021-25371-3
10.1093/protein/15.10.779
10.1089/cmb.2008.0173
10.1109/MSP.2017.2765202
10.1038/s43588-022-00394-y
10.1038/s41592-018-0138-4
10.1007/s41468-020-00057-9
10.1016/j.sbi.2021.11.002
ContentType Journal Article
Copyright The Author(s) 2023. Published by Oxford University Press. 2023
The Author(s) 2023. Published by Oxford University Press.
Copyright_xml – notice: The Author(s) 2023. Published by Oxford University Press. 2023
– notice: The Author(s) 2023. Published by Oxford University Press.
DBID TOX
AAYXX
CITATION
CGR
CUY
CVF
ECM
EIF
NPM
7QO
7SC
8FD
FR3
JQ2
K9.
L7M
L~C
L~D
P64
RC3
7X8
5PM
DOI 10.1093/bib/bbad289
DatabaseName Oxford Journals Open Access Collection
CrossRef
Medline
MEDLINE
MEDLINE (Ovid)
MEDLINE
MEDLINE
PubMed
Biotechnology Research Abstracts
Computer and Information Systems Abstracts
Technology Research Database
Engineering Research Database
ProQuest Computer Science Collection
ProQuest Health & Medical Complete (Alumni)
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Biotechnology and BioEngineering Abstracts
Genetics Abstracts
MEDLINE - Academic
PubMed Central (Full Participant titles)
DatabaseTitle CrossRef
MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
Genetics Abstracts
Biotechnology Research Abstracts
Technology Research Database
Computer and Information Systems Abstracts – Academic
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
ProQuest Health & Medical Complete (Alumni)
Engineering Research Database
Advanced Technologies Database with Aerospace
Biotechnology and BioEngineering Abstracts
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitleList MEDLINE
CrossRef

MEDLINE - Academic

Genetics Abstracts
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: TOX
  name: Oxford Journals Open Access Collection
  url: https://academic.oup.com/journals/
  sourceTypes: Publisher
– sequence: 3
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Biology
EISSN 1477-4054
ExternalDocumentID PMC10516362
37580175
10_1093_bib_bbad289
10.1093/bib/bbad289
Genre Research Support, U.S. Gov't, Non-P.H.S
Review
Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NIAID NIH HHS
  grantid: R01 AI164266
– fundername: NIGMS NIH HHS
  grantid: R35 GM148196
– fundername: NIGMS NIH HHS
  grantid: R01 GM126189
– fundername: NIH HHS
  grantid: R01GM126189
– fundername: ;
– fundername: ;
  grantid: DMS-2052983; DMS-1761320; IIS-1900473
– fundername: ;
  grantid: 80NSSC21M0023
– fundername: ;
  grantid: R01GM126189; R35GM148196; R01AI164266
– fundername: ;
  grantid: 65109
ID FETCH-LOGICAL-c441t-249d9db059f2cc2a44bbd6e52007f50f7f0e616c167061e39afec0ed71fc89863
IEDL.DBID TOX
ISICitedReferencesCount 29
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001047878100001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1467-5463
1477-4054
IngestDate Tue Sep 30 17:12:52 EDT 2025
Thu Oct 02 10:22:25 EDT 2025
Fri Oct 03 04:10:57 EDT 2025
Tue Oct 21 01:40:08 EDT 2025
Sat Nov 29 05:43:38 EST 2025
Tue Nov 18 22:33:08 EST 2025
Wed Apr 02 07:05:26 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 5
Keywords protein engineering
deep learning and machine learning
protein language models
topological data analysis
Language English
License This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
https://creativecommons.org/licenses/by/4.0
The Author(s) 2023. Published by Oxford University Press.
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c441t-249d9db059f2cc2a44bbd6e52007f50f7f0e616c167061e39afec0ed71fc89863
OpenAccessLink https://dx.doi.org/10.1093/bib/bbad289
PMID 37580175
PQID 3049109858
PQPubID 26846
ParticipantIDs pubmedcentral_primary_oai_pubmedcentral_nih_gov_10516362
proquest_miscellaneous_2851141982
proquest_journals_3049109858
pubmed_primary_37580175
crossref_citationtrail_10_1093_bib_bbad289
crossref_primary_10_1093_bib_bbad289
oup_primary_10_1093_bib_bbad289
PublicationCentury 2000
PublicationDate 2023-09-20
PublicationDateYYYYMMDD 2023-09-20
PublicationDate_xml – month: 09
  year: 2023
  text: 2023-09-20
  day: 20
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
– name: Oxford
PublicationTitle Briefings in bioinformatics
PublicationTitleAlternate Brief Bioinform
PublicationYear 2023
Publisher Oxford University Press
Oxford Publishing Limited (England)
Publisher_xml – name: Oxford University Press
– name: Oxford Publishing Limited (England)
37547662 - ArXiv. 2023 Jul 27:arXiv:2307.14587v1.
References_xml – volume: 13
  start-page: 2219
  issue: 1
  year: 2022
  ident: 2023092216513424700_ref126
  article-title: Machine learning-coupled combinatorial mutagenesis enables resource-efficient engineering of CRISPR-Cas9 genome editor activities
  publication-title: Nat Commun
  doi: 10.1038/s41467-022-29874-5
– volume: 34
  start-page: 57
  issue: 1
  year: 2013
  ident: 2023092216513424700_ref37
  article-title: Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models
  publication-title: Hum Mutat
  doi: 10.1002/humu.22225
– year: 2022
  ident: 2023092216513424700_ref107
  article-title: Orientation-aware graph neural networks for protein structure representation learning
– volume: 2
  start-page: 804
  issue: 12
  year: 2022
  ident: 2023092216513424700_ref147
  article-title: Single-sequence protein structure prediction using supervised transformer protein language models
  publication-title: Nat Comput Sci
  doi: 10.1038/s43588-022-00373-3
– year: 2023
  ident: 2023092216513424700_ref86
  article-title: Topological data analysis hearing the shapes of drums and bells
– volume: 13
  start-page: e1005786
  issue: 10
  year: 2017
  ident: 2023092216513424700_ref133
  article-title: Machine learning to design integral membrane channelrhodopsins for efficient eukaryotic expression and plasma membrane localization
  publication-title: PLoS Comput Biol
  doi: 10.1371/journal.pcbi.1005786
– start-page: 7133
  volume-title: Proceedings of the AAAI Conference on Artificial Intelligence
  year: 2022
  ident: 2023092216513424700_ref155
  article-title: Dist2Cycle: a simplicial neural network for homology localization
– volume: 33
  start-page: W382
  issue: suppl_2
  year: 2005
  ident: 2023092216513424700_ref18
  article-title: The FoldX web server: an online force field
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gki387
– volume: 28
  start-page: 235
  issue: 1
  year: 2000
  ident: 2023092216513424700_ref13
  article-title: The protein data bank
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/28.1.235
– volume: 18
  year: 2017
  ident: 2023092216513424700_ref81
  article-title: Persistence images: a stable vector representation of persistent homology
  publication-title: J Mach Learn Res
– start-page: 8844
  volume-title: International Conference on Machine Learning
  year: 2021
  ident: 2023092216513424700_ref39
  article-title: MSA transformer
– volume: 23
  start-page: 100939
  issue: 3
  year: 2020
  ident: 2023092216513424700_ref119
  article-title: MutaBind2: predicting the impacts of single and multiple mutations on protein-protein interactions
  publication-title: iScience
  doi: 10.1016/j.isci.2020.100939
– volume: 151
  start-page: 106262
  year: 2022
  ident: 2023092216513424700_ref88
  article-title: Persistent Laplacian projected Omicron BA.4 and BA.5 to become new dominating variants
  publication-title: Comput Biol Med
  doi: 10.1016/j.compbiomed.2022.106262
– volume: 559
  start-page: 547
  issue: 7715
  year: 2018
  ident: 2023092216513424700_ref31
  article-title: Machine learning for molecular and materials science
  publication-title: Nature
  doi: 10.1038/s41586-018-0337-2
– start-page: 263
  volume-title: Proceedings of the Twenty-First Annual Symposium on Computational Geometry
  year: 2005
  ident: 2023092216513424700_ref79
  article-title: Stability of persistence diagrams
  doi: 10.1145/1064092.1064133
– volume: 33
  start-page: 3549
  issue: 22
  year: 2017
  ident: 2023092216513424700_ref120
  article-title: Analysis and prediction of protein folding energy changes upon mutation by element specific persistent homology
  publication-title: Bioinformatics
– year: 2018
  ident: 2023092216513424700_ref98
  article-title: Deep graph infomax
– volume: 16
  start-page: 687
  issue: 8
  year: 2019
  ident: 2023092216513424700_ref12
  article-title: Machine-learning-guided directed evolution for protein engineering
  publication-title: Nat Methods
  doi: 10.1038/s41592-019-0496-6
– volume: 32
  start-page: 9689
  year: 2019
  ident: 2023092216513424700_ref41
  article-title: Evaluating protein transfer learning with TAPE
  publication-title: Adv Neural Inf Process
– start-page: 8946
  volume-title: International Conference on Machine Learning
  year: 2022
  ident: 2023092216513424700_ref45
  article-title: Learning inverse folding from millions of predicted structures
– volume: 34
  start-page: 683
  issue: 6
  year: 2021
  ident: 2023092216513424700_ref32
  article-title: MLIMC: machine learning-based implicit-solvent Monte Carlo
  publication-title: Chin J Chem Phys
  doi: 10.1063/1674-0068/cjcp2109150
– volume: 16
  start-page: 1315
  issue: 12
  year: 2019
  ident: 2023092216513424700_ref21
  article-title: Unified rational protein engineering with sequence-based deep representation learning
  publication-title: Nat Methods
  doi: 10.1038/s41592-019-0598-1
– volume: 33
  start-page: 71
  year: 2019
  ident: 2023092216513424700_ref87
  article-title: Mathematical deep learning for pose and binding affinity prediction and ranking in D3R Grand Challenges
  publication-title: J Comput Aided Mol Des
  doi: 10.1007/s10822-018-0146-6
– volume: 27
  start-page: 047410
  issue: 4
  year: 2017
  ident: 2023092216513424700_ref85
  article-title: Persistent homology of time-dependent functional networks constructed from coupled time series
  publication-title: Chaos
  doi: 10.1063/1.4978997
– volume: 36
  start-page: 3025
  issue: 9
  year: 2008
  ident: 2023092216513424700_ref118
  article-title: Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkn159
– volume: 12
  start-page: 5825
  issue: 1
  year: 2021
  ident: 2023092216513424700_ref135
  article-title: Machine learning-guided acyl-ACP reductase engineering for improved in vivo fatty alcohol production
  publication-title: Nat Commun
  doi: 10.1038/s41467-021-25831-w
– volume: 643
  start-page: 281
  year: 2020
  ident: 2023092216513424700_ref8
  article-title: Machine learning-assisted enzyme engineering
  publication-title: Meth Enzymol
  doi: 10.1016/bs.mie.2020.05.005
– year: 2017
  ident: 2023092216513424700_ref93
  article-title: Graph attention networks
– year: 2022
– reference: 37547662 - ArXiv. 2023 Jul 27:arXiv:2307.14587v1.
Subject terms: Antibodies; Artificial Intelligence; Biotechnology; Data Analysis; Drug development; Food security; Machine learning; Natural Language Processing; Protein Engineering; Protein structure; Proteins; Review; Topology
Title Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models
URI https://www.ncbi.nlm.nih.gov/pubmed/37580175
https://www.proquest.com/docview/3049109858
https://www.proquest.com/docview/2851141982
https://pubmed.ncbi.nlm.nih.gov/PMC10516362
Volume 24