G2GT: Retrosynthesis Prediction with Graph-to-Graph Attention Neural Network and Self-Training

Bibliographic details
Published in: Journal of Chemical Information and Modeling, Vol. 63, No. 7, p. 1894
Main authors: Lin, Zaiyun; Yin, Shiqiu; Shi, Lei; Zhou, Wenbiao; Zhang, Yingsheng John
Format: Journal Article
Language: English
Published: United States, 10 April 2023
ISSN: 1549-960X
Description
Summary: Retrosynthesis prediction, the task of identifying reactant molecules that can be used to synthesize product molecules, is a fundamental challenge in organic chemistry and related fields. To address this challenge, we propose a novel graph-to-graph transformation model, G2GT. The model is built on the standard transformer structure and utilizes graph encoders and decoders. Additionally, we demonstrate the effectiveness of self-training, a data augmentation technique that utilizes unlabeled molecular data, in improving the performance of the model. To further enhance diversity, we propose a weak ensemble method, inspired by reaction-type labels and ensemble learning. This method incorporates beam search, nucleus sampling, and top-k sampling to improve inference diversity. A simple ranking algorithm is employed to retrieve the final top-10 results. We achieved new state-of-the-art results on both the USPTO-50K data set, with a top-1 accuracy of 54%, and the larger, more challenging USPTO-Full data set, with a top-1 accuracy of 49.3% and competitive top-10 results. Our model can also be generalized to all other graph-to-graph transformation tasks. Data and code are available at https://github.com/Anonnoname/G2GT_2.
DOI: 10.1021/acs.jcim.2c01302
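
Illustration: the summary names top-k sampling and nucleus sampling as two of the decoding strategies used for inference diversity. The sketch below shows how those two filters are commonly defined over a single next-token distribution. It is not taken from the G2GT code base; the function names, the toy token probabilities, and the values of `k` and `p_threshold` are all assumptions chosen for illustration.

```python
import random
from typing import Dict


def top_k_filter(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable tokens and renormalize to sum to 1."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def nucleus_filter(probs: Dict[str, float], p_threshold: float) -> Dict[str, float]:
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches p_threshold, then renormalize."""
    kept = []
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, p))
        cumulative += p
        if cumulative >= p_threshold:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def sample(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one token from a (filtered) distribution."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution over a few atom symbols.
toy_probs = {"C": 0.5, "N": 0.3, "O": 0.15, "Cl": 0.05}

top_k_probs = top_k_filter(toy_probs, k=2)                 # keeps C and N
nucleus_probs = nucleus_filter(toy_probs, p_threshold=0.9)  # keeps C, N, O

rng = random.Random(0)
token = sample(nucleus_probs, rng)  # always one of the kept tokens
```

Top-k fixes the number of candidates, while nucleus sampling lets that number vary with how peaked the distribution is. How the paper combines these samplers with beam search and ranks the pooled candidates is described in the article itself, not here.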