Machine-learning-based predictions of imprinting quality using ensemble and non-linear regression algorithms

The molecularly imprinted polymers are artificial polymers that, during the synthesis, create specific sites for a definite purpose. These polymers due to their characteristics such as stability, easy of synthesis, reproducibility, reusability, high accuracy, and selectivity have many applications....

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:Scientific reports Ročník 13; číslo 1; s. 12111 - 9
Hlavní autoři: Yarahmadi, Bita, Hashemianzadeh, Seyed Majid, Milani Hosseini, Seyed Mohammad-Reza
Médium: Journal Article
Jazyk:angličtina
Vydáno: London Nature Publishing Group UK 26.07.2023
Nature Publishing Group
Nature Portfolio
Témata:
ISSN:2045-2322, 2045-2322
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:The molecularly imprinted polymers are artificial polymers that, during the synthesis, create specific sites for a definite purpose. These polymers due to their characteristics such as stability, easy of synthesis, reproducibility, reusability, high accuracy, and selectivity have many applications. However, the variety of the functional monomers, templates, solvents, and synthesis conditions like pH, temperature, the rate of stirring, and time, limit the selectivity of imprinting. The Practical optimization of the synthetic conditions has many drawbacks, including chemical compound usage, equipment requirements, and time costs. The use of machine learning (ML) for the prediction of the imprinting factor (IF), which indicates the quality of imprinting is a very interesting idea to overcome these problems. The ML has many advantages, for example a lack of human error, high accuracy, high repeatability, and prediction of a large amount of data in the minimum time. In this research, ML was used to predict the IF using non-linear regression algorithms, including classification and regression tree, support vector regression, and k-nearest neighbors, and ensemble algorithms, like gradient boosting (GB), random forest, and extra trees. The data sets were obtained practically in the laboratory, and inputs, included pH, the type of the template, the type of the monomer, solvent, the distribution coefficient of the MIP (K MIP ), and the distribution coefficient of the non-imprinted polymer (K NIP ). The mutual information feature selection method was used to select the important features affecting the IF. The results showed that the GB algorithm had the best performance in predicting the IF, and using this algorithm, the maximum R 2 value (R 2  = 0.871), and the minimum mean absolute error (MAE = − 0.982), and mean square error were obtained (MSE = − 2.303).
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:2045-2322
2045-2322
DOI:10.1038/s41598-023-39374-1