Pollution risk assessment by designing predictive binary classification models of substituted benzenes centered on data mining and machine learning techniques

Bibliographic details
Published in:	Environmental science and pollution research international, Vol. 32, No. 35, pp. 21092-21116
Main authors:	N’guessan, Aubin; Dali, Brice; Esmel, Elvice Akori; Moussé, Logbo Mathias; Ziao, Nahossé; N’guessan, Raymond Kré; Megnassan, Eugene
Format:	Journal Article
Language:	English
Published:	Berlin/Heidelberg: Springer Berlin Heidelberg, 1 July 2025; Springer Nature B.V.
Subjects:	Algorithms; Aquatic Pollution; Atmospheric Protection/Air Quality Control/Air Pollution; Benzene; Benzene - toxicity; Chemicals; Classification; Data Mining; Datasets; Decision Trees; Earth and Environmental Science; Ecotoxicology; Environment; Environmental Chemistry; Environmental Health; Environmental risk; Evaluation; Feature selection; Hydrocarbons; Laboratory animals; Learning algorithms; Machine Learning; Neural networks; Organic chemistry; Quantitative Structure-Activity Relationship; Regression analysis; Research Article; Risk Assessment; Statistical analysis; Statistical models; Support vector machines; Tetrahymena pyriformis - drug effects; Toxicants; Toxicity; Variables; Waste Water Technology; Water Management; Water Pollution Control; SMOTE; QSTR; Substituted benzenes; Tetrahymena pyriformis; ClustOfVar
ISSN:	0944-1344, 1614-7499

Description
Abstract:	There is a growing need for industry and global regulatory agencies to develop rapid chemical safety assessment through more reliable theoretical models. Thus, quantitative structure–toxicity relationship (QSTR) models are preferred by regulators to bring chemicals to market rather than long and expensive animal testing. In this study, we evaluated four binary classification machine learning (ML) models (support vector machine, k-nearest neighbor, CART decision tree and random forest) for their ability to predict toxicity towards Tetrahymena pyriformis using 1416 benzene-derived compounds (749 chemicals evaluated and 697 synthetic toxicants) classified into two groups: non-toxic molecules (NTox) with 708 observations and toxic molecules (Tox) with 708 observations. The ML models were developed with data mining methods, using the ClustOfVar algorithm for feature selection and SMOTE for data balancing, without hyperparameter tuning of the statistical learning models used. Of the four ML models, judged on the external validation set with fivefold cross-validation, the robust and explanatory CART decision tree (DT) model achieved the best results (Q = 95.42%, Pr = 96.60%, Re = 94.67%, F_score = 95.62%, Sp = 96.27%, MCC = 0.91, and AUC = 1.0). A set of 10 human-readable decision rules for predicting the toxicity of benzene-derived compounds (BZC) was also identified. The methodologies proposed in this paper could support QSTR modeling by filling data gaps and by prioritizing and focusing experiments on the most hazardous organic chemicals.
DOI:	10.1007/s11356-025-36874-7
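
Note on the reported metrics: the abstract lists Q, Pr, Re, F_score, Sp and MCC without defining them. The sketch below assumes the conventional readings (Q as overall accuracy, Pr as precision, Re as recall, Sp as specificity, MCC as the Matthews correlation coefficient) and shows how each follows from a binary confusion matrix. The function name and the counts are hypothetical and are not taken from the article.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute binary-classification metrics from confusion-matrix counts.

    tp/fn: toxic molecules predicted toxic / non-toxic.
    tn/fp: non-toxic molecules predicted non-toxic / toxic.
    The metric labels follow the abstract; their meaning is assumed.
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Q": ratio(tp + tn, tp + fp + tn + fn),   # accuracy (assumed)
        "Pr": precision,
        "Re": recall,
        "F_score": ratio(2 * precision * recall, precision + recall),
        "Sp": ratio(tn, tn + fp),                 # specificity
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name} = {value:.4f}")
```

AUC is not included because it depends on ranked prediction scores rather than on a single confusion matrix.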