Modeling of Feature Selection Based on Random Forest Algorithm and Pearson Correlation Coefficient
| Published in: | Journal of Physics: Conference Series, vol. 2219, no. 1, pp. 12046-12054 |
|---|---|
| Main authors: | |
| Format: | Journal Article |
| Language: | English |
| Publisher: | Bristol: IOP Publishing, 1 April 2022 |
| ISSN: | 1742-6588, 1742-6596 |
| Abstract: | This paper establishes a feature selection model that selects the 20 molecular descriptors of compounds with the most significant influence on biological activity. A random forest algorithm was used to calculate the correlation between each molecular descriptor and the biological activity value pIC50, and the 26 descriptors with the highest correlation were screened out. The Pearson correlation coefficient was then used to analyze these 26 descriptors and eliminate variables that are highly correlated with one another, which left the final 20. A literature review showed that several of the selected descriptors, such as MlogP, XlogP and TopoPSA, have a prominent effect on biological activity, indicating that the screening method and the resulting 20 descriptors are reasonable. |
| DOI: | 10.1088/1742-6596/2219/1/012046 |
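
The two-stage screening described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The random-forest importance scores are taken as input; for example, scikit-learn's `RandomForestRegressor` exposes them as `feature_importances_` after fitting on the descriptors against pIC50. The redundancy filter is assumed to be a greedy pass in importance order. The cutoff `threshold=0.9` and the helper names `pearson` and `select_descriptors` are hypothetical, since the abstract states neither.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return sxy / math.sqrt(sxx * syy)


def select_descriptors(
    importances: Dict[str, float],
    columns: Dict[str, Sequence[float]],
    top_k: int = 26,
    n_final: int = 20,
    threshold: float = 0.9,  # hypothetical cutoff, not given in the abstract
) -> List[str]:
    """Hypothetical helper: importance ranking, then a greedy Pearson filter."""
    # Stage 1: rank descriptors by random-forest importance, keep the top_k.
    ranked = sorted(importances, key=importances.get, reverse=True)[:top_k]
    # Stage 2 (assumed greedy): walk down the ranking and skip any descriptor
    # whose |r| with an already-kept descriptor reaches the threshold.
    kept: List[str] = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
        if len(kept) == n_final:
            break
    return kept


if __name__ == "__main__":
    # Toy data: B is nearly collinear with A, so it should be filtered out.
    columns = {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [2.0, 4.0, 6.0, 8.1],
        "C": [4.0, 1.0, 3.0, 2.0],
        "D": [0.5, 0.4, 0.9, 0.1],
    }
    importances = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
    print(select_descriptors(importances, columns, top_k=4, n_final=2))
    # prints ['A', 'C']
```

Walking the ranking in importance order means that, of two strongly intercorrelated descriptors, the one the random forest scored higher is the one retained.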