Integrating TANBN with cost sensitive classification algorithm for imbalanced data in medical diagnosis
| Published in: | Computers & Industrial Engineering, Vol. 140; p. 106266 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 1 February 2020 |
| Subjects: | |
| ISSN: | 0360-8352, 1879-0550 |
| Summary: | • Proposes the AdaC-TANBN algorithm for imbalanced data in medical diagnosis. • Uses a variable misclassification cost determined by the sample probability distribution. • Reflects the penalty for misclassifying minority-class samples more accurately. • Achieves good performance in the experiments.
For imbalanced classification problems, most traditional classification models focus only on finding a classifier that maximizes classification accuracy under a fixed misclassification cost, without considering that the misclassification cost can change with the sample probability distribution. Cost-sensitive learning is known to be effective for imbalanced data classification. In this regard, we propose an algorithm that integrates TANBN with cost-sensitive classification (AdaC-TANBN) to overcome this drawback and improve classification accuracy. The AdaC-TANBN algorithm trains the classifier with a variable misclassification cost determined by the sample probability distribution, then classifies imbalanced data in medical diagnosis. The effectiveness of the proposed approach is examined on the Cleveland heart dataset (Heart), the Indian liver patient dataset (ILPD), the Dermatology dataset, and the Cervical cancer risk factors dataset (CCRF) from the UCI Machine Learning Repository. The experimental results indicate that the AdaC-TANBN algorithm can outperform the other state-of-the-art methods compared. |
|---|---|
| DOI: | 10.1016/j.cie.2019.106266 |