Integrating TANBN with cost sensitive classification algorithm for imbalanced data in medical diagnosis


Bibliographic details
Published in: Computers & industrial engineering, Vol. 140, p. 106266
Main authors: Gan, Dan; Shen, Jiang; An, Bang; Xu, Man; Liu, Na
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 February 2020
ISSN: 0360-8352, 1879-0550
Description
Summary:
• Proposes the AdaC-TANBN algorithm for imbalanced data in medical diagnosis.
• Uses a variable misclassification cost determined by the sample distribution probability.
• Reflects the penalty for misclassifying minority samples more accurately.
• The experiments show good performance.

For imbalanced classification problems, most traditional classification models focus only on finding a classifier that maximizes classification accuracy under a fixed misclassification cost, and do not take into account that the misclassification cost can change with the sample probability distribution. Cost-sensitive learning is known to be an effective way to handle imbalanced data classification. We therefore propose AdaC-TANBN, an algorithm that integrates TANBN with cost-sensitive classification, to overcome this drawback and improve classification accuracy. AdaC-TANBN trains the classifier with a variable misclassification cost determined by the sample distribution probability, then classifies imbalanced data in medical diagnosis. The effectiveness of the proposed approach is examined on the Cleveland heart dataset (Heart), the Indian liver patient dataset (ILPD), the Dermatology dataset, and the Cervical cancer risk factors dataset (CCRF) from the UCI Machine Learning Repository. The experimental results indicate that AdaC-TANBN can outperform the other state-of-the-art methods it is compared against.
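The summary does not state the cost formula, so the following is only a minimal sketch of the general idea of training with a misclassification cost that changes with the sample weight distribution. Everything specific in it is an assumption rather than the authors' method: the weight update follows the AdaC2 style of cost-sensitive boosting, a one-feature decision stump stands in for the TANBN base classifier, and `round_costs` is a hypothetical rule that sets each sample's cost to the current weight mass of the opposite class.

```python
import numpy as np

EPS = 1e-10  # guards the logarithm when a round makes no mistakes


def fit_stump(X, y, w):
    """Return the (feature, threshold, sign) stump with the lowest weighted error."""
    best, best_err = (0, 0.0, 1), np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = w[pred != y].sum()
                if err < best_err:
                    best, best_err = (j, thr, sign), err
    return best


def predict_stump(stump, X):
    j, thr, sign = stump
    return np.where(X[:, j] <= thr, sign, -sign)


def round_costs(y, w):
    """Hypothetical cost rule (an assumption, not taken from the paper):
    a sample's cost is the current weight mass of the opposite class,
    so the class holding less weight is penalised more when misclassified."""
    pos_mass, neg_mass = w[y == 1].sum(), w[y == -1].sum()
    return np.where(y == 1, neg_mass, pos_mass)


def fit_boost(X, y, n_rounds=10):
    """AdaC2-style boosting with labels in {-1, +1}; costs are recomputed each round."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_rounds):
        cost = round_costs(y, w)
        stump = fit_stump(X, y, w)  # stand-in for the TANBN base classifier
        pred = predict_stump(stump, X)
        cw = cost * w
        correct, wrong = cw[pred == y].sum(), cw[pred != y].sum()
        if wrong >= correct:  # learner no better than chance under these costs
            break
        alpha = 0.5 * np.log((correct + EPS) / (wrong + EPS))
        ensemble.append((alpha, stump))
        w = cw * np.exp(-alpha * y * pred)  # cost scales the reweighting
        w /= w.sum()
    return ensemble


def predict_boost(ensemble, X):
    score = np.zeros(X.shape[0])
    for alpha, stump in ensemble:
        score += alpha * predict_stump(stump, X)
    return np.where(score >= 0, 1, -1)


# Toy imbalanced data: 18 majority samples (-1) and 2 minority samples (+1).
x = np.concatenate([np.linspace(0.0, 1.0, 18), [2.0, 2.5]])
X = x.reshape(-1, 1)
y = np.array([-1] * 18 + [1] * 2)
model = fit_boost(X, y)
print(predict_boost(model, X))
```

With uniform starting weights, the hypothetical rule gives the two minority samples a cost of 0.9 and the majority samples a cost of 0.1, so an early mistake on a minority sample shifts much more weight than a mistake on a majority sample. A different cost rule can be swapped into `round_costs` without touching the rest of the loop.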
DOI:10.1016/j.cie.2019.106266