Optimizing Parkinson’s Disease Prediction: A Comparative Analysis of Data Aggregation Methods Using Multiple Voice Recordings via an Automated Artificial Intelligence Pipeline

Bibliographic Details
Published in: Data (Basel), Vol. 10, No. 1, p. 4
Main Authors: Yang, Zhengxiao, Zhou, Hao, Srivastav, Sudesh, Shaffer, Jeffrey G., Abraham, Kuukua E., Naandam, Samuel M., Kakraba, Samuel
Format: Journal Article
Language: English
Published: Basel, MDPI AG, 01.01.2025
ISSN: 2306-5729
Description
Summary: Patient-level grouped data are prevalent in public health and medical fields, and multiple instance learning (MIL) offers a framework to address the challenges associated with this type of data structure. This study compares four data aggregation methods designed to tackle the grouped structure in classification tasks: post-mean, post-max, post-min, and pre-mean aggregation. We developed a customized AI pipeline that incorporates twelve machine learning algorithms along with the four aggregation methods to detect Parkinson’s disease (PD) using multiple voice recordings from individuals available in the UCI Machine Learning Repository, which includes 756 voice recordings from 188 PD patients and 64 healthy individuals. Seven performance metrics (accuracy, precision, sensitivity, specificity, F1 score, AUC, and MCC) were used for model evaluation. Various techniques, such as Bag Over-Sampling (BOS), cross-validation, and grid search, were implemented to enhance classification performance. Among the four aggregation methods, post-mean aggregation combined with XGBoost achieved the highest accuracy (0.880), F1 score (0.922), and MCC (0.672). Furthermore, we identified potential trends in selecting aggregation methods that are suitable for imbalanced data, particularly based on their differences in sensitivity and specificity. These findings provide meaningful implications for the further exploration of grouped imbalanced data.
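The four aggregation methods named in the summary can be illustrated with a minimal sketch. The abstract does not define them, so the reading used here is an assumption: "post" aggregation combines per-recording predicted probabilities into one score per patient after a model has scored each recording, while "pre-mean" averages each patient's feature vectors before any model is fit. The function names, patient IDs, probabilities, and feature values are all hypothetical toy values, not taken from the article or its dataset.

```python
from collections import defaultdict
from statistics import mean


def post_aggregate(patient_ids, probs, how="mean"):
    """Combine per-recording probabilities into one score per patient.

    Assumed reading of post-mean / post-max / post-min aggregation.
    """
    reducers = {"mean": mean, "max": max, "min": min}
    grouped = defaultdict(list)
    for pid, p in zip(patient_ids, probs):
        grouped[pid].append(p)
    return {pid: reducers[how](ps) for pid, ps in grouped.items()}


def pre_mean_aggregate(patient_ids, features):
    """Average each patient's feature vectors into a single row.

    Assumed reading of pre-mean aggregation: done before model fitting.
    """
    grouped = defaultdict(list)
    for pid, row in zip(patient_ids, features):
        grouped[pid].append(row)
    # zip(*rows) iterates column-wise over one patient's recordings.
    return {pid: [mean(col) for col in zip(*rows)] for pid, rows in grouped.items()}


# Hypothetical toy data: two patients with three recordings each.
patient_ids = ["a", "a", "a", "b", "b", "b"]
probs = [0.9, 0.6, 0.3, 0.2, 0.4, 0.6]  # made-up per-recording P(PD)
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0],
            [0.0, 1.0], [0.0, 1.0], [3.0, 1.0]]

post_mean = post_aggregate(patient_ids, probs, "mean")
post_max = post_aggregate(patient_ids, probs, "max")
post_min = post_aggregate(patient_ids, probs, "min")
pre_mean = pre_mean_aggregate(patient_ids, features)
```

Under this reading, post-max labels a patient positive if any single recording scores high, and post-min only if every recording does, which is one plausible reason the methods would trade off sensitivity against specificity differently.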
DOI:10.3390/data10010004