Harnessing AI for Health and Knowledge: An Investigation into Machine and Deep Learning Models for Medical and Textual Data

Bibliographic Details
Published in: SN Computer Science, Vol. 6, No. 6, p. 696
Main Authors: Abbas, Ali; Agarwal, Shreya; Jaiswal, Manish; Jha, Prajna; Siddiqui, Tanveer J.
Format: Journal Article
Language:English
Published: Singapore: Springer Nature Singapore, 01.08.2025; Springer Nature B.V.
ISSN: 2661-8907, 2662-995X
Description
Summary: The digitalization of medical information has greatly advanced medical research by converting clinical observations and patient data into structured and unstructured textual formats, respectively. Despite this progress, large-scale textual clinical data remain notably scarce. This research applies machine learning (ML) and deep learning (DL) techniques to classify both structured medical data and unstructured textual data, using the Parkinson's Disease dataset for the former and the 20 Newsgroups dataset for the latter. The study experiments with four distinct feature vectors for the textual data and applies recursive feature elimination with cross-validation (RFECV) to the structured medical data to remove superfluous features. The classifiers chosen are Naïve Bayes (NB) for ML and the Multi-Layer Perceptron (MLP) for DL. To mitigate NB's feature-independence assumption, term weighting strategies were applied, yielding five variants of the weighted NB model; however, the sparseness of the 20 Newsgroups dataset prevented Categorical and Gaussian NB models from being trained on it. Forty-nine MLP models were examined to identify an optimal lightweight DL model suitable for both datasets. Performance was evaluated by accuracy and F1-measure. The best-performing NB model was Multinomial NB, with accuracies of 0.80 and 0.81 on the medical and textual datasets, respectively, while the most effective MLP model achieved accuracies of 0.77 and 0.92. Benchmarked against existing literature, these findings suggest that both ML and lightweight DL approaches are feasible for concurrent classification of structured medical and unstructured textual data.
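The weighted Multinomial Naïve Bayes idea mentioned in the summary can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the authors' implementation: it assumes an IDF-style term weight, log(N / df) + 1, multiplied into raw term counts, because the record does not say which five weighting schemes the study used. The function names (`idf_weights`, `train_weighted_mnb`, `predict`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def idf_weights(docs):
    """Assumed IDF-style weight per term: log(N / df) + 1 (hypothetical choice)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term at most once per document
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}


def train_weighted_mnb(docs, labels, weights, alpha=1.0):
    """Fit multinomial NB on weighted term counts with additive (Lidstone) smoothing."""
    vocab = set(weights)
    class_counts = Counter(labels)
    mass = defaultdict(Counter)  # class -> term -> summed weighted count
    for doc, label in zip(docs, labels):
        for term, tf in Counter(doc).items():
            mass[label][term] += tf * weights[term]
    log_prior = {c: math.log(k / len(labels)) for c, k in class_counts.items()}
    log_cond = {}
    for c in class_counts:
        denom = sum(mass[c].values()) + alpha * len(vocab)
        log_cond[c] = {t: math.log((mass[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_cond


def predict(doc, log_prior, log_cond, weights):
    """Return the class with the highest weighted log-score; unseen terms are ignored."""
    scores = {}
    for c, prior in log_prior.items():
        score = prior
        for term, tf in Counter(doc).items():
            if term in weights:
                score += tf * weights[term] * log_cond[c][term]
        scores[c] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy corpus, not data from the study.
    docs = [["goal", "match", "team"], ["team", "score", "goal"],
            ["gpu", "driver", "kernel"], ["kernel", "linux", "driver"]]
    labels = ["sport", "sport", "tech", "tech"]
    w = idf_weights(docs)
    prior, cond = train_weighted_mnb(docs, labels, w)
    print(predict(["goal", "team"], prior, cond, w))  # prints "sport"
```

The same weights scale term counts at both training and prediction time, so the model behaves like a multinomial NB fitted to weighted (TF-IDF-like) feature vectors; swapping in a different weighting only requires replacing `idf_weights`.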
DOI: 10.1007/s42979-025-04201-z