Harnessing AI for Health and Knowledge: An Investigation into Machine and Deep Learning Models for Medical and Textual Data

Bibliographic Details
Published in: SN computer science, Volume 6, Issue 6, p. 696
Main Authors: Abbas, Ali; Agarwal, Shreya; Jaiswal, Manish; Jha, Prajna; Siddiqui, Tanveer J.
Format: Journal Article
Language: English
Published: Singapore: Springer Nature Singapore, 1 August 2025
Springer Nature B.V
ISSN: 2661-8907, 2662-995X
Abstract The digitalization of medical information has greatly enhanced medical research by converting clinical observations and patient data into structured and unstructured textual formats, respectively. Despite the progress, there’s a notable scarcity of large-scale textual clinical data. This research delves into utilizing machine learning (ML) and deep learning (DL) techniques for classifying both structured medical and unstructured textual data. Specifically, it focuses on the Parkinson’s Disease dataset for structured medical data and the 20 Newsgroup dataset for unstructured textual information. The study involves experimenting with four distinct feature vectors for textual data and employing recursive feature elimination with cross-validation on structured medical data to remove superfluous features. The classifiers chosen for this investigation are Naïve Bayes (NB) for ML and Multi-Layer Perceptron (MLP) for DL. To address the independence assumption in NB, term weighting strategies were applied, leading to the exploration of five variants of the weighted NB model. However, the sparseness of the 20 Newsgroup dataset prevented the training of Categorical and Gaussian NB models. The study examined forty-nine different MLP models to identify an optimal light DL model suitable for both datasets. Performance evaluation, based on accuracy and F1-measure, revealed that the best-performing NB model was the Multinomial NB, achieving accuracies of 0.80 and 0.81 for the medical and textual datasets, respectively. Meanwhile, the most effective MLP model attained accuracies of 0.77 and 0.92. These findings, benchmarked against existing literature, suggest the feasibility of applying both ML and light DL approaches for concurrent classification of structured medical and unstructured textual data.
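The abstract describes applying term weighting to a Multinomial Naïve Bayes classifier for text. The following is a minimal sketch of that general idea only, not the authors' implementation: it assumes TF-IDF as the weighting scheme (Tf-Idf appears among the record's keywords, but the five weighted variants are not specified here), Laplace smoothing with `alpha=1.0`, whitespace tokenization, and a tiny invented corpus in place of the 20 Newsgroup data. The class name `WeightedMultinomialNB` and all helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every training term."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}


def tfidf(doc, idf):
    """Map a document to {term: tf * idf}; terms unseen in training are dropped."""
    counts = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


class WeightedMultinomialNB:
    """Multinomial NB whose 'counts' are TF-IDF weights (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.idf = fit_idf(docs)
        vocab = list(self.idf)
        class_counts = Counter(labels)
        # Accumulate TF-IDF mass per class and term.
        weight_sums = defaultdict(lambda: defaultdict(float))
        for doc, label in zip(docs, labels):
            for term, weight in tfidf(doc, self.idf).items():
                weight_sums[label][term] += weight
        self.log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
        self.log_like = {}
        for c in class_counts:
            total = sum(weight_sums[c].values()) + self.alpha * len(vocab)
            self.log_like[c] = {
                t: math.log((weight_sums[c][t] + self.alpha) / total) for t in vocab
            }
        return self

    def predict(self, doc):
        weights = tfidf(doc, self.idf)
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_like[c][t] for t, w in weights.items())
            for c in self.log_prior
        }
        return max(scores, key=scores.get)


# Invented toy corpus standing in for two newsgroup-style topics.
train_docs = [
    "the team won the hockey game",
    "hockey players skate on ice",
    "the rocket reached orbit around the moon",
    "nasa launched a new space rocket",
]
train_labels = ["sport", "sport", "space", "space"]

model = WeightedMultinomialNB().fit(train_docs, train_labels)
prediction_sport = model.predict("hockey game tonight")
prediction_space = model.predict("rocket launched into orbit")
print(prediction_sport, prediction_space)
```

Working in log space avoids underflow when many term likelihoods are multiplied, and dropping unseen terms at prediction time keeps the sketch short; a fuller treatment would choose the weighting scheme and smoothing by validation.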
ArticleNumber 696
Author Siddiqui, Tanveer J.
Jaiswal, Manish
Agarwal, Shreya
Abbas, Ali
Jha, Prajna
Author_xml – sequence: 1
  givenname: Ali
  orcidid: 0000-0002-7914-9574
  surname: Abbas
  fullname: Abbas, Ali
  email: aliabbas367@gmail.com
  organization: Department of Electronics and Communication, University of Allahabad
– sequence: 2
  givenname: Shreya
  surname: Agarwal
  fullname: Agarwal, Shreya
  organization: Department of Electronics and Communication, University of Allahabad
– sequence: 3
  givenname: Manish
  surname: Jaiswal
  fullname: Jaiswal, Manish
  organization: Department of Computer Science, Prof. Rajendra Singh (Rajju Bhaiya) University
– sequence: 4
  givenname: Prajna
  surname: Jha
  fullname: Jha, Prajna
  organization: Department of Electronics and Communication, University of Allahabad
– sequence: 5
  givenname: Tanveer J.
  surname: Siddiqui
  fullname: Siddiqui, Tanveer J.
  organization: Department of Electronics and Communication, University of Allahabad
ContentType Journal Article
Copyright The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025 Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.
DOI 10.1007/s42979-025-04201-z
Discipline Computer Science
EISSN 2661-8907
ISSN 2661-8907
2662-995X
IsPeerReviewed true
IsScholarly true
Issue 6
Keywords Min-max scaling
Naïve Bayes
Document classification
Machine learning
Tf-Idf
Artificial intelligence
Multi-layer perceptron
Parkinson’s disease
Language English
ORCID 0000-0002-7914-9574
PublicationCentury 2000
PublicationDate 2025-08-01
PublicationDateYYYYMMDD 2025-08-01
PublicationDate_xml – month: 08
  year: 2025
  text: 2025-08-01
  day: 01
PublicationDecade 2020
PublicationPlace Singapore
PublicationPlace_xml – name: Singapore
– name: Kolkata
PublicationTitle SN computer science
PublicationTitleAbbrev SN COMPUT. SCI
PublicationYear 2025
Publisher Springer Nature Singapore
Springer Nature B.V
Publisher_xml – name: Springer Nature Singapore
– name: Springer Nature B.V
StartPage 696
SubjectTerms Accuracy
Classification
Computer Imaging
Computer Science
Computer Systems Organization and Communication Networks
Data Structures and Information Theory
Datasets
Deep learning
Digitization
Documents
Feature selection
Home environment
Information Systems and Communication Service
Machine learning
Medical diagnosis
Medical research
Methods
Multilayer perceptrons
Multilayers
Neural networks
Optimization techniques
Original Research
Parkinson's disease
Pattern Recognition and Graphics
Performance evaluation
Research Advancements in Intelligent Computing
Social networks
Software Engineering/Programming and Operating Systems
Support vector machines
Text categorization
Unstructured data
Vision
Title Harnessing AI for Health and Knowledge: An Investigation into Machine and Deep Learning Models for Medical and Textual Data
URI https://link.springer.com/article/10.1007/s42979-025-04201-z
https://www.proquest.com/docview/3234116739
Volume 6