Harnessing AI for Health and Knowledge: An Investigation into Machine and Deep Learning Models for Medical and Textual Data
| Published in: | SN Computer Science, Volume 6; Issue 6; p. 696 |
|---|---|
| Main authors: | Abbas, Ali; Agarwal, Shreya; Jaiswal, Manish; Jha, Prajna; Siddiqui, Tanveer J. |
| Format: | Journal Article |
| Language: | English |
| Published: | Singapore: Springer Nature Singapore; Springer Nature B.V, 1 August 2025 |
| Subjects: | |
| ISSN: | 2661-8907, 2662-995X |
| Online access: | Get full text |
| Abstract | The digitalization of medical information has greatly enhanced medical research by converting clinical observations and patient data into structured and unstructured textual formats, respectively. Despite the progress, there’s a notable scarcity of large-scale textual clinical data. This research delves into utilizing machine learning (ML) and deep learning (DL) techniques for classifying both structured medical and unstructured textual data. Specifically, it focuses on the Parkinson’s Disease dataset for structured medical data and the 20 Newsgroup dataset for unstructured textual information. The study involves experimenting with four distinct feature vectors for textual data and employing recursive feature elimination with cross-validation on structured medical data to remove superfluous features. The classifiers chosen for this investigation are Naïve Bayes (NB) for ML and Multi-Layer Perceptron (MLP) for DL. To address the independence assumption in NB, term weighting strategies were applied, leading to the exploration of five variants of the weighted NB model. However, the sparseness of the 20 Newsgroup dataset prevented the training of Categorical and Gaussian NB models. The study examined forty-nine different MLP models to identify an optimal light DL model suitable for both datasets. Performance evaluation, based on accuracy and F1-measure, revealed that the best-performing NB model was the Multinomial NB, achieving accuracies of 0.80 and 0.81 for the medical and textual datasets, respectively. Meanwhile, the most effective MLP model attained accuracies of 0.77 and 0.92. These findings, benchmarked against existing literature, suggest the feasibility of applying both ML and light DL approaches for concurrent classification of structured medical and unstructured textual data. |
|---|---|
| ArticleNumber | 696 |
| Author | Siddiqui, Tanveer J.; Jaiswal, Manish; Agarwal, Shreya; Abbas, Ali; Jha, Prajna |
| Author_xml | 1. Abbas, Ali (ORCID: 0000-0002-7914-9574; email: aliabbas367@gmail.com), Department of Electronics and Communication, University of Allahabad; 2. Agarwal, Shreya, Department of Electronics and Communication, University of Allahabad; 3. Jaiswal, Manish, Department of Computer Science, Prof. Rajendra Singh (Rajju Bhaiya) University; 4. Jha, Prajna, Department of Electronics and Communication, University of Allahabad; 5. Siddiqui, Tanveer J., Department of Electronics and Communication, University of Allahabad |
| ContentType | Journal Article |
| Copyright | The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. |
| DBID | AAYXX CITATION JQ2 |
| DOI | 10.1007/s42979-025-04201-z |
| DatabaseName | CrossRef; ProQuest Computer Science Collection |
| DatabaseTitle | CrossRef; ProQuest Computer Science Collection |
| DatabaseTitleList | ProQuest Computer Science Collection |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 2661-8907 |
| ExternalDocumentID | 10_1007_s42979_025_04201_z |
| ISSN | 2661-8907, 2662-995X |
| IngestDate | Wed Nov 05 14:53:38 EST 2025; Sat Nov 29 07:36:40 EST 2025; Tue Jul 29 01:10:28 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 6 |
| Keywords | Min-max scaling; Naïve Bayes; Document classification; Machine learning; Tf-Idf; Artificial intelligence; Multi-layer perceptron; Parkinson’s disease |
| Language | English |
| ORCID | 0000-0002-7914-9574 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-08-01 |
| PublicationDateYYYYMMDD | 2025-08-01 |
| PublicationDate_xml | – month: 08 year: 2025 text: 2025-08-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | Singapore |
| PublicationPlace_xml | – name: Singapore – name: Kolkata |
| PublicationTitle | SN computer science |
| PublicationTitleAbbrev | SN COMPUT. SCI |
| PublicationYear | 2025 |
| Publisher | Springer Nature Singapore; Springer Nature B.V |
| Publisher_xml | – name: Springer Nature Singapore – name: Springer Nature B.V |
| SourceID | proquest; crossref; springer |
| SourceType | Aggregation Database; Index Database; Publisher |
| StartPage | 696 |
| SubjectTerms | Accuracy; Classification; Computer Imaging; Computer Science; Computer Systems Organization and Communication Networks; Data Structures and Information Theory; Datasets; Deep learning; Digitization; Documents; Feature selection; Home environment; Information Systems and Communication Service; Machine learning; Medical diagnosis; Medical research; Methods; Multilayer perceptrons; Multilayers; Neural networks; Optimization techniques; Original Research; Parkinson's disease; Pattern Recognition and Graphics; Performance evaluation; Research Advancements in Intelligent Computing; Social networks; Software Engineering/Programming and Operating Systems; Support vector machines; Text categorization; Unstructured data; Vision |
| Title | Harnessing AI for Health and Knowledge: An Investigation into Machine and Deep Learning Models for Medical and Textual Data |
| URI | https://link.springer.com/article/10.1007/s42979-025-04201-z https://www.proquest.com/docview/3234116739 |
| Volume | 6 |
| hasFullText | 1 |
| inHoldings | 1 |
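
The abstract describes a Multinomial Naïve Bayes classifier with term weighting, evaluated by accuracy and F1-measure. The record does not give the exact weighting schemes or feature vectors, so the following is only a minimal sketch of one plausible reading: raw term counts are replaced by TF-IDF weights before the usual Laplace-smoothed multinomial estimate. The toy corpus, the labels `comp` and `sport`, the smoothed IDF formula, and every function and class name are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Smoothed inverse document frequency per term (one common variant, assumed here)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def to_vector(doc, idf):
    """Sparse {term: tf * idf} weights; terms unseen during fitting are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


class WeightedMultinomialNB:
    """Multinomial NB in which TF-IDF weights stand in for raw term counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, vectors, labels):
        self.classes_ = sorted(set(labels))
        vocab = sorted({t for v in vectors for t in v})
        counts = Counter(labels)
        self.log_prior_ = {c: math.log(counts[c] / len(labels)) for c in self.classes_}
        # Accumulate the total weight of each term within each class.
        weight_sum = {c: defaultdict(float) for c in self.classes_}
        for vec, y in zip(vectors, labels):
            for t, w in vec.items():
                weight_sum[y][t] += w
        self.log_lik_ = {}
        for c in self.classes_:
            denom = sum(weight_sum[c].values()) + self.alpha * len(vocab)
            self.log_lik_[c] = {
                t: math.log((weight_sum[c][t] + self.alpha) / denom) for t in vocab
            }
        return self

    def predict(self, vectors):
        preds = []
        for vec in vectors:
            scores = {
                c: self.log_prior_[c]
                + sum(w * self.log_lik_[c][t] for t, w in vec.items())
                for c in self.classes_
            }
            preds.append(max(scores, key=scores.get))
        return preds


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical toy corpus standing in for two newsgroup-style categories.
train_docs = [
    "the gpu renders the image fast".split(),
    "graphics card driver image".split(),
    "the team won the hockey game".split(),
    "hockey players score in the game".split(),
]
train_labels = ["comp", "comp", "sport", "sport"]
test_docs = [
    "new graphics driver for the gpu".split(),
    "the hockey team played a game".split(),
]
test_labels = ["comp", "sport"]

idf = fit_idf(train_docs)
model = WeightedMultinomialNB().fit([to_vector(d, idf) for d in train_docs], train_labels)
predictions = model.predict([to_vector(d, idf) for d in test_docs])
print(predictions, accuracy(test_labels, predictions), macro_f1(test_labels, predictions))
```

The IDF table is fitted on the training documents only and reused for the test documents, so no information from the test split leaks into the weights. On a real corpus such as 20 Newsgroups, a library implementation with sparse matrices would be the practical choice; this pure-Python version is meant only to make the estimation steps visible.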