Ensemble-based Methods for Multi-label Classification on Biomedical Question-Answer Data
| Published in: | Journal of Information Systems Engineering and Business Intelligence, Vol. 8, No. 1, pp. 42-50 |
|---|---|
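| Main authors: | Abdillah, Abid Famasya; Putra, Cornelius Bagus Purnama; Apriantoni, Apriantoni; Juanita, Safitri; Purwitasari, Diana |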
| Format: | Journal Article |
| Language: | English |
| Published: | Universitas Airlangga, 26 April 2022 |
| ISSN: | 2598-6333, 2443-2555 |
| Online access: | Full text |
| Abstract | Background: Question answering (QA) is a popular way to seek health-related information and biomedical data. A single question can refer to more than one medical entity (multi-label), so determining the correct tags is not easy. The question classification (QC) mechanism in a QA system can narrow down the answers being sought.
Objective: This study develops a multi-label classifier using a heterogeneous ensemble method to improve accuracy on biomedical data with long texts.
Methods: We used an ensemble of heterogeneous deep learning and machine learning models for multi-label classification of long texts. We evaluated 15 single models built from three deep learning algorithms (CNN, LSTM, and BERT) and four machine learning algorithms (SVM, kNN, Decision Tree, and Naïve Bayes), combined with various text representations (TF-IDF, Word2Vec, and FastText). We used the bagging approach with a hard voting mechanism for decision-making.
Results: As single models, deep learning classifiers outperformed machine learning classifiers on multi-label biomedical data. Moreover, combining the top three base learners gave the best ensemble. The heterogeneous ensemble with three learners reached an F1-score of 82.3%, which is better than the best single model, a CNN with an F1-score of 80%.
Conclusion: For multi-label classification of biomedical QA data, ensemble models perform better than single models. The results also show that heterogeneous ensembles perform better than homogeneous ensembles on biomedical QA data with long texts.
Keywords: Biomedical Question Classification, Ensemble Method, Heterogeneous Ensembles, Multi-Label Classification, Question Answering |
|---|---|
| Author | Abdillah, Abid Famasya; Juanita, Safitri; Purwitasari, Diana; Putra, Cornelius Bagus Purnama; Apriantoni, Apriantoni |
| Author_xml | 1. Abdillah, Abid Famasya (ORCID 0000-0002-8373-2826); 2. Putra, Cornelius Bagus Purnama (ORCID 0000-0003-2036-9449); 3. Apriantoni, Apriantoni (ORCID 0000-0001-6078-5784); 4. Juanita, Safitri (ORCID 0000-0002-7787-7623); 5. Purwitasari, Diana (ORCID 0000-0001-7000-7628) |
| ContentType | Journal Article |
| DOI | 10.20473/jisebi.8.1.42-50 |
| Discipline | Computer Science |
| EISSN | 2443-2555 |
| EndPage | 50 |
| ISSN | 2598-6333 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Language | English |
| License | http://creativecommons.org/licenses/by/4.0 |
| ORCID | 0000-0003-2036-9449 0000-0002-7787-7623 0000-0001-7000-7628 0000-0001-6078-5784 0000-0002-8373-2826 |
| OpenAccessLink | https://doaj.org/article/1a0ee558139b40ecb8079e8ed78a1391 |
| PageCount | 9 |
| PublicationDate | 2022-04-26 |
| PublicationTitle | Journal of information systems engineering and business intelligence |
| PublicationYear | 2022 |
| Publisher | Universitas Airlangga |
| StartPage | 42 |
| Title | Ensemble-based Methods for Multi-label Classification on Biomedical Question-Answer Data |
| URI | https://doaj.org/article/1a0ee558139b40ecb8079e8ed78a1391 |
| Volume | 8 |
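
The hard voting mechanism mentioned in the abstract's Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hard_vote`, the binary indicator format, the tie rule, and the sample predictions are all assumptions made for illustration, and the bagging (resampling) step is not shown. Each base learner outputs a 0/1 vector per question, one entry per label, and the ensemble keeps a label when more than half of the learners assign it.

```python
from typing import Sequence


def hard_vote(predictions: Sequence[Sequence[Sequence[int]]]) -> list[list[int]]:
    """Majority vote per sample and per label.

    predictions[m][i][j] is 1 if base learner m assigns label j to sample i.
    A label is kept only on a strict majority; ties resolve to 0
    (an assumption, the abstract does not state a tie rule).
    """
    n_models = len(predictions)
    n_samples = len(predictions[0])
    n_labels = len(predictions[0][0])
    voted = []
    for i in range(n_samples):
        row = []
        for j in range(n_labels):
            votes = sum(predictions[m][i][j] for m in range(n_models))
            row.append(1 if 2 * votes > n_models else 0)
        voted.append(row)
    return voted


# Hypothetical outputs of three base learners for two questions and three labels.
cnn = [[1, 0, 1], [0, 1, 0]]
lstm = [[1, 1, 0], [0, 1, 0]]
bert = [[0, 0, 1], [1, 1, 0]]

ensemble = hard_vote([cnn, lstm, bert])
print(ensemble)  # [[1, 0, 1], [0, 1, 0]]
```

With three learners, a label survives when at least two of them agree, so an error made by a single learner is outvoted. That is one plausible reading of why a small heterogeneous ensemble could edge out the best single model.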