Ensemble-based Methods for Multi-label Classification on Biomedical Question-Answer Data

Bibliographic details
Published in: Journal of information systems engineering and business intelligence, Vol. 8, No. 1, pp. 42-50
Main authors: Abdillah, Abid Famasya; Putra, Cornelius Bagus Purnama; Apriantoni, Apriantoni; Juanita, Safitri; Purwitasari, Diana
Format: Journal Article
Language: English
Published: Universitas Airlangga, 26 April 2022
ISSN: 2598-6333, 2443-2555
Online access: Full text
Abstract
Background: Question answering (QA) is a popular way to seek health-related information and biomedical data. Such questions can refer to more than one medical entity (multi-label), so determining the correct tags is not easy. The question classification (QC) mechanism in a QA system can narrow down the answers we are seeking.
Objective: This study develops a multi-label classifier using a heterogeneous ensemble method to improve accuracy on biomedical data with long text dimensions.
Methods: We used an ensemble method that combines heterogeneous deep learning and machine learning models for multi-label classification of long texts. There are 15 single models, built from three deep learning algorithms (CNN, LSTM, and BERT) and four machine learning algorithms (SVM, kNN, Decision Tree, and Naïve Bayes) with various text representations (TF-IDF, Word2Vec, and FastText). We used the bagging approach with a hard voting mechanism for decision-making.
Results: Deep learning outperforms machine learning as a single-model method for multi-label biomedical data classification. Moreover, we found that three was the best number of base learners, with the ensemble built from the top three single models. Heterogeneous ensembles with three learners achieved an F1-score of 82.3%, which is better than the best single model, a CNN with an F1-score of 80%.
Conclusion: Multi-label classification of biomedical QA data with ensemble models is better than with single models. The results show that heterogeneous ensembles are more effective than homogeneous ensembles on biomedical QA data with long text dimensions.
Keywords: Biomedical Question Classification, Ensemble Method, Heterogeneous Ensembles, Multi-Label Classification, Question Answering
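The hard voting step named in the abstract can be illustrated with a minimal sketch. It assumes each base learner outputs a 0/1 indicator per tag and that a tag is kept when more than half of the learners predict it; the record does not give the authors' implementation, so the function names (hard_vote, micro_f1), the learner names, the toy predictions, and the choice of micro-averaged F1 are all assumptions made for illustration. Bagging and model training are not shown.

```python
from typing import List

# One row per question, one 0/1 column per tag (multi-label indicator format).
Labels = List[List[int]]


def hard_vote(predictions: List[Labels]) -> Labels:
    """Per-tag majority vote across base learners (hypothetical helper)."""
    n_learners = len(predictions)
    n_samples = len(predictions[0])
    n_labels = len(predictions[0][0])
    voted = []
    for i in range(n_samples):
        row = []
        for j in range(n_labels):
            votes = sum(p[i][j] for p in predictions)
            # Keep the tag only if a strict majority of learners predicted it.
            row.append(1 if 2 * votes > n_learners else 0)
        voted.append(row)
    return voted


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Micro-averaged F1 over all (question, tag) pairs (assumed averaging)."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data, invented for this sketch: 3 questions, 3 tags, 3 base learners.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
cnn_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]
lstm_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
svm_pred = [[1, 1, 1], [0, 0, 0], [1, 1, 0]]

ensemble_pred = hard_vote([cnn_pred, lstm_pred, svm_pred])

for name, pred in [("cnn", cnn_pred), ("lstm", lstm_pred),
                   ("svm", svm_pred), ("ensemble", ensemble_pred)]:
    print(f"{name}: micro-F1 = {micro_f1(y_true, pred):.3f}")
```

With an odd number of learners, such as the three reported in the abstract, a strict majority never ties, which is one practical reason to prefer an odd ensemble size under hard voting. The toy numbers only demonstrate the mechanics and say nothing about the published results.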
Author Abdillah, Abid Famasya
Juanita, Safitri
Purwitasari, Diana
Putra, Cornelius Bagus Purnama
Apriantoni, Apriantoni
Author_xml – sequence: 1
  givenname: Abid Famasya
  orcidid: 0000-0002-8373-2826
  surname: Abdillah
  fullname: Abdillah, Abid Famasya
– sequence: 2
  givenname: Cornelius Bagus Purnama
  orcidid: 0000-0003-2036-9449
  surname: Putra
  fullname: Putra, Cornelius Bagus Purnama
– sequence: 3
  givenname: Apriantoni
  orcidid: 0000-0001-6078-5784
  surname: Apriantoni
  fullname: Apriantoni, Apriantoni
– sequence: 4
  givenname: Safitri
  orcidid: 0000-0002-7787-7623
  surname: Juanita
  fullname: Juanita, Safitri
– sequence: 5
  givenname: Diana
  orcidid: 0000-0001-7000-7628
  surname: Purwitasari
  fullname: Purwitasari, Diana
ContentType Journal Article
DOI 10.20473/jisebi.8.1.42-50
Discipline Computer Science
EISSN 2443-2555
EndPage 50
ISSN 2598-6333
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
License http://creativecommons.org/licenses/by/4.0
ORCID 0000-0003-2036-9449
0000-0002-7787-7623
0000-0001-7000-7628
0000-0001-6078-5784
0000-0002-8373-2826
OpenAccessLink https://doaj.org/article/1a0ee558139b40ecb8079e8ed78a1391
PageCount 9
PublicationCentury 2000
PublicationDate 2022-04-26
PublicationDateYYYYMMDD 2022-04-26
PublicationDate_xml – month: 04
  year: 2022
  text: 2022-04-26
  day: 26
PublicationDecade 2020
PublicationTitle Journal of information systems engineering and business intelligence
PublicationYear 2022
Publisher Universitas Airlangga
Publisher_xml – name: Universitas Airlangga
StartPage 42
Title Ensemble-based Methods for Multi-label Classification on Biomedical Question-Answer Data
URI https://doaj.org/article/1a0ee558139b40ecb8079e8ed78a1391
Volume 8
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVAON
  databaseName: DOAJ Directory of Open Access Journals
  customDbUrl:
  eissn: 2443-2555
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0001922490
  issn: 2598-6333
  databaseCode: DOA
  dateStart: 20150101
  isFulltext: true
  titleUrlDefault: https://www.doaj.org/
  providerName: Directory of Open Access Journals
– providerCode: PRVHPJ
  databaseName: ROAD: Directory of Open Access Scholarly Resources
  customDbUrl:
  eissn: 2443-2555
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssib044744629
  issn: 2598-6333
  databaseCode: M~E
  dateStart: 20150101
  isFulltext: true
  titleUrlDefault: https://road.issn.org
  providerName: ISSN International Centre
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwrV1LS8QwEA4iHrz4Ft_k4EnImqZpkxx9rHjZxYPC3kKSTmFFq9hVb_52J2nV9eRFKD0MIZRvEub70swMIcelV1XISs8E8AIFSiWZCyYw5PZaQJZ5L11qNqHGYz2ZmJu5Vl_xTlhXHrgD7jRzHKAoNDIVLzkEr7kyoKFS2qEtCR-0zIkpXElSKpQ5P4G_4BjVyn7l3ne8BnVHPIBB-q9Zmed598tTcKny0_tpC3460INsIFGr8V9Ba662fwpCV2tkpWeP9Kz76nWyAM0GWf3qzED7jbpJJsOmhUf_ACxGqYqOUp_oliJDpSnllqHz4YGmlpjxslDyD8XnPKXjR8_RdBaKZnbWtO84-6WbuS1ydzW8vbhmfQ8FFjCWozI0dYkK2OTcSW2qwhncc5r7GolR4Rw4bgBtkdZ5DdJVtaycrg0YqINALrhNFpunBnYIjYiF4KVQWS2lr5DwclUr7oQqC8j0Ljn5Ask-d6UyLEqMhKjtELXaZlYKW_Bdch5h_B4Yq1wnA_re9r63f_l-7z8m2SfLIqY0cMlEeUAWZy-vcEiWwtts2r4cpWWF79HH8BPyDM33
linkProvider Directory of Open Access Journals
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Ensemble-based+Methods+for+Multi-label+Classification+on+Biomedical+Question-Answer+Data&rft.jtitle=Journal+of+information+systems+engineering+and+business+intelligence&rft.au=Abdillah%2C+Abid+Famasya&rft.au=Putra%2C+Cornelius+Bagus+Purnama&rft.au=Apriantoni%2C+Apriantoni&rft.au=Juanita%2C+Safitri&rft.date=2022-04-26&rft.issn=2598-6333&rft.eissn=2443-2555&rft.volume=8&rft.issue=1&rft.spage=42&rft.epage=50&rft_id=info:doi/10.20473%2Fjisebi.8.1.42-50&rft.externalDBID=n%2Fa&rft.externalDocID=10_20473_jisebi_8_1_42_50
thumbnail_l http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=2598-6333&client=summon
thumbnail_m http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=2598-6333&client=summon
thumbnail_s http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=2598-6333&client=summon