The effect of clustering algorithms on question answering


Bibliographic details
Published in: Expert Systems with Applications, Volume 243, p. 122959
Main authors: AlMahmoud, Rana Husni; Alian, Marwah
Format: Journal Article
Language: English
Publication details: Elsevier Ltd, 01.06.2024
Subjects:
ISSN: 0957-4174, 1873-6793
Abstract Question answering (QA) is one of the essential fields in information retrieval, where specific answers are provided instead of large documents. The relations among questions and answers are determined using natural language processing techniques, while clustering algorithms can improve the effectiveness of result retrieval by reducing the number of comparisons required for a specific question or answer. In this work, we introduce a clustering-based approach for a QA system. This approach groups related questions into clusters using different clustering algorithms, selects the appropriate cluster for each answer using similarity methods between the answers and the generated clusters, and then assigns answers to their most related questions. Different clustering algorithms, such as k-means, spherical k-means, single-linkage hierarchical clustering (SLHA), unweighted pair group method with arithmetic mean (UPGMA), expectation–maximization (EM), and clustering Arabic documents based on bond energy (CADBE), are tested. The effectiveness of a clustering algorithm is investigated with respect to certain factors, including the number of clusters, the text representation, the similarity measure between answers and clusters, and the similarity measure between answers and questions in a selected cluster. In addition, a comprehensive ranking system is introduced to evaluate the performance of clustering algorithms. Evaluation is performed using the Dataset of Arabic Why Question Answering System (DAWQAS) and the Multilingual Question Answering (MLQA) dataset. Results show that CADBE achieves the highest accuracy and the first rank, followed by SLHA and UPGMA, while spherical k-means has the lowest rank. The performance of clustering algorithms on the MLQA dataset is affected by its characteristics, such as short questions, long and varied answers, and diverse subject domains. Unigram and bigram intersection measures perform well in most cases. Term frequency-inverse document frequency (TF-IDF) representation outperforms word embedding on DAWQAS. Overall, the experiments provide insights into the performance of clustering algorithms in QA systems.
Highlights
• A clustering-based QA system groups related questions and selects answers via similarity.
• Answers are assigned to related questions using various similarity methods.
• Certain factors are explored to investigate the effectiveness of a clustering algorithm.
• A comprehensive ranking system evaluates the performance of clustering algorithms.
• CADBE achieves the highest accuracy, followed by SLHA and UPGMA; spherical k-means ranks lowest.
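The pipeline described in the abstract (cluster the questions, match each answer to a cluster, then match it to a question inside that cluster) can be illustrated with a small sketch. This is not the authors' implementation: the function names are hypothetical, the n-gram intersection score is normalised Jaccard-style as an assumption, a cluster is represented by the concatenated text of its questions as a further assumption, and plain single-linkage merging stands in for the several algorithms the record names.

```python
from itertools import combinations


def ngrams(text, n):
    """Set of word n-grams from a lower-cased, whitespace-tokenised text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(a, b, n=1):
    """N-gram intersection score in [0, 1]; Jaccard normalisation is assumed."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def single_linkage(questions, k, sim=ngram_overlap):
    """Merge the two most similar clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(questions))]
    while len(clusters) > k:
        def link(pair):
            # Single linkage: similarity of the closest pair of members.
            a, b = pair
            return max(sim(questions[i], questions[j])
                       for i in clusters[a] for j in clusters[b])
        a, b = max(combinations(range(len(clusters)), 2), key=link)
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe because combinations() guarantees a < b
    return clusters


def assign_answer(answer, questions, clusters, sim=ngram_overlap):
    """Choose the best cluster for an answer, then the best question in it."""
    def cluster_score(cluster):
        # Assumed cluster representation: concatenated question text.
        return sim(answer, " ".join(questions[i] for i in cluster))
    best_cluster = max(clusters, key=cluster_score)
    return max(best_cluster, key=lambda i: sim(answer, questions[i]))


questions = [
    "why is the sky blue",
    "why does the sky look blue at noon",
    "why do cats purr",
    "why do cats purr when they are happy",
]
clusters = single_linkage(questions, k=2)
best_question = assign_answer(
    "cats purr when they are happy or relaxed", questions, clusters
)
print(clusters, questions[best_question])
```

Restricting the answer-to-question comparison to one cluster is what reduces the number of comparisons: the answer is scored against k cluster representations plus the members of a single cluster, rather than against every question.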
ArticleNumber 122959
Author AlMahmoud, Rana Husni
Alian, Marwah
Author_xml – sequence: 1
  givenname: Rana Husni
  orcidid: 0000-0003-4240-9392
  surname: AlMahmoud
  fullname: AlMahmoud, Rana Husni
  email: Rana.Almahmoud@gju.edu.jo
  organization: School of Electrical Engineering and Information Technology, German Jordanian University, Amman, Jordan
– sequence: 2
  givenname: Marwah
  orcidid: 0000-0001-6358-059X
  surname: Alian
  fullname: Alian, Marwah
  email: marwah2001@yahoo.com
  organization: Basic Sciences Department, Faculty of Science, The Hashemite University, Zarqa, Jordan
ContentType Journal Article
Copyright 2023 Elsevier Ltd
Copyright_xml – notice: 2023 Elsevier Ltd
DOI 10.1016/j.eswa.2023.122959
Discipline Computer Science
EISSN 1873-6793
ExternalDocumentID 10_1016_j_eswa_2023_122959
S0957417423034619
ISICitedReferencesCount 8
ISSN 0957-4174
IsPeerReviewed true
IsScholarly true
Keywords Arabic language
Question answering
Clustering
Similarity measures
Language English
ORCID 0000-0003-4240-9392
0000-0001-6358-059X
PublicationCentury 2000
PublicationDate 2024-06-01
2024-06-00
PublicationDateYYYYMMDD 2024-06-01
PublicationDate_xml – month: 06
  year: 2024
  text: 2024-06-01
  day: 01
PublicationDecade 2020
PublicationTitle Expert systems with applications
PublicationYear 2024
Publisher Elsevier Ltd
Publisher_xml – name: Elsevier Ltd
Tan (10.1016/j.eswa.2023.122959_b50) 2016
Reddy (10.1016/j.eswa.2023.122959_b45) 2017; 19
Zhong (10.1016/j.eswa.2023.122959_b57) 2005
Schubotz (10.1016/j.eswa.2023.122959_b47) 2018
Al Mahmoud (10.1016/j.eswa.2023.122959_b5) 2023
10.1016/j.eswa.2023.122959_b14
Karpagam (10.1016/j.eswa.2023.122959_b33) 2019; 44
Ullmann (10.1016/j.eswa.2023.122959_b51) 1977; 20
Zelnik-Manor (10.1016/j.eswa.2023.122959_b55) 2004; 17
Alian (10.1016/j.eswa.2023.122959_b7) 2022; 14
10.1016/j.eswa.2023.122959_b15
Gupta (10.1016/j.eswa.2023.122959_b23) 2019
Yang (10.1016/j.eswa.2023.122959_b53) 2012; 45
Mozannar (10.1016/j.eswa.2023.122959_b39) 2019
Lewis (10.1016/j.eswa.2023.122959_b36) 2020
Wang (10.1016/j.eswa.2023.122959_b52) 2021
Perera (10.1016/j.eswa.2023.122959_b42) 2012
Allahyari (10.1016/j.eswa.2023.122959_b12) 2017
Kamal (10.1016/j.eswa.2023.122959_b32) 2014
Karypis (10.1016/j.eswa.2023.122959_b34) 1999; 32
Dikshit (10.1016/j.eswa.2023.122959_b21) 2021
Ismail (10.1016/j.eswa.2023.122959_b26) 2018; 142
Jing (10.1016/j.eswa.2023.122959_b30) 2002
San (10.1016/j.eswa.2023.122959_b46) 2004; 14
Everitt (10.1016/j.eswa.2023.122959_b22) 2011
Mohammad (10.1016/j.eswa.2023.122959_b38) 2017
Paranjpe (10.1016/j.eswa.2023.122959_b41) 2007
Alian (10.1016/j.eswa.2023.122959_b9) 2021; 25
10.1016/j.eswa.2023.122959_b24
Zhang (10.1016/j.eswa.2023.122959_b56) 2010
Aljalbout (10.1016/j.eswa.2023.122959_b11) 2018
Zhu (10.1016/j.eswa.2023.122959_b58) 2016
Albarghothi (10.1016/j.eswa.2023.122959_b6) 2017; 117
Jin (10.1016/j.eswa.2023.122959_b29) 2019; 7
Alian (10.1016/j.eswa.2023.122959_b10) 2023
Dempster (10.1016/j.eswa.2023.122959_b19) 1977; 39
Sokal (10.1016/j.eswa.2023.122959_b48) 1958; 38
Banerjee (10.1016/j.eswa.2023.122959_b16) 2005; 6
Yoon (10.1016/j.eswa.2023.122959_b54) 2017
Jain (10.1016/j.eswa.2023.122959_b27) 2018
Borriss (10.1016/j.eswa.2023.122959_b18) 2011
AlMahmoud (10.1016/j.eswa.2023.122959_b13) 2020; 159
Othman (10.1016/j.eswa.2023.122959_b40) 2019; 159
Legendre (10.1016/j.eswa.2023.122959_b35) 1998
Ahmed (10.1016/j.eswa.2023.122959_b3) 2017; 6
Abdi (10.1016/j.eswa.2023.122959_b1) 2020; 60
Alian (10.1016/j.eswa.2023.122959_b8) 2020; 23
Ratna (10.1016/j.eswa.2023.122959_b44) 2019
Dhillon (10.1016/j.eswa.2023.122959_b20) 2001
Jin (10.1016/j.eswa.2023.122959_b28) 2010
Ahmed (10.1016/j.eswa.2023.122959_b2) 2016; 12
Hornik (10.1016/j.eswa.2023.122959_b25) 2012; 50
Rahim (10.1016/j.eswa.2023.122959_b43) 2021
Al-Khawaldeh (10.1016/j.eswa.2023.122959_b4) 2015; 5
Mikolov (10.1016/j.eswa.2023.122959_b37) 2013
Jovanovska (10.1016/j.eswa.2023.122959_b31) 2015
Sun (10.1016/j.eswa.2023.122959_b49) 2015; 12
Biltawi (10.1016/j.eswa.2023.122959_b17) 2021; 9
References_xml – start-page: 2692
  year: 2010
  end-page: 2696
  ident: b56
  article-title: A Chinese question-answering system with question classification and answer clustering
  publication-title: 2010 seventh international conference on fuzzy systems and knowledge discovery, Vol. 6
– start-page: 1
  year: 2023
  end-page: 39
  ident: b5
  article-title: Cluster-based ensemble learning model for improving sentiment classification of arabic documents
  publication-title: Natural Language Engineering
– start-page: 1415
  year: 2016
  end-page: 1420
  ident: b58
  article-title: A study of damp-heat syndrome classification using word2vec and TF-IDF
  publication-title: 2016 IEEE international conference on bioinformatics and biomedicine (BIBM)
– volume: 6
  year: 2005
  ident: b16
  article-title: Clustering on the unit hypersphere using von mises-Fisher distributions
  publication-title: Journal of Machine Learning Research
– year: 2011
  ident: b22
  article-title: Cluster analysis: Wiley series in probability and statistics
– year: 2017
  ident: b38
  article-title: Word affect intensities
– start-page: 641
  year: 2014
  end-page: 645
  ident: b32
  article-title: Enhancing arabic question answering system
  publication-title: 2014 international conference on computational intelligence and communication networks
– volume: 38
  start-page: 1409
  year: 1958
  end-page: 1438
  ident: b48
  article-title: A statistical method for evaluating systematic relationships
  publication-title: The University of Kansas Science Bulletin
– volume: 142
  start-page: 123
  year: 2018
  end-page: 131
  ident: b26
  article-title: Dawqas: A dataset for arabic why question answering system
  publication-title: Procedia Computer Science
– year: 1998
  ident: b35
  article-title: Numerical ecology
– volume: 7
  start-page: 75235
  year: 2019
  end-page: 75246
  ident: b29
  article-title: ComQA: Question answering over knowledge base via semantic matching
  publication-title: IEEE Access
– start-page: 1
  year: 2019
  end-page: 5
  ident: b44
  article-title: K-means clustering for answer categorization on latent semantic analysis automatic Japanese short essay grading system
  publication-title: 2019 16th international conference on quality in research (QIR): international symposium on electrical and computer engineering
– volume: 32
  start-page: 68
  year: 1999
  end-page: 75
  ident: b34
  article-title: Chameleon: Hierarchical clustering using dynamic modeling
  publication-title: Computer
– start-page: 3958
  year: 2021
  end-page: 3968
  ident: b52
  article-title: Cluster-former: Clustering-based sparse transformer for question answering
  publication-title: Findings of the association for computational linguistics: ACL-IJCNLP 2021
– year: 2007
  ident: b41
  article-title: Clustering semantically similar and related questions
– reference: (pp. 1027–1035).
– year: 2016
  ident: b50
  article-title: Introduction to data mining
– start-page: 245
  year: 2012
  end-page: 246
  ident: b42
  article-title: Ipedagogy: Question answering system based on web information clustering
  publication-title: 2012 IEEE fourth international conference on technology for education
– start-page: 357
  year: 2001
  end-page: 381
  ident: b20
  article-title: Efficient clustering of very large document collections
  publication-title: Data mining for scientific and engineering applications
– volume: 12
  start-page: 18
  year: 2016
  end-page: 22
  ident: b2
  article-title: Answer extraction for how and why questions in question answering systems
  publication-title: International Journal of Computational Engineering Research (IJCER)
– volume: 23
  start-page: 851
  year: 2020
  end-page: 859
  ident: b8
  article-title: Factors affecting sentence similarity and paraphrasing identification
  publication-title: International Journal of Speech Technology
– reference: (pp. 69–76).
– year: 2019
  ident: b39
  article-title: Neural arabic question answering
– volume: 9
  start-page: 63876
  year: 2021
  end-page: 63904
  ident: b17
  article-title: Arabic question answering systems: Gap analysis
  publication-title: IEEE Access
– year: 2018
  ident: b47
  article-title: Introducing mathqa: a math-aware question answering system
  publication-title: Information Discovery and Delivery
– year: 2013
  ident: b37
  article-title: Efficient estimation of word representations in vector space
– volume: 159
  year: 2020
  ident: b13
  article-title: A modified bond energy algorithm with fuzzy merging and its application to Arabic text document clustering
  publication-title: Expert Systems with Applications
– year: 2019
  ident: b23
  article-title: Amazonqa: a review-based question answering task
– reference: (pp. 600–607).
– start-page: 1209
  year: 2018
  end-page: 1213
  ident: b27
  article-title: Clustering of text streams via facility location and spherical K-means
  publication-title: 2018 second international conference on electronics, communication and aerospace technology (ICECA)
– start-page: 7315
  year: 2020
  end-page: 7330
  ident: b36
  article-title: MLQA: Evaluating cross-lingual extractive question answering
  publication-title: Proceedings of the 58th annual meeting of the association for computational linguistics
– reference: Hamerly, G., & Elkan, C. (2002). Alternatives to the k-means algorithm that find better clusterings. In
– year: 2021
  ident: b43
  article-title: Measuring semantic similarity for arabic sentences using machine learning
– start-page: 3180
  year: 2005
  end-page: 3185
  ident: b57
  article-title: Efficient online spherical k-means clustering
  publication-title: Proceedings. 2005 IEEE international joint conference on neural networks, 2005, Vol. 5
– start-page: 205
  year: 2015
  end-page: 214
  ident: b31
  article-title: Using NLP methods to improve the effectiveness of a Macedonian question answering system
  publication-title: International conference on ICT innovations
– volume: 6
  start-page: 142
  year: 2017
  end-page: 144
  ident: b3
  article-title: Question answering system based on neural networks
  publication-title: International Journal of Engineering Research
– reference: Ashok, A., Natarajan, G., Elmasri, R., & Smith-Stvan, L. (2020). SimsterQ: A Similarity based Clustering Approach to Opinion Question Answering. In
– volume: 117
  start-page: 183
  year: 2017
  end-page: 191
  ident: b6
  article-title: Arabic question answering using ontology
  publication-title: Procedia Computer Science
– year: 2018
  ident: b11
  article-title: Clustering with deep learning: Taxonomy and new methods
– volume: 17
  year: 2004
  ident: b55
  article-title: Self-tuning spectral clustering
  publication-title: Advances in Neural Information Processing Systems
– year: 2017
  ident: b54
  article-title: Learning to rank question-answer pairs using hierarchical recurrent encoder with latent topic clustering
– volume: 14
  start-page: 3793
  year: 2022
  end-page: 3802
  ident: b7
  article-title: Questions clustering using canopy-k-means and hierarchical-k-means clustering
  publication-title: International Journal of Information Technology
– reference: Arthur, D., & Vassilvitskii, S. (2007). K-means++ the advantages of careful seeding. In
– volume: 19
  start-page: 19
  year: 2017
  end-page: 23
  ident: b45
  article-title: A survey on types of question answering system
  publication-title: IOSR Journal of Computer Engineering (IOSR-JCE)
– volume: 50
  start-page: 1
  year: 2012
  end-page: 22
  ident: b25
  article-title: Spherical k-means clustering
  publication-title: Journal of Statistical Software
– year: 2010
  ident: b28
  article-title: Expectation maximization clustering
– start-page: 1630
  year: 2021
  end-page: 1633
  ident: b21
  article-title: Automating questions and answers of good and services tax system using clustering and embeddings of queries
  publication-title: 2021 20th IEEE international conference on machine learning and applications (ICMLA)
– volume: 159
  start-page: 485
  year: 2019
  end-page: 494
  ident: b40
  article-title: Enhancing question retrieval in community question answering using word embeddings
  publication-title: Procedia Computer Science
– volume: 39
  start-page: 1
  year: 1977
  end-page: 22
  ident: b19
  article-title: Maximum likelihood from incomplete data via the EM algorithm
  publication-title: Journal of the Royal Statistical Society. Series B. Statistical Methodology
– start-page: 944
  year: 2002
  end-page: 946
  ident: b30
  article-title: Improved feature selection approach TFIDF in text mining
  publication-title: Machine learning and cybernetics, 2002. Proceedings. 2002 international conference on, Vol. 2
– volume: 60
  year: 2020
  ident: b1
  article-title: A question answering system in hadith using linguistic knowledge
  publication-title: Computer Speech and Language
– volume: 20
  start-page: 141
  year: 1977
  end-page: 147
  ident: b51
  article-title: A binary n-gram technique for automatic correction of substitution, deletion, insertion and reversal errors in words
  publication-title: The Computer Journal
– volume: 12
  start-page: 957
  year: 2015
  end-page: 964
  ident: b49
  article-title: A comparative evaluation of string similarity metrics for ontology alignment
  publication-title: Journal of Information & Computational Science
– volume: 14
  start-page: 241
  year: 2004
  end-page: 247
  ident: b46
  article-title: An alternative extension of the k-means algorithm for clustering categorical data
  publication-title: International Journal of Applied Mathematics and Computer Science
– year: 2017
  ident: b12
  article-title: A brief survey of text mining: Classification, clustering and extraction techniques
– start-page: 1
  year: 2023
  end-page: 12
  ident: b10
  article-title: Syntactic-semantic similarity based on dependency tree kernel
  publication-title: Arabian Journal for Science and Engineering
– start-page: 409
  year: 2011
  end-page: 436
  ident: b18
  article-title: Whole genome sequence comparisons in taxonomy
  publication-title: Methods in microbiology, Vol. 38
– volume: 5
  start-page: 82
  year: 2015
  end-page: 86
  ident: b4
  article-title: Answer extraction for why arabic questions answering systems: EWAQ
  publication-title: World of Computer Science and Information Technology Journal (WCSIT)
– volume: 25
  start-page: 10089
  year: 2021
  end-page: 10101
  ident: b9
  article-title: Arabic sentence similarity based on similarity features and machine learning
  publication-title: Soft Computing
– volume: 44
  start-page: 1
  year: 2019
  end-page: 10
  ident: b33
  article-title: A framework for intelligent question answering system using semantic context-specific document clustering and wordnet
  publication-title: Sādhanā
– volume: 45
  start-page: 3950
  year: 2012
  end-page: 3961
  ident: b53
  article-title: A robust EM clustering algorithm for Gaussian mixture models
  publication-title: Pattern Recognition
– volume: 12
  start-page: 18
  issue: 6
  year: 2016
  ident: 10.1016/j.eswa.2023.122959_b2
  article-title: Answer extraction for how and why questions in question answering systems
  publication-title: International Journal of Computational Engineering Research (IJCER)
– volume: 39
  start-page: 1
  issue: 1
  year: 1977
  ident: 10.1016/j.eswa.2023.122959_b19
  article-title: Maximum likelihood from incomplete data via the EM algorithm
  publication-title: Journal of the Royal Statistical Society. Series B. Statistical Methodology
  doi: 10.1111/j.2517-6161.1977.tb01600.x
– volume: 7
  start-page: 75235
  year: 2019
  ident: 10.1016/j.eswa.2023.122959_b29
  article-title: ComQA: Question answering over knowledge base via semantic matching
  publication-title: IEEE Access
  doi: 10.1109/ACCESS.2019.2918675
– year: 1998
  ident: 10.1016/j.eswa.2023.122959_b35
– volume: 32
  start-page: 68
  issue: 8
  year: 1999
  ident: 10.1016/j.eswa.2023.122959_b34
  article-title: Chameleon: Hierarchical clustering using dynamic modeling
  publication-title: Computer
  doi: 10.1109/2.781637
– start-page: 1
  year: 2023
  ident: 10.1016/j.eswa.2023.122959_b5
  article-title: Cluster-based ensemble learning model for improving sentiment classification of arabic documents
  publication-title: Natural Language Engineering
– start-page: 944
  year: 2002
  ident: 10.1016/j.eswa.2023.122959_b30
  article-title: Improved feature selection approach TFIDF in text mining
– start-page: 205
  year: 2015
  ident: 10.1016/j.eswa.2023.122959_b31
  article-title: Using NLP methods to improve the effectiveness of a Macedonian question answering system
– ident: 10.1016/j.eswa.2023.122959_b14
– year: 2013
  ident: 10.1016/j.eswa.2023.122959_b37
– volume: 20
  start-page: 141
  issue: 2
  year: 1977
  ident: 10.1016/j.eswa.2023.122959_b51
  article-title: A binary n-gram technique for automatic correction of substitution, deletion, insertion and reversal errors in words
  publication-title: The Computer Journal
  doi: 10.1093/comjnl/20.2.141
– start-page: 1
  year: 2019
  ident: 10.1016/j.eswa.2023.122959_b44
  article-title: K-means clustering for answer categorization on latent semantic analysis automatic Japanese short essay grading system
– volume: 45
  start-page: 3950
  issue: 11
  year: 2012
  ident: 10.1016/j.eswa.2023.122959_b53
  article-title: A robust EM clustering algorithm for Gaussian mixture models
  publication-title: Pattern Recognition
  doi: 10.1016/j.patcog.2012.04.031
– volume: 50
  start-page: 1
  year: 2012
  ident: 10.1016/j.eswa.2023.122959_b25
  article-title: Spherical k-means clustering
  publication-title: Journal of Statistical Software
  doi: 10.18637/jss.v050.i10
– volume: 12
  start-page: 957
  issue: 3
  year: 2015
  ident: 10.1016/j.eswa.2023.122959_b49
  article-title: A comparative evaluation of string similarity metrics for ontology alignment
  publication-title: Journal of Information & Computational Science
  doi: 10.12733/jics20105420
– volume: 38
  start-page: 1409
  year: 1958
  ident: 10.1016/j.eswa.2023.122959_b48
  article-title: A statistical method for evaluating systematic relationships
  publication-title: The University of Kansas Science Bulletin
– volume: 9
  start-page: 63876
  year: 2021
  ident: 10.1016/j.eswa.2023.122959_b17
  article-title: Arabic question answering systems: Gap analysis
  publication-title: IEEE Access
  doi: 10.1109/ACCESS.2021.3074950
– year: 2019
  ident: 10.1016/j.eswa.2023.122959_b39
– start-page: 357
  year: 2001
  ident: 10.1016/j.eswa.2023.122959_b20
  article-title: Efficient clustering of very large document collections
– volume: 17
  year: 2004
  ident: 10.1016/j.eswa.2023.122959_b55
  article-title: Self-tuning spectral clustering
  publication-title: Advances in Neural Information Processing Systems
– start-page: 2692
  year: 2010
  ident: 10.1016/j.eswa.2023.122959_b56
  article-title: A Chinese question-answering system with question classification and answer clustering
– year: 2016
  ident: 10.1016/j.eswa.2023.122959_b50
– start-page: 1415
  year: 2016
  ident: 10.1016/j.eswa.2023.122959_b58
  article-title: A study of damp-heat syndrome classification using word2vec and TF-IDF
– start-page: 3180
  year: 2005
  ident: 10.1016/j.eswa.2023.122959_b57
  article-title: Efficient online spherical k-means clustering
– start-page: 1
  year: 2023
  ident: 10.1016/j.eswa.2023.122959_b10
  article-title: Syntactic-semantic similarity based on dependency tree kernel
  publication-title: Arabian Journal for Science and Engineering
– year: 2019
  ident: 10.1016/j.eswa.2023.122959_b23
– volume: 14
  start-page: 3793
  issue: 7
  year: 2022
  ident: 10.1016/j.eswa.2023.122959_b7
  article-title: Questions clustering using canopy-k-means and hierarchical-k-means clustering
  publication-title: International Journal of Information Technology
  doi: 10.1007/s41870-022-01012-w
– volume: 25
  start-page: 10089
  issue: 15
  year: 2021
  ident: 10.1016/j.eswa.2023.122959_b9
  article-title: Arabic sentence similarity based on similarity features and machine learning
  publication-title: Soft Computing
  doi: 10.1007/s00500-021-05754-w
– start-page: 3958
  year: 2021
  ident: 10.1016/j.eswa.2023.122959_b52
  article-title: Cluster-former: Clustering-based sparse transformer for question answering
– volume: 6
  issue: 9
  year: 2005
  ident: 10.1016/j.eswa.2023.122959_b16
  article-title: Clustering on the unit hypersphere using von mises-Fisher distributions
  publication-title: Journal of Machine Learning Research
– volume: 60
  year: 2020
  ident: 10.1016/j.eswa.2023.122959_b1
  article-title: A question answering system in hadith using linguistic knowledge
  publication-title: Computer Speech and Language
  doi: 10.1016/j.csl.2019.101023
– year: 2021
  ident: 10.1016/j.eswa.2023.122959_b43
– year: 2018
  ident: 10.1016/j.eswa.2023.122959_b47
  article-title: Introducing MathQA: A math-aware question answering system
  publication-title: Information Discovery and Delivery
  doi: 10.1108/IDD-06-2018-0022
– year: 2017
  ident: 10.1016/j.eswa.2023.122959_b38
– volume: 19
  start-page: 19
  issue: 6
  year: 2017
  ident: 10.1016/j.eswa.2023.122959_b45
  article-title: A survey on types of question answering system
  publication-title: IOSR Journal of Computer Engineering (IOSR-JCE)
– year: 2018
  ident: 10.1016/j.eswa.2023.122959_b11
– volume: 159
  year: 2020
  ident: 10.1016/j.eswa.2023.122959_b13
  article-title: A modified bond energy algorithm with fuzzy merging and its application to Arabic text document clustering
  publication-title: Expert Systems with Applications
  doi: 10.1016/j.eswa.2020.113598
– start-page: 1209
  year: 2018
  ident: 10.1016/j.eswa.2023.122959_b27
  article-title: Clustering of text streams via facility location and spherical K-means
– year: 2017
  ident: 10.1016/j.eswa.2023.122959_b12
– start-page: 7315
  year: 2020
  ident: 10.1016/j.eswa.2023.122959_b36
  article-title: MLQA: Evaluating cross-lingual extractive question answering
– volume: 142
  start-page: 123
  year: 2018
  ident: 10.1016/j.eswa.2023.122959_b26
  article-title: DAWQAS: A dataset for Arabic why question answering system
  publication-title: Procedia Computer Science
  doi: 10.1016/j.procs.2018.10.467
– start-page: 641
  year: 2014
  ident: 10.1016/j.eswa.2023.122959_b32
  article-title: Enhancing Arabic question answering system
– year: 2011
  ident: 10.1016/j.eswa.2023.122959_b22
– start-page: 409
  year: 2011
  ident: 10.1016/j.eswa.2023.122959_b18
  article-title: Whole genome sequence comparisons in taxonomy
  doi: 10.1016/B978-0-12-387730-7.00018-8
– volume: 117
  start-page: 183
  year: 2017
  ident: 10.1016/j.eswa.2023.122959_b6
  article-title: Arabic question answering using ontology
  publication-title: Procedia Computer Science
  doi: 10.1016/j.procs.2017.10.108
– volume: 159
  start-page: 485
  year: 2019
  ident: 10.1016/j.eswa.2023.122959_b40
  article-title: Enhancing question retrieval in community question answering using word embeddings
  publication-title: Procedia Computer Science
  doi: 10.1016/j.procs.2019.09.203
– volume: 44
  start-page: 1
  issue: 3
  year: 2019
  ident: 10.1016/j.eswa.2023.122959_b33
  article-title: A framework for intelligent question answering system using semantic context-specific document clustering and WordNet
  publication-title: Sādhanā
  doi: 10.1007/s12046-018-1022-8
– volume: 23
  start-page: 851
  issue: 4
  year: 2020
  ident: 10.1016/j.eswa.2023.122959_b8
  article-title: Factors affecting sentence similarity and paraphrasing identification
  publication-title: International Journal of Speech Technology
  doi: 10.1007/s10772-020-09753-4
– start-page: 245
  year: 2012
  ident: 10.1016/j.eswa.2023.122959_b42
  article-title: iPedagogy: Question answering system based on web information clustering
– volume: 6
  start-page: 142
  issue: 3
  year: 2017
  ident: 10.1016/j.eswa.2023.122959_b3
  article-title: Question answering system based on neural networks
  publication-title: International Journal of Engineering Research
– volume: 14
  start-page: 241
  issue: 2
  year: 2004
  ident: 10.1016/j.eswa.2023.122959_b46
  article-title: An alternative extension of the k-means algorithm for clustering categorical data
  publication-title: International Journal of Applied Mathematics and Computer Science
– year: 2007
  ident: 10.1016/j.eswa.2023.122959_b41
– year: 2010
  ident: 10.1016/j.eswa.2023.122959_b28
– year: 2017
  ident: 10.1016/j.eswa.2023.122959_b54
– ident: 10.1016/j.eswa.2023.122959_b24
  doi: 10.1145/584792.584890
– volume: 5
  start-page: 82
  year: 2015
  ident: 10.1016/j.eswa.2023.122959_b4
  article-title: Answer extraction for why Arabic questions answering systems: EWAQ
  publication-title: World of Computer Science and Information Technology Journal (WCSIT)
– start-page: 1630
  year: 2021
  ident: 10.1016/j.eswa.2023.122959_b21
  article-title: Automating questions and answers of good and services tax system using clustering and embeddings of queries
– ident: 10.1016/j.eswa.2023.122959_b15
  doi: 10.18653/v1/2020.ecnlp-1.11
StartPage 122959
SubjectTerms Arabic language
Clustering
Question answering
Similarity measures
Title The effect of clustering algorithms on question answering
URI https://dx.doi.org/10.1016/j.eswa.2023.122959
Volume 243