The effect of clustering algorithms on question answering

Bibliographic details
Published in: Expert systems with applications, Volume 243, p. 122959
Main authors: AlMahmoud, Rana Husni; Alian, Marwah
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 June 2024
ISSN: 0957-4174, 1873-6793
Description
Summary: Question answering (QA) is one of the essential fields in information retrieval, where specific answers are provided instead of large documents. The relations among questions and answers are determined using natural language processing techniques, while clustering algorithms can improve the effectiveness of result retrieval by reducing the number of comparisons required for a specific question or answer. In this work, we introduce a clustering-based approach for a QA system. This approach groups related questions into clusters using different clustering algorithms, identifies the appropriate answer using similarity methods between the answers and the generated clusters, and then assigns answers to their most related questions. Different clustering algorithms, such as k-means, spherical k-means, single-linkage hierarchical clustering (SLHA), unweighted pair group method with arithmetic mean (UPGMA), expectation–maximization (EM), and clustering Arabic documents based on bond energy (CADBE), are tested. The effectiveness of a clustering algorithm is investigated with respect to certain factors, including the number of clusters, the text representation, the similarity measure between answers and clusters, and the similarity measure between answers and questions in a selected cluster. In addition, a comprehensive ranking system is introduced to evaluate the performance of clustering algorithms. Evaluation is performed using the Dataset of Arabic Why Question Answering System (DAWQAS) and the Multilingual Question Answering (MLQA) dataset. Results show that CADBE achieves the highest accuracy and the first rank, followed by SLHA and UPGMA, while spherical k-means has the lowest rank. The performance of clustering algorithms on the MLQA dataset is affected by its characteristics, such as short questions, long and varied answers, and diverse subject domains. Unigram and bigram intersection measures perform well in most cases. Term frequency-inverse document frequency (TF-IDF) representation outperforms word embedding on DAWQAS. Overall, the experiments provide insights into the performance of clustering algorithms in QA systems.

• A clustering-based QA system groups related questions and selects answers via similarity.
• Answers are assigned to related questions using various similarity methods.
• Certain factors are explored to investigate the effectiveness of a clustering algorithm.
• A comprehensive ranking system evaluates the performance of clustering algorithms.
• CADBE achieves the highest accuracy, followed by SLHA and UPGMA. Spherical k-means ranks lowest.
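The two-stage assignment described in the summary (answer to cluster, then answer to question within that cluster) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the question clusters have already been produced by some clustering algorithm, represents a cluster by the union of its questions' n-grams, and normalizes the n-gram intersection by the smaller set; the record does not define any of these details, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, n: int) -> Set[NGram]:
    """Set of word n-grams of lower-cased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def intersection_sim(a: Set[NGram], b: Set[NGram]) -> float:
    """Shared n-grams divided by the smaller set size (assumed normalization)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def assign_answer(answer: str, clusters: Dict[int, List[str]], n: int = 1) -> Tuple[int, str]:
    """Return (cluster id, question) judged most related to the answer."""
    ans = ngrams(answer, n)

    # Stage 1: compare the answer with each cluster. A cluster is represented
    # by the union of its questions' n-grams (an assumption of this sketch).
    def cluster_score(cid: int) -> float:
        grams: Set[NGram] = set()
        for question in clusters[cid]:
            grams |= ngrams(question, n)
        return intersection_sim(ans, grams)

    best_cluster = max(clusters, key=cluster_score)

    # Stage 2: compare the answer only with the questions of the chosen
    # cluster, which is where the saving in comparisons comes from.
    best_question = max(
        clusters[best_cluster],
        key=lambda q: intersection_sim(ans, ngrams(q, n)),
    )
    return best_cluster, best_question


# Hypothetical toy data; cluster ids stand in for the output of any clustering algorithm.
clusters = {
    0: ["why is the sky blue", "why is the sea salty"],
    1: ["why do birds migrate in winter", "why do bears sleep in winter"],
}
print(assign_answer("birds migrate to find food in winter", clusters, n=1))
```

Setting n=1 gives a unigram intersection measure and n=2 a bigram one. Replacing the n-gram sets with TF-IDF or embedding vectors and a cosine similarity would follow the same two-stage structure.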
DOI: 10.1016/j.eswa.2023.122959