Harnessing the Power of Metadata for Enhanced Question Retrieval in Community Question Answering


Bibliographic Details
Published in: IEEE Access, Vol. 12; p. 1
Main Authors: Ghasemi, Shima; Shakery, Azadeh
Format: Journal Article
Language: English
Published: Piscataway: IEEE, 01.01.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2169-3536
Description
Summary: Community Question Answering (CQA) forums such as Yahoo! Answers and Stack Overflow have become popular. The main goal of a CQA forum is to provide the most suitable answer in the shortest possible time. Because these forums hold a rich archive of answered questions, similar question retrieval has received much attention as a way to answer new questions immediately after they are asked. One of the main challenges in this task is the lexical gap between questions: different users describe the same problem with different terminology. In this paper, we use metadata and two transformer-based techniques to improve the translation-based language model, a traditional technique for addressing the lexical gap in retrieval systems. Additional context about the questions can help overcome the lexical gap, and metadata, the supplementary data associated with each question, is a rich source of such context. The metadata used in this article are subject, category, and answer. Two transformer-based methods are employed to leverage them. First, to utilize category information, we build category-specific dictionaries to obtain more accurate translation probabilities; a BERT model predicts the categories of the questions. Second, to utilize answer information, we propose a question expansion technique in which a transformer-based retrieval-augmented generation (RAG) model generates answers and each new question is expanded with its generated answer. Finally, candidate questions are ranked according to their similarity to the expanded new question. Our proposed method achieves 51.47 in terms of MAP, outperforming all state-of-the-art approaches in question retrieval.
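
The ranking pipeline described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. The summary gives no formulas, so the sketch assumes a common translation-based language model formulation, in which document-side maximum-likelihood and translation probabilities are linearly mixed and then smoothed with collection statistics. The function names (`tokenize`, `trlm_log_score`, `rank`), the weights `lam` and `beta`, and the toy `archive` and `dictionaries` are all hypothetical. The predicted category and the generated answer are passed in as plain arguments, standing in for the BERT classifier and the RAG model.

```python
import math
from collections import Counter


def tokenize(text):
    """Whitespace tokenizer; a stand-in for real preprocessing."""
    return text.lower().split()


def trlm_log_score(query_tokens, doc_tokens, collection_counts,
                   collection_len, translation, lam=0.2, beta=0.5):
    """Log P(query | doc) under an assumed translation-based language model.

    translation[t][w] holds T(w | t), the probability that archive term t
    "translates" to query term w. lam and beta are illustrative weights.
    """
    doc_counts = Counter(doc_tokens)
    doc_len = max(len(doc_tokens), 1)
    vocab_size = len(collection_counts)
    score = 0.0
    for w in query_tokens:
        # Maximum-likelihood probability of w in the candidate question.
        p_ml = doc_counts[w] / doc_len
        # Translation term: sum over doc terms t of T(w | t) * P_ml(t | doc).
        p_tr = sum(translation.get(t, {}).get(w, 0.0) * c / doc_len
                   for t, c in doc_counts.items())
        p_mix = (1 - beta) * p_ml + beta * p_tr
        # Add-one smoothed collection probability keeps the log argument > 0.
        p_coll = (collection_counts[w] + 1) / (collection_len + vocab_size + 1)
        score += math.log((1 - lam) * p_mix + lam * p_coll)
    return score


def rank(new_question, generated_answer, category, archive, dictionaries):
    """Rank archived questions against the answer-expanded new question."""
    # Question expansion: append the (externally generated) answer text.
    query = tokenize(new_question) + tokenize(generated_answer)
    # Category-specific dictionary; empty if the category is unknown.
    translation = dictionaries.get(category, {})
    collection_counts = Counter(t for q in archive for t in tokenize(q))
    collection_len = sum(collection_counts.values())
    scored = [
        (trlm_log_score(query, tokenize(q), collection_counts,
                        collection_len, translation), q)
        for q in archive
    ]
    return [q for _, q in sorted(scored, key=lambda pair: pair[0], reverse=True)]


# Hypothetical toy data, for illustration only.
archive = [
    "how do i fix my laptop screen",
    "best pasta recipe for dinner",
]
dictionaries = {
    "computers": {
        "laptop": {"notebook": 0.6, "laptop": 0.4},
        "screen": {"display": 0.5, "screen": 0.5},
    },
}

print(rank("notebook display broken", "replace the screen",
           "computers", archive, dictionaries))
```

In this toy setup the new question shares no words with the laptop question, so any match comes from the category-specific translation entries and from the word "screen" contributed by the appended answer text. This is the lexical-gap effect the summary describes.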
DOI: 10.1109/ACCESS.2024.3395449