LiSSA: Toward Generic Traceability Link Recovery Through Retrieval-Augmented Generation

Bibliographic details
Published in:	Proceedings / International Conference on Software Engineering, pp. 1396-1408
Main authors:	Fuchß, Dominik; Hey, Tobias; Keim, Jan; Liu, Haoyu; Ewald, Niklas; Thirolf, Tobias; Koziolek, Anne
Format:	Conference paper
Language:	English
Published:	IEEE, 26 April 2025
Subjects:	Codes; Computer architecture; Documentation; Large language models; Maintenance; Replicability; Retrieval augmented generation; Software engineering; Software systems; Source coding; traceability link recovery
ISSN:	1558-1225

Description
Abstract:	There are a multitude of software artifacts which need to be handled during the development and maintenance of a software system. These artifacts interrelate in multiple, complex ways. Therefore, many software engineering tasks are enabled - and even empowered - by a clear understanding of artifact interrelationships and also by the continued advancement of techniques for automated artifact linking. However, current approaches in automatic Traceability Link Recovery (TLR) target mostly the links between specific sets of artifacts, such as those between requirements and code. Fortunately, recent advancements in Large Language Models (LLMs) can enable TLR approaches to achieve broad applicability. Still, it is a nontrivial problem how to provide the LLMs with the specific information needed to perform TLR. In this paper, we present LiSSA, a framework that harnesses LLM performance and enhances them through Retrieval-Augmented Generation (RAG). We empirically evaluate LiSSA on three different TLR tasks, requirements to code, documentation to code, and architecture documentation to architecture models, and we compare our approach to state-of-the-art approaches. Our results show that the RAG-based approach can significantly outperform the state-of-the-art on the code-related tasks. However, further research is required to improve the performance of RAG-based approaches to be applicable in practice.
DOI:	10.1109/ICSE55347.2025.00186