Lessons from the NLBSE 2024 Competition: Towards Building Efficient Models for GitHub Issue Classification

Detailed Bibliography
Published in: 2024 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE), pp. 45-48
Main Authors: Gomez-Barrera, Daniel Fernando; Becerra, Luccas Rojas; Roncancio, Juan Pinzon; Almanza, David Ortiz; Arboleda, Juan; Linares-Vasquez, Mario; Manrique, Ruben Francisco
Format: Conference Paper
Language: English
Published: ACM, 20 April 2024
Description
Summary: This paper presents the findings of our team's efforts during the "NLBSE 2024" competition, which centered on the multi-class classification of GitHub Issues. The challenge required models with strong few-shot learning capabilities to distinguish between 300 issues from five different repositories. Our primary strategy involved improving embeddings by developing the Classification Few Fit Sentence Transformer (CFFitST), which fine-tunes the embeddings of a base sentence transformer to suit the dataset. We also explored various hypotheses concerning the optimal combination of input information and classification models. As a result, we achieved an average improvement of 2.44% over the SetFit baseline.
DOI: 10.1145/3643787.3648040
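
The summary describes a SetFit-style recipe: contrastively fine-tune a base sentence transformer on a small labeled set, then classify in the adapted embedding space. The sketch below illustrates that general recipe only; it is not the authors' CFFitST. It assumes the sentence-transformers and scikit-learn libraries, an assumed base model (all-MiniLM-L6-v2), made-up issue texts, and a logistic-regression head.

```python
# Minimal sketch of a SetFit-style few-shot pipeline (illustrative assumptions,
# not the paper's CFFitST implementation).
from itertools import combinations

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from sklearn.linear_model import LogisticRegression

# Hypothetical few-shot training data: issue text and an integer class label.
train_texts = [
    "App crashes when opening the settings page",
    "NullPointerException thrown on startup",
    "Add dark mode support to the editor",
    "Feature request: export reports as CSV",
]
train_labels = [0, 0, 1, 1]

# 1) Build contrastive pairs from the labels:
#    same label -> target similarity 1.0, different label -> 0.0.
pairs = [
    InputExample(texts=[train_texts[i], train_texts[j]],
                 label=1.0 if train_labels[i] == train_labels[j] else 0.0)
    for i, j in combinations(range(len(train_texts)), 2)
]

# 2) Fine-tune a base sentence transformer on those pairs so that issues
#    from the same class move closer together in embedding space.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
loader = DataLoader(pairs, shuffle=True, batch_size=8)
loss = losses.CosineSimilarityLoss(encoder)
encoder.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)

# 3) Fit a lightweight classification head on the tuned embeddings.
clf = LogisticRegression(max_iter=1000)
clf.fit(encoder.encode(train_texts), train_labels)

# Inference: embed a new issue and predict its class.
print(clf.predict(encoder.encode(["Crash report: segfault in settings"])))
```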