Lessons from the NLBSE 2024 Competition: Towards Building Efficient Models for GitHub Issue Classification

Detailed Bibliography
Published in: 2024 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE), pp. 45-48
Main authors: Gomez-Barrera, Daniel Fernando; Becerra, Luccas Rojas; Roncancio, Juan Pinzon; Almanza, David Ortiz; Arboleda, Juan; Linares-Vasquez, Mario; Manrique, Ruben Francisco
Medium: Conference paper
Language: English
Publication details: ACM, 20.04.2024
Description
Summary: This paper presents the findings of our team's efforts during the "NLBSE 2024" competition, which centered on the multi-class classification of GitHub Issues. The challenge required models with strong few-shot learning capabilities to distinguish between 300 issues from five different repositories. Our primary strategy involved improving embeddings by developing the Classification Few Fit Sentence Transformer (CFFitST), an approach that fine-tunes the embeddings of a base sentence transformer to suit the dataset. We also explored various hypotheses concerning the optimal combination of information input and classification models. As a result, we achieved an average improvement of 2.44% over the SetFit baseline.
DOI: 10.1145/3643787.3648040
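
The summary above describes fine-tuning the embeddings of a base sentence transformer for few-shot GitHub issue classification, benchmarked against a SetFit baseline. The sketch below is a minimal, illustrative example of that general setup using the open-source SetFit library; it is not the authors' CFFitST method, and the checkpoint name, label set, and hyperparameters are assumptions chosen only for illustration.

# Minimal sketch (not the authors' CFFitST implementation): few-shot fine-tuning of a
# sentence-transformer classifier with the SetFit library, which the competition baseline uses.
# The checkpoint, example labels, and hyperparameters are illustrative assumptions.
from datasets import Dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer

# Hypothetical few-shot training split: issue texts paired with integer labels
# (e.g. 0 = bug, 1 = feature, 2 = question).
train_ds = Dataset.from_dict({
    "text": [
        "App crashes when opening the settings page",
        "Add dark mode support to the dashboard",
        "How do I configure the CLI behind a proxy?",
    ],
    "label": [0, 1, 2],
})

# Start from a pretrained sentence transformer; SetFit fine-tunes its embeddings with a
# contrastive objective and then fits a lightweight classification head on top.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    loss_class=CosineSimilarityLoss,
    batch_size=16,
    num_iterations=20,  # number of contrastive text pairs generated per example
    num_epochs=1,
)
trainer.train()

# Predict labels for unseen issues.
print(model.predict(["Button label is misspelled on the login screen"]))

In this kind of setup, the contrastive pair generation is what provides the few-shot strength the summary emphasizes: each labeled issue is paired repeatedly with positives and negatives, so only a handful of examples per class are needed before a small classification head is fit on the tuned embeddings.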