Lessons from the NLBSE 2024 Competition: Towards Building Efficient Models for GitHub Issue Classification
| Published in: | 2024 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE), pp. 45-48 |
|---|---|
| Main Authors: | |
| Format: | Conference paper |
| Language: | English |
| Published: | ACM, 20.04.2024 |
| Subjects: | |
| Online Access: | Get full text |
| Summary: | This paper presents the findings of our team's efforts in the "NLBSE 2024" competition, which centered on the multi-class classification of GitHub issues. The challenge required models with strong few-shot learning capabilities to distinguish between 300 issues from five different repositories. Our primary strategy was to improve embeddings by developing the Classification Few Fit Sentence Transformer (CFFitST), an approach that fine-tunes the embeddings of a base sentence transformer to fit the dataset. We also explored several hypotheses concerning the optimal combination of input information and classification models. As a result, we achieved an average improvement of 2.44% over the SetFit baseline. |
| DOI: | 10.1145/3643787.3648040 |
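The summary above describes fine-tuning the embeddings of a base sentence transformer for few-shot issue classification and reports results against a SetFit baseline. The snippet below is a minimal sketch of that SetFit-style baseline, assuming the Hugging Face `setfit` library (pre-1.0 `SetFitTrainer` API); the checkpoint, example issues, labels, and hyperparameters are illustrative assumptions, not the authors' CFFitST implementation or the competition data.

```python
# Sketch of a SetFit-style few-shot baseline for issue classification.
# All data, labels, and hyperparameters below are illustrative assumptions.
from datasets import Dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer

# Hypothetical few-shot training set: a handful of labelled GitHub issues.
train_ds = Dataset.from_dict({
    "text": [
        "App crashes on startup after the latest update",
        "Please add dark mode support to the settings page",
        "How do I configure the CLI to use a proxy?",
    ],
    "label": [0, 1, 2],  # e.g. 0 = bug, 1 = feature, 2 = question
})

# Start from a general-purpose sentence transformer; SetFit fine-tunes its
# embeddings contrastively before fitting a lightweight classification head.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    loss_class=CosineSimilarityLoss,
    num_iterations=20,  # contrastive pairs generated per example
    batch_size=16,
)
trainer.train()

# Classify unseen issue titles/bodies.
preds = model(["Crash when opening a large repository"])
print(preds)
```

SetFit first fine-tunes the sentence transformer on generated sentence pairs and then fits a classification head on the resulting embeddings, which is what makes it usable with only a few labelled issues per class; the paper's CFFitST pursues the same general idea of adapting the embedding space to the target dataset.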