Lessons from the NLBSE 2024 Competition: Towards Building Efficient Models for GitHub Issue Classification


Bibliographic Details
Published in: 2024 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE), pp. 45-48
Main Authors: Gomez-Barrera, Daniel Fernando, Becerra, Luccas Rojas, Roncancio, Juan Pinzon, Almanza, David Ortiz, Arboleda, Juan, Linares-Vasquez, Mario, Manrique, Ruben Francisco
Format: Conference Proceeding
Language: English
Published: ACM 20.04.2024
Description
Summary: This paper presents the findings of our team's efforts in the "NLBSE 2024" competition, which centered on the multi-class classification of GitHub issues. The challenge required models with strong few-shot learning capabilities to distinguish between 300 issues drawn from five different repositories. Our primary strategy was to improve embeddings by developing the Classification Few Fit Sentence Transformer (CFFitST), which fine-tunes the embeddings of a base sentence transformer to suit the dataset. We also explored several hypotheses about the optimal combination of input information and classification models. As a result, we achieved an average improvement of 2.44% over the SetFit baseline.
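The core idea in the abstract, embedding issues with a sentence transformer and classifying with few labeled examples per class, can be illustrated with a minimal sketch. This is not the paper's CFFitST implementation: the `embed` function below is a toy bag-of-words stand-in for a fine-tuned sentence transformer, and the training texts and labels are invented for illustration.

```python
# Minimal sketch of few-shot issue classification over embeddings.
# Hypothetical stand-in: a real pipeline (SetFit, or the paper's CFFitST)
# would replace embed() with a fine-tuned sentence-transformer encoder.
from collections import Counter
import math

def embed(text):
    # Toy embedding: L2-normalized bag-of-words counts.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {w: v / norm for w, v in counts.items()}

def cosine(a, b):
    # Cosine similarity of two sparse unit-ish vectors.
    return sum(v * b.get(w, 0.0) for w, v in a.items())

def class_centroids(examples):
    # Average the embeddings of the few labeled examples per class.
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append(embed(text))
    cents = {}
    for label, vecs in by_label.items():
        words = set().union(*vecs)
        cents[label] = {w: sum(v.get(w, 0.0) for v in vecs) / len(vecs)
                        for w in words}
    return cents

def classify(text, cents):
    # Assign the label whose centroid is closest in cosine similarity.
    vec = embed(text)
    return max(cents, key=lambda lbl: cosine(vec, cents[lbl]))

# Invented few-shot training examples (two per class).
train = [
    ("app crashes on startup with null pointer", "bug"),
    ("error when saving file corrupts data", "bug"),
    ("please add dark mode support", "feature"),
    ("would love an option to export to csv", "feature"),
]
cents = class_centroids(train)
print(classify("crash with error on startup", cents))  # -> bug
```

A fine-tuned encoder in place of `embed` is what separates this sketch from the competition setting: contrastive fine-tuning pulls same-class issue embeddings together so that even a simple nearest-centroid (or logistic-regression) head performs well with only a handful of labels.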
DOI: 10.1145/3643787.3648040