Lessons from the NLBSE 2024 Competition: Towards Building Efficient Models for GitHub Issue Classification

This paper presents the findings of our team's efforts during the "NLBSE 2024" competition, which centered on the multi-class classification of GitHub Issues. The challenge required models with strong few-shot learning capabilities to distinguish between 300 issues from five different...

Bibliographic Details
Published in: 2024 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE), pp. 45-48
Main Authors: Gomez-Barrera, Daniel Fernando; Becerra, Luccas Rojas; Roncancio, Juan Pinzon; Almanza, David Ortiz; Arboleda, Juan; Linares-Vasquez, Mario; Manrique, Ruben Francisco
Format: Conference Proceeding
Language: English
Published: ACM 20.04.2024
Abstract: This paper presents the findings of our team's efforts during the "NLBSE 2024" competition, which centered on the multi-class classification of GitHub Issues. The challenge required models with strong few-shot learning capabilities to distinguish between 300 issues from five different repositories. Our primary strategy involved improving embeddings by developing the Classification Few Fit Sentence Transformer (CFFitST), a strategy that fine-tunes embeddings from a base sentence transformer to suit the dataset. We also explored various hypotheses concerning the optimal combination of information input and classification models. As a result, we managed to achieve an average improvement of 2.44% over the SetFit baseline.
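Note on the SetFit baseline named in the abstract: SetFit fine-tunes a sentence transformer contrastively on pairs of labeled texts and then trains a classification head on the resulting embeddings. The sketch below illustrates only the pair-construction step, in plain Python. It is not the authors' CFFitST implementation, which this record does not describe. The function name `make_contrastive_pairs`, the label names, and the sample issue texts are hypothetical, and enumerating every pair (rather than sampling pairs) is a simplification.

```python
from itertools import combinations

# Hypothetical few-shot sample: (issue text, label). Texts and labels are
# made up for illustration and are not taken from the competition dataset.
few_shot_issues = [
    ("App crashes when opening settings", "bug"),
    ("Null pointer exception on startup", "bug"),
    ("Add dark mode support", "feature"),
    ("Support exporting reports to CSV", "feature"),
    ("How do I configure the proxy?", "question"),
    ("Is Python 3.12 supported?", "question"),
]


def make_contrastive_pairs(examples):
    """Build (text_a, text_b, target) triples from labeled examples.

    target is 1.0 when both texts share a label (the embeddings should be
    pulled together) and 0.0 otherwise (they should be pushed apart).
    Every unordered pair is enumerated exactly once.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


pairs = make_contrastive_pairs(few_shot_issues)
positives = [p for p in pairs if p[2] == 1.0]
print(len(pairs), len(positives))
```

With two examples for each of three labels, this yields 15 pairs, 3 of them positive. In a contrastive fine-tuning setup, such triples would typically be fed to a similarity-based loss on the sentence transformer before a classifier is fitted on the tuned embeddings.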
Author Linares-Vasquez, Mario
Becerra, Luccas Rojas
Roncancio, Juan Pinzon
Gomez-Barrera, Daniel Fernando
Almanza, David Ortiz
Arboleda, Juan
Manrique, Ruben Francisco
Author_xml – sequence: 1
  givenname: Daniel Fernando
  surname: Gomez-Barrera
  fullname: Gomez-Barrera, Daniel Fernando
  organization: Universidad de los Andes, Bogotá, Colombia
– sequence: 2
  givenname: Luccas Rojas
  surname: Becerra
  fullname: Becerra, Luccas Rojas
  organization: Universidad de los Andes, Bogotá, Colombia
– sequence: 3
  givenname: Juan Pinzon
  surname: Roncancio
  fullname: Roncancio, Juan Pinzon
  organization: Universidad de los Andes, Bogotá, Colombia
– sequence: 4
  givenname: David Ortiz
  surname: Almanza
  fullname: Almanza, David Ortiz
  organization: Universidad de los Andes, Bogotá, Colombia
– sequence: 5
  givenname: Juan
  surname: Arboleda
  fullname: Arboleda, Juan
  organization: Universidad de los Andes, Bogotá, Colombia
– sequence: 6
  givenname: Mario
  surname: Linares-Vasquez
  fullname: Linares-Vasquez, Mario
  organization: Universidad de los Andes, Bogotá, Colombia
– sequence: 7
  givenname: Ruben Francisco
  surname: Manrique
  fullname: Manrique, Ruben Francisco
  organization: Universidad de los Andes, Bogotá, Colombia
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1145/3643787.3648040
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore Digital Library
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798400705762
EndPage 48
ExternalDocumentID 10647135
Genre orig-research
GroupedDBID 6IE
6IL
ACM
ALMA_UNASSIGNED_HOLDINGS
APO
CBEJK
LHSKQ
RIE
RIL
ID FETCH-LOGICAL-a248t-4eb0edd523b854ab40e4f9a02baab68df3b3c84d52ab08edfb701fe4ec38b5003
IEDL.DBID RIE
ISICitedReferencesCount 0
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001313494100008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:03:10 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a248t-4eb0edd523b854ab40e4f9a02baab68df3b3c84d52ab08edfb701fe4ec38b5003
PageCount 4
ParticipantIDs ieee_primary_10647135
PublicationCentury 2000
PublicationDate 2024-April-20
PublicationDateYYYYMMDD 2024-04-20
PublicationDate_xml – month: 04
  year: 2024
  text: 2024-April-20
  day: 20
PublicationDecade 2020
PublicationTitle 2024 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE)
PublicationTitleAbbrev NLBSE
PublicationYear 2024
Publisher ACM
Publisher_xml – name: ACM
SSID ssib056491969
Score 1.8883711
Snippet This paper presents the findings of our team's efforts during the "NLBSE 2024" competition, which centered on the multi-class classification of GitHub Issues....
SourceID ieee
SourceType Publisher
StartPage 45
SubjectTerms Buildings
Computer architecture
Conferences
Data models
Embedding
Few-shot learning
GitHub Issue Classification
Measurement
NLBSE 2024 Competition
Task analysis
Transformers
Title Lessons from the NLBSE 2024 Competition: Towards Building Efficient Models for GitHub Issue Classification
URI https://ieeexplore.ieee.org/document/10647135
WOSCitedRecordID wos001313494100008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE