Lessons from the NLBSE 2024 Competition: Towards Building Efficient Models for GitHub Issue Classification


Bibliographic Details
Published in:	2024 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE) pp. 45 - 48
Main Authors:	Gomez-Barrera, Daniel Fernando, Becerra, Luccas Rojas, Roncancio, Juan Pinzon, Almanza, David Ortiz, Arboleda, Juan, Linares-Vasquez, Mario, Manrique, Ruben Francisco
Format:	Conference Proceeding
Language:	English
Published:	ACM 20.04.2024
Subjects:	Buildings; Computer architecture; Conferences; Data models; Embedding; Few-shot learning; GitHub Issue Classification; Measurement; NLBSE 2024 Competition; Task analysis; Transformers

Abstract	This paper presents the findings of our team's efforts during the "NLBSE 2024" competition, which centered on the multi-class classification of GitHub issues. The challenge required models with strong few-shot learning capabilities to distinguish between 300 issues from five different repositories. Our primary strategy was to improve embeddings by developing the Classification Few Fit Sentence Transformer (CFFitST), a method that fine-tunes embeddings from a base sentence transformer to suit the dataset. We also explored several hypotheses about the best combination of input information and classification model. As a result, we achieved an average improvement of 2.44% over the SetFit baseline.
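This record gives no implementation details for CFFitST. As a rough, hedged illustration of the SetFit-style pipeline that the abstract uses as its baseline, the Python sketch below samples same-label (target 1.0) and different-label (target 0.0) sentence pairs from a handful of labeled issues, which is the kind of data a sentence transformer is contrastively fine-tuned on, and then classifies embeddings by cosine similarity to class centroids. This is a generic sketch, not the authors' method: the nearest-centroid classifier is a swapped-in simplification (SetFit by default trains a logistic-regression head instead), the functions make_contrastive_pairs, fit_centroids and predict are hypothetical names, and the issue texts, labels and 2-D vectors are made-up placeholders; no real embedding model is called.

```python
import math
import random
from collections import defaultdict


def make_contrastive_pairs(texts, labels, pairs_per_example=2, seed=0):
    """Sample (text_a, text_b, target) pairs for contrastive fine-tuning.

    target is 1.0 when both texts share a label and 0.0 otherwise, which is
    the kind of pair data a SetFit-style encoder fine-tuning step consumes.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)

    pairs = []
    for i, label in enumerate(labels):
        positives = [j for j in by_label[label] if j != i]
        negatives = [j for other, idx in by_label.items() if other != label for j in idx]
        for _ in range(pairs_per_example):
            if positives:  # a class with a single example yields no positive pair
                pairs.append((texts[i], texts[rng.choice(positives)], 1.0))
            if negatives:
                pairs.append((texts[i], texts[rng.choice(negatives)], 0.0))
    return pairs


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def fit_centroids(vectors, labels):
    """Average the embedding vectors of each class (stand-in classifier)."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(vectors, labels):
        total = sums.setdefault(label, [0.0] * len(vec))
        sums[label] = [s + x for s, x in zip(total, vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is most cosine-similar to vec."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


# Hypothetical few-shot issues; real data would come from the competition set.
texts = ["App crashes on startup", "Null pointer in parser",
         "Add dark mode", "Support YAML config",
         "How do I install on Windows?"]
labels = ["bug", "bug", "feature", "feature", "question"]
pairs = make_contrastive_pairs(texts, labels)
print(len(pairs), "training pairs")  # 18 with the defaults above

# Toy 2-D vectors standing in for fine-tuned sentence embeddings.
centroids = fit_centroids([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
                          ["bug", "bug", "feature", "feature"])
print(predict(centroids, [0.8, 0.2]))  # bug
```

In an actual pipeline the pairs would feed a sentence-transformer fine-tuning loop and the vectors would come from the fine-tuned encoder; both steps are omitted here so the sketch runs without external dependencies.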
Author affiliations	Universidad de los Andes, Bogotá, Colombia (all seven authors: Gomez-Barrera, Daniel Fernando; Becerra, Luccas Rojas; Roncancio, Juan Pinzon; Almanza, David Ortiz; Arboleda, Juan; Linares-Vasquez, Mario; Manrique, Ruben Francisco)
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1145/3643787.3648040
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Xplore Digital Library IEEE Proceedings Order Plans (POP All) 1998-Present
Database	IEEE/IET Electronic Library (IEL), https://ieeexplore.ieee.org/ (source type: Publisher)
DeliveryMethod	fulltext_linktorsrc
EISBN	9798400705762
EndPage	48
ExternalDocumentID	10647135
Genre	orig-research
GroupedDBID	6IE 6IL ACM ALMA_UNASSIGNED_HOLDINGS APO CBEJK LHSKQ RIE RIL
IEDL.DBID	RIE
ISICitedReferencesCount	0
ISICitedReferencesURI	http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001313494100008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate	Wed Aug 27 02:03:10 EDT 2025
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
PageCount	4
ParticipantIDs	ieee_primary_10647135
PublicationCentury	2000
PublicationDate	2024-April-20
PublicationDateYYYYMMDD	2024-04-20
PublicationDecade	2020
PublicationTitle	2024 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE)
PublicationTitleAbbrev	NLBSE
PublicationYear	2024
Publisher	ACM
SSID	ssib056491969
SourceID	ieee
SourceType	Publisher
StartPage	45
URI	https://ieeexplore.ieee.org/document/10647135
WOSCitedRecordID	wos001313494100008