Lessons from the NLBSE 2024 Competition: Towards Building Efficient Models for GitHub Issue Classification

This paper presents the findings of our team's efforts during the "NLBSE 2024" competition, which centered on the multi-class classification of GitHub Issues. The challenge required models with strong few-shot learning capabilities to distinguish between 300 issues from five different...


Bibliographic details
Published in:	2024 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE), pp. 45-48
Main authors:	Gomez-Barrera, Daniel Fernando; Becerra, Luccas Rojas; Roncancio, Juan Pinzon; Almanza, David Ortiz; Arboleda, Juan; Linares-Vasquez, Mario; Manrique, Ruben Francisco
Format:	Conference paper
Language:	English
Published:	ACM, 20 April 2024
Subjects:	Buildings; Computer architecture; Conferences; Data models; Embedding; Few-shot learning; GitHub Issue Classification; Measurement; NLBSE 2024 Competition; Task analysis; Transformers

Abstract	This paper presents the findings of our team's efforts during the "NLBSE 2024" competition, which centered on the multi-class classification of GitHub Issues. The challenge required models with strong few-shot learning capabilities to distinguish between 300 issues from five different repositories. Our primary strategy involved improving embeddings by developing the Classification Few Fit Sentence Transformer (CFFitST), a strategy that fine-tunes embeddings from a base sentence transformer to suit the dataset. We also explored various hypotheses concerning the optimal combination of information input and classification models. As a result, we managed to achieve an average improvement of 2.44% over the SetFit baseline.
Author	Linares-Vasquez, Mario; Becerra, Luccas Rojas; Roncancio, Juan Pinzon; Gomez-Barrera, Daniel Fernando; Almanza, David Ortiz; Arboleda, Juan; Manrique, Ruben Francisco
Author_xml	– sequence: 1 givenname: Daniel Fernando surname: Gomez-Barrera fullname: Gomez-Barrera, Daniel Fernando organization: Universidad de los Andes, Bogotá, Colombia – sequence: 2 givenname: Luccas Rojas surname: Becerra fullname: Becerra, Luccas Rojas organization: Universidad de los Andes, Bogotá, Colombia – sequence: 3 givenname: Juan Pinzon surname: Roncancio fullname: Roncancio, Juan Pinzon organization: Universidad de los Andes, Bogotá, Colombia – sequence: 4 givenname: David Ortiz surname: Almanza fullname: Almanza, David Ortiz organization: Universidad de los Andes, Bogotá, Colombia – sequence: 5 givenname: Juan surname: Arboleda fullname: Arboleda, Juan organization: Universidad de los Andes, Bogotá, Colombia – sequence: 6 givenname: Mario surname: Linares-Vasquez fullname: Linares-Vasquez, Mario organization: Universidad de los Andes, Bogotá, Colombia – sequence: 7 givenname: Ruben Francisco surname: Manrique fullname: Manrique, Ruben Francisco organization: Universidad de los Andes, Bogotá, Colombia
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1145/3643787.3648040
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Xplore url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
EISBN	9798400705762
EndPage	48
ExternalDocumentID	10647135
Genre	orig-research
GroupedDBID	6IE 6IL ACM ALMA_UNASSIGNED_HOLDINGS APO CBEJK LHSKQ RIE RIL
ID	FETCH-LOGICAL-a248t-4eb0edd523b854ab40e4f9a02baab68df3b3c84d52ab08edfb701fe4ec38b5003
IEDL.DBID	RIE
ISICitedReferencesCount	0
ISICitedReferencesURI	http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001313494100008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate	Wed Aug 27 02:03:10 EDT 2025
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-a248t-4eb0edd523b854ab40e4f9a02baab68df3b3c84d52ab08edfb701fe4ec38b5003
PageCount	4
ParticipantIDs	ieee_primary_10647135
PublicationCentury	2000
PublicationDate	2024-April-20
PublicationDateYYYYMMDD	2024-04-20
PublicationDate_xml	– month: 04 year: 2024 text: 2024-April-20 day: 20
PublicationDecade	2020
PublicationTitle	2024 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE)
PublicationTitleAbbrev	NLBSE
PublicationYear	2024
Publisher	ACM
Publisher_xml	– name: ACM
SSID	ssib056491969
Score	1.8882654
Snippet	This paper presents the findings of our team's efforts during the "NLBSE 2024" competition, which centered on the multi-class classification of GitHub Issues....
SourceID	ieee
SourceType	Publisher
StartPage	45
SubjectTerms	Buildings; Computer architecture; Conferences; Data models; Embedding; Few-shot learning; GitHub Issue Classification; Measurement; NLBSE 2024 Competition; Task analysis; Transformers
Title	Lessons from the NLBSE 2024 Competition: Towards Building Efficient Models for GitHub Issue Classification
URI	https://ieeexplore.ieee.org/document/10647135
WOSCitedRecordID	wos001313494100008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
linkProvider	IEEE
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2024+IEEE%2FACM+International+Workshop+on+Natural+Language-Based+Software+Engineering+%28NLBSE%29&rft.atitle=Lessons+from+the+NLBSE+2024+Competition%3A+Towards+Building+Efficient+Models+for+GitHub+Issue+Classification&rft.au=Gomez-Barrera%2C+Daniel+Fernando&rft.au=Becerra%2C+Luccas+Rojas&rft.au=Roncancio%2C+Juan+Pinzon&rft.au=Almanza%2C+David+Ortiz&rft.date=2024-04-20&rft.pub=ACM&rft.spage=45&rft.epage=48&rft_id=info:doi/10.1145%2F3643787.3648040&rft.externalDocID=10647135