Lessons from the NLBSE 2024 Competition: Towards Building Efficient Models for GitHub Issue Classification
This paper presents the findings of our team's efforts during the "NLBSE 2024" competition, which centered on the multi-class classification of GitHub Issues. The challenge required models with strong few-shot learning capabilities to distinguish between 300 issues from five different...
| Published in: | 2024 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE) pp. 45 - 48 |
|---|---|
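| Main Authors: | Gomez-Barrera, Daniel Fernando; Becerra, Luccas Rojas; Roncancio, Juan Pinzon; Almanza, David Ortiz; Arboleda, Juan; Linares-Vasquez, Mario; Manrique, Ruben Francisco |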
| Format: | Conference Proceeding |
| Language: | English |
| Published: | ACM, 20.04.2024 |
| Abstract | This paper presents the findings of our team's efforts during the "NLBSE 2024" competition, which centered on the multi-class classification of GitHub Issues. The challenge required models with strong few-shot learning capabilities to distinguish between 300 issues from five different repositories. Our primary strategy involved improving embeddings by developing the Classification Few Fit Sentence Transformer (CFFitST), a strategy that fine-tunes embeddings from a base sentence transformer to suit the dataset. We also explored various hypotheses concerning the optimal combination of information input and classification models. As a result, we managed to achieve an average improvement of 2.44% over the SetFit baseline. |
|---|---|
| Author | Linares-Vasquez, Mario; Becerra, Luccas Rojas; Roncancio, Juan Pinzon; Gomez-Barrera, Daniel Fernando; Almanza, David Ortiz; Arboleda, Juan; Manrique, Ruben Francisco |
| Author_xml | – sequence: 1 givenname: Daniel Fernando surname: Gomez-Barrera fullname: Gomez-Barrera, Daniel Fernando organization: Universidad de los Andes,Bogotá,Colombia – sequence: 2 givenname: Luccas Rojas surname: Becerra fullname: Becerra, Luccas Rojas organization: Universidad de los Andes,Bogotá,Colombia – sequence: 3 givenname: Juan Pinzon surname: Roncancio fullname: Roncancio, Juan Pinzon organization: Universidad de los Andes,Bogotá,Colombia – sequence: 4 givenname: David Ortiz surname: Almanza fullname: Almanza, David Ortiz organization: Universidad de los Andes,Bogotá,Colombia – sequence: 5 givenname: Juan surname: Arboleda fullname: Arboleda, Juan organization: Universidad de los Andes,Bogotá,Colombia – sequence: 6 givenname: Mario surname: Linares-Vasquez fullname: Linares-Vasquez, Mario organization: Universidad de los Andes,Bogotá,Colombia – sequence: 7 givenname: Ruben Francisco surname: Manrique fullname: Manrique, Ruben Francisco organization: Universidad de los Andes,Bogotá,Colombia |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3643787.3648040 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Xplore Digital Library; IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9798400705762 |
| EndPage | 48 |
| ExternalDocumentID | 10647135 |
| Genre | orig-research |
| ISICitedReferencesCount | 0 |
| IngestDate | Wed Aug 27 02:03:10 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 4 |
| ParticipantIDs | ieee_primary_10647135 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-April-20 |
| PublicationDateYYYYMMDD | 2024-04-20 |
| PublicationDate_xml | – month: 04 year: 2024 text: 2024-April-20 day: 20 |
| PublicationDecade | 2020 |
| PublicationTitle | 2024 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE) |
| PublicationTitleAbbrev | NLBSE |
| PublicationYear | 2024 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 45 |
| SubjectTerms | Buildings; Computer architecture; Conferences; Data models; Embedding; Few-shot learning; GitHub Issue Classification; Measurement; NLBSE 2024 Competition; Task analysis; Transformers |
| Title | Lessons from the NLBSE 2024 Competition: Towards Building Efficient Models for GitHub Issue Classification |
| URI | https://ieeexplore.ieee.org/document/10647135 |