Lessons from the NLBSE 2024 Competition: Towards Building Efficient Models for GitHub Issue Classification

Bibliographic Details
Published in: 2024 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE), pp. 45-48
Main authors: Gomez-Barrera, Daniel Fernando; Becerra, Luccas Rojas; Roncancio, Juan Pinzon; Almanza, David Ortiz; Arboleda, Juan; Linares-Vasquez, Mario; Manrique, Ruben Francisco
Format: Conference paper
Language: English
Published: ACM, 20 April 2024
Abstract: This paper presents the findings of our team's efforts during the "NLBSE 2024" competition, which centered on the multi-class classification of GitHub Issues. The challenge required models with strong few-shot learning capabilities to distinguish between 300 issues from five different repositories. Our primary strategy involved improving embeddings by developing the Classification Few Fit Sentence Transformer (CFFitST), a strategy that fine-tunes embeddings from a base sentence transformer to suit the dataset. We also explored various hypotheses concerning the optimal combination of information input and classification models. As a result, we managed to achieve an average improvement of 2.44% over the SetFit baseline.
Author (in sequence; organization for all: Universidad de los Andes, Bogotá, Colombia)
  1. Gomez-Barrera, Daniel Fernando
  2. Becerra, Luccas Rojas
  3. Roncancio, Juan Pinzon
  4. Almanza, David Ortiz
  5. Arboleda, Juan
  6. Linares-Vasquez, Mario
  7. Manrique, Ruben Francisco
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1145/3643787.3648040
EISBN 9798400705762
EndPage 48
ExternalDocumentID 10647135
Genre orig-research
ISICitedReferencesCount 0
IsPeerReviewed false
IsScholarly false
Language English
PageCount 4
PublicationDate 2024-04-20
PublicationTitle 2024 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE)
PublicationTitleAbbrev NLBSE
PublicationYear 2024
Publisher ACM
StartPage 45
SubjectTerms Buildings
Computer architecture
Conferences
Data models
Embedding
Few-shot learning
GitHub Issue Classification
Measurement
NLBSE 2024 Competition
Task analysis
Transformers
URI https://ieeexplore.ieee.org/document/10647135