Towards Automatically Addressing Self-Admitted Technical Debt: How Far Are We?

When evolving their software, organizations and individual developers have to spend substantial effort to pay back technical debt, i.e., the fact that software is released in a shape not as good as it should be, e.g., in terms of functionality, reliability, or maintainability. This paper empirically...

Bibliographic Details
Published in: IEEE/ACM International Conference on Automated Software Engineering : [proceedings] pp. 585 - 597
Main Authors: Mastropaolo, Antonio; Di Penta, Massimiliano; Bavota, Gabriele
Format: Conference Proceeding
Language: English
Published: IEEE 11.09.2023
Subjects:
ISSN: 2643-1572
Abstract When evolving their software, organizations and individual developers have to spend substantial effort to pay back technical debt, i.e., the fact that software is released in a shape not as good as it should be, e.g., in terms of functionality, reliability, or maintainability. This paper empirically investigates the extent to which technical debt can be automatically paid back by neural-based generative models, and in particular by models exploiting different strategies for pre-training and fine-tuning. We start by extracting a dataset of 5,039 Self-Admitted Technical Debt (SATD) removals from 595 open-source projects. SATD refers to technical debt instances documented by developers (e.g., via code comments). We use this dataset to experiment with seven different generative deep learning (DL) model configurations. Specifically, we compare transformers pre-trained and fine-tuned with different combinations of training objectives, including the fixing of generic code changes, SATD removals, and SATD-comment prompt tuning. We also investigate the applicability in this context of a recently available Large Language Model (LLM)-based chatbot. The results of our study indicate that the automated repayment of SATD is a challenging task: the best model we experimented with automatically fixes ∼2% to 8% of test instances, depending on the number of attempts it is allowed to make. Given the limited size of the fine-tuning dataset (∼5k instances), the model's pre-training plays a fundamental role in boosting performance. Also, the ability to remove SATD steadily drops if the comment documenting the SATD is not provided as input to the model. Finally, we found that general-purpose LLMs are not a competitive approach for addressing SATD.
Author Di Penta, Massimiliano
Bavota, Gabriele
Mastropaolo, Antonio
Author_xml – sequence: 1
  givenname: Antonio
  surname: Mastropaolo
  fullname: Mastropaolo, Antonio
  organization: SEART @ Software Institute, Università della Svizzera italiana (USI),Switzerland
– sequence: 2
  givenname: Massimiliano
  surname: Di Penta
  fullname: Di Penta, Massimiliano
  organization: University of Sannio,Dept. of Engineering,Italy
– sequence: 3
  givenname: Gabriele
  surname: Bavota
  fullname: Bavota, Gabriele
  organization: SEART @ Software Institute, Università della Svizzera italiana (USI),Switzerland
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ASE56229.2023.00103
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798350329964
EISSN 2643-1572
EndPage 597
ExternalDocumentID 10298547
Genre orig-research
GrantInformation_xml – fundername: European Research Council (ERC)
  grantid: 851720
  funderid: 10.13039/100010663
GroupedDBID 6IE
6IF
6IH
6IK
6IL
6IM
6IN
6J9
AAJGR
AAWTH
ABLEC
ACREN
ADYOE
ADZIZ
AFYQB
ALMA_UNASSIGNED_HOLDINGS
AMTXH
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
M43
OCL
RIE
RIL
ID FETCH-LOGICAL-a284t-6e39efcd9774e3e5fedfc1fd04d3235807d2621c0927a5506564f9813b0eab543
IEDL.DBID RIE
ISICitedReferencesCount 7
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001103357200047&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:32:28 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a284t-6e39efcd9774e3e5fedfc1fd04d3235807d2621c0927a5506564f9813b0eab543
PageCount 13
ParticipantIDs ieee_primary_10298547
PublicationCentury 2000
PublicationDate 2023-Sept.-11
PublicationDateYYYYMMDD 2023-09-11
PublicationDate_xml – month: 09
  year: 2023
  text: 2023-Sept.-11
  day: 11
PublicationDecade 2020
PublicationTitle IEEE/ACM International Conference on Automated Software Engineering : [proceedings]
PublicationTitleAbbrev ASE
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0051577
ssib057256115
Score 2.3242
Snippet When evolving their software, organizations and individual developers have to spend substantial effort to pay back technical debt, i.e., the fact that...
SourceID ieee
SourceType Publisher
StartPage 585
SubjectTerms Codes
Machine Learning for Code
Organizations
Pre-trained models
Self-Admitted Technical Debt
Shape
Software
Software reliability
Training
Transformers
Title Towards Automatically Addressing Self-Admitted Technical Debt: How Far Are We?
URI https://ieeexplore.ieee.org/document/10298547
WOSCitedRecordID wos001103357200047&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint