Towards Automatically Addressing Self-Admitted Technical Debt: How Far Are We?

Bibliographic Details
Published in:	IEEE/ACM International Conference on Automated Software Engineering : [proceedings] pp. 585 - 597
Main Authors:	Mastropaolo, Antonio; Di Penta, Massimiliano; Bavota, Gabriele
Format:	Conference Proceeding
Language:	English
Published:	IEEE 11.09.2023
Subjects:	Codes; Machine Learning for Code; Organizations; Pre-trained models; Self-Admitted Technical Debt; Shape; Software; Software reliability; Training; Transformers
ISSN:	2643-1572

Abstract	Upon evolving their software, organizations and individual developers have to spend a substantial effort to pay back technical debt, i.e., the fact that software is released in a shape not as good as it should be, e.g., in terms of functionality, reliability, or maintainability. This paper empirically investigates the extent to which technical debt can be automatically paid back by neural-based generative models, and in particular models exploiting different strategies for pre-training and fine-tuning. We start by extracting a dataset of 5,039 Self-Admitted Technical Debt (SATD) removals from 595 open-source projects. SATD refers to technical debt instances documented (e.g., via code comments) by developers. We use this dataset to experiment with seven different generative deep learning (DL) model configurations. Specifically, we compare transformers pre-trained and fine-tuned with different combinations of training objectives, including the fixing of generic code changes, SATD removals, and SATD-comment prompt tuning. Also, we investigate the applicability in this context of a recently available Large Language Model (LLM)-based chatbot. Results of our study indicate that the automated repayment of SATD is a challenging task, with the best model we experimented with able to automatically fix ∼2% to 8% of test instances, depending on the number of attempts it is allowed to make. Given the limited size of the fine-tuning dataset (∼5k instances), the model's pre-training plays a fundamental role in boosting performance. Also, the ability to remove SATD steadily drops if the comment documenting the SATD is not provided as input to the model. Finally, we found general-purpose LLMs not to be a competitive approach for addressing SATD.
Author	Di Penta, Massimiliano; Bavota, Gabriele; Mastropaolo, Antonio
Author_xml	– sequence: 1 givenname: Antonio surname: Mastropaolo fullname: Mastropaolo, Antonio organization: SEART @ Software Institute, Università della Svizzera italiana (USI), Switzerland – sequence: 2 givenname: Massimiliano surname: Di Penta fullname: Di Penta, Massimiliano organization: University of Sannio, Dept. of Engineering, Italy – sequence: 3 givenname: Gabriele surname: Bavota fullname: Bavota, Gabriele organization: SEART @ Software Institute, Università della Svizzera italiana (USI), Switzerland
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1109/ASE56229.2023.00103
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
EISBN	9798350329964
EISSN	2643-1572
EndPage	597
ExternalDocumentID	10298547
Genre	orig-research
GrantInformation_xml	– fundername: European Research Council (ERC) grantid: 851720 funderid: 10.13039/100010663
GroupedDBID	6IE 6IF 6IH 6IK 6IL 6IM 6IN 6J9 AAJGR AAWTH ABLEC ACREN ADYOE ADZIZ AFYQB ALMA_UNASSIGNED_HOLDINGS AMTXH BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI M43 OCL RIE RIL
ID	FETCH-LOGICAL-a284t-6e39efcd9774e3e5fedfc1fd04d3235807d2621c0927a5506564f9813b0eab543
IEDL.DBID	RIE
ISICitedReferencesCount	7
ISICitedReferencesURI	http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001103357200047&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate	Wed Aug 27 02:32:28 EDT 2025
IsPeerReviewed	false
IsScholarly	true
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-a284t-6e39efcd9774e3e5fedfc1fd04d3235807d2621c0927a5506564f9813b0eab543
PageCount	13
ParticipantIDs	ieee_primary_10298547
PublicationCentury	2000
PublicationDate	2023-Sept.-11
PublicationDateYYYYMMDD	2023-09-11
PublicationDate_xml	– month: 09 year: 2023 text: 2023-Sept.-11 day: 11
PublicationDecade	2020
PublicationTitle	IEEE/ACM International Conference on Automated Software Engineering : [proceedings]
PublicationTitleAbbrev	ASE
PublicationYear	2023
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssj0051577 ssib057256115
Score	2.3242
SourceID	ieee
SourceType	Publisher
StartPage	585
URI	https://ieeexplore.ieee.org/document/10298547
WOSCitedRecordID	wos001103357200047&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D