PYEVOLVE: Automating Frequent Code Changes in Python ML Systems

Detailed bibliography
Published in: Proceedings / International Conference on Software Engineering, pp. 995-1007
Main authors: Dilhara, Malinda; Dig, Danny; Ketkar, Ameya
Format: Conference paper
Language: English
Publisher: IEEE, 1 May 2023
ISSN: 1558-1225
Abstract Because of the naturalness of software and the rapid evolution of Machine Learning (ML) techniques, frequently repeated code change patterns (CPATs) occur often. They range from simple API migrations to changes involving several complex control structures such as for loops. While manually performing CPATs is tedious, the current state-of-the-art techniques for inferring transformation rules are not advanced enough to handle unseen variants of complex CPATs, resulting in a low recall rate. In this paper we present a novel, automated workflow that mines CPATs, infers the transformation rules, and then transplants them automatically to new target sites. We designed, implemented, evaluated and released this in a tool, PYEVOLVE. At its core is a novel data-flow, control-flow aware transformation rule inference engine. Our technique allows us to advance the state-of-the-art for transformation-by-example tools; without it, 70% of the code changes that PYEVOLVE transforms would not be possible to automate. Our thorough empirical evaluation of over 40,000 transformations shows 97% precision and 94% recall. By accepting 90% of CPATs generated by PYEVOLVE in famous open-source projects, developers confirmed its changes are useful.
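To make the notion of a CPAT concrete, here is a minimal sketch (Python 3.9+, standard library only) of one hand-written, hypothetical transformation rule. It uses the `ast` module to rewrite the pair `acc = 0` / `for v in it: acc += v` into `acc = sum(it)`. This is not PyEvolve's rule inference engine. The rule and the names `apply_sum_rule`, `_rewrite_block`, `_zero_init_target` and `_is_sum_loop` are illustrative assumptions. The rule is purely syntactic, so it does not check data flow; for example, it does not verify whether the loop variable is used after the loop.

```python
import ast
from typing import List, Optional


def _zero_init_target(stmt: ast.stmt) -> Optional[str]:
    """Return the name bound by a statement of the form `name = 0`, else None."""
    if (
        isinstance(stmt, ast.Assign)
        and len(stmt.targets) == 1
        and isinstance(stmt.targets[0], ast.Name)
        and isinstance(stmt.value, ast.Constant)
        and type(stmt.value.value) is int  # exclude False and 0.0
        and stmt.value.value == 0
    ):
        return stmt.targets[0].id
    return None


def _is_sum_loop(stmt: ast.stmt, acc: str) -> bool:
    """True if stmt is exactly `for v in <iterable>: acc += v` with no else clause."""
    if not (isinstance(stmt, ast.For) and isinstance(stmt.target, ast.Name)):
        return False
    if stmt.orelse or len(stmt.body) != 1:
        return False
    body = stmt.body[0]
    return (
        isinstance(body, ast.AugAssign)
        and isinstance(body.op, ast.Add)
        and isinstance(body.target, ast.Name)
        and body.target.id == acc
        and isinstance(body.value, ast.Name)
        and body.value.id == stmt.target.id
    )


def _rewrite_block(stmts: List[ast.stmt]) -> List[ast.stmt]:
    """Replace each adjacent `acc = 0` + summing-loop pair with `acc = sum(<iterable>)`."""
    out: List[ast.stmt] = []
    i = 0
    while i < len(stmts):
        acc = _zero_init_target(stmts[i])
        if acc is not None and i + 1 < len(stmts) and _is_sum_loop(stmts[i + 1], acc):
            iterable_src = ast.unparse(stmts[i + 1].iter)
            # Build the replacement by parsing a template rather than hand-building nodes.
            out.append(ast.parse(f"{acc} = sum({iterable_src})").body[0])
            i += 2  # consume both the initialisation and the loop
        else:
            out.append(stmts[i])
            i += 1
    return out


def apply_sum_rule(source: str) -> str:
    """Apply the toy rule to every statement block in `source`; return the new source."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Statement lists live in these fields (modules, functions, loops, if, try, with...).
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if isinstance(block, list) and block:
                setattr(node, field, _rewrite_block(block))
    return ast.unparse(tree)
```

The rule leaves the semantically equivalent variant `total = total + x` untouched because that statement is an `Assign`, not an `AugAssign`. This illustrates the kind of unseen variant that, according to the abstract, causes low recall in existing transformation-by-example tools.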
Author Dilhara, Malinda (University of Colorado Boulder, USA; malinda.malwala@colorado.edu)
Dig, Danny (University of Colorado Boulder, JetBrains Research, USA; danny.dig@colorado.edu)
Ketkar, Ameya (Uber Technologies Inc., USA; ketkara@uber.com)
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ICSE48619.2023.00091
Discipline Computer Science
EISBN 9781665457019
1665457015
EISSN 1558-1225
EndPage 1007
ExternalDocumentID 10172702
Genre orig-research
GrantInformation NSF (funder ID 10.13039/100000001), grants CNS-1941898 and CNS-2213763
ISICitedReferencesCount 7
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
OpenAccessLink https://ieeexplore.ieee.org/document/10172702
PageCount 13
PublicationDate 2023-05-01
PublicationTitle Proceedings / International Conference on Software Engineering
PublicationTitleAbbrev ICSE
Publisher IEEE
StartPage 995
SubjectTerms Codes
Engines
Machine learning
Organ transplantation
Program synthesis
Program transformation
Programming by example
Python
Repetitive code changes
Software
Software engineering
Transformation by Example
Transforms
URI https://ieeexplore.ieee.org/document/10172702