Learning from, Understanding, and Supporting DevOps Artifacts for Docker

Bibliographic details
Published in: 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE), pp. 38-49
Main authors: Henkel, Jordan; Bird, Christian; Lahiri, Shuvendu K.; Reps, Thomas
Format: Conference paper
Language: English
Published: ACM, 27 June 2020
Subjects: DevOps, Docker, Mining, Static Checking
ISSN: 1558-1225
Online access: Get full text
Abstract With the growing use of DevOps tools and frameworks, there is an increased need for tools and techniques that support more than code. The current state-of-the-art in static developer assistance for tools like Docker is limited to shallow syntactic validation. We identify three core challenges in the realm of learning from, understanding, and supporting developers writing DevOps artifacts: (i) nested languages in DevOps artifacts, (ii) rule mining, and (iii) the lack of semantic rule-based analysis. To address these challenges we introduce a toolset, binnacle, that enabled us to ingest 900,000 GitHub repositories. Focusing on Docker, we extracted approximately 178,000 unique Dockerfiles, and also identified a Gold Set of Dockerfiles written by Docker experts. We addressed challenge (i) by reducing the number of effectively uninterpretable nodes in our ASTs by over 80% via a technique we call phased parsing. To address challenge (ii), we introduced a novel rule-mining technique capable of recovering two-thirds of the rules in a benchmark we curated. Through this automated mining, we were able to recover 16 new rules that were not found during manual rule collection. To address challenge (iii), we manually collected a set of rules for Dockerfiles from commits to the files in the Gold Set. These rules encapsulate best practices, avoid docker build failures, and improve image size and build latency. We created an analyzer that used these rules, and found that, on average, Dockerfiles on GitHub violated the rules five times more frequently than the Dockerfiles in our Gold Set. We also found that industrial Dockerfiles fared no better than those sourced from GitHub. The learned rules and analyzer in binnacle can be used to aid developers in the IDE when creating Dockerfiles, and in a post-hoc fashion to identify issues in, and to improve, existing Dockerfiles.
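To make the abstract's ideas of phased parsing and semantic rule checking more concrete, here is a minimal Python sketch. It is not binnacle's implementation. The function names (`parse_dockerfile`, `parse_shell`, `check_apt_get_yes`) are hypothetical, and the rule it checks (an `apt-get install` should pass a yes flag so the build does not stall on a prompt) is chosen only for illustration. The shell phase is deliberately naive and assumes that separators such as `&&` are surrounded by whitespace.

```python
import shlex

def parse_dockerfile(text):
    """Phase 1 (top level): split a Dockerfile into (KEYWORD, arguments) pairs,
    joining lines that end with a backslash continuation."""
    instructions = []
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("\\"):
            pending += line[:-1] + " "  # continuation: keep accumulating
            continue
        pending += line
        keyword, _, rest = pending.partition(" ")
        instructions.append((keyword.upper(), rest.strip()))
        pending = ""
    return instructions

def parse_shell(script):
    """Phase 2 (nested language): split the shell text embedded in a RUN
    instruction into commands, each a list of tokens.
    Assumption: separators are whitespace-delimited."""
    commands, current = [], []
    for token in shlex.split(script):
        if token in ("&&", "||", ";", "|"):
            if current:
                commands.append(current)
                current = []
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands

def check_apt_get_yes(dockerfile_text):
    """Illustrative rule: every `apt-get ... install` carries a yes flag."""
    yes_flags = ("-y", "--yes", "--assume-yes")
    violations = []
    for keyword, args in parse_dockerfile(dockerfile_text):
        if keyword != "RUN":
            continue
        for cmd in parse_shell(args):
            if cmd[0] == "apt-get" and "install" in cmd:
                if not any(flag in cmd for flag in yes_flags):
                    violations.append(" ".join(cmd))
    return violations

EXAMPLE = """\
FROM ubuntu:20.04
RUN apt-get update && \\
    apt-get install curl
RUN apt-get install -y git
"""

if __name__ == "__main__":
    print(check_apt_get_yes(EXAMPLE))
```

The point of the two phases is that the rule is written against parsed shell commands rather than against the opaque string that follows `RUN`, which is the kind of node a purely top-level parse leaves uninterpreted.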
Author Henkel, Jordan
Bird, Christian
Lahiri, Shuvendu K.
Reps, Thomas
Author_xml – sequence: 1
  givenname: Jordan
  surname: Henkel
  fullname: Henkel, Jordan
  email: jjhenkel@cs.wisc.edu
  organization: University of Wisconsin-Madison,USA
– sequence: 2
  givenname: Christian
  surname: Bird
  fullname: Bird, Christian
  email: Christian.Bird@microsoft.com
  organization: Microsoft Research,USA
– sequence: 3
  givenname: Shuvendu K.
  surname: Lahiri
  fullname: Lahiri, Shuvendu K.
  email: Shuvendu.Lahiri@microsoft.com
  organization: Microsoft Research,USA
– sequence: 4
  givenname: Thomas
  surname: Reps
  fullname: Reps, Thomas
  email: reps@cs.wisc.edu
  organization: University of Wisconsin-Madison,USA
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1145/3377811.3380406
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore Digital Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 1450371213
9781450371216
EISSN 1558-1225
EndPage 49
ExternalDocumentID 9284129
Genre orig-research
GrantInformation_xml – fundername: ONR
  grantid: N00014-17-1-2889,N00014-19-1-2318
  funderid: 10.13039/100000006
ID FETCH-LOGICAL-a382t-2c979ab2e2fecb43d1881d1c62fab8ca8ae568e9c7b6883a387247f80e8bf6fd3
IEDL.DBID RIE
ISICitedReferencesCount 40
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000652529800004&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:32:58 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a382t-2c979ab2e2fecb43d1881d1c62fab8ca8ae568e9c7b6883a387247f80e8bf6fd3
OpenAccessLink https://dl.acm.org/doi/pdf/10.1145/3377811.3380406
PageCount 12
ParticipantIDs ieee_primary_9284129
PublicationCentury 2000
PublicationDate 2020-06-27
PublicationDateYYYYMMDD 2020-06-27
PublicationDate_xml – month: 06
  year: 2020
  text: 2020-06-27
  day: 27
PublicationDecade 2020
PublicationTitle 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE)
PublicationTitleAbbrev ICSE
PublicationYear 2020
Publisher ACM
Publisher_xml – name: ACM
SSID ssj0006499
ssj0002870079
Score 2.379887
SourceID ieee
SourceType Publisher
StartPage 38
SubjectTerms DevOps
Docker
Mining
Static Checking
Title Learning from, Understanding, and Supporting DevOps Artifacts for Docker
URI https://ieeexplore.ieee.org/document/9284129
WOSCitedRecordID wos000652529800004&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint