Detecting and Characterizing Bots that Commit Code

Bibliographic Details
Published in: 2020 IEEE/ACM 17th International Conference on Mining Software Repositories (MSR), pp. 209-219
Main Authors: Dey, Tapajit; Mousavi, Sara; Ponce, Eduardo; Fry, Tanner; Vasilescu, Bogdan; Filippova, Anna; Mockus, Audris
Format: Conference Proceeding
Language: English
Published: ACM 01.05.2020
Subjects:
ISSN: 2574-3864
Abstract Background: Some developer activity traditionally performed manually, such as making code commits or opening, managing, and closing issues, is increasingly subject to automation in many OSS projects. Specifically, such activity is often performed by tools that react to events or run at specific times. We refer to such automation tools as bots. In many software mining scenarios related to developer productivity or code quality, it is desirable to identify bots in order to separate their actions from the actions of individuals. Aim: To find an automated way of identifying bots and the code they commit, and to characterize the types of bots based on their activity patterns. Method and Result: We propose BIMAN, a systematic approach to detecting bots using author names, commit messages, files modified by the commit, and projects associated with the commits. On our test data, the AUC-ROC was 0.9. We also characterized these bots based on the time patterns of their code commits and the types of files modified, and found that they primarily work with documentation files and web pages, and that these files are most prevalent in HTML and JavaScript ecosystems. We have compiled a shareable dataset containing detailed information about the 461 bots we found (all of which have more than 1000 commits) and the 13,762,430 commits they created.
Author_xml – sequence: 1
  givenname: Tapajit
  surname: Dey
  fullname: Dey, Tapajit
  email: tdey2@vols.utk.edu
  organization: The University of Tennessee,Knoxville,TN,USA
– sequence: 2
  givenname: Sara
  surname: Mousavi
  fullname: Mousavi, Sara
  email: mousavi@vols.utk.edu
  organization: The University of Tennessee,Knoxville,TN,USA
– sequence: 3
  givenname: Eduardo
  surname: Ponce
  fullname: Ponce, Eduardo
  email: eponcemo@utk.edu
  organization: The University of Tennessee,Knoxville,TN,USA
– sequence: 4
  givenname: Tanner
  surname: Fry
  fullname: Fry, Tanner
  email: tfry2@vols.utk.edu
  organization: The University of Tennessee,Knoxville,TN,USA
– sequence: 5
  givenname: Bogdan
  surname: Vasilescu
  fullname: Vasilescu, Bogdan
  email: vasilescu@cmu.edu
  organization: Carnegie Mellon University,Pittsburgh,PA,USA
– sequence: 6
  givenname: Anna
  surname: Filippova
  fullname: Filippova, Anna
  email: annafil@github.com
  organization: Github,San Francisco,CA,USA
– sequence: 7
  givenname: Audris
  surname: Mockus
  fullname: Mockus, Audris
  email: audris@utk.edu
  organization: The University of Tennessee,Knoxville,TN,USA
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1145/3379597.3387478
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 1450375170
9781450375177
EISSN 2574-3864
EndPage 219
ExternalDocumentID 10148716
Genre orig-research
GrantInformation_xml – fundername: NSF
  grantid: CNS-1925615,IIS-1633437,IIS-1901102,1717415,1901311
  funderid: 10.13039/100000001
GroupedDBID 6IE
6IL
6IN
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
OCL
RIE
RIL
ID FETCH-LOGICAL-a276t-3292f46d060b559780114081889e3311585a1e6ee6da03e5059edc9a020559323
IEDL.DBID RIE
ISICitedReferencesCount 28
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001017777500023&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:21:39 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a276t-3292f46d060b559780114081889e3311585a1e6ee6da03e5059edc9a020559323
PageCount 11
ParticipantIDs ieee_primary_10148716
PublicationCentury 2000
PublicationDate 2020-May
PublicationDateYYYYMMDD 2020-05-01
PublicationDate_xml – month: 05
  year: 2020
  text: 2020-May
PublicationDecade 2020
PublicationTitle 2020 IEEE/ACM 17th International Conference on Mining Software Repositories (MSR)
PublicationTitleAbbrev MSR
PublicationYear 2020
Publisher ACM
Publisher_xml – name: ACM
SSID ssj0002858378
ssj0003211714
Score 2.1018267
SourceID ieee
SourceType Publisher
StartPage 209
SubjectTerms automated commits
Automation
bots
Chatbots
Codes
ensemble model
Productivity
random forest
social coding platforms
Social networking (online)
software engineering
Systematics
Web pages
Title Detecting and Characterizing Bots that Commit Code
URI https://ieeexplore.ieee.org/document/10148716
WOSCitedRecordID wos001017777500023&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint