The Python Software Quality Dataset

Bibliographic Details
Published in: Proceedings (EUROMICRO Conference on Software Engineering and Advanced Applications. Online) pp. 395-398
Main Authors: Moldovan, Vasilica-Andreea, Berciu, Liviu-Marian, Patcas, Rares-Danut
Format: Conference Proceeding
Language: English
Published: IEEE 28.08.2024
ISSN: 2376-9521
Abstract With Python's ascension as a dominant programming language, particularly in the fields of artificial intelligence and data science, the need for comprehensive datasets focusing on software quality within Python projects has become increasingly noticeable. This study introduces a detailed dataset designed to address this gap, enriching academic resources in software engineering. The dataset encompasses a wide array of software quality metrics on up to 80 projects, including 51,765,853 SonarQube issues, 268,506 SonarQube code quality metrics, 11,915 software refactoring records, and 155,127 pairs of bug-inducing and bug-fixing commits, along with 863,931 GitHub issue tracker entries. This extensive collection serves as a versatile tool for various research activities, enabling analysis of the relationships between technical debt and software refactorings, correlations between refactoring processes and bug resolution, and their overall impact on software maintainability and reliability. By offering a comprehensive and multifaceted dataset, this study significantly contributes to understanding and improving software quality in Python projects.
Author Moldovan, Vasilica-Andreea
Berciu, Liviu-Marian
Patcas, Rares-Danut
Author_xml – sequence: 1
  givenname: Vasilica-Andreea
  surname: Moldovan
  fullname: Moldovan, Vasilica-Andreea
  email: vasilica.moldovan@ubbcluj.ro
  organization: Babes-Bolyai University,Faculty of Mathematics and Informatics,Cluj-Napoca,Romania
– sequence: 2
  givenname: Liviu-Marian
  surname: Berciu
  fullname: Berciu, Liviu-Marian
  email: liviu.berciu@ubbcluj.ro
  organization: Babes-Bolyai University,Faculty of Mathematics and Informatics,Cluj-Napoca,Romania
– sequence: 3
  givenname: Rares-Danut
  surname: Patcas
  fullname: Patcas, Rares-Danut
  email: rares.patcas@ubbcluj.ro
  organization: Babes-Bolyai University,Faculty of Mathematics and Informatics,Cluj-Napoca,Romania
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/SEAA64295.2024.00066
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798350380262
EISSN 2376-9521
EndPage 398
ExternalDocumentID 10803427
Genre orig-research
IEDL.DBID RIE
ISICitedReferencesCount 0
IngestDate Wed Aug 27 02:24:08 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 4
ParticipantIDs ieee_primary_10803427
PublicationCentury 2000
PublicationDate 2024-Aug.-28
PublicationDateYYYYMMDD 2024-08-28
PublicationDate_xml – month: 08
  year: 2024
  text: 2024-Aug.-28
  day: 28
PublicationDecade 2020
PublicationTitle Proceedings (EUROMICRO Conference on Software Engineering and Advanced Applications. Online)
PublicationTitleAbbrev SEAA
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 395
SubjectTerms Computer bugs
Correlation
Data science
Focusing
Github Mining
Measurement
Python
Python Dataset
Refactoring
Software development management
Software engineering
Software Metrics
Software quality
Software reliability
SonarQube Mining
SZZ
Title The Python Software Quality Dataset
URI https://ieeexplore.ieee.org/document/10803427