Predicting software reuse using machine learning techniques—A case study on open-source Java software systems.

Detailed bibliography
Title: Predicting software reuse using machine learning techniques—A case study on open-source Java software systems.
Authors: Yeow, Matthew Yit Hang; Chong, Chun Yong; Lim, Mei Kuan; Yee Yen, Yuen
Source: PLoS ONE; 2/13/2025, Vol. 20 Issue 2, p1-30, 30p
Subject Terms: MACHINE learning, SYSTEMS software, INDUSTRIAL costs, REGRESSION analysis, MAINTAINABILITY (Engineering), SOFTWARE measurement
Abstract: Software reuse is an essential practice to increase efficiency and reduce costs in software production. Software reuse practices include reusing artifacts, libraries, components, packages, and APIs. Identifying suitable software for reuse requires pinpointing potential candidates. However, there are no objective methods in place to measure software reuse, which makes it challenging to identify highly reusable software. Software reuse research mainly addresses two hurdles: 1) identifying reusable candidates effectively and efficiently, and 2) selecting high-quality software components that improve maintainability and extensibility. This paper proposes automating software reuse prediction by leveraging machine learning (ML) algorithms, enabling future researchers and practitioners to better identify highly reusable software. Our approach uses cross-project code clone detection to establish the ground truth for software reuse, identifying code clones across popular GitHub projects as indicators of potential reuse candidates. Software metrics were extracted from Maven artifacts and used to train classification and regression models to predict and estimate software reuse. The average F1-score of the ML classification models is 77.19%. The best-performing model, Ridge Regression, achieved an F1-score of 79.17%. Additionally, this research aims to assist developers by identifying key metrics that significantly impact software reuse. Our findings suggest that the file-level PUA (Public Undocumented API) metric is the most important factor influencing software reuse. We also present suitable value ranges for the top five important metrics that developers can follow to create highly reusable software. Furthermore, we developed a tool that utilizes the trained models to predict the reuse potential of existing GitHub projects and rank Maven artifacts by their domain. [ABSTRACT FROM AUTHOR]
Copyright of PLoS ONE is the property of Public Library of Science and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
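Note: The abstract reports model quality as an F1-score, the harmonic mean of precision and recall. The following is a minimal sketch of how that figure is computed for a binary "reused / not reused" label. The label encoding and the sample predictions are hypothetical and are not taken from the paper, whose data and tooling are not described in this record.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1-score of the positive class for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth (1 = artifact was reused) and model predictions.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

score = f1_score(y_true, y_pred)
print(f"F1-score: {score:.2%}")  # prints "F1-score: 80.00%"
```

Here precision and recall are both 4/5, so their harmonic mean is also 0.8. An F1-score near 79%, as reported for the best model, indicates a similar balance between false positives and false negatives.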
Database: Complementary Index
FullText Text:
  Availability: 0
CustomLinks:
  – Url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&db=pmc&term=1932-6203[TA]+AND+1[PG]+AND+2025[PDAT]
    Name: FREE - PubMed Central (ISSN based link)
    Category: fullText
    Text: Full Text
    Icon: https://imageserver.ebscohost.com/NetImages/iconPdf.gif
    MouseOverText: Check this PubMed for the article full text.
  – Url: https://resolver.ebscohost.com/openurl?sid=EBSCO:edb&genre=article&issn=19326203&ISBN=&volume=20&issue=2&date=20250213&spage=1&pages=1-30&title=PLoS ONE&atitle=Predicting%20software%20reuse%20using%20machine%20learning%20techniques%E2%80%94A%20case%20study%20on%20open-source%20Java%20software%20systems.&aulast=Yeow%2C%20Matthew%20Yit%20Hang&id=DOI:10.1371/journal.pone.0314512
    Name: Full Text Finder
    Category: fullText
    Text: Full Text Finder
    Icon: https://imageserver.ebscohost.com/branding/images/FTF.gif
    MouseOverText: Full Text Finder
  – Url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=EBSCO&SrcAuth=EBSCO&DestApp=WOS&ServiceName=TransferToWoS&DestLinkType=GeneralSearchSummary&Func=Links&author=Yeow%20MYH
    Name: ISI
    Category: fullText
    Text: Find this article in Web of Science
    Icon: https://imagesrvr.epnet.com/ls/20docs.gif
    MouseOverText: Find this article in Web of Science
Header DbId: edb
DbLabel: Complementary Index
An: 183031078
RelevancyScore: 1023
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PreciseRelevancyScore: 1023.06793212891
IllustrationInfo
Items – Name: Title
  Label: Title
  Group: Ti
  Data: Predicting software reuse using machine learning techniques—A case study on open-source Java software systems.
– Name: Author
  Label: Authors
  Group: Au
  Data: Yeow, Matthew Yit Hang; Chong, Chun Yong; Lim, Mei Kuan; Yee Yen, Yuen
– Name: TitleSource
  Label: Source
  Group: Src
  Data: PLoS ONE; 2/13/2025, Vol. 20 Issue 2, p1-30, 30p
– Name: Subject
  Label: Subject Terms
  Group: Su
  Data: MACHINE learning; SYSTEMS software; INDUSTRIAL costs; REGRESSION analysis; MAINTAINABILITY (Engineering); SOFTWARE measurement
PLink https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edb&AN=183031078
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1371/journal.pone.0314512
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 30
        StartPage: 1
    Subjects:
      – SubjectFull: MACHINE learning
        Type: general
      – SubjectFull: SYSTEMS software
        Type: general
      – SubjectFull: INDUSTRIAL costs
        Type: general
      – SubjectFull: REGRESSION analysis
        Type: general
      – SubjectFull: MAINTAINABILITY (Engineering)
        Type: general
      – SubjectFull: SOFTWARE measurement
        Type: general
    Titles:
      – TitleFull: Predicting software reuse using machine learning techniques—A case study on open-source Java software systems.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Yeow, Matthew Yit Hang
      – PersonEntity:
          Name:
            NameFull: Chong, Chun Yong
      – PersonEntity:
          Name:
            NameFull: Lim, Mei Kuan
      – PersonEntity:
          Name:
            NameFull: Yee Yen, Yuen
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 13
              M: 02
              Text: 2/13/2025
              Type: published
              Y: 2025
          Identifiers:
            – Type: issn-print
              Value: 19326203
          Numbering:
            – Type: volume
              Value: 20
            – Type: issue
              Value: 2
          Titles:
            – TitleFull: PLoS ONE
              Type: main
ResultId 1