Classification of APIs by Hierarchical Clustering

APIs can be classified according to the programming domains (e.g., GUIs, databases, collections, or security) that they address. Such classification is vital in searching repositories (e.g., the Maven Central Repository for Java) and for understanding the technology stack used in software projects....

Bibliographic details
Published in: 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC), pp. 233-243
Main authors: Hartel, Johannes; Aksu, Hakan; Lammel, Ralf
Format: Conference proceeding
Language: English
Published: ACM, 1 May 2018
Subjects:
ISSN: 2643-7171
Online access: Full text
Abstract: APIs can be classified according to the programming domains (e.g., GUIs, databases, collections, or security) that they address. Such classification is vital in searching repositories (e.g., the Maven Central Repository for Java) and for understanding the technology stack used in software projects. We apply hierarchical clustering to a curated suite of Java APIs to compare the computed API clusters with preexisting API classifications. Clustering entails various parameters (e.g., the choice of IDF versus LSI versus LDA). We describe the corresponding variability in terms of a feature model. We exercise all possible configurations to determine the maximum correlation with respect to two baselines: i) a smaller suite of APIs manually classified in previous research; ii) a larger suite of APIs from the Maven Central Repository, thereby taking advantage of crowd-sourced classification while relying on a threshold-based approach for identifying important APIs and versions thereof, subject to an API dependency analysis on GitHub. We discuss the configurations found in this way and we examine the influence of particular features on the correlation between computed clusters and baselines. To this end, we also leverage interactive exploration of the parameter space and the resulting dendrograms. In this manner, we can also identify issues with the use of classifiers (e.g., missing classifiers) in the baselines and limitations of the clustering approach.
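The general idea in the abstract (weight API vocabulary, cluster hierarchically, compare against a baseline classification) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the paper: the toy token lists, the use of plain TF-IDF weighting (one of the options the abstract names alongside LSI and LDA), cosine distance, average linkage, cutting at a fixed number of clusters instead of building a full dendrogram, and the Rand index as a stand-in for the unspecified correlation measure.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each API is a bag of identifier tokens.
APIS = {
    "swing": ["button", "frame", "panel", "window", "event"],
    "awt": ["window", "frame", "event", "graphics", "button"],
    "jdbc": ["connection", "statement", "query", "result", "driver"],
    "hibernate": ["session", "query", "entity", "connection", "transaction"],
}
# Hypothetical baseline classification (a stand-in for a curated or
# crowd-sourced category per API).
BASELINE = {"swing": "GUI", "awt": "GUI", "jdbc": "Database", "hibernate": "Database"}


def tfidf(docs):
    """Return a sparse TF-IDF vector (dict token -> weight) per document."""
    n = len(docs)
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    return {
        name: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for name, tokens in docs.items()
    }


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def agglomerate(vecs, k):
    """Average-linkage agglomerative clustering, stopped at k clusters."""
    clusters = [frozenset([name]) for name in sorted(vecs)]

    def linkage(a, b):
        total = sum(cosine_distance(vecs[x], vecs[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


def rand_index(clusters, baseline):
    """Fraction of API pairs on which clustering and baseline agree."""
    label = {name: i for i, c in enumerate(clusters) for name in c}
    pairs = list(combinations(sorted(baseline), 2))
    agree = sum(
        (label[a] == label[b]) == (baseline[a] == baseline[b]) for a, b in pairs
    )
    return agree / len(pairs)


clusters = agglomerate(tfidf(APIS), k=2)
score = rand_index(clusters, BASELINE)
print(sorted(sorted(c) for c in clusters), score)
```

Swapping the weighting function, the distance, or the linkage in this sketch corresponds to choosing a different configuration; scoring each variant against the baseline and keeping the best one mirrors, in a very reduced form, the exhaustive search over configurations that the abstract describes.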
Affiliations: Hartel, Johannes (University of Koblenz-Landau); Aksu, Hakan (University of Koblenz-Landau); Lammel, Ralf (University of Koblenz-Landau)
DOI: 10.1145/3196321.3196344
Discipline: Computer Science
EISBN: 1450357148, 9781450357142
Subject terms: APIs; Clustering exploration; Computational modeling; Correlation; GitHub; Hierarchical clustering. Feature modeling; Java; Large scale integration; Maven Central Repository; Programming; Security; Software; Software development management
URI: https://ieeexplore.ieee.org/document/8973031