Classification of APIs by Hierarchical Clustering

APIs can be classified according to the programming domains (e.g., GUIs, databases, collections, or security) that they address. Such classification is vital in searching repositories (e.g., the Maven Central Repository for Java) and for understanding the technology stack used in software projects....

Bibliographic details
Published in: 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC), pp. 233-243
Main authors: Hartel, Johannes; Aksu, Hakan; Lammel, Ralf
Format: Conference paper
Language: English
Published: ACM, 1 May 2018
ISSN:2643-7171
Abstract APIs can be classified according to the programming domains (e.g., GUIs, databases, collections, or security) that they address. Such classification is vital in searching repositories (e.g., the Maven Central Repository for Java) and for understanding the technology stack used in software projects. We apply hierarchical clustering to a curated suite of Java APIs to compare the computed API clusters with preexisting API classifications. Clustering entails various parameters (e.g., the choice of IDF versus LSI versus LDA). We describe the corresponding variability in terms of a feature model. We exercise all possible configurations to determine the maximum correlation with respect to two baselines: i) a smaller suite of APIs manually classified in previous research; ii) a larger suite of APIs from the Maven Central Repository, thereby taking advantage of crowd-sourced classification while relying on a threshold-based approach for identifying important APIs and versions thereof, subject to an API dependency analysis on GitHub. We discuss the configurations found in this way and we examine the influence of particular features on the correlation between computed clusters and baselines. To this end, we also leverage interactive exploration of the parameter space and the resulting dendrograms. In this manner, we can also identify issues with the use of classifiers (e.g., missing classifiers) in the baselines and limitations of the clustering approach.
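Illustrative sketch (not part of the catalog record and not the authors' implementation): the workflow described in the abstract, namely enumerating every configuration of a feature model, clustering hierarchically under each configuration, and scoring the result against a baseline classification, might look roughly like the following. The toy corpus, the two-feature model (`use_idf`, `linkage`), and all function names are hypothetical. LSI and LDA are omitted, and the Rand index stands in for the paper's correlation measure, which the abstract does not specify.

```python
import math
from collections import Counter
from itertools import combinations, product

# Hypothetical toy corpus: API name -> terms extracted from it.
APIS = {
    "swing": ["button", "window", "panel", "event"],
    "awt": ["window", "event", "canvas", "button"],
    "jdbc": ["query", "connection", "table", "row"],
    "hibernate": ["query", "table", "session", "connection"],
}
# Hypothetical baseline classification: API name -> domain.
BASELINE = {"swing": "gui", "awt": "gui",
            "jdbc": "database", "hibernate": "database"}
# Tiny stand-in for a feature model: each feature with its allowed values.
FEATURES = {"use_idf": [False, True], "linkage": ["single", "complete"]}


def vectorize(docs, use_idf):
    """Build sparse term vectors, optionally weighted by smoothed IDF."""
    df = Counter(t for terms in docs.values() for t in set(terms))
    n = len(docs)
    vectors = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        vectors[name] = {
            t: c * (math.log(n / df[t]) + 1.0 if use_idf else 1.0)
            for t, c in tf.items()
        }
    return vectors


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return 1.0 - dot / norm if norm else 1.0


def agglomerate(vectors, k, linkage):
    """Merge the two closest clusters bottom-up until k clusters remain."""
    pick = {"single": min, "complete": max}[linkage]
    clusters = [frozenset([name]) for name in sorted(vectors)]

    def dist(c1, c2):
        return pick(cosine_distance(vectors[a], vectors[b])
                    for a in c1 for b in c2)

    while len(clusters) > k:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: dist(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


def rand_index(clusters, baseline):
    """Fraction of API pairs on which clustering and baseline agree."""
    label = {name: i for i, c in enumerate(clusters) for name in c}
    pairs = list(combinations(sorted(baseline), 2))
    agree = sum((label[a] == label[b]) == (baseline[a] == baseline[b])
                for a, b in pairs)
    return agree / len(pairs)


def explore():
    """Score every configuration of the feature model against the baseline."""
    results = []
    for values in product(*FEATURES.values()):
        config = dict(zip(FEATURES, values))
        vectors = vectorize(APIS, config["use_idf"])
        clusters = agglomerate(vectors, k=2, linkage=config["linkage"])
        results.append((rand_index(clusters, BASELINE), config))
    return results


results = explore()
best_score, best_config = max(results, key=lambda r: r[0])
```

Cutting the merge sequence at a fixed `k` is a simplification; a full dendrogram, as mentioned in the abstract, would record every merge and its distance rather than stopping early.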
Author Hartel, Johannes
Lammel, Ralf
Aksu, Hakan
Author_xml – sequence: 1
  givenname: Johannes
  surname: Hartel
  fullname: Hartel, Johannes
  organization: University of Koblenz-Landau
– sequence: 2
  givenname: Hakan
  surname: Aksu
  fullname: Aksu, Hakan
  organization: University of Koblenz-Landau
– sequence: 3
  givenname: Ralf
  surname: Lammel
  fullname: Lammel, Ralf
  organization: University of Koblenz-Landau
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1145/3196321.3196344
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 1450357148
9781450357142
EISSN 2643-7171
EndPage 243
ExternalDocumentID 8973031
Genre orig-research
GroupedDBID 6IE
6IF
6IL
6IN
AAJGR
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
OCL
RIE
RIL
ID FETCH-LOGICAL-a247t-871aa034d5b5b0e81909111cfca149cdadadc2a648a314556dbbeb92a3aca9f13
IEDL.DBID RIE
ISICitedReferencesCount 7
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000555427300023&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 13 06:23:00 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a247t-871aa034d5b5b0e81909111cfca149cdadadc2a648a314556dbbeb92a3aca9f13
PageCount 11
ParticipantIDs ieee_primary_8973031
PublicationCentury 2000
PublicationDate 2018-May
PublicationDateYYYYMMDD 2018-05-01
PublicationDate_xml – month: 05
  year: 2018
  text: 2018-May
PublicationDecade 2010
PublicationTitle 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC)
PublicationTitleAbbrev ICPC
PublicationYear 2018
Publisher ACM
Publisher_xml – name: ACM
SSID ssj0003203477
ssj0002869941
Score 2.0865493
Snippet APIs can be classified according to the programming domains (e.g., GUIs, databases, collections, or security) that they address. Such classification is vital...
SourceID ieee
SourceType Publisher
StartPage 233
SubjectTerms APIs
Clustering exploration
Computational modeling
Correlation
GitHub
Hierarchical clustering
Feature modeling
Java
Large scale integration
Maven Central Repository
Programming
Security
Software
Software development management
Title Classification of APIs by Hierarchical Clustering
URI https://ieeexplore.ieee.org/document/8973031
WOSCitedRecordID wos000555427300023&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV07a8MwED7S0KFT2ialbzR0rBPr4cgaS2hIl5ChhWxBjzMEglPygv77SrLjUuhSPNjyIFsnWXeS7_s-gKfccumYwESxQiTCMpX4KJwnWWbSwjjuMleJTcjpNJ_P1awFzw0WBhFj8hn2w2X8l-_Wdh-2yga58uMxgKZPpJQVVqvZT2H5UB0xmaHMWcqFlDWbDxXZIA42RvvxLMQvOZXoTcad_73HOfR-YHlk1jicC2hheQmdoy4DqT_TLtCodBlygKLZybogL7O3LTFfZLIMgOOof7Iio9U-0CT4unrwMX59H02SWhoh0UzInZ_DqNa-gS4z3qgY3HqYtWxhtV_yWKf9YZkeilzzQEU-dMagUUxzbbUqKL-Cdrku8RpIiLDQpFKnvmqKVNMCTWC9d_4ZVskb6AYLLD4r9otF3fjbv2_fwZkPKfIqJfAe2rvNHh_g1B52y-3mMXbZN6OblX8
linkProvider IEEE
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3Na8IwFH-IG2wnt-nY93LYcdXmo6Y5Dpkoc6UHB94kXwVBVPwY7L9fklbHYJfRQ5se0uYlzXtJ3-_3A3hKNeWGMBsJUrCIaSIiF4XTKElUXChDTWJKsQmeZelkIvIaPB-wMNbakHxm2_4y_Ms3S73zW2WdVLjx6EHTRwljBJdorcOOCkm7Yo_K9GVKYso4r_h8MEs6YbgR3A5nxn4JqgR_0m_8703OoPUDzEP5weWcQ80uLqCxV2ZA1YfaBBy0Ln0WUDA8WhboJR9ukPpCg5mHHAcFlDnqzXeeKMHV1YKP_uu4N4gqcYRIEsa3bhbDUroGmkQ5s1rv2P28pQst3aJHG-kOTWSXpZJ6MvKuUcoqQSSVWooC00uoL5YLewXIx1hWxVzGrmpsscSFVZ733rhnaMGvoektMF2V_BfTqvE3f99-hJPB-H00HQ2zt1s4dQFGWiYI3kF9u97ZezjWn9vZZv0Quu8bndCYxg
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=proceeding&rft.title=2018+IEEE%2FACM+26th+International+Conference+on+Program+Comprehension+%28ICPC%29&rft.atitle=Classification+of+APIs+by+Hierarchical+Clustering&rft.au=Hartel%2C+Johannes&rft.au=Aksu%2C+Hakan&rft.au=Lammel%2C+Ralf&rft.date=2018-05-01&rft.pub=ACM&rft.eissn=2643-7171&rft.spage=233&rft.epage=243&rft_id=info:doi/10.1145%2F3196321.3196344&rft.externalDocID=8973031