Applying agglomerative hierarchical clustering algorithms to component identification for legacy systems

Bibliographic details
Published in:	Information and Software Technology, Vol. 53, No. 6, pp. 601-614
Main authors:	Cui, Jian Feng; Chae, Heung Seok
Format:	Journal Article
Language:	English
Published:	Amsterdam: Elsevier B.V., 1 June 2011; Elsevier Science Ltd
Subjects:	Agglomeration; Agglomerative hierarchical clustering algorithm; Algorithms; Cluster analysis; Clustering; Cohesion; Component identification; Computer programs; Criteria; Joining; Legacy systems; Similarity; Similarity measure; Similarity measures; Software; Software engineering; Software reengineering; Studies; Weighting scheme
ISSN:	0950-5849, 1873-6025

Description
Summary:	Component identification, the process of evolving legacy systems into finely organized component-based software systems, is a critical part of software reengineering. Currently, many component identification approaches have been developed based on agglomerative hierarchical clustering algorithms. However, there is a lack of thorough investigation of which algorithm is appropriate for component identification. This paper focuses on analyzing agglomerative hierarchical clustering algorithms in software reengineering, and then identifying their respective strengths and weaknesses in order to apply them effectively in future practical applications. A series of experiments were conducted for 18 clustering strategies combined according to various similarity measures, weighting schemes and linkage methods. Eleven subject systems with different application domains and source code sizes were used in the experiments. The component identification results are evaluated by the proposed size, coupling and cohesion criteria. The experimental results suggested that the employed similarity measures, weighting schemes and linkage methods can have various effects on component identification results with respect to the proposed size, coupling and cohesion criteria, so the hierarchical clustering algorithms produced quite different clustering results. According to the experimental results, it can be concluded that it is difficult to produce perfectly satisfactory results for a given clustering algorithm. Nevertheless, these algorithms demonstrated varied capabilities to identify components with respect to the proposed size, coupling and cohesion criteria.
DOI:	10.1016/j.infsof.2011.01.006