Accelerating similarity-based model matching using dual hashing.

Uloženo v:
Podrobná bibliografie
Název: Accelerating similarity-based model matching using dual hashing.
Autoři: He, Xiao, Liu, Yi, He, Huihong
Zdroj: Software & Systems Modeling; Aug2025, Vol. 24 Issue 4, p1011-1034, 24p
Témata: EUCLIDEAN distance, OPTIMIZATION algorithms, MATHEMATICAL optimization, ELECTRONIC data processing
Abstrakt: Similarity-based model matching is the cornerstone of model versioning. It pairs model elements based on a distance metric (e.g., edit distance). However, calculating the distances between elements is computationally expensive. Consequently, a similarity-based matcher typically suffers from performance issues when the model size increases. Based on observation, there are two main causes of the high computation cost: (1) when matching an element p, the matcher calculates the distance between p and every candidate element q, despite the obvious dissimilarity between p and q; (2) the matcher always calculates the distance between p and q ′ , even though q and q ′ are very similar and the distance between p and q is already known. This paper proposes a dual-hash-based approach, which employs two entirely different hashing techniques—similarity-preserving hashing and integrity-based hashing—to accelerate similarity-based model matching. With similarity-preserving hashing, our approach can quickly filter out the dissimilar candidate elements according to their similarity hashes computed using our similarity-preserving hash function, which maps an element to a 64-bit binary hash. With integrity-based hashing, our approach can cache and reuse computed distance values by associating them with the checksums of model elements. We also propose an index structure to facilitate hash-based model matching. Our approach has been implemented and integrated into EMF Compare. We evaluate our approach using open-source Ecore and UML models. The results show that our hash function is effective in preserving the similarity between model elements and our matching approach reduces time costs by 20–88% while assuring the matching results consistent with EMF Compare. [ABSTRACT FROM AUTHOR]
Copyright of Software & Systems Modeling is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Databáze: Complementary Index
Popis
Abstrakt:Similarity-based model matching is the cornerstone of model versioning. It pairs model elements based on a distance metric (e.g., edit distance). However, calculating the distances between elements is computationally expensive. Consequently, a similarity-based matcher typically suffers from performance issues when the model size increases. Based on observation, there are two main causes of the high computation cost: (1) when matching an element p, the matcher calculates the distance between p and every candidate element q, despite the obvious dissimilarity between p and q; (2) the matcher always calculates the distance between p and q ′ , even though q and q ′ are very similar and the distance between p and q is already known. This paper proposes a dual-hash-based approach, which employs two entirely different hashing techniques—similarity-preserving hashing and integrity-based hashing—to accelerate similarity-based model matching. With similarity-preserving hashing, our approach can quickly filter out the dissimilar candidate elements according to their similarity hashes computed using our similarity-preserving hash function, which maps an element to a 64-bit binary hash. With integrity-based hashing, our approach can cache and reuse computed distance values by associating them with the checksums of model elements. We also propose an index structure to facilitate hash-based model matching. Our approach has been implemented and integrated into EMF Compare. We evaluate our approach using open-source Ecore and UML models. The results show that our hash function is effective in preserving the similarity between model elements and our matching approach reduces time costs by 20–88% while assuring the matching results consistent with EMF Compare. [ABSTRACT FROM AUTHOR]
ISSN:16191366
DOI:10.1007/s10270-024-01173-1