Implementation of Association Rule Mining Algorithms on Distributed Data Processing Platforms

Bibliographic Details
Published in: 2019 4th International Conference on Computer Science and Engineering (UBMK), pp. 79-84
Main Authors: Sesver, Duygu; Tuna, Sabah; Aktas, Mehmet S.; Kalipsiz, Oya; Kanli, Alper Nebi; Turgut, Umut Orcun
Format: Conference Proceeding
Language: English, Turkish
Published: IEEE 01.09.2019
Description
Summary: Association rule mining is a frequently used data mining technique that aims to find the items that occur frequently in the data. Current large-scale data processing and analysis platforms are not focused on data mining, so they do not offer large-scale libraries for association rule mining algorithms. In the scope of this research, a library of association rule mining algorithms has been developed for a large-scale data processing platform. The Apache Spark platform was chosen for the case study because of its widespread use. The Apriori, Eclat and Pascal algorithms were implemented on this platform so as to benefit from the Map-Reduce programming model. The resulting library is comparatively analyzed in terms of performance metrics on big data processing platforms with single and multiple nodes. The implemented methods are also compared with the performance of the FP-Growth algorithm provided by the Spark platform. The results show that, when tested on large-scale data, the Apriori algorithm gives much better performance values than the other algorithms when switching from a single-node cluster environment to a multi-node cluster environment.
DOI: 10.1109/UBMK.2019.8907040
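
Illustrative note: the summary describes Apriori implemented on top of a Map-Reduce programming model. The sketch below is not the authors' library and does not use Spark. It is a minimal plain-Python illustration, under the assumption that transactions are sets of items already split into partitions, of how per-partition candidate counting (the map step) and summing of partial counts (the reduce step) fit into the level-wise Apriori loop. The names `count_candidates`, `apriori`, and `min_support` are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import combinations


def count_candidates(partition, candidates):
    """Map step: count occurrences of each candidate itemset in one partition."""
    counts = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def apriori(partitions, min_support):
    """Return {itemset: support count} for itemsets with count >= min_support."""
    # Level-1 candidates: every single item seen in any transaction.
    items = {item for part in partitions for t in part for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Reduce step: sum the partial counts produced for each partition.
        total = Counter()
        for part in partitions:
            total.update(count_candidates(part, candidates))
        level = {s: c for s, c in total.items() if c >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune every
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent
```

On a real cluster the per-partition counting would run in parallel on the worker nodes and only the partial counts would be combined centrally; here both steps run sequentially in one process to keep the example self-contained.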