MR-FIMNA: An Efficient N-Lists-Based Algorithm for Mining Frequent Itemsets via the Hybrid Parallel

Bibliographic Details
Title: MR-FIMNA: An Efficient N-Lists-Based Algorithm for Mining Frequent Itemsets via the Hybrid Parallel
Authors: Hao-Yu Gu, Bin-Bin Guo, Ke Gong, Chi Zhang, Neelakandan Chandrasekaran, De-Cheng Miao
Source: Journal of Computers. 36:295-312
Publisher Information: Computer Society of the Republic of China, 2025.
Publication Year: 2025
Description: Frequent itemset mining (FIM), with its compound correlation structure and powerful association mining capabilities, has been used successfully in retail, fast selling, e-commerce, finance and other fields, where it has shown great advantages. However, as data volumes grow and expectations for response time rise, FIM faces three challenges in a big data environment: inefficient parallelism, inefficient merge performance and redundant search. To address these three problems, this paper proposes an optimized parallel FIM algorithm (MR-FIMNA) in the MapReduce framework. First, a grouping technique based on a greedy strategy for the 0-1 knapsack problem (GM-GSK) is developed for the stage of grouping frequent 1-itemsets, to reduce the limitations caused by cluster load imbalance in the parallel algorithm. Then, in the stage of mining frequent itemsets in parallel, a "previously abandon" strategy is proposed to improve the merge performance of the N-list structure, and an equivalent-superset pruning strategy is proposed to avoid redundant searches during mining. The MR-FIMNA algorithm was compared with other algorithms on four datasets: HIGGS, Adult, Susy and HTRU2. The experimental results show that the MR-FIMNA algorithm achieves a good speed-up ratio and uses fewer computing resources and less memory in a big data environment.
Document Type: Article
ISSN: 1991-1599
DOI: 10.63367/199115992025123606019
Accession Number: edsair.doi...........7e83a1fc71ce50d16bd2f57ff42e21ad
Database: OpenAIRE
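
Note: The abstract mentions grouping frequent 1-itemsets with a greedy, knapsack-style strategy to balance load across the cluster, but gives no details of GM-GSK itself. The Python sketch below is therefore not the authors' method. It shows one generic greedy balancing heuristic (heaviest item first, always into the currently lightest group) under the assumption that each item has a precomputed load estimate. The function name `group_items_greedy`, the `item_loads` mapping and the sample values are all hypothetical.

```python
import heapq


def group_items_greedy(item_loads, num_groups):
    """Greedily split items into num_groups groups with similar total load.

    item_loads: dict mapping an item to an estimated load (an assumption;
    the record does not say how load is estimated in the paper).
    Returns (groups, totals), where totals[i] is the load of groups[i].
    """
    # Min-heap of (current_total_load, group_index): the lightest group is on top.
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]

    # Visit items from heaviest to lightest; ties are broken by item name
    # so that the result is deterministic.
    ordered = sorted(item_loads.items(), key=lambda kv: (-kv[1], kv[0]))
    for item, load in ordered:
        total, g = heapq.heappop(heap)   # lightest group so far
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))

    totals = [sum(item_loads[i] for i in grp) for grp in groups]
    return groups, totals


if __name__ == "__main__":
    # Hypothetical load estimates for six frequent 1-itemsets.
    loads = {"a": 9, "b": 7, "c": 6, "d": 5, "e": 4, "f": 1}
    groups, totals = group_items_greedy(loads, 2)
    print(groups, totals)
```

In a MapReduce setting, each resulting group would presumably be handed to one reducer, so similar group totals translate into similar reducer workloads; how the paper actually maps groups to tasks is not stated in this record.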