MR-FIMNA: An Efficient N-Lists-Based Algorithm for Mining Frequent Itemsets via the Hybrid Parallel
| Title: | MR-FIMNA: An Efficient N-Lists-Based Algorithm for Mining Frequent Itemsets via the Hybrid Parallel |
|---|---|
| Authors: | Hao-Yu Gu, Bin-Bin Guo, Ke Gong, Chi Zhang, Neelakandan Chandrasekaran, De-Cheng Miao |
| Source: | Journal of Computers. 36:295-312 |
| Publisher Information: | Computer Society of the Republic of China, 2025. |
| Publication Year: | 2025 |
| Description: | Frequent itemset mining (FIM), with its compound correlation structure and powerful association mining capabilities, has been used successfully in retail, fast selling, e-commerce, finance, and other fields. However, as data volumes grow and expectations for response time rise, FIM faces three challenges in a big data environment: inefficient parallelism, inefficient merge performance, and redundant search. To address these three problems, this paper proposes an optimized parallel FIM algorithm (MR-FIMNA) on the MapReduce framework. First, a grouping technique based on a greedy strategy for the 0-1 knapsack problem (GM-GSK) is developed for the stage that groups frequent 1-itemsets, to reduce the limitations that cluster load imbalance imposes on the parallel algorithm. Then, in the stage that mines frequent itemsets in parallel, an early-abandon strategy is proposed to improve the merge performance of the N-list structure, and an equivalent-superset pruning strategy is proposed to avoid redundant searches during mining. MR-FIMNA was compared with other algorithms on four datasets: HIGGS, Adult, Susy, and HTRU2. The experimental results show that MR-FIMNA achieves a good speed-up ratio and uses fewer computing resources and less memory in a big data environment. |
| Document Type: | Article |
| ISSN: | 1991-1599 |
| DOI: | 10.63367/199115992025123606019 |
| Accession Number: | edsair.doi...........7e83a1fc71ce50d16bd2f57ff42e21ad |
| Database: | OpenAIRE |
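The record's description mentions grouping frequent 1-itemsets so that work is balanced across the cluster. As a rough illustration of that general idea only, the sketch below counts frequent 1-itemsets and then assigns them to groups greedily, always placing the next-heaviest item in the currently lightest group. This is a generic longest-processing-time heuristic, not the paper's GM-GSK technique, whose details are not given in this record. The function names, the toy transactions, and the use of support count as a stand-in for per-item workload are all assumptions made for the example.

```python
import heapq
from collections import Counter


def frequent_1_itemsets(transactions, min_support):
    """Count each item once per transaction and keep those meeting min_support."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_support}


def greedy_group(item_loads, n_groups):
    """Assign items to n_groups, heaviest first, always into the lightest group.

    item_loads maps item -> estimated load (here, hypothetically, its support).
    Returns a list of groups, each a list of items.
    """
    # Min-heap of (current total load, group index).
    heap = [(0, g) for g in range(n_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(n_groups)]
    # Sort by descending load; break ties by item name for determinism.
    for item, load in sorted(item_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))
    return groups


# Toy data, invented for illustration.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b", "d"]]
loads = frequent_1_itemsets(transactions, min_support=2)
groups = greedy_group(loads, n_groups=2)
```

In a MapReduce setting, each group would typically be handed to one reducer; how the paper estimates load and forms groups may differ from this sketch.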