MR-FIMNA: An Efficient N-Lists-Based Algorithm for Mining Frequent Itemsets via the Hybrid Parallel

Bibliographic Details
Title:	MR-FIMNA: An Efficient N-Lists-Based Algorithm for Mining Frequent Itemsets via the Hybrid Parallel
Authors:	Hao-Yu Gu, Bin-Bin Guo, Ke Gong, Chi Zhang, Neelakandan Chandrasekaran, De-Cheng Miao
Source:	Journal of Computers. 36:295-312
Publisher Information:	Computer Society of the Republic of China, 2025.
Publication Year:	2025
Description:	Frequent itemset mining (FIM), with its compound correlation structure and powerful association mining capabilities, has been used successfully in retail, fast-selling goods, e-commerce, finance, and other fields. However, as data volumes grow and expectations for response time tighten, FIM faces three challenges in a big data environment: inefficient parallelism, inefficient merge performance, and redundant search. To address these three problems, this paper proposes an optimized parallel FIM algorithm (MR-FIMNA) in the MapReduce framework. First, a grouping technique based on a greedy strategy for the 0-1 knapsack problem (GM-GSK) is developed for the stage that groups frequent 1-itemsets, to reduce the limitations that cluster load imbalance imposes on the parallel algorithm. Then, in the stage that mines frequent itemsets in parallel, an early-abandonment strategy is proposed to improve the merge performance of the N-list structure, and an equivalent-superset pruning strategy is proposed to avoid redundant searches. MR-FIMNA was compared with other algorithms on four datasets: HIGGS, Adult, Susy, and HTRU2. The experimental results show that MR-FIMNA achieves a good speed-up ratio and uses fewer computing resources and less memory in a big data environment.
Document Type:	Article
ISSN:	1991-1599
DOI:	10.63367/199115992025123606019
Accession Number:	edsair.doi...........7e83a1fc71ce50d16bd2f57ff42e21ad
Database:	OpenAIRE
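
Note on the grouping step: the abstract says that frequent 1-itemsets are grouped with a greedy, knapsack-style technique (GM-GSK) to balance cluster load, but it does not give the procedure. The sketch below is therefore not GM-GSK itself. It is a generic greedy load-balancing heuristic (largest load first, always into the currently lightest group), shown only to illustrate the kind of problem being solved. The function name `group_items_greedy`, the per-item load estimates, and the sample values are all hypothetical.

```python
import heapq


def group_items_greedy(loads, num_groups):
    """Assign items to groups so that estimated group loads stay balanced.

    loads: dict mapping item -> estimated mining cost (an assumed input;
           the record does not say how the paper estimates load).
    Returns (groups, totals), both indexed by group number.
    """
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")

    # Min-heap of (current total load, group index).
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]

    # Heaviest items first; ties broken by item name for determinism.
    for item, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, g = heapq.heappop(heap)  # lightest group so far
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))

    totals = [sum(loads[i] for i in grp) for grp in groups]
    return groups, totals


if __name__ == "__main__":
    # Made-up load estimates for six frequent 1-itemsets.
    sample = {"a": 9, "b": 7, "c": 6, "d": 5, "e": 4, "f": 2}
    groups, totals = group_items_greedy(sample, 2)
    print(groups, totals)
```

In a MapReduce setting, each resulting group would typically be handed to one reducer, so the spread between the largest and smallest entries of `totals` gives a rough indication of how uneven the reducers' work would be under these assumed estimates.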

