Distributed load balancing frequent colossal closed itemset mining algorithm for high dimensional dataset

Bibliographic details
Published in:	Journal of parallel and distributed computing Vol. 144; pp. 136-152
Main authors:	Vanahalli, Manjunath K; Patil, Nagamma
Format:	Journal Article
Language:	English
Published:	Elsevier Inc 1 October 2020
Subjects:	Bioinformatics; Closeness checking; Distributed and parallel computing; High dimensional datasets; Load balancing
ISSN:	0743-7315, 1096-0848

Description
Summary:	Extracting colossal closed itemsets from high dimensional biological datasets has received considerable attention in recent times. A massive set of short and average sized mined itemsets does not contain complete and valuable information for decision making, yet traditional itemset mining algorithms spend a great deal of time mining exactly such itemsets. Growing research interest in bioinformatics and the abundance of data across a variety of domains have led to the generation of high dimensional datasets, which are characterized by a large number of features and a small number of rows. Colossal closed itemsets are significant for numerous applications, including bioinformatics, and are influential in decision making. Extracting a huge amount of information and knowledge from a high dimensional dataset is a nontrivial task. The existing colossal closed itemset mining algorithms for high dimensional datasets are sequential and computationally expensive. Distributed and parallel computing is a good strategy for overcoming the inefficiency of the existing sequential algorithms. The Balanced Distributed Parallel Frequent Colossal Closed Itemset Mining (BDPFCCIM) algorithm is designed for high dimensional datasets. It incorporates an efficient closeness checking method to check the closeness of a rowset and an efficient pruning strategy to cut down the row enumeration mining search space. The proposed BDPFCCIM algorithm is the first distributed load balancing algorithm to mine frequent colossal closed itemsets from high dimensional biological datasets. The experimental results demonstrate the efficient performance of the proposed BDPFCCIM algorithm in comparison with state-of-the-art algorithms.
• An effective pre-processing technique to prune the complete set of insignificant features and insignificant rows.
• An efficient closure method to check the closeness of a rowset during row enumeration.
• An efficient pruning strategy to cut down the row enumerated mining search space by efficient utilization of the minimum cardinality threshold.
• The proposed BDPFCCIM algorithm is the first distributed load balancing algorithm to mine frequent colossal closed itemsets from high dimensional biological datasets.
DOI:	10.1016/j.jpdc.2020.05.017
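
The summary mentions three ideas that a small example can make concrete: row enumeration, closeness checking of a rowset, and pruning with a minimum cardinality threshold. The following is a minimal single-machine sketch of those ideas only. It is not the BDPFCCIM algorithm from the article, and it omits the distribution and load balancing entirely. The function name `mine_colossal_closed`, its parameters, and the toy data are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List


def mine_colossal_closed(
    rows: List[FrozenSet[str]],
    min_support: int,
    min_cardinality: int,
) -> Dict[FrozenSet[str], FrozenSet[int]]:
    """Return {closed itemset: supporting row ids} by enumerating rowsets.

    Illustrative sketch only (hypothetical helper, not the paper's method).
    """
    n = len(rows)
    results: Dict[FrozenSet[str], FrozenSet[int]] = {}

    def support_rows(itemset: FrozenSet[str]) -> FrozenSet[int]:
        # All rows that contain every item of the itemset.
        return frozenset(i for i, r in enumerate(rows) if itemset <= r)

    def extend(start: int, rowset: FrozenSet[int], itemset: FrozenSet[str]) -> None:
        # Pruning: adding rows can only shrink the common itemset, so a
        # branch that is already below the cardinality threshold is dead.
        if len(itemset) < min_cardinality:
            return
        # Closeness check: the rowset is closed when it is exactly the
        # set of rows that support its common itemset.
        if len(rowset) >= min_support and support_rows(itemset) == rowset:
            results[itemset] = rowset
        for i in range(start, n):
            extend(i + 1, rowset | {i}, itemset & rows[i])

    for i in range(n):
        extend(i + 1, frozenset({i}), rows[i])
    return results


# Toy dataset: few rows, each a set of features (assumed data).
rows = [
    frozenset("abcd"),
    frozenset("abce"),
    frozenset("abf"),
    frozenset("cde"),
]
for itemset, rowset in mine_colossal_closed(rows, 2, 3).items():
    print(sorted(itemset), sorted(rowset))
```

Enumerating rowsets rather than itemsets suits the datasets described in the summary, because the search space grows with the small number of rows instead of the large number of features.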