ScalaParBiBit: scaling the binary biclustering in distributed-memory systems
| Published in: | Cluster Computing, Vol. 24, No. 3, pp. 2249-2268 |
|---|---|
| Main authors: | |
| Format: | Journal article |
| Language: | English |
| Published: | New York: Springer US, 1 September 2021; Springer Nature B.V. |
| ISSN: | 1386-7857, 1573-7543 |
| Abstract: | Biclustering is a data mining technique that finds groups of rows and columns that are highly correlated in a 2D dataset. Although several software applications perform biclustering, most of them have a high computational complexity that prevents their use on large datasets. In this work we present ScalaParBiBit, a parallel tool that finds biclusters in binary data, which are common in many research fields such as text mining, marketing and bioinformatics. ScalaParBiBit takes advantage of the special characteristics of these binary datasets, as well as of an efficient parallel implementation and algorithm, to accelerate the biclustering procedure in distributed-memory systems. The experimental evaluation shows that our tool is significantly faster and more scalable than the state-of-the-art tool ParBiBit on a cluster with 32 nodes and 768 cores. The tool and its reference manual are freely available at https://github.com/fraguela/ScalaParBiBit. |
| DOI: | 10.1007/s10586-021-03261-z |
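
The abstract says ScalaParBiBit exploits the special characteristics of binary data but does not say how. As a rough illustration only, the Python sketch below assumes a simplified BiBit-style search (suggested by the tool's name, not confirmed by this record). Each row is packed into a bitmask, each pair of rows yields a candidate column pattern via bitwise AND, and every row containing all columns of that pattern joins the bicluster. The function names and the `min_rows`/`min_cols` thresholds are hypothetical; this is not ScalaParBiBit's code.

```python
from itertools import combinations


def pack_rows(matrix):
    """Encode each 0/1 row as an int bitmask: bit j is set when column j holds a 1."""
    packed = []
    for row in matrix:
        mask = 0
        for j, value in enumerate(row):
            if value:
                mask |= 1 << j
        packed.append(mask)
    return packed


def columns_of(pattern):
    """Return the column indices whose bits are set in a bitmask."""
    return tuple(j for j in range(pattern.bit_length()) if (pattern >> j) & 1)


def bibit_style_biclusters(matrix, min_rows=2, min_cols=2):
    """Simplified BiBit-style search (hypothetical sketch, not ScalaParBiBit's code).

    Each pair of rows seeds a column pattern (their bitwise AND); the bicluster
    is every row that has a 1 in all columns of that pattern.
    """
    rows = pack_rows(matrix)
    found = set()
    for i, k in combinations(range(len(rows)), 2):
        pattern = rows[i] & rows[k]  # columns where both seed rows are 1
        if bin(pattern).count("1") < min_cols:
            continue  # too few shared columns to form a bicluster
        # One AND plus one comparison per row tests all pattern columns at once.
        members = tuple(r for r, mask in enumerate(rows) if (mask & pattern) == pattern)
        if len(members) >= min_rows:
            found.add((members, columns_of(pattern)))
    return found


matrix = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
]
for rows, cols in sorted(bibit_style_biclusters(matrix)):
    print("rows", rows, "cols", cols)
# rows (0, 1, 3) cols (0, 1)
# rows (0, 2) cols (1, 3)
# rows (2, 3) cols (1, 2)
```

Packing rows into bits lets a single AND and comparison check many columns at once, which is presumably the kind of binary-data property the abstract refers to. Real tools would typically use fixed-width machine words instead of Python's arbitrary-precision integers, and a distributed version would also have to divide the row-pair loop among processes; how ScalaParBiBit does this is described in the full article.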