Performance Factor Analysis and Scope of Optimization for Big Data Processing on Cluster
| Published in: | International Conference on Parallel, Distributed and Grid Computing (PDGC ...) pp. 418 - 423 |
|---|---|
| Main Authors: | , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.12.2018 |
| ISSN: | 2573-3079 |
| Summary: | The use of computational clusters for large-scale Big Data processing has attracted attention as a technology trend because of its time efficiency. A modern cluster equipped with the latest multi-core and many-core distributed shared architecture, a high-speed interconnect, and a file system delivers high performance through message-passing and multi-threading parallel approaches, and it handles batch, micro-batch, and stream processing of high-dimensional massive datasets. However, running data-intensive Big Data applications on a compute-centric cluster challenges its performance because of several runtime overheads. To alleviate these bottlenecks and exploit the full potential of the cluster, a state-of-the-practice, performance-oriented technical analysis covering all relevant aspects is presented in the context of terascale Big Data processing on the TeraFLOPS cluster PARAM-Kanchenjunga. The analysis identifies the major factors influencing performance, that is, the sources of these overheads related to computation, communication or IPC, memory, I/O contention, scheduling, load imbalance, synchronization, latency, and network jitter, and determines their impact. Because existing approaches are found to be insufficient to achieve the possible speedup, advanced methods with a variety of alternatives, such as RDMA-enabled libraries, PFS, MPI-integrated extensions, loop tiling, and hybrid parallelization, are offered for consideration in optimization. This paper will assist in preparing performance-aware designs of experiments and performance modeling. |
|---|---|
| DOI: | 10.1109/PDGC.2018.8745857 |