Hybrid merge/overlap execution technique for parallel array processing

Uloženo v:
Podrobná bibliografie
Název: Hybrid merge/overlap execution technique for parallel array processing
Autoři: Emad Soroush, Magdalena Balazinska
Přispěvatelé: The Pennsylvania State University CiteSeerX Archives
Zdroj: http://www.cs.washington.edu/homes/magda/papers/soroush-arrayworkshop11.pdf.
Rok vydání: 2011
Sbírka: CiteSeerX
Témata: General Terms Design, Performance Keywords Parallel Array Processing, Scientific Databases, Overlap Execution Strategy
Popis: Whether in business or science, multi-dimensional arrays are a common abstraction in data analytics and many systems exist for efficiently processing arrays. As dataset grow in size, it is becoming increasingly important to process these arrays in parallel. In this paper, we discuss different types of array operations and review how they can be processed in parallel using two different existing techniques. The first technique, which we call merge, consists in partitioning an array, processing the partitions in parallel, then merging the results to reconcile computations that span partition boundaries. The second technique, which we call overlap, consists in partitioning an array into subarrays that overlap by a given number of cells along each dimension. Thanks to this overlap, the array partitions can be processed in parallel without any merge phase. We discuss when each technique can be applied to an array operation. We show that even for a single array operation, a different approach may yield the best performance for different regions of an array. Following this observation, we introduce a new parallel array processing technique that combines the merge and overlap approaches. Our technique enables a parallel array processing system to mix-and-match the merge and overlap techniques within a single operation on an array. Through experiments on real, scientific data, we show that this hybrid approach outperforms the other two techniques.
Druh dokumentu: text
Popis souboru: application/pdf
Jazyk: English
Relation: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.225.6575
Dostupnost: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.225.6575
http://www.cs.washington.edu/homes/magda/papers/soroush-arrayworkshop11.pdf
Rights: Metadata may be used without restrictions as long as the oai identifier remains attached to it.
Přístupové číslo: edsbas.36B8907E
Databáze: BASE
Popis
Abstrakt:Whether in business or science, multi-dimensional arrays are a common abstraction in data analytics and many systems exist for efficiently processing arrays. As dataset grow in size, it is becoming increasingly important to process these arrays in parallel. In this paper, we discuss different types of array operations and review how they can be processed in parallel using two different existing techniques. The first technique, which we call merge, consists in partitioning an array, processing the partitions in parallel, then merging the results to reconcile computations that span partition boundaries. The second technique, which we call overlap, consists in partitioning an array into subarrays that overlap by a given number of cells along each dimension. Thanks to this overlap, the array partitions can be processed in parallel without any merge phase. We discuss when each technique can be applied to an array operation. We show that even for a single array operation, a different approach may yield the best performance for different regions of an array. Following this observation, we introduce a new parallel array processing technique that combines the merge and overlap approaches. Our technique enables a parallel array processing system to mix-and-match the merge and overlap techniques within a single operation on an array. Through experiments on real, scientific data, we show that this hybrid approach outperforms the other two techniques.