A Data Throughput Prediction and Optimization Service for Widely Distributed Many-Task Computing

Detailed bibliography
Title: A Data Throughput Prediction and Optimization Service for Widely Distributed Many-Task Computing
Authors: Dengpan Yin, Esma Yildirim, Tevfik Kosar
Contributors: The Pennsylvania State University CiteSeerX Archives
Source: http://www.cse.buffalo.edu/faculty/tkosar/papers/jnrl_tpds_2011_1.pdf
Collection: CiteSeerX
Subjects: Many-Task computing, Modeling, Scheduling, Parallel TCP streams, Optimization, Prediction, Stork
Description: In this paper, we present the design and implementation of an application-layer data throughput prediction and optimization service for many-task computing in widely distributed environments. This service uses multiple parallel TCP streams to improve the end-to-end throughput of data transfers. A novel mathematical model is developed to determine the number of parallel streams that achieves the best performance. This model can predict the optimal number of parallel streams with as few as three prediction points. We implement this new service in the Stork data scheduler, where the prediction points can be obtained using Iperf and GridFTP sampling. Our results show that the prediction cost plus the optimized transfer time is much less than the unoptimized transfer time in most cases.
Document type: text
File description: application/pdf
Language: English
Relation: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.674.8264; http://www.cse.buffalo.edu/faculty/tkosar/papers/jnrl_tpds_2011_1.pdf
Availability: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.674.8264
http://www.cse.buffalo.edu/faculty/tkosar/papers/jnrl_tpds_2011_1.pdf
Rights: Metadata may be used without restrictions as long as the oai identifier remains attached to it.
Accession number: edsbas.A01A02E7
Database: BASE