A Data Throughput Prediction and Optimization Service for Widely Distributed Many-Task Computing

Bibliographic Details
Title: A Data Throughput Prediction and Optimization Service for Widely Distributed Many-Task Computing
Authors: Dengpan Yin, Esma Yildirim, Tevfik Kosar
Contributors: The Pennsylvania State University CiteSeerX Archives
Source: http://www.cse.buffalo.edu/faculty/tkosar/papers/jnrl_tpds_2011_1.pdf
Collection: CiteSeerX
Subject Terms: Many-Task computing, Modeling, Scheduling, Parallel TCP streams, Optimization, Prediction, Stork
Description: In this paper, we present the design and implementation of an application-layer data throughput prediction and optimization service for many-task computing in widely distributed environments. This service uses multiple parallel TCP streams to improve the end-to-end throughput of data transfers. A novel mathematical model is developed to decide the number of parallel streams to achieve best performance. This model can predict the optimal number of parallel streams with as few as three prediction points. We implement this new service in the Stork data scheduler, where the prediction points can be obtained using Iperf and GridFTP samplings. Our results show that the prediction cost plus the optimized transfer time is much less than the unoptimized transfer time in most cases.
Document Type: text
File Description: application/pdf
Language: English
Relation: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.674.8264; http://www.cse.buffalo.edu/faculty/tkosar/papers/jnrl_tpds_2011_1.pdf
Availability: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.674.8264; http://www.cse.buffalo.edu/faculty/tkosar/papers/jnrl_tpds_2011_1.pdf
Rights: Metadata may be used without restrictions as long as the oai identifier remains attached to it.
Accession Number: edsbas.A01A02E7
Database: BASE
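Illustration: the abstract says the model can predict the optimal number of parallel streams from as few as three prediction points. The following is a minimal sketch of how three sampled points could determine a stream count. It assumes a throughput curve of the form Th(n) = n / sqrt(a*n^2 + b*n + c). This record does not state the model's formula, so the curve, the function names, and the `max_streams` cap are assumptions made for illustration only, not the authors' implementation.

```python
from math import sqrt


def fit_throughput_model(samples):
    """Fit (a, b, c) from exactly three (streams, throughput) samples.

    Assumed curve: Th(n) = n / sqrt(a*n^2 + b*n + c), which rearranges to
    the linear equation n^2 / Th^2 = a*n^2 + b*n + c in the unknowns a, b, c.
    """
    if len(samples) != 3:
        raise ValueError("exactly three prediction points are required")
    # Augmented 3x3 system, one row per sample.
    rows = [[n * n, n, 1.0, (n * n) / (th * th)] for n, th in samples]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("samples must use three distinct stream counts")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return tuple(rows[i][3] / rows[i][i] for i in range(3))


def predicted_throughput(n, a, b, c):
    """Throughput predicted by the assumed curve for n parallel streams."""
    return n / sqrt(a * n * n + b * n + c)


def optimal_streams(a, b, c, max_streams=64):
    """Pick the integer stream count in [1, max_streams] with the best prediction.

    The derivative of the assumed curve changes sign only at n = -2c/b,
    so it is enough to compare the integers around that point with the
    two ends of the allowed range.
    """
    candidates = {1, max_streams}
    if b != 0:
        peak = -2.0 * c / b
        for n in (int(peak), int(peak) + 1):
            candidates.add(min(max_streams, max(1, n)))
    # Skip counts where the fitted curve is undefined.
    valid = [n for n in candidates if a * n * n + b * n + c > 0]
    if not valid:
        raise ValueError("fitted curve is undefined on the allowed range")
    return max(valid, key=lambda n: predicted_throughput(n, a, b, c))
```

In use, the three `(streams, throughput)` pairs would come from short sampling transfers, for example with Iperf or GridFTP as the abstract describes, and the returned count would then be applied to the full transfer.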