Distributed Top-k Query Processing by Exploiting Skyline Summaries

Bibliographic Details
Title: Distributed Top-k Query Processing by Exploiting Skyline Summaries
Authors: Akrivi Vlachou, Christos Doulkeridis
Contributors: The Pennsylvania State University CiteSeerX Archives
Source: http://www.idi.ntnu.no/~noervaag/papers/DPD2011.pdf
Collection: CiteSeerX
Description: Recently, a trend has been observed towards supporting rank-aware query operators, such as top-k, that enable users to retrieve only a limited set of the most interesting data objects. As data nowadays is commonly stored distributed over multiple servers, a challenging problem is to support rank-aware queries in distributed environments. In this paper, we propose a novel approach, called DiTo, for efficient top-k processing over multiple servers, where each server stores autonomously a fraction of the data. Towards this goal, we exploit the inherent relationship of top-k and skyline objects, and we employ the skyline objects of servers as a data summarization mechanism for efficiently identifying the servers that store top-k results. Relying on a thresholding scheme, DiTo retrieves the top-k result set progressively, while the number of queried servers and transferred data is minimized. Furthermore, we extend DiTo to support data summarizations of bounded size, thus restricting the cost of summary distribution and maintenance. To this end, we study the challenging problem of finding an abstraction of the skyline set of fixed size that influences the performance of DiTo only slightly. Our experimental evaluation shows that DiTo performs efficiently and provides a viable solution when a high degree of distribution is required.
Document Type: text
File Description: application/pdf
Language: English
Relation: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.299.5185; http://www.idi.ntnu.no/~noervaag/papers/DPD2011.pdf
Availability: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.299.5185; http://www.idi.ntnu.no/~noervaag/papers/DPD2011.pdf
Rights: Metadata may be used without restrictions as long as the oai identifier remains attached to it.
Accession Number: edsbas.D1591F02
Database: BASE
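The abstract's central idea, using each server's skyline as a summary to decide which servers can hold top-k results, can be illustrated with a small sketch. This is not the DiTo algorithm from the paper. It is a simplified illustration under stated assumptions: lower attribute values are better, the scoring function is a weighted sum with non-negative weights, every server holds at least one point, and all names (`skyline`, `distributed_top_k`, the `servers` dictionary) are hypothetical. Under these assumptions the best-scoring point on a server always belongs to that server's skyline, so the minimum score over the skyline bounds every score the server can offer.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every dimension and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Points not dominated by any other point (naive quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def score(p: Point, w: Sequence[float]) -> float:
    """Weighted sum; monotone as long as all weights are non-negative."""
    return sum(wi * xi for wi, xi in zip(w, p))


def top_k(points: Sequence[Point], w: Sequence[float], k: int) -> List[Point]:
    """The k points with the lowest score."""
    return sorted(points, key=lambda p: score(p, w))[:k]


def distributed_top_k(
    servers: Dict[str, List[Point]], w: Sequence[float], k: int
) -> Tuple[List[Point], List[str]]:
    """Answer a top-k query, contacting only servers that can still contribute.

    Assumption: the coordinator already holds the skyline of every server.
    """
    summaries = {name: skyline(pts) for name, pts in servers.items()}
    # Best score each server can offer, derived from its skyline alone.
    bounds = sorted(
        (min(score(p, w) for p in sky), name) for name, sky in summaries.items()
    )
    result: List[Point] = []
    queried: List[str] = []
    for bound, name in bounds:
        # Threshold test: once k results are known and this server's best
        # score cannot beat the current k-th score, neither can any later one.
        if len(result) == k and bound >= score(result[-1], w):
            break
        queried.append(name)
        local = top_k(servers[name], w, k)  # the server's local top-k
        result = top_k(result + local, w, k)
    return result, queried
```

Servers are visited in order of their best possible score, so the loop can stop as soon as the threshold test succeeds. In a real deployment the summaries would be shipped ahead of time rather than recomputed per query, and the paper's bounded-size summaries would replace the full skyline used here.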