A sample decreasing threshold greedy-based algorithm for big data summarisation

Bibliographic details
Published in: Journal of Big Data, vol. 8, no. 1, pp. 1-21
Main authors: Li, Teng; Shin, Hyo-Sang; Tsourdos, Antonios
Format: Journal Article
Language: English
Published: Cham: Springer International Publishing, 9 February 2021
Springer Nature B.V
SpringerOpen
ISSN: 2196-1115
Description
Abstract: As the scale of datasets used for big data applications expands rapidly, there have been increased efforts to develop faster algorithms. This paper addresses big data summarisation problems using the submodular maximisation approach and proposes an efficient algorithm for maximising general non-negative submodular objective functions subject to $k$-extendible system constraints. Leveraging a random sampling process and a decreasing threshold strategy, this work proposes an algorithm named Sample Decreasing Threshold Greedy (SDTG). The proposed algorithm obtains an expected approximation guarantee of $\frac{1}{1+k} - \epsilon$ for maximising monotone submodular functions and of $\frac{k}{(1+k)^2} - \epsilon$ in non-monotone cases, with an expected computational complexity of $O\left(\frac{n}{(1+k)\epsilon} \ln \frac{r}{\epsilon}\right)$. Here, $r$ is the largest size of feasible solutions, and $\epsilon \in \left(0, \frac{1}{1+k}\right)$ is an adjustable design parameter for the trade-off between the approximation ratio and the computational complexity. The performance of the proposed algorithm is validated and compared with that of benchmark algorithms through experiments with a movie recommendation system based on a real database.
DOI: 10.1186/s40537-021-00416-y
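
The abstract describes SDTG as a random sampling step followed by a decreasing threshold greedy pass. The Python sketch below illustrates that general idea only; it is not taken from the paper and may differ from the authors' algorithm in its details. The function name, the sampling probability `p`, the stopping threshold `eps * d / r`, the feasibility oracle `is_feasible`, and the toy coverage data are all assumptions made for illustration.

```python
import random


def sample_decreasing_threshold_greedy(ground_set, f, is_feasible, r, eps, p, rng=None):
    """Illustrative sketch: sample elements, then run threshold greedy on the sample.

    ground_set  : iterable of hashable elements
    f           : set function, f(set) -> float (assumed submodular, non-negative)
    is_feasible : oracle, is_feasible(set) -> bool (assumed constraint check)
    r           : largest size of a feasible solution
    eps         : trade-off parameter in (0, 1)
    p           : sampling probability (hypothetical parameter of this sketch)
    """
    rng = rng or random.Random()
    # Step 1: keep each element independently with probability p.
    sample = [e for e in ground_set if rng.random() < p]
    solution = set()
    feasible_singletons = [e for e in sample if is_feasible({e})]
    if not feasible_singletons:
        return solution
    empty_value = f(set())
    # d is the largest marginal value of a single feasible sampled element.
    d = max(f({e}) - empty_value for e in feasible_singletons)
    if d <= 0:
        return solution
    # Step 2: sweep the sample with a geometrically decreasing threshold.
    theta = d
    stop = eps * d / r
    while theta >= stop:
        for e in sample:
            if e in solution:
                continue
            candidate = solution | {e}
            if not is_feasible(candidate):
                continue
            # Accept the element if its marginal gain meets the current threshold.
            if f(candidate) - f(solution) >= theta:
                solution = candidate
        theta *= 1 - eps
    return solution


# Toy coverage objective: the value of a selection is the number of items it covers.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}}


def f(selection):
    covered = set()
    for name in selection:
        covered |= coverage[name]
    return len(covered)


def is_feasible(selection):
    # Cardinality constraint: at most two elements may be selected.
    return len(selection) <= 2


# p=1.0 disables sampling so that this small run is deterministic.
result = sample_decreasing_threshold_greedy(
    ["a", "b", "c", "d"], f, is_feasible, r=2, eps=0.1, p=1.0
)
print(sorted(result))  # ['a', 'b']
```

With `p` below 1, the pass only evaluates the sampled elements, which is where the saving in function evaluations comes from; the guarantees quoted in the abstract apply to the paper's algorithm, not to this sketch.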