A sample decreasing threshold greedy-based algorithm for big data summarisation

Bibliographic details
Published in: Journal of Big Data, Volume 8, Issue 1, pp. 1-21
Main authors: Li, Teng; Shin, Hyo-Sang; Tsourdos, Antonios
Format: Journal Article
Language: English
Publication details: Cham: Springer International Publishing, 09.02.2021
Springer Nature B.V
SpringerOpen
ISSN: 2196-1115
Description
Summary: As the scale of datasets used in big data applications expands rapidly, there have been increased efforts to develop faster algorithms. This paper addresses big data summarisation problems using the submodular maximisation approach and proposes an efficient algorithm for maximising general non-negative submodular objective functions subject to $k$-extendible system constraints. The algorithm, named Sample Decreasing Threshold Greedy (SDTG), combines a random sampling process with a decreasing threshold strategy. It obtains an expected approximation guarantee of $\frac{1}{1+k} - \epsilon$ for maximising monotone submodular functions and of $\frac{k}{(1+k)^2} - \epsilon$ in non-monotone cases, with an expected computational complexity of $O\left(\frac{n}{(1+k)\epsilon} \ln \frac{r}{\epsilon}\right)$. Here, $r$ is the largest size of feasible solutions, and $\epsilon \in \left(0, \frac{1}{1+k}\right)$ is an adjustable design parameter that trades off the approximation ratio against the computational complexity. The performance of the proposed algorithm is validated and compared with that of benchmark algorithms through experiments with a movie recommendation system based on a real database.
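The summary names the two ingredients of SDTG (random sampling and a decreasing threshold) without spelling out the steps, so the following is only a minimal sketch of how the two could be combined, not the authors' exact procedure. The function names, the sampling probability `p`, the stopping threshold `(eps / r) * d`, the coverage objective, and the cardinality constraint (a simple 1-extendible system) are all assumptions made for illustration.

```python
import random


def coverage(selected, sets):
    """Monotone submodular objective: number of distinct items covered."""
    covered = set()
    for i in selected:
        covered |= sets[i]
    return len(covered)


def sample_decreasing_threshold_greedy(ground, f, is_feasible, p, eps, r, rng):
    """Hypothetical sketch: subsample, then run a decreasing-threshold greedy.

    ground      -- list of candidate elements
    f           -- set function taking a list of elements
    is_feasible -- independence oracle for the constraint
    p           -- sampling probability (assumed parameter)
    eps         -- threshold decay, must lie strictly between 0 and 1
    r           -- upper bound on the size of a feasible solution
    """
    # Step 1 (assumed): keep each element independently with probability p.
    sample = [e for e in ground if rng.random() < p]
    solution = []
    if not sample:
        return solution
    # The largest singleton value in the sample sets the starting threshold.
    d = max(f([e]) for e in sample)
    if d <= 0:
        return solution
    tau = d
    # Step 2: sweep the sample, shrinking the threshold by (1 - eps) per pass.
    while tau >= (eps / r) * d:
        base = f(solution)
        for e in sample:
            if e in solution:
                continue
            candidate = solution + [e]
            if not is_feasible(candidate):
                continue
            value = f(candidate)
            # Accept any element whose marginal gain meets the threshold.
            if value - base >= tau:
                solution = candidate
                base = value
        tau *= 1.0 - eps
    return solution


# Toy data (made up): four candidate sets, choose at most two of them.
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 2}}
result = sample_decreasing_threshold_greedy(
    ground=[0, 1, 2, 3],
    f=lambda s: coverage(s, sets),
    is_feasible=lambda s: len(s) <= 2,
    p=1.0,  # keep everything so this toy run is deterministic
    eps=0.1,
    r=2,
    rng=random.Random(0),
)
```

With `p=1.0` the sampling step keeps every element and the run reduces to a plain decreasing-threshold greedy. Lowering `p` shrinks the sample that each pass has to scan, and a larger `eps` means fewer passes. As a reading of the stated bounds, with $k = 1$ and $\epsilon = 0.1$ the guarantees are $\frac{1}{2} - 0.1 = 0.4$ in the monotone case and $\frac{1}{4} - 0.1 = 0.15$ in the non-monotone case.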
DOI:10.1186/s40537-021-00416-y