Discovering Newsworthy Themes from Sequenced Data: A Step Towards Computational Journalism


Bibliographic details
Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 29, No. 7, pp. 1398-1411
Main authors: Fan, Qi; Li, Yuchen; Zhang, Dongxiang; Tan, Kian-Lee
Format: Journal Article
Language: English
Published: New York: IEEE, 1 July 2017
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1041-4347, 1558-2191
Description
Summary: Automatic discovery of newsworthy themes from sequenced data can relieve journalists from manually poring over a large amount of data in order to find interesting news. In this paper, we propose a novel $k$-Sketch query that aims to find $k$ striking streaks to best summarize a subject. Our scoring function takes into account streak strikingness and streak coverage at the same time. We study $k$-Sketch query processing in both offline and online scenarios, and propose various streak-level pruning techniques to find striking candidates. Among those candidates, we then develop approximate methods to discover the $k$ most representative streaks with theoretical bounds. We conduct experiments on four real datasets, and the results demonstrate the efficiency and effectiveness of our proposed algorithms: the running time achieves up to a 500-fold speedup, and the quality of the generated summaries is endorsed by anonymous users from Amazon Mechanical Turk.
DOI: 10.1109/TKDE.2017.2685587