Discovering Newsworthy Themes from Sequenced Data: A Step Towards Computational Journalism

Automatic discovery of newsworthy themes from sequenced data can relieve journalists from manually poring over a large amount of data in order to find interesting news. In this paper, we propose a novel $k$...

Full description

Bibliographic details
Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 29, No. 7, pp. 1398-1411
Main authors: Fan, Qi; Li, Yuchen; Zhang, Dongxiang; Tan, Kian-Lee
Format: Journal Article
Language: English
Published: New York, IEEE, 2017-07-01
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1041-4347, 1558-2191
Online access: Full text
Abstract Automatic discovery of newsworthy themes from sequenced data can relieve journalists from manually poring over a large amount of data in order to find interesting news. In this paper, we propose a novel $k$-Sketch query that aims to find $k$ striking streaks to best summarize a subject. Our scoring function takes into account streak strikingness and streak coverage at the same time. We study the $k$-Sketch query processing in both offline and online scenarios, and propose various streak-level pruning techniques to find striking candidates. Among those candidates, we then develop approximate methods to discover the $k$ most representative streaks with theoretical bounds. We conduct experiments on four real datasets, and the results demonstrate the efficiency and effectiveness of our proposed algorithms: the running time achieves up to 500 times speedup and the quality of the generated summaries is endorsed by the anonymous users from Amazon Mechanical Turk.
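The abstract describes a scoring function that weighs streak strikingness and streak coverage together, and approximate methods that pick the $k$ most representative streaks. The record does not give the paper's actual definitions, so the following is only a minimal illustrative sketch under stated assumptions: the `Streak` type, the `sketch_score` objective (a weighted sum of total strikingness and the fraction of the sequence covered), the `alpha` weight, and the greedy selection in `greedy_k_sketch` are all hypothetical stand-ins, not the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Streak:
    """A candidate streak over a sequence (hypothetical representation)."""
    start: int            # index of the first event in the streak, inclusive
    end: int              # index of the last event in the streak, inclusive
    strikingness: float   # assumed precomputed, e.g. normalized to [0, 1]


def sketch_score(streaks, seq_len, alpha=0.5):
    """Assumed objective: alpha * total strikingness
    + (1 - alpha) * fraction of sequence positions covered by the streaks."""
    covered = set()
    for s in streaks:
        covered.update(range(s.start, s.end + 1))
    total_strikingness = sum(s.strikingness for s in streaks)
    return alpha * total_strikingness + (1 - alpha) * len(covered) / seq_len


def greedy_k_sketch(candidates, k, seq_len, alpha=0.5):
    """Greedily pick up to k streaks, each time adding the candidate that
    yields the highest score for the selection so far (a heuristic
    illustration only, not the paper's approximation method)."""
    chosen = []
    remaining = list(candidates)
    while remaining and len(chosen) < k:
        best = max(
            remaining,
            key=lambda s: sketch_score(chosen + [s], seq_len, alpha),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    a = Streak(0, 4, 0.9)
    b = Streak(1, 4, 0.85)   # nearly as striking as a, but overlaps it entirely
    c = Streak(6, 9, 0.5)    # less striking, but covers a new part of the sequence
    for s in greedy_k_sketch([a, b, c], k=2, seq_len=10):
        print(s)
```

In this toy input the second pick is the less striking streak `c` rather than `b`, because `b` adds no new coverage; that trade-off between strikingness and coverage is the point the sketch is meant to illustrate.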
Author Zhang, Dongxiang
Fan, Qi
Li, Yuchen
Tan, Kian-Lee
Author_xml – sequence: 1
  givenname: Qi
  surname: Fan
  fullname: Fan, Qi
  email: fan.qi@u.nus.edu
  organization: NUS Graduate School for Integrative Sciences and Engineering (NGS), National University of Singapore (NUS), Singapore
– sequence: 2
  givenname: Yuchen
  surname: Li
  fullname: Li, Yuchen
  email: liyuchenmike@gmail.com
  organization: School of Computing, National University of Singapore, Singapore
– sequence: 3
  givenname: Dongxiang
  surname: Zhang
  fullname: Zhang, Dongxiang
  email: zhangdo@uestc.edu.cn
  organization: University of Electronic Science and Technology of China, Sichuan Sheng, China
– sequence: 4
  givenname: Kian-Lee
  surname: Tan
  fullname: Tan, Kian-Lee
  email: tankl@comp.nus.edu.sg
  organization: School of Computing, NGS, National University of Singapore, Singapore
CODEN ITKEEH
CitedBy_id crossref_primary_10_1109_TKDE_2023_3328596
crossref_primary_10_1109_TKDE_2018_2854182
crossref_primary_10_1016_j_ins_2018_04_031
Cites_doi 10.1145/2213556.2213580
10.1023/A:1009796218281
10.1007/s10618-011-0232-z
10.1145/780555.780558
10.1007/BF01588971
10.1145/860435.860495
10.1145/2723372.2749451
10.1145/2808194.2809474
10.1137/1.9781611972795.60
10.1109/ICDE.2014.6816644
10.1145/2339530.2339762
10.1007/978-3-642-40328-6_9
10.1145/1281192.1281238
10.3141/1811-08
10.14778/1687627.1687693
10.14778/2733004.2733029
10.1007/s11280-014-0295-z
10.1007/978-3-642-13657-3_34
10.1109/TKDE.2013.44
10.1007/978-3-540-77974-2
10.1145/2601439
10.1145/290941.290954
10.1145/237814.238000
10.1145/1498759.1498766
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2017
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2017
DBID 97E
RIA
RIE
AAYXX
CITATION
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
DOI 10.1109/TKDE.2017.2685587
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005-present
IEEE All-Society Periodicals Package (ASPP) 1998-Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
DatabaseTitleList
Technology Research Database
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 1558-2191
EndPage 1411
ExternalDocumentID 10_1109_TKDE_2017_2685587
7883865
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 61602087; 61632007
  funderid: 10.13039/501100001809
ID FETCH-LOGICAL-c293t-a9acb5bafe17c8cc219f76b850efac564d3cfd596d40b278a26e628671a0d30a3
IEDL.DBID RIE
ISICitedReferencesCount 6
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000403068000004&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1041-4347
IngestDate Mon Jun 30 02:25:28 EDT 2025
Tue Nov 18 22:25:23 EST 2025
Sat Nov 29 04:46:41 EST 2025
Wed Aug 27 02:52:17 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 7
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c293t-a9acb5bafe17c8cc219f76b850efac564d3cfd596d40b278a26e628671a0d30a3
PQID 1905707726
PQPubID 85438
PageCount 14
ParticipantIDs proquest_journals_1905707726
ieee_primary_7883865
crossref_primary_10_1109_TKDE_2017_2685587
crossref_citationtrail_10_1109_TKDE_2017_2685587
PublicationCentury 2000
PublicationDate 2017-07-01
PublicationDateYYYYMMDD 2017-07-01
PublicationDate_xml – month: 07
  year: 2017
  text: 2017-07-01
  day: 01
PublicationDecade 2010
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on knowledge and data engineering
PublicationTitleAbbrev TKDE
PublicationYear 2017
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SSID ssj0008781
Score 2.2785618
SourceID proquest
crossref
ieee
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 1398
SubjectTerms Algorithm design and analysis
Algorithms
approximate algorithms
Approximation
Approximation algorithms
Automation
Computational efficiency
Computational journalism
Computing time
Datasets
Electronic commerce
Electronic mail
Engineering profession
Games
History
News
news theme discovery
Pruning
Query processing
Run time (computers)
Running
sequenced data
Summaries
Title Discovering Newsworthy Themes from Sequenced Data: A Step Towards Computational Journalism
URI https://ieeexplore.ieee.org/document/7883865
https://www.proquest.com/docview/1905707726
Volume 29
WOSCitedRecordID wos000403068000004&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Xplore
  customDbUrl:
  eissn: 1558-2191
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0008781
  issn: 1041-4347
  databaseCode: RIE
  dateStart: 19890101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE