Discovering Newsworthy Themes from Sequenced Data: A Step Towards Computational Journalism
| Published in: | IEEE Transactions on Knowledge and Data Engineering, Vol. 29, No. 7, pp. 1398-1411 |
|---|---|
| Main authors: | Fan, Qi; Li, Yuchen; Zhang, Dongxiang; Tan, Kian-Lee |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2017-07-01 |
| ISSN: | 1041-4347, 1558-2191 |
| Abstract | Automatic discovery of newsworthy themes from sequenced data can relieve journalists from manually poring over a large amount of data in order to find interesting news. In this paper, we propose a novel $k$-Sketch query that aims to find $k$ striking streaks to best summarize a subject. Our scoring function takes into account streak strikingness and streak coverage at the same time. We study the $k$-Sketch query processing in both offline and online scenarios, and propose various streak-level pruning techniques to find striking candidates. Among those candidates, we then develop approximate methods to discover the $k$ most representative streaks with theoretical bounds. We conduct experiments on four real datasets, and the results demonstrate the efficiency and effectiveness of our proposed algorithms: the running time achieves up to 500 times speedup and the quality of the generated summaries is endorsed by the anonymous users from Amazon Mechanical Turk. |
|---|---|
| Author | Fan, Qi; Li, Yuchen; Zhang, Dongxiang; Tan, Kian-Lee |
| Author_xml | 1. Qi Fan (fan.qi@u.nus.edu), NUS Graduate School for Integrative Sciences and Engineering (NGS), National University of Singapore (NUS), Singapore; 2. Yuchen Li (liyuchenmike@gmail.com), School of Computing, National University of Singapore, Singapore; 3. Dongxiang Zhang (zhangdo@uestc.edu.cn), University of Electronic Science and Technology of China, Sichuan Sheng, China; 4. Kian-Lee Tan (tankl@comp.nus.edu.sg), School of Computing, NGS, National University of Singapore, Singapore |
| CODEN | ITKEEH |
| CitedBy_id | crossref_primary_10_1109_TKDE_2023_3328596 crossref_primary_10_1109_TKDE_2018_2854182 crossref_primary_10_1016_j_ins_2018_04_031 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2017 |
| DOI | 10.1109/TKDE.2017.2685587 |
| Discipline | Engineering; Computer Science |
| EISSN | 1558-2191 |
| EndPage | 1411 |
| ExternalDocumentID | 10_1109_TKDE_2017_2685587 7883865 |
| Genre | orig-research |
| GrantInformation_xml | Funder: National Natural Science Foundation of China (funder ID 10.13039/501100001809); grant IDs: 61602087, 61632007 |
| ISICitedReferencesCount | 6 |
| ISSN | 1041-4347 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 7 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
| PageCount | 14 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-07-01 |
| PublicationDecade | 2010 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE transactions on knowledge and data engineering |
| PublicationTitleAbbrev | TKDE |
| PublicationYear | 2017 |
| Publisher | IEEE (The Institute of Electrical and Electronics Engineers, Inc.) |
| StartPage | 1398 |
| SubjectTerms | Algorithm design and analysis; Algorithms; approximate algorithms; Approximation; Approximation algorithms; Automation; Computational efficiency; Computational journalism; Computing time; Datasets; Electronic commerce; Electronic mail; Engineering profession; Games; History; News; news theme discovery; Pruning; Query processing; Run time (computers); Running; sequenced data; Summaries |
| URI | https://ieeexplore.ieee.org/document/7883865 https://www.proquest.com/docview/1905707726 |
| Volume | 29 |