Scalable Detection of Anomalous Patterns With Connectivity Constraints
We present GraphScan, a novel method for detecting arbitrarily shaped connected clusters in graph or network data. Given a graph structure, data observed at each node, and a score function defining the anomalousness of a set of nodes, GraphScan can efficiently and exactly identify the most anomalous...
| Published in: | Journal of computational and graphical statistics Vol. 24; no. 4; pp. 1014 - 1033 |
|---|---|
| Main Authors: | Speakman, Skyler; McFowland, Edward; Neill, Daniel B. |
| Format: | Journal Article |
| Language: | English |
| Published: | Alexandria: Taylor & Francis; American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America; Taylor & Francis Ltd, 02.10.2015 |
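| Subjects: | Biosurveillance; Cluster analysis; Clustering and Pattern Detection; Epidemics; Event detection; Graph mining; Graph theory; Public health; Scan statistics; Sensors; Simulation; Spatial scan statistic; Studies |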
| ISSN: | 1061-8600, 1537-2715 |
| Abstract | We present GraphScan, a novel method for detecting arbitrarily shaped connected clusters in graph or network data. Given a graph structure, data observed at each node, and a score function defining the anomalousness of a set of nodes, GraphScan can efficiently and exactly identify the most anomalous (highest-scoring) connected subgraph. Kulldorff's spatial scan, which searches over circles consisting of a center location and its k − 1 nearest neighbors, has been extended to include connectivity constraints by FlexScan. However, FlexScan performs an exhaustive search over connected subsets and is computationally infeasible for k > 30. Alternatively, the upper level set (ULS) scan scales well to large graphs but is not guaranteed to find the highest-scoring subset. We demonstrate that GraphScan is able to scale to graphs an order of magnitude larger than FlexScan, while guaranteeing that the highest-scoring subgraph will be identified. We evaluate GraphScan, Kulldorff's spatial scan (searching over circles) and ULS in two different settings of public health surveillance. The first examines detection power using simulated disease outbreaks injected into real-world Emergency Department data. GraphScan improved detection power by identifying connected, irregularly shaped spatial clusters while requiring less than 4.3 sec of computation time per day of data. The second scenario uses contaminant plumes spreading through a water distribution system to evaluate the spatial accuracy of the methods. GraphScan improved spatial accuracy using data generated from noisy, binary sensors in the network while requiring less than 0.22 sec of computation time per hour of data. |
|---|---|
| Author | Speakman, Skyler; McFowland, Edward; Neill, Daniel B. |
| CitedBy_id | crossref_primary_10_1109_JPROC_2018_2813311 crossref_primary_10_25300_MISQ_2021_15684 crossref_primary_10_1080_24709360_2022_2069458 crossref_primary_10_1016_j_cose_2020_102085 crossref_primary_10_1109_MIS_2017_25 crossref_primary_10_1080_10618600_2022_2077351 crossref_primary_10_1080_24725854_2022_2037792 crossref_primary_10_1145_3487893 crossref_primary_10_1186_s12911_018_0706_7 crossref_primary_10_1016_j_neucom_2020_04_064 crossref_primary_10_1080_24709360_2022_2065628 crossref_primary_10_3389_fpubh_2024_1432645 crossref_primary_10_1287_ijoc_2022_1192 crossref_primary_10_1016_j_jpdc_2019_04_006 crossref_primary_10_1145_3309712 crossref_primary_10_1177_0962280220930562 crossref_primary_10_1002_cpe_3769 crossref_primary_10_1109_TKDE_2018_2868097 crossref_primary_10_1016_j_csda_2020_107149 |
| Cites_doi | 10.2307/1910129 10.1145/1014052.1014082 10.1002/sim.4780140809 10.1016/B978-012369378-5/50018-1 10.1145/1081870.1081897 10.1145/347090.347121 10.5486/PMD.1959.6.3-4.12 10.1080/03610929708831995 10.1137/1.9781611972788.66 10.1186/1476-072X-8-20 10.1002/sim.2490 10.1016/j.ijforecast.2008.12.002 10.1023/B:EEST.0000027208.48919.7e 10.1111/j.1467-9868.2011.01014.x 10.1186/1476-072X-4-11 10.1061/(ASCE)0733-9496(2008)134:6(556) 10.1016/S0167-9473(02)00302-X 10.1145/1281192.1281239 |
| ContentType | Journal Article |
| Copyright | 2015 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America |
| DBID | AAYXX CITATION JQ2 |
| DOI | 10.1080/10618600.2014.960926 |
| DatabaseName | CrossRef; ProQuest Computer Science Collection |
| DatabaseTitle | CrossRef; ProQuest Computer Science Collection |
| DatabaseTitleList | ProQuest Computer Science Collection |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Statistics; Mathematics; Public Health |
| EISSN | 1537-2715 |
| EndPage | 1033 |
| ExternalDocumentID | 3926504121 10_1080_10618600_2014_960926 24737216 960926 |
| Genre | Article Feature |
| ISICitedReferencesCount | 23 |
| ISSN | 1061-8600 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 4 |
| Language | English |
| PageCount | 20 |
| PublicationCentury | 2000 |
| PublicationDate | 2015-10-02 |
| PublicationDateYYYYMMDD | 2015-10-02 |
| PublicationDecade | 2010 |
| PublicationPlace | Alexandria |
| PublicationTitle | Journal of computational and graphical statistics |
| PublicationYear | 2015 |
| Publisher | Taylor & Francis; American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America; Taylor & Francis Ltd |
| StartPage | 1014 |
| SubjectTerms | Biosurveillance; Cluster analysis; Clustering and Pattern Detection; Epidemics; Event detection; Graph mining; Graph theory; Public health; Scan statistics; Sensors; Simulation; Spatial scan statistic; Studies |
| Title | Scalable Detection of Anomalous Patterns With Connectivity Constraints |
| URI | https://www.tandfonline.com/doi/abs/10.1080/10618600.2014.960926 https://www.jstor.org/stable/24737216 https://www.proquest.com/docview/1758204368 |
| Volume | 24 |
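
The search problem stated in the abstract (find the highest-scoring connected subset of nodes, given a graph, per-node data, and a score function) can be illustrated on a toy graph. The sketch below is not GraphScan; it is a naive exhaustive search over connected subsets, in the spirit of the exhaustive approach the abstract attributes to FlexScan, and it is feasible only for very small graphs. The score function is an assumption: the abstract does not name one, so the sketch uses the expectation-based Poisson log-likelihood ratio, F(S) = C log(C/B) + B − C when C > B and 0 otherwise, where C and B are the total count and total baseline over the subset. All function and variable names are hypothetical.

```python
from itertools import combinations
from math import log


def ebp_score(count, baseline):
    """Expectation-based Poisson log-likelihood ratio (assumed score function).

    Returns 0 unless the observed count exceeds the expected baseline.
    """
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * log(count / baseline) + baseline - count


def is_connected(nodes, adjacency):
    """Return True if `nodes` induces a connected subgraph (depth-first search)."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def best_connected_subset(adjacency, counts, baselines):
    """Brute force: score every connected subset and keep the best one.

    Exponential in the number of nodes; illustration only, not GraphScan.
    """
    best_score, best_subset = 0.0, frozenset()
    nodes = sorted(adjacency)
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            if not is_connected(subset, adjacency):
                continue
            c = sum(counts[n] for n in subset)
            b = sum(baselines[n] for n in subset)
            score = ebp_score(c, b)
            if score > best_score:
                best_score, best_subset = score, frozenset(subset)
    return best_subset, best_score


# Hypothetical toy data: a path graph 0-1-2-3-4 with elevated counts at nodes 1 and 3.
ADJACENCY = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
COUNTS = {0: 10, 1: 30, 2: 5, 3: 28, 4: 10}
BASELINES = {n: 10 for n in ADJACENCY}

if __name__ == "__main__":
    subset, score = best_connected_subset(ADJACENCY, COUNTS, BASELINES)
    print(sorted(subset), round(score, 3))
```

In this toy data the two elevated nodes are not adjacent, so the connectivity constraint forces the search to decide whether bridging them through the low-count node 2 scores better than reporting either node alone. Because the loop visits every subset of nodes, its cost doubles with each added node, which is the kind of scaling limit the abstract describes for exhaustive search.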