Scalable Detection of Anomalous Patterns With Connectivity Constraints


Bibliographic Details
Published in: Journal of Computational and Graphical Statistics, Vol. 24, no. 4, pp. 1014-1033
Main Authors: Speakman, Skyler; McFowland, Edward; Neill, Daniel B.
Format: Journal Article
Language: English
Published: Alexandria: Taylor & Francis, 02.10.2015
American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America
Taylor & Francis Ltd
ISSN: 1061-8600, 1537-2715
Abstract: We present GraphScan, a novel method for detecting arbitrarily shaped connected clusters in graph or network data. Given a graph structure, data observed at each node, and a score function defining the anomalousness of a set of nodes, GraphScan can efficiently and exactly identify the most anomalous (highest-scoring) connected subgraph. Kulldorff's spatial scan, which searches over circles consisting of a center location and its k − 1 nearest neighbors, has been extended to include connectivity constraints by FlexScan. However, FlexScan performs an exhaustive search over connected subsets and is computationally infeasible for k > 30. Alternatively, the upper level set (ULS) scan scales well to large graphs but is not guaranteed to find the highest-scoring subset. We demonstrate that GraphScan is able to scale to graphs an order of magnitude larger than FlexScan, while guaranteeing that the highest-scoring subgraph will be identified. We evaluate GraphScan, Kulldorff's spatial scan (searching over circles) and ULS in two different settings of public health surveillance. The first examines detection power using simulated disease outbreaks injected into real-world Emergency Department data. GraphScan improved detection power by identifying connected, irregularly shaped spatial clusters while requiring less than 4.3 sec of computation time per day of data. The second scenario uses contaminant plumes spreading through a water distribution system to evaluate the spatial accuracy of the methods. GraphScan improved spatial accuracy using data generated from noisy, binary sensors in the network while requiring less than 0.22 sec of computation time per hour of data.
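The problem stated in the abstract (a graph, per-node data, a score function, and a search for the highest-scoring connected subset) can be illustrated with a small sketch. This is not GraphScan: it is a brute-force enumeration of connected subsets, closer in spirit to the exhaustive search the abstract attributes to FlexScan, and it is only practical for very small graphs. The abstract does not say which score function is used, so the expectation-based Poisson statistic below is an assumption, and the function names and toy data are hypothetical.

```python
from itertools import combinations
from math import log


def score(count, baseline):
    """Expectation-based Poisson statistic (an assumed choice of score function).

    Positive only when the observed count exceeds the expected baseline.
    """
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * log(count / baseline) + baseline - count


def is_connected(nodes, adjacency):
    """Depth-first search restricted to `nodes`; True if they form one component."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def best_connected_subset(adjacency, counts, baselines):
    """Exhaustively score every connected subset (exponential in graph size)."""
    best_score, best_subset = 0.0, frozenset()
    nodes = sorted(adjacency)
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            if not is_connected(subset, adjacency):
                continue
            c = sum(counts[n] for n in subset)
            b = sum(baselines[n] for n in subset)
            s = score(c, b)
            if s > best_score:
                best_score, best_subset = s, frozenset(subset)
    return best_subset, best_score


# Toy path graph A - B - C - D with invented counts (illustration only).
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
counts = {"A": 20, "B": 10, "C": 20, "D": 5}
baselines = {"A": 10, "B": 10, "C": 10, "D": 10}

subset, value = best_connected_subset(adjacency, counts, baselines)
print(sorted(subset), round(value, 3))
```

In this toy example the two elevated nodes A and C are not adjacent, so the unconstrained optimum {A, C} is excluded and the search returns a connected subset that bridges them through B. The exponential number of candidate subsets is the reason exhaustive search becomes infeasible as the neighborhood size grows.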
Copyright 2015 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America 2015
2015 American Statistical Association, the Institute of Mathematical Statistics, and the Interface Foundation of North America
Copyright American Statistical Association 2015
DOI: 10.1080/10618600.2014.960926
Discipline: Statistics; Mathematics; Public Health
Peer Reviewed: true
Scholarly: true
Subjects: Biosurveillance; Cluster analysis; Clustering and Pattern Detection; Epidemics; Event detection; Graph mining; Graph theory; Public health; Scan statistics; Sensors; Simulation; Spatial scan statistic; Studies
URI https://www.tandfonline.com/doi/abs/10.1080/10618600.2014.960926
https://www.jstor.org/stable/24737216
https://www.proquest.com/docview/1758204368