Discovering and Explaining Abnormal Nodes in Semantic Graphs

An important problem in the area of homeland security is to identify suspicious entities in large datasets. Although there are methods from knowledge discovery and data mining (KDD) focusing on finding anomalies in numerical datasets, there has been little work aimed at discovering suspicious instan...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:IEEE transactions on knowledge and data engineering Ročník 20; číslo 8; s. 1039 - 1052
Hlavní autoři: Shou-de Lin, Chalupsky, H.
Médium: Journal Article
Jazyk:angličtina
Vydáno: New York IEEE 01.08.2008
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Témata:
ISSN:1041-4347, 1558-2191
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:An important problem in the area of homeland security is to identify suspicious entities in large datasets. Although there are methods from knowledge discovery and data mining (KDD) focusing on finding anomalies in numerical datasets, there has been little work aimed at discovering suspicious instances in large and complex semantic graphs whose nodes are richly connected with many different types of links. In this paper, we describe a novel, domain independent and unsupervised framework to identify such instances. Besides discovering suspicious instances, we believe that to complete the process, a system has to convince the users by providing understandable explanations for its findings. Therefore, in the second part of the paper we describe several explanation mechanisms to automatically generate human understandable explanations for the discovered results. To evaluate our discovery and explanation systems, we perform experiments on several different semantic graphs. The results show that our discovery system outperforms the state-of-the-art unsupervised network algorithms used to analyze the 9/11 terrorist network by a large margin. Additionally, the human study we conducted demonstrates that our explanation system, which provides natural language explanations for its findings, allowed human subjects to perform complex data analysis in a much more efficient and accurate manner.
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ObjectType-Article-2
ObjectType-Feature-1
content type line 23
ISSN:1041-4347
1558-2191
DOI:10.1109/TKDE.2007.190691