There and back again: Outlier detection between statistical reasoning and data mining algorithms
| Published in: | Wiley interdisciplinary reviews. Data mining and knowledge discovery, Vol. 8, No. 6, p. e1280 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Hoboken, USA: Wiley Periodicals, Inc; Wiley Subscription Services, Inc; 1 November 2018 |
| Keywords: | |
| ISSN: | 1942-4787, 1942-4795 |
| Online access: | Full text |
| Abstract: | Outlier detection has been a topic in statistics for centuries. Over mainly the last two decades, there has been also an increasing interest in the database and data mining community to develop scalable methods for outlier detection. Initially based on statistical reasoning, however, these methods soon lost the direct probabilistic interpretability of the derived outlier scores. Here, we detail from a joint point of view of data mining and statistics the roots and the path of development of statistical outlier detection and of database‐related data mining methods for outlier detection. We discuss their inherent meaning, review approaches to again find a statistically meaningful interpretation of outlier scores, and sketch related current research topics.
This article is categorized under:
Algorithmic Development > Statistics
Algorithmic Development > Scalable Statistical Methods
Technologies > Machine Learning
Masking and swamping: A distribution model (green density contours) computed for the inliers (green points) reveals the outlier (red point) as far off. If, however, the outlier is taken into account when fitting the distribution model to the data (red density contours), the outlier itself might be well covered by the model (it is masked), while some inlier might now appear to be too far off (the lower right inlier is swamped). |
|---|---|
| DOI: | 10.1002/widm.1280 |