DBSTexC: Density-Based Spatio-Textual Clustering on Twitter
| Published in: | Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, pp. 23-26 |
|---|---|
| Main authors: | , |
| Format: | Conference proceeding |
| Language: | English |
| Published: | New York, NY, USA: ACM, 31 July 2017 |
| Series: | ACM Conferences |
| Subjects: | |
| ISBN: | 1450349935, 9781450349932 |
| Online access: | Full text |
| Summary: | Density-based spatial clustering of applications with noise (DBSCAN) is the most commonly used density-based clustering algorithm and can discover multiple clusters with arbitrary shapes. DBSCAN works properly when the input data type is homogeneous, but its approach may not be sufficient when the input dataset has textual heterogeneity (e.g., when we intend to find clusters from geo-tagged posts on social media relevant to a certain point-of-interest (POI)), which leads to poor performance. In this paper, we present DBSTexC, a new density-based clustering algorithm using spatio-textual information on Twitter. We first define POI-relevant and POI-irrelevant tweets as the records that contain and do not contain a POI name or its coherent variations, respectively. By taking into account the fractions of POI-relevant and POI-irrelevant tweets, our DBSTexC algorithm shows a much higher clustering quality than DBSCAN in terms of the F1 score and its variants. DBSTexC can be thought of as a generalized version of DBSCAN, since it performs identically to DBSCAN when the inputs are homogeneous and far outperforms DBSCAN when the input data type is heterogeneous. |
|---|---|
| DOI: | 10.1145/3110025.3110096 |
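
The summary's notions of POI-relevant tweets and the F1 score can be illustrated with a minimal Python sketch. It is not the paper's implementation: the helper names `is_poi_relevant` and `cluster_f1` are hypothetical, name matching is reduced to a case-insensitive substring test, and precision and recall are assumed to be the share of POI-relevant tweets inside the cluster and the share of all POI-relevant tweets that the cluster captures.

```python
def is_poi_relevant(text, poi_variants):
    """Return True if the tweet text contains any POI name variant.

    Assumption: a case-insensitive substring match stands in for the
    "POI name or its coherent variations" test described in the summary.
    """
    lowered = text.lower()
    return any(variant.lower() in lowered for variant in poi_variants)


def cluster_f1(tweets, in_cluster, poi_variants):
    """F1 score of one cluster under the assumed precision/recall definitions.

    tweets      : list of tweet texts
    in_cluster  : list of booleans, True if the tweet was put in the cluster
    """
    relevant = [is_poi_relevant(t, poi_variants) for t in tweets]
    # POI-relevant tweets that ended up inside the cluster.
    true_pos = sum(1 for r, c in zip(relevant, in_cluster) if r and c)
    if true_pos == 0:
        return 0.0
    precision = true_pos / sum(in_cluster)  # relevant share of the cluster
    recall = true_pos / sum(relevant)       # captured share of relevant tweets
    return 2 * precision * recall / (precision + recall)


# Illustrative data only; these tweets are made up for the example.
tweets = [
    "Sunset at the Eiffel Tower",
    "eiffel tower queue is long",
    "Coffee time",
    "Tour Eiffel at night",
]
variants = ["Eiffel Tower", "Tour Eiffel"]
in_cluster = [True, True, True, False]
score = cluster_f1(tweets, in_cluster, variants)
```

Here the cluster holds two POI-relevant tweets and one POI-irrelevant tweet and misses one POI-relevant tweet, so under these assumed definitions precision and recall are both 2/3.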

