Beyond one billion time series: indexing and mining very large time series collections with iSAX2+

Bibliographic Details
Title: Beyond one billion time series: indexing and mining very large time series collections with iSAX2+
Authors: A. Camerra, J. Shieh, Themistoklis Palpanas, T. Rakthanmanon, E. Keogh
Source: Knowledge and Information Systems. 39:123-151
Publisher: Springer Science and Business Media LLC, 2013.
Publication year: 2013
Keywords: 13. Climate action, 0202 electrical engineering, electronic engineering, information engineering, time series, data mining, representations, indexing, bulk loading, 14. Life underwater, 02 engineering and technology, 7. Clean energy
Description: There is an increasingly pressing need, by several applications in diverse domains, for developing techniques able to index and mine very large collections of time series. Examples of such applications come from astronomy, biology, the web, and other domains. It is not unusual for these applications to involve numbers of time series in the order of hundreds of millions to billions. However, all relevant techniques that have been proposed in the literature so far have not considered any data collections much larger than one million time series. In this paper, we describe iSAX 2.0 and its improvements, iSAX 2.0 Clustered and iSAX2+, three methods designed for indexing and mining truly massive collections of time series. We show that the main bottleneck in mining such massive datasets is the time taken to build the index, and we thus introduce a novel bulk loading mechanism, the first of this kind specifically tailored to a time series index. We show how our methods allow mining on datasets that would otherwise be completely untenable, including the first published experiments to index one billion time series, and experiments in mining massive data from domains as diverse as entomology, DNA and web-scale image collections.
Publication type: Article
Language: English
ISSN: 0219-3116
0219-1377
DOI: 10.1007/s10115-012-0606-6
Access URL: https://dblp.uni-trier.de/db/journals/kais/kais39.html#CamerraSPRK14
https://cpe.ku.ac.th/~fengtwr/paper/13KAIS_iSAX2.0_plus.pdf
http://www.mi.parisdescartes.fr/~themisp/publications/kais14-isax2plus.pdf
https://link.springer.com/article/10.1007/s10115-012-0606-6
https://doi.org/10.1007/s10115-012-0606-6
http://disi.unitn.it/~themis/publications/kais14-isax2plus.pdf
http://link.springer.com/article/10.1007/s10115-012-0606-6#page-1
https://hdl.handle.net/11572/95220
Rights: Springer TDM
Document code: edsair.doi.dedup.....c97e26fece128a8fe11d08d6175a8ef6
Database: OpenAIRE