Massively Distributed Time Series Indexing and Querying


Bibliographic details
Published in: IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 32, No. 1, pp. 108-120
Authors: Yagoubi, Djamel-Edine; Akbarinia, Reza; Masseglia, Florent; Palpanas, Themis
Format: Journal Article
Language: English
Published: New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), January 1, 2020
ISSN: 1041-4347 (print); EISSN: 1558-2191
Online access: Full text
Abstract
Indexing is crucial for many data mining tasks that rely on efficient and effective similarity query processing. Consequently, indexing large volumes of time series, along with high-performance similarity query processing, has become a topic of high interest. For many applications across diverse domains, though, the amount of data to be processed may be intractable for a single machine, making existing centralized indexing solutions inefficient. We propose a parallel indexing solution that gracefully scales to billions of time series (or, more generally, high-dimensional vectors), and a parallel query processing strategy that, given a batch of queries, efficiently exploits the index. Our experiments on both synthetic and real-world data show that our index creation algorithm processes four billion time series in less than five hours, while state-of-the-art centralized algorithms do not scale: they reach their limit at one billion time series, where they need more than five days. In addition, thanks to an effective load-balancing mechanism, our distributed querying algorithm efficiently processes millions of queries over collections of billions of time series.
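The abstract describes batch similarity queries over a collection split across machines, plus a load-balancing step, but does not give the algorithms. The following is a hypothetical, brute-force illustration of what such a batch query computes, not the authors' parallel index or their load-balancing mechanism. It assumes Euclidean distance (listed among the subject terms), 1-nearest-neighbor queries, and a round-robin split of the series into partitions. `batch_nn` answers each query per partition and keeps the closest partial answer, which is the exact result an index must reproduce; `assign_queries` is a generic greedy longest-processing-time heuristic that spreads queries across workers using assumed per-query cost estimates. All function names are illustrative.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Series = Sequence[float]
Partition = List[Tuple[int, Series]]  # (series id, values) pairs held by one worker


def euclidean(a: Series, b: Series) -> float:
    """Euclidean distance between two series of equal length."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def partition(data: Sequence[Series], num_partitions: int) -> List[Partition]:
    """Hypothetical round-robin split of the collection across workers."""
    parts: List[Partition] = [[] for _ in range(num_partitions)]
    for i, s in enumerate(data):
        parts[i % num_partitions].append((i, s))
    return parts


def local_nn(part: Partition, query: Series) -> Tuple[float, int]:
    """(distance, id) of the closest series in one partition, by linear scan."""
    return min(((euclidean(query, s), i) for i, s in part), default=(math.inf, -1))


def batch_nn(data: Sequence[Series], queries: Sequence[Series],
             num_partitions: int = 4) -> List[int]:
    """Exact 1-NN series id for every query in the batch."""
    parts = partition(data, num_partitions)
    results: List[int] = []
    for q in queries:
        # Partitions are independent, so a cluster could scan them in parallel;
        # the global answer is the minimum of the per-partition answers.
        _, best_id = min(local_nn(p, q) for p in parts)
        results.append(best_id)
    return results


def assign_queries(costs: Dict[int, float], num_workers: int) -> List[List[int]]:
    """Greedy LPT: give the most expensive remaining query to the least-loaded worker."""
    heap = [(0.0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    buckets: List[List[int]] = [[] for _ in range(num_workers)]
    for qid in sorted(costs, key=costs.get, reverse=True):
        load, w = heapq.heappop(heap)
        buckets[w].append(qid)
        heapq.heappush(heap, (load + costs[qid], w))
    return buckets


if __name__ == "__main__":
    data = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [10.0, 10.0]]
    print(batch_nn(data, [[0.9, 1.2], [9.0, 9.0]], num_partitions=2))  # [2, 3]
    print(assign_queries({0: 5.0, 1: 3.0, 2: 2.0, 3: 2.0}, num_workers=2))  # [[0, 3], [1, 2]]
```

A real index would prune most candidates in each partition instead of scanning them, and per-query cost estimates could be derived from the index; the greedy assignment only balances the estimated totals, not the actual running times.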
Author details:
Djamel-Edine Yagoubi (Djamel-Edine.Yagoubi@inria.fr), Inria-University of Montpellier-Lirmm, Montpellier, Occitanie, France
Reza Akbarinia (ORCID 0000-0003-0372-0241; Reza.Akbarinia@inria.fr), Inria-University of Montpellier-Lirmm, Montpellier, Occitanie, France
Florent Masseglia (Florent.Masseglia@inria.fr), Inria-University of Montpellier-Lirmm, Montpellier, Occitanie, France
Themis Palpanas (ORCID 0000-0002-8031-0265; themis@mi.parisdescartes.fr), Paris Descartes University, Paris, France
HAL record: https://hal-lirmm.ccsd.cnrs.fr/lirmm-02197618
CODEN: ITKEEH
Copyright: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2020. Distributed under a Creative Commons Attribution 4.0 International License.
DOI: 10.1109/TKDE.2018.2880215
Discipline: Engineering; Computer Science
Funding: European Union's Horizon 2020 (grant 732051)
Access: Open access; peer-reviewed scholarly article
Keywords: Time series; parallel indexing; distributed querying
License: https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID iDs: 0000-0003-0372-0241, 0000-0002-8031-0265, 0000-0002-1149-585X, 0000-0002-7098-0361
Subject terms: Algorithms; Computer Science; Data mining; Distributed querying; Domains; Euclidean distance; Indexing; Information Retrieval; Parallel indexing; Queries; Query processing; Servers; Similarity; Task analysis; Time series; Time series analysis
Links: https://ieeexplore.ieee.org/document/8528881
https://www.proquest.com/docview/2325183552