Massively Distributed Time Series Indexing and Querying
Indexing is crucial for many data mining tasks that rely on efficient and effective similarity query processing. Consequently, indexing large volumes of time series, along with high-performance similarity query processing, have become topics of high interest. For many applications across diverse dom...
| Published in: | IEEE Transactions on Knowledge and Data Engineering, Vol. 32, No. 1, pp. 108-120 |
|---|---|
| Main authors: | Djamel-Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Themis Palpanas |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 January 2020 |
| Subjects: | |
| ISSN: | 1041-4347, 1558-2191 |
| Online access: | Full text |
| Abstract | Indexing is crucial for many data mining tasks that rely on efficient and effective similarity query processing. Consequently, indexing large volumes of time series, along with high-performance similarity query processing, have become topics of high interest. For many applications across diverse domains, though, the amount of data to be processed might be intractable for a single machine, making existing centralized indexing solutions inefficient. We propose a parallel indexing solution that gracefully scales to billions of time series, and a parallel query processing strategy that, given a batch of queries, efficiently exploits the index. Our experiments, on both synthetic and real-world data, illustrate that our index creation algorithm works on four billion time series in less than five hours, while the state-of-the-art centralized algorithms do not scale and reach their limit at 1 billion time series, where they need more than five days. Also, our distributed querying algorithm is able to efficiently process millions of queries over collections of billions of time series, thanks to an effective load balancing mechanism. |
|---|---|
| AbstractList | Indexing is crucial for many data mining tasks that rely on efficient and effective similarity query processing. Consequently, indexing large volumes of time series, along with high-performance similarity query processing, have become topics of high interest. For many applications across diverse domains, though, the amount of data to be processed might be intractable for a single machine, making existing centralized indexing solutions inefficient. We propose a parallel indexing solution that gracefully scales to billions of time series (or high-dimensional vectors, in general), and a parallel query processing strategy that, given a batch of queries, efficiently exploits the index. Our experiments, on both synthetic and real-world data, illustrate that our index creation algorithm works on 4 billion time series in less than 5 hours, while the state-of-the-art centralized algorithms do not scale and reach their limit at 1 billion time series, where they need more than 5 days. Also, our distributed querying algorithm is able to efficiently process millions of queries over collections of billions of time series, thanks to an effective load balancing mechanism. |
| Author | Palpanas, Themis; Akbarinia, Reza; Masseglia, Florent; Yagoubi, Djamel-Edine |
| Author_xml | – sequence: 1 givenname: Djamel-Edine surname: Yagoubi fullname: Yagoubi, Djamel-Edine email: Djamel-Edine.Yagoubi@inria.fr organization: Inria-University of Montpellier-Lirmm, Montpellier, Occitanie, France – sequence: 2 givenname: Reza orcidid: 0000-0003-0372-0241 surname: Akbarinia fullname: Akbarinia, Reza email: Reza.Akbarinia@inria.fr organization: Inria-University of Montpellier-Lirmm, Montpellier, Occitanie, France – sequence: 3 givenname: Florent surname: Masseglia fullname: Masseglia, Florent email: Florent.Masseglia@inria.fr organization: Inria-University of Montpellier-Lirmm, Montpellier, Occitanie, France – sequence: 4 givenname: Themis orcidid: 0000-0002-8031-0265 surname: Palpanas fullname: Palpanas, Themis email: themis@mi.parisdescartes.fr organization: Paris Descartes University, Paris, France |
| BackLink | https://hal-lirmm.ccsd.cnrs.fr/lirmm-02197618 (View record in HAL) |
| CODEN | ITKEEH |
| CitedBy_id | crossref_primary_10_1007_s10618_020_00685_w crossref_primary_10_3390_en17215478 crossref_primary_10_1016_j_is_2025_102524 crossref_primary_10_1109_TKDE_2024_3487759 crossref_primary_10_7717_peerj_cs_1929 crossref_primary_10_3390_a14120353 crossref_primary_10_1109_TKDE_2023_3270264 crossref_primary_10_1145_3749160 crossref_primary_10_1007_s00778_021_00677_2 crossref_primary_10_1145_3588965 crossref_primary_10_3390_mi13030385 crossref_primary_10_1016_j_ins_2024_121320 crossref_primary_10_1109_TKDE_2022_3167257 crossref_primary_10_1007_s10115_020_01518_4 crossref_primary_10_1145_3709729 crossref_primary_10_1109_TKDE_2020_2975180 crossref_primary_10_14778_3717755_3717760 crossref_primary_10_1155_2021_9948533 crossref_primary_10_3390_ijgi12040179 |
| Cites_doi | 10.1109/ISPASS.2010.5452045 10.14778/2536206.2536208 10.1007/s10618-007-0064-z 10.1145/191843.191925 10.1109/ICDM.2016.0179 10.1007/s10618-009-0125-6 10.1007/978-3-662-49192-8_6 10.1109/ICDM.2010.124 10.1007/3-540-57301-1_5 10.1145/2814710.2814719 10.1109/MCI.2014.2326100 10.1016/B978-155860869-6/50043-3 10.1007/s00778-016-0442-5 10.1145/2588555.2610498 10.1145/882085.882086 10.1145/1401890.1401966 10.1109/ICASSP.1999.757470 10.1145/2339530.2339576 10.1007/s10115-012-0606-6 10.1145/1557019.1557122 10.1109/TKDE.2015.2411594 10.1145/1327452.1327492 10.1145/2379776.2379788 10.1109/ICASSP.2011.5946540 10.1145/1352431.1352464 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 Distributed under a Creative Commons Attribution 4.0 International License |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 – notice: Distributed under a Creative Commons Attribution 4.0 International License |
| DOI | 10.1109/TKDE.2018.2880215 |
| DatabaseName | IEEE Xplore (IEEE) IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional Hyper Article en Ligne (HAL) Hyper Article en Ligne (HAL) (Open Access) |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering; Computer Science |
| EISSN | 1558-2191 |
| EndPage | 120 |
| ExternalDocumentID | oai:HAL:lirmm-02197618v1 10_1109_TKDE_2018_2880215 8528881 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: European Union's Horizon 2020 grantid: 732051 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 26 |
| ISSN | 1041-4347 |
| IngestDate | Tue Oct 14 20:34:11 EDT 2025 Sun Jun 29 16:39:58 EDT 2025 Tue Nov 18 22:24:34 EST 2025 Sat Nov 29 04:46:47 EST 2025 Wed Aug 27 02:49:36 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Keywords | Parallel Indexing; Time Series; Distributed Querying |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
| LinkModel | DirectLink |
| ORCID | 0000-0003-0372-0241 0000-0002-8031-0265 0000-0002-1149-585X 0000-0002-7098-0361 |
| OpenAccessLink | https://hal-lirmm.ccsd.cnrs.fr/lirmm-02197618 |
| PQID | 2325183552 |
| PQPubID | 85438 |
| PageCount | 13 |
| ParticipantIDs | crossref_citationtrail_10_1109_TKDE_2018_2880215 ieee_primary_8528881 proquest_journals_2325183552 crossref_primary_10_1109_TKDE_2018_2880215 hal_primary_oai_HAL_lirmm_02197618v1 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-Jan.-1 2020-1-1 20200101 2020 |
| PublicationDateYYYYMMDD | 2020-01-01 |
| PublicationDate_xml | – month: 01 year: 2020 text: 2020-Jan.-1 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE transactions on knowledge and data engineering |
| PublicationTitleAbbrev | TKDE |
| PublicationYear | 2020 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Institute of Electrical and Electronics Engineers |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) – name: Institute of Electrical and Electronics Engineers |
| Snippet | Indexing is crucial for many data mining tasks that rely on efficient and effective similarity query processing. Consequently, indexing large volumes of time... |
| SourceID | hal proquest crossref ieee |
| SourceType | Open Access Repository; Aggregation Database; Enrichment Source; Index Database; Publisher |
| StartPage | 108 |
| SubjectTerms | Algorithms; Computer Science; Data mining; Distributed querying; Domains; Euclidean distance; Indexing; Information Retrieval; Parallel indexing; Queries; Query processing; Servers; Similarity; Task analysis; Time series; Time series analysis |
| Title | Massively Distributed Time Series Indexing and Querying |
| URI | https://ieeexplore.ieee.org/document/8528881 https://www.proquest.com/docview/2325183552 https://hal-lirmm.ccsd.cnrs.fr/lirmm-02197618 |
| Volume | 32 |