An Efficient Filter Strategy for Theta-Join Query in Distributed Environment
Theta-join queries are widely used in traditional databases, but because of the heavy computation and communication costs in a distributed environment, they are not processed efficiently for big data. Current research focuses on processing theta-joins with the MapReduce framework, which mainl...
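As a rough illustration of the general idea named in the abstract (pruning tuples before the join so fewer comparisons and fewer shipped rows are needed), the sketch below joins two lists of numbers under the condition `x < y`. It is not the authors' method, which this record does not describe: the min/max pruning rule, the function names `theta_join` and `prefilter_less_than`, and the sample data are all assumptions made for illustration, and the code runs on a single machine rather than on Spark.

```python
from typing import Callable, List, Tuple


def theta_join(r: List[int], s: List[int],
               theta: Callable[[int, int], bool]) -> List[Tuple[int, int]]:
    """Naive theta-join: test every pair (x, y) against the condition."""
    return [(x, y) for x in r for y in s if theta(x, y)]


def prefilter_less_than(r: List[int], s: List[int]) -> Tuple[List[int], List[int]]:
    """Hypothetical pre-filter for the condition x < y.

    An x can only match if it is below the largest y, and a y can only
    match if it is above the smallest x, so all other values are dropped
    before any pairwise comparison takes place.
    """
    if not r or not s:
        return [], []
    max_s = max(s)
    min_r = min(r)
    r_kept = [x for x in r if x < max_s]
    s_kept = [y for y in s if y > min_r]
    return r_kept, s_kept


def less_than(x: int, y: int) -> bool:
    return x < y
```

In a distributed setting, the two bounds (`max_s` and `min_r`) are small enough to share with every worker, which is why a filter of this kind can cut down the intermediate data before the expensive pairwise step. Whether the paper uses bounds like these is not stated in the abstract.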
| Published in: | Proceedings - International Workshops on Parallel Processing, pp. 77-84 |
|---|---|
| Main authors: | Wenjie Liu, Zhanhuai Li, Yuntao Zhou |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 1 August 2017 |
| ISSN: | 1530-2016 |
| Online access: | Full text |
| Abstract | Theta-join queries are widely used in traditional databases, but because of the heavy computation and communication costs in a distributed environment, they are not processed efficiently for big data. Current research focuses on processing theta-joins with the MapReduce framework and mainly considers the overhead of load balancing in the network; as data sets grow larger, massive intermediate results lead to high communication cost. In this work, we propose a filter method for theta-join that reduces the computation and communication cost in a distributed environment and can effectively improve theta-join query performance. We consider both the load balance in the cluster and the memory cost in the parallel framework. We have implemented our method in a popular general-purpose data processing framework, Spark. The experimental results demonstrate that our method can significantly improve the performance of theta-joins compared with state-of-the-art solutions. |
|---|---|
| Author | Wenjie Liu; Yuntao Zhou; Zhanhuai Li |
| Author_xml | 1. Wenjie Liu (liuwenjie@nwpu.edu.cn), Sch. of Comput., Northwestern Polytech. Univ., Xi'an, China; 2. Zhanhuai Li (lizhh@nwpu.edu.cn), Sch. of Comput., Northwestern Polytech. Univ., Xi'an, China; 3. Yuntao Zhou (zhouyuntao@nwpu.edu.cn), Div. of Sci. & Technol. Res. Manage., Northwestern Polytech. Univ., Xi'an, China |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ICPPW.2017.24 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE/IET Electronic Library IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library (IEL) (UW System Shared) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 1538610442 9781538610442 |
| EndPage | 84 |
| ExternalDocumentID | 8026072 |
| Genre | orig-research |
| ISICitedReferencesCount | 4 |
| ISSN | 1530-2016 |
| IngestDate | Wed Aug 27 02:23:41 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 8 |
| ParticipantIDs | ieee_primary_8026072 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-Aug. |
| PublicationDateYYYYMMDD | 2017-08-01 |
| PublicationDate_xml | – month: 08 year: 2017 text: 2017-Aug. |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings - International Workshops on Parallel Processing |
| PublicationTitleAbbrev | ICPPW |
| PublicationYear | 2017 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0020355 |
| Score | 2.0299058 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 77 |
| SubjectTerms | Big Data; big data query; distributed computing; Distributed databases; Electronic mail; filter strategy; Filtering algorithms; Sparks; theta-join; Transforms |
| Title | An Efficient Filter Strategy for Theta-Join Query in Distributed Environment |
| URI | https://ieeexplore.ieee.org/document/8026072 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |