A NoSQL Data Model for Scalable Big Data Workflow Execution
While big data workflows have been proposed recently as the next-generation data-centric workflow paradigm to process and analyze data of ever-increasing scale, complexity, and rate of acquisition, a scalable distributed data model is still missing that abstracts and automates data distribution,...
| Published in: | 2016 IEEE International Congress on Big Data (BigData Congress) pp. 52 - 59 |
|---|---|
| Main Authors: | Mohan, Aravind; Ebrahimi, Mahdi; Lu, Shiyong; Kotov, Alexander |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.06.2016 |
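| Subjects: | Big data; Big Data Workflows; Cloud computing; Clouds; Data models; NoSQL; Parallel processing; Virtual machining |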
| Abstract | While big data workflows have been proposed recently as the next-generation data-centric workflow paradigm to process and analyze data of ever-increasing scale, complexity, and rate of acquisition, a scalable distributed data model is still missing that abstracts and automates data distribution, parallelism, and scalable processing. Meanwhile, although NoSQL has emerged as a new category of data models, these models are optimized for storing and querying large datasets, not for ad-hoc data analysis where data placement and data movement are necessary for optimized workflow execution. In this paper, we propose a NoSQL data model that: 1) supports high-performance MapReduce-style workflows that automate data partitioning and data-parallel execution; in contrast to the traditional MapReduce framework, these MapReduce-style workflows are fully composable with other workflows, enabling dataflow applications with a richer structure; 2) automates virtual machine provisioning and deprovisioning on demand according to the sizes of input datasets; and 3) enables a flexible framework for workflow executors that take advantage of the proposed NoSQL data model to improve the performance of workflow execution. Our case studies and experiments show the competitive advantages of our proposed data model. The proposed NoSQL data model is implemented in a new release of DATAVIEW, one of the most usable big data workflow systems in the community. |
|---|---|
| Author | Mohan, Aravind; Kotov, Alexander; Ebrahimi, Mahdi; Lu, Shiyong |
| Author_xml | – sequence: 1 givenname: Aravind surname: Mohan fullname: Mohan, Aravind email: amohan@wayne.edu organization: Wayne State Univ., Detroit, MI, USA – sequence: 2 givenname: Mahdi surname: Ebrahimi fullname: Ebrahimi, Mahdi email: mebrahimi@wayne.edu organization: Wayne State Univ., Detroit, MI, USA – sequence: 3 surname: Shiyong Lu fullname: Shiyong Lu email: shiyong@wayne.edu organization: Wayne State Univ., Detroit, MI, USA – sequence: 4 givenname: Alexander surname: Kotov fullname: Kotov, Alexander email: kotov@wayne.edu organization: Wayne State Univ., Detroit, MI, USA |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/BigDataCongress.2016.15 |
| Discipline | Computer Science |
| EISBN | 1509026223 9781509026227 |
| EndPage | 59 |
| ExternalDocumentID | 7584920 |
| Genre | orig-research |
| ISICitedReferencesCount | 9 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 8 |
| PublicationCentury | 2000 |
| PublicationDate | 2016-June |
| PublicationDateYYYYMMDD | 2016-06-01 |
| PublicationDate_xml | – month: 06 year: 2016 text: 2016-June |
| PublicationDecade | 2010 |
| PublicationTitle | 2016 IEEE International Congress on Big Data (BigData Congress) |
| PublicationTitleAbbrev | bigdatacongress |
| PublicationYear | 2016 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| StartPage | 52 |
| SubjectTerms | Big data; Big Data Workflows; Cloud computing; Clouds; Data models; NoSQL; Parallel processing; Virtual machining |
| Title | A NoSQL Data Model for Scalable Big Data Workflow Execution |
| URI | https://ieeexplore.ieee.org/document/7584920 |