A NoSQL Data Model for Scalable Big Data Workflow Execution

Bibliographic Details
Published in: 2016 IEEE International Congress on Big Data (BigData Congress), pp. 52-59
Main Authors: Mohan, Aravind; Ebrahimi, Mahdi; Lu, Shiyong; Kotov, Alexander
Format: Conference Proceeding
Language: English
Published: IEEE, 1 June 2016
Abstract: While big data workflows have recently been proposed as the next-generation data-centric workflow paradigm for processing and analyzing data of ever-increasing scale, complexity, and rate of acquisition, a scalable distributed data model that abstracts and automates data distribution, parallelism, and scalable processing is still missing. Meanwhile, although NoSQL has emerged as a new category of data models, such models are optimized for storing and querying large datasets, not for ad-hoc data analysis in which data placement and data movement are necessary for optimized workflow execution. In this paper, we propose a NoSQL data model that: 1) supports high-performance MapReduce-style workflows that automate data partitioning and data-parallel execution; in contrast to the traditional MapReduce framework, our MapReduce-style workflows are fully composable with other workflows, enabling dataflow applications with a richer structure; 2) automates virtual machine provisioning and deprovisioning on demand according to the sizes of input datasets; and 3) enables a flexible framework for workflow executors that take advantage of the proposed NoSQL data model to improve the performance of workflow execution. Our case studies and experiments show the competitive advantages of our proposed data model. The proposed NoSQL data model is implemented in a new release of DATAVIEW, one of the most usable big data workflow systems in the community.
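Illustrative note: the three ideas named in the abstract (automatic data partitioning, sizing compute resources from the input size, and MapReduce-style steps that compose with other workflow steps) can be sketched in a few lines of Python. This is a toy, single-process illustration only and is not the DATAVIEW API or the paper's data model. Every name here (plan_workers, partition, map_reduce, compose) and the sizing parameters records_per_worker and max_workers are hypothetical, and the "workers" are simulated by looping over partitions.

```python
from collections import Counter
from functools import reduce
from math import ceil


def plan_workers(n_records, records_per_worker=2, max_workers=8):
    """Pick a worker count from the input size.

    Hypothetical stand-in for on-demand VM provisioning; the thresholds
    are arbitrary assumptions chosen for the example.
    """
    return max(1, min(max_workers, ceil(n_records / records_per_worker)))


def partition(records, n_parts):
    """Split records into n_parts round-robin partitions."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_reduce(map_fn, reduce_fn, initial):
    """Build a workflow step: partition, map each partition, reduce."""
    def step(records):
        parts = partition(records, plan_workers(len(records)))
        # Run sequentially here; in a distributed setting each partition
        # would be handled by its own worker.
        mapped = [map_fn(part) for part in parts]
        return reduce(reduce_fn, mapped, initial)
    return step


def compose(*steps):
    """Chain steps so the output of one feeds the next."""
    def workflow(data):
        for step in steps:
            data = step(data)
        return data
    return workflow


# A word-count step composed with an ordinary (non-MapReduce) step.
word_count = map_reduce(
    lambda lines: Counter(w for line in lines for w in line.split()),
    lambda a, b: a + b,
    Counter(),
)


def top_word(counts):
    """Return the most frequent word in a Counter."""
    return counts.most_common(1)[0][0]


pipeline = compose(word_count, top_word)
```

Calling pipeline on a list of text lines partitions the lines, counts words per partition, merges the counts, and hands the merged result to top_word, which shows how a MapReduce-style step can sit inside a larger dataflow rather than being the whole job.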
Author Affiliations:
1. Mohan, Aravind (amohan@wayne.edu), Wayne State Univ., Detroit, MI, USA
2. Ebrahimi, Mahdi (mebrahimi@wayne.edu), Wayne State Univ., Detroit, MI, USA
3. Lu, Shiyong (shiyong@wayne.edu), Wayne State Univ., Detroit, MI, USA
4. Kotov, Alexander (kotov@wayne.edu), Wayne State Univ., Detroit, MI, USA
DOI: 10.1109/BigDataCongress.2016.15
Discipline: Computer Science
EISBN: 1509026223; 9781509026227
Subjects: Big data; Big Data Workflows; Cloud computing; Clouds; Data models; NoSQL; Parallel processing; Virtual machining
URI https://ieeexplore.ieee.org/document/7584920