A Transducers-based Programming Framework for Efficient Data Transformation

Many data analytics and scientific applications rely on data transformation tasks, such as encoding, decoding, parsing of structured and unstructured data, and conversions between data formats and layouts. Previous work has shown that data transformation can represent a performance bottleneck for da...

Full description

Saved in:
Bibliographic details
Published in: 2024 33rd International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 66-77
Main authors: Nguyen, Tri; Becchi, Michela
Format: Conference proceeding
Language: English
Published: ACM, 13 October 2024
Subjects:
Online access: Full text
Abstract Many data analytics and scientific applications rely on data transformation tasks, such as encoding, decoding, parsing of structured and unstructured data, and conversions between data formats and layouts. Previous work has shown that data transformation can represent a performance bottleneck for data analytics workloads. The transducer computational abstraction can be used to express a wide range of data transformations, and recent efforts have proposed configurable engines implementing various transducer models (from finite-state transducers to pushdown transducers to extended models). This line of research, however, is still at an early stage. Notably, expressing data transformation using transducers requires a paradigm shift, impacting programmability. To address this problem, we propose a programming framework to map data transformation tasks onto a variety of transducer models. Our framework includes: (1) a platform-agnostic programming language (xPTLang) to code transducer programs using intuitive programming constructs, and (2) a compiler that, given an xPTLang program, generates efficient transducer processing engines for CPU and GPU. Our compiler includes a set of optimizations to improve code efficiency. We demonstrate our framework on a diverse set of data transformation tasks on an Intel CPU and an Nvidia GPU.
Author Becchi, Michela
Nguyen, Tri
Author_xml – sequence: 1
  givenname: Tri
  surname: Nguyen
  fullname: Nguyen, Tri
  email: tmnguye7@ncsu.edu
  organization: North Carolina State University, United States of America
– sequence: 2
  givenname: Michela
  surname: Becchi
  fullname: Becchi, Michela
  email: mbecchi@ncsu.edu
  organization: North Carolina State University, United States of America
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1145/3656019.3676891
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE/IET Electronic Library
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798400706318
EndPage 77
ExternalDocumentID 10807327
Genre orig-research
GrantInformation_xml – fundername: National Science Foundation
  grantid: CCF-1907863
  funderid: 10.13039/100000001
GroupedDBID 6IE
6IL
ACM
ALMA_UNASSIGNED_HOLDINGS
APO
CBEJK
LHSKQ
RIE
RIL
ID FETCH-LOGICAL-a248t-5ecc37e5b2a29317e9a0f2d726ba6d26ca471a49af89eba2bb0b6c36e839ae143
IEDL.DBID RIE
ISICitedReferencesCount 0
IngestDate Wed Jan 08 06:10:43 EST 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a248t-5ecc37e5b2a29317e9a0f2d726ba6d26ca471a49af89eba2bb0b6c36e839ae143
PageCount 12
ParticipantIDs ieee_primary_10807327
PublicationCentury 2000
PublicationDate 2024-Oct.-13
PublicationDateYYYYMMDD 2024-10-13
PublicationDate_xml – month: 10
  year: 2024
  text: 2024-Oct.-13
  day: 13
PublicationDecade 2020
PublicationTitle 2024 33rd International Conference on Parallel Architectures and Compilation Techniques (PACT)
PublicationTitleAbbrev PACT
PublicationYear 2024
Publisher ACM
Publisher_xml – name: ACM
SSID ssib057256082
Score 2.2707927
Snippet Many data analytics and scientific applications rely on data transformation tasks, such as encoding, decoding, parsing of structured and unstructured data, and...
SourceID ieee
SourceType Publisher
StartPage 66
SubjectTerms Codes
Computational modeling
Computer languages
Data analysis
Data models
Engines
Graphics processing units
Optimization
Programming
Transducers
Title A Transducers-based Programming Framework for Efficient Data Transformation
URI https://ieeexplore.ieee.org/document/10807327
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE