Monadic composition for deterministic, parallel batch processing

Achieving determinism on real software systems remains difficult. Even a batch-processing job, whose task is to map input bits to output bits, risks nondeterminism from thread scheduling, system calls, CPU instructions, and leakage of environmental information such as date or CPU model. In this work...

Full description

Bibliographic details
Published in: Proceedings of ACM on programming languages, Vol. 1, Issue OOPSLA, pp. 1-26
Main authors: Scott, Ryan G., Navarro Leija, Omar S., Devietti, Joseph, Newton, Ryan R.
Format: Journal Article
Language: English
Published: 1 October 2017
ISSN: 2475-1421
Online access: Full text
Abstract: Achieving determinism on real software systems remains difficult. Even a batch-processing job, whose task is to map input bits to output bits, risks nondeterminism from thread scheduling, system calls, CPU instructions, and leakage of environmental information such as date or CPU model. In this work, we present a system for achieving low-overhead deterministic execution of batch-processing programs that read and write the file system, turning them into pure functions on files. We allow multi-process executions where a permissions system prevents races on the file system. Process separation enables different processes to enforce permissions and enforce determinism using distinct mechanisms. Our prototype, DetFlow, allows a statically-typed coordinator process to use shared-memory parallelism, as well as invoking process-trees of sandboxed legacy binaries. DetFlow currently implements the coordinator as a Haskell program with a restricted I/O type for its main function: a new monad we call DetIO. Legacy binaries launched by the coordinator run concurrently, but internally each process schedules threads sequentially, allowing dynamic determinism-enforcement with predictably low overhead. We evaluate DetFlow by applying it to bioinformatics data pipelines and software build systems. DetFlow enables determinizing these data-processing workflows by porting a small amount of code to become a statically-typed coordinator. This hybrid approach of static and dynamic determinism enforcement permits freedom where possible but restrictions where necessary.
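Illustrative note (not part of the catalog record or the paper): the abstract says a permissions system prevents races on the file system between concurrently running processes. The Python sketch below shows one way such a check could look, under assumptions of our own: permissions are sets of directory paths, children may only receive what their parent holds, and two children may run in parallel only if neither writes where the other reads or writes. The names `Perms`, `can_run_in_parallel`, and `fork` are hypothetical and are not DetFlow's API, which the abstract describes as a Haskell monad called DetIO.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Perms:
    """Hypothetical permission set: directories a task may read or write."""
    read: frozenset = frozenset()
    write: frozenset = frozenset()


def _overlaps(a: str, b: str) -> bool:
    """True if the two paths are equal or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def _covered(path: str, roots) -> bool:
    """True if path equals, or lies inside, one of the granted roots."""
    p = PurePosixPath(path)
    return any(p == PurePosixPath(r) or PurePosixPath(r) in p.parents
               for r in roots)


def can_run_in_parallel(x: Perms, y: Perms) -> bool:
    """Race-free (by assumption) if neither task writes where the other
    reads or writes; concurrent reads of the same path are allowed."""
    for w in x.write:
        if any(_overlaps(w, p) for p in y.read | y.write):
            return False
    for w in y.write:
        if any(_overlaps(w, p) for p in x.read | x.write):
            return False
    return True


def fork(parent: Perms, children: list) -> list:
    """Validate a parallel fork: children stay within the parent's grant
    and are pairwise race-free. Raises PermissionError otherwise."""
    for c in children:
        # Assumption: write permission on a directory also allows reading it.
        if not all(_covered(p, parent.read | parent.write) for p in c.read):
            raise PermissionError("child reads outside parent's permissions")
        if not all(_covered(p, parent.write) for p in c.write):
            raise PermissionError("child writes outside parent's permissions")
    for i, a in enumerate(children):
        for b in children[i + 1:]:
            if not can_run_in_parallel(a, b):
                raise PermissionError("children could race on the file system")
    return children
```

In this sketch, two children that both read `/in` and write to the disjoint directories `/out/a` and `/out/b` pass the check, while a child that reads `/out/a` alongside one that writes it is rejected before anything runs.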
Authors and affiliations:
1. Scott, Ryan G. (Indiana University, USA)
2. Navarro Leija, Omar S. (University of Pennsylvania, USA)
3. Devietti, Joseph (University of Pennsylvania, USA)
4. Newton, Ryan R. (Indiana University, USA)
DOI: 10.1145/3133897
Discipline: Computer Science
Open access: yes (DOI open access: no)
Peer reviewed: yes
Scholarly: yes
Open-access link: https://dl.acm.org/doi/pdf/10.1145/3133897