Monadic composition for deterministic, parallel batch processing
Achieving determinism on real software systems remains difficult. Even a batch-processing job, whose task is to map input bits to output bits, risks nondeterminism from thread scheduling, system calls, CPU instructions, and leakage of environmental information such as date or CPU model. In this work...
| Published in: | Proceedings of ACM on programming languages, Vol. 1, Issue OOPSLA, pp. 1-26 |
|---|---|
| Main authors: | Scott, Ryan G.; Navarro Leija, Omar S.; Devietti, Joseph; Newton, Ryan R. |
| Format: | Journal Article |
| Language: | English |
| Published: | 2017-10-01 |
| ISSN: | 2475-1421 |
| Online access: | Full text |
| Abstract | Achieving determinism on real software systems remains difficult. Even a batch-processing job, whose task is to map input bits to output bits, risks nondeterminism from thread scheduling, system calls, CPU instructions, and leakage of environmental information such as date or CPU model. In this work, we present a system for achieving low-overhead deterministic execution of batch-processing programs that read and write the file system—turning them into pure functions on files.
We allow multi-process executions where a permissions system prevents races on the file system. Process separation enables different processes to enforce permissions and enforce determinism using distinct mechanisms. Our prototype, DetFlow, allows a statically-typed coordinator process to use shared-memory parallelism, as well as invoking process-trees of sandboxed legacy binaries. DetFlow currently implements the coordinator as a Haskell program with a restricted I/O type for its main function: a new monad we call DetIO. Legacy binaries launched by the coordinator run concurrently, but internally each process schedules threads sequentially, allowing dynamic determinism-enforcement with predictably low overhead.
We evaluate DetFlow by applying it to bioinformatics data pipelines and software build systems. DetFlow enables determinizing these data-processing workflows by porting a small amount of code to become a statically-typed coordinator. This hybrid approach of static and dynamic determinism enforcement permits freedom where possible but restrictions where necessary. |
|---|---|
| Author | Navarro Leija, Omar S.; Newton, Ryan R.; Scott, Ryan G.; Devietti, Joseph |
| Author_xml | 1. Scott, Ryan G. (Indiana University, USA); 2. Navarro Leija, Omar S. (University of Pennsylvania, USA); 3. Devietti, Joseph (University of Pennsylvania, USA); 4. Newton, Ryan R. (Indiana University, USA) |
| CitedBy_id | crossref_primary_10_1016_j_flowmeasinst_2021_102042 |
| ContentType | Journal Article |
| DOI | 10.1145/3133897 |
| Discipline | Computer Science |
| EISSN | 2475-1421 |
| EndPage | 26 |
| ISICitedReferencesCount | 4 |
| ISSN | 2475-1421 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | OOPSLA |
| Language | English |
| OpenAccessLink | https://dl.acm.org/doi/pdf/10.1145/3133897 |
| PageCount | 26 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-10-01 |
| PublicationDateYYYYMMDD | 2017-10-01 |
| PublicationDate_xml | – month: 10 year: 2017 text: 2017-10-01 day: 01 |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings of ACM on programming languages |
| PublicationYear | 2017 |
| StartPage | 1 |
| Title | Monadic composition for deterministic, parallel batch processing |
| Volume | 1 |
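
The abstract describes a coordinator written against a restricted I/O type (DetIO) together with a permissions system that prevents races on the file system. The record does not show DetFlow's actual API, so the following is only a minimal sketch of the idea under stated assumptions: it models the file system as an in-memory association list, treats a permission as a path prefix, and runs the two children of a fork one after the other. Every name in it (`FS`, `Perm`, `readFileD`, `writeFileD`, `forkDisjoint`, `runDetIO`, `pipeline`, `demo`) is hypothetical and chosen for illustration.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical in-memory stand-in for the file system: path -> contents.
type FS = [(FilePath, String)]

-- Assumption: a permission is a path prefix the computation may touch.
newtype Perm = Perm FilePath

-- Restricted I/O type: the only effects are those built from the
-- operations below, so a coordinator cannot reach clocks, threads, etc.
newtype DetIO a = DetIO { unDetIO :: Perm -> FS -> Either String (a, FS) }

instance Functor DetIO where
  fmap f (DetIO g) = DetIO $ \p fs -> case g p fs of
    Left e         -> Left e
    Right (a, fs') -> Right (f a, fs')

instance Applicative DetIO where
  pure a = DetIO $ \_ fs -> Right (a, fs)
  DetIO ff <*> DetIO fa = DetIO $ \p fs -> do
    (f, fs')  <- ff p fs
    (a, fs'') <- fa p fs'
    Right (f a, fs'')

instance Monad DetIO where
  DetIO g >>= k = DetIO $ \p fs -> do
    (a, fs') <- g p fs
    unDetIO (k a) p fs'

-- A path is allowed when it lies under the permitted prefix.
allowed :: Perm -> FilePath -> Bool
allowed (Perm root) path = root `isPrefixOf` path

readFileD :: FilePath -> DetIO String
readFileD path = DetIO $ \p fs ->
  if not (allowed p path)
    then Left ("no permission: " ++ path)
    else case lookup path fs of
      Nothing -> Left ("no such file: " ++ path)
      Just s  -> Right (s, fs)

writeFileD :: FilePath -> String -> DetIO ()
writeFileD path s = DetIO $ \p fs ->
  if allowed p path
    then Right ((), (path, s) : filter ((/= path) . fst) fs)
    else Left ("no permission: " ++ path)

-- Run two children under disjoint sub-permissions. Because neither child
-- can touch the other's paths, their relative order cannot affect results;
-- this model simply runs them sequentially.
forkDisjoint :: FilePath -> FilePath -> DetIO a -> DetIO b -> DetIO (a, b)
forkDisjoint d1 d2 (DetIO ca) (DetIO cb) = DetIO $ \p fs ->
  if not (allowed p d1 && allowed p d2)
    then Left "child permission exceeds parent"
    else if d1 `isPrefixOf` d2 || d2 `isPrefixOf` d1
      then Left "overlapping permissions"
      else do
        (x, fs1) <- ca (Perm d1) fs
        (y, fs2) <- cb (Perm d2) fs1
        Right ((x, y), fs2)

-- Run a computation with permission on everything under `root`.
runDetIO :: FilePath -> FS -> DetIO a -> Either String (a, FS)
runDetIO root fs (DetIO g) = g (Perm root) fs

-- Two children, each confined to its own output directory.
pipeline :: DetIO (String, String)
pipeline = forkDisjoint "out/a/" "out/b/"
  (writeFileD "out/a/x" "1" >> readFileD "out/a/x")
  (writeFileD "out/b/y" "2" >> readFileD "out/b/y")

demo :: IO ()
demo = do
  print (fmap fst (runDetIO "out/" [] pipeline))
  -- Overlapping child permissions are rejected before either child runs.
  print (fmap fst (runDetIO "out/" [] (forkDisjoint "out/" "out/a/" (pure ()) (pure ()))))
```

Calling `demo` from `main` or from GHCi exercises both the accepted and the rejected fork. The sketch is a pure model, so it says nothing about how the real system sandboxes legacy binaries or schedules their threads; it only illustrates why disjoint permissions make the outcome independent of child ordering.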