External memory pipelining made easy with TPIE
| Published in: | 2017 IEEE International Conference on Big Data (Big Data) pp. 319 - 324 |
|---|---|
| Main Authors: | Arge, Lars; Rav, Mathias; Svendsen, Svend C.; Truelsen, Jakob |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.12.2017 |
| Abstract | When handling large datasets that exceed the capacity of the main memory, movement of data between main memory and external memory (disk), rather than actual (CPU) computation time, is often the bottleneck in the computation. Since data is moved between disk and main memory in large contiguous blocks, this has led to the development of a large number of I/O-efficient algorithms that minimize the number of such block movements. However, actually implementing these algorithms can be somewhat of a challenge since operating systems do not give complete control over movement of blocks and management of main memory. TPIE is one of two major libraries that have been developed to support I/O-efficient algorithm implementations. It relies heavily on the fact that most I/O-efficient algorithms are naturally composed of components that stream through one or more lists of data items, while producing one or more such output lists, or components that sort such lists. Thus TPIE provides an interface where list stream processing and sorting can be implemented in a simple and modular way without having to worry about memory management or block movement. However, if care is not taken, such streaming-based implementations can lead to practically inefficient algorithms since lists of data items are typically written to (and read from) disk between components. In this paper we present a major extension of the TPIE library that includes a pipelining framework that allows for practically efficient streaming-based implementations while minimizing I/O-overhead between streaming components. The framework pipelines streaming components to avoid I/Os between components, that is, it processes several components simultaneously while passing output from one component directly to the input of the next component in main memory. TPIE automatically determines which components to pipeline and performs the required main memory management, and the extension also includes support for parallelization of internal memory computation and progress tracking across an entire application. Thus TPIE supports efficient streaming-based implementations of I/O-efficient algorithms in a simple, modular and maintainable way. The extended library has already been used to evaluate I/O-efficient algorithms in the research literature, and is heavily used in I/O-efficient commercial terrain processing applications by the Danish startup SCALGO. |
|---|---|
| Author | Arge, Lars; Svendsen, Svend C.; Rav, Mathias; Truelsen, Jakob |
| Author_xml | 1. Arge, Lars (large@madalgo.au.dk), Dept. of Comput. Sci., Aarhus Univ. Aarhus, Aarhus, Denmark; 2. Rav, Mathias (rav@madalgo.au.dk), Dept. of Comput. Sci., Aarhus Univ. Aarhus, Aarhus, Denmark; 3. Svendsen, Svend C. (svendcs@madalgo.au.dk), Dept. of Comput. Sci., Aarhus Univ. Aarhus, Aarhus, Denmark; 4. Truelsen, Jakob (jakob@scalgo.com), SCALGO, Aarhus, Denmark |
| ContentType | Conference Proceeding |
| DOI | 10.1109/BigData.2017.8257940 |
| EISBN | 9781538627150 1538627159 |
| EndPage | 324 |
| ExternalDocumentID | 8257940 |
| Genre | orig-research |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 6 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-Dec. |
| PublicationDateYYYYMMDD | 2017-12-01 |
| PublicationDate_xml | – month: 12 year: 2017 text: 2017-Dec. |
| PublicationDecade | 2010 |
| PublicationTitle | 2017 IEEE International Conference on Big Data (Big Data) |
| PublicationTitleAbbrev | BigData |
| PublicationYear | 2017 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 319 |
| SubjectTerms | Algorithm design and analysis; C++; Hardware; I/O-efficient algorithms; Libraries; Memory management; Operating systems; Pipeline processing; Software algorithms; software framework |
| Title | External memory pipelining made easy with TPIE |
| URI | https://ieeexplore.ieee.org/document/8257940 |
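
The abstract contrasts streaming components that write their output list to disk between stages with pipelined components that hand each item directly to the next component in main memory. The difference can be illustrated with a minimal sketch. It is written in Python with generators and does not use TPIE, which is a C++ library; `FakeDisk`, `double`, `keep_multiples_of_four`, `run_materialized` and `run_pipelined` are hypothetical names introduced only for this illustration, and the "disk" is just a counter of items written and read.

```python
from typing import Iterable, Iterator, List


class FakeDisk:
    """Hypothetical stand-in for external memory: counts items written and read."""

    def __init__(self) -> None:
        self.items_written = 0
        self.items_read = 0

    def write(self, items: Iterable[int]) -> List[int]:
        # Materialize the whole stream, as a component would when storing its output list.
        stored = list(items)
        self.items_written += len(stored)
        return stored

    def read(self, stored: List[int]) -> Iterator[int]:
        # Stream the stored list back in, counting every item read.
        for item in stored:
            self.items_read += 1
            yield item


def double(items: Iterable[int]) -> Iterator[int]:
    """First streaming component: emits twice each input item."""
    for x in items:
        yield 2 * x


def keep_multiples_of_four(items: Iterable[int]) -> Iterator[int]:
    """Second streaming component: passes through multiples of four only."""
    for x in items:
        if x % 4 == 0:
            yield x


def run_materialized(data: Iterable[int], disk: FakeDisk) -> List[int]:
    # The intermediate list goes to "disk" and is read back by the next component.
    intermediate = disk.write(double(data))
    return list(keep_multiples_of_four(disk.read(intermediate)))


def run_pipelined(data: Iterable[int]) -> List[int]:
    # Each item flows from the first component straight into the second, in memory.
    return list(keep_multiples_of_four(double(data)))
```

Both functions compute the same result, but only `run_materialized` touches the fake disk for the intermediate list. In this sketch the two components are chained by hand; according to the abstract, TPIE determines automatically which components to pipeline and manages the main memory they need.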