TeraHeap

Bibliographic details
Title: TeraHeap
Authors: Kolokasis, Iacovos G.; Evdorou, Giannos; Akram, Shoaib; Kozanitis, Christos; Papagiannis, Anastasios; Zakkak, Foivos S.; Pratikakis, Polyvios; Bilas, Angelos
Source: ACM Transactions on Programming Languages and Systems
Publication year: 2024
Collection: Australian National University: ANU Digital Collections
Subjects: fast storage devices, garbage collection, Java Virtual Machine (JVM), large analytics, large managed heaps, memory hierarchy, memory management, serialization
Description: Big data analytics frameworks, such as Spark and Giraph, need to process and cache massive datasets that do not always fit on the managed heap. Therefore, frameworks temporarily move long-lived objects outside the heap (off-heap) on a fast storage device. However, this practice results in (1) high serialization/deserialization (S/D) cost and (2) high memory pressure when off-heap objects are moved back for processing. In this article, we propose TeraHeap, a system that eliminates S/D overhead and expensive GC scans for a large portion of objects in analytics frameworks. TeraHeap relies on three concepts: (1) It eliminates S/D by extending the managed runtime (JVM) to use a second high-capacity heap (H2) over a fast storage device. (2) It offers a simple hint-based interface, allowing analytics frameworks to leverage object knowledge to populate H2. (3) It reduces GC cost by fencing the collector from scanning H2 objects while maintaining the illusion of a single managed heap, ensuring memory safety. We implement TeraHeap in OpenJDK8 and OpenJDK17 and evaluate it with fifteen widely used applications in two real-world big data frameworks, Spark and Giraph. We find that for the same DRAM size, TeraHeap improves performance by up to 73% and 28% compared to native Spark and Giraph, respectively. Also, it can still provide better performance by consuming up to and less DRAM than native Spark and Giraph, respectively. TeraHeap can also be used for in-memory frameworks, and applying it to the Neo4j Graph Data Science library improves its performance by up to 26%. Finally, it outperforms Panthera, a state-of-the-art garbage collector for hybrid DRAM-NVM memories, by up to 69%. ; We thankfully acknowledge the support of the European Commission under the Horizon 2020 Framework Programme for Research and Innovation through the AERO project (Grant agreement No. 10048318). Iacovos G. Kolokasis is also supported by the Meta Research PhD Fellowship and the State Scholarship Foundation of Cyprus. ; Peer-reviewed
Document type: article in journal/newspaper
Language: English
Relation: http://www.scopus.com/inward/record.url?scp=86000552868&partnerID=8YFLogxK; https://hdl.handle.net/1885/733752737; 86000552868
DOI: 10.1145/3700593
Availability: https://hdl.handle.net/1885/733752737
http://www.scopus.com/inward/record.url?scp=86000552868&partnerID=8YFLogxK
https://doi.org/10.1145/3700593
Rights: Publisher Copyright: © 2024 Copyright held by the owner/author(s).
Accession number: edsbas.DBF30768
Database: BASE