Adaptive memory reservation strategy for heavy workloads in the Spark environment.

Detailed bibliography
Title: Adaptive memory reservation strategy for heavy workloads in the Spark environment.
Authors: Li, Bohan; He, Xin; Yu, Junyang; Wang, Guanghui; Song, Yixin; Pan, Shunjie; Gu, Hangyu
Source: PeerJ Computer Science; Nov2024, p1-28, 28p
Subjects: DISTRIBUTED computing, PARALLEL programming, INTERNET of things, PARALLEL processing, EVICTION
Abstract: The rise of the Internet of Things (IoT) and Industry 2.0 has spurred a growing need for extensive data computing, and Spark emerged as a promising Big Data platform, attributed to its distributed in-memory computing capabilities. However, practical heavy workloads often lead to memory bottleneck issues in the Spark platform. This results in resilient distributed datasets (RDD) eviction and, in extreme cases, violent memory contentions, causing a significant degradation in Spark computational efficiency. To tackle this issue, we propose an adaptive memory reservation (AMR) strategy in this article, specifically designed for heavy workloads in the Spark environment. Specifically, we model optimal task parallelism by minimizing the disparity between the number of tasks completed without blocking and the number completed in regular rounds. Optimal memory for task parallelism is determined to establish an efficient execution memory space for computational parallelism. Subsequently, through adaptive execution memory reservation and dynamic adjustments, such as compression or expansion based on task progress, the strategy ensures dynamic task parallelism in the Spark parallel computing process. Considering the cost of RDD cache location and real-time memory space usage, we select suitable storage locations for different RDD types to alleviate execution memory pressure. Finally, we conduct extensive laboratory experiments to validate the effectiveness of AMR. Results indicate that, compared to existing memory management solutions, AMR reduces the execution time by approximately 46.8%. [ABSTRACT FROM AUTHOR]
Copyright of PeerJ Computer Science is the property of PeerJ Inc. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
FullText Text:
  Availability: 0
CustomLinks:
  – Url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&db=pmc&term=2376-5992[TA]+AND+1[PG]+AND+2024[PDAT]
    Name: FREE - PubMed Central (ISSN based link)
    Category: fullText
    Text: Full Text
    Icon: https://imageserver.ebscohost.com/NetImages/iconPdf.gif
    MouseOverText: Check this PubMed for the article full text.
  – Url: https://resolver.ebscohost.com/openurl?sid=EBSCO:edb&genre=article&issn=23765992&ISBN=&volume=&issue=&date=20241101&spage=1&pages=1-28&title=PeerJ Computer Science&atitle=Adaptive%20memory%20reservation%20strategy%20for%20heavy%20workloads%20in%20the%20Spark%20environment.&aulast=Li%2C%20Bohan&id=DOI:10.7717/peerj-cs.2460
    Name: Full Text Finder
    Category: fullText
    Text: Full Text Finder
    Icon: https://imageserver.ebscohost.com/branding/images/FTF.gif
    MouseOverText: Full Text Finder
  – Url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=EBSCO&SrcAuth=EBSCO&DestApp=WOS&ServiceName=TransferToWoS&DestLinkType=GeneralSearchSummary&Func=Links&author=Li%20B
    Name: ISI
    Category: fullText
    Text: Find this article in Web of Science
    Icon: https://imagesrvr.epnet.com/ls/20docs.gif
    MouseOverText: Find this article in Web of Science
Header DbId: edb
DbLabel: Complementary Index
An: 181524311
RelevancyScore: 1007
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PreciseRelevancyScore: 1007.06042480469
IllustrationInfo
Items – Name: Title
  Label: Title
  Group: Ti
  Data: Adaptive memory reservation strategy for heavy workloads in the Spark environment.
– Name: Author
  Label: Authors
  Group: Au
  Data: Li, Bohan; He, Xin; Yu, Junyang; Wang, Guanghui; Song, Yixin; Pan, Shunjie; Gu, Hangyu
– Name: TitleSource
  Label: Source
  Group: Src
  Data: PeerJ Computer Science; Nov2024, p1-28, 28p
– Name: Subject
  Label: Subject Terms
  Group: Su
  Data: DISTRIBUTED computing; PARALLEL programming; INTERNET of things; PARALLEL processing; EVICTION
PLink https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edb&AN=181524311
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.7717/peerj-cs.2460
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 28
        StartPage: 1
    Subjects:
      – SubjectFull: DISTRIBUTED computing
        Type: general
      – SubjectFull: PARALLEL programming
        Type: general
      – SubjectFull: INTERNET of things
        Type: general
      – SubjectFull: PARALLEL processing
        Type: general
      – SubjectFull: EVICTION
        Type: general
    Titles:
      – TitleFull: Adaptive memory reservation strategy for heavy workloads in the Spark environment.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Li, Bohan
      – PersonEntity:
          Name:
            NameFull: He, Xin
      – PersonEntity:
          Name:
            NameFull: Yu, Junyang
      – PersonEntity:
          Name:
            NameFull: Wang, Guanghui
      – PersonEntity:
          Name:
            NameFull: Song, Yixin
      – PersonEntity:
          Name:
            NameFull: Pan, Shunjie
      – PersonEntity:
          Name:
            NameFull: Gu, Hangyu
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 11
              Text: Nov2024
              Type: published
              Y: 2024
          Identifiers:
            – Type: issn-print
              Value: 23765992
          Titles:
            – TitleFull: PeerJ Computer Science
              Type: main
ResultId 1