Adaptive scheduling of inference pipelines on multicore architectures

Detailed bibliography
Title: Adaptive scheduling of inference pipelines on multicore architectures
Author: Soomro, Pirah Noor, 1993
Subjects: Online tuning, CNN parallel pipelines, Design space exploration, Interference Mitigation, Heterogeneous computing units, Processing on chiplets, Inference Serving Systems
Description: In today’s data-driven world, machine learning (ML) algorithms, particularly Convolutional Neural Networks (CNNs), play a pivotal role in powering a myriad of applications across various domains. As the demand for real-time inference continues to escalate, optimizing CNN inference across diverse computational platforms becomes imperative. This thesis addresses this challenge by exploring the complexities posed by heterogeneous edge devices, chiplet-based architectures, and inference-serving systems. Heterogeneous edge devices present unique challenges due to resource constraints and architectural diversity, while chiplet-based architectures offer potential enhancements in inference performance. Leveraging innovative techniques such as online tuning algorithms, malleable and moldable inference pipelines, and adaptive scheduling strategies, this thesis proposes a comprehensive framework for optimizing DNN inference. This framework aims to advance system performance, reduce latency, and mitigate interference effects, thereby contributing to the development of more efficient and scalable AI systems capable of meeting the evolving demands of real-time inference across diverse computational platforms. The thesis addresses several key problem statements: enabling runtime scheduling of inference pipelines on edge devices, fully online scheduling of inference pipelines on heterogeneous platforms, mitigating interference effects on inference pipelines in inference-serving systems, and optimizing resource allocation in inference-serving systems for adaptive SLO-aware inference serving. The contributions of this thesis are encapsulated in four papers, each focusing on a distinct aspect of CNN inference optimization. These contributions include comprehensive frameworks for online scheduling of CNN pipelines, leveraging platform knowledge for expedited seed generation, dynamic scheduling techniques to alleviate interference effects, and SLO-aware scheduling techniques for optimizing resource allocation in inference-serving systems. Through these contributions, this thesis seeks to advance the state of the art in CNN inference optimization and inference-serving systems, paving the way for more efficient and scalable AI systems capable of meeting the demands of real-time inference across diverse computational platforms.
File description: electronic
Access URL: https://research.chalmers.se/publication/547635
https://research.chalmers.se/publication/547635/file/547635_Fulltext.pdf
Database: SwePub
Language: English
Published: 2025