Adaptive scheduling of inference pipelines on multicore architectures
Saved in:
| Title: | Adaptive scheduling of inference pipelines on multicore architectures |
|---|---|
| Authors: | Soomro, Pirah Noor, 1993 |
| Subjects: | Online tuning, CNN parallel pipelines, Design space exploration, Interference Mitigation, Heterogeneous computing units, Processing on chiplets, Inference Serving Systems |
| Description: | In today’s data-driven world, machine learning (ML) algorithms, particularly Convolutional Neural Networks (CNNs), play a pivotal role in powering a myriad of applications across various domains. As the demand for real-time inference continues to escalate, optimizing CNN inference across diverse computational platforms becomes imperative. This thesis addresses this challenge by exploring the complexities posed by heterogeneous edge devices, chiplet-based architectures, and inference-serving systems. Heterogeneous edge devices present unique challenges due to resource constraints and architectural diversity, while chiplet-based architectures offer potential enhancements in inference performance. Leveraging techniques such as online tuning algorithms, malleable and moldable inference pipelines, and adaptive scheduling strategies, this thesis proposes a comprehensive framework for optimizing DNN inference. The framework aims to improve system performance, reduce latency, and mitigate interference effects, contributing to more efficient and scalable AI systems capable of meeting the evolving demands of real-time inference across diverse computational platforms. The thesis addresses several key problem statements: enabling runtime scheduling of inference pipelines on edge devices, fully online scheduling of inference pipelines on heterogeneous platforms, mitigating interference effects on inference pipelines in inference-serving systems, and optimizing resource allocation for adaptive SLO-aware inference serving. The contributions are encapsulated in four papers, each focusing on a distinct aspect of CNN inference optimization: comprehensive frameworks for online scheduling of CNN pipelines, leveraging platform knowledge for expedited seed generation, dynamic scheduling techniques to alleviate interference effects, and SLO-aware scheduling techniques for optimizing resource allocation in inference-serving systems. Through these contributions, the thesis seeks to advance the state of the art in CNN inference optimization and inference serving. |
| File Description: | electronic |
| Access URL: | https://research.chalmers.se/publication/547635 https://research.chalmers.se/publication/547635/file/547635_Fulltext.pdf |
| Database: | SwePub |
| Language: | English |
| Published: | 2025 |
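
The description above touches on adaptive, SLO-aware scheduling of moldable inference pipelines. As a rough, hypothetical illustration of that general idea only (not the scheduling method developed in the thesis), the Python sketch below grows or shrinks the core allocation of a single pipeline based on observed latency relative to an assumed SLO; the function names, thresholds, and simulated latency model are all assumptions made for the sketch.

```python
import random
import time

# Hypothetical illustration only: a minimal control loop for SLO-aware,
# adaptive core allocation to a moldable inference pipeline. The names,
# thresholds, and latency model are assumptions, not taken from the thesis.

SLO_MS = 50.0          # assumed target end-to-end latency per inference
MAX_CORES = 8          # assumed number of cores available to this pipeline


def measure_latency_ms(cores: int) -> float:
    """Simulated per-inference latency: more cores -> lower latency,
    plus random noise standing in for interference from co-located jobs."""
    base = 180.0 / cores               # idealized parallel speedup
    interference = random.uniform(0.0, 25.0)
    return base + interference


def adapt(cores: int, latency_ms: float) -> int:
    """Grow the allocation when the SLO is violated, shrink it when there
    is comfortable slack, otherwise keep the current configuration."""
    if latency_ms > SLO_MS and cores < MAX_CORES:
        return cores + 1               # scale up to recover the SLO
    if latency_ms < 0.6 * SLO_MS and cores > 1:
        return cores - 1               # release cores for other pipelines
    return cores


def serve(iterations: int = 20) -> None:
    cores = 1
    for step in range(iterations):
        latency = measure_latency_ms(cores)
        print(f"step={step:2d} cores={cores} latency={latency:5.1f} ms "
              f"{'OK' if latency <= SLO_MS else 'SLO MISS'}")
        cores = adapt(cores, latency)
        time.sleep(0.01)               # stand-in for the serving interval


if __name__ == "__main__":
    serve()
```

In a real serving system the measured latency would come from the deployed pipeline rather than a simulation, and the adaptation policy would also have to weigh co-located pipelines competing for the same cores.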