DNN Partitioning for Inference Throughput Acceleration at the Edge
| Published in: | IEEE Access, Vol. 11, pp. 52236-52249 |
|---|---|
| Main authors: | Feltin, Thomas; Marcho, Leo; Cordero-Fuertes, Juan-Antonio; Brockners, Frank; Clausen, Thomas H. |
| Medium: | Journal Article |
| Language: | English |
| Published: | Piscataway: IEEE, 01.01.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| ISSN: | 2169-3536 |
| Abstract | Deep neural network (DNN) inference on streaming data requires computing resources to satisfy inference throughput requirements. However, latency- and privacy-sensitive deep learning applications cannot afford to offload computation to remote clouds because of the implied transmission cost and lack of trust in third-party cloud providers. Among solutions to increase performance while keeping computation in a constrained environment, hardware acceleration can be onerous, and model optimization requires extensive design effort while hindering accuracy. DNN partitioning is a third, complementary approach that consists of distributing the inference workload over several available edge devices, taking into account the edge network properties and the DNN structure, with the objective of maximizing the inference throughput (number of inferences per second). This paper introduces a method to predict inference and transmission latencies for multi-threaded distributed DNN deployments, and defines an optimization process to maximize the inference throughput. A branch and bound solver is then presented and analyzed to quantify the achieved performance and complexity. This analysis has led to the definition of the acceleration region, which describes deterministic conditions on the DNN and network properties under which DNN partitioning is beneficial. Finally, experimental results confirm the simulations and show inference throughput improvements in sample edge deployments. |
|---|---|
| Author | Clausen, Thomas H.; Cordero-Fuertes, Juan-Antonio; Feltin, Thomas; Brockners, Frank; Marcho, Leo |
| Author_xml | 1. Thomas Feltin (ORCID 0000-0002-8708-7422; thomas.feltin@polytechnique.edu), École Polytechnique, Palaiseau, France; 2. Leo Marcho (ORCID 0000-0002-2625-2543), Cisco Systems, San Jose, CA, USA; 3. Juan-Antonio Cordero-Fuertes (ORCID 0000-0001-5771-3122), École Polytechnique, Palaiseau, France; 4. Frank Brockners, Cisco Systems, San Jose, CA, USA; 5. Thomas H. Clausen (ORCID 0000-0002-7400-8887), École Polytechnique, Palaiseau, France |
| BackLink | https://polytechnique.hal.science/hal-04008199 (View record in HAL) |
| CODEN | IAECCG |
| CitedBy_id | crossref_primary_10_1016_j_comnet_2025_111531 crossref_primary_10_1049_cmu2_70048 crossref_primary_10_1145_3630266 |
| ContentType | Journal Article |
| Copyright | Copyright 2023 The Institute of Electrical and Electronics Engineers, Inc. (IEEE); Attribution |
| DOI | 10.1109/ACCESS.2023.3244497 |
| Discipline | Engineering Computer Science |
| EISSN | 2169-3536 |
| EndPage | 52249 |
| Genre | orig-research |
| ISICitedReferencesCount | 7 |
| ISSN | 2169-3536 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| License | Creative Commons Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/legalcode; http://creativecommons.org/licenses/by |
| ORCID | 0000-0002-7400-8887 0000-0001-5771-3122 0000-0002-2625-2543 0000-0002-8708-7422 |
| OpenAccessLink | https://ieeexplore.ieee.org/document/10042405 |
| PageCount | 14 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-01-01 |
| PublicationDecade | 2020 |
| PublicationPlace | Piscataway |
| PublicationTitle | IEEE Access |
| PublicationTitleAbbrev | Access |
| PublicationYear | 2023 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 52236 |
| SubjectTerms | Artificial intelligence; Artificial neural networks; Cloud computing; Computation offloading; Computational modeling; Computer Science; Deep learning; Design optimization; Distributed artificial intelligence; Edge computing; Inference; Machine learning; Neural networks; Optimization; Partitioning; scheduling and task partitioning; Throughput |
| Title | DNN Partitioning for Inference Throughput Acceleration at the Edge |
| URI | https://ieeexplore.ieee.org/document/10042405; https://www.proquest.com/docview/2821715566; https://polytechnique.hal.science/hal-04008199; https://doaj.org/article/bfd6d1d238eb404fa00793114362b93b |
| Volume | 11 |
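
The abstract describes splitting a DNN across edge devices to maximize inference throughput and solving the placement with a branch and bound solver. The sketch below illustrates that idea under our own simplifying assumptions; it is not the authors' latency model or solver. We assume the DNN is a chain of layers, each used device runs one contiguous block of layers, devices are considered in a fixed order and may be skipped, every link has the same bandwidth, and computation and transmission overlap in a pipeline, so steady-state throughput is 1 / (slowest compute stage or slowest link transfer). The names `Problem`, `solve`, `Assignment`, and all profile numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (device, first_layer, end_layer_exclusive): one contiguous block of layers per used device
Assignment = List[Tuple[int, int, int]]


@dataclass
class Problem:
    compute: List[List[float]]  # compute[d][l]: seconds to run layer l on device d (hypothetical profile)
    out_bytes: List[float]      # out_bytes[l]: size in bytes of the activation produced by layer l
    bandwidth: float            # bytes per second on every device-to-device link (assumed uniform)


def solve(p: Problem) -> Tuple[float, Assignment]:
    """Branch and bound over contiguous layer blocks, minimizing the pipeline bottleneck.

    Compute and transmission are assumed to overlap, so steady-state throughput is
    1 / (slowest compute stage or slowest link transfer).
    """
    n_dev, n_layers = len(p.compute), len(p.out_bytes)
    best_time = float("inf")
    best_plan: Assignment = []

    def branch(dev: int, start: int, partial: float, plan: Assignment) -> None:
        nonlocal best_time, best_plan
        if partial >= best_time:
            return  # bound: the bottleneck can only grow further down this branch
        if start == n_layers:
            best_time, best_plan = partial, list(plan)  # every layer placed: new incumbent
            return
        if dev == n_dev:
            return  # layers remain but no devices are left
        stage = 0.0
        for end in range(start + 1, n_layers + 1):
            stage += p.compute[dev][end - 1]
            if stage >= best_time:
                break  # adding more layers to this device can only make the stage slower
            # Ship layer end-1's output to the next device unless this block ends the model.
            link = p.out_bytes[end - 1] / p.bandwidth if end < n_layers else 0.0
            plan.append((dev, start, end))
            branch(dev + 1, end, max(partial, stage, link), plan)
            plan.pop()
        branch(dev + 1, start, partial, plan)  # also try leaving this device unused

    branch(0, 0, 0.0, [])
    return best_time, best_plan


if __name__ == "__main__":
    # Two identical devices, four 10 ms layers, 100 kB activations.
    for bw in (1e7, 1e6):  # a 10 MB/s link, then a 1 MB/s link
        prob = Problem(compute=[[0.01] * 4, [0.01] * 4], out_bytes=[1e5] * 4, bandwidth=bw)
        t, plan = solve(prob)
        print(f"bandwidth={bw:.0e} B/s: {1 / t:.1f} inferences/s with plan {plan}")
```

The bound is valid because stage and link times are non-negative, so the running maximum never decreases along a branch. In this toy setup the 10 MB/s link yields a two-device split (bottleneck 20 ms, about 50 inferences/s), while the 1 MB/s link makes each transfer (100 ms) slower than running all four layers on one device (40 ms), so the solver keeps everything on a single device. This only loosely echoes the paper's notion that partitioning pays off under some DNN and network conditions; it does not reproduce the paper's acceleration region.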