DNN Partitioning for Inference Throughput Acceleration at the Edge

Bibliographic Details
Published in: IEEE Access, Vol. 11, pp. 52236-52249
Main Authors: Feltin, Thomas, Marcho, Leo, Cordero-Fuertes, Juan-Antonio, Brockners, Frank, Clausen, Thomas H.
Format: Journal Article
Language:English
Published: Piscataway IEEE 01.01.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN: 2169-3536
Abstract Deep neural network (DNN) inference on streaming data requires computing resources to satisfy inference throughput requirements. However, latency- and privacy-sensitive deep learning applications cannot afford to offload computation to remote clouds because of the implied transmission cost and lack of trust in third-party cloud providers. Among solutions to increase performance while keeping computation in a constrained environment, hardware acceleration can be onerous, and model optimization requires extensive design efforts while hindering accuracy. DNN partitioning is a third, complementary approach: it distributes the inference workload over several available edge devices, taking into account the edge network properties and the DNN structure, with the objective of maximizing the inference throughput (number of inferences per second). This paper introduces a method to predict inference and transmission latencies for multi-threaded distributed DNN deployments, and defines an optimization process to maximize the inference throughput. A branch-and-bound solver is then presented and analyzed to quantify the achieved performance and complexity. This analysis leads to the definition of the acceleration region, which describes deterministic conditions on the DNN and network properties under which DNN partitioning is beneficial. Finally, experimental results confirm the simulations and show inference throughput improvements in sample edge deployments.
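The throughput objective described in the abstract can be illustrated with a toy model. The sketch below is not the paper's latency predictor or its branch-and-bound solver; it is a minimal illustration under stated assumptions: the DNN is a simple chain of layers, devices and links work concurrently as a pipeline (so throughput is limited by the slowest stage or transfer), and the search is exhaustive. All names (`pipeline_throughput`, `best_partition`, `layer_ms`, `out_kb`, `speed`, `link_kb_per_ms`) and all numbers are hypothetical.

```python
from itertools import combinations


def pipeline_throughput(layer_ms, out_kb, cuts, speed, link_kb_per_ms):
    """Inferences per second for a layer chain split at `cuts`.

    Assumed toy model: segment d runs on device d, every device and
    link works concurrently, so the slowest step bounds the throughput.
    """
    bounds = [0, *cuts, len(layer_ms)]
    step_ms = []
    # Compute time of each contiguous segment on its device.
    for dev, (lo, hi) in enumerate(zip(bounds, bounds[1:])):
        step_ms.append(sum(layer_ms[lo:hi]) / speed[dev])
    # Transfer time of the tensor that crosses each cut.
    for c in cuts:
        step_ms.append(out_kb[c - 1] / link_kb_per_ms)
    return 1000.0 / max(step_ms)


def best_partition(layer_ms, out_kb, speed, link_kb_per_ms):
    """Exhaustively try every contiguous split (fine for tiny chains only)."""
    n = len(layer_ms)
    best = ((), pipeline_throughput(layer_ms, out_kb, (), speed, link_kb_per_ms))
    for k in range(1, min(len(speed), n)):  # k cuts use k + 1 devices
        for cuts in combinations(range(1, n), k):
            t = pipeline_throughput(layer_ms, out_kb, cuts, speed, link_kb_per_ms)
            if t > best[1]:
                best = (cuts, t)
    return best


# Hypothetical numbers: per-layer latency (ms) on a reference device,
# per-layer output size (kB), and two devices of equal speed.
layer_ms = [4.0, 6.0, 5.0, 5.0]
out_kb = [100.0, 20.0, 50.0, 1.0]
speed = [1.0, 1.0]

fast = best_partition(layer_ms, out_kb, speed, link_kb_per_ms=10.0)
slow = best_partition(layer_ms, out_kb, speed, link_kb_per_ms=0.1)
print(fast)  # split after layer 2 doubles the throughput
print(slow)  # the link is the bottleneck, so no split is chosen
```

In this toy setting, the fast link makes a split after the second layer worthwhile, while the slow link makes any split worse than running on a single device. This loosely mirrors the idea that partitioning is beneficial only under certain conditions on the DNN and the network, although the actual acceleration region is defined in the paper itself.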
Author Clausen, Thomas H.
Cordero-Fuertes, Juan-Antonio
Feltin, Thomas
Brockners, Frank
Marcho, Leo
Author_xml – sequence: 1
  givenname: Thomas
  orcidid: 0000-0002-8708-7422
  surname: Feltin
  fullname: Feltin, Thomas
  email: thomas.feltin@polytechnique.edu
  organization: École Polytechnique, Palaiseau, France
– sequence: 2
  givenname: Leo
  orcidid: 0000-0002-2625-2543
  surname: Marcho
  fullname: Marcho, Leo
  organization: Cisco Systems, San Jose, CA, USA
– sequence: 3
  givenname: Juan-Antonio
  orcidid: 0000-0001-5771-3122
  surname: Cordero-Fuertes
  fullname: Cordero-Fuertes, Juan-Antonio
  organization: École Polytechnique, Palaiseau, France
– sequence: 4
  givenname: Frank
  surname: Brockners
  fullname: Brockners, Frank
  organization: Cisco Systems, San Jose, CA, USA
– sequence: 5
  givenname: Thomas H.
  orcidid: 0000-0002-7400-8887
  surname: Clausen
  fullname: Clausen, Thomas H.
  organization: École Polytechnique, Palaiseau, France
BackLink https://polytechnique.hal.science/hal-04008199$$DView record in HAL
CODEN IAECCG
CitedBy_id crossref_primary_10_1016_j_comnet_2025_111531
crossref_primary_10_1049_cmu2_70048
crossref_primary_10_1145_3630266
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
Attribution
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
– notice: Attribution
DOI 10.1109/ACCESS.2023.3244497
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE Xplore : Open Access Journals and Conferences [open access]
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Engineered Materials Abstracts
METADEX
Technology Research Database
Materials Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Hyper Article en Ligne (HAL)
Hyper Article en Ligne (HAL) (Open Access)
DOAJ Directory of Open Access Journals
Database_xml – sequence: 1
  dbid: DOA
  name: DOAJ Directory of Open Access Journals
  url: https://www.doaj.org/
  sourceTypes: Open Website
– sequence: 2
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 2169-3536
EndPage 52249
ExternalDocumentID oai_doaj_org_article_bfd6d1d238eb404fa00793114362b93b
oai:HAL:hal-04008199v1
10_1109_ACCESS_2023_3244497
10042405
Genre orig-research
ISICitedReferencesCount 7
ISSN 2169-3536
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License https://creativecommons.org/licenses/by/4.0/legalcode
Attribution: http://creativecommons.org/licenses/by
ORCID 0000-0002-7400-8887
0000-0001-5771-3122
0000-0002-2625-2543
0000-0002-8708-7422
OpenAccessLink https://doaj.org/article/bfd6d1d238eb404fa00793114362b93b
PublicationCentury 2000
PublicationDate 2023-01-01
PublicationDateYYYYMMDD 2023-01-01
PublicationDate_xml – month: 01
  year: 2023
  text: 2023-01-01
  day: 01
PublicationDecade 2020
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE access
PublicationTitleAbbrev Access
PublicationYear 2023
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 52236
SubjectTerms Artificial intelligence
Artificial neural networks
Cloud computing
Computation offloading
Computational modeling
Computer Science
Deep learning
Design optimization
Distributed artificial intelligence
Edge computing
Inference
Machine learning
Neural networks
Optimization
Partitioning
Scheduling and task partitioning
Throughput
Title DNN Partitioning for Inference Throughput Acceleration at the Edge
URI https://ieeexplore.ieee.org/document/10042405
https://www.proquest.com/docview/2821715566
https://polytechnique.hal.science/hal-04008199
https://doaj.org/article/bfd6d1d238eb404fa00793114362b93b
Volume 11