High-Speed Data Communication With Advanced Networks in Large Language Model Training
Saved in:
| Published in: | IEEE MICRO Vol. 44; No. 2; pp. 31-40 |
|---|---|
| Main authors: | Dai, Liuyao; Qi, Hao; Chen, Weicong; Lu, Xiaoyi |
| Format: | Journal Article |
| Language: | English |
| Published: | Los Alamitos: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.03.2024 |
| Keywords: | |
| ISSN: | 0272-1732, 1937-4143 |
| Online access: | Full text |
| Abstract | Large language models (LLMs) like Generative Pre-trained Transformer, Bidirectional Encoder Representations from Transformers, and T5 are pivotal in natural language processing. Their distributed training is influenced by high-speed interconnects. This article characterizes their training performance across various interconnects and communication protocols: TCP/IP, Internet Protocol over InfiniBand (IPoIB), and Remote Direct Memory Access (RDMA), using data and model parallelism. RDMA-100 Gbps outperforms IPoIB-100 Gbps and TCP/IP-10 Gbps, with average gains of 2.5x and 4.8x in data parallelism, while in model parallelism the gains were 1.1x and 1.2x. RDMA achieves the highest interconnect utilization (up to 60 Gbps), compared to IPoIB with up to 20 Gbps and TCP/IP with up to 9 Gbps. Larger models demand increased communication bandwidth, with AllReduce in data parallelism consuming up to 91% of training time, and forward receive and back-embedding AllReduce in model parallelism taking up to 90%. A larger-scale experiment confirms that communication dominates iteration time. Our findings underscore the significance of communication in distributed LLM training and present opportunities for optimization. |
|---|---|
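The abstract attributes up to 91% of data-parallel training time to gradient AllReduce. As a minimal sketch of how such a measurement can be taken (this is not the authors' benchmark code; the buffer size, iteration counts, and function name are illustrative), the snippet below times a standalone NCCL AllReduce with PyTorch. The transport under test can be steered with standard NCCL environment variables: NCCL_IB_DISABLE=1 falls back from RDMA (InfiniBand verbs) to the socket transport, and NCCL_SOCKET_IFNAME selects the NIC (e.g., an Ethernet interface for TCP/IP, or ib0 for IPoIB).

```python
# Hypothetical micro-benchmark: average latency of one AllReduce over NCCL.
# Launch with: torchrun --nproc_per_node=<gpus> --nnodes=<nodes> ... allreduce_bench.py
import os
import time
import torch
import torch.distributed as dist

def time_allreduce(num_elems: int = 64 * 1024 * 1024, iters: int = 10) -> float:
    """Return average seconds per AllReduce of a float32 buffer (illustrative sizes)."""
    buf = torch.randn(num_elems, device="cuda")
    for _ in range(3):                 # warm-up so NCCL sets up its rings/channels
        dist.all_reduce(buf)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(buf)           # default op: SUM, as in gradient aggregation
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

if __name__ == "__main__":
    dist.init_process_group(backend="nccl")          # reads torchrun's env vars
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    secs = time_allreduce()
    if dist.get_rank() == 0:
        # Ring AllReduce moves roughly 2*(n-1)/n of the buffer per rank; the
        # simple bytes/secs figure below is only a rough effective-bandwidth proxy.
        gbps = (64 * 1024 * 1024 * 4 * 8) / secs / 1e9
        print(f"avg AllReduce: {secs * 1e3:.2f} ms (~{gbps:.1f} Gb/s, naive)")
    dist.destroy_process_group()
```

Comparing this figure across runs with RDMA enabled versus NCCL_IB_DISABLE=1 mirrors, in miniature, the protocol comparison the article reports.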
| Author | Dai, Liuyao; Qi, Hao; Chen, Weicong; Lu, Xiaoyi |
| Author details | 1. Dai, Liuyao (ORCID 0000-0002-0907-6920), ldai8@ucmerced.edu, University of California Merced, Merced, CA, 95343, USA; 2. Qi, Hao, hqi6@ucmerced.edu, University of California Merced, Merced, CA, 95343, USA; 3. Chen, Weicong (ORCID 0000-0003-0573-8808), wchen97@ucmerced.edu, University of California Merced, Merced, CA, 95343, USA; 4. Lu, Xiaoyi (ORCID 0000-0001-7581-8905), xiaoyi.lu@ucmerced.edu, University of California Merced, Merced, CA, 95343, USA |
| CODEN | IEMIDZ |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
| DOI | 10.1109/MM.2024.3360081 |
| Discipline | Computer Science |
| EISSN | 1937-4143 |
| EndPage | 40 |
| Genre | orig-research |
| GrantInformation | MRI, grant 2019144 (funder ID 10.13039/100011612); Office of Advanced Cyberinfrastructure, grants 2321123 and 2340982 (funder ID 10.13039/100000105) |
| ISICitedReferencesCount | 3 |
| ISSN | 0272-1732 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0001-7581-8905 0000-0003-0573-8808 0000-0002-0907-6920 |
| PageCount | 10 |
| PublicationDate | 2024-03-01 |
| PublicationPlace | Los Alamitos |
| PublicationTitle | IEEE MICRO |
| PublicationTitleAbbrev | MM |
| PublicationYear | 2024 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 31 |
| SubjectTerms | Communication; Computational modeling; Data communication; Data models; Decoding; High speed; Interconnections; IP (Internet Protocol); Large language models; Natural language processing; Parallel processing; Synchronization; TCP/IP (protocol); Training; Transformers |
| Title | High-Speed Data Communication With Advanced Networks in Large Language Model Training |
| URI | https://ieeexplore.ieee.org/document/10417069 https://www.proquest.com/docview/3033620838 |
| Volume | 44 |