Configurable-ECC: Architecting a Flexible ECC Scheme to Support Different Sized Accesses in High Bandwidth Memory Systems
Designing error correction code (ECC) to guarantee strong reliability for high bandwidth memory (HBM) is imperative in high performance computers, especially for systems equipped with graphics processing units (GPUs). The design of ECC is challenging because future GPUs are expected to implement a m...
| Published in: | IEEE Transactions on Computers, Vol. 68, No. 5, pp. 646-659 |
|---|---|
| Main authors: | Hsing-Min Chen, Shin-Ying Lee, Trevor Mudge, Carole-Jean Wu, Chaitali Chakrabarti |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE, 2019-05-01; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects: | |
| ISSN: | 0018-9340, 1557-9956 |
| Online access: | Get full text |
| Abstract | Designing error correction code (ECC) to guarantee strong reliability for high bandwidth memory (HBM) is imperative in high performance computers, especially for systems equipped with graphics processing units (GPUs). The design of ECC is challenging because future GPUs are expected to implement a memory subsystem supporting fine and coarse-grained data accesses to match the difference in the spatial locality of GPGPU applications. Current ECC designs, however, are developed for a fixed data fetch granularity. To have a more flexible design, we propose a novel memory protection scheme, called Config(urable)-ECC, which provides strong reliability for both fine and coarse-grained data accesses. Config-ECC consists of two tiers of ECC protection. The tier-1 code is a strong product code that can correct errors due to small granularity faults and detect errors caused by large granularity faults. The tier-2 code is an XOR-based code that is employed to correct errors incurred by large granularity faults. Config-ECC provides stronger reliability and/or lower energy consumption compared to state-of-the-art fixed 32B and 64B ECC schemes. It reduces the HBM energy by 17-21 percent while reducing the failure in time (FIT) rate by 20 times compared to a state-of-the-art fixed 64B ECC scheme with an insignificant 1.2 percent performance overhead. |
|---|---|
| Author | Lee, Shin-Ying; Chakrabarti, Chaitali; Mudge, Trevor; Wu, Carole-Jean; Chen, Hsing-Min |
| Author_xml | – sequence: 1 givenname: Hsing-Min orcidid: 0000-0003-2894-6503 surname: Chen fullname: Chen, Hsing-Min email: hchen136@asu.edu organization: Intel Corporation, Santa Clara, CA, USA – sequence: 2 givenname: Shin-Ying orcidid: 0000-0001-6676-3121 surname: Lee fullname: Lee, Shin-Ying email: lshinyin@asu.edu organization: Samsung, Austin, TX, USA – sequence: 3 givenname: Trevor surname: Mudge fullname: Mudge, Trevor email: tnm@umich.edu organization: Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA – sequence: 4 givenname: Carole-Jean orcidid: 0000-0002-9032-7239 surname: Wu fullname: Wu, Carole-Jean email: carole-jean.wu@asu.edu organization: School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA – sequence: 5 givenname: Chaitali surname: Chakrabarti fullname: Chakrabarti, Chaitali email: chaitali@asu.edu organization: School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA |
| CODEN | ITCOB4 |
| CitedBy_id | crossref_primary_10_1007_s11227_025_07503_4 crossref_primary_10_1109_JIOT_2024_3509525 crossref_primary_10_1109_TETC_2020_2965193 crossref_primary_10_1109_TR_2023_3303189 crossref_primary_10_1109_TR_2025_3550972 crossref_primary_10_1109_TVLSI_2024_3474791 crossref_primary_10_3390_jmse9121352 crossref_primary_10_1109_TVLSI_2025_3585971 |
| Cites_doi | 10.1145/2189750.2150989 10.1109/MM.2010.103 10.1109/IISWC.2013.6704684 10.1109/IISWC.2016.7581276 10.1109/MICRO.2014.62 10.1145/2304576.2304582 10.1109/MICRO.2014.57 10.1145/2540708.2540717 10.1145/2840807 10.1109/IISWC.2010.5650274 10.1109/TCSII.2013.2291091 10.1145/2957758 10.1109/ACSSC.2003.1292358 10.1109/ISSCC.2016.7418034 10.1109/ISCA.2012.6237047 10.1109/IISWC.2014.6983054 10.1109/TCOM.1984.1096175 10.1145/272991.272995 10.1109/L-CA.2011.4 10.1109/JRPROC.1961.287814 10.1145/2485922.2485958 10.1145/2694344.2694348 10.1109/HPCA.2016.7446094 10.1145/2830772.2830799 10.1109/DSN.2004.1311885 10.1109/HPCA.2012.6168947 10.1109/IISWC.2009.5306797 10.1145/1454115.1454152 10.1109/TEST.2014.7035318 10.1145/2503210.2503243 10.1145/2366231.2337192 10.1109/HPCA.2015.7056025 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019 |
| DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D |
| DOI | 10.1109/TC.2018.2886884 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering; Computer Science |
| EISSN | 1557-9956 |
| EndPage | 659 |
| ExternalDocumentID | 10_1109_TC_2018_2886884 8576671 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: Defense Advanced Research Projects Agency funderid: 10.13039/100000185 – fundername: US National Science Foundation grantid: CCF-1618039 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 11 |
| ISSN | 0018-9340 |
| IngestDate | Mon Jun 30 06:25:31 EDT 2025 Tue Nov 18 21:07:04 EST 2025 Sat Nov 29 01:35:40 EST 2025 Wed Aug 27 02:44:37 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 5 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| LinkModel | DirectLink |
| ORCID | 0000-0002-9032-7239 0000-0001-6676-3121 0000-0003-2894-6503 |
| PQID | 2206608065 |
| PQPubID | 85452 |
| PageCount | 14 |
| ParticipantIDs | crossref_primary_10_1109_TC_2018_2886884 ieee_primary_8576671 proquest_journals_2206608065 crossref_citationtrail_10_1109_TC_2018_2886884 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-05-01 |
| PublicationDateYYYYMMDD | 2019-05-01 |
| PublicationDate_xml | – month: 05 year: 2019 text: 2019-05-01 day: 01 |
| PublicationDecade | 2010 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE transactions on computers |
| PublicationTitleAbbrev | TC |
| PublicationYear | 2019 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| SSID | ssj0006209 |
| Score | 2.3176506 |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 646 |
| SubjectTerms | 3D DRAM; Bandwidth; Computer memory; Design; Energy consumption; error control coding and GPU; Error correction; Error correction codes; Fault detection; Graphics processing units; memory reliability; Random access memory; Reliability; Three-dimensional displays; Two dimensional displays |
| Title | Configurable-ECC: Architecting a Flexible ECC Scheme to Support Different Sized Accesses in High Bandwidth Memory Systems |
| URI | https://ieeexplore.ieee.org/document/8576671 https://www.proquest.com/docview/2206608065 |
| Volume | 68 |
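
The abstract describes the tier-2 code only as "an XOR-based code" used to correct errors from large granularity faults that the tier-1 code has detected. As a rough illustration of that general idea (not the paper's actual layout), the sketch below keeps one XOR parity block per group of data blocks and rebuilds a single block that has been flagged as lost. The block size, the grouping, and the names `xor_blocks`, `make_parity`, and `recover_block` are all assumptions made for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equally sized byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Hypothetical tier-2 step: one XOR parity block per group of data blocks."""
    return xor_blocks(data_blocks)


def recover_block(blocks, parity, lost_index):
    """Rebuild the block at lost_index from the surviving blocks and the parity.

    Assumes a separate detection step (tier-1 in the abstract's terms) has
    already identified which single block is unusable.
    """
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return xor_blocks(survivors + [parity])


if __name__ == "__main__":
    group = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]
    parity = make_parity(group)
    damaged = list(group)
    damaged[1] = bytes(4)  # simulate a fault that wipes out block 1
    assert recover_block(damaged, parity, lost_index=1) == group[1]
```

XOR parity of this kind can rebuild only one lost block per group, which is why it depends on another code to pinpoint the faulty block first; the abstract assigns that detection role to the tier-1 product code.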