Search Results - "Computer systems organization Architectures Serial architectures Pipeline computing"

  1.

    Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters by Cavalcante, Matheus, Wuthrich, Domenic, Perotti, Matteo, Riedel, Samuel, Benini, Luca

    ISSN: 1558-2434
    Published: ACM 29.10.2022
    “…While parallel architectures based on clusters of Processing Elements (PEs) sharing L1 memory are widespread, there is no consensus on how lean their PE should…”
    Conference Proceeding
  2.

    Buffer Prospector: Discovering and Exploiting Untapped Buffer Resources in Many-Core DNN Accelerators by Wei, Yuchen, Cai, Jingwei, Gao, Mingyu, Peng, Sen, Wu, Zuotong, Shi, Guiming, Ma, Kaisheng

    Published: IEEE 22.06.2025
    “…In large-scale DNN inference accelerators, the many-core architecture has emerged as a predominant design, with layer-pipeline (LP) mapping being a mainstream…”
    Conference Proceeding
  3.

    Lookup Table-based Multiplication-free All-digital DNN Accelerator Featuring Self-Synchronous Pipeline Accumulation by Tagata, Hiroto, Sato, Takashi, Awano, Hiromitsu

    Published: IEEE 22.06.2025
    “…Deep neural networks (DNNs) have been widely applied in our society, yet reducing power consumption due to large-scale matrix computations remains a critical…”
    Conference Proceeding
  4.

    UDP: Utility-Driven Fetch Directed Instruction Prefetching by Oh, Surim, Xu, Mingsheng, Khan, Tanvir Ahmed, Kasikci, Baris, Litz, Heiner

    Published: IEEE 29.06.2024
    “…Datacenter applications exhibit large instruction footprints causing significant instruction cache misses and, as a result, frontend stalls. To address this…”
    Conference Proceeding
  5.

    Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution by Bera, Rahul, Ranganathan, Adithya, Rakshit, Joydeep, Mahto, Sujit, Nori, Anant V., Gaur, Jayesh, Olgun, Ataberk, Kanellopoulos, Konstantinos, Sadrosadati, Mohammad, Subramoney, Sreenivas, Mutlu, Onur

    Published: IEEE 29.06.2024
    “…Load instructions often limit instruction-level parallelism (ILP) in modern processors due to data and resource dependences they cause. Prior techniques like…”
    Conference Proceeding
  6.

    Alternate Path Fetch by Deshmukh, Aniket, Cai, Lingzhe Chester, Patt, Yale N.

    Published: IEEE 29.06.2024
    “…Modern out-of-order cores rely on a large instruction supply from the processor frontend to achieve high performance. This requires building wider pipelines…”
    Conference Proceeding
  7.

    Alternate Path μ-op Cache Prefetching by Singh, Sawan, Perais, Arthur, Jimborean, Alexandra, Ros, Alberto

    Published: IEEE 29.06.2024
    “…Datacenter applications are well-known for their large code footprints. This has caused frontend design to evolve by implementing decoupled fetching and large…”
    Conference Proceeding
  8.

    Sparse-T: Hardware accelerator thread for unstructured sparse data processing by Vasireddy, Pranathi, Kavi, Krishna, Mehta, Gayatri

    ISSN: 1558-2434
    Published: ACM 29.10.2022
    “…Sparse matrix-dense vector (SpMV) multiplication is inherent in most scientific, neural networks and machine learning algorithms. To efficiently exploit…”
    Conference Proceeding
  9.

    Bit-level Perceptron Prediction for Indirect Branches by Garza, Elba, Mirbagher-Ajorpaz, Samira, Khan, Tahsin Ahmad, Jimenez, Daniel A.

    ISSN: 2575-713X
    Published: ACM 01.06.2019
    “…Modern software uses indirect branches for various purposes including, but not limited to, virtual method dispatch and implementation of switch statements…”
    Conference Proceeding
  10.

    AVM-BTB: Adaptive and Virtualized Multi-level Branch Target Buffer by Liu, Yunzhe, Li, Xinyu, Zhang, Tingting, Liu, Tianyi, Guo, Qi, Zhang, Fuxin, Wang, Jian

    Published: IEEE 29.06.2024
    “…Branch Target Buffer (BTB) plays an important role in modern processors. It is used to identify branches in the instruction stream and predict branch targets…”
    Conference Proceeding
  11.

    PipeLink: A Pipelined Resource Sharing System for Dataflow High-Level Synthesis by Li, Rui, Berkley, Lincoln, Manohar, Rajit

    Published: IEEE 22.06.2025
    “…Dynamically scheduled high-level synthesis (HLS) is an approach to HLS that maps programs into dataflow circuits. These circuits use distributed control for…”
    Conference Proceeding
  12.

    UpPipe: A Novel Pipeline Management on In-Memory Processors for RNA-seq Quantification by Chen, Liang-Chi, Ho, Chien-Chung, Chang, Yuan-Hao

    Published: IEEE 09.07.2023
    “…RNA sequence quantification is an important analysis method to measure transcript abundances. A key overhead in RNA-seq quantification is to map a set of RNA…”
    Conference Proceeding
  13.

    Leaky MDU: ARM Memory Disambiguation Unit Uncovered and Vulnerabilities Exposed by Liu, Chang, Lyu, Yongqiang, Wang, Haixia, Qiu, Pengfei, Ju, Dapeng, Qu, Gang, Wang, Dongsheng

    Published: IEEE 09.07.2023
    “…Memory Disambiguation Unit (MDU) is widely used on modern processors to speculatively execute load instructions and improve pipeline performance. Given that…”
    Conference Proceeding
  14.

    Load value prediction via path-based address prediction: avoiding mispredictions due to conflicting stores by Sheikh, Rami, Cain, Harold W., Damodaran, Raguram

    ISBN: 1450349528, 9781450349529
    ISSN: 2379-3155
    Published: New York, NY, USA ACM 14.10.2017
    “…Current flagship processors excel at extracting instruction-level-parallelism (ILP) by forming large instruction windows. Even then, extracting ILP is…”
    Conference Proceeding
  15.

    MixPipe: Efficient Bidirectional Pipeline Parallelism for Training Large-Scale Models by Zhang, Weigang, Zhou, Biyu, Tang, Xuehai, Wang, Zhaoxing, Hu, Songlin

    Published: IEEE 09.07.2023
    “…The rapid development of large-scale deep neural networks has put forward an urgent demand for the efficiency of parallel training. Recently, bidirectional…”
    Conference Proceeding
  16.

    SMT-COP: Defeating Side-Channel Attacks on Execution Units in SMT Processors by Townley, Daniel, Ponomarev, Dmitry

    ISSN: 2641-7936
    Published: IEEE 01.09.2019
    “…Recent advances in side-channel attacks put into question the viability of Simultaneous Multithreading (SMT) architectures from the security standpoint. To…”
    Conference Proceeding
  17.

    Filter Caching for Free: The Untapped Potential of the Store-Buffer by Alves, Ricardo, Ros, Alberto, Black-Schaffer, David, Kaxiras, Stefanos

    ISSN: 2575-713X
    Published: ACM 01.06.2019
    “…Modern processors contain store-buffers to allow stores to retire under a miss, thus hiding store-miss latency. The store-buffer needs to be large (for…”
    Conference Proceeding
  18.

    FabScalar: composing synthesizable RTL designs of arbitrary cores within a canonical superscalar template by Choudhary, Niket K., Wadhavkar, Salil V., Shah, Tanmay A., Mayukh, Hiran, Gandhi, Jayneel, Dwiel, Brandon H., Navada, Sandeep, Najaf-abadi, Hashem H., Rotenberg, Eric

    ISBN: 9781450304726, 1450304729
    ISSN: 1063-6897
    Published: New York, NY, USA ACM 04.06.2011
    “…A growing body of work has compiled a strong case for the single-ISA heterogeneous multi-core paradigm. A single-ISA heterogeneous multi-core provides…”
    Conference Proceeding
  19.

    X-Layer: Building Composable Pipelined Dataflows for Low-Rank Convolutions by Vedula, Naveen, Hojabr, Reza, Khonsari, Ahmad, Shriraman, Arrvindh

    Published: IEEE 01.09.2021
    “…Prior research in hardware accelerators has largely focused on spatial convolutions (CONV). However, state-of-the-art DNNs employ low-rank convolutions…”
    Conference Proceeding
  20.

    Pipelining a triggered processing element by Repetti, Thomas J., Cerqueira, João P., Kim, Martha A., Seok, Mingoo

    ISBN: 1450349528, 9781450349529
    ISSN: 2379-3155
    Published: New York, NY, USA ACM 14.10.2017
    “…Programmable spatial architectures composed of ensembles of autonomous fixed-ISA processing elements offer a compelling design point between the flexibility of…”
    Conference Proceeding