Search results - "Computer systems organization Architectures Parallel architectures"

  1.

    Splitwise: Efficient Generative LLM Inference Using Phase Splitting
    Authors: Patel, Pratyush, Choukse, Esha, Zhang, Chaojie, Shah, Aashaka, Goiri, Inigo, Maleki, Saeed, Bianchini, Ricardo

    Published: IEEE 29.06.2024
    “…Generative large language model (LLM) applications are growing rapidly, leading to large-scale deployments of expensive and power-hungry GPUs. Our…”
    Conference paper
  2.

    Scheduling techniques for GPU architectures with processing-in-memory capabilities
    Authors: Pattnaik, Ashutosh, Tang, Xulong, Jog, Adwait, Kayiran, Onur, Mishra, Asit K., Kandemir, Mahmut T., Mutlu, Onur, Das, Chita R.

    Published: ACM 01.09.2016
    “…Processing data in or near memory (PIM), as opposed to in conventional computational units in a processor, can greatly alleviate the performance and energy…”
    Conference paper
  3.

    Trapezoid: A Versatile Accelerator for Dense and Sparse Matrix Multiplications
    Authors: Yang, Yifan, Emer, Joel S., Sanchez, Daniel

    Published: IEEE 29.06.2024
    “…Accelerating matrix multiplication is crucial to achieve high performance in many application domains, including neural networks, graph analytics, and…”
    Conference paper
  4.

    CoNDA: Efficient Cache Coherence Support for Near-Data Accelerators
    Authors: Boroumand, Amirali, Ghose, Saugata, Patel, Minesh, Hassan, Hasan, Lucia, Brandon, Ausavarungnirun, Rachata, Hsieh, Kevin, Hajinazar, Nastaran, Malladi, Krishna T., Zheng, Hongzhong, Mutlu, Onur

    ISSN: 2575-713X
    Published: ACM 01.06.2019
    “…Specialized on-chip accelerators are widely used to improve the energy efficiency of computing systems. Recent advances in memory technology have enabled…”
    Conference paper
  5.

    MCM-GPU: Multi-chip-module GPUs for continued performance scalability
    Authors: Arunkumar, Akhil, Bolotin, Evgeny, Cho, Benjamin, Milic, Ugljesa, Ebrahimi, Eiman, Villa, Oreste, Jaleel, Aamer, Wu, Carole-Jean, Nellans, David

    Published: ACM 01.06.2017
    “…Historically, improvements in GPU-based high performance computing have been tightly coupled to transistor scaling. As Moore's law slows down, and the number…”
    Conference paper
  6.

    HIVE: A High-Priority Victim Cache for Accelerating GPU Memory Accesses
    Authors: Tang, Yuhan, Zhang, Jianmin, Ma, Sheng, Li, Tiejun, Li, Hanqing, Luo, Shengbai, Tang, Jixuan, Wu, Lizhou

    Published: IEEE 22.06.2025
    “…The victim cache was originally designed as a secondary cache to handle misses in the L1 data (L1D) cache in CPUs. However, this design is often sub-optimal…”
    Conference paper
  7.

    SCNN: An accelerator for compressed-sparse convolutional neural networks
    Authors: Parashar, Angshuman, Rhu, Minsoo, Mukkara, Anurag, Puglielli, Antonio, Venkatesan, Rangharajan, Khailany, Brucek, Emer, Joel, Keckler, Stephen W., Dally, William J.

    Published: ACM 01.06.2017
    “…Convolutional Neural Networks (CNNs) have emerged as a fundamental technology for machine learning. High performance and extreme energy efficiency are critical…”
    Conference paper
  8.

    ACRS: Adjacent Computation Resource Sharing among Partitioned GPU Sub-Cores
    Authors: Song, Penghao, Wang, Chongxi, Han, Chenji, Zhao, Haoyu, Zhang, Tingting, Liu, Tianyi, Wang, Jian

    Published: IEEE 22.06.2025
    “…Modern GPUs typically segment Streaming Multiprocessors (SMs) into sub-cores (e.g. 4 sub-cores) to reduce power consumption and chip area. However, this…”
    Conference paper
  9.

    GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving
    Authors: Wu, Kan, Lin, Zejia, Xi, Mengyue, Zheng, Zhongchun, Pan, Wenxuan, Zhang, Xianwei, Lu, Yutong

    Published: IEEE 22.06.2025
    “…GPUs have been heavily utilized in diverse applications, and numerous approaches, including kernel fusion, have been proposed to boost GPU efficiency through…”
    Conference paper
  10.

    SplitSync: Bank Group-Level Split-Synchronization for High-Performance DRAM PIM
    Authors: Yoon, Byungkuk, Han, Sanghyeok, Park, Gyeonghwan, Kim, Jae-Joon

    Published: IEEE 22.06.2025
    “…Processing in Memory (PIM) architectures enhance memory bandwidth by utilizing bank-level parallelism, typically implemented with a SIMD structure where all…”
    Conference paper
  11.

    Late Breaking Results: On-the-Fly Hadamard Hypervector Processing for Efficient Hyperdimensional Computing
    Authors: Masum, Abu Kaisar Mohammad, Moghadam, Mehran Shoushtari, Moon, Sabrina Hassan, Ahmed, Ahmed Mamdouh Mohamed, Najafi, M. Hassan, Reis, Dayane, Aygun, Sercan

    Published: IEEE 22.06.2025
    “…Inspired by the human brain, Hyperdimensional Computing (HDC) processes information efficiently by operating in high-dimensional space using hypervectors…”
    Conference paper
  12.

    RASA: Efficient Register-Aware Systolic Array Matrix Engine for CPU
    Authors: Jeong, Geonhwa, Qin, Eric, Samajdar, Ananda, Hughes, Christopher J., Subramoney, Sreenivas, Kim, Hyesoon, Krishna, Tushar

    Published: IEEE 05.12.2021
    “…As AI-based applications become pervasive, CPU vendors are starting to incorporate matrix engines within the datapath to boost efficiency. Systolic arrays have…”
    Conference paper
  13.

    Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology
    Authors: Seshadri, Vivek, Lee, Donghyuk, Mullins, Thomas, Hassan, Hasan, Boroumand, Amirali, Kim, Jeremie, Kozuch, Michael A., Mutlu, Onur, Gibbons, Phillip B., Mowry, Todd C.

    ISBN: 1450349528, 9781450349529
    ISSN: 2379-3155
    Published: New York, NY, USA ACM 14.10.2017
    “…Many important applications trigger bulk bitwise operations, i.e., bitwise operations on large bit vectors. In fact, recent works design techniques that…”
    Conference paper
  14.

    DenSparSA: A Balanced Systolic Array Approach for Dense and Sparse Matrix Multiplication
    Authors: Wang, Ziheng, Sun, Ruiqi, He, Xin, Ma, Tianrui, Zou, An

    Published: IEEE 22.06.2025
    “…Numerous studies have proposed hardware architectures to accelerate sparse matrix multiplication, but these approaches often incur substantial area and power…”
    Conference paper
  15.

    INTERPRET: Inter-Warp Register Reuse for GPU Tensor Core
    Authors: Kwak, Jae Seok, Yoon, Myung Kuk, Jeong, Ipoom, Jin, Seunghyun, Ro, Won Woo

    Published: IEEE 21.10.2023
    “…Tensor cores in the recent NVIDIA GPUs are under the spotlight due to their superior computation throughput for general matrix-matrix multiplication (GEMM)…”
    Conference paper
  16.

    A scalable processing-in-memory accelerator for parallel graph processing
    Authors: Ahn, Junwhan, Hong, Sungpack, Yoo, Sungjoo, Mutlu, Onur, Choi, Kiyoung

    ISSN: 1063-6897
    Published: IEEE 01.06.2015
    “…The explosion of digital data and the ever-growing need for fast data analysis have made in-memory big-data processing in computer systems increasingly…”
    Conference paper
  17.

    FSPA: An FeFET-based Sparse Matrix-Dense Vector Multiplication Accelerator
    Authors: Zhang, Xiaoyu, Li, Zerun, Liu, Rui, Chen, Xiaoming, Han, Yinhe

    Published: IEEE 09.07.2023
    “…Sparse matrix-dense vector multiplication (SpMV) is widely used in various applications. The performance of traditional SpMV accelerators is bounded by memory…”
    Conference paper
  18.

    Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters
    Authors: Cavalcante, Matheus, Wuthrich, Domenic, Perotti, Matteo, Riedel, Samuel, Benini, Luca

    ISSN: 1558-2434
    Published: ACM 29.10.2022
    “…While parallel architectures based on clusters of Processing Elements (PEs) sharing L1 memory are widespread, there is no consensus on how lean their PE should…”
    Conference paper
  19.

    Bit-pragmatic deep neural network computing
    Authors: Albericio, Jorge, Delmás, Alberto, Judd, Patrick, Sharify, Sayeh, O'Leary, Gerard, Genov, Roman, Moshovos, Andreas

    ISBN: 1450349528, 9781450349529
    ISSN: 2379-3155
    Published: New York, NY, USA ACM 14.10.2017
    “…Deep Neural Networks expose a high degree of parallelism, making them amenable to highly data parallel architectures. However, data-parallel architectures…”
    Conference paper
  20.

    HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement
    Authors: Iff, Patrick, Besta, Maciej, Cavalcante, Matheus, Fischer, Tim, Benini, Luca, Hoefler, Torsten

    Published: IEEE 09.07.2023
    “…2.5D integration is an important technique to tackle the growing cost of manufacturing chips in advanced technology nodes. This poses the challenge of…”
    Conference paper