Search results - "Computer systems organization Architectures Parallel architectures"

1

Splitwise: Efficient Generative LLM Inference Using Phase Splitting
Authors: Patel, Pratyush; Choukse, Esha; Zhang, Chaojie; Shah, Aashaka; Goiri, Inigo; Maleki, Saeed; Bianchini, Ricardo

Published: IEEE, 29.06.2024

Published in: 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (29.06.2024)
“…Generative large language model (LLM) applications are growing rapidly, leading to large-scale deployments of expensive and power-hungry GPUs. Our…”

Conference paper

2

Scheduling techniques for GPU architectures with processing-in-memory capabilities
Authors: Pattnaik, Ashutosh; Tang, Xulong; Jog, Adwait; Kayiran, Onur; Mishra, Asit K.; Kandemir, Mahmut T.; Mutlu, Onur; Das, Chita R.

Published: ACM, 01.09.2016

Published in: 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) (01.09.2016)
“…Processing data in or near memory (PIM), as opposed to in conventional computational units in a processor, can greatly alleviate the performance and energy…”

Conference paper

3

Trapezoid: A Versatile Accelerator for Dense and Sparse Matrix Multiplications
Authors: Yang, Yifan; Emer, Joel S.; Sanchez, Daniel

Published: IEEE, 29.06.2024

Published in: 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (29.06.2024)
“…Accelerating matrix multiplication is crucial to achieve high performance in many application domains, including neural networks, graph analytics, and…”

Conference paper

4

CoNDA: Efficient Cache Coherence Support for Near-Data Accelerators
Authors: Boroumand, Amirali; Ghose, Saugata; Patel, Minesh; Hassan, Hasan; Lucia, Brandon; Ausavarungnirun, Rachata; Hsieh, Kevin; Hajinazar, Nastaran; Malladi, Krishna T.; Zheng, Hongzhong; Mutlu, Onur

ISSN: 2575-713X

Published: ACM, 01.06.2019

Published in: 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA) (01.06.2019)
“…Specialized on-chip accelerators are widely used to improve the energy efficiency of computing systems. Recent advances in memory technology have enabled…”

Conference paper

5

MCM-GPU: Multi-chip-module GPUs for continued performance scalability
Authors: Arunkumar, Akhil; Bolotin, Evgeny; Cho, Benjamin; Milic, Ugljesa; Ebrahimi, Eiman; Villa, Oreste; Jaleel, Aamer; Wu, Carole-Jean; Nellans, David

Published: ACM, 01.06.2017

Published in: 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) (01.06.2017)
“…Historically, improvements in GPU-based high performance computing have been tightly coupled to transistor scaling. As Moore's law slows down, and the number…”

Conference paper

6

HIVE: A High-Priority Victim Cache for Accelerating GPU Memory Accesses
Authors: Tang, Yuhan; Zhang, Jianmin; Ma, Sheng; Li, Tiejun; Li, Hanqing; Luo, Shengbai; Tang, Jixuan; Wu, Lizhou

Published: IEEE, 22.06.2025

Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…The victim cache was originally designed as a secondary cache to handle misses in the L1 data (L1D) cache in CPUs. However, this design is often sub-optimal…”

Conference paper

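The snippet above describes the classic victim cache: a small buffer that holds lines recently evicted from the L1, so that conflict misses can be served without going further down the memory hierarchy. The Python sketch below is a hypothetical toy model of that baseline idea only (a direct-mapped L1 plus a small fully associative victim buffer that drops its oldest entry when full). It does not model HIVE's high-priority scheme, and the class and parameter names are illustrative.

```python
from collections import OrderedDict


class DirectMappedWithVictimCache:
    """Toy model: direct-mapped L1 plus a small fully associative victim buffer."""

    def __init__(self, num_sets: int, victim_entries: int) -> None:
        self.num_sets = num_sets
        self.victim_entries = victim_entries
        self.l1 = [None] * num_sets      # line address cached in each L1 set
        self.victim = OrderedDict()      # evicted line addresses, oldest first
        self.stats = {"l1_hit": 0, "victim_hit": 0, "miss": 0}

    def access(self, line: int) -> str:
        index = line % self.num_sets     # direct-mapped: exactly one candidate set
        if self.l1[index] == line:
            result = "l1_hit"
        else:
            if line in self.victim:
                # Served by the victim buffer: the line moves back into L1.
                del self.victim[line]
                result = "victim_hit"
            else:
                result = "miss"          # would be fetched from the next level
            evicted = self.l1[index]
            self.l1[index] = line
            if evicted is not None:
                # The displaced L1 line becomes the newest victim entry.
                self.victim[evicted] = None
                if len(self.victim) > self.victim_entries:
                    self.victim.popitem(last=False)  # drop the oldest entry
        self.stats[result] += 1
        return result


cache = DirectMappedWithVictimCache(num_sets=4, victim_entries=2)
print([cache.access(line) for line in [0, 4, 0, 4]])
# ['miss', 'miss', 'victim_hit', 'victim_hit']
```

Lines 0 and 4 both map to set 0, so alternating between them thrashes the direct-mapped L1. With two victim entries the third and fourth accesses are served by the victim buffer; with `victim_entries=0` every one of them misses.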
7

SCNN: An accelerator for compressed-sparse convolutional neural networks
Authors: Parashar, Angshuman; Rhu, Minsoo; Mukkara, Anurag; Puglielli, Antonio; Venkatesan, Rangharajan; Khailany, Brucek; Emer, Joel; Keckler, Stephen W.; Dally, William J.

Published: ACM, 01.06.2017

Published in: 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) (01.06.2017)
“…Convolutional Neural Networks (CNNs) have emerged as a fundamental technology for machine learning. High performance and extreme energy efficiency are critical…”

Conference paper

8

ACRS: Adjacent Computation Resource Sharing among Partitioned GPU Sub-Cores
Authors: Song, Penghao; Wang, Chongxi; Han, Chenji; Zhao, Haoyu; Zhang, Tingting; Liu, Tianyi; Wang, Jian

Published: IEEE, 22.06.2025

Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…Modern GPUs typically segment Streaming Multiprocessors (SMs) into sub-cores (e.g. 4 sub-cores) to reduce power consumption and chip area. However, this…”

Conference paper

9

GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving
Authors: Wu, Kan; Lin, Zejia; Xi, Mengyue; Zheng, Zhongchun; Pan, Wenxuan; Zhang, Xianwei; Lu, Yutong

Published: IEEE, 22.06.2025

Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…GPUs have been heavily utilized in diverse applications, and numerous approaches, including kernel fusion, have been proposed to boost GPU efficiency through…”

Conference paper

10

SplitSync: Bank Group-Level Split-Synchronization for High-Performance DRAM PIM
Authors: Yoon, Byungkuk; Han, Sanghyeok; Park, Gyeonghwan; Kim, Jae-Joon

Published: IEEE, 22.06.2025

Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…Processing in Memory (PIM) architectures enhance memory bandwidth by utilizing bank-level parallelism, typically implemented with a SIMD structure where all…”

Conference paper

11

Late Breaking Results: On-the-Fly Hadamard Hypervector Processing for Efficient Hyperdimensional Computing
Authors: Masum, Abu Kaisar Mohammad; Moghadam, Mehran Shoushtari; Moon, Sabrina Hassan; Ahmed, Ahmed Mamdouh Mohamed; Najafi, M. Hassan; Reis, Dayane; Aygun, Sercan

Published: IEEE, 22.06.2025

Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…Inspired by the human brain, Hyperdimensional Computing (HDC) processes information efficiently by operating in high-dimensional space using hypervectors…”

Conference paper

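In HDC, information is encoded in very long vectors, and bipolar (+1/-1) Hadamard rows are attractive hypervectors because any two distinct rows are exactly orthogonal. A minimal sketch, assuming the Sylvester construction of the Hadamard matrix (the snippet does not say which construction or encoding the paper uses): entry j of row i is (-1)^popcount(i AND j), so any row can be generated on demand instead of being stored. The function names are hypothetical, and element-wise multiplication as the binding operator is a common HDC convention, not necessarily the paper's.

```python
def hadamard_row(index: int, dim: int) -> list[int]:
    """Row `index` of the dim x dim Sylvester Hadamard matrix, generated on the fly."""
    if dim <= 0 or dim & (dim - 1):
        raise ValueError("dim must be a power of two")
    if not 0 <= index < dim:
        raise ValueError("index must be in [0, dim)")
    # Entry j is -1 when (index & j) has an odd number of set bits, else +1.
    return [-1 if bin(index & j).count("1") % 2 else 1 for j in range(dim)]


def similarity(a: list[int], b: list[int]) -> float:
    """Normalized dot product (cosine similarity for equal-length bipolar vectors)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def bind(a: list[int], b: list[int]) -> list[int]:
    """Element-wise product, a common binding operator for bipolar hypervectors."""
    return [x * y for x, y in zip(a, b)]


D = 1024
a, b = hadamard_row(5, D), hadamard_row(9, D)
print(similarity(a, a), similarity(a, b))     # 1.0 0.0
print(bind(a, b) == hadamard_row(5 ^ 9, D))   # True
```

The last line reflects a property of the Sylvester ordering: because popcount parities add under XOR, binding row i with row k yields row i XOR k, so bound vectors stay inside the same orthogonal set.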
12

RASA: Efficient Register-Aware Systolic Array Matrix Engine for CPU
Authors: Jeong, Geonhwa; Qin, Eric; Samajdar, Ananda; Hughes, Christopher J.; Subramoney, Sreenivas; Kim, Hyesoon; Krishna, Tushar

Published: IEEE, 05.12.2021

Published in: 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021)
“…As AI-based applications become pervasive, CPU vendors are starting to incorporate matrix engines within the datapath to boost efficiency. Systolic arrays have…”

Conference paper

13

Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology
Authors: Seshadri, Vivek; Lee, Donghyuk; Mullins, Thomas; Hassan, Hasan; Boroumand, Amirali; Kim, Jeremie; Kozuch, Michael A.; Mutlu, Onur; Gibbons, Phillip B.; Mowry, Todd C.

ISBN: 1450349528, 9781450349529

ISSN: 2379-3155

Published: New York, NY, USA: ACM, 14.10.2017

Published in: MICRO-50: The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Proceedings, October 14-18, 2017, Cambridge, MA (14.10.2017)
“…Many important applications trigger bulk bitwise operations, i.e., bitwise operations on large bit vectors. In fact, recent works design techniques that…”

Conference paper

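The snippet defines bulk bitwise operations as bitwise operations on large bit vectors. A typical example is a bitmap-index query, where each bit records whether one table row satisfies a predicate, and combining predicates is a single AND/OR over the whole vector. The CPU-side Python sketch below only illustrates that workload, using Python's arbitrary-precision integers as bit vectors and two synthetic, hypothetical predicates; it says nothing about how Ambit carries out these operations inside DRAM.

```python
def make_bitvector(flags: list[int]) -> int:
    """Pack 0/1 flags into a Python int; bit i of the result is flags[i]."""
    # int(..., 2) reads the most significant bit first, so reverse the flags.
    return int("".join("1" if f else "0" for f in reversed(flags)) or "0", 2)


def popcount(x: int) -> int:
    """Number of set bits in a non-negative int (int.bit_count() on 3.10+)."""
    return bin(x).count("1")


ROWS = 100_000
# Hypothetical predicates over a synthetic table, one bit per row.
pred_a = make_bitvector([1 if r % 2 == 0 else 0 for r in range(ROWS)])
pred_b = make_bitvector([1 if r % 3 == 0 else 0 for r in range(ROWS)])

both = pred_a & pred_b          # bulk AND across all ROWS bits
either = pred_a | pred_b        # bulk OR
a_not_b = pred_a & ~pred_b      # bulk AND-NOT
print(popcount(both), popcount(either))  # 16667 66667
```

Each combined query is one expression over a 100,000-bit operand; the final population count turns the result vector into a row count.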
14

DenSparSA: A Balanced Systolic Array Approach for Dense and Sparse Matrix Multiplication
Authors: Wang, Ziheng; Sun, Ruiqi; He, Xin; Ma, Tianrui; Zou, An

Published: IEEE, 22.06.2025

Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…Numerous studies have proposed hardware architectures to accelerate sparse matrix multiplication, but these approaches often incur substantial area and power…”

Conference paper

15

INTERPRET: Inter-Warp Register Reuse for GPU Tensor Core
Authors: Kwak, Jae Seok; Yoon, Myung Kuk; Jeong, Ipoom; Jin, Seunghyun; Ro, Won Woo

Published: IEEE, 21.10.2023

Published in: 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT) (21.10.2023)
“…Tensor cores in the recent NVIDIA GPUs are under the spotlight due to their superior computation throughput for general matrix-matrix multiplication (GEMM)…”

Conference paper

16

A scalable processing-in-memory accelerator for parallel graph processing
Authors: Ahn, Junwhan; Hong, Sungpack; Yoo, Sungjoo; Mutlu, Onur; Choi, Kiyoung

ISSN: 1063-6897

Published: IEEE, 01.06.2015

Published in: Proceedings - International Symposium on Computer Architecture (01.06.2015)
“…The explosion of digital data and the ever-growing need for fast data analysis have made in-memory big-data processing in computer systems increasingly…”

Conference paper

17

FSPA: An FeFET-based Sparse Matrix-Dense Vector Multiplication Accelerator
Authors: Zhang, Xiaoyu; Li, Zerun; Liu, Rui; Chen, Xiaoming; Han, Yinhe

Published: IEEE, 09.07.2023

Published in: 2023 60th ACM/IEEE Design Automation Conference (DAC) (09.07.2023)
“…Sparse matrix-dense vector multiplication (SpMV) is widely used in various applications. The performance of traditional SpMV accelerators is bounded by memory…”

Conference paper

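The snippet notes that traditional SpMV accelerators are bounded by memory. A plain reference kernel makes that visible: each stored nonzero is read once for a single multiply-add, and the dense vector is gathered at data-dependent column positions. The Python sketch below assumes the standard CSR (compressed sparse row) format with `indptr`/`indices`/`data` arrays; the snippet does not say which format FSPA uses, and the example matrix is made up for illustration.

```python
def spmv_csr(indptr, indices, data, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format.

    indptr[r]:indptr[r + 1] delimits row r's entries in `indices`
    (column numbers) and `data` (nonzero values).
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            # One load of data[k] and one indirect load of x per multiply-add:
            # low arithmetic intensity, so memory traffic dominates.
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# A = [[10, 0, 0, 2],
#      [ 0, 0, 3, 0],
#      [ 0, 0, 0, 0],
#      [ 1, 4, 0, 5]]
indptr = [0, 2, 3, 3, 6]
indices = [0, 3, 2, 0, 1, 3]
data = [10.0, 2.0, 3.0, 1.0, 4.0, 5.0]
print(spmv_csr(indptr, indices, data, [1.0, 2.0, 3.0, 4.0]))  # [18.0, 9.0, 0.0, 29.0]
```

Empty rows (row 2 here) cost only two `indptr` reads, which is one reason CSR is a common baseline for sparse kernels.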
18

Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters
Authors: Cavalcante, Matheus; Wuthrich, Domenic; Perotti, Matteo; Riedel, Samuel; Benini, Luca

ISSN: 1558-2434

Published: ACM, 29.10.2022

Published in: 2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD) (29.10.2022)
“…While parallel architectures based on clusters of Processing Elements (PEs) sharing L1 memory are widespread, there is no consensus on how lean their PE should…”

Conference paper

19

Bit-pragmatic deep neural network computing
Authors: Albericio, Jorge; Delmás, Alberto; Judd, Patrick; Sharify, Sayeh; O'Leary, Gerard; Genov, Roman; Moshovos, Andreas

ISBN: 1450349528, 9781450349529

ISSN: 2379-3155

Published: New York, NY, USA: ACM, 14.10.2017

Published in: MICRO-50: The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Proceedings, October 14-18, 2017, Cambridge, MA (14.10.2017)
“…Deep Neural Networks expose a high degree of parallelism, making them amenable to highly data parallel architectures. However, data-parallel architectures…”

Conference paper

20

HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement
Authors: Iff, Patrick; Besta, Maciej; Cavalcante, Matheus; Fischer, Tim; Benini, Luca; Hoefler, Torsten

Published: IEEE, 09.07.2023

Published in: 2023 60th ACM/IEEE Design Automation Conference (DAC) (09.07.2023)
“…2.5D integration is an important technique to tackle the growing cost of manufacturing chips in advanced technology nodes. This poses the challenge of…”

Conference paper

