Search Results - Computer system organization Architectures Other architectures Reconfigurable computing*

  1.

    DRISA: a DRAM-based Reconfigurable In-Situ Accelerator by Li, Shuangchen, Niu, Dimin, Malladi, Krishna T., Zheng, Hongzhong, Brennan, Bob, Xie, Yuan

    ISBN: 1450349528, 9781450349529
    ISSN: 2379-3155
    Published: New York, NY, USA ACM 14.10.2017
    “… To address the challenge, we propose DRISA, a DRAM-based Reconfigurable In-Situ Accelerator architecture, to provide both powerful computing capability and large memory capacity/bandwidth…”
    Conference Proceeding
  2.

    Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks by Zhang, Chen, Fang, Zhenman, Zhou, Peipei, Pan, Peichen, Cong, Jason

    ISSN: 1558-2434
    Published: ACM 01.11.2016
    “… Second, we design Caffeine with the goal to maximize the underlying FPGA computing and bandwidth…”
    Conference Proceeding
  3.

    Stream-dataflow acceleration by Nowatzki, Tony, Gangadhar, Vinay, Ardalani, Newsha, Sankaralingam, Karthikeyan

    Published: ACM 01.06.2017
    “…) are insufficient, as evidenced by the order-of-magnitude improvements and industry adoption of application and domain-specific accelerators in important areas like machine learning, computer vision and big data…”
    Conference Proceeding
  4.

    Maximizing CNN accelerator efficiency through resource partitioning by Shen, Yongming, Ferdman, Michael, Milder, Peter

    Published: ACM 01.06.2017
    “…Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based…”
    Conference Proceeding
  5.

    SODA: Stencil with Optimized Dataflow Architecture by Chi, Yuze, Cong, Jason, Wei, Peng, Zhou, Peipei

    ISSN: 1558-2434
    Published: ACM 01.11.2018
    “… In this paper we present SODA, an automated framework for implementing Stencil algorithms with Optimized Dataflow Architecture on FPGAs…”
    Conference Proceeding
  6.

    Qubit Mapping for Reconfigurable Atom Arrays by Tan, Bochen, Bluvstein, Dolev, Lukin, Mikhail D., Cong, Jason

    ISSN: 1558-2434
    Published: ACM 29.10.2022
    “…Because of the largest number of qubits available, and the massive parallel execution of entangling two-qubit gates, atom arrays is a promising platform for quantum computing…”
    Conference Proceeding
  7.

    FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching by Tong, Jianming, Itagi, Anirudh, Chatarasi, Prasanth, Krishna, Tushar

    Published: IEEE 29.06.2024
    “…The inference of ML models composed of diverse structures, types, and sizes boils down to the execution of different dataflows (i.e. different tiling,…”
    Conference Proceeding
  8.

    HAL: Hardware-assisted Load Balancing for Energy-efficient SNIC-Host Cooperative Computing by Huang, Jinghan, Lou, Jiaqi, Vanavasam, Srikar, Kong, Xinhao, Ji, Houxiang, Jeong, Ipoom, Zhuo, Danyang, Lee, Eun Kyung, Kim, Nam Sung

    Published: IEEE 29.06.2024
    “… With such a processor, the SNIC has promised to notably improve the system-wide energy efficiency of datacenter servers…”
    Conference Proceeding
  9.

    MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition by Qin, Yubin, Wang, Yang, Zhao, Zhiren, Yang, Xiaolong, Zhou, Yang, Wei, Shaojun, Hu, Yang, Yin, Shouyi

    Published: IEEE 29.06.2024
    “…Large language models (LLMs) have been showing surprising performance in processing language tasks, bringing a new prevalence to deploy LLM from cloud to edge…”
    Conference Proceeding
  10.

    TGPA: Tile-Grained Pipeline Architecture for Low Latency CNN Inference by Wei, Xuechao, Liang, Yun, Li, Xiuhong, Yu, Cody Hao, Zhang, Peng, Cong, Jason

    ISSN: 1558-2434
    Published: ACM 01.11.2018
    “…FPGAs are more and more widely used as reconfigurable hardware accelerators for applications leveraging convolutional neural networks (CNNs) in recent years…”
    Conference Proceeding
  11.

    Understanding and optimizing asynchronous low-precision stochastic gradient descent by De Sa, Christopher, Feldman, Matthew, Re, Christopher, Olukotun, Kunle

    Published: ACM 01.06.2017
    “…Stochastic gradient descent (SGD) is one of the most popular numerical algorithms used in machine learning and other domains. Since this is likely to continue…”
    Conference Proceeding
  12.

    Map-and-Conquer: Energy-Efficient Mapping of Dynamic Neural Nets onto Heterogeneous MPSoCs by Bouzidi, Halima, Odema, Mohanad, Ouarnoughi, Hamza, Niar, Smail, Al Faruque, Mohammad Abdullah

    Published: IEEE 09.07.2023
    “… To date, the mapping strategies of neural networks (NNs) onto such systems are yet to exploit the full potential of processing parallelism, made possible through both the intrinsic NNs' structure and underlying hardware composition…”
    Conference Proceeding
  13.

    MASR: A Modular Accelerator for Sparse RNNs by Gupta, Udit, Reagen, Brandon, Pentecost, Lillian, Donato, Marco, Tambe, Thierry, Rush, Alexander M., Wei, Gu-Yeon, Brooks, David

    ISSN: 2641-7936
    Published: IEEE 01.09.2019
    “… In this paper we present MASR, a principled and modular architecture that accelerates bidirectional RNNs for on-chip ASR…”
    Conference Proceeding
  14.

    CoSPARSE: A Software and Hardware Reconfigurable SpMV Framework for Graph Analytics by Feng, Siying, Sun, Jiawen, Pal, Subhankar, He, Xin, Kaszyk, Kuba, Park, Dong-hyeon, Morton, Magnus, Mudge, Trevor, Cole, Murray, O'Boyle, Michael, Chakrabarti, Chaitali, Dreslinski, Ronald

    Published: IEEE 05.12.2021
    “… reconfiguration as a synergistic solution to accelerate SpMV-based graph analytics algorithms. Building on previously proposed general-purpose reconfigurable hardware…”
    Conference Proceeding
  15.

    MambaOPU: An FPGA Overlay Processor for State-space-duality-based Mamba Models by Lu, Shaoqiang, Yu, Xuliang, Zhao, Tiandong, Miao, Siyuan, Sheng, Xinsong, Wu, Chen, Zhao, Liang, Lin, Ting-Jung, He, Lei

    Published: IEEE 22.06.2025
    “…State-space models (SSMs), such as Mamba, have emerged as a promising alternative to Transformers. However, the recently developed Mamba2, based on state space…”
    Conference Proceeding
  16.

    NTT-PIM: Row-Centric Architecture and Mapping for Efficient Number-Theoretic Transform on PIM by Park, Jaewoo, Lee, Sugil, Lee, Jongeun

    Published: IEEE 09.07.2023
    “…Recently DRAM-based PIMs (processing-in-memories) with unmodified cell arrays have demonstrated impressive performance for accelerating AI applications…”
    Conference Proceeding
  17.

    DRAFT: Decoupling Backpropagation from Pre-trained Backbone for Efficient Transformer Fine-Tuning on Edge by Huang, Zhirui, Liu, Shiwei, Zhu, Haozhe, Liu, Qi, Chen, Chixiao

    Published: IEEE 22.06.2025
    “…). The existing fine-tuning techniques require the BP through the massive pre-trained backbone weights for computing the input gradient, resulting in significant computing overhead and memory footprint…”
    Conference Proceeding
  18.

    RASA: Efficient Register-Aware Systolic Array Matrix Engine for CPU by Jeong, Geonhwa, Qin, Eric, Samajdar, Ananda, Hughes, Christopher J., Subramoney, Sreenivas, Kim, Hyesoon, Krishna, Tushar

    Published: IEEE 05.12.2021
    “…As AI-based applications become pervasive, CPU vendors are starting to incorporate matrix engines within the datapath to boost efficiency. Systolic arrays have…”
    Conference Proceeding
  19.

    Buffer Prospector: Discovering and Exploiting Untapped Buffer Resources in Many-Core DNN Accelerators by Wei, Yuchen, Cai, Jingwei, Gao, Mingyu, Peng, Sen, Wu, Zuotong, Shi, Guiming, Ma, Kaisheng

    Published: IEEE 22.06.2025
    “…In large-scale DNN inference accelerators, the many-core architecture has emerged as a predominant design, with layer-pipeline (LP…”
    Conference Proceeding
  20.

    Heterogeneous Reconfigurable Accelerators: Trends and Perspectives by Luk, Wayne

    Published: IEEE 09.07.2023
    “…Heterogeneity and reconfigurability have both been adopted by accelerators to improve their flexibility and efficiency for a wide variety of applications, from cloud computing to embedded systems…”
    Conference Proceeding