A Sparsity-Aware Autonomous Path Planning Accelerator with Algorithm-Architecture Co-Design

Bibliographic details
Published in: Digest of technical papers - IEEE/ACM International Conference on Computer-Aided Design, pp. 1-9
Main authors: Zhang, Yanjun; Niu, Xiaoyu; Zhang, Yifan; Tian, Hongzheng; Yu, Bo; Liu, Shaoshan; Huang, Sitao
Format: Conference paper
Language: English
Published: ACM, 27 October 2024
ISSN:1558-2434
Description
Summary: Path planning is a critical task in autonomous driving systems that is most susceptible to real-time constraints but often demands computationally intensive mathematical solvers, two contradictory goals. This conflict makes the computing of path planning a paramount challenge. At the heart of most path planners is the quadratic programming (QP) solver, which places excessive demands on the CPU in real-world autonomous driving applications. In this paper, we present an FPGA-based acceleration framework for path planning problems. Our approach leverages an operator splitting solver for quadratic programs (OSQP) and employs the preconditioned conjugate gradient (PCG) method for solving linear systems, which are customized to be more hardware-friendly than prior works. Specific memory management and parallel processing were tailored to the matrix pattern, and the incorporation of pipelining was executed to enhance throughput and execution speed. Our FPGA-based implementation achieves state-of-the-art performance against existing works, including an average 1.98× speedup compared with the state-of-the-art QP solver on Intel i7-11800H CPU, 3.90× speedup over an ARM Cortex-A57 embedded CPU, and 12.3× speedup over an NVIDIA RTX 3090 GPU.
DOI:10.1145/3676536.3676700