A Sparsity-Aware Autonomous Path Planning Accelerator with Algorithm-Architecture Co-Design

Bibliographic Details
Published in:	Digest of technical papers - IEEE/ACM International Conference on Computer-Aided Design pp. 1 - 9
Main Authors:	Zhang, Yanjun; Niu, Xiaoyu; Zhang, Yifan; Tian, Hongzheng; Yu, Bo; Liu, Shaoshan; Huang, Sitao
Format:	Conference Proceeding
Language:	English
Published:	ACM 27.10.2024
Subjects:	autonomous driving; Autonomous vehicles; FPGA; Graphics processing units; Linear systems; Memory management; Path planning; Phonocardiography; Pipeline processing; Quadratic programming; Real-time systems; Throughput
ISSN:	1558-2434

Description
Summary:	Path planning is a critical task in autonomous driving systems: it is subject to strict real-time constraints yet often demands computationally intensive mathematical solvers, two conflicting requirements. This conflict makes path-planning computation a major challenge. At the heart of most path planners is the quadratic programming (QP) solver, which places excessive demands on the CPU in real-world autonomous driving applications. In this paper, we present an FPGA-based acceleration framework for path planning problems. Our approach leverages an operator splitting solver for quadratic programs (OSQP) and employs the preconditioned conjugate gradient (PCG) method for solving linear systems, both customized to be more hardware-friendly than in prior works. Memory management and parallel processing are tailored to the matrix pattern, and pipelining is incorporated to improve throughput and execution speed. Our FPGA-based implementation achieves state-of-the-art performance against existing works, including an average 1.98× speedup compared with the state-of-the-art QP solver on an Intel i7-11800H CPU, a 3.90× speedup over an ARM Cortex-A57 embedded CPU, and a 12.3× speedup over an NVIDIA RTX 3090 GPU.
DOI:	10.1145/3676536.3676700
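
The summary names the preconditioned conjugate gradient (PCG) method as the linear-system solver used inside OSQP. As a rough illustration of that building block only, the sketch below is a textbook PCG iteration in plain Python. It is not the paper's hardware-customized variant, which this record does not describe. The diagonal (Jacobi) preconditioner, the dense list-of-lists matrix storage, and all names (`pcg_jacobi`, `mat_vec`, `dot`, the sample matrix) are assumptions made for the example.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (a real accelerator would exploit sparsity)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def pcg_jacobi(a: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 100) -> Vector:
    """Solve a x = b for symmetric positive definite `a` with Jacobi-preconditioned CG.

    Hypothetical helper for illustration; the preconditioner choice is an assumption.
    """
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]  # Jacobi preconditioner M^-1 = diag(a)^-1
    x = [0.0] * n                                 # initial guess x0 = 0
    r = list(b)                                   # residual r0 = b - a x0 = b
    z = [d * r_i for d, r_i in zip(inv_diag, r)]  # preconditioned residual z0 = M^-1 r0
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:               # stop once the residual norm is small
            break
        ap = mat_vec(a, p)
        alpha = rz / dot(p, ap)                   # step length along p
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * ap_i for r_i, ap_i in zip(r, ap)]
        z = [d * r_i for d, r_i in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # conjugacy coefficient for the next direction
        p = [z_i + beta * p_i for z_i, p_i in zip(z, p)]
        rz = rz_new
    return x


# Small symmetric positive definite test system (made-up values).
# b is built as A @ [1, 2, 3], so the solver should return a vector close to [1, 2, 3].
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]
solution = pcg_jacobi(A, b)
```

Each iteration is dominated by one matrix-vector product plus a few vector updates and inner products, which is the kind of regular workload that the memory management, parallel processing, and pipelining mentioned in the summary would target.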