GANDAFL: Dataflow Acceleration for Short Read Alignment on NGS Data

DNA read alignment is an integral part of genome study, which has been revolutionised thanks to the growth of Next Generation Sequencing (NGS) technologies. The inherent computational intensity of string matching algorithms such as Smith-Waterman (SmW) and the vast amount of NGS input data, create a...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on computers Vol. 71; no. 11; pp. 3018 - 3031
Main Authors: Koliogeorgi, Konstantina, Xydis, Sotirios, Gaydadjiev, Georgi, Soudris, Dimitrios
Format: Journal Article
Language:English
Published: New York IEEE 01.11.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN:0018-9340, 1557-9956
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:DNA read alignment is an integral part of genome study, which has been revolutionised thanks to the growth of Next Generation Sequencing (NGS) technologies. The inherent computational intensity of string matching algorithms such as Smith-Waterman (SmW) and the vast amount of NGS input data, create a bottleneck in the workflows. Accelerated reconfigurable computing has been extensively leveraged to alleviate this bottleneck, focusing on high-performance albeit standalone implementations. In existing accelerated solutions effective co-design of NGS short-read alignment still remains an open issue, mainly due to narrow view on real integration aspects, such as system wide communication and accelerator call overheads. In this paper, we first propose GANDAFL , a novel G enome A ligNment DA ta- FL ow architecture for SmW Matrix-fill and Traceback stages to perform high throughput short-read alignment on NGS data. We then propose a radical software restructuring to widely-used Bowtie2 aligner that allows read alignment by batches to expose acceleration capabilities. Batch alignment minimizes calling overhead of the accelerators whereas moving both Matrix-fill and Traceback on chip extinguishes the communication data overheads. The standalone solution delivers up to ×116 and ×2 speedup over state-of-the-art software and hardware accelerators respectively and GANDAFL-enhanced Bowtie2 aligner delivers a ×1.9 speedup.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:0018-9340
1557-9956
DOI:10.1109/TC.2022.3144115