Optimizing High Performance Distributed Memory Parallel Hash Tables for DNA k-mer Counting

Bibliographic details
Published in:	SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 135-147
Main authors:	Pan, Tony C.; Misra, Sanchit; Aluru, Srinivas
Format:	Conference paper
Language:	English
Published:	IEEE, 1 November 2018
Subjects:	Bandwidth; Bioinformatics; cache-aware optimizations; distributed memory algorithms; Genomics; Hash functions; Hash tables; k-mer counting; Memory management; Prefetching; Sequential analysis; vectorization

Description
Summary:	High-throughput DNA sequencing is the mainstay of modern genomics research. A common operation used in bioinformatic analysis for many applications of high-throughput sequencing is the counting and indexing of fixed-length substrings of DNA sequences called k-mers. Counting k-mers is often accomplished via hashing, and distributed memory k-mer counting algorithms for large datasets are memory access and network communication bound. In this work, we present two optimized distributed parallel hash table techniques that utilize cache-friendly algorithms for local hashing, overlapped communication and computation to hide communication costs, and vectorized hash functions that are specialized for k-mer and other short key indices. On 4096 cores of the NERSC Cori supercomputer, our implementation completed index construction and query on an approximately 1 TB human genome dataset in just 11.8 seconds and 5.8 seconds, demonstrating speedups of 2.06× and 3.7×, respectively, over the previous state-of-the-art distributed memory k-mer counter.
DOI:	10.1109/SC.2018.00014