Metagenomic sequence classification based on local sensitive hashing and Bi-LSTM
| Published in: | Journal of Bioinformatics and Computational Biology, Volume 23; Issue 4; p. 2550012 |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Singapore, 1 August 2025 |
| ISSN: | 1757-6334 |
| Summary: | Current metagenomic classification methods are limited by short k-mer lengths and database dependency, resulting in insufficient taxonomic resolution at the species and genus level. This study proposes the first method integrating Locality-Sensitive Hashing (LSH) and Bidirectional Long Short-Term Memory (Bi-LSTM) networks for metagenomic sequence classification. The approach reduces runtime reliance on reference databases by learning discriminative features directly from sequences, while supporting long k-mers. The method consists of three key steps: (1) k-mer representation via locality-sensitive hashing, (2) k-mer embedding using the skip-gram model, and (3) label assignment to the embedded vectors, followed by training in a Bi-LSTM network. Experimental results demonstrate superior classification performance at the genus level compared to existing models. Future work will explore the application of this method in the rapid detection of clinical pathogens. |
|---|---|
| DOI: | 10.1142/S021972002550012X |
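
The first step named in the summary, representing k-mers via locality-sensitive hashing, can be illustrated with a minimal sketch. The record does not say which LSH family the authors use, so the bit-sampling scheme below (hashing a k-mer by a fixed random subset of its positions) is an assumption chosen for simplicity, and the names `make_bit_sampling_lsh`, `kmers`, and `read_to_tokens` are hypothetical. The idea it shows is that long k-mers differing in a few bases often map to the same token, which keeps the vocabulary passed to a skip-gram model manageable.

```python
import random


def make_bit_sampling_lsh(k, num_positions, seed=0):
    """Build a bit-sampling LSH function for k-mers (illustrative assumption,
    not necessarily the hashing scheme used in the paper).

    Returns the hash function and the sampled positions it reads.
    """
    rng = random.Random(seed)
    # Fixed random subset of positions; only these bases influence the hash.
    positions = sorted(rng.sample(range(k), num_positions))

    def lsh(kmer):
        if len(kmer) != k:
            raise ValueError(f"expected a k-mer of length {k}, got {len(kmer)}")
        # k-mers that agree on every sampled position share a token.
        return "".join(kmer[i] for i in positions)

    return lsh, positions


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def read_to_tokens(seq, k, lsh):
    """Turn a read into a sequence of LSH tokens, one per k-mer."""
    return [lsh(km) for km in kmers(seq, k)]


if __name__ == "__main__":
    lsh, positions = make_bit_sampling_lsh(k=12, num_positions=6, seed=0)
    read = "ACGTACGTACGTAC"
    print("sampled positions:", positions)
    print("tokens:", read_to_tokens(read, 12, lsh))
```

In a pipeline like the one the summary outlines, the token sequences produced here would play the role of "sentences" for a skip-gram embedding model, and the resulting vectors, paired with taxonomic labels, would be the input to a Bi-LSTM classifier. Those later stages are omitted because the record gives no details about them.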