Metagenomic sequence classification based on local sensitive hashing and Bi-LSTM

Bibliographic details
Published in: Journal of bioinformatics and computational biology, Volume 23, Issue 4, p. 2550012
Main authors: Qian, Yan; Xiao, Lei; Zhou, Yiding; Deng, Li
Format: Journal Article
Language: English
Published: Singapore, 2025-08-01
ISSN: 1757-6334
Abstract: Current metagenomic classification methods are limited by short k-mer lengths and database dependency, resulting in insufficient taxonomic resolution at the species and genus levels. This study proposes the first method integrating Locality-Sensitive Hashing (LSH) and Bidirectional Long Short-Term Memory (Bi-LSTM) networks for metagenomic sequence classification. The approach reduces runtime reliance on reference databases by learning discriminative features directly from sequences, while supporting long k-mers. The method consists of three key steps: (1) k-mer representation via locality-sensitive hashing, (2) k-mer embedding using the skip-gram model, and (3) label assignment to the embedded vectors, followed by training in a Bi-LSTM network. Experimental results demonstrate superior classification performance at the genus level compared to existing models. Future work will explore the application of this method to the rapid detection of clinical pathogens.
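The first two steps of the abstract can be illustrated with a minimal sketch. The abstract does not say which LSH family, k, or window size the authors use, so everything below is an assumption for illustration only: the function names are hypothetical, and the hash is plain bit sampling (keeping a fixed random subset of base positions), a standard LSH family for Hamming distance, not necessarily the one in the paper. It shows how long k-mers can be mapped to a smaller token vocabulary and how (center, context) pairs for skip-gram training can be derived from the token stream.

```python
import random

# 2-bit code per nucleotide; k-mers containing other symbols (e.g. N) are skipped.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmers(seq, k):
    """Return all overlapping k-mers of seq that contain only A, C, G, T."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return [w for w in windows if all(b in BASE_CODE for b in w)]


def make_position_sampler(k, n_positions, seed=0):
    """Choose a fixed random subset of positions inside a k-mer (assumed LSH family)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(k), n_positions))


def lsh_token(kmer, positions):
    """Map a k-mer to an integer built only from the bases at the sampled positions.

    K-mers that differ only at unsampled positions receive the same token,
    so the vocabulary shrinks from 4**k to 4**len(positions).
    """
    token = 0
    for p in positions:
        token = (token << 2) | BASE_CODE[kmer[p]]
    return token


def skipgram_pairs(tokens, window):
    """List (center, context) pairs within +/- window, as used for skip-gram training."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    read = "ACGTACGTTGCAACGTTAGC"      # toy read, not real data
    k = 12                              # illustrative value only
    positions = make_position_sampler(k, n_positions=6)
    tokens = [lsh_token(w, positions) for w in kmers(read, k)]
    print(len(tokens), "tokens;", len(skipgram_pairs(tokens, window=2)), "training pairs")
```

In a full pipeline the pairs would feed a skip-gram embedding model and the embedded token sequences would be passed, with their taxonomic labels, to a Bi-LSTM classifier; those stages are omitted here because the abstract gives no architectural details.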
Authors and affiliations:
1. Qian, Yan (ORCID: 0009-0005-2559-4057), School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, P. R. China
2. Xiao, Lei (ORCID: 0000-0002-0902-0417), School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, P. R. China
3. Zhou, Yiding (ORCID: 0009-0000-5344-2572), School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, P. R. China
4. Deng, Li (ORCID: 0000-0002-9976-147X), Shanghai Key Laboratory of Power Station Automation Technology, Shanghai, P. R. China
DOI: 10.1142/S021972002550012X
Keywords: Bi-LSTM; LSH; Metagenomics; taxonomic classification
PMID: 40808601
Subject terms: Algorithms; Computational Biology - methods; Databases, Genetic; Humans; Metagenome; Metagenomics - methods; Neural Networks, Computer; Sequence Analysis, DNA - methods
URI https://www.ncbi.nlm.nih.gov/pubmed/40808601
https://www.proquest.com/docview/3239403165