Metagenomic sequence classification based on local sensitive hashing and Bi-LSTM
Current metagenomic classification methods are limited by short k-mer lengths and database dependency, resulting in insufficient taxonomic resolution at the species and genus level. This study proposes the first method integrating Locality-Sensitive Hashing (LSH) and Bidirectional Long-Short Term Mem...
| Published in: | Journal of bioinformatics and computational biology, Vol. 23, No. 4, p. 2550012 |
|---|---|
| Main authors: | Qian, Yan; Xiao, Lei; Zhou, Yiding; Deng, Li |
| Format: | Journal Article |
| Language: | English |
| Published: | Singapore, 1 August 2025 |
| Subjects: | Bi-LSTM; LSH; Metagenomics; taxonomic classification |
| ISSN: | 1757-6334 |
| Abstract | Current metagenomic classification methods are limited by short k-mer lengths and database dependency, resulting in insufficient taxonomic resolution at the species and genus level. This study proposes the first method integrating Locality-Sensitive Hashing (LSH) and Bidirectional Long-Short Term Memory (Bi-LSTM) networks for metagenomic sequence classification. The approach reduces runtime reliance on reference databases by learning discriminative features directly from sequences, while supporting long k-mers. The method consists of three key steps: (1) k-mer representation via locality-sensitive hashing, (2) k-mer embedding implementation using the skip-gram model, (3) label assignment to embedded vectors, followed by training in a Bi-LSTM network. Experimental results demonstrate superior classification performance at the genus level compared to existing models. Future work will explore the application of this method in the rapid detection of clinical pathogens. |
|---|---|
| Author | Deng, Li; Qian, Yan; Zhou, Yiding; Xiao, Lei |
| Author_xml | 1. Qian, Yan (ORCID: 0009-0005-2559-4057), School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, P. R. China; 2. Xiao, Lei (ORCID: 0000-0002-0902-0417), School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, P. R. China; 3. Zhou, Yiding (ORCID: 0009-0000-5344-2572), School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, P. R. China; 4. Deng, Li (ORCID: 0000-0002-9976-147X), Shanghai Key Laboratory of Power Station Automation Technology, Shanghai, P. R. China |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/40808601 (View this record in MEDLINE/PubMed) |
| ContentType | Journal Article |
| DBID | CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1142/S021972002550012X |
| DatabaseName | Medline; MEDLINE; MEDLINE (Ovid); MEDLINE; MEDLINE; PubMed; MEDLINE - Academic |
| DatabaseTitle | MEDLINE; Medline Complete; MEDLINE with Full Text; PubMed; MEDLINE (Ovid); MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic; MEDLINE |
| Database_xml | 1. dbid: NPM; name: PubMed; url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed; sourceTypes: Index Database. 2. dbid: 7X8; name: MEDLINE - Academic; url: https://search.proquest.com/medline; sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| EISSN | 1757-6334 |
| ExternalDocumentID | 40808601 |
| Genre | Journal Article |
| GroupedDBID | CGR CUY CVF ECM EIF NPM 7X8 |
| ID | FETCH-LOGICAL-c324X-b9ba664ebb02b380f9bfe003432f99e45d8e1292c084570cd9f10369175251702 |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 0 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001549422400003&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1757-6334 |
| IngestDate | Sat Nov 01 15:03:53 EDT 2025; Thu Sep 04 05:03:06 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 4 |
| Keywords | Bi-LSTM; LSH; Metagenomics; taxonomic classification |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c324X-b9ba664ebb02b380f9bfe003432f99e45d8e1292c084570cd9f10369175251702 |
| Notes | ObjectType-Article-1; SourceType-Scholarly Journals-1; ObjectType-Feature-2; content type line 23 |
| ORCID | 0009-0005-2559-4057; 0009-0000-5344-2572; 0000-0002-0902-0417; 0000-0002-9976-147X |
| PMID | 40808601 |
| PQID | 3239403165 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_3239403165; pubmed_primary_40808601 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-Aug; 20250801 |
| PublicationDateYYYYMMDD | 2025-08-01 |
| PublicationDate_xml | month: 08; year: 2025; text: 2025-Aug |
| PublicationDecade | 2020 |
| PublicationPlace | Singapore |
| PublicationPlace_xml | – name: Singapore |
| PublicationTitle | Journal of bioinformatics and computational biology |
| PublicationTitleAlternate | J Bioinform Comput Biol |
| PublicationYear | 2025 |
| Score | 2.3749325 |
| Snippet | Current metagenomic classification methods are limited by short k-mer lengths and database dependency, resulting in insufficient taxonomic resolution at the... |
| SourceID | proquest; pubmed |
| SourceType | Aggregation Database; Index Database |
| StartPage | 2550012 |
| SubjectTerms | Algorithms; Computational Biology - methods; Databases, Genetic; Humans; Metagenome; Metagenomics - methods; Neural Networks, Computer; Sequence Analysis, DNA - methods |
| Title | Metagenomic sequence classification based on local sensitive hashing and Bi-LSTM |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/40808601 https://www.proquest.com/docview/3239403165 |
| Volume | 23 |
| WOSCitedRecordID | wos001549422400003&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | ProQuest |
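
The first two steps named in the abstract (k-mer representation via locality-sensitive hashing, then skip-gram embedding) can be illustrated with a minimal Python sketch. This record does not state which LSH family, k-mer length, or window size the authors use, so everything below is an assumption made for illustration: the sketch uses bit-sampling LSH for Hamming distance, and the names `kmers`, `make_bit_sampling_lsh`, `tokenize`, and `skipgram_pairs` are hypothetical, not taken from the paper. The Bi-LSTM classifier of step 3 is not shown.

```python
import random


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def make_bit_sampling_lsh(k, n_positions, seed=0):
    """Build a bit-sampling LSH function (assumed LSH family, not the paper's).

    The function keeps only n_positions randomly chosen positions of a
    k-mer, so k-mers that differ only at unsampled positions share a bucket.
    """
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(k), n_positions))

    def lsh(kmer):
        return "".join(kmer[p] for p in positions)

    return lsh, positions


def tokenize(seq, k, lsh):
    """Map a read to a sequence of LSH bucket tokens, one per k-mer."""
    return [lsh(km) for km in kmers(seq, k)]


def skipgram_pairs(tokens, window):
    """List (center, context) training pairs as used by a skip-gram model."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    # Hypothetical parameters: 8-mers hashed down to 4 sampled positions.
    lsh, positions = make_bit_sampling_lsh(k=8, n_positions=4, seed=1)
    tokens = tokenize("ACGTACGTACGG", 8, lsh)
    print(positions, tokens)
    print(skipgram_pairs(tokens, window=1)[:4])
```

With this kind of hashing the token vocabulary is bounded by 4^n_positions rather than 4^k, which is one plausible reason an LSH step makes long k-mers tractable for an embedding model; whether the paper relies on the same argument cannot be confirmed from this record.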