Unsupervised fuzzy binning of metagenomic sequence fragments on three-dimensional Barnes-Hut t-Stochastic Neighbor Embeddings
| Published in: | Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference Vol. 2018; pp. 1315 - 1318 |
|---|---|
| Main Authors: | Ariza-Jimenez, Leandro; Quintero, O. L.; Pinel, Nicolas |
| Format: | Conference Proceeding; Journal Article |
| Language: | English |
| Published: | United States: IEEE, 01.07.2018 |
| Subjects: | Bioinformatics; Clustering algorithms; Dimensionality reduction; Genomics; Indexes; Sociology |
| ISSN: | 1557-170X, 2694-0604, 1558-4615 |
| Abstract | Shotgun metagenomic studies attempt to reconstruct population genome sequences from complex microbial communities. In some traditional genome demarcation approaches, high-dimensional sequence data are embedded into two-dimensional spaces and subsequently binned into candidate genomic populations. One such approach uses a combination of the Barnes-Hut approximation and the t-Stochastic Neighbor Embedding (BH-SNE) algorithm for dimensionality reduction of DNA sequence data pentamer profiles, followed by demarcation of groups based on Gaussian mixture models within human-imposed boundaries. We found that genome demarcation from three-dimensional BH-SNE embeddings consistently results in more accurate binnings than two-dimensional embeddings. We further addressed the lack of a priori information on the number of populations by developing an unsupervised binning approach based on the Subtractive and Fuzzy c-means (FCM) clustering algorithms combined with internal clustering validity indices. Lastly, we addressed the subject of shared membership of individual data objects in a mixed community by assigning a degree of membership to individual objects using the FCM algorithm, and discriminated between confidently binned and uncertain sequence data objects from the community for subsequent biological interpretation. The binning of metagenome sequence fragments according to thresholds in the degree of membership opens the door for the identification of horizontally transferred elements and other genomic regions of uncertain assignment in which biologically meaningful information resides. The reported approach improves the unsupervised genome demarcation of populations within complex communities, increases the confidence in the coherence of the binned elements, and enables the identification of evolutionary processes ignored in hard-binning approaches in shotgun metagenomic studies. |
|---|---|
| Author | Pinel, Nicolas; Quintero, O. L.; Ariza-Jimenez, Leandro |
| Author_xml | – sequence: 1 givenname: Leandro surname: Ariza-Jimenez fullname: Ariza-Jimenez, Leandro email: larizaj@eafit.edu.co organization: Math. Modelling Res. Group, Univ. EAFIT, Medellin, Colombia – sequence: 2 givenname: O. L. surname: Quintero fullname: Quintero, O. L. email: oquinte1@eafit.edu.co organization: Math. Modelling Res. Group, Univ. EAFIT, Medellin, Colombia – sequence: 3 givenname: Nicolas surname: Pinel fullname: Pinel, Nicolas email: npinelp@eafit.edu.co organization: Cons. Res. Group, Univ. EAFIT, Medellin, Colombia |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/30440633$$D View this record in MEDLINE/PubMed |
| ContentType | Conference Proceeding Journal Article |
| DBID | 6IE 6IH CBEJK RIE RIO NPM 7X8 |
| DOI | 10.1109/EMBC.2018.8512529 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present PubMed MEDLINE - Academic |
| DatabaseTitle | PubMed MEDLINE - Academic |
| DatabaseTitleList | PubMed MEDLINE - Academic |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher – sequence: 3 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISBN | 9781538636466 1538636468 |
| EISSN | 1558-4615 2694-0604 |
| EndPage | 1318 |
| ExternalDocumentID | 30440633 8512529 |
| Genre | orig-research Journal Article |
| GroupedDBID | 6IE 6IF 6IH AAJGR ACGFS AFFNX ALMA_UNASSIGNED_HOLDINGS CBEJK M43 RIE RIO RNS 29F 29G 6IK 6IM IPLJI NPM 7X8 |
| ID | FETCH-LOGICAL-i231t-e9c2d48d2ea27f1e84aacc5d1a766d6f0dc856fcd2929a44cbecf3e32cbebcb33 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 2 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000596231901193&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1557-170X 2694-0604 |
| IngestDate | Thu Oct 02 13:41:51 EDT 2025 Thu Jan 02 23:02:42 EST 2025 Wed Aug 27 02:50:00 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i231t-e9c2d48d2ea27f1e84aacc5d1a766d6f0dc856fcd2929a44cbecf3e32cbebcb33 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| PMID | 30440633 |
| PQID | 2135127810 |
| PQPubID | 23479 |
| PageCount | 4 |
| ParticipantIDs | ieee_primary_8512529 pubmed_primary_30440633 proquest_miscellaneous_2135127810 |
| PublicationCentury | 2000 |
| PublicationDate | 2018-07-00 |
| PublicationDateYYYYMMDD | 2018-07-01 |
| PublicationDate_xml | – month: 07 year: 2018 text: 2018-07-00 |
| PublicationDecade | 2010 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference |
| PublicationTitleAbbrev | EMBC |
| PublicationTitleAlternate | Conf Proc IEEE Eng Med Biol Soc |
| PublicationYear | 2018 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0020051 ssj0061641 ssib061542107 ssib053545923 ssib042469959 |
| Score | 2.1655903 |
| Snippet | Shotgun metagenomic studies attempt to reconstruct population genome sequences from complex microbial communities. In some traditional genome demarcation... |
| SourceID | proquest pubmed ieee |
| SourceType | Aggregation Database Index Database Publisher |
| StartPage | 1315 |
| SubjectTerms | Bioinformatics; Clustering algorithms; Dimensionality reduction; Genomics; Indexes; Sociology |
| Title | Unsupervised fuzzy binning of metagenomic sequence fragments on three-dimensional Barnes-Hut t-Stochastic Neighbor Embeddings |
| URI | https://ieeexplore.ieee.org/document/8512529 https://www.ncbi.nlm.nih.gov/pubmed/30440633 https://www.proquest.com/docview/2135127810 |
| Volume | 2018 |
| WOSCitedRecordID | wos000596231901193&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
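
The soft-binning step described in the abstract (fuzzy c-means memberships followed by a membership threshold) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy set of 3-D points standing in for a BH-SNE embedding, a hand-picked number of clusters (the subtractive clustering and validity-index steps are left out), a fuzzifier `m = 2`, and an illustrative threshold of 0.8. The names `fuzzy_c_means` and `split_by_membership` are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (centers, memberships).

    memberships[i][k] is the degree of membership of point i in cluster k,
    and every row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Centers are means of the points weighted by u ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )
        # Memberships follow from relative distances to the centers.
        new_u = []
        for p in points:
            dist = [max(math.dist(p, c), 1e-12) for c in centers]
            new_u.append(
                [1.0 / sum((dist[k] / dist[j]) ** exponent
                           for j in range(n_clusters))
                 for k in range(n_clusters)]
            )
        shift = max(abs(new_u[i][k] - u[i][k])
                    for i in range(n) for k in range(n_clusters))
        u = new_u
        if shift < tol:
            break
    return centers, u


def split_by_membership(memberships, threshold=0.8):
    """Separate confidently binned objects from uncertain ones.

    The threshold value is an assumption chosen for illustration only.
    """
    confident, uncertain = {}, []
    for i, row in enumerate(memberships):
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold:
            confident[i] = best      # object index -> bin label
        else:
            uncertain.append(i)      # kept aside for later interpretation
    return confident, uncertain


# Toy stand-in for a 3-D embedding: two tight groups and one point between them.
embedding = [
    (0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.5),
    (10.0, 10.0, 10.0), (9.5, 10.0, 10.0), (10.0, 9.5, 10.0), (10.0, 10.0, 9.5),
    (5.0, 5.0, 5.0),
]
centers, memberships = fuzzy_c_means(embedding, n_clusters=2)
confident, uncertain = split_by_membership(memberships, threshold=0.8)
```

In this toy case the point lying between the two groups receives roughly equal membership in both bins and falls below the threshold, which is the kind of object the abstract proposes to set aside as being of uncertain assignment. On real data, a NumPy-based or library implementation would likely be preferable to this pure-Python loop.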