Unsupervised fuzzy binning of metagenomic sequence fragments on three-dimensional Barnes-Hut t-Stochastic Neighbor Embeddings

Shotgun metagenomic studies attempt to reconstruct population genome sequences from complex microbial communities. In some traditional genome demarcation approaches, high-dimensional sequence data are embedded into two-dimensional spaces and subsequently binned into candidate genomic populations. On...

Bibliographic details
Published in: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference, Vol. 2018; pp. 1315-1318
Main authors: Ariza-Jimenez, Leandro; Quintero, O. L.; Pinel, Nicolas
Format: Conference Proceeding, Journal Article
Language: English
Published: United States, IEEE, 1 July 2018
Subjects:
ISSN: 1557-170X, 2694-0604, 1558-4615
Online access: Full text
Abstract Shotgun metagenomic studies attempt to reconstruct population genome sequences from complex microbial communities. In some traditional genome demarcation approaches, high-dimensional sequence data are embedded into two-dimensional spaces and subsequently binned into candidate genomic populations. One such approach combines the Barnes-Hut approximation with the t-Stochastic Neighbor Embedding algorithm (BH-SNE) to reduce the dimensionality of pentamer profiles of DNA sequence data, and then demarcates groups with Gaussian mixture models within human-imposed boundaries. We found that genome demarcation from three-dimensional BH-SNE embeddings consistently results in more accurate binnings than demarcation from 2-D embeddings. We further addressed the lack of a priori information on the number of populations by developing an unsupervised binning approach based on the Subtractive and Fuzzy c-means (FCM) clustering algorithms combined with internal clustering validity indices. Lastly, we addressed the shared membership of individual data objects in a mixed community by using the FCM algorithm to assign a degree of membership to each object, and we discriminated between confidently binned and uncertain sequence data objects for subsequent biological interpretation. Binning metagenome sequence fragments according to thresholds on the degree of membership opens the door to identifying horizontally transferred elements and other genomic regions of uncertain assignment in which biologically meaningful information resides. The reported approach improves the unsupervised genome demarcation of populations within complex communities, increases confidence in the coherence of the binned elements, and enables the identification of evolutionary processes that hard-binning approaches ignore in shotgun metagenomic studies.
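The fuzzy binning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it shows only a textbook Fuzzy c-means update on 3-D points, followed by a membership threshold that separates confidently binned objects from uncertain ones. The toy coordinates stand in for a 3-D embedding, and the function names, the fuzzifier m = 2, and the 0.8 threshold are assumptions chosen for illustration. The Subtractive clustering initialisation and the validity indices mentioned in the abstract are omitted, so the number of clusters is passed in by hand.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook FCM. Returns (centers, u), where u[i][k] is the degree of
    membership of point i in cluster k and each row of u sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers: mean of the points weighted by membership ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )
        # Memberships: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(points[i], ctr), 1e-12) for ctr in centers]
            new_row = [
                1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                          for j in range(c))
                for k in range(c)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # stop once memberships no longer change
            break
    return centers, u


def split_by_membership(u, threshold=0.8):
    """Put each object in the bin of its highest membership if that membership
    reaches the (assumed) threshold; otherwise flag it as uncertain."""
    confident, uncertain = {}, []
    for i, row in enumerate(u):
        k = max(range(len(row)), key=row.__getitem__)
        if row[k] >= threshold:
            confident.setdefault(k, []).append(i)
        else:
            uncertain.append(i)
    return confident, uncertain


# Toy 3-D coordinates (made up): two tight groups plus one point between them.
points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0),
    (2.5, 2.5, 2.5),
]
centers, u = fuzzy_c_means(points, c=2)
confident, uncertain = split_by_membership(u, threshold=0.8)
print("confident bins:", confident)
print("uncertain objects:", uncertain)
```

In this toy setting the point lying between the two groups receives roughly equal membership in both clusters and therefore falls below the threshold, which is the kind of object the abstract proposes setting aside for separate biological interpretation.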
Author Pinel, Nicolas
Quintero, O. L.
Ariza-Jimenez, Leandro
Author_xml – sequence: 1
  givenname: Leandro
  surname: Ariza-Jimenez
  fullname: Ariza-Jimenez, Leandro
  email: larizaj@eafit.edu.co
  organization: Math. Modelling Res. Group, Univ. EAFIT, Medellin, Colombia
– sequence: 2
  givenname: O. L.
  surname: Quintero
  fullname: Quintero, O. L.
  email: oquinte1@eafit.edu.co
  organization: Math. Modelling Res. Group, Univ. EAFIT, Medellin, Colombia
– sequence: 3
  givenname: Nicolas
  surname: Pinel
  fullname: Pinel, Nicolas
  email: npinelp@eafit.edu.co
  organization: Cons. Res. Group, Univ. EAFIT, Medellin, Colombia
BackLink https://www.ncbi.nlm.nih.gov/pubmed/30440633$$D View this record in MEDLINE/PubMed
ContentType Conference Proceeding
Journal Article
DBID 6IE
6IH
CBEJK
RIE
RIO
NPM
7X8
DOI 10.1109/EMBC.2018.8512529
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
PubMed
MEDLINE - Academic
DatabaseTitle PubMed
MEDLINE - Academic
DatabaseTitleList
PubMed
MEDLINE - Academic
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
– sequence: 3
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISBN 9781538636466
1538636468
EISSN 1558-4615
2694-0604
EndPage 1318
ExternalDocumentID 30440633
8512529
Genre orig-research
Journal Article
GroupedDBID 6IE
6IF
6IH
AAJGR
ACGFS
AFFNX
ALMA_UNASSIGNED_HOLDINGS
CBEJK
M43
RIE
RIO
RNS
29F
29G
6IK
6IM
IPLJI
NPM
7X8
ID FETCH-LOGICAL-i231t-e9c2d48d2ea27f1e84aacc5d1a766d6f0dc856fcd2929a44cbecf3e32cbebcb33
IEDL.DBID RIE
ISICitedReferencesCount 2
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000596231901193&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1557-170X
2694-0604
IngestDate Thu Oct 02 13:41:51 EDT 2025
Thu Jan 02 23:02:42 EST 2025
Wed Aug 27 02:50:00 EDT 2025
IsPeerReviewed true
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i231t-e9c2d48d2ea27f1e84aacc5d1a766d6f0dc856fcd2929a44cbecf3e32cbebcb33
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
PMID 30440633
PQID 2135127810
PQPubID 23479
PageCount 4
ParticipantIDs ieee_primary_8512529
pubmed_primary_30440633
proquest_miscellaneous_2135127810
PublicationCentury 2000
PublicationDate 2018-07-00
PublicationDateYYYYMMDD 2018-07-01
PublicationDate_xml – month: 07
  year: 2018
  text: 2018-07-00
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
PublicationTitleAbbrev EMBC
PublicationTitleAlternate Conf Proc IEEE Eng Med Biol Soc
PublicationYear 2018
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0020051
ssj0061641
ssib061542107
ssib053545923
ssib042469959
Score 2.1656942
Snippet Shotgun metagenomic studies attempt to reconstruct population genome sequences from complex microbial communities. In some traditional genome demarcation...
SourceID proquest
pubmed
ieee
SourceType Aggregation Database
Index Database
Publisher
StartPage 1315
SubjectTerms Bioinformatics
Clustering algorithms
Dimensionality reduction
Genomics
Indexes
Sociology
Title Unsupervised fuzzy binning of metagenomic sequence fragments on three-dimensional Barnes-Hut t-Stochastic Neighbor Embeddings
URI https://ieeexplore.ieee.org/document/8512529
https://www.ncbi.nlm.nih.gov/pubmed/30440633
https://www.proquest.com/docview/2135127810
Volume 2018
WOSCitedRecordID wos000596231901193&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint