MDSCAN: RMSD-based HDBSCAN clustering of long molecular dynamics
| Published in: | Bioinformatics (Oxford, England) Vol. 38; No. 23; pp. 5191-5198 |
|---|---|
| Main Authors: | González-Alemán, Roy; Platero-Rochart, Daniel; Rodríguez-Serradet, Alejandro; Hernández-Rodríguez, Erix W; Caballero, Julio; Leclerc, Fabrice; Montero-Cabrera, Luis |
| Format: | Journal Article |
| Language: | English |
| Published: | England: Oxford University Press (OUP), 30 November 2022 |
| Subjects: | |
| ISSN: | 1367-4803, 1367-4811 |
| Online Access: | Full text |
| Abstract | The term clustering designates a comprehensive family of unsupervised learning methods that group similar elements into sets called clusters. Geometrical clustering of molecular dynamics (MD) trajectories is a well-established analysis for gaining insight into the conformational behavior of simulated systems. However, popular variants fail when processing relatively long trajectories because of their quadratic memory or time complexity. Among clustering algorithms, HDBSCAN stands out as a hierarchical density-based alternative that robustly separates closely related elements from noise. Although a very efficient implementation of this algorithm (HDBSCAN*) is available to users with programming skills, it cannot handle long trajectories under RMSD, the de facto molecular similarity metric.
Here, we propose MDSCAN, an HDBSCAN-inspired software package designed for non-programmer users to perform memory-efficient RMSD-based clustering of long MD trajectories. Methodological improvements over the original version include encoding trajectories as a particular class of vantage-point tree (decreasing time complexity) and a dual-heap approach to constructing a quasi-minimum spanning tree (reducing memory complexity). MDSCAN processed a trajectory of 1 million frames using the RMSD metric in about 21 h with <8 GB of RAM, a task that would have taken a similar time but more than 32 TB of RAM with the commonly used accelerated HDBSCAN* implementation.
The source code and documentation of MDSCAN are free and publicly available on GitHub (https://github.com/LQCT/MDScan.git) and as a PyPI package (https://pypi.org/project/mdscan/).
Supplementary data are available at Bioinformatics online. |
|---|---|
| Author | González-Alemán, Roy; Leclerc, Fabrice; Rodríguez-Serradet, Alejandro; Caballero, Julio; Montero-Cabrera, Luis; Platero-Rochart, Daniel; Hernández-Rodríguez, Erix W |
| Author_xml | 1. González-Alemán, Roy (ORCID: 0000-0003-3852-4902); 2. Platero-Rochart, Daniel (ORCID: 0000-0001-6454-4320); 3. Rodríguez-Serradet, Alejandro; 4. Hernández-Rodríguez, Erix W; 5. Caballero, Julio (ORCID: 0000-0003-0182-1444); 6. Leclerc, Fabrice; 7. Montero-Cabrera, Luis |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/36205607 (View this record in MEDLINE/PubMed); https://hal.science/hal-03938219 (View record in HAL) |
| CitedBy_id | crossref_primary_10_1007_s12633_024_03148_9 crossref_primary_10_1007_s00894_024_05996_z crossref_primary_10_1016_j_bpc_2025_107389 crossref_primary_10_1021_acs_jcim_4c02217 crossref_primary_10_1016_j_jmb_2025_169233 crossref_primary_10_1080_07391102_2023_2280675 crossref_primary_10_3390_molecules29163902 |
| Cites_doi | 10.1021/acs.jcim.9b00828 10.1021/acs.jctc.6b00757 10.1007/978-1-4939-2978-8_15 10.1007/s10618-008-0120-3 10.1002/widm.1343 10.1007/s10115-003-0086-9 10.1016/j.bpj.2015.08.015 10.1109/T-C.1975.224110 10.1093/bioinformatics/btac021 10.1021/ct700119m 10.1021/acs.jcim.9b00558 10.1021/acs.jctc.7b00028 10.1002/pro.3268 10.1198/jcgs.2009.07049 10.1063/1674-0068/31/cjcp1806147 10.1021/ct400341p 10.1145/3068335 10.1093/bioinformatics/btab595 |
| ContentType | Journal Article |
| Copyright | The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. Distributed under a Creative Commons Attribution 4.0 International License |
| DOI | 10.1093/bioinformatics/btac666 |
| Discipline | Biology |
| EISSN | 1367-4811 |
| EndPage | 5198 |
| ExternalDocumentID | oai:HAL:hal-03938219v1 36205607 10_1093_bioinformatics_btac666 |
| Genre | Research Support, Non-U.S. Gov't; Journal Article |
| GrantInformation_xml | Eiffel Scholarship Program of Excellence of Campus France (grant P104786Z); Project Hubert Curien-Carlos J. Finlay (grant 41814TM); Fondo Nacional de Desarrollo Científico y Tecnológico (grant 1210138); Cuban Oficina de Gestión de Fondos y Proyectos Internacionales (grant PN223LH010-02) |
| ISICitedReferencesCount | 12 |
| ISSN | 1367-4803 1367-4811 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 23 |
| Language | English |
| License | https://academic.oup.com/pages/standard-publication-reuse-rights The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
| ORCID | 0000-0003-3852-4902 0000-0003-0182-1444 0000-0001-6454-4320 0000-0002-5641-1525 |
| OpenAccessLink | https://hal.science/hal-03938219 |
| PMID | 36205607 |
| PageCount | 8 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-11-30 |
| PublicationPlace | England |
| PublicationTitle | Bioinformatics (Oxford, England) |
| PublicationTitleAlternate | Bioinformatics |
| PublicationYear | 2022 |
| Publisher | Oxford University Press (OUP) |
| SourceID | hal proquest pubmed crossref |
| SourceType | Open Access Repository; Aggregation Database; Index Database; Enrichment Source |
| StartPage | 5191 |
| SubjectTerms | Algorithms; Biochemistry, Molecular Biology; Biophysics; Chemical Sciences; Cluster Analysis; Life Sciences; Molecular Dynamics Simulation; Software; Theoretical and/or physical chemistry |
| Title | MDSCAN: RMSD-based HDBSCAN clustering of long molecular dynamics |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/36205607 https://www.proquest.com/docview/2723154523 https://hal.science/hal-03938219 |
| Volume | 38 |
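
The abstract in this record attributes part of MDSCAN's reduced time complexity to encoding the trajectory as a vantage-point tree. As a rough illustration of that data structure only, the sketch below builds a plain vantage-point tree and runs a nearest-neighbour query with triangle-inequality pruning. It is not MDSCAN's code: the function names `build_vp_tree` and `nearest` are hypothetical, and the Euclidean distance on 2-D points is an assumed stand-in for the RMSD between trajectory frames.

```python
import math
import random


def build_vp_tree(points, dist):
    """Build a vantage-point tree as nested dicts (None for an empty subtree)."""
    points = list(points)
    if not points:
        return None
    # Pick a random vantage point and remove it from the working set.
    vantage = points.pop(random.randrange(len(points)))
    if not points:
        return {"vp": vantage, "mu": 0.0, "inside": None, "outside": None}
    # Sort the remaining points by distance to the vantage point
    # and split them at the median distance mu.
    by_dist = sorted(((dist(vantage, p), p) for p in points), key=lambda t: t[0])
    mid = len(by_dist) // 2
    mu = by_dist[mid][0]
    inside = [p for _, p in by_dist[:mid]]    # distance to vantage <= mu
    outside = [p for _, p in by_dist[mid:]]   # distance to vantage >= mu
    return {
        "vp": vantage,
        "mu": mu,
        "inside": build_vp_tree(inside, dist),
        "outside": build_vp_tree(outside, dist),
    }


def nearest(tree, query, dist):
    """Return (point, distance) of the stored point closest to query."""
    best = [math.inf, None]

    def visit(node):
        if node is None:
            return
        d = dist(query, node["vp"])
        if d < best[0]:
            best[0], best[1] = d, node["vp"]
        mu = node["mu"]
        # Search the more promising side first; visit the other side only
        # when the triangle inequality cannot exclude a closer point there.
        if d < mu:
            visit(node["inside"])
            if d + best[0] >= mu:
                visit(node["outside"])
        else:
            visit(node["outside"])
            if d - best[0] <= mu:
                visit(node["inside"])

    visit(tree)
    return best[1], best[0]


if __name__ == "__main__":
    random.seed(1)
    frames = [(random.random(), random.random()) for _ in range(1000)]
    tree = build_vp_tree(frames, math.dist)
    print(nearest(tree, (0.5, 0.5), math.dist))
```

The pruning step relies only on the triangle inequality, so any distance that satisfies it could replace `math.dist` in this sketch. How MDSCAN specializes the tree and combines it with its dual-heap construction of a quasi-minimum spanning tree is described in the article itself, not here.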