MDSCAN: RMSD-based HDBSCAN clustering of long molecular dynamics


Bibliographic Details
Published in: Bioinformatics (Oxford, England), Vol. 38, No. 23, pp. 5191-5198
Main Authors: González-Alemán, Roy; Platero-Rochart, Daniel; Rodríguez-Serradet, Alejandro; Hernández-Rodríguez, Erix W; Caballero, Julio; Leclerc, Fabrice; Montero-Cabrera, Luis
Format: Journal Article
Language: English
Published: England, Oxford University Press (OUP), 30 November 2022
ISSN: 1367-4803, 1367-4811
Online Access: Full text
Abstract The term clustering designates a broad family of unsupervised learning methods that group similar elements into sets called clusters. Geometrical clustering of molecular dynamics (MD) trajectories is a well-established analysis for gaining insight into the conformational behavior of simulated systems. However, popular variants break down when processing relatively long trajectories because of their quadratic memory or time complexity. Among clustering algorithms, HDBSCAN stands out as a hierarchical density-based alternative that robustly separates closely related elements from noise. Although a very efficient implementation of this algorithm (HDBSCAN*) is available to users with programming skills, it cannot handle long trajectories under RMSD, the de facto molecular similarity metric. Here, we propose MDSCAN, HDBSCAN-inspired software designed for non-programmer users to perform memory-efficient RMSD-based clustering of long MD trajectories. Methodological improvements over the original version include the encoding of trajectories as a particular class of vantage-point tree (decreasing time complexity) and a dual-heap approach to constructing a quasi-minimum spanning tree (reducing memory complexity). MDSCAN processed a trajectory of 1 million frames using the RMSD metric in about 21 h with <8 GB of RAM, a task that would have taken a similar time but more than 32 TB of RAM with the accelerated HDBSCAN* implementation generally used. The source code and documentation of MDSCAN are free and publicly available on GitHub (https://github.com/LQCT/MDScan.git) and as a PyPI package (https://pypi.org/project/mdscan/). Supplementary data are available at Bioinformatics online.
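To illustrate why encoding frames in a vantage-point tree can reduce the number of RMSD evaluations, the following is a minimal sketch in Python with NumPy. It is not MDSCAN's code: the abstract only says that a "particular class" of vantage-point tree is used, whereas this is a plain textbook tree. The function names `rmsd`, `build_vp_tree` and `nearest` are hypothetical, frames are assumed to be arrays of shape (n_atoms, 3), and the RMSD is taken after optimal superposition (Kabsch), which satisfies the triangle inequality that the pruning step relies on.

```python
import numpy as np


def rmsd(a, b):
    """Minimum RMSD between two (n_atoms, 3) arrays after optimal superposition."""
    a = a - a.mean(axis=0)          # remove translation
    b = b - b.mean(axis=0)
    h = a.T @ b                     # 3x3 covariance matrix
    s = np.linalg.svd(h, compute_uv=False)
    if np.linalg.det(h) < 0:        # restrict to proper rotations (no reflection)
        s[-1] = -s[-1]
    msd = (np.sum(a * a) + np.sum(b * b) - 2.0 * s.sum()) / len(a)
    return float(np.sqrt(max(msd, 0.0)))   # clamp tiny negative rounding errors


def build_vp_tree(frames, indices):
    """Build a vantage-point tree as nested tuples (vp, mu, inner, outer)."""
    if not indices:
        return None
    vp, rest = indices[0], indices[1:]      # first index acts as vantage point
    if not rest:
        return (vp, 0.0, None, None)
    dists = [rmsd(frames[vp], frames[i]) for i in rest]
    mu = float(np.median(dists))            # split radius
    inner = [i for i, d in zip(rest, dists) if d <= mu]
    outer = [i for i, d in zip(rest, dists) if d > mu]
    return (vp, mu, build_vp_tree(frames, inner), build_vp_tree(frames, outer))


def nearest(tree, frames, query, best=(np.inf, -1)):
    """Return (distance, index) of the frame closest to `query`."""
    if tree is None:
        return best
    vp, mu, inner, outer = tree
    d = rmsd(query, frames[vp])
    if d < best[0]:
        best = (d, vp)
    # Search the side containing the query first; the other side is visited
    # only if the triangle inequality cannot rule it out.
    if d <= mu:
        best = nearest(inner, frames, query, best)
        if d + best[0] > mu:
            best = nearest(outer, frames, query, best)
    else:
        best = nearest(outer, frames, query, best)
        if d - best[0] <= mu:
            best = nearest(inner, frames, query, best)
    return best
```

A typical call would be `tree = build_vp_tree(frames, list(range(len(frames))))` followed by `nearest(tree, frames, query)`. Because whole subtrees are skipped when the triangle inequality excludes them, a query usually needs far fewer RMSD evaluations than a scan over all frames, and no pairwise matrix is ever stored.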
Authors (in sequence, with ORCID where given)
1. González-Alemán, Roy (ORCID: 0000-0003-3852-4902)
2. Platero-Rochart, Daniel (ORCID: 0000-0001-6454-4320)
3. Rodríguez-Serradet, Alejandro
4. Hernández-Rodríguez, Erix W
5. Caballero, Julio (ORCID: 0000-0003-0182-1444)
6. Leclerc, Fabrice
7. Montero-Cabrera, Luis
BackLink https://www.ncbi.nlm.nih.gov/pubmed/36205607 (View this record in MEDLINE/PubMed)
https://hal.science/hal-03938219 (View record in HAL)
CitedBy_id crossref_primary_10_1007_s12633_024_03148_9
crossref_primary_10_1007_s00894_024_05996_z
crossref_primary_10_1016_j_bpc_2025_107389
crossref_primary_10_1021_acs_jcim_4c02217
crossref_primary_10_1016_j_jmb_2025_169233
crossref_primary_10_1080_07391102_2023_2280675
crossref_primary_10_3390_molecules29163902
ContentType Journal Article
Copyright The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Distributed under a Creative Commons Attribution 4.0 International License
DOI 10.1093/bioinformatics/btac666
Discipline Biology
EISSN 1367-4811
EndPage 5198
ExternalDocumentID oai:HAL:hal-03938219v1
36205607
10_1093_bioinformatics_btac666
Genre Research Support, Non-U.S. Gov't
Journal Article
GrantInformation_xml – fundername: Eiffel Scholarship Program of Excellence of Campus France
  grantid: P104786Z
– fundername: Project Hubert Curien-Carlos J. Finlay
  grantid: 41814TM
– fundername: Fondo Nacional de Desarrollo Científico y Tecnológico
  grantid: 1210138
– fundername: Cuban Oficina de Gestión de Fondos y Proyectos Internacionales
  grantid: PN223LH010-02
ISICitedReferencesCount 12
ISSN 1367-4803
1367-4811
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 23
Language English
License https://academic.oup.com/pages/standard-publication-reuse-rights
The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID 0000-0003-3852-4902
0000-0003-0182-1444
0000-0001-6454-4320
0000-0002-5641-1525
OpenAccessLink https://hal.science/hal-03938219
PMID 36205607
PQID 2723154523
PQPubID 23479
PageCount 8
PublicationCentury 2000
PublicationDate 2022-11-30
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Bioinformatics (Oxford, England)
PublicationTitleAlternate Bioinformatics
PublicationYear 2022
Publisher Oxford University Press (OUP)
Publisher_xml – name: Oxford University Press (OUP)
StartPage 5191
SubjectTerms Algorithms
Biochemistry, Molecular Biology
Biophysics
Chemical Sciences
Cluster Analysis
Life Sciences
Molecular Dynamics Simulation
Software
Theoretical and/or physical chemistry
Title MDSCAN: RMSD-based HDBSCAN clustering of long molecular dynamics
URI https://www.ncbi.nlm.nih.gov/pubmed/36205607
https://www.proquest.com/docview/2723154523
https://hal.science/hal-03938219
Volume 38