DROPP: Structure-Aware PCA for Ordered Data: A General Method and its Applications in Climate Research and Molecular Dynamics

Detailed bibliography
Title: DROPP: Structure-Aware PCA for Ordered Data: A General Method and its Applications in Climate Research and Molecular Dynamics
Authors: Beer, Anna; Palotás, Olivér; Maldonado, Andrea; Draganov, Andrew; Assent, Ira
Source: Beer, A, Palotás, O, Maldonado, A, Draganov, A & Assent, I 2024, DROPP: Structure-Aware PCA for Ordered Data: A General Method and its Applications in Climate Research and Molecular Dynamics. in 2024 IEEE 40th International Conference on Data Engineering (ICDE). IEEE, pp. 1143-1156, 40th IEEE International Conference on Data Engineering Workshops, ICDEW 2024, Utrecht, Netherlands, 13/05/2024. https://doi.org/10.1109/ICDE60146.2024.00093
Publisher information: IEEE, 2024.
Publication year: 2024
Subjects: PCA, 102033 Data Mining, Data visualization, 102033 Data mining, Trajectory, Proteins, random walks, Reliability, Dimensionality reduction, molecular dynamics, ordered data, correlation, climate data, Neural networks, Visualization, dimensionality reduction
Description: Ordered data arises in many areas, e.g., in molecular dynamics and other spatial-temporal trajectories. While data points that are close in this order are related, common dimensionality reduction techniques cannot capture this relation or order. Thus, the information is lost in the low-dimensional representations. We introduce DROPP, which incorporates order into dimensionality reduction by adapting a Gaussian kernel function across the ordered covariances between data points. We find underlying principal components that are characteristic of the process that generated the data. In extensive experiments, we show DROPP's advantages over other dimensionality reduction techniques on synthetic as well as real-world data sets from molecular dynamics and climate research: The principal components of different data sets that were generated by the same underlying mechanism are very similar to each other. They can, thus, be used for dimensionality reduction with low reconstruction errors along a set of data sets, allowing an explainable visual comparison of different data sets as well as good compression even for unseen data.
Document type: Article
Conference object
Contribution for newspaper or weekly magazine
DOI: 10.1109/icde60146.2024.00093
Access URL: https://ucrisportal.univie.ac.at/de/publications/77010aef-5c4e-4e58-b6f4-5ebd6a0f4864
https://pure.au.dk/portal/en/publications/1ce67ff5-76ca-4338-8032-6ed98690c747
http://www.scopus.com/inward/record.url?scp=85200446989&partnerID=8YFLogxK
https://doi.org/10.1109/ICDE60146.2024.00093
Rights: STM Policy #29
Accession number: edsair.doi.dedup.....b24ee1e65131a0d3c7bed13270a1e0f3
Database: OpenAIRE