DROPP: Structure-Aware PCA for Ordered Data: A General Method and its Applications in Climate Research and Molecular Dynamics
| Title: | DROPP: Structure-Aware PCA for Ordered Data: A General Method and its Applications in Climate Research and Molecular Dynamics |
|---|---|
| Authors: | Beer, Anna; Palotás, Olivér; Maldonado, Andrea; Draganov, Andrew; Assent, Ira |
| Source: | Beer, A, Palotás, O, Maldonado, A, Draganov, A & Assent, I 2024, DROPP: Structure-Aware PCA for Ordered Data: A General Method and its Applications in Climate Research and Molecular Dynamics. in 2024 IEEE 40th International Conference on Data Engineering (ICDE). IEEE, pp. 1143-1156, 40th IEEE International Conference on Data Engineering Workshops, ICDEW 2024, Utrecht, Netherlands, 13/05/2024. https://doi.org/10.1109/ICDE60146.2024.00093 |
| Publisher information: | IEEE, 2024. |
| Publication year: | 2024 |
| Subjects: | PCA, 102033 Data mining, Data visualization, Trajectory, Proteins, random walks, Reliability, Dimensionality reduction, molecular dynamics, ordered data, correlation, climate data, Neural networks, Visualization |
| Description: | Ordered data arises in many areas, e.g., in molecular dynamics and other spatial-temporal trajectories. While data points that are close in this order are related, common dimensionality reduction techniques cannot capture this relation or order. Thus, the information is lost in the low-dimensional representations. We introduce DROPP, which incorporates order into dimensionality reduction by adapting a Gaussian kernel function across the ordered covariances between data points. We find underlying principal components that are characteristic of the process that generated the data. In extensive experiments, we show DROPP's advantages over other dimensionality reduction techniques on synthetic as well as real-world data sets from molecular dynamics and climate research: The principal components of different data sets that were generated by the same underlying mechanism are very similar to each other. They can, thus, be used for dimensionality reduction with low reconstruction errors along a set of data sets, allowing an explainable visual comparison of different data sets as well as good compression even for unseen data. |
| Document type: | Article; Conference object; Contribution for newspaper or weekly magazine |
| DOI: | 10.1109/icde60146.2024.00093 |
| Access URL: | https://ucrisportal.univie.ac.at/de/publications/77010aef-5c4e-4e58-b6f4-5ebd6a0f4864 https://pure.au.dk/portal/en/publications/1ce67ff5-76ca-4338-8032-6ed98690c747 http://www.scopus.com/inward/record.url?scp=85200446989&partnerID=8YFLogxK https://doi.org/10.1109/ICDE60146.2024.00093 |
| Rights: | STM Policy #29 |
| Accession number: | edsair.doi.dedup.....b24ee1e65131a0d3c7bed13270a1e0f3 |
| Database: | OpenAIRE |