Datasets and Distillation Labels for the Paper 'Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians'

Bibliographic details
Title: Datasets and Distillation Labels for the Paper 'Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians'
Authors: Amin, Ishan; Raja, Sanjeev; Krishnapriyan, Aditi
Contributors: Amin, Ishan; Raja, Sanjeev; Krishnapriyan, Aditi
Publisher information: Zenodo
Publication year: 2025
Collection: Zenodo
Subjects: Machine Learning, Molecular and chemical physics, Materials Science, Quantum chemistry
Description: We provide 6 data folders, which were used in our paper: Amin, I., Raja, S., Krishnapriyan, A.S. (2024). Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians. Accepted to ICLR 2025. arXiv:2501.09009.
md22_JMP_labels.tar.gz - MD22 JMP (large and small, finetuned) Hessian labels for the Buckyball Catcher and Double-Walled Nanotube splits
SPICE_MaceOFF_labels.tar.gz - SPICE MACE-OFF Hessian labels
MPtrj_labels.tar.gz - MPtrj MACE-MP Hessian labels
spice_separated.tar.gz - SPICE subdatasets (lmdb): Solvated Amino Acids, Molecules with Iodine, DES370K Monomers
md22.tar.gz - MD22 datasets (lmdb) for the Buckyball Catcher and Double-Walled Nanotube, taken from the JMP repository (see paper)
MPtrj_separated_all_splits.zip - MPtrj subdatasets (lmdb) filtered by property: Pm3m Spacegroup, Systems with Yttrium, Bandgap >= 5 meV
The original data was taken from the SPICE dataset, the MPtrj dataset, and the MD22 dataset. The repository for the paper, where these datasets can be used, is available at https://github.com/ASK-Berkeley/MLFF-distill.
If you found any of this useful, please consider citing the paper:
@article{amin2025distilling,
  title={Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians},
  author={Amin, Ishan and Raja, Sanjeev and Krishnapriyan, Aditi S.},
  journal={International Conference on Learning Representations 2025},
  year={2025},
  archivePrefix={arXiv},
  eprint={2501.09009},
}
Document type: dataset
Language: English
Relation: https://zenodo.org/records/14759305; oai:zenodo.org:14759305; https://doi.org/10.5281/zenodo.14759305
DOI: 10.5281/zenodo.14759305
Availability: https://doi.org/10.5281/zenodo.14759305
https://zenodo.org/records/14759305
Rights: Creative Commons Attribution 4.0 International ; cc-by-4.0 ; https://creativecommons.org/licenses/by/4.0/legalcode
Accession number: edsbas.7B5211D3
Database: BASE
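The six archives named in the description come in two container formats (`.tar.gz` and `.zip`). The following is a minimal sketch of how they might be unpacked after downloading them from the Zenodo record, using only the Python standard library. The `downloads` and `data` directory names and the `extract_archive` helper are hypothetical, chosen for illustration; the record does not prescribe a layout, and the paper's repository may expect a different one.

```python
import tarfile
import zipfile
from pathlib import Path

# Archive names as listed in the record description.
ARCHIVES = [
    "md22_JMP_labels.tar.gz",
    "SPICE_MaceOFF_labels.tar.gz",
    "MPtrj_labels.tar.gz",
    "spice_separated.tar.gz",
    "md22.tar.gz",
    "MPtrj_separated_all_splits.zip",
]


def extract_archive(archive: Path, dest: Path) -> Path:
    """Unpack a .tar.gz or .zip archive into dest/<archive name without suffix>.

    Hypothetical helper: returns the folder the contents were written to.
    """
    name = archive.name
    if name.endswith(".tar.gz"):
        target = dest / name[: -len(".tar.gz")]
        target.mkdir(parents=True, exist_ok=True)
        # Use the safer "data" extraction filter where the interpreter supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target, **kwargs)
    elif name.endswith(".zip"):
        target = dest / name[: -len(".zip")]
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    else:
        raise ValueError(f"Unsupported archive type: {name}")
    return target


if __name__ == "__main__":
    download_dir = Path("downloads")  # assumed location of the downloaded files
    output_dir = Path("data")         # assumed extraction root
    for archive_name in ARCHIVES:
        archive_path = download_dir / archive_name
        if archive_path.exists():
            print("Extracted to", extract_archive(archive_path, output_dir))
        else:
            print("Not found, skipping:", archive_path)
```

Each archive is unpacked into its own subfolder so that label files and lmdb datasets with similar internal names do not overwrite one another. Reading the lmdb contents themselves is left to the tooling in the paper's repository, since the record does not document the stored schema.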