Accelerating active learning materials discovery with FAIR data and workflows: A case study for alloy melting temperatures

Bibliographic details
Published in: Computational materials science, vol. 249, p. 113640
Main authors: Harwani, Mohnish; Verduzco, Juan C.; Lee, Brian H.; Strachan, Alejandro
Format: Journal Article
Language: English
Published: Elsevier B.V., 5 February 2025
ISSN:0927-0256
Online access: Full text
Description
Abstract: Active learning (AL) is a powerful sequential optimization approach that has shown great promise in the discovery of new materials. However, a major challenge remains the acquisition of the initial data and the development of workflows to generate new data at each iteration. In this study, we demonstrate a significant speedup in an optimization task by reusing a published simulation workflow available for online simulations and its associated data repository, where the results of each workflow run are automatically stored. Both the workflow and its data follow FAIR (findable, accessible, interoperable, and reusable) principles using nanoHUB's infrastructure. The workflow employs molecular dynamics to calculate the melting temperature of multi-principal component alloys. We leveraged all prior data not only to develop an accurate machine learning model to start the sequential optimization but also to optimize the simulation parameters and accelerate convergence. Prior work showed that finding the alloy composition with the highest melting temperature required testing several alloy compositions, and establishing the melting temperature for each composition took, on average, multiple simulations. By developing a workflow that utilizes the FAIR data in the nanoHUB database, we reduced the number of simulations per composition to one and found the alloy with the lowest melting temperature after testing only three compositions. This second optimization, therefore, shows a 10x speedup compared to models that do not access the FAIR databases.
• FAIR data/workflows streamline active learning processes, boosting alloy discovery.
• ML-driven parameter refinement cuts simulations per alloy from 4.4 to 1.3.
• Active learning paired with FAIR data achieves a 10-fold optimization speed increase.
DOI:10.1016/j.commatsci.2024.113640
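
The sequential optimization described in the abstract (seed a surrogate model with previously stored results, then repeatedly pick the next composition to simulate) can be illustrated with a minimal Python sketch. It is not the authors' workflow: the abstract does not name the model or the acquisition rule, so a Gaussian-process surrogate with an upper-confidence-bound rule is assumed here purely for illustration. The function `toy_melting_temperature` is a hypothetical stand-in for a molecular dynamics run, and the random "prior" points stand in for records retrieved from a FAIR repository. The sketch maximizes the target; minimizing would only require negating it.

```python
import numpy as np

def toy_melting_temperature(x):
    """Hypothetical stand-in for one melting-temperature simulation (kelvin)."""
    # Smooth toy landscape with a single maximum at composition (0.3, 0.6).
    return 2000.0 - 1500.0 * np.sum((x - np.array([0.3, 0.6])) ** 2, axis=-1)

def gp_predict(X_train, y_train, X_query, length=0.3, noise=1e-4):
    """Gaussian-process regression with an RBF kernel on standardized targets."""
    def rbf(A, B):
        d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return np.exp(-0.5 * d2 / length ** 2)

    mu0, scale = y_train.mean(), y_train.std() + 1e-12
    y = (y_train - mu0) / scale
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf(X_query, X_train)
    mean = Ks @ np.linalg.solve(K, y)
    # Posterior variance: prior variance (1) minus the explained part.
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    std = np.sqrt(np.clip(var, 0.0, None))
    return mean * scale + mu0, std * scale

def active_learning(X_prior, y_prior, candidates, simulate, n_iter=5, kappa=2.0):
    """Seed the surrogate with prior data, then query one candidate per iteration."""
    X, y = X_prior.copy(), y_prior.copy()
    remaining = np.ones(len(candidates), dtype=bool)
    for _ in range(n_iter):
        mean, std = gp_predict(X, y, candidates)
        # Upper confidence bound; already-tested candidates are excluded.
        score = np.where(remaining, mean + kappa * std, -np.inf)
        i = int(np.argmax(score))
        remaining[i] = False
        X = np.vstack([X, candidates[i]])
        y = np.append(y, simulate(candidates[i]))
    best = int(np.argmax(y))
    return X[best], y[best], X, y

# Prior data: random points standing in for results already in a repository.
rng = np.random.default_rng(0)
X_prior = rng.random((8, 2))
y_prior = toy_melting_temperature(X_prior)

# Candidate pool: a coarse grid over two composition-like variables.
grid = np.linspace(0.0, 1.0, 11)
candidates = np.array([[a, b] for a in grid for b in grid])

x_best, t_best, X_all, y_all = active_learning(
    X_prior, y_prior, candidates, toy_melting_temperature
)
print("best composition:", x_best, "toy melting temperature:", t_best)
```

Starting from stored results rather than an empty data set is what lets the loop spend its few new simulations near promising compositions; in the toy setting above, the number of new simulations is fixed by `n_iter`.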