Evaluating large language model agents for automation of atomic force microscopy.

Saved in:
Detailed Bibliography
Title: Evaluating large language model agents for automation of atomic force microscopy.
Authors: Mandal, Indrajeet, Soni, Jitendra, Zaki, Mohd, Smedskjaer, Morten M., Wondraczek, Katrin, Wondraczek, Lothar, Gosvami, Nitya Nand, Krishnan, N. M. Anoop
Source: Nature Communications; Dec2025, Vol. 16 Issue 1, p1-15, 15p
Subjects: ATOMIC force microscopy, LANGUAGE models, SAFETY regulations, RESEARCH assistants, MULTIAGENT systems, LABORATORIES
Abstract: Large language models (LLMs) are transforming laboratory automation by enabling self-driving laboratories (SDLs) that could accelerate materials research. However, current SDL implementations rely on rigid protocols that fail to capture the adaptability and intuition of expert scientists in dynamic experimental settings. Here, we show that LLM agents can automate atomic force microscopy (AFM) through our Artificially Intelligent Lab Assistant (AILA) framework. Further, we develop AFMBench, a comprehensive evaluation suite challenging LLM agents across the complete scientific workflow, from experimental design to results analysis. We find that state-of-the-art LLMs struggle with basic tasks and coordination scenarios. Notably, models excelling at materials science question answering perform poorly in laboratory settings, showing that domain knowledge does not translate to experimental capabilities. Additionally, we observe that LLM agents can deviate from instructions, a phenomenon referred to as sleepwalking, raising safety alignment concerns for SDL applications. Our ablations reveal that multi-agent frameworks significantly outperform single-agent approaches, though both remain sensitive to minor changes in instruction formatting or prompting. Finally, we evaluate AILA's effectiveness in increasingly advanced experiments: AFM calibration, feature detection, mechanical property measurement, graphene layer counting, and indenter detection. These findings establish the necessity of benchmarking and robust safety protocols before deploying LLM agents as autonomous laboratory assistants across scientific disciplines. LLM agents could revolutionize laboratory automation, but their capabilities remain poorly tested. Here, the authors create a framework automating atomic force microscopy with LLMs and benchmark them through an end-to-end evaluation suite, revealing major limitations and safety concerns. [ABSTRACT FROM AUTHOR]
Copyright of Nature Communications is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index