CADialogue: A multimodal LLM-powered conversational assistant for intuitive parametric CAD modeling


Bibliographic details
Published in: Computer aided design, vol. 191, p. 104006
Main authors: Zhou, Jiwei; Camba, Jorge D.; Company, Pedro
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 February 2026
ISSN:0010-4485
Online access: Full text
Description
Abstract: Recent advances in generative Artificial Intelligence (AI), particularly Large Language Models (LLMs), offer a new paradigm for CAD interaction by enabling natural and intuitive input through texts, images, and context-aware selections. In this study, we present CADialogue, a multimodal LLM-powered conversational assistant to enable intuitive parametric CAD modeling through natural language, speech, image, and selection-based geometry interactions. Built on a general-purpose large language model, CADialogue translates user prompts into executable code to support geometry creation and context-aware editing. The system features a modular architecture that decouples prompt handling, refinement logic, and execution, allowing seamless model replacement as LLMs develop, and includes caching for rapid reuse of validated designs. We evaluate the system on 70 modeling and 10 editing tasks across varying difficulty levels, assessing performance in terms of accuracy, refinement behavior, and execution time. Results show an overall success rate of 95.71%, combining a 91.43% baseline under Text-Only input with additional recoveries enabled by Text + Image input, with robust recovery from failure via self-correction and human-in-the-loop refinement. Comparative analysis reveals that image input improves success in semantically complex prompts but introduces additional processing time. Furthermore, caching confirmed macros yields over 85.71% speedup in repeated executions. These findings highlight the potential of general-purpose LLMs for enabling accessible, iterative, and accurate CAD modeling workflows without domain-specific fine-tuning. The source code and dataset for CADialogue are available at https://github.com/Hiram31/CADialogue.
DOI:10.1016/j.cad.2025.104006
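
The abstract describes an architecture that keeps prompt handling, refinement logic, and execution separate, with a cache of validated macros. The following is a minimal Python sketch of that general pattern, not the authors' implementation: every name in it (`Assistant`, `Result`, `stub_llm`, `stub_backend`, `max_refinements`) is hypothetical, and the LLM and the CAD backend are replaced by stand-in functions so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Result:
    """Outcome of running generated code (hypothetical type)."""
    ok: bool
    error: str = ""


@dataclass
class Assistant:
    """Illustrative pipeline; all names are assumptions, not CADialogue's API."""
    generate: Callable[[str], str]     # prompt handling: request -> candidate code
    execute: Callable[[str], Result]   # execution: run code in some CAD backend
    max_refinements: int = 2           # self-correction attempts after the first try
    cache: Dict[str, str] = field(default_factory=dict)

    def run(self, prompt: str) -> Optional[str]:
        # Reuse a previously validated macro without calling the model again.
        cached = self.cache.get(prompt)
        if cached is not None:
            self.execute(cached)
            return cached

        request = prompt
        for _ in range(self.max_refinements + 1):
            code = self.generate(request)
            result = self.execute(code)
            if result.ok:
                self.cache[prompt] = code  # cache only code that executed cleanly
                return code
            # Refinement logic: feed the error back into the next request.
            request = (
                f"{prompt}\nPrevious attempt failed with: {result.error}\n"
                "Please fix the code."
            )
        return None  # give up; a human-in-the-loop step would take over here


def stub_llm(request: str) -> str:
    """Stand-in for a model call: succeeds only after seeing an error report."""
    return "good" if "failed" in request else "bad"


def stub_backend(code: str) -> Result:
    """Stand-in for a CAD backend: accepts only the string 'good'."""
    if code == "good":
        return Result(ok=True)
    return Result(ok=False, error="syntax error")


assistant = Assistant(generate=stub_llm, execute=stub_backend)
first = assistant.run("make a cube")    # one failed attempt, then one repair
second = assistant.run("make a cube")   # served from the cache
```

Because `generate` and `execute` are plain callables, swapping in a different model or backend touches neither the refinement loop nor the cache, which is the kind of decoupling the abstract refers to.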