Hybrid-NL2SVA: Integrating RAG and Finetuning for LLM-based NL2SVA

SystemVerilog Assertions (SVAs) are critical for verifying the correctness of hardware designs, but manually writing them from natural language property descriptions, i.e., NL2SVA, remains a labor-intensive and error-prone task. Recent advances in large language models (LLMs) offer opportunities to...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:2025 ACM/IEEE 7th Symposium on Machine Learning for CAD (MLCAD) s. 1 - 10
Hlavní autoři: Xiao, Weihua, Ekberg, Derek, Garg, Siddharth, Karri, Ramesh
Médium: Konferenční příspěvek
Jazyk:angličtina
Vydáno: IEEE 08.09.2025
Témata:
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:SystemVerilog Assertions (SVAs) are critical for verifying the correctness of hardware designs, but manually writing them from natural language property descriptions, i.e., NL2SVA, remains a labor-intensive and error-prone task. Recent advances in large language models (LLMs) offer opportunities to automate this translation. However, existing models still struggle with understanding domain-specific syntax and semantics. To enhance LLM performance in NL2SVA, we propose a customized retrieval-augmented generation (RAG) framework and a synthetic fine-tuning dataset that together improve LLM's performance. Our RAG framework (i) constructs a context-preserving database via dynamic splitting technique, (ii) combines global semantic retrieval with keyword-guided retrieval to extract SVA operator-related contexts via HybridRetrieval, and (iii) validate and correct the use of SVA operators in LLM-generated SVAs via SVA operator-based rechecking. To further improve lightweight models over NL2SVA, our fine-tuning dataset provides prompt-guided explanations that teach LLMs the layer-by-layer construction process of concurrent SVAs, enabling supervised fine-tuning that greatly improves syntax and functionality accuracy. To evaluate the performance of LLMs over NL2SVA, we construct the largest evaluation dataset for NL2SVA, comprising 40 Verilog designs and 229 formally verified SVAs with detailed annotations. Experimental results show that our customized RAG framework increases the number of functionality matched SVAs by 58.42% over GPT-4o-mini, while Qwen2.5-Coder-7B-Instruct fine-tuned on our fine-tuning dataset and integrated with HybridRetrieval achieves a 59.05% over the base Qwen model.
DOI:10.1109/MLCAD65511.2025.11189208