DuQTTA: Dual Quantized Tensor-Train Adaptation with Decoupling Magnitude-Direction for Efficient Fine-Tuning of LLMs
| Published in: | 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1–7 |
|---|---|
| Main authors: | |
| Medium: | Conference paper |
| Language: | English |
| Published: | IEEE, 22.06.2025 |
| Summary: | Recent parameter-efficient fine-tuning (PEFT) techniques have enabled large language models (LLMs) to be efficiently fine-tuned for specific tasks while maintaining model performance with minimal additional trainable parameters. However, existing PEFT techniques still struggle to balance accuracy and efficiency, especially when addressing scalability and the demands of lightweight deployment for LLMs. In this paper, we propose an efficient fine-tuning method for LLMs based on dual quantized Tensor-Train adaptation with magnitude-direction decoupling (DuQTTA). The proposed DuQTTA method employs Tensor-Train decomposition and dual-stage quantization to minimize model size and resource consumption. Additionally, it employs an adaptive optimization strategy and a decoupled update mechanism to improve model performance, thereby reducing suboptimal outcomes and ensuring alignment with the goals of full-parameter fine-tuning. Experimental results indicate that the proposed DuQTTA method outperforms existing PEFT methods, achieving up to a 65× compression rate on the LLaMA2-7B model while delivering improvements of 4.44%, 3.14%, and 0.97% over LoRA on LLaMA2-7B, LLaMA3-8B, and LLaMA2-13B, respectively. The proposed DuQTTA method is effective in compressing LLMs for deployment on resource-constrained edge devices. |
| DOI: | 10.1109/DAC63849.2025.11133002 |
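The DuQTTA implementation itself is not part of this record, but the Tensor-Train decomposition the abstract builds on can be illustrated. The sketch below is a generic TT-SVD factorization of a weight matrix into small 3-way cores; the reshape dimensions and rank are hypothetical choices for illustration, not the authors' configuration, and the paper's dual-stage quantization and magnitude-direction decoupling are not shown.

```python
import numpy as np

def tt_decompose(W, dims, rank):
    """Factor matrix W (reshaped to `dims`) into Tensor-Train cores,
    each of shape (r_prev, d_k, r_k), with ranks capped at `rank`."""
    T = W.reshape(dims)
    cores, r_prev = [], 1
    for k in range(len(dims) - 1):
        # Unfold: merge the incoming rank with the current mode.
        T = T.reshape(r_prev * dims[k], -1)
        U, S, Vt = np.linalg.svd(T, full_matrices=False)
        r = min(rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        # Push the remaining factors to the right for the next split.
        T = np.diag(S[:r]) @ Vt[:r]
        r_prev = r
    cores.append(T.reshape(r_prev, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a dense tensor of shape `dims`."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))
```

With unconstrained rank the reconstruction is exact; the compression reported in the abstract comes from choosing small TT ranks (and quantizing the cores), so that the cores hold far fewer parameters than the original weight matrix.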