DuQTTA: Dual Quantized Tensor-Train Adaptation with Decoupling Magnitude-Direction for Efficient Fine-Tuning of LLMs
| Published in: | 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1–7 |
|---|---|
| Main authors: | |
| Medium: | Conference paper |
| Language: | English |
| Published: | IEEE, 22.06.2025 |
| Summary: | Recent parameter-efficient fine-tuning (PEFT) techniques have enabled large language models (LLMs) to be efficiently fine-tuned for specific tasks while maintaining model performance with minimal additional trainable parameters. However, existing PEFT techniques still struggle to balance accuracy and efficiency, especially when addressing scalability and the demands of lightweight deployment for LLMs. In this paper, we propose an efficient fine-tuning method for LLMs based on dual quantized Tensor-Train adaptation with magnitude-direction decoupling (DuQTTA). The proposed DuQTTA method employs Tensor-Train decomposition and dual-stage quantization to minimize model size and resource consumption. Additionally, it employs an adaptive optimization strategy and a decoupled update mechanism to improve model performance, thereby reducing suboptimal outcomes and ensuring alignment with the goals of full-parameter fine-tuning. Experimental results indicate that the proposed DuQTTA method outperforms existing PEFT methods, achieving up to a 65× compression rate on the LLaMA2-7B model while delivering improvements of 4.44%, 3.14%, and 0.97% over LoRA on LLaMA2-7B, LLaMA3-8B, and LLaMA2-13B, respectively. The proposed DuQTTA method is effective in compressing LLMs for deployment on resource-constrained edge devices. |
| DOI: | 10.1109/DAC63849.2025.11133002 |
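The DuQTTA implementation itself is not part of this record, but the Tensor-Train decomposition the abstract builds on can be illustrated. The sketch below is a generic TT-SVD factorization of a weight matrix into small 3-way cores; the reshape dimensions and rank are hypothetical choices for illustration, not the authors' configuration, and the paper's dual-stage quantization and magnitude-direction decoupling are not shown.

```python
import numpy as np

def tt_decompose(W, dims, rank):
    """Factor matrix W (reshaped to `dims`) into Tensor-Train cores,
    each of shape (r_prev, d_k, r_k), with ranks capped at `rank`."""
    T = W.reshape(dims)
    cores, r_prev = [], 1
    for k in range(len(dims) - 1):
        # Unfold: merge the incoming rank with the current mode.
        T = T.reshape(r_prev * dims[k], -1)
        U, S, Vt = np.linalg.svd(T, full_matrices=False)
        r = min(rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        # Push the remaining factors to the right for the next split.
        T = np.diag(S[:r]) @ Vt[:r]
        r_prev = r
    cores.append(T.reshape(r_prev, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a dense tensor of shape `dims`."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))
```

With unconstrained rank the reconstruction is exact; the compression reported in the abstract comes from choosing small TT ranks (and quantizing the cores), so that the cores hold far fewer parameters than the original weight matrix.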