DuQTTA: Dual Quantized Tensor-Train Adaptation with Decoupling Magnitude-Direction for Efficient Fine-Tuning of LLMs



Bibliographic Details
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1 - 7
Main Authors: Dong, Haoyan; Chen, Hai-Bao; Chang, Jingjing; Yang, Yixin; Gao, Ziyang; Ji, Zhigang; Wang, Runsheng; Huang, Ru
Format: Conference Proceeding
Language: English
Published: IEEE, 22 June 2025
Subjects:
Online Access: Full Text
Abstract Recent parameter-efficient fine-tuning (PEFT) techniques have enabled large language models (LLMs) to be efficiently fine-tuned for specific tasks while maintaining model performance with minimal additional trainable parameters. However, existing PEFT techniques continue to face challenges in balancing accuracy and efficiency, especially when addressing scalability and the demands of lightweight deployment for LLMs. In this paper, we propose an efficient fine-tuning method for LLMs based on dual quantized Tensor-Train adaptation with magnitude-direction decoupling (DuQTTA). The proposed DuQTTA method employs Tensor-Train decomposition and dual-stage quantization to minimize model size and resource consumption. Additionally, it uses an adaptive optimization strategy and a decoupled update mechanism to improve model performance, thereby reducing suboptimal outcomes and keeping the update aligned with full-parameter fine-tuning goals. Experimental results indicate that the proposed DuQTTA method outperforms existing PEFT methods, achieving up to a 65× compression rate compared to the LLaMA2-7B model, while delivering improvements of 4.44%, 3.14%, and 0.97% over LoRA on LLaMA2-7B, LLaMA3-8B, and LLaMA2-13B, respectively. The proposed DuQTTA method is effective in compressing LLMs for deployment on resource-constrained edge devices.
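The abstract names a Tensor-Train-factorized weight update and magnitude-direction decoupling but does not reproduce the paper's equations. As a rough, hypothetical sketch (not the authors' implementation), a DoRA-style recombination of a Tensor-Train update with a frozen weight could look as follows; every shape, rank, and function name is an assumption, and the dual quantization stage is omitted.

import numpy as np

def tt_delta(cores, out_dim, in_dim):
    # Contract a chain of TT cores (each of shape (r_k, n_k, r_{k+1}))
    # into a dense update matrix of shape (out_dim, in_dim).
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.squeeze().reshape(out_dim, in_dim)

def decoupled_update(W0, cores):
    # DoRA-style recombination: keep a per-column magnitude and a
    # normalized direction built from the frozen weight plus the TT update.
    delta = tt_delta(cores, *W0.shape)
    direction = W0 + delta
    magnitude = np.linalg.norm(W0, axis=0, keepdims=True)  # learnable in practice
    return magnitude * direction / np.linalg.norm(direction, axis=0, keepdims=True)

# Toy example: a 16x16 frozen weight, TT cores factorizing 16*16 = 4*4*4*4.
rng = np.random.default_rng(0)
W0 = rng.standard_normal((16, 16))
ranks = [1, 2, 2, 2, 1]
cores = [0.01 * rng.standard_normal((ranks[k], 4, ranks[k + 1])) for k in range(4)]
print(decoupled_update(W0, cores).shape)  # (16, 16)

In such a setup the magnitude vector and the TT cores would be the trainable parameters, with W0 kept frozen (and, in DuQTTA, quantized); whether the paper follows this exact formulation is not stated in this record.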
Author Chang, Jingjing
Yang, Yixin
Gao, Ziyang
Chen, Hai-Bao
Wang, Runsheng
Dong, Haoyan
Huang, Ru
Ji, Zhigang
Author_xml – sequence: 1
  givenname: Haoyan
  surname: Dong
  fullname: Dong, Haoyan
  email: haibaochen@sjtu.edu.cn
  organization: Shanghai Jiao Tong University,The Department of Micro-Nano Electronics,Shanghai,China
– sequence: 2
  givenname: Hai-Bao
  surname: Chen
  fullname: Chen, Hai-Bao
  organization: Shanghai Jiao Tong University,The Department of Micro-Nano Electronics,Shanghai,China
– sequence: 3
  givenname: Jingjing
  surname: Chang
  fullname: Chang, Jingjing
  organization: Shanghai Jiao Tong University,The Department of Micro-Nano Electronics,Shanghai,China
– sequence: 4
  givenname: Yixin
  surname: Yang
  fullname: Yang, Yixin
  organization: Shanghai Jiao Tong University,The Department of Micro-Nano Electronics,Shanghai,China
– sequence: 5
  givenname: Ziyang
  surname: Gao
  fullname: Gao, Ziyang
  organization: Shanghai Jiao Tong University,The Department of Micro-Nano Electronics,Shanghai,China
– sequence: 6
  givenname: Zhigang
  surname: Ji
  fullname: Ji, Zhigang
  organization: Shanghai Jiao Tong University,The Department of Micro-Nano Electronics,Shanghai,China
– sequence: 7
  givenname: Runsheng
  surname: Wang
  fullname: Wang, Runsheng
  organization: Peking University,School of Integrated Circuits,Beijing,China
– sequence: 8
  givenname: Ru
  surname: Huang
  fullname: Huang, Ru
  organization: Peking University,School of Integrated Circuits,Beijing,China
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/DAC63849.2025.11133002
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798331503048
EndPage 7
ExternalDocumentID 11133002
Genre orig-research
GrantInformation_xml – fundername: Nature
  funderid: 10.13039/501100020487
GroupedDBID 6IE
6IH
CBEJK
RIE
RIO
IEDL.DBID RIE
IngestDate Wed Oct 01 07:05:15 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 7
ParticipantIDs ieee_primary_11133002
PublicationCentury 2000
PublicationDate 2025-June-22
PublicationDateYYYYMMDD 2025-06-22
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-June-22
  day: 22
PublicationDecade 2020
PublicationTitle 2025 62nd ACM/IEEE Design Automation Conference (DAC)
PublicationTitleAbbrev DAC
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
Score 2.2949028
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Accuracy
Adaptation models
adaptive optimization strategy
dual quantization
Faces
Large language models
LLMs
magnitude-direction decoupled
Memory management
Optimization
Power demand
Quantization (signal)
Scalability
Tensor-Train
Tensors
Title DuQTTA: Dual Quantized Tensor-Train Adaptation with Decoupling Magnitude-Direction for Efficient Fine-Tuning of LLMs
URI https://ieeexplore.ieee.org/document/11133002
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2025+62nd+ACM%2FIEEE+Design+Automation+Conference+%28DAC%29&rft.atitle=DuQTTA%3A+Dual+Quantized+Tensor-Train+Adaptation+with+Decoupling+Magnitude-Direction+for+Efficient+Fine-Tuning+of+LLMs&rft.au=Dong%2C+Haoyan&rft.au=Chen%2C+Hai-Bao&rft.au=Chang%2C+Jingjing&rft.au=Yang%2C+Yixin&rft.date=2025-06-22&rft.pub=IEEE&rft.spage=1&rft.epage=7&rft_id=info:doi/10.1109%2FDAC63849.2025.11133002&rft.externalDocID=11133002