DuQTTA: Dual Quantized Tensor-Train Adaptation with Decoupling Magnitude-Direction for Efficient Fine-Tuning of LLMs
| Published in: | 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7 |
|---|---|
| Main Authors: | Dong, Haoyan; Chen, Hai-Bao; Chang, Jingjing; Yang, Yixin; Gao, Ziyang; Ji, Zhigang; Wang, Runsheng; Huang, Ru |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 22.06.2025 |
| Subjects: | Large language models; LLMs; Tensor-Train; dual quantization; adaptive optimization strategy; magnitude-direction decoupled; Adaptation models; Accuracy; Memory management; Optimization; Power demand; Quantization (signal); Scalability; Tensors; Faces |
| Online Access: | Full Text |
| Abstract | Recent parameter-efficient fine-tuning (PEFT) techniques have enabled large language models (LLMs) to be efficiently fine-tuned for specific tasks while maintaining model performance with minimal additional trainable parameters. However, existing PEFT techniques still struggle to balance accuracy and efficiency, especially when addressing scalability and the demands of lightweight deployment for LLMs. In this paper, we propose an efficient fine-tuning method for LLMs based on dual quantized Tensor-Train adaptation with decoupling magnitude-direction (DuQTTA). The proposed DuQTTA method employs Tensor-Train decomposition and dual-stage quantization to minimize model size and resource consumption. Additionally, it uses an adaptive optimization strategy and a decoupled update mechanism to improve model performance, thereby minimizing suboptimal outcomes and ensuring alignment with the goals of full-parameter fine-tuning. Experimental results indicate that the proposed DuQTTA method outperforms existing PEFT methods, achieving up to a 65× compression rate compared to the LLaMA2-7B model, while delivering improvements of 4.44%, 3.14%, and 0.97% over LoRA on LLaMA2-7B, LLaMA3-8B, and LLaMA2-13B, respectively. The proposed DuQTTA method is effective in compressing LLMs for deployment on resource-constrained edge devices. |
|---|---|
| Author | Chang, Jingjing; Yang, Yixin; Gao, Ziyang; Chen, Hai-Bao; Wang, Runsheng; Dong, Haoyan; Huang, Ru; Ji, Zhigang |
| Author_xml | 1: Dong, Haoyan (email: haibaochen@sjtu.edu.cn), Shanghai Jiao Tong University, Department of Micro-Nano Electronics, Shanghai, China; 2: Chen, Hai-Bao, Shanghai Jiao Tong University, Department of Micro-Nano Electronics, Shanghai, China; 3: Chang, Jingjing, Shanghai Jiao Tong University, Department of Micro-Nano Electronics, Shanghai, China; 4: Yang, Yixin, Shanghai Jiao Tong University, Department of Micro-Nano Electronics, Shanghai, China; 5: Gao, Ziyang, Shanghai Jiao Tong University, Department of Micro-Nano Electronics, Shanghai, China; 6: Ji, Zhigang, Shanghai Jiao Tong University, Department of Micro-Nano Electronics, Shanghai, China; 7: Wang, Runsheng, Peking University, School of Integrated Circuits, Beijing, China; 8: Huang, Ru, Peking University, School of Integrated Circuits, Beijing, China |
| ContentType | Conference Proceeding |
| DOI | 10.1109/DAC63849.2025.11133002 |
| EISBN | 9798331503048 |
| EndPage | 7 |
| ExternalDocumentID | 11133002 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: Nature funderid: 10.13039/501100020487 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 7 |
| PublicationDate | 2025-June-22 |
| PublicationTitle | 2025 62nd ACM/IEEE Design Automation Conference (DAC) |
| PublicationTitleAbbrev | DAC |
| PublicationYear | 2025 |
| Publisher | IEEE |
| StartPage | 1 |
| SubjectTerms | Accuracy; Adaptation models; adaptive optimization strategy; dual quantization; Faces; Large language models; LLMs; magnitude-direction decoupled; Memory management; Optimization; Power demand; Quantization (signal); Scalability; Tensor-Train; Tensors |
| Title | DuQTTA: Dual Quantized Tensor-Train Adaptation with Decoupling Magnitude-Direction for Efficient Fine-Tuning of LLMs |
| URI | https://ieeexplore.ieee.org/document/11133002 |
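The abstract names three generic building blocks: a Tensor-Train (TT) factored adapter, dual-stage quantization, and a decoupled magnitude-direction update. The NumPy sketch below is only an illustrative reconstruction of how such ingredients can be combined in principle; it is not the authors' DuQTTA implementation. The layer sizes, TT ranks, the simple uniform quantizer, and the DoRA-style magnitude-direction decomposition are all assumptions made for the example.

```python
# Illustrative sketch only -- NOT the paper's implementation. It combines, with NumPy,
# (1) a tensor-train (TT) factored weight update, (2) a toy uniform quantizer standing
# in for the dual-stage quantization, and (3) a DoRA-style magnitude-direction
# decoupling. All shapes, ranks, and the quantizer are assumptions for the example.
import numpy as np

rng = np.random.default_rng(0)

# Toy layer: d_out = 8*8 and d_in = 8*8, so the update reshapes into a 4-way tensor.
d1, d2, d3, d4 = 8, 8, 8, 8
d_out, d_in = d1 * d2, d3 * d4
r1, r2, r3 = 4, 4, 4                                # assumed (small) TT ranks

W0 = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)   # frozen pretrained weight

# Trainable TT cores parameterizing the adapter update (initialized near zero).
G1 = rng.standard_normal((1, d1, r1)) * 0.01
G2 = rng.standard_normal((r1, d2, r2)) * 0.01
G3 = rng.standard_normal((r2, d3, r3)) * 0.01
G4 = rng.standard_normal((r3, d4, 1)) * 0.01

def quantize(x, bits=4):
    """Toy symmetric uniform quantizer (stand-in for the dual-stage scheme)."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(x / scale) * scale

# Quantize the cores, contract the TT chain, and reshape to a d_out x d_in update.
cores = [quantize(G) for G in (G1, G2, G3, G4)]
T = np.einsum('aib,bjc,ckd,dle->ijkl', *cores)       # shape (d1, d2, d3, d4)
delta_W = T.reshape(d_out, d_in)

# Magnitude-direction decoupling: a trainable per-column magnitude rescales the
# column-normalized direction of the adapted weight (DoRA-style decomposition).
m = np.linalg.norm(W0, axis=0, keepdims=True)        # trainable magnitude, shape (1, d_in)
V = W0 + delta_W                                     # directional part
W_adapted = m * V / (np.linalg.norm(V, axis=0, keepdims=True) + 1e-12)

x = rng.standard_normal(d_in)
print("adapted output (first 4 dims):", (W_adapted @ x)[:4])
```

With these toy shapes the four TT cores hold 320 parameters versus 4,096 for a dense update of the same size, which illustrates the kind of saving a TT parameterization targets; the actual factorization, dual-stage quantization scheme, and adaptive optimization strategy used by DuQTTA are defined in the paper itself.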