OOPS: Outlier-Aware and Quadratic Programming Based Structured Pruning for Large Language Models

Published in: Neural networks, Volume 196, Article 108332
Main authors: Wei, Jiateng; Li, Siqi; Xiang, Jingyang; Yang, Jiandang; Chen, Jun; Wei, Xiaobin; Jiang, Yunliang; Liu, Yong
Format: Journal Article
Language: English
Publication details: United States: Elsevier Ltd, 25 November 2025
Subject: Large Language Model; Model Compression; Network Pruning
ISSN: 0893-6080, 1879-2782
Abstract The large model size and resource consumption of Large Language Models (LLMs) limit their deployment and application in many scenarios. Structured pruning offers a solution to this challenge. Based on the need for retraining after pruning, structured pruning methods for LLMs fall into two categories: retraining-free and retraining-based. Retraining-free methods often result in significant performance degradation, while retraining-based methods may require substantial computational resources. To address these limitations, we propose a structured pruning framework named OOPS (Outlier-Aware and Quadratic PrOgramming-Based Structured Pruning). It comprises three key components: outlier-aware pruning unit selection, quadratic programming-based reconstruction, and layer-wise distillation. By employing the first two components, OOPS prunes models without the requirement of retraining, outperforming existing retraining-free methods. When further incorporating layer-wise distillation to train the pruned layers individually, OOPS surpasses other retraining-based methods with lower computational costs. We evaluate the effectiveness of OOPS on 11 models from 4 LLM families across multiple tasks, demonstrating its superior performance compared to state-of-the-art methods in both retraining-free and retraining-based settings.
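To make the pipeline in the abstract concrete, the sketch below illustrates the general prune-then-reconstruct pattern it describes: score pruning units with an outlier-aware statistic, drop the lowest-scoring input channels of a linear layer, and re-fit the remaining weights by solving the resulting convex quadratic program (with no extra constraints it reduces to ridge-regularized least squares). This is a minimal NumPy sketch under those assumptions, not the paper's OOPS implementation; the function names, the outlier score, and the ridge term are hypothetical.

import numpy as np


def outlier_aware_scores(W, X, outlier_boost=5.0, tau=3.0):
    # Importance of each input channel of a linear layer W (d_out x d_in),
    # estimated from calibration activations X (n_samples x d_in). Channels
    # carrying activation outliers are boosted so they are kept preferentially.
    act_norm = np.linalg.norm(X, axis=0)               # per-channel activation scale
    boost = np.where(act_norm > tau * np.median(act_norm), outlier_boost, 1.0)
    return np.abs(W).sum(axis=0) * act_norm * boost    # one score per input channel


def reconstruct_pruned_layer(W, X, keep_idx, ridge=1e-4):
    # Re-fit weights over the kept input channels so that
    # X[:, keep_idx] @ W_new.T approximates the original outputs X @ W.T.
    # min ||X_k W'^T - X W^T||_F^2 + ridge * ||W'||_F^2 is a convex quadratic
    # program; unconstrained, it has the closed-form ridge solution below.
    Y = X @ W.T
    Xk = X[:, keep_idx]
    G = Xk.T @ Xk + ridge * np.eye(len(keep_idx))
    return np.linalg.solve(G, Xk.T @ Y).T              # shape (d_out, len(keep_idx))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(64, 128))                     # toy linear layer
    X = rng.normal(size=(256, 128))                    # calibration activations
    X[:, :4] *= 20.0                                   # inject a few activation outliers
    keep_idx = np.sort(np.argsort(outlier_aware_scores(W, X))[-64:])  # keep half
    W_new = reconstruct_pruned_layer(W, X, keep_idx)
    rel_err = (np.linalg.norm(X[:, keep_idx] @ W_new.T - X @ W.T)
               / np.linalg.norm(X @ W.T))
    print(f"relative reconstruction error after pruning: {rel_err:.4f}")

In the retraining-free setting the abstract describes, this kind of closed-form, calibration-data-only reconstruction stands in for gradient-based fine-tuning; the optional layer-wise distillation step then trains each pruned layer individually against the dense layer's outputs.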
ArticleNumber 108332
Author Jiang, Yunliang
Wei, Xiaobin
Wei, Jiateng
Li, Siqi
Xiang, Jingyang
Yang, Jiandang
Chen, Jun
Liu, Yong
Author_xml – sequence: 1
  givenname: Jiateng
  surname: Wei
  fullname: Wei, Jiateng
  organization: Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310027, China
– sequence: 2
  givenname: Siqi
  orcidid: 0009-0000-4632-9010
  surname: Li
  fullname: Li, Siqi
  organization: Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310027, China
– sequence: 3
  givenname: Jingyang
  orcidid: 0000-0001-5350-1528
  surname: Xiang
  fullname: Xiang, Jingyang
  organization: Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310027, China
– sequence: 4
  givenname: Jiandang
  surname: Yang
  fullname: Yang, Jiandang
  organization: Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310027, China
– sequence: 5
  givenname: Jun
  orcidid: 0000-0001-6568-8801
  surname: Chen
  fullname: Chen, Jun
  email: junc.change@zjnu.edu.cn
  organization: National Special Education Resource Center for Children with Autism, Zhejiang Normal University, Hangzhou, 311231, China
– sequence: 6
  givenname: Xiaobin
  surname: Wei
  fullname: Wei, Xiaobin
  organization: Wasu Media & Network Co., Ltd., China
– sequence: 7
  givenname: Yunliang
  surname: Jiang
  fullname: Jiang, Yunliang
  organization: School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004, China
– sequence: 8
  givenname: Yong
  surname: Liu
  fullname: Liu, Yong
  email: yongliu@iipc.zju.edu.cn
  organization: Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310027, China
BackLink https://www.ncbi.nlm.nih.gov/pubmed/41337790 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright 2025
Copyright © 2025 Elsevier Ltd. All rights reserved.
DOI 10.1016/j.neunet.2025.108332
DatabaseName CrossRef
PubMed
DatabaseTitle CrossRef
PubMed
DatabaseTitleList
PubMed
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1879-2782
ExternalDocumentID 41337790
10_1016_j_neunet_2025_108332
S0893608025012134
Genre Journal Article
ISSN 0893-6080
IsPeerReviewed true
IsScholarly true
Keywords Large Language Model
Model Compression
Network Pruning
Language English
License Copyright © 2025 Elsevier Ltd. All rights reserved.
ORCID 0009-0000-4632-9010
0000-0001-6568-8801
0000-0001-5350-1528
PMID 41337790
ParticipantIDs pubmed_primary_41337790
crossref_primary_10_1016_j_neunet_2025_108332
elsevier_sciencedirect_doi_10_1016_j_neunet_2025_108332
PublicationCentury 2000
PublicationDate 2025-Nov-25
PublicationDateYYYYMMDD 2025-11-25
PublicationDate_xml – month: 11
  year: 2025
  text: 2025-Nov-25
  day: 25
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Neural networks
PublicationTitleAlternate Neural Netw
PublicationYear 2025
Publisher Elsevier Ltd
Publisher_xml – name: Elsevier Ltd
SourceID pubmed
crossref
elsevier
SourceType Index Database
Publisher
StartPage 108332
SubjectTerms Large Language Model
Model Compression
Network Pruning
Title OOPS: Outlier-Aware and Quadratic Programming Based Structured Pruning for Large Language Models
URI https://dx.doi.org/10.1016/j.neunet.2025.108332
https://www.ncbi.nlm.nih.gov/pubmed/41337790
Volume 196