OOPS: Outlier-Aware and Quadratic Programming Based Structured Pruning for Large Language Models
| Published in: | Neural Networks, Vol. 196, p. 108332 |
|---|---|
| Main Authors: | , , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | United States: Elsevier Ltd, 25.11.2025 |
| Subjects: | |
| ISSN: | 0893-6080, 1879-2782 |
| Summary: | The large model size and resource consumption of Large Language Models (LLMs) limit their deployment and application in many scenarios. Structured pruning offers a solution to this challenge. Depending on whether retraining is required after pruning, structured pruning methods for LLMs fall into two categories: retraining-free and retraining-based. Retraining-free methods often result in significant performance degradation, while retraining-based methods may require substantial computational resources. To address these limitations, we propose a structured pruning framework named OOPS (Outlier-Aware and Quadratic PrOgramming-Based Structured Pruning). It comprises three key components: outlier-aware pruning unit selection, quadratic programming-based reconstruction, and layer-wise distillation. Using only the first two components, OOPS prunes models without requiring retraining and outperforms existing retraining-free methods. When layer-wise distillation is further incorporated to train the pruned layers individually, OOPS surpasses other retraining-based methods at lower computational cost. We evaluate the effectiveness of OOPS on 11 models from 4 LLM families across multiple tasks, demonstrating superior performance over state-of-the-art methods in both retraining-free and retraining-based settings. |
|---|---|
| DOI: | 10.1016/j.neunet.2025.108332 |
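The record does not reproduce the paper's equations, but the "quadratic programming-based reconstruction" component named in the summary can be illustrated with a minimal sketch: after structured units (here, input channels of a linear layer) are pruned, the surviving weights are re-solved so that the layer's output on calibration activations is reproduced as closely as possible, which is an unconstrained quadratic program with a closed-form solution. This is an assumption-laden illustration, not the authors' implementation; the function name, shapes, ridge term, and the stand-in "outlier-aware" selection are all hypothetical.

```python
# Hypothetical sketch of a quadratic-programming-based reconstruction step
# for a pruned linear layer (illustrative only; not the OOPS implementation).
import numpy as np

def reconstruct_pruned_weights(X, W, keep_idx, ridge=1e-4):
    """Re-fit the kept rows of W so that X[:, keep_idx] @ W_new ~= X @ W.

    X        : (n_samples, d_in)  calibration activations feeding the layer
    W        : (d_in, d_out)      original weight matrix
    keep_idx : indices of input channels that survive pruning
    ridge    : small Tikhonov term keeping the quadratic problem well-posed
    """
    Y = X @ W                      # original layer output to be reconstructed
    Xk = X[:, keep_idx]            # activations restricted to kept channels
    # Minimizing ||Xk @ W_new - Y||^2 + ridge * ||W_new||^2 is an
    # unconstrained quadratic program with a closed-form solution.
    G = Xk.T @ Xk + ridge * np.eye(len(keep_idx))
    W_new = np.linalg.solve(G, Xk.T @ Y)
    return W_new                   # (n_kept, d_out)

# Toy usage: prune half of the input channels of a random layer.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 64))
W = rng.standard_normal((64, 32))
keep = np.arange(0, 64, 2)         # placeholder for outlier-aware selection
W_new = reconstruct_pruned_weights(X, W, keep)
err = np.linalg.norm(X[:, keep] @ W_new - X @ W) / np.linalg.norm(X @ W)
print(f"relative reconstruction error: {err:.3f}")
```

In this reading, the reconstruction step needs only calibration activations and a linear solve per layer, which is consistent with the summary's claim that the first two components avoid retraining.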