OOPS: Outlier-Aware and Quadratic Programming Based Structured Pruning for Large Language Models
| Published in: | Neural Networks, Vol. 196, p. 108332 |
|---|---|
| Main Authors: | , , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | United States: Elsevier Ltd, 25.11.2025 |
| Subjects: | |
| ISSN: | 0893-6080, 1879-2782 |
| Summary: | The large model size and resource consumption of Large Language Models (LLMs) limit their deployment and application in many scenarios. Structured pruning offers a solution to this challenge. Depending on whether retraining is required after pruning, structured pruning methods for LLMs fall into two categories: retraining-free and retraining-based. Retraining-free methods often result in significant performance degradation, while retraining-based methods may require substantial computational resources. To address these limitations, we propose a structured pruning framework named OOPS (Outlier-Aware and Quadratic PrOgramming-Based Structured Pruning). It comprises three key components: outlier-aware pruning unit selection, quadratic programming-based reconstruction, and layer-wise distillation. Using only the first two components, OOPS prunes models without requiring retraining and outperforms existing retraining-free methods. When layer-wise distillation is further incorporated to train the pruned layers individually, OOPS surpasses other retraining-based methods at lower computational cost. We evaluate the effectiveness of OOPS on 11 models from 4 LLM families across multiple tasks, demonstrating superior performance over state-of-the-art methods in both retraining-free and retraining-based settings. |
|---|---|
| DOI: | 10.1016/j.neunet.2025.108332 |
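The record does not reproduce the paper's equations, but the "quadratic programming-based reconstruction" component named in the summary can be illustrated with a minimal sketch: after structured units (here, input channels of a linear layer) are pruned, the surviving weights are re-solved so that the layer's output on calibration activations is reproduced as closely as possible, which is an unconstrained quadratic program with a closed-form solution. This is an assumption-laden illustration, not the authors' implementation; the function name, shapes, ridge term, and the stand-in "outlier-aware" selection are all hypothetical.

```python
# Hypothetical sketch of a quadratic-programming-based reconstruction step
# for a pruned linear layer (illustrative only; not the OOPS implementation).
import numpy as np

def reconstruct_pruned_weights(X, W, keep_idx, ridge=1e-4):
    """Re-fit the kept rows of W so that X[:, keep_idx] @ W_new ~= X @ W.

    X        : (n_samples, d_in)  calibration activations feeding the layer
    W        : (d_in, d_out)      original weight matrix
    keep_idx : indices of input channels that survive pruning
    ridge    : small Tikhonov term keeping the quadratic problem well-posed
    """
    Y = X @ W                      # original layer output to be reconstructed
    Xk = X[:, keep_idx]            # activations restricted to kept channels
    # Minimizing ||Xk @ W_new - Y||^2 + ridge * ||W_new||^2 is an
    # unconstrained quadratic program with a closed-form solution.
    G = Xk.T @ Xk + ridge * np.eye(len(keep_idx))
    W_new = np.linalg.solve(G, Xk.T @ Y)
    return W_new                   # (n_kept, d_out)

# Toy usage: prune half of the input channels of a random layer.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 64))
W = rng.standard_normal((64, 32))
keep = np.arange(0, 64, 2)         # placeholder for outlier-aware selection
W_new = reconstruct_pruned_weights(X, W, keep)
err = np.linalg.norm(X[:, keep] @ W_new - X @ W) / np.linalg.norm(X @ W)
print(f"relative reconstruction error: {err:.3f}")
```

In this reading, the reconstruction step needs only calibration activations and a linear solve per layer, which is consistent with the summary's claim that the first two components avoid retraining.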