Text Compression Algorithm Based on Feedback Self-Optimization Using Lightweight LLMs

Saved in:
Detailed Bibliography
Title: Text Compression Algorithm Based on Feedback Self-Optimization Using Lightweight LLMs
Authors: Ziyue Wang, Chia Yean Lim, Xingshuai Jia
Source: Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology.
Publisher Information: SAGE Publications, 2025.
Year of Publication: 2025
Description: Large Language Models (LLMs) have demonstrated impressive capabilities in text summarization and question answering. However, their outputs often rely on shallow rephrasing of the input, leading to summaries that may overlook key information or misrepresent the source content. Moreover, LLMs face challenges when processing excessively long inputs due to context limitations and information dilution. To address these issues, we propose a two-stage text compression framework based on lightweight LLMs. In the first stage, semantically salient text units are identified using a combination of lightweight LLM scoring and embedding-based ranking to achieve concise input compression. In the second stage, the same lightweight LLM is employed to regenerate coherent and informative summaries from the compressed inputs. Experimental results on multiple QA datasets demonstrate that our system achieves competitive performance against standard baselines while significantly reducing model size and input length. The framework is highly adaptable, interpretable, and suitable for integration into resource-constrained environments and document processing scenarios characterized by noisy or semantically irrelevant content.
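The abstract's first stage, embedding-based ranking of salient text units, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy bag-of-words `embed` function stands in for a real sentence encoder, and scoring sentences by cosine similarity to the document centroid is one common, assumed instantiation of "embedding-based ranking".

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding; a real system would use a
    # lightweight sentence encoder instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def compress(sentences, keep=2):
    # Stage 1 (sketch): rank sentences by similarity to the
    # document centroid and retain the top `keep` units.
    doc = embed(" ".join(sentences))
    ranked = sorted(sentences, key=lambda s: cosine(embed(s), doc), reverse=True)
    kept = set(ranked[:keep])
    # Preserve the original sentence order of the retained units;
    # the compressed text would then be passed to the LLM for
    # summary regeneration (stage 2, not shown).
    return [s for s in sentences if s in kept]
```

In the paper's framework these similarity scores are combined with lightweight LLM scoring; here only the embedding side is sketched.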
Document Type: Article
Language: English
ISSN: 1875-8967, 1064-1246
DOI: 10.1177/18758967251375117
Rights: URL: https://journals.sagepub.com/page/policies/text-and-data-mining-license
Accession Number: edsair.doi...........db0b6a8dc73b1079e28ea721d6a20a80
Database: OpenAIRE