Text Compression Algorithm Based on Feedback Self-Optimization Using Lightweight LLMs
| Title: | Text Compression Algorithm Based on Feedback Self-Optimization Using Lightweight LLMs |
|---|---|
| Authors: | Ziyue Wang, Chia Yean Lim, Xingshuai Jia |
| Source: | Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology |
| Publisher Information: | SAGE Publications, 2025 |
| Publication Year: | 2025 |
| Description: | Large Language Models (LLMs) have demonstrated impressive capabilities in text summarization and question answering. However, their outputs often rely on shallow rephrasing of the input, leading to summaries that may overlook key information or misrepresent the source content. Moreover, LLMs face challenges when processing excessively long inputs due to context limitations and information dilution. To address these issues, we propose a two-stage text compression framework based on lightweight LLMs. In the first stage, semantically salient text units are identified using a combination of lightweight LLM scoring and embedding-based ranking to achieve concise input compression. In the second stage, the same lightweight LLM is employed to regenerate coherent and informative summaries from the compressed inputs. Experimental results on multiple QA datasets demonstrate that our system achieves competitive performance against standard baselines while significantly reducing model size and input length. The framework is highly adaptable, interpretable, and suitable for integration into resource-constrained environments and document processing scenarios characterized by noisy or semantically irrelevant content. |
| Document Type: | Article |
| Language: | English |
| ISSN: | 1875-8967; 1064-1246 |
| DOI: | 10.1177/18758967251375117 |
| Rights: | URL: https://journals.sagepub.com/page/policies/text-and-data-mining-license |
| Accession Number: | edsair.doi...........db0b6a8dc73b1079e28ea721d6a20a80 |
| Database: | OpenAIRE |
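The first stage described in the abstract (ranking text units by embedding similarity and keeping only the most salient ones for compression) can be illustrated with a minimal sketch. The Python below is a toy, not the authors' implementation: it substitutes a simple bag-of-words embedding for the lightweight LLM scorer and neural encoder, and the function names (`embed`, `compress`) are hypothetical.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words embedding; a real pipeline would use a
    # neural sentence encoder or lightweight LLM scores instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def compress(sentences, keep_ratio=0.5):
    """Rank sentences by similarity to the whole document and keep
    the top fraction, preserving the original sentence order."""
    doc_vec = embed(" ".join(sentences))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(embed(sentences[i]), doc_vec),
                    reverse=True)
    k = max(1, int(len(sentences) * keep_ratio))
    kept = sorted(ranked[:k])  # restore document order
    return [sentences[i] for i in kept]

sents = [
    "LLMs can summarize long documents.",
    "The weather was pleasant that day.",
    "Compression selects salient sentences before summarization.",
    "Salient sentences are ranked by embedding similarity.",
]
print(compress(sents, keep_ratio=0.5))
```

In the framework described above, the compressed output would then be passed back to the same lightweight LLM to regenerate a coherent summary (the second stage).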