Text Mining Based on the Lexicon-Constrained Network in the Context of Big Data
| Published in: | Wireless Communications and Mobile Computing, Vol. 2022, Issue 1 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Oxford: Hindawi; John Wiley & Sons, Inc., 2022 |
| Subject: | |
| ISSN: | 1530-8669, 1530-8677 |
| Online access: | Get full text |
| Summary: | Unstructured textual news data are produced every day; analyzing them with an abstractive summarization algorithm provides advanced analytics to decision-makers. Deep learning networks with a copy mechanism are finding increasing use in abstractive summarization, because the copy mechanism allows sequence-to-sequence models to choose words from the input and place them directly in the output. However, since Chinese sentences have no explicit word delimiter, most existing models for Chinese abstractive summarization can copy only individual characters, which is inefficient. To solve this problem, we propose a lexicon-constrained copying network that models multigranularity in both the encoder and the decoder. On the source side, words and characters are aggregated into the same input memory by a Transformer-based encoder. On the target side, the decoder can copy either a character or a multicharacter word at each time step, and decoding is guided by a word-enhanced search algorithm that facilitates parallel computation and encourages the model to copy more words. Moreover, we adopt a word selector to integrate keyword information. Experimental results on a Chinese social media dataset show that our model can work standalone or with the word selector; both forms outperform previous character-based models and achieve competitive performance. |
|---|---|
| DOI: | 10.1155/2022/8703100 |
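The abstract's core idea, a copy mechanism that lets a sequence-to-sequence decoder place source tokens directly into the output, can be illustrated with a pointer-generator-style probability mixture. This is a minimal sketch, not the paper's exact lexicon-constrained formulation: the function name, the gate `p_gen`, and all numbers here are illustrative assumptions. The only paper-specific point it mirrors is that in a word-level scheme a multicharacter Chinese word occupies a single copyable unit.

```python
import numpy as np

def mix_distributions(p_gen, vocab_dist, copy_attn, src_ids):
    """Blend a generation distribution with a copy distribution.

    p_gen      : scalar in [0, 1], probability of generating from the vocabulary
    vocab_dist : (vocab_size,) softmax over the output vocabulary
    copy_attn  : (src_len,) attention weights over source tokens (sums to 1)
    src_ids    : (src_len,) vocabulary ids of the source tokens
    """
    final = p_gen * vocab_dist
    # Scatter-add copy mass onto the ids of the source tokens; repeated
    # source tokens accumulate their attention weight on the same id.
    np.add.at(final, src_ids, (1.0 - p_gen) * copy_attn)
    return final

# Toy example: 6-id vocabulary, 3-token source sentence.
vocab_dist = np.array([0.1, 0.2, 0.3, 0.2, 0.1, 0.1])
copy_attn = np.array([0.5, 0.3, 0.2])   # attention over the 3 source tokens
src_ids = np.array([2, 4, 2])           # token id 2 appears twice in the source
dist = mix_distributions(0.6, vocab_dist, copy_attn, src_ids)
print(dist)  # still a valid probability distribution
```

Because the result is a convex combination of two normalized distributions, it remains normalized; a token that appears in the source gets extra probability mass in proportion to the attention it receives, which is what lets the decoder "copy" rare or out-of-vocabulary source words.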