Stacked Denoising Variational Auto Encoder Model for Extractive Web Text Summarization

Bibliographic Details
Published in: Iranian journal of science and technology. Transactions of electrical engineering, Vol. 48, no. 4, pp. 1501-1518
Main Authors: Yadav, Madhuri; Katarya, Rahul
Format: Journal Article
Language: English
Published: Cham: Springer International Publishing, 01.12.2024
Springer Nature B.V.
ISSN: 2228-6179, 2364-1827
Description
Summary: Extractive summarization is the technique of extracting distilled content from a corpus and concatenating it into a summary. Extractive summarization of web text has recently become popular owing to the wide use of social media. Various studies have therefore addressed it, but processing huge amounts of web text and understanding its context remain difficult because of the storage and time required. To address this, a continuous bag-of-words text vectorization model is used, which reduces processing time by producing a distributed representation of words in vector form. However, polysemous words are not captured, which makes extraction difficult. Hence a novel Hierarchical Attention pointer Stacked Denoising Variational Autoencoder (SDVAE) Model is proposed, in which the SDVAE model forms a latent distribution for contextualized words and a bidirectional attention mechanism extracts keywords and features from sentences, thereby capturing polysemous words. Furthermore, the resulting summary contains dangling anaphora, and antecedent morphological expressions and verb referents are not considered. Hence a novel Multilayered Competitive Probable Modular Perception Model is proposed, in which a competitive layer scores the sentences and the scored sentences are ranked using a string kernel and class-conditional probability, thereby taking antecedent morphological expressions into account. Graph-based Quadruplicate Lexicon Summarization is then used, which forms a quadruplicate lexicon chain in graph form to eliminate dangling anaphoric expressions. Experimental results show that the proposed model achieves a comparatively high accuracy of 98.3% and recall, precision, and F-measure of 98%.
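Note: The summary reports precision, recall, and F-measure but this record does not state how they were computed. As a minimal sketch only, assuming a sentence-level comparison between the sentences a summarizer selects and a reference extract, the three measures could be computed as follows. The function name prf and the example sentence indices are hypothetical and are not taken from the article.

```python
def prf(selected, reference):
    """Sentence-level precision, recall and F-measure.

    `selected` and `reference` are iterables of sentence indices:
    the sentences chosen by a summarizer and those in a reference
    extract. This is an assumed protocol, not the article's own.
    """
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)  # sentences found in both sets

    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0

    # F-measure is the harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical example: 4 selected sentences, 3 of them in the reference.
p, r, f = prf([0, 2, 5, 7], [0, 2, 3, 7])
print(p, r, f)
```

When precision and recall are equal, as in the figures quoted in the summary, the harmonic mean takes that same value, so the F-measure equals both.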
DOI: 10.1007/s40998-024-00751-9