A survey on different dimensions for graphical keyword extraction techniques: Issues and Challenges

Bibliographic Details
Published in:The Artificial intelligence review Vol. 54; no. 6; pp. 4731 - 4770
Main Author: Garg, Muskan
Format: Journal Article
Language:English
Published: Dordrecht Springer Netherlands 01.08.2021
ISSN:0269-2821, 1573-7462
Description
Summary:The shift from offline to online activities, driven by the social disruption of the COVID-19 pandemic lockdown, has increased online economic and social activity. In this context, Automatic Keyword Extraction (AKE) from textual data has attracted growing interest because of its applications across different domains of Natural Language Processing (NLP). Graphical Keyword Extraction Techniques (GKET) in the literature use a Graph of Words (GoW), which has been analysed along different dimensions. This article studies these dimensions for GKET, namely the GoW representation, the statistical properties of GoW, the stability of the structure of GoW, the diversity of approaches over GoW for GKET, and the ranking of nodes in GoW. To elucidate these dimensions, a comprehensive survey of GKET across different domains is carried out, and inferences are drawn from the existing literature. These inferences are used to lay down possible research directions for interdisciplinary studies of network science and NLP. In addition, experimental results are analysed to compare and contrast existing GKET over 21 different datasets, to analyse Word Co-occurrence Networks (WCN) for 15 different languages, and to study the structure of WCN for different genres. The article identifies some strong correspondences between approaches from different disciplines, namely, for GoW representation: 'Line Graphs' and 'Bigram Words Graphs'; and for feature extraction and selection using eigenvalues: 'Random Walk' and 'Spectral Clustering'. Observations on the need to integrate multiple dimensions have opened new research directions in the interdisciplinary field of network science and NLP, applicable to handling streaming data and language-independent NLP.
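Two of the dimensions named in the summary, the GoW representation and the ranking of nodes by a random walk, can be illustrated with a minimal sketch. It is not taken from the article: the function names, the window size, the damping factor, and the sample token list are all assumptions chosen for illustration, and the ranking is a generic PageRank-style iteration, not any specific technique the survey evaluates.

```python
from collections import defaultdict


def build_graph_of_words(tokens, window=2):
    """Undirected co-occurrence graph (hypothetical helper).

    Words are nodes; an edge links two distinct words whose positions
    differ by less than `window` (window=2 links adjacent words only).
    """
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def random_walk_rank(graph, damping=0.85, iterations=50):
    """PageRank-style power iteration; damping is an assumed default."""
    nodes = list(graph)
    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbour shares its score equally among its own edges.
            incoming = sum(score[nb] / len(graph[nb]) for nb in graph[node])
            updated[node] = (1 - damping) / len(nodes) + damping * incoming
        score = updated
    return score


# Assumed input: tokens already lower-cased and stop-word filtered.
tokens = ["keyword", "extraction", "graph", "words",
          "graph", "ranking", "keyword", "ranking"]
graph = build_graph_of_words(tokens, window=2)
scores = random_walk_rank(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The highest-scoring nodes would be taken as candidate keywords. A larger window or weighted edges would give a different GoW, which is the kind of representation choice the summary lists as a separate dimension.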
DOI:10.1007/s10462-021-10010-6