A keyword extraction model study in the movie domain with synopsis and reviews

Bibliographic details
Published in: Knowledge and information systems, Vol. 67, No. 5, pp. 4301-4323
Main authors: González-Santos, Carlos; Vega-Rodríguez, Miguel A.; Pérez, Carlos J.; Martínez-Sarriegui, Iñaki; López-Muñoz, Joaquín M.
Format: Journal Article
Language: English
Published: London, Springer Nature B.V., 1 May 2025
ISSN: 0219-1377, 0219-3116
Description
Summary: The use of keywords is increasingly being applied across diverse domains, including the movie industry, whose main platforms are adopting advanced natural language processing techniques. Algorithms for automatic extraction of keywords can provide relevant information in this domain. The most novel approaches covering several categories (statistics, graphs, word embedding, and hybrid) have been considered in a model study framework. They have been implemented, applied, and evaluated with standard datasets. In addition, a movie dataset with gold standard keywords, based on textual metadata from synopses and reviews, has been specifically developed for this scope. Keyword extraction models have been evaluated in terms of F-score and computation time. Furthermore, content analysis, both quantitative and qualitative, of the extracted keywords in the movie context has been performed. Results show a great variability in model performance and computation time among the different models. Qualitative results, in addition to F-score and computation time, demonstrate that keyword extraction works better with synopses than with reviews. The quantitative content analysis revealed that EmbedRank effectively reduces redundancy and limits the use of proper nouns, leading to high-quality keywords.
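As an illustration of the evaluation named in the summary, the following minimal Python sketch computes precision, recall, and F-score for extracted keywords against gold standard keywords, and times an extraction call. It assumes exact matching after lowercasing, which may differ from the matching criterion used in the article (stemmed or partial matching are common alternatives). The names `f_score` and `timed`, and the sample keywords, are hypothetical and introduced here only for illustration.

```python
import time


def f_score(extracted, gold, beta=1.0):
    """Return (precision, recall, F-beta) for extracted vs. gold keywords.

    Assumption: two keywords match only if they are equal after
    lowercasing and stripping whitespace; the article may use a
    different matching criterion.
    """
    ext = {k.strip().lower() for k in extracted}
    ref = {k.strip().lower() for k in gold}
    hits = len(ext & ref)
    precision = hits / len(ext) if ext else 0.0
    recall = hits / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


def timed(extractor, text):
    """Run a keyword extractor and return (keywords, elapsed seconds)."""
    start = time.perf_counter()
    keywords = extractor(text)
    return keywords, time.perf_counter() - start


# Hypothetical example data, not taken from the article's dataset.
extracted = ["space station", "astronaut", "rescue mission", "director"]
gold = ["astronaut", "rescue mission", "survival"]
p, r, f1 = f_score(extracted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Any callable that maps a text to a list of keywords can be passed to `timed`, so the same harness could wrap different extraction models for a side-by-side comparison of score and running time.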
DOI: 10.1007/s10115-025-02350-4