Interactive Optimization of Embedding-based Text Similarity Calculations

Bibliographic Details
Title: Interactive Optimization of Embedding-based Text Similarity Calculations
Authors: Witschard, Daniel; Jusufi, Ilir, 1983; Martins, Rafael Messias, Dr., 1984; Kucher, Kostiantyn, 1989; Kerren, Andreas, Dr.-Ing., 1971
Source: eLLIIT – The Linköping – Lund Initiative on IT and Mobile Communication; Information Visualization. 21(4):335-353
Subject Terms: Text embedding, ensemble methods, text similarity, similarity calculations, visual analytics, Information and software visualization, Informations- och programvisualisering
Description: Comparing text documents is an essential task for a variety of applications within diverse research fields, and several different methods have been developed for this. However, calculating text similarity is an ambiguous and context-dependent task, so many open challenges still exist. In this paper, we present a novel method for text similarity calculations based on the combination of embedding technology and ensemble methods. By using several embeddings, instead of only one, we show that it is possible to achieve higher quality, which in turn is a key factor for developing high-performing applications for text similarity exploitation. We also provide a prototype visual analytics tool which helps the analyst to find optimally performing ensembles and gain insights into the inner workings of the similarity calculations. Furthermore, we discuss the generalizability of our key ideas to fields beyond the scope of text analysis.
File Description: electronic
Access URL: https://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-115658
https://doi.org/10.1177/14738716221114372
Database: SwePub
ISSN: 14738716, 14738724
DOI:10.1177/14738716221114372