X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents

The number of scientific publications nowadays is rapidly increasing, causing information overload for researchers and making it hard for scholars to keep up to date with current trends and lines of work. Consequently, recent work on applying text mining technologies for scholarly publications has i...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries S. 1 - 12
Hauptverfasser:	Takeshita, Sotaro, Green, Tommaso, Friedrich, Niklas, Eckert, Kai, Ponzetto, Simone Paolo
Format:	Tagungsbericht
Sprache:	Englisch
Veröffentlicht:	ACM 20.06.2022
Schlagworte:	Adaptation models Computational modeling Multilinguality Scholarly document processing Summarization Terminology Text mining Training Visualization
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	The number of scientific publications nowadays is rapidly increasing, causing information overload for researchers and making it hard for scholars to keep up to date with current trends and lines of work. Consequently, recent work on applying text mining technologies for scholarly publications has investigated the application of automatic text summarization technologies, including extreme summarization, for this domain. However, previous work has concentrated only on monolingual settings, primarily in English. In this paper, we fill this research gap and present an abstractive cross-lingual summarization dataset for four different languages in the scholarly domain, which enables us to train and evaluate models that process English papers and generate summaries in German, Italian, Chinese and Japanese. We present our new XSCITLDR dataset for multilingual summarization and thoroughly benchmark different models based on a state-of-the-art multilingual pre-trained model, including a two-stage 'summarize and translate' approach and a direct cross-lingual model. We additionally explore the benefits of intermediate-stage training using English monolingual summarization and machine translation as intermediate tasks and analyze performance in zero- and few-shot scenarios. CCS Concepts * Computing methodologies \rightarrow Natural language processing; Natural language generation; Language resources.
DOI:	10.1145/3529372.3530938