Parameter-efficient feature-based transfer for paraphrase identification

Bibliographic details
Published in:	Natural Language Engineering, Vol. 29, No. 4, pp. 1066-1096
Main authors:	Liu, Xiaodong; Rzepka, Rafal; Araki, Kenji
Format:	Journal Article
Language:	English
Published:	Cambridge, UK: Cambridge University Press (CUP), 1 July 2023
Subjects:	Classifiers; Computing costs; Continual learning; Datasets; Identification; Inference; Language; Learning; Machine learning; Natural language; Natural language inference; Natural language processing; Neural networks; Parameter identification; Parameter-efficient feature-based transfer; Paraphrase; Paraphrase identification; Semantic textual similarity; Semantics; Task performance; Unsupervised learning
ISSN:	1351-3249, 1469-8110

Description
Summary:	There are many types of approaches to Paraphrase Identification (PI), the NLP task of determining whether two sentences are semantically equivalent. Traditional approaches mainly consist of unsupervised learning and feature engineering, which are computationally inexpensive, but their task performance is now only moderate. To find a method that preserves the low computational cost of traditional approaches while yielding better task performance, we investigate neural network-based transfer learning approaches. We find that this goal can be achieved by using parameters more efficiently in feature-based transfer. To that end, we propose a pre-trained task-specific architecture. The fixed parameters of the pre-trained architecture can be shared by multiple classifiers, each adding only a small number of parameters. As a result, the only remaining computational cost involving parameter updates comes from classifier tuning: the features output by the architecture, combined with lexical overlap features, are fed into a single classifier for tuning. Furthermore, the pre-trained task-specific architecture can also be applied to natural language inference and semantic textual similarity tasks. This design consumes few computational and memory resources per task and is also conducive to power-efficient continual learning. The experimental results show that our proposed method is competitive with adapter-BERT (a parameter-efficient fine-tuning approach) on some tasks while using only 16% of the trainable parameters and saving 69-96% of the parameter-update time.
DOI:	10.1017/S135132492200050X
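
The classifier-tuning idea in the summary (fixed pre-trained parameters, with features from the architecture and lexical overlap features fed into one small classifier) can be illustrated with a minimal sketch. This is not the authors' architecture: the frozen encoder here is a hypothetical stand-in (a fixed random projection of hashed word counts), the overlap features (Jaccard similarity and length ratio) are assumed examples, and the logistic-regression classifier, toy sentence pairs, and all names such as `FROZEN_W` and `pair_features` are illustrative only.

```python
import math
import random
import zlib

VOCAB_BUCKETS = 32  # hashed vocabulary size (illustrative assumption)
EMBED_DIM = 8       # width of the frozen sentence vector (illustrative assumption)

# Frozen "pre-trained" parameters. Random values stand in for real pre-training;
# they are stored in tuples and are never updated below.
_rng = random.Random(0)
FROZEN_W = tuple(
    tuple(_rng.gauss(0.0, 1.0) for _ in range(VOCAB_BUCKETS))
    for _ in range(EMBED_DIM)
)


def tokenize(sentence):
    return sentence.lower().split()


def encode(sentence):
    """Map a sentence to a vector using only the frozen parameters."""
    counts = [0.0] * VOCAB_BUCKETS
    for tok in tokenize(sentence):
        counts[zlib.crc32(tok.encode("utf-8")) % VOCAB_BUCKETS] += 1.0
    return [math.tanh(sum(w * c for w, c in zip(row, counts))) for row in FROZEN_W]


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def pair_features(s1, s2):
    """Encoder-based feature + lexical overlap features + constant bias term."""
    t1, t2 = set(tokenize(s1)), set(tokenize(s2))
    union = t1 | t2
    jaccard = len(t1 & t2) / len(union) if union else 0.0
    n1, n2 = len(tokenize(s1)), len(tokenize(s2))
    length_ratio = min(n1, n2) / max(n1, n2) if max(n1, n2) else 0.0
    return [cosine(encode(s1), encode(s2)), jaccard, length_ratio, 1.0]


def predict(weights, feats):
    z = sum(w * x for w, x in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(weights, X, y):
    """Mean binary cross-entropy of the classifier on feature rows X."""
    eps = 1e-12
    total = 0.0
    for x, t in zip(X, y):
        p = predict(weights, x)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(X)


def train_classifier(pairs, labels, steps=200, lr=0.5):
    """Full-batch gradient descent; only the classifier weights are updated."""
    X = [pair_features(a, b) for a, b in pairs]
    weights = [0.0] * len(X[0])
    for _ in range(steps):
        grads = [0.0] * len(weights)
        for x, t in zip(X, labels):
            err = predict(weights, x) - t
            for i, xi in enumerate(x):
                grads[i] += err * xi / len(X)
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Toy data, invented for illustration (1 = paraphrase, 0 = not a paraphrase).
pairs = [
    ("the cat sat on the mat", "the cat is sitting on the mat"),
    ("he bought a new car", "he purchased a new car"),
    ("the cat sat on the mat", "stock prices fell sharply today"),
    ("he bought a new car", "it rained all night in the city"),
]
labels = [1, 1, 0, 0]
weights = train_classifier(pairs, labels)
print("trainable parameters:", len(weights))
```

In this sketch the encoder's parameters could be shared by several such classifiers (for example, one per task), each adding only its own small weight vector, which is the property the summary attributes to the proposed method.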