Deep context transformer: bridging efficiency and contextual understanding of transformer models

This paper introduces the deep context transformer (DCT), which is a novel transformer model designed to enhance the efficiency and accuracy of processing contextually interlinked sequences in natural language processing (NLP) tasks, particularly in dialogue systems and code completion. Although the...

Celý popis

Detailed Bibliography
Published in: Applied Intelligence (Dordrecht, Netherlands), Vol. 54, No. 19, pp. 8902-8923
Main author: Ghaith, Shadi
Format: Journal Article
Language: English
Published: New York: Springer US, 01.10.2024
Springer Nature B.V.
ISSN: 0924-669X, 1573-7497
Description
Summary: This paper introduces the deep context transformer (DCT), a novel transformer model designed to enhance the efficiency and accuracy of processing contextually interlinked sequences in natural language processing (NLP) tasks, particularly in dialogue systems and code completion. Although powerful, traditional transformer models struggle to manage extended sequences and complex data structures because of their fixed-length context window and uniform attention mechanism. DCT addresses these limitations by implementing a chunked transformer methodology, in which the sequences in a dialogue or code chunk are treated both as standalone sequences and as parts of a broader context. This approach is complemented by decayed attention weighting, which scales down cross-attention weights based on a sequence's age within the chunk, and an innovative positional encoding scheme that reflects both a token's position within a sequence and the sequence's position within a chunk. DCT was evaluated on the Schema-Guided Dialogue dataset from the Eighth Dialog System Technology Challenge and on a subset of IBM Project CodeNet for code completion, using character error rate, word error rate, and Bilingual Evaluation Understudy (BLEU) scores as metrics. The results showed improvements in all three metrics over baseline models, with a notable increase in dialogue fluency and code completion accuracy. These results underscore the model's advanced contextual understanding and demonstrate its effectiveness in NLP and programming language tasks. Additionally, a variant of the model using Bidirectional Encoder Representations from Transformers (BERT) as the encoder was explored; it showed similar improvements in performance metrics but tended to repeat responses because positional encoding was missing across encoder chunk sequences. The ability of DCT to maintain context over extended conversations and code chunks demonstrates its potential as a transformative tool in dialogue systems and code completion, with applications extending to document summarization and language translation.
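The summary names two mechanisms, decayed attention weighting and a two-level positional encoding, without giving their formulas. The following is a minimal PyTorch sketch of how such mechanisms could look; the exponential decay schedule, the additive combination of token-level and sequence-level encodings, and the parameter names (decay, seq_age, seq_pos) are illustrative assumptions, not the paper's actual implementation.

```python
import math
import torch

def sinusoidal_encoding(positions: torch.Tensor, d_model: int) -> torch.Tensor:
    """Standard sinusoidal encoding for a 1-D tensor of integer positions."""
    i = torch.arange(d_model // 2, dtype=torch.float32)
    freqs = torch.exp(-math.log(10000.0) * 2 * i / d_model)
    angles = positions.float().unsqueeze(-1) * freqs          # (T, d_model/2)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

def chunk_positional_encoding(token_pos: torch.Tensor, seq_pos: torch.Tensor,
                              d_model: int) -> torch.Tensor:
    """Two-level scheme: encode each token's position within its sequence and
    its sequence's position within the chunk. Summing the two encodings is an
    assumption; the abstract only says both levels are reflected."""
    return sinusoidal_encoding(token_pos, d_model) + sinusoidal_encoding(seq_pos, d_model)

def decayed_attention(q, k, v, seq_age, decay=0.9):
    """Scaled dot-product attention whose weights are scaled down according to
    the age of the sequence each key token belongs to (age 0 = the newest
    sequence in the chunk); the schedule decay ** age is hypothetical."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))    # (..., Tq, Tk)
    weights = torch.softmax(scores, dim=-1) * decay ** seq_age.float()  # broadcast over keys
    weights = weights / weights.sum(dim=-1, keepdim=True)       # renormalise each row
    return weights @ v
```

For a chunk of three dialogue turns, for example, seq_age would be 2 for tokens of the oldest turn and 0 for the newest, so older turns contribute proportionally less to every attention row while still remaining visible to the model.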
DOI: 10.1007/s10489-024-05453-7