Deep context transformer: bridging efficiency and contextual understanding of transformer models
| Published in: | Applied intelligence (Dordrecht, Netherlands), Volume 54, Issue 19, pp. 8902-8923 |
|---|---|
| Main author: | |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US, 01.10.2024 (Springer Nature B.V.) |
| Subjects: | |
| ISSN: | 0924-669X, 1573-7497 |
| Summary: | This paper introduces the deep context transformer (DCT), a novel transformer model designed to enhance the efficiency and accuracy of processing contextually interlinked sequences in natural language processing (NLP) tasks, particularly in dialogue systems and code completion. Although powerful, traditional transformer models struggle to manage extended sequences and complex data structures because of their fixed-length context window and uniform attention mechanism. DCT addresses these limitations by implementing a chunked transformer methodology, in which sequences in a dialogue or code chunk are treated both as standalone sequences and as part of a broader context. This approach is complemented by decayed attention weighting, which scales down cross-attention weights based on a sequence's age within the chunk, and an innovative positional encoding scheme that reflects both the token's position within a sequence and the sequence's position within a chunk. DCT was evaluated using the Schema-Guided Dialogue dataset from the Eighth Dialog System Technology Challenge and a subset of the IBM Project CodeNet for code completion, focusing on metrics such as character error rate, word error rate, and Bilingual Evaluation Understudy (BLEU) scores. The results revealed improvements in character error rate, word error rate, and BLEU scores compared with baseline models, with a notable increase in dialogue fluency and code completion accuracy. These achievements underscore the model's advanced contextual understanding and demonstrate its effectiveness in NLP and programming language tasks. Additionally, a variant of the model using Bidirectional Encoder Representations from Transformers as the encoder was explored; it demonstrated similar improvements in performance metrics but tended to repeat responses due to missing positional encoding across encoder chunk sequences. The ability of DCT to maintain context over extended conversations and code chunks demonstrates its potential as a transformative tool in dialogue systems and code completion, with applications extending to document summarization and language translation. |
|---|---|
| DOI: | 10.1007/s10489-024-05453-7 |
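
The abstract describes two mechanisms, decayed attention weighting and a two-level positional encoding, without giving their formulas. The sketch below is a minimal NumPy rendering of how such mechanisms could work, assuming an exponential decay schedule, post-softmax renormalization, and summed sinusoidal encodings; the function names and the `decay_rate` parameter are illustrative assumptions, not the paper's published method.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decayed_attention(scores: np.ndarray, seq_ages: np.ndarray,
                      decay_rate: float = 0.1) -> np.ndarray:
    """Scale cross-attention weights by the age of each key's source
    sequence within the chunk, then renormalize.

    scores    : (query_len, key_len) raw attention logits
    seq_ages  : (key_len,) age of the sequence each key token belongs to
                (0 = most recent sequence in the chunk)
    decay_rate: assumed hyperparameter; the paper's actual decay
                schedule is not reproduced here.
    """
    weights = softmax(scores, axis=-1)
    decay = np.exp(-decay_rate * seq_ages)   # older sequences get smaller factors
    weights = weights * decay                # down-weight attention to older sequences
    return weights / weights.sum(axis=-1, keepdims=True)

def hierarchical_positional_encoding(token_pos: int, seq_index: int,
                                     d_model: int) -> np.ndarray:
    """Combine the token's position within its sequence with the
    sequence's position within the chunk. Summing two standard
    sinusoidal encodings is one plausible combination; the paper's
    exact scheme may differ.
    """
    def sinusoid(pos: int) -> np.ndarray:
        i = np.arange(d_model)
        angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
        return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

    return sinusoid(token_pos) + sinusoid(seq_index)

# Demo: 3 sequences of 4 tokens each in one chunk, oldest sequence first.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 12))        # 4 queries attending over 12 keys
seq_ages = np.repeat([2, 1, 0], 4)       # per-key sequence age within the chunk
attn = decayed_attention(scores, seq_ages)
print(attn.sum(axis=-1))                 # each row sums to 1 after renormalization
print(hierarchical_positional_encoding(token_pos=3, seq_index=1, d_model=8))
```

Renormalizing after the decay keeps each query's attention a proper probability distribution while shifting mass toward the most recent sequences, which matches the abstract's stated goal of maintaining context over extended conversations and code chunks.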