The Automatic Text Classification Method Based on BERT and Feature Union

Bibliographic Details
Published in: 2019 IEEE 25th International Conference on Parallel and Distributed Systems (ICPADS), pp. 774-777
Main Authors: Li, Wenting, Gao, Shangbing, Zhou, Hong, Huang, Zihe, Zhang, Kewen, Li, Wei
Format: Conference Proceeding
Language: English
Published: IEEE 01.12.2019
Description
Summary: Traditional deep-learning text classifiers mostly use CNN (convolutional neural network) or RNN (recurrent neural network) models with character-level or word-level embeddings as input, so their text feature extraction is not comprehensive. For the Internet of Things development environment, this paper proposes an automatic text classification method based on BERT (Bidirectional Encoder Representations from Transformers) and feature fusion. First, the text is transformed into dynamic character-level embeddings by the BERT model; the output features of a BiLSTM (bidirectional long short-term memory) network and a CNN are then combined, exploiting the CNN's strength at extracting local features and the BiLSTM's memory for linking contextual features, so that the text is represented more fully and classification accuracy improves. A comparative study shows that the proposed method outperforms state-of-the-art methods in accuracy; it is particularly effective at predicting tags for text data with sequential features and salient local features.
DOI:10.1109/ICPADS47876.2019.00114
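The abstract describes a fusion architecture: BERT embeddings feed a BiLSTM branch and a CNN branch whose outputs are concatenated before classification. The following is a minimal PyTorch sketch of that idea, not the authors' code; the model name, layer sizes, and kernel widths are illustrative assumptions.

```python
# Sketch of a BERT + BiLSTM/CNN feature-fusion classifier, assuming the
# Hugging Face transformers library. Hyperparameters are hypothetical.
import torch
import torch.nn as nn
from transformers import BertModel


class BertBiLSTMCNNClassifier(nn.Module):
    def __init__(self, num_classes, bert_name="bert-base-chinese",
                 lstm_hidden=128, conv_channels=128, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size  # 768 for BERT-base
        # BiLSTM branch: captures sequential, contextual dependencies.
        self.bilstm = nn.LSTM(hidden, lstm_hidden, batch_first=True,
                              bidirectional=True)
        # CNN branch: 1-D convolutions over tokens capture local n-gram features.
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, conv_channels, k) for k in kernel_sizes
        )
        fused = 2 * lstm_hidden + conv_channels * len(kernel_sizes)
        self.classifier = nn.Linear(fused, num_classes)

    def forward(self, input_ids, attention_mask):
        # Dynamic character-level embeddings from BERT: (B, T, hidden).
        emb = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
        # BiLSTM branch: final forward and backward hidden states.
        _, (h_n, _) = self.bilstm(emb)                   # h_n: (2, B, lstm_hidden)
        lstm_feat = torch.cat([h_n[0], h_n[1]], dim=-1)  # (B, 2*lstm_hidden)
        # CNN branch: convolve over time, then global max-pool per filter.
        x = emb.transpose(1, 2)                          # (B, hidden, T)
        conv_feats = [torch.relu(conv(x)).max(dim=-1).values
                      for conv in self.convs]            # each (B, conv_channels)
        # Feature fusion: concatenate the two branches' representations.
        fused = torch.cat([lstm_feat] + conv_feats, dim=-1)
        return self.classifier(fused)
```

The forward pass returns raw logits, so training would pair it with a standard cross-entropy loss; the concatenation step is where the fusion of local (CNN) and contextual (BiLSTM) features happens.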