Learning Regular Expressions for Interpretable Medical Text Classification Using a Pool-based Simulated Annealing Approach

Bibliographic Details
Published in:	2020 IEEE Congress on Evolutionary Computation (CEC), pp. 1-7
Main Authors:	Tu, Chaofan; Cui, Menglin
Format:	Conference Proceeding
Language:	English
Published:	IEEE, July 2020
Subjects:	Knowledge engineering; Machine learning; Medical diagnostic imaging; Medical services; medical text classification; Neural networks; regular expression; Simulated annealing; Task analysis

Description
Summary:	In this paper, we propose a rule-based engine composed of high-quality, interpretable regular expressions for medical text classification. The regular expressions are generated automatically by a constructive heuristic method and then optimized using a Pool-based Simulated Annealing (PSA) approach. Although existing Deep Neural Network (DNN) methods achieve high performance in most Natural Language Processing (NLP) applications, their solutions are regarded by humans as uninterpretable "black boxes". Rule-based methods are therefore often used when interpretable solutions are needed, especially in the medical field. However, constructing regular expressions by hand can be extremely labor-intensive for large data sets. This research aims to reduce that manual effort while maintaining high-quality solutions. The PSA method is proposed to automatically optimize the performance of machine-generated regular expressions without human intervention. The proposed method is tested on real-life data provided by one of China's largest online medical platforms. Experimental results show that the proposed PSA method further improves the performance of the initial machine-generated regular expressions compared with other meta-heuristics such as Genetic Programming. We also believe that the proposed method can serve as a vital complementary tool to existing machine learning approaches in text classification applications when solutions must be highly interpretable.
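The summary describes the pipeline only at a high level: a rule set of regular expressions is built constructively, then refined by simulated annealing over a pool of candidates. The Python sketch below illustrates that general idea under explicit assumptions. The toy data, the greedy keyword-based `constructive_rules`, the add/drop/swap move set in `neighbor`, the use of F1 as the fitness, and the reading of "pool-based" as several independently annealed rule sets whose overall best is kept are all hypothetical choices. None of them are details taken from the paper.

```python
import math
import random
import re

# Toy labeled snippets (1 = target class). Placeholder data, NOT from the paper.
DATA = [
    ("patient reports chest pain", 1),
    ("sharp chest pain at night", 1),
    ("palpitations and chest tightness", 1),
    ("mild skin rash on arm", 0),
    ("itchy rash after medication", 0),
    ("headache and fever", 0),
]
VOCAB = sorted({w for text, _ in DATA for w in text.split()})


def word_rule(word):
    """Turn a keyword into a whole-word regular expression."""
    return r"\b" + re.escape(word) + r"\b"


def classify(rules, text):
    """Rule engine: predict the target class if any regex in the set matches."""
    return int(any(re.search(r, text) for r in rules))


def f1_score(rules, data):
    """Fitness of a rule set: F1 on labeled data (an assumed objective)."""
    tp = fp = fn = 0
    for text, label in data:
        pred = classify(rules, text)
        if pred and label:
            tp += 1
        elif pred:
            fp += 1
        elif label:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def constructive_rules(data, vocab, max_rules=3):
    """Hypothetical greedy construction: add the keyword rule that most raises F1."""
    rules = []
    for _ in range(max_rules):
        base = f1_score(rules, data)
        score, word = max((f1_score(rules + [word_rule(w)], data), w) for w in vocab)
        if score <= base:
            break  # no keyword improves the current rule set
        rules.append(word_rule(word))
    return rules


def neighbor(rules, vocab, rng):
    """Hypothetical move set: add, drop, or replace one keyword rule."""
    new = list(rules)
    move = rng.choice(("add", "drop", "swap")) if new else "add"
    if move == "add":
        new.append(word_rule(rng.choice(vocab)))
    elif move == "drop":
        new.pop(rng.randrange(len(new)))
    else:
        new[rng.randrange(len(new))] = word_rule(rng.choice(vocab))
    return new


def pool_sa(pool, data, vocab, t0=0.5, cooling=0.95, steps=200, seed=0):
    """Anneal every rule set in the pool; return the best one ever seen."""
    rng = random.Random(seed)
    pool = [list(rules) for rules in pool]
    scores = [f1_score(rules, data) for rules in pool]
    best_score = max(scores)
    best = list(pool[scores.index(best_score)])
    t = t0
    for _ in range(steps):
        for i, rules in enumerate(pool):
            cand = neighbor(rules, vocab, rng)
            cand_score = f1_score(cand, data)
            delta = cand_score - scores[i]
            # Metropolis rule: keep improvements, sometimes accept a worse set.
            if delta >= 0 or rng.random() < math.exp(delta / t):
                pool[i], scores[i] = cand, cand_score
                if cand_score > best_score:
                    best, best_score = list(cand), cand_score
        t *= cooling  # geometric cooling schedule (assumed)
    return best, best_score


best, score = pool_sa([[word_rule("pain")], [word_rule("rash")]], DATA, VOCAB)
print(score, best)
```

The returned list of regular expressions is the entire model, so a reader can see exactly which pattern caused each prediction. This is the interpretability property the summary emphasizes. A faithful reproduction would require replacing the assumed fitness, move operators, and cooling schedule with those described in the full paper.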
DOI:	10.1109/CEC48606.2020.9185650