Learning Regular Expressions for Interpretable Medical Text Classification Using a Pool-based Simulated Annealing Approach
In this paper, we propose a rule-based engine composed of high-quality and interpretable regular expressions for medical text classification. The regular expressions are autogenerated by a constructive heuristic method and optimized using a Pool-based Simulated Annealing (PSA) approach. Although exi...
Saved in:
| Published in: | 2020 IEEE Congress on Evolutionary Computation (CEC) pp. 1 - 7 |
|---|---|
| Main Authors: | , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: |
IEEE
01.07.2020
|
| Subjects: | |
| Online Access: | Get full text |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | In this paper, we propose a rule-based engine composed of high-quality and interpretable regular expressions for medical text classification. The regular expressions are autogenerated by a constructive heuristic method and optimized using a Pool-based Simulated Annealing (PSA) approach. Although existing Deep Neural Network (DNN) methods present highquality performance in most Natural Language Processing (NLP) applications, the solutions are regarded as uninterpretable "black boxes" to humans. Therefore, rule-based methods are often introduced when interpretable solutions are needed, especially in the medical field. However, the construction of regular expressions can be extremely labor-intensive for large data sets. This research aims to reduce the manual efforts while maintaining high-quality solutions. The Pool-based Simulated Annealing method is proposed to automatically optimize the performance of machine-generated regular expressions without human interference. The proposed method is tested on real-life data provided by one of China's largest online medical platforms. Experimental results show that the proposed PSA method further improves the performance of initial machine-generated regular expressions compared with other meta-heuristics such as Genetic Programming. We also believe that the proposed method can serve as a vital complementary tool for the existing machine learning approaches in text classification applications when high levels of interpretability of the solutions are required. |
|---|---|
| DOI: | 10.1109/CEC48606.2020.9185650 |