Context-aware automated ICD coding: A semantic-driven approach

Bibliographic details
Published in:	Information Systems (Oxford), Vol. 132; p. 102539
Main authors:	Reshma, O.K., Saleena, N., Nazeer, K.A. Abdul
Format:	Journal Article
Language:	English
Published:	Elsevier Ltd 01.07.2025
Subjects:	Bidirectional Long Short Term Memory Network; Electronic Health Records; ICD code; Semantic Textual Similarity; Siamese network
ISSN:	0306-4379

Description
Summary:	Identifying the exact International Classification of Diseases (ICD) codes describing a patient’s health condition is essential in classifying patients with similar disease conditions. Numerous studies have devised automated approaches to retrieve the ICD codes from patients’ health records. However, the majority of these methodologies have considered ICD codes solely as alphanumeric codes, overlooking their descriptions and thus neglecting the inherent semantics. These methodologies also overlook the one-to-many semantic relationships between a diagnosis and the descriptions of its assigned ICD codes. Consequently, they cannot effectively assign ICD codes with meaningful context. This work addresses these limitations by capturing the semantic similarity between the diagnosis and ICD code descriptions, while utilising the inherent one-to-many relationships between them, to accurately assign ICD codes. For this, we formulate the ICD coding problem as a Semantic Textual Similarity task. The proposed approach uses a siamese stacked Bi-LSTM network to learn context-aware representations of diagnoses and ICD code descriptions. We transform the data of each patient visit into sentence pairs by considering the one-to-many relationships between the diagnosis and the assigned ICD code descriptions. Further, we compute their semantic similarity and classify them as similar or dissimilar. The proposed approach was evaluated using 5-fold cross-validation on the MIMIC-III dataset and achieved the highest evaluation metric scores (F1-score 0.66, precision 0.67, recall 0.84) compared with other sequential models. The per-label evaluation demonstrates the performance of the proposed approach for each ICD code. Furthermore, the proposed approach outperformed several existing attention-based models, demonstrating the potential use of semantics in automated ICD coding.
• Semantics underlying the ICD codes is significant in automated ICD coding.
• Formulates automated ICD coding problem as a Semantic Textual Similarity task.
• Captures semantic similarity between diagnosis and ICD code descriptions using a siamese Bi-LSTM model.
• One-to-many relationships between the descriptions contribute to the effectiveness of automated ICD coding.
DOI:	10.1016/j.is.2025.102539
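
The summary describes turning each patient visit into sentence pairs (diagnosis, ICD code description) that are then classified as similar or dissimilar. The following is a minimal sketch of that pairing step only, not the authors' implementation. The function name `build_sentence_pairs`, the toy catalogue `CODE_DESCRIPTIONS`, the sample diagnosis text, and the choice to treat every unassigned code in the catalogue as a dissimilar pair are all assumptions made for illustration; the record does not say how dissimilar pairs were generated.

```python
from typing import Dict, List, Tuple

# Toy catalogue of ICD-9 codes and descriptions, for illustration only.
# A real system would load the full official code table instead.
CODE_DESCRIPTIONS: Dict[str, str] = {
    "4280": "Congestive heart failure, unspecified",
    "4019": "Unspecified essential hypertension",
    "5849": "Acute kidney failure, unspecified",
}


def build_sentence_pairs(
    diagnosis: str,
    assigned_codes: List[str],
    code_descriptions: Dict[str, str],
) -> List[Tuple[str, str, int]]:
    """Pair one diagnosis text with every code description (hypothetical helper).

    One diagnosis can map to several assigned codes (one-to-many), so each
    assigned code yields its own pair labelled 1 (similar). Assumption: every
    code in the catalogue that was not assigned yields a pair labelled 0.
    """
    assigned = set(assigned_codes)
    pairs: List[Tuple[str, str, int]] = []
    for code, description in code_descriptions.items():
        label = 1 if code in assigned else 0
        pairs.append((diagnosis, description, label))
    return pairs


if __name__ == "__main__":
    # Hypothetical patient-visit diagnosis text with two assigned codes.
    visit_pairs = build_sentence_pairs(
        "heart failure with long-standing high blood pressure",
        ["4280", "4019"],
        CODE_DESCRIPTIONS,
    )
    for diagnosis_text, description_text, label in visit_pairs:
        print(label, "|", diagnosis_text, "|", description_text)
```

Pairs in this form could be fed to a siamese encoder, where both sentences pass through the same network and the similarity of the two representations drives the similar/dissimilar decision.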