Radiology report generation via visual-semantic ambivalence-aware network and focal self-critical sequence training
| Published in: | Neural Networks, Vol. 194, p. 108102 |
|---|---|
| Main Authors: | , , , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | United States: Elsevier Ltd, 01.02.2026 |
| Subjects: | |
| ISSN: | 0893-6080, 1879-2782 |
| Summary: | Radiology report generation, which aims to provide accurate descriptions of both normal and abnormal regions, has been attracting growing research attention. Despite considerable recent progress, data-driven, deep-learning-based models still struggle to capture and describe abnormalities because of the data bias problem. To address this problem, we propose to generate radiology reports via a Visual-Semantic Ambivalence-Aware Network (VSANet) and Focal Self-Critical Sequence Training (FSCST). In detail, our VSANet follows the encoder-decoder framework. In the encoder part, we first deploy a multi-grained abnormality extractor and a visual extractor to capture both semantic and visual features from the given images, and then introduce a Parameter Shared Dual-way Encoder (PSDwE) to model the inter- and intra-relationships among these features. In the decoder part, we propose the Visual-Semantic Ambivalence-Aware (VSA) module to generate abnormality-aware visual features that mitigate the data bias problem. In implementation, our VSA comprises three sub-modules: Dual-way Attention (DwA), which generates the word-related visual and semantic features; Dual-way Attention on Attention (DwAoA), which suppresses redundant information; and Score-based Feature Fusion (SFF), which fuses the visual and semantic features in an ambivalence-aware manner. We further introduce FSCST to enhance the overall performance of our VSANet by allocating more attention to difficult samples. Experimental results demonstrate that our proposal achieves superior performance on various evaluation metrics. The source code has been released at https://github.com/SKD-HPC/VSANet. (Illustrative sketches of SFF-style fusion and a focal SCST objective follow this record.) |
| DOI: | 10.1016/j.neunet.2025.108102 |
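
The summary above describes the Score-based Feature Fusion (SFF) sub-module only at a high level. The following is a minimal, hypothetical PyTorch sketch of one way such a fusion could work: a per-token score gates between the word-related visual and semantic features. The class name, the single linear scoring layer, and the sigmoid gate are assumptions for illustration, not the authors' implementation; the released code at https://github.com/SKD-HPC/VSANet contains the actual SFF.

```python
import torch
import torch.nn as nn

class ScoreBasedFeatureFusion(nn.Module):
    """Hypothetical sketch of a score-based fusion of visual and semantic features.

    Assumption: a scalar score per token, computed from both feature streams,
    decides how much the decoder relies on visual versus semantic evidence.
    """

    def __init__(self, d_model: int):
        super().__init__()
        # One mixing score per token, computed from the concatenated streams.
        self.score = nn.Linear(2 * d_model, 1)

    def forward(self, visual: torch.Tensor, semantic: torch.Tensor) -> torch.Tensor:
        # visual, semantic: (batch, seq_len, d_model), word-aligned upstream (e.g. by DwA/DwAoA).
        s = torch.sigmoid(self.score(torch.cat([visual, semantic], dim=-1)))
        # Convex combination: s near 1 favours visual evidence, near 0 favours semantic evidence.
        return s * visual + (1.0 - s) * semantic
```

For example, `ScoreBasedFeatureFusion(512)(vis, sem)` would fuse two `(batch, seq_len, 512)` streams into one tensor of the same shape.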
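Similarly, the record says only that Focal Self-Critical Sequence Training (FSCST) allocates more attention to difficult samples. Below is a hedged sketch that combines the standard self-critical sequence training objective (sampled reward minus a greedy-baseline reward) with a focal-style weight that up-weights low-reward, i.e. difficult, samples. The function name, the `(1 - r) ** gamma` weighting, and the assumption of a reward normalized to [0, 1] are illustrative choices, not the paper's definition.

```python
import torch

def focal_scst_loss(log_probs: torch.Tensor,
                    sample_reward: torch.Tensor,
                    baseline_reward: torch.Tensor,
                    gamma: float = 2.0) -> torch.Tensor:
    """Hypothetical focal variant of the self-critical sequence training loss.

    log_probs:       (batch,) summed log-probabilities of the sampled reports
    sample_reward:   (batch,) reward of the sampled reports, assumed normalized to [0, 1]
    baseline_reward: (batch,) reward of the greedy (baseline) reports

    Assumption: 'difficult' samples are those with low reward, and a focal factor
    (1 - r) ** gamma up-weights them; the exact weighting used by FSCST may differ.
    """
    advantage = sample_reward - baseline_reward                  # standard SCST advantage
    hardness = (1.0 - sample_reward.clamp(0.0, 1.0)) ** gamma    # focal weight on hard samples
    # REINFORCE-style objective: minimize the negative, hardness-weighted,
    # advantage-scaled log-likelihood of the sampled reports.
    return -(hardness * advantage * log_probs).mean()
```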