Radiology report generation via visual-semantic ambivalence-aware network and focal self-critical sequence training
Radiology report generation, which aims to provide accurate descriptions of both normal and abnormal regions, has been attracting growing research attention. Recently, despite considerable progress, data-driven deep-learning based models still face challenges in capturing and describing the abnormal...
Uloženo v:
| Vydáno v: | Neural networks Ročník 194; s. 108102 |
|---|---|
| Hlavní autoři: | , , , , , , , , |
| Médium: | Journal Article |
| Jazyk: | angličtina |
| Vydáno: |
United States
Elsevier Ltd
01.02.2026
|
| Témata: | |
| ISSN: | 0893-6080, 1879-2782, 1879-2782 |
| On-line přístup: | Získat plný text |
| Tagy: |
Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
|
| Shrnutí: | Radiology report generation, which aims to provide accurate descriptions of both normal and abnormal regions, has been attracting growing research attention. Recently, despite considerable progress, data-driven deep-learning based models still face challenges in capturing and describing the abnormalities, due to the data bias problem. To address this problem, we propose to generate radiology reports via the Visual-Semantic Ambivalence-Aware Network (VSANet) and the Focal Self-Critical Sequence Training (FSCST). In detail, our VSANet follows the encoder-decoder framework. In the encoder part, we first deploy a multi-grained abnormality extractor and a visual extractor to capture both semantic and visual features from given images, and then introduce a Parameter Shared Dual-way Encoder (PSDwE) to delve into the inter- and intra-relationships among these features. In the decoder part, we propose the Visual-Semantic Ambivalence-Aware (VSA) module to generate the abnormality-aware visual features to mitigate the data bias problem. In implementation, our VSA introduces three sub-modules: Dual-way Attention (DwA), introduced to generate both the word-related visual and semantic features; Dual-way Attention on Attention (DwAoA), designed to mitigate redundant information; Score-based Feature Fusion (SFF), constructed to fuse the visual and semantic features in an ambivalence way. We further introduce the FSCST to enhance the overall performance of our VSANet by allocating more attention toward difficult samples. Experimental results demonstrate that our proposal achieves superior performance on various evaluation metrics. Source code have released at https://github.com/SKD-HPC/VSANet. |
|---|---|
| Bibliografie: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| ISSN: | 0893-6080 1879-2782 1879-2782 |
| DOI: | 10.1016/j.neunet.2025.108102 |