Radiology report generation via visual-semantic ambivalence-aware network and focal self-critical sequence training

Radiology report generation, which aims to provide accurate descriptions of both normal and abnormal regions, has been attracting growing research attention. Recently, despite considerable progress, data-driven deep-learning based models still face challenges in capturing and describing the abnormal...

Celý popis

Uloženo v:

Podrobná bibliografie
Vydáno v:	Neural networks Ročník 194; s. 108102
Hlavní autoři:	Yi, Xiulong, Fu, You, Bi, Enxu, Liang, Jianguo, Zhang, Hao, Yu, Jianzhi, Li, Qianqian, Hua, Rong, Wang, Rui
Médium:	Journal Article
Jazyk:	angličtina
Vydáno:	United States Elsevier Ltd 01.02.2026
Témata:	Image caption Radiology report generation Reinforcement learning Transformer Image caption Transformer Radiology report generation Reinforcement learning
ISSN:	0893-6080, 1879-2782, 1879-2782
On-line přístup:	Získat plný text
Tagy:	Přidat tag Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!

Popis
Shrnutí:	Radiology report generation, which aims to provide accurate descriptions of both normal and abnormal regions, has been attracting growing research attention. Recently, despite considerable progress, data-driven deep-learning based models still face challenges in capturing and describing the abnormalities, due to the data bias problem. To address this problem, we propose to generate radiology reports via the Visual-Semantic Ambivalence-Aware Network (VSANet) and the Focal Self-Critical Sequence Training (FSCST). In detail, our VSANet follows the encoder-decoder framework. In the encoder part, we first deploy a multi-grained abnormality extractor and a visual extractor to capture both semantic and visual features from given images, and then introduce a Parameter Shared Dual-way Encoder (PSDwE) to delve into the inter- and intra-relationships among these features. In the decoder part, we propose the Visual-Semantic Ambivalence-Aware (VSA) module to generate the abnormality-aware visual features to mitigate the data bias problem. In implementation, our VSA introduces three sub-modules: Dual-way Attention (DwA), introduced to generate both the word-related visual and semantic features; Dual-way Attention on Attention (DwAoA), designed to mitigate redundant information; Score-based Feature Fusion (SFF), constructed to fuse the visual and semantic features in an ambivalence way. We further introduce the FSCST to enhance the overall performance of our VSANet by allocating more attention toward difficult samples. Experimental results demonstrate that our proposal achieves superior performance on various evaluation metrics. Source code have released at https://github.com/SKD-HPC/VSANet.
Bibliografie:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23
ISSN:	0893-6080 1879-2782 1879-2782
DOI:	10.1016/j.neunet.2025.108102