Multimodal Distillation Pre-Training Model for Ultrasound Dynamic Images Annotation

Bibliographic Details
Published in: IEEE Journal of Biomedical and Health Informatics, Vol. 29, No. 5, pp. 3124-3136
Main Authors: Chen, Xiaojun; Ke, Jia; Zhang, Yaning; Gou, Jianping; Shen, Anna; Wan, Shaohua
Format: Journal Article
Language: English
Published: United States, IEEE, 1 May 2025
ISSN: 2168-2194, 2168-2208
Description
Abstract: With the development of medical technology, ultrasonography has become an important diagnostic method in clinical practice. However, unlike static medical imaging such as CT and MRI, which has a larger body of research, ultrasonography produces dynamic, video-like images that are captured by a probe moving in real time. How to process video data in the medical field and extract textual semantics from medical video across modalities therefore remains a difficult open problem. To address this, the paper proposes a pre-training model based on multimodal distillation and fusion encoding for modeling the semantic relationship between ultrasound dynamic images and text. First, a fusion encoder combines the visual geometric features of tissues and organs in ultrasound dynamic images, descriptive features of overall visual appearance, and named-entity linguistic features into a unified visual-linguistic feature, giving the model a richer ability to aggregate and align visual and linguistic cues. Then, the pre-training model is augmented with multimodal knowledge distillation to improve its learning ability. Experimental results on multiple datasets show that the multimodal distillation pre-training model generally improves the fusion of the various feature types in ultrasound dynamic images and achieves automated, accurate annotation of ultrasound dynamic images.
DOI: 10.1109/JBHI.2024.3438254