Multimodal Distillation Pre-Training Model for Ultrasound Dynamic Images Annotation

Detailed bibliography
Published in:	IEEE Journal of Biomedical and Health Informatics, Vol. 29, No. 5, pp. 3124-3136
Main authors:	Chen, Xiaojun; Ke, Jia; Zhang, Yaning; Gou, Jianping; Shen, Anna; Wan, Shaohua
Format:	Journal Article
Language:	English
Published:	United States: IEEE, 1 May 2025
Subjects:	Algorithms; Annotation; Annotations; Biomedical imaging; Computational modeling; Humans; Image Interpretation, Computer-Assisted - methods; Image Processing, Computer-Assisted - methods; knowledge distillation; Medical diagnostic imaging; multimodal distillation pre-training model; Semantics; Training; transformer; Ultrasonography - methods; ultrasound dynamic image; Visualization
ISSN:	2168-2194, 2168-2208

Description
Abstract:	With the development of medical technology, ultrasonography has become an important diagnostic method in clinical practice. Compared with static medical images such as CT and MRI, whose processing has a larger research base, ultrasonography produces dynamic, video-like images captured in real time by a moving probe. How to handle such video data in the medical domain, and how to extract textual semantics across modalities from medical video, is therefore a difficult problem that needs further research. To address it, this paper proposes a multimodal distillation and fusion-encoding pre-training model that captures the semantic relationship between ultrasound dynamic images and text. First, a fusion encoder combines the visual geometric features of tissues and organs in ultrasound dynamic images, the overall visual-appearance descriptive features, and the named-entity linguistic features into a unified visual-linguistic feature, giving the model richer aggregation and alignment of visual and linguistic cues. The pre-training model is then augmented with multimodal knowledge distillation to improve its learning ability. Experimental results on multiple datasets show that the multimodal distillation pre-training model generally improves the fusion of the various feature types in ultrasound dynamic images and achieves automated, accurate annotation of ultrasound dynamic images.
DOI:	10.1109/JBHI.2024.3438254
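The abstract says the pre-training model is augmented with multimodal knowledge distillation but does not define the objective. As an illustration only, the sketch below implements the widely used soft-target (Hinton-style) distillation loss: a temperature-softened KL divergence between teacher and student outputs, mixed with an ordinary hard-label cross-entropy. This is an assumed, generic formulation, not the authors' method; the names `softmax`, `kd_loss`, `temperature`, and `alpha` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Divide by the temperature, then subtract the max for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    # Generic soft-target distillation loss (assumption, not the paper's exact objective).
    # Soft term: KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so its gradient magnitude stays comparable to the hard term.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    # Hard term: cross-entropy of the student (at T = 1) against the ground-truth label.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    print(kd_loss([1.0, 2.0, 3.0], teacher, label=2))  # student agrees with teacher
    print(kd_loss([3.0, 2.0, 1.0], teacher, label=2))  # student disagrees: larger loss
```

When the student's logits match the teacher's exactly, the soft term vanishes and the loss reduces to `alpha` times the cross-entropy; the further the student's softened distribution drifts from the teacher's, the larger the soft term becomes.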