DOMAS: DATA ORIENTED MEDICAL VISUAL QUESTION ANSWERING USING SWIN TRANSFORMER

Detailed bibliography
Published in:	Studia Universitatis Babes-Bolyai: Series Informatica, Volume 68, Issue 1
Main author:	Teodora-Alexandra TOADER
Format:	Journal Article
Language:	English
Publication details:	Babes-Bolyai University, Cluj-Napoca, 20.07.2023
Subjects:	Medical Visual Question Answering, Swin Transformer
ISSN:	2065-9601

Description
Abstract:	Medical Visual Question Answering is a joint Computer Vision and Natural Language Processing task that aims to answer, in natural language, a natural-language question about an image. Both the image and the question are medical in nature. In this paper, we introduce DOMAS, a deep learning model that solves this task on the Med-VQA 2019 dataset. The method divides the task into smaller classification problems, using a BERT-based question classifier and a unique approach that uses dataset information to select the suitable image classification model. For the image classification problems, transfer learning was performed using a pre-trained Swin Transformer-based architecture. DOMAS combines a question classifier and seven image classifiers with the image classifier selection strategy and achieves a strict accuracy of 0.616 and a BLEU score of 0.654. The results are competitive with other state-of-the-art models, showing that the approach is effective for the presented task.
DOI:	10.24193/subbi.2023.1.04
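The abstract describes a pipeline in which a question classifier decides which of several image classifiers should produce the answer, scored by strict accuracy. The sketch below illustrates that routing idea only; it is not the author's code. The names `answer`, `strict_accuracy`, and the `toy_*` stand-ins are hypothetical, the selection rule is a placeholder for the paper's dataset-based strategy, and strict accuracy is assumed to be a case-insensitive exact match.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: DOMAS uses a BERT-based question classifier and
# fine-tuned Swin Transformer image classifiers; here they are plain callables.
QuestionClassifier = Callable[[str], str]
ImageClassifier = Callable[[object], str]
Selector = Callable[[str, str], str]


def answer(
    image: object,
    question: str,
    classify_question: QuestionClassifier,
    select_classifier: Selector,
    image_classifiers: Dict[str, ImageClassifier],
) -> str:
    """Route an (image, question) pair to one image classifier; its label is the answer."""
    category = classify_question(question)        # question-type label
    key = select_classifier(category, question)   # placeholder selection rule (assumption)
    return image_classifiers[key](image)


def strict_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions that exactly match the reference (case-insensitive, assumed)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(p.strip().lower() == r.strip().lower() for p, r in zip(predictions, references))
    return hits / len(predictions)


# Toy, hypothetical components for illustration only.
toy_classifiers: Dict[str, ImageClassifier] = {
    "modality": lambda img: "ct",
    "plane": lambda img: "axial",
}


def toy_question_classifier(question: str) -> str:
    # Keyword rule standing in for a learned question classifier.
    return "plane" if "plane" in question.lower() else "modality"


def toy_select(category: str, question: str) -> str:
    # Trivial rule: one classifier per question category.
    return category


if __name__ == "__main__":
    pred = answer(None, "In what plane is this image taken?",
                  toy_question_classifier, toy_select, toy_classifiers)
    print(pred)  # axial
    print(strict_accuracy(["axial", "mri"], ["axial", "ct"]))  # 0.5
```

Keeping the selector separate from the question classifier mirrors the abstract's split between question classification and the image classifier selection strategy, so either part could be swapped independently.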