DOMAS: DATA ORIENTED MEDICAL VISUAL QUESTION ANSWERING USING SWIN TRANSFORMER

Bibliographic details
Published in: Studia Universitatis Babes-Bolyai: Series Informatica, Volume 68, Issue 1
Main author: Teodora-Alexandra TOADER
Format: Journal Article
Language: English
Publication details: Babes-Bolyai University, Cluj-Napoca, 20.07.2023
ISSN:2065-9601
Description
Summary: The Medical Visual Question Answering problem is a joint Computer Vision and Natural Language Processing task that aims to obtain a natural-language answer to a natural-language question about an image. Both the image and the question are of a medical nature. In this paper, we introduce DOMAS, a deep learning model that solves this task on the Med-VQA 2019 dataset. The method divides the task into smaller classification problems by using a BERT-based question classifier and a unique approach that uses dataset information to select the suitable model. For the image classification problems, transfer learning was performed using a pre-trained Swin Transformer-based architecture. DOMAS uses a question classifier and seven image classifiers along with the image classifier selection strategy and achieves 0.616 strict accuracy and a 0.654 BLEU score. The results are competitive with other state-of-the-art models, showing that our approach is effective in solving the presented task.
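
The routing idea in the summary (a question classifier picks which image classifier produces the answer) can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the function names, the category labels, and the keyword-based stand-in for the BERT question classifier are all hypothetical, and the lambda stubs stand in for the Swin Transformer image classifiers.

```python
from typing import Callable, Dict

# Hypothetical type aliases: a question classifier maps text to a category,
# an image classifier maps raw image bytes to an answer label.
QuestionClassifier = Callable[[str], str]
ImageClassifier = Callable[[bytes], str]


def toy_question_classifier(question: str) -> str:
    """Keyword stand-in for a learned question classifier (assumption)."""
    q = question.lower()
    for category in ("modality", "plane"):
        if category in q:
            return category
    return "other"


def answer(question: str, image: bytes,
           classify_question: QuestionClassifier,
           image_classifiers: Dict[str, ImageClassifier]) -> str:
    """Send the image to the classifier selected by the question category."""
    category = classify_question(question)
    if category not in image_classifiers:
        raise KeyError(f"no image classifier registered for {category!r}")
    return image_classifiers[category](image)


# Stub image classifiers with fixed outputs, for illustration only.
stub_classifiers: Dict[str, ImageClassifier] = {
    "modality": lambda image: "ct",
    "plane": lambda image: "axial",
}
```

Calling `answer("In what plane is this image?", b"", toy_question_classifier, stub_classifiers)` dispatches to the stub registered under `"plane"`. In a real system each dictionary entry would wrap a trained model, and the selection step could also consult dataset information as the summary describes.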
DOI:10.24193/subbi.2023.1.04