DOMAS: DATA ORIENTED MEDICAL VISUAL QUESTION ANSWERING USING SWIN TRANSFORMER
The Medical Visual Question Answering problem is a joined Computer Vision and Natural Language Processing task that aims to obtain answers in natural language to a question, posed in natural language as well, regarding an image. Both the image and question are of a medical nature. In this paper, we...
Gespeichert in:
| Veröffentlicht in: | Studia Universitatis Babes-Bolyai: Series Informatica Jg. 68; H. 1 |
|---|---|
| 1. Verfasser: | |
| Format: | Journal Article |
| Sprache: | Englisch |
| Veröffentlicht: |
Babes-Bolyai University, Cluj-Napoca
20.07.2023
|
| Schlagworte: | |
| ISSN: | 2065-9601 |
| Online-Zugang: | Volltext |
| Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
| Zusammenfassung: | The Medical Visual Question Answering problem is a joined Computer Vision and Natural Language Processing task that aims to obtain answers in natural language to a question, posed in natural language as well, regarding an image. Both the image and question are of a medical nature. In this paper, we introduce DOMAS, a deep learning model that solves this task on the Med-VQA 2019 dataset. The method is based on dividing the task into smaller classification problems by using a BERT-based question classification and a unique approach that makes use of dataset information for selecting the suited model. For the image classification problems, transfer learning was performed by using a pre-trained Swin Transform based architecture. DOMAS uses a question classifier and seven image classifiers along with the image classifier selection strategy and achieves 0.616 strict accuracy and 0.654 BLUE score. The results are competitive with other state-of-the-art models, proving that our approach is effective in solving the presented task. |
|---|---|
| ISSN: | 2065-9601 |
| DOI: | 10.24193/subbi.2023.1.04 |