Evaluating and comparing student responses in examinations from the perspectives of human and artificial intelligence (GPT-4 and Gemini)
Saved in:
| Title: | Evaluating and comparing student responses in examinations from the perspectives of human and artificial intelligence (GPT-4 and Gemini) |
|---|---|
| Authors: | Kubra Yildiz Domanic, Sukran Baycan |
| Source: | BMC Medical Education, Vol 25, Iss 1, Pp 1-6 (2025) |
| Publisher information: | BMC, 2025. |
| Publication year: | 2025 |
| Collection: | LCC: Special aspects of education; LCC: Medicine |
| Subjects: | ChatGPT, GPT-4, Gemini, Artificial intelligence, Dental education, Assessment methods, Special aspects of education, LC8-6691, Medicine |
| Description: | **Background:** Generative Artificial Intelligence (AI) models, such as ChatGPT (GPT-4) and Gemini, offer potential benefits in educational settings, including dental education. These tools have shown promise in enhancing learning and assessment processes, particularly in dental prosthetic technology (DPT) and oral health (OH) programs. **Objective:** This study aimed to evaluate the accuracy, reliability, and consistency of the GPT-4 and Gemini AI models in answering examination questions in dental education, focusing on multiple-choice questions (MCQs), true/false (T/F) questions, and short-answer questions (SAQs). **Methods:** An exploratory study design was used with 30 questions (10 MCQs, 10 T/F, and 10 SAQs) covering key topics in DPT and OH education. ChatGPT and Gemini were tested with the same set of questions on two separate occasions to assess consistency. Responses were evaluated by two independent researchers using a predefined answer key. Data were analyzed using descriptive statistics, the Kappa coefficient for agreement, and the Chi-square test for categorical variables. **Results:** ChatGPT demonstrated high accuracy on MCQs (90%) and T/F questions (85%) but reduced performance on SAQs (60%). Gemini’s accuracy ranged between 60% and 70%, with its highest accuracy on SAQs (70%). ChatGPT showed significant consistency across testing dates (Kappa = 0.754; p = 0.001), whereas Gemini’s responses were less consistent (Kappa = 0.634; p = 0.001). **Conclusion:** While both AI models offer valuable support in dental education, ChatGPT exhibited greater accuracy and consistency in structured assessments. The findings suggest that AI tools can enhance teaching and assessment methods if integrated thoughtfully, supporting personalized learning while maintaining academic integrity. |
| Document type: | article |
| File description: | electronic resource |
| Language: | English |
| ISSN: | 1472-6920 |
| Relation: | https://doaj.org/toc/1472-6920 |
| DOI: | 10.1186/s12909-025-07835-y |
| Access URL: | https://doaj.org/article/4fb9baad619745eb9d84c0ddf49a960a |
| Accession number: | edsdoj.4fb9baad619745eb9d84c0ddf49a960a |
| Database: | Directory of Open Access Journals |
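
For readers curious about the agreement statistics named in the Methods, the sketch below shows how Cohen's kappa and a Chi-square test are typically computed with standard Python libraries. All scores, table values, and variable names (`run1`, `run2`, `table`) are hypothetical illustrations under assumed binary scoring (1 = correct, 0 = incorrect); this is not the authors' code or data.

```python
# Illustrative sketch (not the study's analysis): measuring test-retest
# consistency of one model's correctness scores with Cohen's kappa, and
# comparing two models' correct/incorrect counts with a Chi-square test.
from sklearn.metrics import cohen_kappa_score
from scipy.stats import chi2_contingency

# Hypothetical correctness scores for the same questions answered on
# two separate occasions (1 = correct, 0 = incorrect).
run1 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
run2 = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]

# Kappa quantifies agreement between the two runs beyond chance; values
# around 0.75, as reported for ChatGPT, indicate substantial agreement.
kappa = cohen_kappa_score(run1, run2)
print(f"Cohen's kappa across testing dates: {kappa:.3f}")

# Chi-square test on a 2x2 contingency table of model x correctness
# (rows: hypothetical correct/incorrect counts for each model).
table = [[27, 3],
         [20, 10]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"Chi-square = {chi2:.2f}, p = {p:.3f}")
```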