ChatGPT-4 versus human generated multiple choice questions - A study from a medical college in Pakistan
DOI: https://doi.org/10.53685/jshmdc.v5i2.253
Keywords: Artificial intelligence, Multiple choice questions, Undergraduate medical examination, ChatGPT-4
Abstract
Background: There has been growing interest in using artificial intelligence (AI)-generated multiple choice questions (MCQs) to supplement traditional assessments. While AI tools are claimed to generate higher-order questions, few studies have focused on undergraduate medical education assessment in Pakistan.
Objective: To compare the quality of human-developed and ChatGPT-4-generated MCQs for the final-year MBBS written MCQ examination.
Methods: This observational study compared ChatGPT-4-generated and human-developed MCQs in four specialties: Pediatrics, Obstetrics and Gynecology (Ob/Gyn), Surgery, and Medicine. Based on the table of specifications, 204 MCQs were generated with ChatGPT-4 and 196 MCQs were retrieved from the medical college's question bank. All MCQs were anonymized, and item quality was scored using a checklist based on the National Board of Medical Examiners (NBME) criteria. Data were analyzed using SPSS version 23, and the Mann-Whitney U and Chi-square tests were applied.
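For readers who want to reproduce the abstract's comparisons outside SPSS, the sketch below shows the same two tests in Python with SciPy. The score lists and the contingency table are hypothetical placeholders invented for illustration; they are not the study's data.

```python
# Minimal sketch of the study's two statistical tests using SciPy.
# All numbers below are hypothetical placeholders, not the study's data.
from scipy.stats import chi2_contingency, mannwhitneyu

human_scores = [12, 14, 13, 15, 11, 14]    # hypothetical checklist totals
chatgpt_scores = [11, 13, 12, 14, 10, 13]  # hypothetical checklist totals

# Mann-Whitney U: compares total quality scores between the two groups.
u_stat, p_total = mannwhitneyu(human_scores, chatgpt_scores, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_total:.3f}")

# Chi-square: compares how often each group satisfies one checklist item.
# Rows: human, ChatGPT-4; columns: item satisfied, item not satisfied.
contingency = [[80, 18], [70, 28]]  # hypothetical counts
chi2, p_item, dof, _ = chi2_contingency(contingency)
print(f"Chi-square = {chi2:.2f}, df = {dof}, p = {p_item:.3f}")
```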
Results: Of the 400 MCQs, 396 were included in the final review; four were excluded because they did not conform to the table of specifications. Total scores did not differ significantly between human-developed and ChatGPT-4-generated MCQs (p = 0.12). However, human-developed MCQs performed significantly better than ChatGPT-4-generated MCQs in Ob/Gyn (p = 0.03). Human-developed MCQs also scored better on the checklist item "the stem includes the necessary details for answering the question" in Ob/Gyn and Pediatrics (p < 0.05), and on "is the item appropriate for the cover-the-options rule?" in Surgery.
Conclusion: With well-structured and specific prompting, ChatGPT-4 has the potential to assist in developing MCQs for medical examinations. However, it has limitations where in-depth, context-specific item generation is required.
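As an illustration of what "well-structured and specific prompting" can look like in practice, the sketch below requests one blueprint-aligned MCQ from ChatGPT-4 through the OpenAI Python client. The blueprint fields and prompt wording are assumptions for demonstration, not the prompt used in this study.

```python
# Illustrative only: one plausible way to prompt ChatGPT-4 for an MCQ
# aligned with a table of specifications. The blueprint values and prompt
# wording are assumptions, not the study's actual protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

blueprint = {
    "specialty": "Obstetrics and Gynecology",  # hypothetical blueprint row
    "topic": "postpartum hemorrhage",
    "cognitive_level": "application (C3)",
}

prompt = (
    f"Write one single-best-answer MCQ for a final-year MBBS examination in "
    f"{blueprint['specialty']} on {blueprint['topic']} at the "
    f"{blueprint['cognitive_level']} level. Follow NBME item-writing rules: "
    "a self-contained clinical vignette stem that includes all details needed "
    "to answer, five homogeneous options, and one unambiguous correct answer. "
    "The stem must be answerable with the options covered."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```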
License
Copyright (c) 2024 Muhammad Ahsan Naseer, Yusra Nasir, Afifa Tabassum, Sobia Ali
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.