A study published in Vision has tested the capabilities of ChatGPT models against human candidates sitting the European Board of Ophthalmology Diploma (EBOD) examination, and the results reveal both promise and limitations for artificial intelligence (AI) in medical education.
Researchers evaluated ChatGPT-3.5 Turbo and ChatGPT-4o using over 2,200 true/false statements and 48 single best answer (SBA) questions sourced from actual EBOD exams held between 2012 and 2023. ChatGPT-4o achieved an impressive 80.4 percent accuracy on the multiple-choice questions (MCQs), surpassing the pass mark and performing comparably to human candidates. In contrast, ChatGPT-3.5 scored 63.2 percent, slightly below the typical passing threshold. Both models performed best on text-based pathology and retina-related questions, with weaker results in optics and refraction.
However, AI's performance dropped dramatically on the SBA questions. ChatGPT-3.5 scored just 28.4 percent, and ChatGPT-4o came in slightly lower at 24.1 percent; both significantly underperformed the average human candidate. SBA questions often demand higher-order clinical reasoning and the ability to discriminate between closely related options, skills that current AI models struggle to replicate.
Interestingly, ChatGPT-4o answered all the easiest MCQs correctly but fared worse than ChatGPT-3.5 on the most challenging ones. This highlights a trade-off: newer models may excel in general knowledge, but they are not necessarily better at complex, ambiguous reasoning.
The study suggests that while ChatGPT can retrieve and interpret structured knowledge, its integration of nuanced clinical judgment remains limited. The authors conclude that ChatGPT is not yet ready to replace human judgment in high-stakes medical assessments, but that rapid advances in large language models suggest its role in ophthalmic education will continue to expand.