A new systematic review suggests that vision and vision-language foundation models could dramatically expand the role of artificial intelligence in ophthalmology – but the researchers warn that major challenges around bias, interpretability, and clinical integration still stand in the way of routine adoption.
Published in Advances in Ophthalmology Practice and Research, the review analyzed 10 studies published between 2023 and 2025 examining large-scale AI foundation models across retinal disease, glaucoma, and ocular oncology applications. Unlike conventional task-specific AI systems, foundation models are trained on massive multimodal datasets and can be adapted across multiple diagnostic tasks with comparatively limited labeled data.
The review authors describe ophthalmology as an “ideal landscape” for these technologies because the specialty depends on image-based diagnostics and complex decision-making, and because demand for automated clinical workflows is growing. Models reviewed in the paper were trained on datasets ranging from hundreds of thousands to millions of ophthalmic images, often combined with other data sources: optical coherence tomography (OCT), fundus photography, clinical reports, and electronic health record (EHR) data.
Several systems performed at a level approaching, and in some cases exceeding, that of experienced clinicians.
RETFound achieved an area under the curve (AUC) of 0.94 for diabetic retinopathy (DR) detection, and VisionFM reached AUCs of 0.974 for age-related macular degeneration (AMD) and 0.945 for DR. Models targeting glaucoma detection reported AUCs ranging from 0.721 to 0.913, while the ocular surface tumor model OSPM achieved AUCs as high as 0.993.
The review also highlighted the expanding role of multimodal AI. Models such as EyeCLIP and MetaGP combined imaging with clinical text and electronic health records to improve diagnostic reasoning, particularly in rare diseases. MetaGP reportedly outperformed GPT-4 in rare disease classification tasks by integrating multimodal imaging with EHR data.
Several models also performed well in “few-shot” and “zero-shot” settings, that is, with few or no labeled training examples of a given disease. The authors noted that this could be especially valuable for rare ophthalmic conditions, where annotated datasets remain scarce and early, accurate diagnosis is essential to preventing vision loss.
Despite the enthusiasm, the review urges caution. Most included studies relied on retrospective datasets and convenience sampling, raising concerns about whether the results generalize across patient populations and imaging devices. The authors also flagged algorithmic bias, computational demands, fragmented EHR interoperability, and the persistent “black-box” nature of deep learning systems as significant barriers to clinical deployment; the black-box problem makes it difficult for clinicians to understand the rationale behind a model’s predictions.
Although none of the reviewed foundation models have yet achieved widespread clinical deployment, the paper concludes that they may ultimately transform ophthalmic care by enabling scalable, multimodal, and generalizable AI systems capable of supporting diagnosis, prognosis, and clinical decision-making across diverse eye diseases.