A new international expert consensus has laid out the first comprehensive framework for the classification, annotation, and quality control of dry eye imaging datasets designed for artificial intelligence (AI) applications, addressing one of the field’s most significant barriers to clinical translation: data standardization.
Published in Intelligent Medicine, the 2025 consensus brings together ophthalmologists, imaging specialists, and AI researchers from China, Hong Kong, Singapore, the UK, and Europe to establish guidance for creating high-quality datasets that can support the development of reliable AI tools for dry eye diagnosis and management.
Dry eye disease remains one of the most common ocular surface disorders worldwide, with prevalence continuing to rise, driven in part by aging populations, increased screen use, sleep disorders, and environmental factors. While AI has shown considerable promise in ophthalmic imaging, the authors argue that progress in dry eye has been hindered by the lack of consistent standards for image annotation and classification, which both limits the development of these models and delays their wider application in clinical practice.
“High-quality data annotation” is identified as a prerequisite for robust AI model development. Without standardized datasets, variability between institutions, devices, and annotators can limit both algorithm performance and clinical adoption.
The consensus focuses on five major imaging modalities commonly used in dry eye assessment: tear film lipid layer imaging, tear meniscus height (TMH), tear film breakup time (TBUT), corneal fluorescein staining (CFS), and meibomian gland imaging.
For each modality, the document provides detailed recommendations for both qualitative classification and quantitative annotation. Examples include standardized grading systems for lipid layer thickness, TMH measurement protocols using ocular surface analyzers and OCT, annotation strategies for non-invasive and fluorescein-based TBUT assessments, and structured approaches to evaluating meibomian gland morphology and dropout.
Particular emphasis is placed on the growing role of AI-powered image analysis. In meibography, for example, the authors highlight the ability of automated systems to quantify features such as gland tortuosity, gland loss, vagueness, and uneven atrophy – parameters that can be difficult to assess consistently through manual evaluation alone.
Beyond annotation methods, the consensus devotes substantial attention to quality assurance. Recommended measures include rigorous image screening, standardized acquisition protocols, annotator training, consistency testing, multi-stage review processes, and the use of established metrics such as kappa statistics to assess agreement between graders.
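As an illustration of the agreement metrics mentioned above, the following minimal sketch computes Cohen's kappa for two graders who have each assigned a categorical grade to the same set of images. The function name, the grade values, and the choice of the unweighted two-rater form are assumptions made for illustration; the consensus itself is not quoted here as prescribing a specific kappa variant.

```python
from collections import Counter


def cohen_kappa(grader_a, grader_b):
    """Unweighted Cohen's kappa for two graders labelling the same images.

    Hypothetical helper for illustration only; not taken from the consensus.
    """
    if len(grader_a) != len(grader_b) or not grader_a:
        raise ValueError("Both graders must label the same non-empty image set")

    n = len(grader_a)

    # Observed agreement: fraction of images given the same grade by both.
    observed = sum(a == b for a, b in zip(grader_a, grader_b)) / n

    # Chance agreement: probability both pick the same grade independently,
    # based on how often each grader used each grade.
    counts_a = Counter(grader_a)
    counts_b = Counter(grader_b)
    expected = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)

    if expected == 1.0:
        # Both graders used a single identical grade throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up grades (0-2) from two annotators for eight images.
grades_a = [0, 1, 2, 1, 0, 2, 1, 1]
grades_b = [0, 1, 2, 2, 0, 2, 1, 0]
print(round(cohen_kappa(grades_a, grades_b), 3))  # prints 0.636
```

Here the two graders agree on six of eight images (0.75), but because some agreement is expected by chance (0.3125), kappa is lower than the raw agreement rate. For ordinal grading scales, a weighted kappa that penalizes large disagreements more heavily may be a better fit.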
The study authors also identify several persistent challenges facing AI development in dry eye disease. These include variable image quality across institutions, the absence of universally accepted annotation standards, limited algorithm generalizability due to single-center training datasets, and ongoing barriers to multi-center data sharing.
To address these issues, the group advocates for the creation of large, diverse, multi-center image repositories, the adoption of standard operating procedures for image acquisition and annotation, and greater use of privacy-preserving approaches such as federated learning to facilitate collaborative research.
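To give a sense of how federated learning avoids sharing raw images, the sketch below shows the aggregation step of federated averaging, one common federated learning scheme. Each site trains locally and sends only its model parameters, which a coordinator combines weighted by local dataset size. The function name, the flat parameter lists, and the site sizes are hypothetical, and the consensus is not quoted here as endorsing this particular algorithm.

```python
def federated_average(site_params, site_sizes):
    """Combine per-site model parameters, weighted by local dataset size.

    Hypothetical illustration of the federated averaging aggregation step;
    real systems add secure transport, multiple rounds, and local training.
    """
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("Need one parameter list and one size per site")
    n_params = len(site_params[0])
    if any(len(p) != n_params for p in site_params):
        raise ValueError("All sites must report the same number of parameters")

    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("Total number of training images must be positive")

    # Weighted mean of each parameter across sites; no images leave a site.
    return [
        sum(params[i] * size for params, size in zip(site_params, site_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical centers: 100 and 300 locally held images.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [100, 300])
print(global_params)  # prints [2.5, 3.5]
```

The larger center contributes more to the shared model, while each institution's images stay on its own systems.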
Importantly, the consensus acknowledges that strong algorithmic performance in research settings does not always translate into real-world clinical success. The authors therefore call for greater integration of clinical expertise into AI development and validation, alongside large-scale prospective testing in routine practice.