A new deep learning model developed at Beijing Jiaotong University may significantly improve the accuracy and efficiency of automated eye disease diagnosis, according to findings published in Scientific Reports.
The study, authored by Ankang Lin at the Beijing Jiaotong University Weihai Campus in Weihai, China, introduces the Local-Global Scale Fusion Network (LGSF-Net), a hybrid artificial intelligence model that integrates the strengths of convolutional neural networks (CNNs) and transformer architectures to analyze fundus images. Designed specifically for ophthalmic imaging, LGSF-Net pairs the transformer's capacity for global contextual understanding with the CNN's sensitivity to local, fine-grained retinal features.
Traditional AI models used for retinal imaging often specialize in either local lesion recognition (e.g., vascular changes) or global feature assessment (e.g., optic nerve head or fundus color changes). Recent research has largely followed the same split, focusing on either local detail or global features, with few models attempting to integrate both types of information.
To overcome this limitation, LGSF-Net processes fundus images through two parallel learning streams: one prioritizing global feature extraction (via transformer blocks) and the other focusing on local feature mapping (via convolutional layers); the two outputs are then fused into a unified disease classification prediction.
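The paper is summarized here without reference code, so the following PyTorch sketch only illustrates the described two-stream pattern: the layer sizes, fusion by concatenation, and the DualStreamFundusNet name are assumptions for demonstration, not details taken from LGSF-Net itself.

```python
import torch
import torch.nn as nn

class DualStreamFundusNet(nn.Module):
    """Illustrative two-stream classifier: a CNN branch for local
    detail and a transformer branch for global context, fused
    before a shared classification head (not the published model)."""

    def __init__(self, num_classes: int = 4, patch: int = 16, dim: int = 64):
        super().__init__()
        # Local stream: small conv stack, sensitive to fine retinal features.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),              # -> (B, dim)
        )
        # Global stream: patch embedding followed by a transformer encoder.
        self.patch_embed = nn.Conv2d(3, dim, patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # Fusion by concatenation, then a linear classification head.
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local_feat = self.cnn(x)                                 # (B, dim)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        global_feat = self.transformer(tokens).mean(dim=1)       # (B, dim)
        fused = torch.cat([local_feat, global_feat], dim=1)      # (B, 2*dim)
        return self.head(fused)

model = DualStreamFundusNet()
logits = model(torch.randn(2, 3, 224, 224))  # two dummy fundus images
print(logits.shape)                          # torch.Size([2, 4])
```

The key design point the sketch captures is that neither stream sees the other's intermediate features; integration happens only at the fusion step before classification.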
Using a publicly available fundus image dataset covering four categories (cataract, diabetic retinopathy, glaucoma, and normal), LGSF-Net achieved 96% classification accuracy while using just 18.7K parameters and 0.93 GFLOPs, a striking balance of computational efficiency and diagnostic precision that outperformed leading architectures such as ResNet50, Vision Transformer (ViT), and InceptionV3. Compared with ResNet50 (94% accuracy) and ViT (90% accuracy), LGSF-Net demonstrated superior generalization, particularly in identifying diabetic retinopathy, where it reached an F1-score and recall of 0.99.
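For readers who want to check figures like these on their own models, the per-class recall and F1 quoted above are standard scikit-learn calls; the label arrays below are dummy placeholders, not the study's predictions.

```python
from sklearn.metrics import accuracy_score, classification_report

# Placeholder predictions for a 4-way task; the study's actual classes
# are cataract, diabetic retinopathy, glaucoma, and normal.
classes = ["cataract", "diabetic_retinopathy", "glaucoma", "normal"]
y_true = [0, 1, 1, 2, 3, 1, 0, 2]
y_pred = [0, 1, 1, 2, 3, 1, 3, 2]

print(accuracy_score(y_true, y_pred))            # overall accuracy
print(classification_report(y_true, y_pred,
                            target_names=classes))  # per-class P/R/F1
```

A parameter count such as the reported 18.7K can likewise be verified on any PyTorch model with sum(p.numel() for p in model.parameters()).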
Beyond its accuracy, the model's lightweight architecture makes it practical for real-world clinical deployment, especially in resource-limited or tele-ophthalmology settings where computing power and trained personnel are scarce.
Visualization heat maps in the study further confirmed that LGSF-Net effectively captured both localized lesions such as microaneurysms and global features like optic disc morphology. In contrast, models omitting either the CNN or transformer module showed poorer interpretability and performance.
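Heat maps of this kind are commonly produced with gradient-based class activation mapping (Grad-CAM); the study's exact visualization method isn't specified in this summary, so the routine below is a generic Grad-CAM sketch that could be applied to the CNN stream of a model like the one sketched earlier.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, conv_layer, image, target_class):
    """Minimal Grad-CAM: weight a conv layer's activations by the
    gradient of the target-class score, then sum over channels."""
    acts, grads = {}, {}
    h1 = conv_layer.register_forward_hook(
        lambda m, i, o: acts.update(a=o))
    h2 = conv_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    try:
        score = model(image.unsqueeze(0))[0, target_class]
        model.zero_grad()
        score.backward()
    finally:
        h1.remove()
        h2.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # channel weights
    cam = F.relu((weights * acts["a"]).sum(dim=1))       # (1, H', W')
    cam = cam / (cam.max() + 1e-8)                       # normalize to [0, 1]
    return cam.squeeze(0)
```

Upsampling the returned map to the input resolution and overlaying it on the fundus photograph makes it easy to check visually that high activations coincide with lesions such as microaneurysms.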
While Lin acknowledges that current datasets are relatively balanced and may not reflect the variability of real-world cases, he suggests that future research could test the model on larger, imbalanced datasets and integrate additional imaging modalities such as OCT (optical coherence tomography). If validated clinically, LGSF-Net could represent a new benchmark for AI-assisted ophthalmic screening, offering a faster, lower-cost, and highly accurate tool to aid early detection and management of vision-threatening diseases.