Performance of deep neural network-based artificial intelligence method in diabetic retinopathy screening: a systematic review and meta-analysis of diagnostic test accuracy

Authors: Wang S, Zhang Y, Lei S, Zhu H, Li J, Wang Q, et al.

Geographical coverage: Not reported

Sector: Service delivery

Sub-sector: Diagnostic test accuracy, convolutional neural network, artificial intelligence, case detection

Equity focus: Not specified

Study population: Patients with referable diabetic retinopathy

Review type: Other review

Quantitative synthesis method: Meta-analysis

Qualitative synthesis method: Not applicable

Background:

Diabetic retinopathy (DR) screening requires significant investment in specialised equipment, strict quality control, and highly skilled ophthalmologists, which limits screening in resource-poor settings and low-income countries (LICs). Automatic screening systems based on neural networks have been used to distinguish between non-referable DR and referable DR. This could save resources, because only patients flagged by the automated system would need to be referred to ophthalmologists for further examination and treatment. To date, there has been no quantitative synthesis of the performance of the different methods. This study, therefore, estimates the sensitivity and specificity of different neural networks in grading DR.

Objectives:

To quantitatively analyse the diagnostic accuracy of neural networks in detecting referable DR in patients with diabetes mellitus (DM) and to investigate the factors that affect diagnostic accuracy.

Main findings:

The search identified 2,135 records through database searches and 19 through hand searching; 24 studies, covering 235,235 subjects, were included in the meta-analysis. Some studies used fundus images from public databases, which may not be geographically specific because the images were drawn from a range of country settings. Other included studies used images from local screening programmes or hospitals in the Netherlands, India, New Zealand, China, Finland, Australia and the USA. The authors reported the risk-of-bias assessment by domain (not overall). In the patient selection domain, the risk of bias was low in 11 studies, unclear in 11 and high in two. In the index test domain, it was low in 17 studies and high in seven. In the reference standard domain, it was low in 14 studies and high in 10. All studies had a low risk of bias in the flow and timing domain.

The study found that neural network methods could correctly detect 91.9% (95% CI: 89.6% to 94.3%) of patients with referable DR and exclude 91.3% (95% CI: 89.0% to 93.5%) of patients without referable DR, a high level of pooled sensitivity and specificity for computer-aided diagnosis. A comparison of the five convolutional neural network (CNN) models used found no significant difference between them. No relationship was found between diagnostic accuracy and image resolution, and no subgroup effect was found with regard to the sample size of the training sets for the CNN models. As it is expensive and time-consuming to train algorithms with a large amount of high-resolution labelled data, it might be possible to set a lower minimum requirement for image resolution and training-set sample size and still obtain good diagnostic results. A sensitivity analysis that excluded nine studies which did not consider diabetic macular edema as a target condition showed a pooled sensitivity of 90.7% (95% CI: 87.2% to 94.3%) and specificity of 90.0% (95% CI: 87.2% to 92.7%). A further sensitivity analysis that excluded three studies whose neural networks were unclear or were not deep networks gave a sensitivity of 91.7% (95% CI: 89.1% to 94.2%) and a specificity of 91.4% (95% CI: 89.1% to 93.8%), both close to the main pooled estimates.

In summary, neural networks can effectively detect clinically significant DR. The findings suggest that diagnostic accuracy is more likely to be improved by developing new algorithms than by improving the quality of images fed into existing algorithms or increasing the number of images used for training.
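To make the pooled estimates more concrete, the following minimal sketch applies the reported sensitivity (91.9%) and specificity (91.3%) to a hypothetical screening cohort. The cohort size of 10,000 and the 10% prevalence of referable DR are illustrative assumptions, not figures from the review, and the function name is invented for this example.

```python
def screening_outcomes(n_screened, prevalence, sensitivity, specificity):
    """Expected screening counts for a hypothetical cohort.

    Illustrative only: prevalence and cohort size are assumptions,
    not values reported by the review.
    """
    with_dr = n_screened * prevalence        # patients with referable DR
    without_dr = n_screened - with_dr        # patients without referable DR
    true_pos = with_dr * sensitivity         # correctly referred
    false_neg = with_dr - true_pos           # missed cases
    true_neg = without_dr * specificity      # correctly not referred
    false_pos = without_dr - true_neg        # unnecessary referrals
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "false_pos": false_pos,
        # Share of referrals that truly have referable DR
        "ppv": true_pos / (true_pos + false_pos),
        # Share of non-referrals that truly do not have referable DR
        "npv": true_neg / (true_neg + false_neg),
    }


# Pooled estimates from the review; cohort size and prevalence are assumed.
result = screening_outcomes(10_000, 0.10, 0.919, 0.913)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed inputs, roughly half of the patients referred would truly have referable DR, while about 99% of those not referred would be correctly excluded. Both figures shift with the prevalence chosen, so they should not be read as results of the review.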

Methodology:

Medline, EMBASE, the Cochrane Library and IEEE Xplore were searched from inception to July 2019 to identify cross-sectional diagnostic accuracy studies. Reference lists were also checked manually to identify additional relevant studies. Studies were included if they used neural networks to detect referable DR, including macular oedema, based on fundus photography. Studies were required to evaluate the accuracy of the neural network results (the index test) against the reference standard of diagnosis by an ophthalmologist.

Two reviewers independently screened the articles and extracted the data. The risk of bias in each study was assessed using QUADAS-2. The authors stated that they did not assess publication bias because, according to the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy, there is no universally accepted method for detecting it in reviews of diagnostic studies.

Meta-analysis results were presented in forest plots showing sensitivity and specificity with 95% CIs. A summary receiver operating characteristic (ROC) curve was also produced from the 2-by-2 table data, using a model that allows for fixed and random effects to estimate a population-level ROC. Subgroup analyses and meta-regression were used to explore several potential sources of between-study heterogeneity.
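As a rough illustration of the per-study quantities that feed such a forest plot, the sketch below derives sensitivity and specificity with 95% CIs from a single 2-by-2 table. The counts are hypothetical, and the Wilson score interval is an assumption chosen for this example; the summary does not state which interval method the authors used, and this is not the pooling model itself.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity and specificity (with CIs) from one study's 2-by-2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical counts for a single study (not taken from the review).
study = accuracy_from_2x2(tp=90, fp=20, fn=10, tn=180)
for name, value in study.items():
    print(name, value)
```

A meta-analysis would then combine many such per-study estimates, typically with a model that accounts for the correlation between sensitivity and specificity across studies.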

Applicability/external validity:

The authors did not specifically discuss the external validity of the results.

Geographic focus:

The authors did not report geographic focus of included studies.

Summary of quality assessment:

Overall, there is medium confidence in the conclusions of this review. Important gaps in the comprehensiveness of the literature search were identified, as only peer-reviewed articles written in English were included. The risk of bias in included studies was appropriately assessed using QUADAS-2; however, results were not analysed or presented separately according to the risk of bias identified.

Publication Source:

Wang, S, Zhang, Y, Lei, S, Zhu, H, Li, J, Wang, Q, Yang, J, Chen, S, & Pan, H. (2020). Performance of deep neural network-based artificial intelligence method in diabetic retinopathy screening: a systematic review and meta-analysis of diagnostic test accuracy. European Journal of Endocrinology, 183(1), 41–49. https://doi.org/10.1530/EJE-19-0968

Downloadable link: https://pubmed.ncbi.nlm.nih.gov/32504495/