Artificial intelligence in cataract surgery: A systematic review

Authors: Müller S, Jain M, Sachdeva B, Shah PN, Holz FG, Finger RP, Murali K, Wintergerst MWM, Schultz T.

 

Geographical coverage: Not reported

Sector: Cataract surgery

Sub-sector: Use and reliability of artificial intelligence

Equity focus: Not reported

Study population: Patients with cataract

Review type: Effectiveness review

Quantitative synthesis method: Meta-analysis

Qualitative synthesis method: Not applicable

Background: Globally, cataract surgeries are among the most frequently performed procedures and are often digitally recorded. The availability of video material and the standardised nature of these surgeries present a significant opportunity for automated analysis in areas such as quality management, teaching, and training. Recent advances in artificial intelligence (AI), especially deep learning (DL), have made this automation increasingly feasible. Deep learning algorithms have shown great potential across cataract surgery, from diagnostics and planning to surgical guidance and postoperative care. The integration of AI in cataract surgery could therefore improve surgical precision, patient outcomes, and the efficiency of healthcare professionals.

 

Objectives: To assess the current use and reliability of artificial intelligence (AI)-based algorithms for analysing cataract surgery videos.

 

Main findings: The review identified 38 studies that utilised AI for analysing cataract surgery videos, focusing on three main tasks: instrument analysis, surgical phase recognition, and skill/complication assessment.

In instrument detection, deep learning models, particularly combinations of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), achieved high performance, with ROC AUC scores ranging from 0.976 to 0.998. Sensitivity ranged from 0.797 to 0.959 and specificity from 0.820 to 0.997. However, performance degraded when models trained on one dataset were tested on another, highlighting generalisation challenges.
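To make the reported figures concrete, the sketch below (illustrative only, not code from the reviewed studies; the arrays are hypothetical) shows how frame-level instrument-presence predictions can be scored with ROC AUC, sensitivity, and specificity using scikit-learn.

```python
# Illustrative sketch (not from the review): computing the per-instrument
# metrics reported above from frame-level model outputs, using scikit-learn.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

# Hypothetical inputs: ground-truth presence of one instrument per video frame
# (0/1) and the model's predicted probability for that instrument.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.3, 0.8, 0.7, 0.9, 0.4, 0.6, 0.2])

auc = roc_auc_score(y_true, y_prob)           # threshold-free ROC AUC

y_pred = (y_prob >= 0.5).astype(int)          # fixed operating point
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                  # recall on frames where the tool is present
specificity = tn / (tn + fp)                  # recall on frames where it is absent

print(f"ROC AUC={auc:.3f}, sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
```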

For surgical phase recognition, algorithms achieved ROC AUC scores of 0.773 to 0.990 and accuracies of 64.5% to 97.8%. Older studies using handcrafted features performed worse than newer approaches combining CNNs with long short-term memory (LSTM) networks for temporal analysis. Notably, one study achieved high accuracy (95.9%) using only instrument labels as input, suggesting that instrument presence alone can effectively predict surgical phases.
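The CNN-plus-LSTM design mentioned above follows a common pattern: a CNN encodes each video frame, an LSTM aggregates the per-frame features over time, and a classification head predicts the current phase. The following is a minimal schematic of that architecture family in PyTorch (an assumption for illustration; it does not reproduce any specific model from the review, and the phase count and clip length are hypothetical).

```python
# Minimal schematic of a CNN + LSTM surgical phase recogniser (illustrative only).
import torch
import torch.nn as nn
from torchvision.models import resnet18

class PhaseRecognizer(nn.Module):
    def __init__(self, num_phases: int, hidden_size: int = 256):
        super().__init__()
        backbone = resnet18(weights=None)   # per-frame feature extractor
        backbone.fc = nn.Identity()         # keep the 512-d pooled features
        self.cnn = backbone
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_phases)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W)
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        temporal, _ = self.lstm(feats)      # temporal context over the clip
        return self.head(temporal)          # per-frame phase logits

# Example: 2 clips of 16 frames at 224x224, 10 surgical phases (hypothetical numbers).
logits = PhaseRecognizer(num_phases=10)(torch.randn(2, 16, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 16, 10])
```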

In surgical skill and complication prediction, performance varied widely, with ROC AUC scores between 0.570 and 0.970. Binary classifiers distinguishing novice from expert surgeons achieved accuracies of 57.8% to 84.8%, while specialised models predicted complications such as lens instability with high precision. One algorithm even outperformed human surgeons in early complication detection (ROC AUC: 0.970).

Methodology: Searches were conducted in multiple databases (including PubMed, Web of Science, and MEDLINE) to identify studies published up to July 2023 that evaluated algorithms for cataract surgery video analysis using quantitative performance metrics on real surgical videos. The reference lists of included studies were also screened to identify additional relevant publications. Only studies published in the past 10 years were considered, given the relative recency of successful computer vision solutions in ophthalmology.

One reviewer screened the articles for eligibility, consulting a senior computer scientist or ophthalmologist in cases of uncertainty. Relevant data were extracted, and the reproducibility of the studies was assessed by one reviewer using a modified version of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) Reproducibility Checklist. Performance metrics were compared across studies. Statistical analysis using Spearman's correlation was performed to explore the relationship between dataset size and algorithm performance, with significance probabilities determined by a permutation test on the correlation statistic.
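As a rough illustration of the described dataset-size analysis, the sketch below (with made-up numbers, not the review's data) computes Spearman's correlation between dataset size and a performance metric and estimates its significance with a permutation test.

```python
# Sketch of the described analysis (illustrative data, not the review's):
# Spearman correlation between dataset size and reported performance, with a
# permutation test to estimate the significance probability of the statistic.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
dataset_sizes = np.array([25, 50, 100, 300, 500, 1000])        # hypothetical values
performance   = np.array([0.78, 0.81, 0.85, 0.90, 0.88, 0.93]) # hypothetical values

observed_rho, _ = spearmanr(dataset_sizes, performance)

n_permutations = 10_000
permuted_rhos = np.array([
    spearmanr(dataset_sizes, rng.permutation(performance))[0]
    for _ in range(n_permutations)
])
# Two-sided p-value: fraction of permutations at least as extreme as observed.
p_value = np.mean(np.abs(permuted_rhos) >= abs(observed_rho))

print(f"Spearman rho={observed_rho:.3f}, permutation p={p_value:.4f}")
```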

Applicability/external validity: The review highlighted that most studies lacked validation on external datasets, with only 23.7% testing their models on independent data. Reliance on a small number of public datasets further restricted broader applicability, particularly for manual small-incision cataract surgery (MSICS). Variability in surgical techniques, instruments, and recording setups across institutions also posed challenges for real-world deployment. The review emphasised the need for standardised datasets, external validation, and transparent reporting to improve the clinical relevance and reliability of AI models in diverse settings.

Geographic focus: The review did not apply any geographical limits. However, it did not report the geographical distribution of the included studies.

Summary of quality assessment: Overall, there is low confidence in the review's conclusions. The searches were comprehensive, inclusion and exclusion criteria were clearly defined, characteristics of included studies were well documented, statistical analyses were appropriately performed, and heterogeneity was addressed. However, only one reviewer screened the articles for eligibility, the review did not provide a list of excluded studies, and it did not specify the number of reviewers involved in data extraction. The review evaluated the reproducibility of the included studies rather than their risk of bias, although this focus on reproducibility was justified given the context of AI-driven surgical video analysis.

Publication Source:

Müller S, Jain M, Sachdeva B, Shah PN, Holz FG, Finger RP, Murali K, Wintergerst MWM, Schultz T. Artificial Intelligence in Cataract Surgery: A Systematic Review. Transl Vis Sci Technol. 2024 Apr 2;13(4):20. doi: 10.1167/tvst.13.4.20. PMID: 38618893; PMCID: PMC11033603.
