Methodology and reporting of diagnostic accuracy studies of automated perimetry in glaucoma: evaluation using a standardised approach

Methodological quality of the review: Low confidence


Authors: Fidalgo BM, Crabb DP, Lawrenson JG


Region: Not reported


Sector: Glaucoma


Sub-sector: Diagnostic accuracy

Equity focus: None specified

Review type: Other review

Quantitative synthesis method: Narrative synthesis

Qualitative synthesis method: Not applicable


Diagnostic accuracy studies are used to certify new tests before they are brought into clinical practice. Most diagnostic accuracy studies report sensitivity and specificity, positive and negative predictive values, or receiver operating characteristic (ROC) curves as measures of diagnostic performance. This information allows a clinician to make an informed decision regarding the potential value of the new test. However, if those studies are not conducted properly or their reporting is inconsistent, there is potential for bias (a threat to internal validity) or difficulty in estimating the generalizability of the study findings (a threat to external validity).
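As an illustration of the measures named above, the following minimal sketch computes sensitivity, specificity and the predictive values from a 2x2 table of test results against a reference standard. The class and function names and the counts are hypothetical and are not taken from the review or from any included study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from a 2x2 table of index test versus reference standard."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the four standard accuracy measures as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # diseased correctly detected
        "specificity": c.tn / (c.tn + c.fp),  # healthy correctly cleared
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
    }


# Hypothetical counts chosen only to make the arithmetic easy to follow.
example = ConfusionCounts(tp=80, fp=10, fn=20, tn=90)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the predictive values depend on the prevalence of disease in the study sample, which is one reason incomplete reporting of participant selection limits generalizability.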


To evaluate methodological and reporting quality of diagnostic accuracy studies of perimetry in glaucoma and to determine whether there had been any improvement since the publication of the Standards for Reporting of Diagnostic Accuracy (STARD) guidelines.

Main findings:

A total of 58 studies were included in the review. The authors note that inter-rater reliability, measured by weighted kappa to estimate consistency among the reviewers, was 0.70 for QUADAS and 0.81 for STARD. The majority of studies used ophthalmologist diagnosis based on combined optic disc assessment, intraocular pressure (IOP) and standard automated perimetry (SAP) as the reference standard for glaucoma diagnosis.

Of the 58 articles, the number of QUADAS items with a ‘yes’ response ranged from 2 to 14, with a median score of 9. For the papers published before 2003 (n = 22) and after 2003 (n = 36), this median score was 8 (interquartile range [IQR] 5–9) and 10 (IQR 8–10) respectively.

In STARD, items 2, 4 and 25 were fully reported by all 58 articles. Item 20 was not scored due to the non-invasive nature of perimetry. No article fulfilled all items and only 41% reported in full more than half of the items. The highest and lowest numbers of items reported were 18 and 6 respectively. Statistical methods were often only partially described, and only 13 (22%) of 58 publications reported measures of statistical uncertainty. The median number of items fully reported was 11. For the papers published before 2003 (n = 22) and after 2003 (n = 36), this median score was 11 (IQR 8–13) and 12.5 (IQR 10–16) respectively. Only one study explicitly mentioned the use of STARD for preparing the manuscript.


The authors used the OVID platform to search relevant electronic databases (MEDLINE, EMBASE, and Global Health) to identify diagnostic accuracy studies of perimetry published over a 20-year period between January 1993 and August 2013. The search was limited to publications in the English language and studies on human subjects.

The titles and abstracts of all reports identified by the search strategy were screened by a single reviewer. A quality assessment was performed on all included studies by a single reviewer using both quality assessment tools (QUADAS and STARD). Two reviewers (JL and DC) independently assessed the quality of a random 20% sample of included studies.

The authors used descriptive statistics to summarise the number and proportion of items that met the QUADAS and STARD criteria. Studies were stratified according to two time periods, 1993–2003 and 2004–2013, corresponding to periods before and after publication of the quality assessment tools, to determine the impact of their publication on quality. An inter-rater reliability analysis using the weighted kappa statistic was performed to determine consistency among reviewers.
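The weighted kappa statistic mentioned above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the summary does not state which weighting scheme was used, so linear weights are an assumption here, as are the function name, the ordering of the response categories and the example ratings.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted Cohen's kappa for two raters on ordered categories.

    `categories` lists the possible ratings from lowest to highest.
    Linear disagreement weights are an assumption of this sketch.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must score the same items")
    k = len(categories)
    n = len(rater_a)
    index = {c: i for i, c in enumerate(categories)}

    # Observed proportion of items in each (rater A, rater B) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected disagreement.
    obs_dis = 0.0
    exp_dis = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            obs_dis += w * observed[i][j]
            exp_dis += w * row[i] * col[j]
    return 1 - obs_dis / exp_dis


# Hypothetical ratings of five checklist items by two reviewers.
order = ["no", "unclear", "yes"]  # assumed ordering of responses
reviewer_1 = ["yes", "yes", "no", "unclear", "yes"]
reviewer_2 = ["yes", "unclear", "no", "unclear", "yes"]
print(round(weighted_kappa(reviewer_1, reviewer_2, order), 2))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; weighting penalises a yes/no disagreement more heavily than a yes/unclear one.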

Applicability/external validity:

The overall conclusion is that the quality of reporting of the diagnostic accuracy of perimetric tests in glaucoma is sub-optimal and appears not to have improved substantially following the development of the STARD initiative.

Geographic focus:

Not reported

Summary of quality assessment:

Low confidence was attributed to the conclusions about the effects of this study, as important limitations were identified. The search strategy was not comprehensive enough to ensure that all relevant studies were identified and included in the review. The methods used to screen studies and appraise included studies were not rigorous enough to ensure that bias was avoided within the review.

Publication source:

Fidalgo BM, Crabb DP, Lawrenson JG (2015) Methodology and reporting of diagnostic accuracy studies of automated perimetry in glaucoma: evaluation using a standardised approach. Ophthalmic Physiol Opt. 2015 May;35(3):315-23.