A common question in biology is the extent to which measuring a variable by two different methods, by two different assessors using the same method, or by a single assessor at two different time points produces essentially the same result.
Statistical methods used to assess agreement are concerned with both accuracy (validity), the degree to which the observed value agrees with the true value in the population, and precision (reliability), the extent to which repeated observations agree with one another.
Validity:
- External and internal validity
- Types of internal validity:
  - Face/content validity
  - Concurrent validity
  - Criterion-related validity (convergence)
  - Predictive validity
Reliability:
- Inter-rater reliability
- Intra-rater reliability
- Test-retest reliability
- Internal consistency reliability
Ruling out a systematic error between comparable methods:
- McNemar’s test for binary variables
- Paired t-test for normally distributed continuous variables
- Wilcoxon’s signed-rank test for variables with other distributions
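
As a rough illustration of the first test in this list, the sketch below computes a two-sided McNemar p-value from the two discordant-pair counts. It assumes the exact (binomial) form of the test rather than the chi-square approximation. The function name `mcnemar_exact` and the counts are hypothetical, chosen only for the example.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant-pair counts.

    b: pairs rated positive by method A only
    c: pairs rated positive by method B only
    (Concordant pairs carry no information about a systematic difference.)
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a systematic difference
    k = min(b, c)
    # Under the null hypothesis each discordant pair is equally likely to fall
    # in either cell, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical data: 10 pairs positive by A only, 2 pairs positive by B only.
p_value = mcnemar_exact(10, 2)
print(f"exact McNemar p-value: {p_value:.4f}")
```

A small p-value suggests that one method classifies subjects as positive more often than the other, that is, a systematic difference between the methods.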
Expressing agreement:
- For categorical variables:
  - Cohen’s kappa for binary variables
  - Weighted kappa for ordinal variables
- For numerical variables:
  - Correlation coefficient
  - Bland and Altman’s plot
  - Intraclass correlation coefficient
  - Standard error of measurement
  - Minimal detectable change
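
Several of the agreement measures listed above reduce to short formulas. The following is a minimal sketch using only the Python standard library. It assumes the conventional definitions: kappa as (observed - expected) / (1 - expected), Bland and Altman limits of agreement as the mean difference ± 1.96 SD, SEM as SD × sqrt(1 - ICC), and MDC at the 95% level as 1.96 × sqrt(2) × SEM. All function names and data are hypothetical, and the ICC is taken as a given input rather than estimated.

```python
from math import sqrt
from statistics import mean, stdev


def cohen_kappa(table):
    """Cohen's kappa from a square table of counts (rows: rater 1, columns: rater 2)."""
    k = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / total
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2
    return (observed - expected) / (1 - expected)


def bland_altman_limits(x, y):
    """Return (bias, lower limit, upper limit) for paired measurements x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def sem_and_mdc95(sd, icc):
    """Standard error of measurement and minimal detectable change (95% level)."""
    sem = sd * sqrt(1 - icc)
    mdc95 = 1.96 * sqrt(2) * sem
    return sem, mdc95


# Hypothetical 2x2 table: two raters classifying 50 subjects.
kappa = cohen_kappa([[20, 5], [10, 15]])

# Hypothetical paired measurements from two methods.
bias, lower, upper = bland_altman_limits([10, 12, 14, 16], [9, 12, 13, 17])

# Hypothetical between-subject SD of 5 units and an ICC of 0.84.
sem, mdc95 = sem_and_mdc95(5.0, 0.84)

print(f"kappa={kappa:.2f}  bias={bias:.2f}  limits=({lower:.2f}, {upper:.2f})")
print(f"SEM={sem:.2f}  MDC95={mdc95:.2f}")
```

In a Bland and Altman plot, the bias and the two limits are drawn as horizontal lines over a scatter of each pair's difference against its mean.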
Application:
- Running the tests in SPSS
- Drawing Bland & Altman plots
Critical appraisal:
- Critical appraisal tool items specific to validity & reliability studies
- Critiquing a published paper