If we make a series of measurements of the same clinical variable we expect those measurements to show some variability. This may be because the measurement process is imperfect, because the result depends on exactly how the test is performed, or because the underlying property being measured varies from test to test. Understanding the amount of variability associated with a measurement can be very important when making clinical decisions. If we know that measurement variability is low then we can have confidence in any individual measure and act upon it. If the measurement variability is high then we will have less confidence. We might want to confirm any finding either by reference to different measurements or by repetition of the same measurement.
The simplest and often most appropriate measure of variability is the standard error of measurement (SEM). It is the standard deviation of a number of measurements made on the same person (indeed Bland and Altman prefer the term within-subject standard deviation).
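The within-subject standard deviation can be computed directly by pooling the variance of repeat measurements across subjects. A minimal sketch, using hypothetical data (the array values are invented for illustration):

```python
import numpy as np

# Hypothetical repeatability data: rows = subjects, columns = repeat
# measurements of the same clinical variable on the same person.
measurements = np.array([
    [12.0, 14.0, 13.0],
    [ 5.0,  6.0,  4.0],
    [20.0, 18.0, 19.0],
    [ 9.0, 11.0, 10.0],
])

# Sample variance of the repeats for each subject (ddof=1), pooled by
# averaging across subjects; the square root is the within-subject
# standard deviation, i.e. the SEM.
within_subject_var = measurements.var(axis=1, ddof=1)
sem = np.sqrt(within_subject_var.mean())
print(f"SEM = {sem:.2f}")  # → SEM = 1.00
```

Note that only the spread of each subject's own repeats enters the calculation; the differences between subjects play no part.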
Why I’ve written this material
These pages stem from a conviction that the within-subject standard deviation (also known as the standard error of measurement, SEM) is the most appropriate method for reporting repeatability for clinical measurements. I’ve outlined this in a number of blog posts.
I consider this preferable to other reliability indices (ICC, CMC, CMD) for two main reasons. The most important is that the SEM, giving a direct indication of measurement uncertainty in the units in which the original measurement was made, is the most clinically relevant. If we know the uncertainty in knee extension is, say, 3°, then it is very clear how to incorporate that information into any interpretation of what the gait traces mean. Being told that the ICC is, say, 0.87, is much less useful.
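One way the SEM feeds directly into interpretation is as an approximate uncertainty interval around a single measurement. A brief sketch with illustrative numbers (the measured value and SEM below are invented, and normally distributed measurement error is assumed):

```python
# SEM of knee extension in degrees (hypothetical, from a repeatability study)
sem = 3.0
# a single measured value of knee extension, in degrees (hypothetical)
measured = -5.0

# Approximate 95% interval for the true value, assuming roughly
# normal measurement error: measured value +/- 1.96 * SEM.
lo, hi = measured - 1.96 * sem, measured + 1.96 * sem
print(f"True value likely between {lo:.1f} and {hi:.1f} degrees")
```

No such direct statement about an individual measurement can be read off from an ICC of 0.87 alone.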
The other issue is that these indices are all ratios of one form of variability to another. To understand the result we need to understand both measures of variability, which makes the task at least twice as difficult! Several of the measures divide the variability we are interested in by a measure of variability that is not particularly meaningful, which is clearly nonsensical. The ICC, for example, divides the measurement variability by the total variability within the sample on which the repeatability study was conducted. This is clearly irrelevant if you want to apply the result to a population different to that from which the sample was drawn, which we almost always do given that most repeatability studies are on healthy controls and most clinical interpretation is of patients with neurological or musculoskeletal impairments.
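The sample dependence of the ICC is easy to demonstrate numerically. Using the standard relation ICC = 1 − (SEM / SD)², where SD is the total standard deviation of the sample studied, the same SEM yields different ICCs in samples of different heterogeneity (the numbers below are illustrative only):

```python
# The same measurement uncertainty (SEM, hypothetical value in degrees) ...
sem = 3.0

# ... gives different ICCs depending only on how spread out the sample is.
# ICC = 1 - (SEM / total_SD)**2, with total_SD the total standard
# deviation (between- plus within-subject) of the sample studied.
for total_sd in (8.0, 12.0):
    icc = 1 - (sem / total_sd) ** 2
    print(f"total SD = {total_sd:4.1f}  ->  ICC = {icc:.2f}")
```

The measurement process is identical in both cases; only the sample changed, yet the ICC moves from about 0.86 to about 0.94.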
The fundamental simplicity of the SEM is, however, almost always hidden by the textbooks, which present it as being derived from a reliability index (see eqn 26.11 in Portney and Watkins for example). This gives the impression that in order to fully understand the SEM you need first to fully understand the reliability index. In reality the situation is the other way around – the reliability index is derived from the SEM (see eqn 26.3, 19 pages earlier in the same book). The aim of this material is thus to allow the SEM to speak for itself as the most clinically relevant measure of repeatability, without ever mentioning a reliability index [again!].