July’s over so time to move on from the Determinants of Gait. We’re starting detailed development of teaching material for our new masters degree programme in clinical gait analysis. I’m working on the measurement theory section at the moment and have been reflecting on how to approach this. I’ve got an engineering background and automatically assume that the language we should use to describe measurement is that of classical measurement theory, which I’m going to refer to as metrology.
Modern metrology really started with the French Revolution, when a political motivation emerged to standardise measurement systems across the country. Out of this emerged an international process for the standardisation of measurement which is now overseen by the Conférence Générale des Poids et Mesures, still based in Paris. They publish the International Vocabulary of Metrology (VIM), which is really the international “Bible” for measurement theory. The Vocabulary is designed to be universal, including the statement that “metrology includes all theoretical and practical aspects of measurement, whatever the measurement uncertainty and field of application.”
One of the things that interests me about measurement in medicine in general, and in rehabilitation in particular, is that in some respects it is developing separately from this paradigm, which is accepted almost universally in the physical and biological sciences and in engineering and chemistry. Measurement in medicine and rehabilitation is becoming increasingly conceived within the framework of psychometrics. Why, if the rest of the world handles measurement one way, do psychology and now rehabilitation need to adopt a different approach?
Whilst its foundations can be traced back to Darwin (see Wikipedia), psychometrics really came of age in the middle of the twentieth century and is thus a much more recent development than metrology. As the name implies, it was developed by psychologists for their work studying concepts such as self-esteem, happiness or even pain, which are less specifically defined than quantities in other branches of science. Over-simplifying a bit – metrology was developed to measure things that are specifically defined whereas psychometrics was developed to measure things that are not.
If the quantity you are measuring is specifically defined (e.g. someone’s height) then it is sensible to ask how accurate the measurement is (are you measuring what you claim to be measuring) and this is the fundamental challenge of metrology. If the quantity you are measuring is not specifically defined (e.g. how happy someone is) then the question of how accurate you are is rather meaningless. Psychometry thus focusses on the twin alternative questions of how reliable (repeatable) and valid measurements are.
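To make the distinction concrete, here is a minimal Python sketch under assumed values. The readings and the 175.0 cm reference height are invented for illustration, and `bias` and `repeatability_sd` are hypothetical helper names. It shows that a measurement can be very repeatable while still being inaccurate.

```python
from statistics import mean, stdev

# Hypothetical repeated height readings (cm) for one person. The reference
# height of 175.0 cm is an assumed value, e.g. from a calibrated stadiometer.
reference_cm = 175.0
readings_cm = [176.4, 176.6, 176.5, 176.3, 176.7]


def bias(readings, reference):
    """Mean error relative to the reference value (needs a defined quantity)."""
    return mean(readings) - reference


def repeatability_sd(readings):
    """Sample standard deviation of repeated readings (spread only)."""
    return stdev(readings)


print(f"bias = {bias(readings_cm, reference_cm):+.2f} cm")
print(f"repeatability SD = {repeatability_sd(readings_cm):.2f} cm")
```

With these made-up numbers the readings agree closely with each other (SD of about 0.16 cm) yet sit about 1.5 cm above the reference, which an assessment of repeatability alone would never reveal.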
Others may argue but I am convinced that there is a hierarchy here. If a quantity is well enough defined to determine how accurately it can be measured, then assessing repeatability and validity is second best. If you want to do the job properly you should use metrology to assess accuracy. Psychometric assessment of reliability and validity should be confined to quantities for which the superior option is not possible.
I think that the insidious onset of psychometry has made people lazy. I suspect that there would have been considerably more effort expended on improving measurements in biomechanics if the community had focussed on ensuring accuracy of measurements rather than accepting second best (and rather flattering) measures of repeatability derived from an essentially psychometric approach.
Any volunteers to man (or woman!) the barricades against the insurgence of psychometrics where it isn’t needed or wanted?
Can you accurately measure something that you can’t define? I would think not, and so psychometrics are indeed not appropriate measures of accuracy. On the other hand, isn’t something only truly defined if you can repeatedly measure it? So what should we do? On a related note, a reviewer has asked me to refer to the ‘clinimetric properties’, rather than the psychometrics, of the measurement method. This term seems to be more commonly used than psychometrics, but is just as vague, poorly described, and not accepted by any spell-check (i.e. more medical jargon)!
I’ve been vaguely aware of the term “clinimetrics” but haven’t paid too much attention to it. It seems to have a fairly small web footprint, and a couple of key articles listed on PubMed are in journals that my university doesn’t subscribe to. From the abstracts and one commentary which is openly available at http://dspace.ubvu.vu.nl/bitstream/handle/1871/22214/263548.pdf?sequence=1 it appears that clinimetrics is an even more recent phenomenon (introduced by Feinstein in a textbook in 1987?). It seems to have evolved in reaction to psychometry, partly because of some specific aspects of psychometry (particularly its tendency to focus on relative measures of reliability) and partly because of its name!
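The point about relative measures of reliability can be illustrated with a small sketch. It assumes a one-way random-effects intraclass correlation, ICC(1,1), as the relative measure and the standard error of measurement (taken here as the square root of the within-subject mean square) as the absolute one. The knee flexion values and the function name `one_way_icc_and_sem` are invented for illustration.

```python
from math import sqrt
from statistics import mean


def one_way_icc_and_sem(scores):
    """Return (ICC(1,1), SEM) from a one-way random-effects ANOVA.

    scores: one list per subject, each holding k repeated measurements.
    """
    n = len(scores)
    k = len(scores[0])
    grand = mean(x for row in scores for x in row)
    subject_means = [mean(row) for row in scores]
    ss_between = k * sum((m - grand) ** 2 for m in subject_means)
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    )
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    return icc, sqrt(ms_within)


# Hypothetical test-retest peak knee flexion (degrees), two sessions each.
# Both groups have the same session-to-session error of +/- 1 degree.
similar_group = [[49, 51], [50, 52], [51, 53], [52, 54]]
varied_group = [[39, 41], [49, 51], [59, 61], [69, 71]]

for name, group in (("similar", similar_group), ("varied", varied_group)):
    icc, sem = one_way_icc_and_sem(group)
    print(f"{name}: ICC = {icc:.2f}, SEM = {sem:.2f} degrees")
```

Both hypothetical groups have exactly the same test-retest error, so the SEM is identical (about 1.4 degrees), but the ICC rises from 0.25 to about 0.99 simply because the second group is more varied. That dependence on the sample is one reason a relative index can look flattering.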
Of course if clinical medicine just adopted the approach of the rest of science with regard to measurement then there’d be no need for a specific discipline of clinimetrics.
I should add that just because a quantity cannot be specifically defined (and hence accurately measured) does not mean it is not important. I’m beginning to think that there is a catch-22 at the heart of clinical research which is that the most important outcomes, e.g. quality of life, cannot actually be quantified.
Great blog and a very relevant discussion.
Physical measurement will always be the preferred clinical outcome relative to latent “non-physical” variables, and in particular, biomechanical measures are the best representation of objective function. Psychometrics is only applicable when we want to acquire some measure of a psychological attribute that has relevance for the persons being measured. If the person is a patient, then the attribute might be pain or self-assessed functional ability/impairment, which we obviously cannot measure with physical instrumentation. But of course pain, sense of impairment or, as you mention above, QoL are relevant entities.
The problem with classical psychometric theory is that the underlying measurement models are, for a number of reasons, insufficient to establish real measurement (invariant comparison of such entities). Pearson applied linear regression and factor analytic techniques to latent variables, but these techniques are exploratory and never test whether the data actually fit the specified model. This is one major reason why most psychometry is useless for measurement purposes. There are, however, methods that approach real measurement for latent variables: Item Response Theory (IRT) and Confirmatory Factor Analysis. Both are confirmatory in nature and thus compare observed data with expected model values (residuals are computed). Rasch IRT is the technique that mathematically best satisfies the model requirements of measurement.
So why is this even relevant for us as biomechanists and clinicians? For one reason, there is a hugely increasing emphasis on patient-reported outcomes in the healthcare field. For my PhD, I developed a questionnaire for patients with ACL-deficiency and ACL-reconstruction, and validated the instrument using Rasch IRT. My plan was to correlate the scores with 3-D biomechanical input. Unfortunately, there wasn’t enough time to do this within the time frame of the project, and, also unfortunately, the cameras in my lab began to fail and could not be repaired. But I am trying to get funding for a postdoc stipend, or will try to do it in my “spare time”. But to address your comment about QoL, I think that “psycho”measures must be shown to have a causal relationship with physical measures. If the knee is busted, it must impact QoL…
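For readers unfamiliar with the Rasch model mentioned above, a minimal sketch of its dichotomous form may help. It assumes the standard formulation, in which the probability of a positive response depends only on the difference between person ability and item difficulty (both in logits). The function names and numerical values are illustrative assumptions and are not taken from the questionnaire described.

```python
from math import exp, log, sqrt


def rasch_probability(ability, difficulty):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + exp(-(ability - difficulty)))


def log_odds(p):
    """Convert a probability to log-odds (logits)."""
    return log(p / (1.0 - p))


def standardised_residual(observed, ability, difficulty):
    """(observed - expected) / sqrt(model variance) for one 0/1 response."""
    p = rasch_probability(ability, difficulty)
    return (observed - p) / sqrt(p * (1.0 - p))


# Two hypothetical persons compared on an easy item and a hard item.
person_a, person_b = 1.0, -0.5
for difficulty in (-1.0, 2.0):
    gap = log_odds(rasch_probability(person_a, difficulty)) - log_odds(
        rasch_probability(person_b, difficulty)
    )
    print(f"item difficulty {difficulty:+.1f}: log-odds gap = {gap:.2f}")

# A low-ability person endorsing a hard item gives a large residual.
print(f"residual = {standardised_residual(1, person_b, 2.0):.2f}")
```

The log-odds gap between the two persons is 1.5 logits on both items, which is the sense in which the comparison is invariant. The large residual in the last line flags a response that the model considers unlikely, which is the kind of observed-versus-expected check described above.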
Denny Borsboom’s brilliant book “Measuring the Mind” is a wonderful read in terms of measurement theory and its paradoxes in psychometry.
Jonathan Comins MSc, RPT, PhD
I am working in the field of rehabilitation. I am always faced with the question of what to believe when the results of clinimetric and psychometric measurement disagree, especially before and after surgery. I had one research project on knee OA in which I had to take measurements before and after knee replacement. The data showed no significant improvement between the pre- and post-operative measurements, but the patients said that they had no knee pain and could walk better than before. So… what should we believe?
I think this is a classic case of measuring different things. Measures of the gait pattern aren’t necessarily related to pain. Remembering that there is quite a lot of measurement variability in measuring the gait pattern, I can quite believe that a group with knee OA would show little change in gait pattern but still see quite marked changes in pain (after all, you’ve exchanged all that inflamed synovium and worn out cartilage for a nice metal implant).
More subtle is the difference between how someone walks (a well-defined measure suited to metrology?) and how well they feel they walk (a less specifically defined measure suited to psychometrics). If someone is walking pain free then they may feel they are walking better even if there is no change in gait pattern.
I think these are examples of how being very careful in defining what we are measuring can help in interpreting results.
My point exactly! A causal relationship must be established between the sense of function and the functional act itself, before we can expect to find substantial correlations between the two measures in response to some intervention.
FYI, a few relevant peer-reviewed publications in the vein of Jonathan Comins’s comments above:
Fisher, W. P., Jr. (1997). Physical disability construct convergence across instruments: Towards a universal metric. Journal of Outcome Measurement, 1(2), 87-113.
Fisher, W. P., Jr. (1999). Foundations for health status metrology: The stability of MOS SF-36 PF-10 calibrations across samples. Journal of the Louisiana State Medical Society, 151(11), 566-578.
Fisher, W. P., Jr. (2009). Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement, 42(9), 1278-1287.
Fisher, W. P., Jr., Harvey, R. F., & Kilgore, K. M. (1995). New developments in functional assessment: Probabilistic models for gold standards. NeuroRehabilitation, 5(1), 3-25.
Fisher, W. P., Jr., Harvey, R. F., Taylor, P., Kilgore, K. M., & Kelly, C. K. (1995, February). Rehabits: A common language of functional assessment. Archives of Physical Medicine and Rehabilitation, 76(2), 113-122.
Fisher, W. P., Jr., & Stenner, A. J. (2016). Theory-based metrological traceability in education: A reading measurement network. Measurement, 92, 489-496.
Mari, L., & Wilson, M. (2014, May). An introduction to the Rasch measurement approach for metrologists. Measurement, 51, 315-327.
Mari, L., & Wilson, M. (2015, 11-14 May). A structural framework across strongly and weakly defined measurements. Instrumentation and Measurement Technology Conference (I2MTC), 2015 IEEE International, pp. 1522-1526.
Pendrill, L. (2014, December). Man as a measurement instrument [Special Feature]. NCSLi Measure: The Journal of Measurement Science, 9(4), 22-33.
Pendrill, L., & Fisher, W. P., Jr. (2013). Quantifying human response: Linking metrological and psychometric characterisations of man as a measurement instrument. Journal of Physics Conference Series, 459, http://iopscience.iop.org/1742-6596/459/1/012057.
Pendrill, L., & Fisher, W. P., Jr. (2015). Counting and quantification: Comparing psychometric and metrological perspectives on visual perceptions of number. Measurement, 71, 46-55. doi: http://dx.doi.org/10.1016/j.measurement.2015.04.010
Wilson, M. R. (2013). Using the concept of a measurement system to characterize measurement models used in psychometrics. Measurement, 46, 3766-3774.
Wilson, M., Mari, L., Maul, A., & Torres Irribarra, D. (2015). A comparison of measurement concepts across physical science and social science domains: Instrument design, calibration, and measurement. Journal of Physics Conference Series, 588(012034), http://iopscience.iop.org/1742-6596/588/1/012034.