Validate, validate, validate …

I had a query recently from a researcher who devised a variant of the Gait Profile Score (GPS) to incorporate trunk data. He’d submitted it for publication but the reviewer asked for evidence that the scale had been validated and he wanted to know how to respond. It made me stop and think about the whole process of validation. It’s one of those areas in which concepts that evolved within psychometrics, where they are relevant, have been allowed to spill over into other areas, where they are not.


For the uninitiated the field appears complex. I remember a PhD student whom we once asked to validate a scale coming back a week later completely confused – she did master it eventually but there was a steep learning curve. Read the relevant chapter in Portney and Watkins, for example, and you are conducted on a whistle-stop tour of face, content, criterion-related and construct validity in 20 pages. Altman and Bland whip through these even more quickly and add in internal consistency for good measure.

I don’t have enough space in a blog article to go into why this is all necessary (Altman and Bland provide a succinct summary) but I do want to explore when it is necessary, which I feel is very poorly understood. Stating it rather boldly, validation of a scale is required when we don’t know what we are measuring. Psychometrics evolved to support psychologists and behavioural scientists who wanted to quantify concepts such as happiness or anxiety. Neither happiness nor anxiety is defined in terms of numbers, so researchers have to go through a process of convincing their peers that the scale they have devised is a valid measure of what the rest of us understand by the terms. In our own field, health-related quality of life or patient satisfaction or even general terms like gross motor function or mobility are similar qualitative terms. If we want to assign a numerical value to these then we need to go through the same process. As our understanding of the underlying issues becomes more sophisticated, so does the battery of different types of validity that we need to establish in order to convince others that our scale represents what we say it represents.

By contrast, such a process of validation is not required if we do know what we are measuring. If we are measuring length, time, speed or joint angle, moment and power then there are very precise definitions of the terms we are seeking to measure and there is absolutely no need to go through this full validation process. The question we need to ask is whether the tests are accurate rather than whether they are valid. This requires a completely different set of techniques. The GPS is a derivative of joint angle measurements and I would argue that a consideration of accuracy is required rather than one of validity.
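To make concrete what "a derivative of joint angle measurements" means, here is a minimal sketch. It assumes the usual definition of the GPS as a root-mean-square difference, in degrees, between a patient's kinematic curves and the mean curves of a reference group. The function names, the variable names and the three-point curves are all made up for illustration; they are not taken from any published implementation.

```python
import math

def gait_variable_score(patient, reference_mean):
    """RMS difference (degrees) between one kinematic curve and the reference mean."""
    if len(patient) != len(reference_mean):
        raise ValueError("curves must be sampled at the same points of the gait cycle")
    squared = [(p - r) ** 2 for p, r in zip(patient, reference_mean)]
    return math.sqrt(sum(squared) / len(squared))

def gait_profile_score(patient_curves, reference_curves):
    """RMS of the per-variable scores; a trunk variant simply adds more variables."""
    scores = [gait_variable_score(patient_curves[name], reference_curves[name])
              for name in patient_curves]
    return math.sqrt(sum(s ** 2 for s in scores) / len(scores))

# Hypothetical curves sampled at only three points of the gait cycle.
reference = {"knee_flexion": [10.0, 40.0, 60.0], "trunk_tilt": [2.0, 2.0, 2.0]}
patient = {"knee_flexion": [13.0, 44.0, 60.0], "trunk_tilt": [2.0, 5.0, 6.0]}

gps = gait_profile_score(patient, reference)
print(f"GPS = {gps:.2f} degrees")
```

Every step is arithmetic on joint angles, so the score is as well defined as the angles that go into it. What remains open is how accurately those angles were measured.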

Of course there is a subsequent question which is whether any measurement is useful. Just because a variant of the GPS including trunk data is well defined and accurate doesn’t necessarily mean it is useful in any particular context. That, however, is yet another and different question.


Metrology or psychometrics


July’s over so time to move on from the Determinants of Gait. We’re starting detailed development of teaching material for our new master’s degree programme in clinical gait analysis. I’m working on the measurement theory section at the moment and have been reflecting on how to approach this. I’ve got an engineering background and automatically assume that the language we should use to describe measurement is that of classical measurement theory, which I’m going to refer to as metrology.

Modern metrology really started with the French Revolution when a political motivation emerged to standardise measurement systems across the country. Out of this emerged an international process for the standardisation of measurement which is now overseen by the Conférence Générale des Poids et Mesures, still based in Paris. They publish the International Vocabulary of Metrology (VIM), which is really the international “Bible” for measurement theory. The Vocabulary is designed to be universal, including the statement that “metrology includes all theoretical and practical aspects of measurement, whatever the measurement uncertainty and field of application.”

One of the things that interests me about measurement in medicine in general and in rehabilitation in particular is that, in some respects, it is developing separately from this paradigm, which is accepted almost universally in the physical and biological sciences and in engineering and chemistry. Measurement in medicine and rehabilitation is becoming increasingly conceived within the framework of psychometrics. Why, if all the rest of the world is handling measurements one way, do psychology and now rehabilitation need to adopt a different approach?

Whilst its foundations can be traced back to Darwin (see Wikipedia), psychometrics really came of age in the middle of the twentieth century and is thus a much more recent development than metrology. As the name implies, it was developed by psychologists for their work studying concepts such as self-esteem or happiness or even pain, which are less specifically defined than quantities in other branches of science. Over-simplifying a bit – metrology was developed to measure things that are specifically defined whereas psychometrics was developed to measure things that are not.

If the quantity you are measuring is specifically defined (e.g. someone’s height) then it is sensible to ask how accurate the measurement is (are you measuring what you claim to be measuring) and this is the fundamental challenge of metrology. If the quantity you are measuring is not specifically defined (e.g. how happy someone is) then the question of how accurate you are is rather meaningless. Psychometrics thus focusses on the twin alternative questions of how reliable (repeatable) and how valid measurements are.

Others may argue but I am convinced that there is a hierarchy here. If a quantity is well enough defined to determine how accurately it can be measured, then assessing repeatability and validity is second best. If you want to do the job properly you should use metrology to assess accuracy. Psychometric assessment of reliability and validity should be confined to quantities for which the superior option is not possible.

I think that the insidious onset of psychometrics has made people lazy. I suspect that there would have been considerably more effort expended on improving measurements in biomechanics if the community had focussed on ensuring accuracy of measurements rather than accepting second best (and rather flattering) measures of repeatability derived from an essentially psychometric approach.
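A small numerical sketch shows why repeatability on its own can flatter. It uses entirely made-up numbers and assumes a hypothetical test rig on which a knee angle can be set to a known 30 degrees and then measured five times. The function names are mine, for illustration only.

```python
import statistics

def repeatability_sd(measurements):
    """Spread of repeated measurements about their own mean (needs no reference)."""
    return statistics.stdev(measurements)

def bias(measurements, reference_value):
    """Mean error against an independently known reference value."""
    return statistics.mean(measurements) - reference_value

def rms_error(measurements, reference_value):
    """Overall error against the reference, combining bias and scatter."""
    squared = [(m - reference_value) ** 2 for m in measurements]
    return (sum(squared) / len(squared)) ** 0.5

# Hypothetical data: the rig is set to 30 degrees, the system reads about 35.
reference_angle = 30.0
repeats = [35.0, 36.0, 34.0, 35.0, 35.0]

sd = repeatability_sd(repeats)
offset = bias(repeats, reference_angle)
rmse = rms_error(repeats, reference_angle)
print(f"repeatability SD = {sd:.2f}, bias = {offset:.2f}, RMS error = {rmse:.2f} (degrees)")
```

With these numbers the repeats scatter by well under a degree, yet every one of them is about five degrees from the truth. A repeatability study would report only the first figure; an accuracy assessment needs the reference value and exposes the second.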

Any volunteers to man (or woman!) the barricades against the insurgence of psychometrics where it isn’t needed or wanted?