
What is normal?

A couple of months ago I wrote a post entitled normative data capture Part 1. No one has yet demanded Part 2 but I’ll give it anyway. The earlier post concentrated on what sort of numbers were required to determine normative ranges for any data, assuming we want a reasonable estimate of both the average and the standard deviation.

Once we’ve got the numbers sorted out, the question arises: “What is normal?” It might be worth starting with a paragraph addressing the political dimension here. The word “normal” is considered inappropriate in some circles because of the connotation that anyone else, our patient for example, is “abnormal”, which is considered a negative term. The response from some researchers (particularly Americans?) working with children has thus been to prefer the term typically developing. This presumably implies that our patients are atypical, and I’m not sure that that is any better or worse than abnormal. What I do appreciate is that we are all abnormal in some regard. The question should not be whether the person is normal or abnormal but whether their gait pattern is. I think normative stresses that it is the data or the pattern that is abnormal rather than the individual (but others may think differently).

But then what is a normal gait pattern? In my dictionary there are various definitions of normal and the closest to the sense in which we are using it is not deviating from the standard. Even this, though, is not particularly close. I suspect that what we really mean is representative of the population. This raises the question of how we should consider people with conditions such as cerebral palsy in relation to this population. I think that conceptually they should be considered as part of the population. Thus the normal population includes people with cerebral palsy (and other gait disorders). In childhood and early adulthood at least, these conditions are quite rare (approximately 1 in 500 people is born with CP, one of the more common conditions affecting walking) so true normative ranges (calculated over a wide enough sample) would be very little affected by including or excluding them.

Of course we most often collect normative data from much smaller samples (my previous post suggested that 30 might be regarded as a reasonable number). In this case it makes sense to specifically exclude people with obvious neuromusculoskeletal pathology, not because we regard them as abnormal in principle but because the statistics of the situation dictate that the normative data we obtain by excluding them will be closer to normative data for the entire population than the data we would obtain if they were included. (Including one person with CP in a sample of 30 runs the risk of producing normative data quite different from that which would result from the one person in 500 in the general population.)

A more common problem is posed by anatomical and physiological characteristics which have a wide range within the general population, such as in- and out-toeing, tibial torsion, femoral anteversion, flat feet and high arches. Some health professionals will want to define an arbitrary and often subjective cut-off beyond which the individual is labelled as having an impairment and exclude them from the normative dataset. I remember hearing one story of a gait analysis service that was interested in providing normative data for a foot model and simply collected a group of individuals who, to them, had no obvious neuromusculoskeletal impairment and were entirely asymptomatic. The team was later joined by another health professional who looked at the dataset, concluded that quite a large proportion of the cohort had either flat feet or raised arches, and wanted these abnormal people deleted from the dataset.

This of course raises the prospect of self-fulfilling prophecy. People with flat feet are considered as abnormal because they have data that falls outside the range of those who have been assessed as not having flat feet. This is clearly daft. Normative data ranges should ideally be generated with randomly sampled datasets from the whole population. In practical situations random sampling is extremely rare but it is inappropriate to select participants on the basis of some pre-supposition of what normal is (unless the abnormality is so rare and severe that the inclusion in a small sample risks skewing the data as described above).

There is another problem when we start to look at older populations. Iezzoni et al. (2001) suggest that over 10% of the entire population have some difficulty walking as little as 400 m, rising to nearly 50% if we look at the population aged 80 and over. There is considerable potential for inclusion or exclusion of these individuals to affect normative data. If we are concerned with what data to use for comparative purposes in older populations, however, then I think the goal posts have shifted. What we really require is not normative reference data but reference data from healthy people within a particular age range, or more specifically those without any specific neuromusculoskeletal pathology. If we want a convenient shorthand then perhaps we should refer to a healthy gait pattern in these circumstances rather than a normal one.

The same risks of subjective decisions as to where the healthy range ends and pathology starts arise here, and they are largely unavoidable. These can be addressed to a certain extent by defining explicit and objective inclusion criteria. We might not agree with the definitions but at least we will know what they are. Even these are problematic, however, because it is very easy to introduce sampling bias when recruiting. When selecting healthy controls for a study there will be a tendency to select the healthiest available. All will fulfil the inclusion criteria but they may not be representative of the population of all people who fulfil those criteria. The solution here may be to report the characteristics of the sample that was actually recruited rather than just the inclusion criteria for the study.


Iezzoni, L. I., McCarthy, E. P., Davis, R. B., & Siebens, H. (2001). Mobility difficulties are not only a problem of old age. Journal of General Internal Medicine, 16(4), 235-243.

Do it yourself normative data comparison (free download!)

Hi, I’ve had a bit of a break over the summer but I’m hoping to start posting again regularly from now on. Just a reminder that we are running another gait course in November (this one focussing on measurement issues rather than clinical interpretation). Click on the image to the right for more details. There is also just about time to register to start our Masters in Clinical Gait Analysis by distance learning (you don’t need to come to Salford at all). Click on the other image to the right for more details of it.

Enough of the ads; this post follows on from one I wrote just after the GCMAS meeting last year. It had a video link to the presentation I’d just delivered arguing that the reason we collect normative data should be so that we can compare it with other people’s normative data. I presented data showing that if we do that for the normative data from the Royal Children’s Hospital in Melbourne and Gillette Children’s Specialty Healthcare then we get quite remarkable agreement. You can see the comparative kinematic data in the figure below.

[Figure: comparison of normative kinematic data from Melbourne and Gillette]

The paper based on that presentation has finally got published in Gait and Posture (if you don’t have access to the journal you can find a pre-publication version here). We’ve also prepared an Excel file that contains the data in a format that allows you to add your own normative data (mean and standard deviation) to allow comparison with the data from Melbourne and Gillette. Just cut and paste your data into the spreadsheet and look at the graphs to see how you compare. Remember that differences in the mean traces suggest that there are systematic differences in how you apply markers, differences in standard deviations are likely to reflect how consistently you apply them within your own lab.
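If you’d rather script the comparison than use the spreadsheet, the calculation is straightforward. Here is a minimal Python sketch (the function names and the example numbers are mine, purely illustrative, not taken from the paper) that computes the root-mean-square difference between two mean traces and the average ratio of the standard deviation traces:

```python
import math

def rms_difference(mean_a, mean_b):
    """Root-mean-square difference between two mean traces (degrees)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mean_a, mean_b)) / len(mean_a))

def mean_sd_ratio(sd_yours, sd_reference):
    """Average ratio of your SD trace to the reference SD trace."""
    return sum(y / r for y, r in zip(sd_yours, sd_reference)) / len(sd_yours)

# Illustrative traces (e.g. knee flexion in degrees at a few points in the cycle)
reference_mean = [5.0, 20.0, 60.0, 10.0]
your_mean      = [7.0, 22.0, 58.0, 11.0]
reference_sd   = [4.0, 5.0, 6.0, 4.0]
your_sd        = [6.0, 7.5, 9.0, 6.0]

# A large RMS difference suggests a systematic difference in marker placement;
# an SD ratio well above 1 suggests less consistent placement within your lab.
print(rms_difference(your_mean, reference_mean))
print(mean_sd_ratio(your_sd, reference_sd))
```
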

Do let me know how well your data compares. It might be interesting to post some examples of how various clinical centres compare on this blog-site somewhere.

Normative databases: Part 1 – the numbers game

I get quite a few queries from people asking how they should construct normative databases with which to compare their measurements. The first question to address is what you want the normative database for. As you’ll read in my book or in a paper that has just been accepted for Gait and Posture (based on the paper I presented at GCMAS last year) I’m not convinced by the traditional arguments that we all have different ways of doing things and that we need to compensate for this by comparing clinical data to our own normative data. The whole history of measurement science, which really started at the time of the French revolution, has been about standardisation and the need to make measurements the same way. I don’t see any reason why gait analysts should be allowed to opt out of this.

I’d suggest that the main reason for collecting normative data should be to demonstrate that our measurement procedures are similar to those used in other labs rather than to make up for the idiosyncrasies that have developed for whatever reasons. Our paper shows that there are very small differences in normative data from two of the best respected children’s gait analysis services on different sides of the planet (Gillette Children’s Specialty Healthcare in Minneapolis and the Royal Children’s Hospital in Melbourne). The paper should be available electronically very soon (a couple of weeks) and will include the two normative datasets (means and standard deviations) for others to download and compare with.

There are two important elements for comparison. Differences between the mean traces of two normative datasets will represent a combination of systematic differences between the participants and between the measuring techniques in different centres. If you find large differences here you should compare detailed description of your technique with that from the comparison centre and try and work towards more consistent techniques. Differences in the standard deviations represent differences in variability in the participants and in the measurement techniques. High standard deviations are likely to represent inconsistent measurement techniques within a given centre and require work within the centre to try and reduce this.

Having defined why we want to collect the data you can then think about how to design the dataset. The most obvious question is how many participants to include. The 95% confidence limits of the mean trace are very close to twice the standard error of the mean, which is the standard deviation divided by the square root of the sample size. I’ve plotted this on the figure below (the blue line). Thus if you want 95% confidence that your mean is within 2° of the value you have measured you’ll need just under 40 in the sample. If you want to decrease this to 1° you’ll need to increase the number to about 130. I’d suggest this isn’t a very good return for the extra hassle of including all those extra people.

[Figure: sample size for normative data collection]
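The arithmetic behind the blue line can be sketched in a few lines of Python (the function name is mine; it assumes, as the SD plot below does, a between-subject standard deviation of 6°):

```python
import math

def n_for_mean_ci(sd, half_width, z=1.96):
    """Smallest sample size for which the 95% confidence half-width on the
    mean, z * sd / sqrt(n), does not exceed half_width (all in degrees)."""
    return math.ceil((z * sd / half_width) ** 2)

print(n_for_mean_ci(6, 2))  # just under 40 participants for +/-2 degrees
print(n_for_mean_ci(6, 1))  # around 130-140 participants for +/-1 degree
```
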

Calculating confidence limits on the standard deviations is a little different (but not a great deal more complicated) because they are based on a chi-squared distribution rather than a normal distribution (see Stratford and Goldsmith, 1997). We’re not really interested in the lower confidence limit (how consistent our measurements might be in a best case scenario) but in the upper confidence limit (how inconsistent they might be in the worst case). We can plot a similar graph (based on the true value of the standard deviation being 6°). It is actually quite similar to that for the mean, with just over 30 participants required to have 95% confidence that the actual SD is within 2° of the measured SD and just under a hundred to reduce this to 1°.
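For those who want to reproduce the curve, here is a rough Python sketch (my own, not from the Stratford and Goldsmith paper). It uses the standard chi-squared confidence interval for a standard deviation, with the chi-squared quantile computed by the Wilson–Hilferty approximation so that only the standard library is needed; the approximation is fine for the sample sizes discussed here:

```python
import math
from statistics import NormalDist

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-squared quantile
    (reasonable for df greater than about 10)."""
    z = NormalDist().inv_cdf(p)
    return df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3

def sd_upper_limit(s, n, alpha=0.05):
    """Upper limit of the two-sided 95% confidence interval for the
    population SD, given a sample SD s from n participants."""
    df = n - 1
    return s * math.sqrt(df / chi2_quantile(alpha / 2, df))

print(sd_upper_limit(6, 32))   # roughly 8: within 2 degrees of the true 6
print(sd_upper_limit(6, 100))  # roughly 7: within 1 degree
```
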

In summary, aiming to have between 30 and 40 people in the normative dataset appears to give reasonably tight confidence intervals on your data without requiring completely impractical numbers for data collection. You should note from both these curves that if you drop below about 20 participants there is quite a high potential that your results will not be representative of the population you have sampled from.

That’s probably enough for one post – I’ll maybe address some of the issues about the population you should sample from in the next post.

Just a note on the three day course we are running in June. Places are filling up and if you want to book one you should do so soon.


Stratford, P. W., & Goldsmith, C. H. (1997). Use of the standard error as a reliability index of interest: An applied example using elbow flexor strength data. Physical Therapy, 77, 745-750.





What is normal walking?

In the last post I commented on the recent paper by Dall et al. (2013) and the context of its publication. As commented on by the author in response to that post, some of the results are interesting in their own right. (I was going to paste a couple of figures from the paper into this article but the publishers require a payment of over $300 to do this legally so you’ll have to download a copy of the paper yourself if you want to see the evidence.)

Figure 1 shows the frequency distribution of minute epochs during which walking was recorded at various cadences. The mean cadence was 76 steps per minute with a cadence of less than 100 steps per minute in about 80% of the minutes during which any walking was recorded. When healthy adults walk at “self-selected” speed in the gait lab they tend to walk at cadences of well over 100 steps per minute (A brief review of the previous literature in Winter (1991) suggests values between 100 and 120). We can thus see that cadence in everyday activity is very different to that during walking in the laboratory.

The paper also includes a second graph (Figure 4) showing the same data but for the sub-set of minutes in which the participants walked for the full minute. This shows a mean value of 109 (±9) steps per minute, which is in much better agreement with the self-selected walking recorded in the laboratory. The most obvious explanation of these two graphs taken together is that when we walk in short bouts we do so at much slower cadences than we tend to study in the laboratory, but when we walk continuously for a minute or more we appear to walk at similar cadences (although the graph tends to suggest that there is more variability in this in real life than I’d expect in the laboratory).
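A toy calculation (my own illustration, not data from the paper) shows why steps accumulated over a minute epoch understate true cadence whenever the walking bout does not fill the whole minute:

```python
def steps_per_epoch(bout_seconds, cadence):
    """Steps recorded in a single 60 s epoch containing one walking bout
    at the given true cadence (steps per minute)."""
    return cadence * min(bout_seconds, 60) / 60

# A 20 s bout at a true cadence of 110 steps/min registers only about 37
# steps in its epoch, so "steps per minute epoch" reads ~37, not 110.
print(steps_per_epoch(20, 110))

# Only when the bout fills the whole minute do the two measures agree.
print(steps_per_epoch(60, 110))
```
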

This can be put together with the data from Orendurff et al. (2008), which show that 90% of bouts of walking are for fewer than 100 steps and 75% for fewer than 40 steps, to suggest that the walking we investigate in the gait laboratory is quite different to the walking that we use most frequently in our everyday lives. This worries some people, but it misses the reason for performing clinical gait analysis as we do. We use level walking at self-selected speed because it is a well-defined stereotypical movement that we understand reasonably well. We hope that analysing it will give clinical insights into impairments of neurological, muscular or skeletal function. The ultimate hope is that if we base treatment on the results of this analysis then we will improve function in “laboratory walking” and in everyday walking as well. I hope you can see that this line of reasoning does not necessarily require laboratory walking to be representative of everyday walking.


Dall, P. M., McCrorie, P. R., Granat, M. H., & Stansfield, B. W. (2013). Step Accumulation per Minute Epoch Is Not the Same as Cadence for Free-Living Adults. Med Sci Sports Exerc.

Orendurff, M. S., Schoen, J. A., Bernatz, G. C., Segal, A. D., & Klute, G. K. (2008). How humans walk: bout duration, steps per bout, and rest duration. J Rehabil Res Dev, 45(7), 1077-1089.

Winter, D. (1991). The biomechanics and motor control of human gait: Normal, elderly and pathological (2nd ed.). Waterloo: Waterloo Biomechanics.

Why do we collect normative data?

The sun is still shining in Cincinnati although many of us in the conference hotel are seeing very little of it. I thought I’d share the podium presentation I’ve just made, which reflects on why it is that we collect service-specific normative reference data. It’s my feeling that this should be to allow us to compare data between services in order to develop consistent practices, rather than as a way of allowing us to continue to tolerate differences in the way different services make measurements. Anyway, if you want to, you can listen to the screen cast below.

There was an interesting technical extension to the work which I was unable to include in the presentation because of the time limit. This is covered in the screen cast below.