Statistician AE Maxwell used to ask, as I put my head cautiously round his open door and then sat down in front of his desk, “Have you plotted the data?” His doctoral thesis consisted of a single factor analysis, done by hand, which reportedly took him almost three years. By that time, he had got to know his data.
Brian Everitt, in the room next to Maxwell’s in the Biometrics Department at the Institute of Psychiatry, used to add: “It is a big limitation of statistics that when you ask a question, you are given a number in reply. You should be given an answer to your question.”
With these paragons in mind, it is a delight to be guided to Emil Kirkegaard’s site, where he plots the data and answers questions. Yes, there are some numbers, but they are closely linked to the plotted data, which aids understanding.
I know that my esteemed readers might regard all this as old hat, but I think it has great utility.
Restriction of Range
Psychology samples tend to be drawn from college students who, although it may sometimes be hard to believe, are of above-average intelligence. Even if one excludes only those of below-average intelligence (try it with the slider set at a Z value of zero), that restriction reduces the variance by about 64%. In standard present-day university samples, where IQ 115 is the minimum required, variance will be reduced by about 80%. In proper, old-style universities, where IQ 130 is the entry requirement, the reduction in variance is about 89%. I think this is very important, particularly when some researchers, working with Ivy League and Oxbridge students, claim support for multiple intelligences because some particular skill, say gastro-intestinal intelligence, correlates only 0.18 with g and so appears unrelated to it. In a sample that restricted, a correlation of 0.18 in fact implies a correlation in the general population of close to 0.50.
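For readers who want to check these figures without the slider, here is a minimal Python sketch. It assumes IQ is normally distributed with mean 100 and SD 15 and that selection is a hard cut-off on IQ itself. The variance uses the standard formula for a lower-truncated normal, and the correlation correction uses Thorndike’s Case 2 formula, which further assumes a linear relationship with equal scatter around the regression line. The function names are illustrative only.

```python
import math

def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def truncated_variance(a: float) -> float:
    """Variance of a standard normal keeping only values above z = a."""
    lam = norm_pdf(a) / norm_sf(a)  # inverse Mills ratio
    return 1.0 + a * lam - lam * lam

def correct_for_restriction(r: float, u: float) -> float:
    """Thorndike Case 2: estimate the population correlation from r observed in a
    sample whose SD on the selection variable is 1/u of the population SD."""
    return r * u / math.sqrt(1.0 - r * r + r * r * u * u)

for iq in (100, 115, 130):
    z = (iq - 100) / 15  # entry cut-off in SD units
    reduction = 100 * (1 - truncated_variance(z))
    print(f"entry IQ {iq}: variance reduced by {reduction:.0f}%")

# SD ratio (population / selected group) for an IQ 130 entry requirement.
u = 1 / math.sqrt(truncated_variance(2.0))
print(f"r = 0.18 among IQ 130+ students -> about {correct_for_restriction(0.18, u):.2f} overall")
```

Under these assumptions the sketch gives reductions of roughly 64%, 80% and 89%, and turns an observed 0.18 among IQ 130+ students into a population estimate a little under 0.50.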
“Small differences in means are great at the extremes”
Having repeated the quip, I should have added to it: “and small differences in standard deviations cause large perturbations”. Here it is again, ready for a tweet:
“Small differences in means are great at the extremes and small differences in standard deviations cause large perturbations.”
In this example, Emil introduces us to the Blues and the Reds. These two tribes differ by one standard deviation on a score which is very similar to intelligence. That means that above a threshold of IQ 130 (old-style good university), the proportion of Blues will be about 17 times the proportion of Reds. That is to say, if entry to such a university is based only on ability, and the two tribes are the same size, that will be the ratio of Blue to Red students. If, in addition, the standard deviation of Red intelligence is a bit narrower (say 14, rather than the usual Blue SD of 15), then the ratio of Blue to Red will be 35 to 1 on intelligence alone. Please stick to Blue and Red, because that makes the concept easier for many people to understand.
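The 17 to 1 and 35 to 1 figures follow from normal tail areas. The sketch below assumes Blue scores are normal with mean 100 and SD 15, and that Reds have a mean of 85 (one Blue SD lower). The ratios compare the proportion of each tribe above the threshold, so they equal head-count ratios only if the tribes are the same size. The helper name is hypothetical.

```python
import math

def share_above(threshold: float, mean: float, sd: float) -> float:
    """Proportion of a normal(mean, sd) population scoring above threshold."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Assumed parameters: Blue mean 100, SD 15; Red mean 85, SD 15 or 14.
blue = share_above(130, mean=100, sd=15)
red_sd15 = share_above(130, mean=85, sd=15)
red_sd14 = share_above(130, mean=85, sd=14)

print(f"Blue:Red above 130, equal SDs:  {blue / red_sd15:.0f} : 1")
print(f"Blue:Red above 130, Red SD 14: {blue / red_sd14:.0f} : 1")
```

Narrowing the Red SD from 15 to 14 moves the cut-off from 3.0 to about 3.2 Red SDs above the Red mean, and because the normal tail thins so quickly out there, that small step roughly doubles the ratio.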
Regression towards the mean
This has been explained many times, but plotting the data helps. “Regression” implies a process which takes time: some magical shrinking, or reversion to a primitive ancestral state. Partly this connotation comes from psychoanalytic notions about childhood, and partly from an analogy with the loss of function which is part of ageing. These are engaging ideas, but not what is being discussed here. I think I prefer the more general title of “errors in repeated measurements”. The simplest verbal explanation is this: the more often you test someone, the less their overall results will be affected by flukes; and if you select people on the basis of extreme scores at first testing, those individuals are unlikely to be as extreme at second testing, simply because the tests are not perfectly reliable. Flukes get lost, because they are flukes.
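The “flukes get lost” explanation can be simulated directly. This sketch assumes the classical test model, in which each observed score is a stable true score plus a fresh, independent random error on each occasion, with reliability set to 0.8 and an observed SD of 15. Selecting on a first score of 130 or more is an arbitrary illustrative choice.

```python
import random
import statistics

random.seed(1)
RELIABILITY = 0.8   # assumed share of observed-score variance that is true-score variance
N = 100_000         # simulated people

true_sd = 15 * RELIABILITY ** 0.5         # spread of stable true ability
error_sd = 15 * (1 - RELIABILITY) ** 0.5  # spread of the fluke on each testing occasion

first_scores, second_scores = [], []
for _ in range(N):
    ability = random.gauss(100, true_sd)
    first_scores.append(ability + random.gauss(0, error_sd))   # first test: ability + fluke
    second_scores.append(ability + random.gauss(0, error_sd))  # retest: same ability, new fluke

# Select people purely on an extreme first score, then look at their retest.
picked = [(f, s) for f, s in zip(first_scores, second_scores) if f >= 130]
m1 = statistics.fmean(f for f, _ in picked)
m2 = statistics.fmean(s for _, s in picked)
print(f"{len(picked)} selected; first-test mean {m1:.1f}, retest mean {m2:.1f}")
```

The group picked for high first scores still does well on retest, but its mean falls back towards 100 by roughly a fifth of its original distance from the mean, even though nobody’s true ability has changed.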
I put in a test-retest reliability figure of 0.8, which corresponds to that observed for Wechsler intelligence subtests. Even with subtests that reliable, there will be an apparent regression caused by measurement error. As Emil notes, this may falsely create the impression that a group with low scores has been raised to a higher standard by some educational intervention carried out before they are re-tested. Ideally, one would take a randomly chosen half of the group who had obtained low scores first time round and re-test them without giving them any educational intervention, in order to find out how much of the “improvement” was mere measurement error.
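Here is a sketch of that control-group design under stated assumptions: scores are in standard units with a test-retest correlation of 0.8, “low scorers” are those below IQ 85, and the educational intervention is hypothetical and given no effect at all. Half the low scorers are randomly assigned to it and half are left alone.

```python
import random
import statistics

random.seed(2)
R = 0.8       # assumed test-retest correlation, roughly that of a Wechsler subtest
N = 200_000   # simulated population size

def retest(first_z: float) -> float:
    """Second score in z units: correlates R with the first, no real change in ability."""
    return R * first_z + (1 - R * R) ** 0.5 * random.gauss(0, 1)

def mean_gain_iq(group: list[float]) -> float:
    """Average retest-minus-first change for a group, in IQ points (SD 15)."""
    return 15 * statistics.fmean(retest(z) - z for z in group)

# Select low scorers (below IQ 85, i.e. z < -1) on the first test.
low = [z for z in (random.gauss(0, 1) for _ in range(N)) if z < -1]

# Randomly split them: one half gets the hypothetical intervention, the other none.
random.shuffle(low)
half = len(low) // 2
taught, control = low[:half], low[half:]

# The intervention is assumed to do nothing, so both halves are simply retested.
print(f"'taught' group gain:    {mean_gain_iq(taught):+.1f} IQ points")
print(f"untreated control gain: {mean_gain_iq(control):+.1f} IQ points")
```

Both halves “improve” by about the same amount, four to five IQ points here. That is the measurement-error share which the intervention group’s gain would need to exceed before anyone could credit the teaching.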
Even when you set test-retest reliability at 0.93 (true of Wechsler Full Scale IQ with 6 months between test sessions), there is still a small regression: the change in score, regressed on the first score, has a slope of -0.07 (that is, 0.93 minus 1), and there will be quite a few outliers with large apparent changes in ability levels.
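The -0.07 figure can be checked with a little algebra. If first and second scores have equal SDs and correlate r, then regressing the change (second minus first) on the first score gives a slope of r - 1, and the change itself has an SD of SD × √(2(1 - r)). This sketch assumes an IQ SD of 15 and, for the tail figure, normally distributed change scores with mean zero; the 10-point threshold for a “large” change is an arbitrary choice.

```python
import math

def change_stats(r: float, sd: float = 15.0) -> tuple[float, float]:
    """Slope of (retest - first) regressed on the first score, and the SD of that
    change, for two testings with equal SDs that correlate r."""
    slope = r - 1.0                              # (cov(X1, X2) - var(X1)) / var(X1)
    sd_change = sd * math.sqrt(2.0 * (1.0 - r))  # sqrt(var(X1) + var(X2) - 2 cov(X1, X2))
    return slope, sd_change

slope, sd_change = change_stats(0.93)

# Share of people whose score shifts by more than 10 points in either direction,
# assuming change scores are normal with mean zero (no real change in ability).
p_big = math.erfc(10.0 / (sd_change * math.sqrt(2.0)))

print(f"slope {slope:+.2f}, SD of change {sd_change:.1f}, P(|change| > 10) = {p_big:.1%}")
```

At r = 0.93 the change SD is about 5.6 IQ points, so roughly 7 to 8 percent of people would appear to gain or lose more than 10 points between sessions even if nobody’s ability had changed.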
In conclusion, having these interactive visualising tools handy could help you make critical comments when reading 98% of psychology papers.