There is little point in reading newspapers when they pretend to discuss serious subjects, but even so, one hopes for the best. Two days ago an article in a broadsheet British newspaper offered “Seven ways to appear more intelligent than other people”. Not reading newspapers was not one of them.
However, these stories show what sticks in the mind: the ideas which get repeated because they serve the function of disparaging intelligence test results. “Scientists once claimed that intelligence quotient (IQ) levels were hereditary. This meant that human beings had no control over their brain power; it was decided by their genes. However, recent studies have shown that IQ scores are barely linked to genes at all. They can also be extremely volatile, changing significantly - by up to 20 points - over time.”
Ignoring the pure ignorance, it was the claim about “volatility by up to 20 points” that caught my eye. No reference was given, but in the 1970s that was already a popular argument. In other fora I found a 2012 paper being quoted in support of that claim, so that seemed worth a look.
Sue Ramsden, Fiona M. Richardson, Goulven Josse, Michael S. C. Thomas, Caroline Ellis, Clare Shakeshaft, Mohamed L. Seghier & Cathy J. Price. (2012) Verbal and non-verbal intelligence changes in the teenage brain. doi:10.1038/nature10514
They tested 33 adolescents at 14 years and again at 18 years, each time giving them a Wechsler IQ test and an MRI. 33 adolescents are hardly a good representation of adolescents generally. Presumably the cost of MRI scans has limited the number, so the authors have tried to ensure that the people they selected represented the normal range. The same technique has often been used, for example in the early days of inspection time studies, when the testing procedure was very time consuming. Small, artificially representative samples were criticised then, and can be criticised now, but they are a start, though better for establishing whether a correlation exists than as a measure of normal variability.
The sample means were 112 (SD 13.9) at first testing and 113 (SD 14.0) at second testing, an overall gain of 1 IQ point. There was what the authors call “a tight correlation across testing points” (r = 0.79).
In the authors’ words: “The wide range of abilities in our sample was confirmed as follows: FSIQ ranged from 77 to 135 at time 1 and from 87 to 143 at time 2, with averages of 112 and 113 at times 1 and 2, respectively, and a tight correlation across testing points (r = 0.79; P < 0.001).”
At this point, you might want to stop reading. In social science research, correlations of 0.8 are very large and rarely found. This is a strong correlation, so intelligence tests are OK and worth using. Move on.
However, test-retest correlations over 6 months for the Wechsler are usually about 0.9, so this reported correlation is tight, but not tight enough. Adolescence is a time of change (though probably not as much as early childhood), so something is going on.
The authors say: “strong correlations over time disguise considerable individual variation; for example, a correlation coefficient of 0.7 (which is not unusual with verbal IQ) still leaves over 50% of the variation unexplained.”
Call me picky, but they should have said “strong correlations include individual variation, because only with perfect correlation does individual variation disappear”. Nothing is being hidden. All scores contribute to the correlation statistic, even outliers. Earl Hunt accuses some researchers of being lawyerly rather than scholarly, and this is the way tricks are played: if you want to stress that IQ is OK, use correlation coefficients; if you want to stress that IQs are rubbish, use the one case with the biggest difference you can find. The same trick is played in the case of adoption and IQ: genetically inclined commentators reveal, truthfully, that years after adoption Black kids’ IQs correlated more strongly with their Black biological parents than with their White adoptive parents; environmentally inclined commentators reveal, truthfully, that the IQs of Black kids go up when White parents adopt them. Environmentalists champion the increase in IQ that Black children showed at age 7, because it was a big gain. They are less keen to reveal that by age 17, when the gain should have become even bigger (even more years for wealthy, middle-class White parents to pass on intellectual stimulation and good table manners to their Black adoptees), the gains had diminished, though they had not been entirely lost.
These selective presentations are very much like the “metric shift illusion” in which you can make a rare disorder seem common by saying how many sufferers there are in a large national population. Better to give the rate per 100,000 for all disorders, so that there is true comparability.
Incidentally, the results from this study are interesting: The results showed that changes in Verbal IQ were positively correlated with changes in grey matter density (and volume) in a region of the left motor cortex that is activated by the articulation of speech. Conversely, changes in Performance IQ were positively correlated with grey matter density in the anterior cerebellum (lobule IV), which is associated with motor movements of the hand. Post hoc tests that correlated structural change with change in each of the nine VIQ and PIQ subtest scores that were common in the WISC and WAIS assessments found that the neural marker for VIQ indexed constructs that were shared by all VIQ measures and that the neural marker for PIQ indexed constructs that were common to three of the four PIQ measures. This indicates that our VIQ and PIQ markers indexed skills that were not specific to individual subtests. There were no other grey or white matter effects that reached significance in a whole-brain structural analysis of VIQ, PIQ or FSIQ.
Later, they say: “Specifically, 66% of the variance in VIQ at time 2 was accounted for by VIQ at time 1, a further 20% was accounted for by the change in grey matter density in the left motor speech region, with the remaining 14% unaccounted for. Similarly, 35% of the variance in PIQ at time 2 was accounted for by PIQ at time 1, with 13% accounted for by the change in grey matter density in the anterior cerebellum, leaving 52% unaccounted for. Future studies may be able to account for more of the between-subject variability by using a similar methodology with larger samples or other methodologies that measure structural or functional connectivity.”
The attraction of this result for journalists is that it offers the minimum and maximum score differences, a statistic which gives an inflated impression of variability. Even taking this study at face value, the mean difference between IQ scores is precisely 1 point. The standard deviation of the change scores is about 9 points, which is 0.6 SD on the usual IQ scale (SD 15). To give the authors their due, they are on the hunt for discrepancies, and want to link them with brain changes. For once, these differing scores are being explained by some real data, rather than mere surmise about error terms.
However, the journalist has taken a phenomenon caused by the developing brains of 33 adolescents and then used the minimum and maximum changes of outliers to disparage intelligence testing.
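The 9-point figure can be checked from the summary statistics the paper reports. What follows is a minimal sketch, not the authors' analysis: it uses the standard formula for the SD of a difference between two correlated scores, and the same arithmetic shows where "over 50% of the variation unexplained" comes from at r = 0.7. Two things are my assumptions rather than anything in the paper: the conventional IQ-scale SD of 15, and the rough normality of the change scores used for the last line.

```python
import math
from statistics import NormalDist

# Summary statistics reported for the 33 adolescents (full-scale IQ).
MEAN_T1, SD_T1 = 112.0, 13.9
MEAN_T2, SD_T2 = 113.0, 14.0
R_RETEST = 0.79

# Assumption: the conventional IQ scale has a standard deviation of 15.
IQ_SCALE_SD = 15.0


def sd_of_change(sd1, sd2, r):
    """SD of (score2 - score1) given the two SDs and their correlation."""
    return math.sqrt(sd1 ** 2 + sd2 ** 2 - 2 * r * sd1 * sd2)


def variance_unexplained(r):
    """Share of variance at one testing not accounted for by the other."""
    return 1 - r ** 2


mean_change = MEAN_T2 - MEAN_T1
sd_change = sd_of_change(SD_T1, SD_T2, R_RETEST)

# Assumption: change scores are roughly normal. The study does not say so.
change_dist = NormalDist(mu=mean_change, sigma=sd_change)
share_20_plus = change_dist.cdf(-20) + (1 - change_dist.cdf(20))

print(f"mean change: {mean_change:.0f} point")
print(f"SD of change: {sd_change:.1f} points ({sd_change / IQ_SCALE_SD:.1f} scale SD)")
print(f"variance unexplained at r = 0.70: {variance_unexplained(0.70):.0%}")
print(f"variance unexplained at r = 0.79: {variance_unexplained(0.79):.0%}")
print(f"share of changes of 20+ points (normal approx.): {share_20_plus:.1%}")
```

On these assumptions, a swing of 20 points or more sits in the thin tails of the distribution of changes. It describes the extremes of the sample, not the typical adolescent.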
What do test-retest scores look like across the whole lifespan?
Ian J. Deary and Caroline E. Brett (2015) Predicting and retrodicting intelligence between childhood and old age in the 6-Day Sample of the Scottish Mental Survey 1947. Intelligence, Volume 50, May–June 2015, Pages 1–9.
In studies of cognitive ageing it is useful and important to know how stable are the individual differences in cognitive ability from childhood to older age, and also to be able to estimate (retrodict) prior cognitive ability differences from those in older age. Here we contribute to these aims with new data from a follow-up study of the 6-Day Sample of the Scottish Mental Survey of 1947 (original N = 1208). The sample had cognitive, educational, social, and occupational data collected almost annually from age 11 to 27 years. Whereas previous long-term follow-up studies of the Scottish mental surveys are based upon group-administered cognitive tests at a mean age of 11 years, the present sample each had an individually-administered revised Binet test. We traced them for vital status in older age, and some agreed to take several mental tests at age 77 years (N = 131). The National Adult Reading Test at age 77 correlated .72 with the Terman–Merrill revision of the Binet Test at age 11. Adding the Moray House Test No. 12 score from age 11 and educational information took the multiple R to .81 between youth and older age. The equivalent multiple R for fluid general intelligence was .57. When the NART from age 77 was the independent variable (retrodictor) along with educational attainment, the multiple R with the Terman–Merrill IQ at age 11 was .75. No previous studies of the stability of intelligence from childhood to old age, or of the power of the NART to retrodict prior intelligence, have had individually-administered IQ data from youth. About two-thirds, at least, of the variation in verbal ability in old age can be captured by cognitive and educational information from youth. Non-verbal ability is less well predicted. A short test of pronunciation—the NART—and brief educational information can capture well over half of the variation in IQ scores obtained 66 years earlier.
In sum, IQ scores hold up well. Steady as she goes. The score you get at 11 will be very similar to the one you get at 77. Similar, but probably not identical. You could go out and hunt for a couple of people who have lost and gained the most IQ points just for the perverse pleasure of it, but why concentrate on the biggest discrepancy you can find for one individual when you can give the results for all individuals in one summary statistic: the correlation coefficient? If the latter is too difficult, why not get a ruler and a sharp pencil and try drawing the best line through a scatter-plot?
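For those without a ruler and a sharp pencil, here is a rough sketch of the contrast between the two ways of summarising test-retest data. The scores are simulated, not taken from either study: I have borrowed the sample size, means, SDs and correlation from the adolescent sample as generating parameters, and the function names and the seed are my own choices. Because the sample is small, the correlation it prints will drift from the 0.79 fed in.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate_retest(n, r, mean1, sd1, mean2, sd2, seed=0):
    """Hypothetical test-retest scores drawn with population correlation r."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        # Mix z1 with fresh noise so that z2 correlates r with z1.
        z2 = r * z1 + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        first.append(mean1 + sd1 * z1)
        second.append(mean2 + sd2 * z2)
    return first, second


# Simulated data only: parameters borrowed from the adolescent sample.
first, second = simulate_retest(33, 0.79, 112, 13.9, 113, 14.0)
changes = [b - a for a, b in zip(first, second)]

r = pearson_r(first, second)
slope, intercept = least_squares_line(first, second)
mean_change = sum(changes) / len(changes)
largest_change = max(changes, key=abs)

print(f"correlation across all {len(first)} people: {r:.2f}")
print(f"best line: retest = {intercept:.1f} + {slope:.2f} * first test")
print(f"mean change: {mean_change:+.1f} points")
print(f"largest single change: {largest_change:+.1f} points")
```

The first three lines of output use every person in the sample. The last line uses one person, which is why it makes the better headline and the worse summary.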