Sunday, 22 February 2015

Steady as she goes

There is little point in reading newspapers when they pretend to discuss serious subjects, but even so, one hopes for the best. Two days ago an article in a broadsheet British newspaper offered “Seven ways to appear more intelligent than other people”. Not reading newspapers was not one of them.

However, these stories show what sticks in the mind: the ideas which get repeated because they serve the function of disparaging intelligence test results. “Scientists once claimed that intelligence quotient (IQ) levels were hereditary. This meant that human beings had no control over their brain power; it was decided by their genes. However, recent studies have shown that IQ scores are barely linked to genes at all. They can also be extremely volatile, changing significantly - by up to 20 points - over time.”

Ignoring the pure ignorance, it was the claim about “volatility by up to 20 points” that caught my eye. No reference was given, but in the 1970s that was already a popular argument. In other fora I found a 2012 paper being quoted in support of that claim, so that seemed worth a look.

Sue Ramsden, Fiona M. Richardson, Goulven Josse, Michael S. C. Thomas, Caroline Ellis, Clare Shakeshaft, Mohamed L. Seghier & Cathy J. Price. (2012) Verbal and non-verbal intelligence changes in the teenage brain. doi:10.1038/nature10514

They tested 33 adolescents at 14 years and again at 18 years, each time giving them a Wechsler IQ test and an MRI. 33 adolescents are hardly a good representation of adolescents generally. Presumably the cost of MRI scans has limited the number, so the authors have tried to ensure that the people they selected represented the normal range. The same technique has often been used, for example in the early days of inspection time studies, when the testing procedure was very time consuming. Small, artificially representative samples were criticised then, and can be criticised now, but they are a start, though better for establishing whether a correlation exists than as a measure of normal variability.

The sample means were 112 (SD 13.9) at first testing and 113 (SD 14.0) at second testing, an overall gain of 1 IQ point. There was what the authors call “a tight correlation across testing points” (r = 0.79).

The wide range of abilities in our sample was confirmed as follows: FSIQ ranged from 77 to 135 at time 1 and from 87 to 143 at time 2, with averages of 112 and 113 at times 1 and 2, respectively, and a tight correlation across testing points (r = 0.79; P < 0.001).

At this point, you might want to stop reading. In social science research, correlations of 0.8 are very large and rarely found. This is a strong correlation, so intelligence tests are OK and worth using. Move on.

However, test-retest correlations over 6 months for the Wechsler are usually about 0.9, so this reported correlation is tight, but not tight enough. Adolescence is a time of change (though probably not as much as early childhood), so something is going on.

The authors say: strong correlations over time disguise considerable individual variation; for example, a correlation coefficient of 0.7 (which is not unusual with verbal IQ) still leaves over 50% of the variation unexplained. 

Call me picky, but they should have said “strong correlations include individual variation, because only with perfect correlation does individual variation disappear”. Nothing is being hidden. All scores contribute to the correlation statistic, even outliers. Earl Hunt accuses some researchers of being lawyerly rather than scholarly, and this is the way tricks are played: if you want to stress that IQ is OK, use correlation coefficients; if you want to stress that IQs are rubbish, use the one case with the biggest difference you can find. The same trick is played in the case of adoption and IQ: genetically inclined commentators reveal, truthfully, that years after adoption Black kids’ IQs correlated more strongly with their Black blood parents than with their White adoptive parents; environmentally inclined commentators reveal, truthfully, that the IQs of Black kids go up when White parents adopt them. Environmentalists champion the increase in IQ that Black children showed at age 7, because it was a big gain. They are less keen to reveal that by age 17, when the gain should have become even bigger (even more years for wealthy, middle-class White parents to pass on intellectual stimulation and good table manners to their Black adoptees), the gains had diminished, though not been entirely lost.

These selective presentations are very much like the “metric shift illusion” in which you can make a rare disorder seem common by saying how many sufferers there are in a large national population. Better to give the rate per 100,000 for all disorders, so that there is true comparability.
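
To make the point concrete, here is a minimal sketch of the conversion from a raw head count to a rate per 100,000. The case count and population below are hypothetical, picked only to show the arithmetic, and the function name is my own.

```python
def rate_per_100k(cases, population):
    """Convert a raw case count into a rate per 100,000 people."""
    return cases * 100_000 / population

# Hypothetical numbers, chosen only to illustrate the arithmetic
headline_cases = 6_500
national_population = 65_000_000

# A count in the thousands sounds large; the rate shows how rare it is
print(rate_per_100k(headline_cases, national_population))
```

Quoting every disorder on the same per-100,000 scale is what makes the figures comparable.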

Incidentally, the results from this study are interesting: The results showed that changes in Verbal IQ were positively correlated with changes in grey matter density (and volume) in a region of the left motor cortex that is activated by the articulation of speech. Conversely, changes in Performance IQ were positively correlated with grey matter density in the anterior cerebellum (lobule IV), which is associated with motor movements of the hand. Post hoc tests that correlated structural change with change in each of the nine VIQ and PIQ subtest scores that were common in the WISC and WAIS assessments found that the neural marker for VIQ indexed constructs that were shared by all VIQ measures and that the neural marker for PIQ indexed constructs that were common to three of the four PIQ measures. This indicates that our VIQ and PIQ markers indexed skills that were not specific to individual subtests. There were no other grey or white matter effects that reached significance in a whole-brain structural analysis of VIQ, PIQ or FSIQ.

Later, they say: Specifically, 66% of the variance in VIQ at time 2 was accounted for by VIQ at time 1, a further 20% was accounted for by the change in grey matter density in the left motor speech region, with the remaining 14% unaccounted for. Similarly, 35% of the variance in PIQ at time 2 was accounted for by PIQ at time 1, with 13% accounted for by the change in grey matter density in the anterior cerebellum, leaving 52% unaccounted for. Future studies may be able to account for more of the between-subject variability by using a similar methodology with larger samples or other methodologies that measure structural or functional connectivity.

The attraction of this result for journalists is that it uses the minimum and maximum score difference statistic, which gives an inflated impression of variability. Even taking this study at face value, the mean difference between IQ scores is precisely 1 point. The standard deviation of the change scores is 9 points, which is 0.6 SD. To give the authors their due, they are on the hunt for discrepancies, and want to link them with brain changes. For once, these different scores are being explained by some real data, rather than mere surmise about error terms.
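
The 9-point figure can be checked from the summary statistics alone, using the standard formula for the standard deviation of a difference between two correlated scores. This is a minimal sketch: the function name is mine, and the inputs are the SDs and correlation reported in the paper.

```python
import math

def sd_of_change(sd1, sd2, r):
    """SD of (second score - first score), given each SD and their correlation."""
    return math.sqrt(sd1**2 + sd2**2 - 2 * r * sd1 * sd2)

# Reported values: SD 13.9 at first testing, SD 14.0 at second, r = 0.79
change_sd = sd_of_change(13.9, 14.0, 0.79)

print(round(change_sd, 1))       # SD of the change scores, in IQ points
print(round(change_sd / 15, 1))  # the same, as a fraction of the conventional IQ SD of 15
```

If the changes were roughly normally distributed (an assumption, not something the paper reports), about one person in twenty would be expected to move by two such SDs, roughly 18 points, in one direction or the other. One or two large shifts in a sample of 33 are therefore no surprise.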

However, the journalist has taken a phenomenon caused by the developing brains of 33 adolescents and then used the minimum and maximum changes of outliers to disparage intelligence testing.

What do test-retest scores look like across the whole lifespan?

Ian J. Deary and Caroline E. Brett. Predicting and retrodicting intelligence between childhood and old age in the 6-Day Sample of the Scottish Mental Survey 1947. Intelligence, Volume 50, May–June 2015, Pages 1–9.

In studies of cognitive ageing it is useful and important to know how stable are the individual differences in cognitive ability from childhood to older age, and also to be able to estimate (retrodict) prior cognitive ability differences from those in older age. Here we contribute to these aims with new data from a follow-up study of the 6-Day Sample of the Scottish Mental Survey of 1947 (original N = 1208). The sample had cognitive, educational, social, and occupational data collected almost annually from age 11 to 27 years. Whereas previous long-term follow-up studies of the Scottish mental surveys are based upon group-administered cognitive tests at a mean age of 11 years, the present sample each had an individually-administered revised Binet test. We traced them for vital status in older age, and some agreed to take several mental tests at age 77 years (N = 131). The National Adult Reading Test at age 77 correlated .72 with the Terman–Merrill revision of the Binet Test at age 11. Adding the Moray House Test No. 12 score from age 11 and educational information took the multiple R to .81 between youth and older age. The equivalent multiple R for fluid general intelligence was .57. When the NART from age 77 was the independent variable (retrodictor) along with educational attainment, the multiple R with the Terman–Merrill IQ at age 11 was .75. No previous studies of the stability of intelligence from childhood to old age, or of the power of the NART to retrodict prior intelligence, have had individually-administered IQ data from youth. About two-thirds, at least, of the variation in verbal ability in old age can be captured by cognitive and educational information from youth. Non-verbal ability is less well predicted. A short test of pronunciation—the NART—and brief educational information can capture well over half of the variation in IQ scores obtained 66 years earlier.
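
The “two-thirds” and “well over half” statements in the abstract follow from squaring the reported multiple correlations. A quick check, with descriptive labels of my own wording:

```python
# Multiple R values quoted in the Deary and Brett abstract
reported = {
    "verbal ability at 77, from age-11 tests plus education": 0.81,
    "fluid general intelligence at 77": 0.57,
    "Terman-Merrill IQ at 11, retrodicted from NART plus education": 0.75,
}

# Squaring R gives the proportion of variance accounted for
variance_explained = {label: round(r**2, 2) for label, r in reported.items()}

for label, share in variance_explained.items():
    print(f"{label}: R = {reported[label]}, R squared = {share}")
```

The squared values are noticeably smaller than the correlations themselves, which is why the same result can be made to sound strong or weak depending on which figure is quoted.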

In sum, IQ scores hold up well. Steady as she goes. The score you get at 11 will be very similar to the one you get at 77. Similar, but probably not identical. You could go out and hunt for a couple of people who have lost and gained the most IQ points just for the perverse pleasure of it, but why concentrate on the biggest discrepancy you can find for one individual when you can give the results for all individuals in one summary statistic: the correlation coefficient? If the latter is too difficult, why not get a ruler and a sharp pencil and try drawing the best line through a scatter-plot?
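
For anyone without a ruler to hand, the contrast between the summary statistics and the single biggest discrepancy can be sketched with simulated scores. Everything below is illustrative: the data are randomly generated rather than taken from either study, the sample size of 33 and correlation of 0.79 are borrowed from the adolescent study only to set the scale, and the function names are my own.

```python
import math
import random

def simulate_retest(n, r, mean=100.0, sd=15.0, seed=0):
    """Generate n (first, second) score pairs with population correlation r."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        # The second score shares a fraction r of the first; the rest is noise
        z2 = r * z1 + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        pairs.append((mean + sd * z1, mean + sd * z2))
    return pairs

def summarise(pairs):
    """Pearson r, least-squares line, mean change and largest single change."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return {
        "r": sxy / math.sqrt(sxx * syy),
        "slope": slope,                  # best line through the scatter-plot
        "intercept": my - slope * mx,
        "mean_change": my - mx,
        "largest_change": max(abs(y - x) for x, y in pairs),
    }

sample = simulate_retest(n=33, r=0.79, seed=1)
for name, value in summarise(sample).items():
    print(f"{name}: {value:.2f}")
```

Running it with different seeds shows the pattern: the correlation and the fitted line use every pair of scores, whereas the largest change is set by one individual and is far more variable from sample to sample.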


  1. When I write a comment, and then try to edit it, your software seems to misbehave. I am using the Safari browser on a Mac.

  2. Sorry about that. I find that if you Preview, then there is a problem, but if you change it yourself and then press Publish it usually works

    1. It's when I'm trying to edit a raw comment, not in preview, that I get the difficulty.

  3. The study is interesting, but far from complete. A larger study needs to be conducted in which the longitudinal gathering of IQ scores is accompanied by an investigation of what occurred in the lives of participants between their first and second tests, to discover what possible factors can be identified. Lots of things happen to children between the ages of 14 and 17. Family break-up is known to have a big effect on adolescents' academic performance, motivation and mood. Some children start drinking and taking drugs at around that time. Illness is also a potential factor. Physiological tests should be conducted and a health history should be collected. Fourteen is also the age at which children are streamed into "foundation" and "gcse" streams, and choose subjects which may involve a lot of either mathematical reasoning, or verbal reasoning, or neither. Testosterone levels should be measured, as these are changing during adolescence, and have been observed to affect IQ. Another variable that should be measured is height. Perhaps those who experienced a rise or fall in IQ simultaneously experienced a rise or fall in "height quotient" (i.e., they grew faster or slower than their age peers). This could point to relative lateness or earliness of adolescence as a factor. Intense study, or the lack of it, for a couple of years, in subjects where grades correlate with IQ scores, ought to have some effect on test performance, though findings in the literature indicate that these effects are likely to fade after a while. Therefore, to complete the study, there should be another follow-up a few years later, in which the participants are tested again to discover to what extent, if at all, the changes are stable. Again, potentially interesting factors and correlates should be looked at, such as whether and what sort of higher or further education, and what sort of career, has been pursued in the intervening time.

    1. I mean, "foundation" and "higher" GCSE streams.

    2. I agree that these would be very important additional variables. I suppose the authors could say that the brain changes they have revealed are the most important things to have measured. On your final point, I agree that measuring them a few years later would be salutary. Perhaps that is intended and is underway.

  4. This "% of variation explained" business came up in your blog. Men Hu wrote an excellent blogpost a while back on r-squared versus r as the statistic of interest, arguing that it is actually the latter and not the former that is of interest to the research worker, because the straightforward correlation coefficient maps in a linear and easily understandable manner to real-world effects, whereas the r-squared does not. It's well worth reading and includes a particularly good quote from Hunter and Schmidt.

  5. Dear Andrew, Ouch. I admit I have generally fallen for the R2 argument, whilst at the same time knowing that if you show the data as frequencies in, say quintiles, then they are obviously very impressive, as Charles Murray did in the Bell Curve. I will look at all this again. I found one quote I liked, and copy it out here to check whether it was the one you had in mind: In fact, a validity coefficient of .40 has 40% of the practical value to an employer of a validity coefficient of 1.00 — perfect validity (Schmidt & Hunter, 1998; Schmidt, Hunter, McKenzie, & Muldrow, 1979).

  6. I like brain studies. The problem is that there are too many of them, each with a very small sample. So, at some point, I stopped reading them. I prefer to read other people's reviews of those studies. It saves me the time of reading them all. I appreciate that you look at some of them.

    On IQ changes, there is a study that deserves to be cited more often, but that I rarely see (except in articles written by Nathan Brody and Arthur Jensen).

    Moffitt, T. E., Caspi, A., Harkness, A. R., & Silva, P. A. (1993). The Natural History of Change to Intellectual Performance: Who Changes? How Much? Is it Meaningful? Journal of Child Psychology and Psychiatry, 34(4), 455-506.

    Here's the summary of Moffitt et al. (1993) by Brody (2007):

    In principle, environmental variations associated with educational deprivations or educational interventions could result in cumulative changes in IQ. The data do not support this outcome. Rather, the results reviewed above suggest that relatively dramatic changes in the environment have vanishingly small influences on general intelligence in the long run, although they may have large short-term effects. These results, combined with evidence for stability of IQ, suggest that environmental variations commonly encountered do not have enduring influences on cognitive ability - g is a relatively resilient trait whose short-term perturbations are accompanied by a tendency for its phenotypic manifestations to revert to an enduring stable value, manifested initially in early childhood or infancy. This conclusion is buttressed by the results of a longitudinal analysis of changes in IQ reported by Moffitt, Caspi, Harkness, and Silva (1993), who administered the Wechsler Intelligence Scale for Children (WISC) test to a representative sample of children when they were age 7, 9, 11, and 13. They obtained test-retest correlations varying between .74 and .84, and concluded that for close to 90% of the children in their sample, variations in IQ over this period were small and attributable to random errors of measurement. They also found a subset of children who exhibited larger changes in IQ over this period. They identified 37 different environmental measures that might be related to changes in IQ, including socioeconomic status, changes in family composition, and such biological influences as impaired vision or perinatal problems. They found that this set of environmental variables was not associated with changes in IQ.

    In other words, it's difficult to understand what's causing IQ changes. The same thing applies to the Flynn effect.

    Brody, N. (2007). Heritability and the nomological network of g. In M. J. Roberts (Ed.), Integrating the mind: Domain general versus domain specific processes in higher cognition (pp. 427-448). Hove, UK: Psychology Press.