Tuesday 30 July 2013

Editorial: Intelligence: Special Issue: Flynn Effect




Cohorts differ in their abilities, and we do not know why. Wines also differ, with some years being better than others, some famously so. Choice generations sparkle like champagne, with bright bubbles of innovation and discovery followed by longer periods of duller harvests and flatter discourse. These differences in comparative brilliance are rightly a matter of interest and speculation, particularly when it seems that, on the basis of IQ test results alone, throughout much of the 20th century everyone was getting brighter. This is not impossible. Better techniques of viniculture have improved many a new entrant to the global wine trade, but if humans are improving so much, it would be nice to know how this is being achieved.

The term “Flynn effect” was coined by Herrnstein & Murray (1994, The Bell Curve, p. 307) to designate the increases in IQs during the twentieth century that were documented for the United States and for a number of other countries by Flynn (1984, 1987). Herrnstein and Murray were explicit that the phenomenon was observable in the 1930s but that Flynn had drawn attention to it. Jim Flynn was not able to prevent this ennoblement, which was itself a clear instance of the Herrnstein and Murray effect, in which a finding is attributed to someone who was not the first to find it. I doubt they were the first to do this, since contested parentage is generally true of all effects named after a single person.

Of course, whatever the name of the effect, we must distinguish between IQ gains and IQ inflation. Large numbers impress, and rising apparent wealth often comforts even those faced with the reality of rising inflation. Many national examinations report higher pass rates every year. Students may be getting brighter or examiners may be getting kinder. Exam results are silent on this issue, unless we can find objective measures. Intelligence tests are great for ranking individuals within a contemporary population, and have excellent predictive value, but they are not particularly good at comparing generations. They weren't designed to do that, though it might be achieved by lengthening those parts of the test which can be defined in terms of some intrinsic measure of difficulty, perhaps in mathematics.

What made the Flynn effect notable was its apparently relentless and steady progress. Like the application of nitrogen fertiliser to crops, something in the decades after the Second World War seemed to be boosting IQ scores at the predictable rate of about 3 points per decade. All that was lacking was to find the miracle ingredient. Many causes were proposed, yet few came close to fitting the data. In some ways the early results made more sense. It seemed as if duller citizens were being boosted into the average range: yet another example of the benefits of humane and kindly welfare states. Then more results came in suggesting that average and above average citizens were also getting the benefits of the mysterious ingredients. All were getting prizes. By the turn of the century the effect seemed to be coming to a halt in some wealthy countries, whilst starting off in poorer countries with emerging economies.

The results have always prompted a cynical response: if everyone is getting brighter, why are so many people still behaving stupidly? Where are all the geniuses? Where are all the new inventions and breakthroughs in understanding which would match the IQ results? Recursively, if we are as bright as the Flynn effect suggests, why can't we get to the bottom of the Flynn effect?

This special issue attempts to push the debate forward, whilst being aware that it cannot resolve it.

Robert Williams gives an overview of the putative main drivers of the effect. He notes that far from being uniform, “the gains have been large, small, variable and even negative”. Some researchers have found that the gains were on g loaded items, whilst more have found no g loadings, suggesting that the gains may be empty. The rate of gain varies by the dates chosen for study. Data from countries formerly behind the Iron Curtain are particularly informative, since there have been real changes in education and in society. A feature of the literature is the frequent contradictions: for every finding there seems to be an opposite finding. As to causes: education may be a driver of change in emerging nations, but probably no longer in wealthy nations; test sophistication is no longer much of an issue; guessing answers may boost gains somewhat; nutrition may boost gains, particularly at the lower levels; nutrition may be boosting both height and intelligence, but not in the same way at the same times; tests of measurement invariance strongly suggest that the meaning of IQ is not constant over time. Williams says: “It is likely that most of the Flynn Effect gains that have been reported are hollow”. However, he also says that the effect remains enigmatic because there are varying combinations of multiple drivers, and methodological problems are confounded with real world issues.

Richard Lynn looks back at the pre-history of the Flynn effect, finding early studies showing that it ante-dates the Second World War. These early reports showed that the Flynn effect was fully present in pre-school children, did not increase during the school-age years, and was greater for non-verbal abilities than for verbal abilities. He suggests that improvements in nutrition are the only likely long-term cause of the effect.

William Shiu, Alexander Beaujean, Olev Must, Jan te Nijenhuis and Aasa Must use item response theory to delve into the Estonian version of the Yerkes 1919 National Intelligence Test, given in Estonia in 1934 and again in 2006. They find that, using only the invariant (stable) items, there was a Flynn effect on all but one subtest. There was much variability in the strength of the effect, ranging from an effect size of 0.24 (3.60 IQ points) to 1.05 (15.75 IQ points). There was a decrease in variability across time for all subtests, although only two showed a large decrease. Overall, this suggests a real Flynn effect in this country.
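
The IQ-point figures quoted for the Estonian data follow from multiplying a standardised effect size by the conventional IQ standard deviation of 15. A minimal sketch of that arithmetic is given here; the function name and the constant are illustrative assumptions, since the editorial gives only the paired numbers:

```python
# Assumed scale: IQ standard deviation of 15 (the usual convention).
IQ_SD = 15.0


def effect_size_to_iq_points(d, sd=IQ_SD):
    """Convert a standardised mean difference (in SD units) to IQ points."""
    return d * sd


# The two effect sizes quoted in the editorial.
for d in (0.24, 1.05):
    print(f"d = {d:.2f} -> {effect_size_to_iq_points(d):.2f} IQ points")
```

Run as written, this reproduces the 3.60 and 15.75 IQ points cited for the smallest and largest subtest effects.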

Olev Must and Aasa Must continue the Estonian story by looking at guessing behaviour. They find that in some subtests of the Estonian National Intelligence Test over the same period, 1934 to 2006, adjustments for false-positive answers reduced the rise in test scores. Rapid guessing has risen over time and influenced test scores more strongly over the years. The Flynn effect is thus partly explained by changes in test-taking behaviour over time.

Jakob Pietschnig, Ulrich Tran and Martin Voracek use a different data set, the vocabulary test taken by German-speaking psychiatric patients in Vienna in the 17 years between 1978 and 1994, and find that both classical test theory and item response theory indicate a Flynn effect. They also find that the Flynn effect is due to decreasing IQ variability (seen in quite a few data sets) and that increased guessing behaviour may conceivably play an additional role in IQ gains.

Jan te Nijenhuis and Henk van der Flier conduct a psychometric meta-analysis based on a large total N = 16,663 and show that after corrections for several statistical artefacts there is an estimated true correlation of −.38 between g loadings of tests and secular score gains. This suggests that the Flynn effect is not on g. Notably, all the variance between the studies is explained by four statistical artefacts, namely sampling error, reliability of the g vector, reliability of the d vector, and restriction of range. Moderator variables were not found in these studies, but might conceivably be found in further studies.
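
The core quantity in this kind of analysis is a correlation between two vectors: the g loading of each subtest and the score gain (d) on that subtest. The sketch below shows only that raw correlation step. The subtest values are invented purely for illustration and are not taken from the paper, and none of the artefact corrections described in the editorial are applied:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values for five subtests (NOT data from the meta-analysis).
g_loadings = [0.80, 0.70, 0.60, 0.50, 0.40]
score_gains_d = [0.10, 0.30, 0.20, 0.50, 0.40]

r = pearson_r(g_loadings, score_gains_d)
print(f"correlation between g loadings and gains: {r:.2f}")
```

A negative value, as in this made-up example, means that the subtests which load most heavily on g show the smallest gains, which is the pattern the authors interpret as the effect not being on g.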

Gerhard Meisenberg and Michael Woodley look at TIMSS and PISA results to assess international scholastic achievements at age 15 and find that lower scoring countries are gaining on higher scoring countries, suggesting on-going Flynn effects in lower-scoring countries. They point out that the closing of this gap, whilst welcome, suggests that there are biological limits to human intelligence. These limits are being approached in (most of) the higher scoring countries, where achievements are stagnating, but not yet in (most of) the lower scoring countries where achievements are rising. Therefore the kinds of environmental improvement that have fuelled Flynn effects in the recent past are predicted to show diminishing returns in the high-scoring but not the low-scoring countries. On current trends, the PISA scores of high-scoring and low-scoring countries will converge in only 40 years, whilst on the TIMSS maths and science tests complete convergence would take 341 years. Convergence is not guaranteed.
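
Convergence estimates of this kind rest on extrapolating a current trend. The editorial does not describe the authors' method, so the following is only a sketch of the simplest version, a linear extrapolation in which the gap closes by a fixed amount each year. The function name and the input numbers are hypothetical:

```python
def years_to_convergence(current_gap, gap_closed_per_year):
    """Years until a gap reaches zero if it keeps closing at a constant rate.

    Returns None when the gap is not narrowing, since a linear trend
    then never converges.
    """
    if gap_closed_per_year <= 0:
        return None
    return current_gap / gap_closed_per_year


# Hypothetical inputs: an 80-point gap closing by 2 points a year.
print(years_to_convergence(80.0, 2.0))

# A gap that is widening, or static, never converges under this model.
print(years_to_convergence(80.0, 0.0))
```

The estimate is very sensitive to the assumed rate: halving the yearly closure doubles the time to convergence, which is one reason such projections should be read with caution.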

Edward Dutton and Richard Lynn have looked at Finnish recruits between 1988 and 2008 and found that intelligence test results for Shapes rose and then dropped very slightly, whilst both Words and Numbers showed early gains but subsequently have fallen more significantly. The end result is that from about 1997 there is a “negative” Flynn effect. It is hard to explain this drop, but dysgenic fertility is possibly part of the picture.

Heiner Rindermann and James Thompson (final editing by Doug Detterman) have looked at the NAEP data in the U.S. from 1970 to 2008 and find that the Flynn effect continues and that racial gaps have closed to some degree. However, the effects are smallest at precisely the ages which matter most, namely among 17-year-olds about to enter the workplace or further education. Again, results show less variance. Students are becoming more homogeneous. The benefits of gap-closing policies are mitigated by demographic changes.

Michael Woodley, Aurelio Figueredo, Sacha Brown, and Kari Ross do not lack ambition. They have proposed a larger theory within which the Flynn effect sits, namely that slower life histories lead to more specialised cognitive skills, and these in turn are more weakly integrated. They have established a measure of life history speed, and find correlations with some measures of intellect which are small but intriguing, and may lead to further investigation and replication.

Michael Woodley, Jan te Nijenhuis and Reagan Murphy have stepped outside the usual IQ findings and have looked at reaction time, one of the basal correlates of intelligence. If intelligence really has been rising for over a century, then one would expect reactions to stimuli to have speeded up. However, their meta-analysis of simple reaction times since 1884 finds that contemporary reaction times are slower. This is a puzzling result, and although there are always issues about early instrumentation, it is hard to see how those instruments would have been speedier to record responses. They suggest that a portion of the slowing down might be due to dysgenic effects.

In deference to ostensive definition and with gratitude for his own contributions to the field, I have invited Jim Flynn to have the final word at the close of the special issue, and give his reflections on the effect which now bears his name.

Copyright © 2013 Elsevier Inc. All rights reserved.


  1. endre bakken stovner, 30 July 2013 at 19:49

    thanks for the entertaining write-up. can't wait to read the woodley, figueredo et al piece - sounds like a completely new theory, something which seems rare in psychometrics (which probably just means much of the science is mature)

  2. I will try to keep posting on the special issue, at least giving abstracts and highlights

  3. It's famous - in spite of which it may well be true - that the Dutch have leapt in height in the last generation (or even two?). So it might be particularly interesting to know how their IQ has done. Why settle for one unexplained phenomenon when you can have two?

  4. I have no particular critical comment to make, and I generally agree. But the authors of the Gerhard Meisenberg & Michael Woodley paper said that there were some serious limitations, and that's why I tend to be very careful. Most important is the evidence of a lack of measurement invariance in the PISA (link). I expect the same thing with the TIMSS. If so, this means that comparison across nations becomes somewhat difficult.