Tuesday 18 November 2014

Have backward digits sunk Flynn?


Repeating digits forwards is easy, and weakly predictive (.46) of general intelligence. Repeating digits backwards is harder, and more strongly predictive (.58) of general intelligence. Reliabilities are good if you give at least two trials for each digit string length. The task produces scores on a real, ratio scale, with a true zero, and thus is unusual in psychometrics in providing absolute results.


So, if there really is a Flynn effect, have digit spans increased over the last century, particularly digits backwards, the better test of intelligence? “No” says Gilles Gignac from the bright blue skies of Perth, Australia. Not a glimmer of intellectual improvement since 1923. All this is as I had grimly expected. We shall all come to no good, just you mark my words.

Of course, perhaps digits backwards, demanding as they are, do not catch the full subtlety of Similarities or Vocabulary, or even Raven's Matrices. Let us dig around a little in this pre-publication paper, accepted by Intelligence.


In a careful approach, Gignac has gone back to the raw scores for the longest digit spans, forwards and backwards, in the Digit Span subtest of the Wechsler intelligence tests.
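To make the idea of a "longest span" raw score concrete, here is a minimal Python sketch of how it might be computed from trial records. It assumes two trials per string length and an assumed discontinue rule: testing stops at the first length on which both trials are failed. The names `is_correct`, `longest_span` and `example_trials` are hypothetical, the digit strings are invented for illustration, and this is not the official Wechsler scoring procedure.

```python
from typing import Dict, List, Sequence, Tuple

# One trial is a (stimulus, response) pair of digit sequences.
Trial = Tuple[Sequence[int], Sequence[int]]


def is_correct(stimulus: Sequence[int], response: Sequence[int], backward: bool) -> bool:
    """A trial passes if the response reproduces the stimulus exactly,
    in the same order (forwards) or in reverse order (backwards)."""
    target = list(reversed(stimulus)) if backward else list(stimulus)
    return list(response) == target


def longest_span(trials: Dict[int, List[Trial]], backward: bool = False) -> int:
    """Return the longest string length passed on at least one trial.

    Lengths are scored in increasing order; scoring stops at the first
    length where every trial failed (an assumed discontinue rule).
    A score of 0 means no length was passed: a true zero.
    """
    longest = 0
    for length in sorted(trials):
        passed = [is_correct(s, r, backward) for s, r in trials[length]]
        if not any(passed):
            break  # both trials at this length failed: stop testing
        longest = length
    return longest


# Hypothetical digits-backwards session, two trials per length.
example_trials = {
    2: [([3, 8], [8, 3]), ([6, 1], [1, 6])],                  # both correct
    3: [([5, 2, 9], [9, 2, 5]), ([4, 7, 1], [1, 4, 7])],      # first correct
    4: [([8, 3, 6, 2], [2, 6, 8, 3]), ([7, 1, 9, 4], [4, 9, 7, 1])],  # both wrong
}

print(longest_span(example_trials, backward=True))  # 3
```

Passing `backward=False` scores the same kind of record as digits forwards. Because the score is simply a count of digits, it stays on the absolute scale the post emphasises, rather than being re-expressed against a norm group.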




Gignac observes that if the Flynn effect is not acting on g and is not acting on short-term memory capacity, then it is hard to see how it can really be acting on a broad range of fluid intelligence skills over time.

Turning the screw, Gignac points out that at the beginning of the twentieth century few people had to remember telephone numbers, but now we are inundated with long mobile phone numbers, login codes and the like, so there is a strong cultural reason for digit spans to have increased. Yet they have not.

He carefully considers the various explanations and methodological details that might temper his conclusions, but in the end he clearly feels that it is very hard to explain how the Flynn effect, derived from standardised scores, can be real when it does not show up in actual raw scores of short-term memory.
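To see how a raw-score change turns into an IQ-style gain, one common convention is to divide the difference between cohort means by the earlier cohort's standard deviation and multiply by 15. The Python sketch below assumes that convention (some studies use a pooled standard deviation instead), and the cohort scores are hypothetical placeholders, not Gignac's data. It illustrates the point at issue: if mean raw spans have not moved, the implied gain on that measure is zero.

```python
from statistics import mean, stdev


def gain_in_iq_points(earlier, later):
    """Express the raw-score change between two cohorts in IQ-point units.

    Assumed convention: standardized mean difference using the earlier
    cohort's sample standard deviation, scaled by 15 (the usual IQ SD).
    """
    d = (mean(later) - mean(earlier)) / stdev(earlier)
    return 15 * d


# Hypothetical longest-backward-span scores (invented, not real data).
earlier_cohort = [5, 6, 4, 5, 6, 4]
later_flat = [6, 4, 5, 5, 4, 6]      # same mean as the earlier cohort
later_higher = [6, 7, 5, 6, 7, 5]    # every score one digit longer

print(gain_in_iq_points(earlier_cohort, later_flat))    # 0.0
print(gain_in_iq_points(earlier_cohort, later_higher))  # about 16.8
```

With such a small spread of spans, even a one-digit rise in every score would register as a large gain, which is part of why a flat raw mean is hard to square with a substantial Flynn effect.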


  1. Very interesting, especially as (at least my impression is that) the Flynn effect has proven resistant to challenge. I get the impression you like simplicity in testing, but is this test really sufficiently demanding to correlate highly with g, e.g. are the correlations you report confirmed by repeated testing? And can one have more confidence in this test than in others where a Flynn effect is evident?

  2. The Flynn/Lynn effect has remained a puzzle, because the apparent rise in intelligence is not matched by other everyday observations. The interest of this paper is that it deals with an absolute measure, the number of digits recalled, rather than the relative and standardized measures in other subtests. Reaction time is another absolute measure. The author does go through the arguments about whether this test is powerful enough to detect improvements in mental ability, and I agree with him that it ought to show something. This negative result is instructive, though not conclusive. The correlations I report arise from the Wechsler data, and look pretty solid.

  3. Digit span has simply always been pretty meaningless as a correlate of g. It's always curious to see a psychometric instrument called "reliable" when, after training, essentially random people can perform dozens of standard deviations above untrained people. I don't know why it's so hard to get people to understand that, when it should be better known. Furthermore, this has been known since papers published in the 1950s, if not earlier.

    The effect is way too large, so any researcher has to be concerned about the cultural and environmental factors that might passively train or prime people for the task whenever outliers or unusual results are observed, and has to worry about incentives and effort for subjects in studies as well. And with meta-analyses of historical studies, we don't have many proper controls on the experiments themselves anyway.

    So, learn the difference between bits of memory and a multiplexer. Or rather, in some sense, remember it, because people in the field of psychology knew this even in the past. It's friggin Miller's law. Most PhDs in neuroscience haven't figured out this issue either; they're helping drag the boundaries of scientific understanding backwards, but still.

    1. Same anonymous: having loaded Gignac's paper, I should say the presentation there is not as sensationalist or one-sided, and it provides plenty of references to prior work. I realize I may have been harshly criticizing the blog post, but that criticism doesn't carry over to the preprint.

      Still, his paper is at best spelling things out in baby steps of least publishable units or, less charitably, is unaware of follow-up conclusions. There are no relevant mentions of information theory, Shannon entropy, or the physical underpinnings that lead into cross-disciplinary work, or anything like that. Roughly speaking, this evidence only reinforces the view that short-term memory has underlying physiological limits that have been practically the same for all humans since the African savanna, and that only unaccounted-for cultural training and cognitive shortcuts give individuals with different g an appearance of variation in testing. That's not even getting to the interesting hypotheses that should be explored and given mention, not limited to digit span but with plausible associations and consequences, such as the auditory or visual encoding of information as consciously experienced by subjects.

  4. This study is based on test scores, and we know that these scores might not capture the relevant variance. Jensen explains why in 'The g factor'. Talking properly about g requires going beyond test scores; the g factor is based on the inter-correlations among test scores. Gilles did an interesting job, but he neglected a crucial point: the changes in the g loadings of FDS and BDS (forward and backward digit span) across generations. Without this information, it is really difficult to evaluate the implications of the main conclusion of his report. I compared the g loading of the Wechsler Digit Span subtest in Spain between 1970 and 1992, and this is the result: 1970 (.56), 1992 (.31). There is therefore a remarkable reduction, meaning that solving the test was less cognitively complex in 1992 than in 1970. This is quite consistent with an improvement in intelligence. Best, Roberto

    1. Michael A. Woodley of Menie, 26 November 2014 at 17:19

      Decreasing factor loadings are often associated with Flynn effects on abilities; however, the two are not perfectly negatively correlated (not even close; see Woodley & Madison, 2013). If a subtest is losing its g loading to a less heritable and correspondingly more environmentally plastic source of ability variance, then we would expect the decrease in loading to accompany secular gains. If, however, an ability (like BDS) is losing its g loading to another highly heritable but nominally independent cognitive system, such as attention and executive functioning, then we would get an apparent decrease in the cognitive complexity of the ability (as evidenced by the decreasing g loading), but no apparent secular gain on that ability. Decreasing factor loadings are therefore likely necessary, but not sufficient, for certain manifestations of the Flynn effect (i.e. those that result from actual environmental improvements of one sort or another, rather than from increasing test-wiseness, for example). There is an additional interaction with heritability, which determines whether the decrease in the factor loading actually leverages secular gains. This could be tested quite easily with the right data.


      Woodley, M.A., & Madison, G. (2013). Establishing an association between the Flynn effect and ability differentiation. Personality and Individual Differences, 55, 387-390.

    2. Thanks. I still have a more generalized concern about saying that the Wechsler is an intelligence test (which works as a strong predictor of outcomes) but that it has no more than a .27 g loading. I cannot reconcile these findings.

  5. @James - I think that when one cognitive test's raw score stays the same in a context where almost all raw scores are rising, this is likely to be significant. Indeed, if it is assumed that the Flynn effect is essentially inflationary (and not caused by rising g) - http://iqpersonalitygenius.blogspot.co.uk/2014/06/could-flynn-effect-be-non-valid-yes-if.html - then a score failing to rise may be evidence of a decline in the real underlying cognitive ability.

  6. I attach importance to scores being on absolute scales wherever possible, and particularly when dealing with cross generational comparisons. I do not discount the matter of whether a change is "on g" or not, so I am sketching out a fuller reply to all the points raised so far.

  7. Actually, we have passed peak number/password memorization. My phone remembers my numbers and my computer remembers my passwords.