Tuesday 4 March 2014

Digit Span: the modest little bombshell

Digit Span must be one of the simplest tests ever devised. The examiner says a short string of digits at the rate of one digit a second in a monotone voice, and then the examinee repeats them. The examiner then tries a string which is one digit longer, and continues in this fashion with longer and longer strings of digits until the examinee fails both trials at that particular length. That determines the number of digits forwards.

Then the examiner explains that he will say a string of digits and the examinee has to repeat them backwards, that is, in reverse order. For example, 3 – 7 is to be said back to the examiner as 7 – 3. This continues until the examinee fails two trials at a particular length, which determines the number of digits backwards.
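For readers who think in code, the procedure described above can be sketched roughly as follows. This is only an illustration, not the official administration rules: the starting length of 2, the ceiling of 9 digits, the choice to give both trials at every length, and the names `digit_span`, `make_examinee` and `respond` are all assumptions made for the example.

```python
import random


def digit_span(respond, backwards=False, start_len=2, max_len=9, rng=None):
    """Return the longest string length passed before both trials fail.

    `respond` is a hypothetical callable standing in for the examinee:
    it receives the digits read out and returns the digits said back.
    """
    rng = rng or random.Random()
    span = 0
    for length in range(start_len, max_len + 1):
        passed = False
        for _ in range(2):  # assumed: two trials at each length
            digits = [rng.randint(1, 9) for _ in range(length)]
            target = digits[::-1] if backwards else digits
            if respond(digits) == target:
                passed = True
        if not passed:  # failed both trials: stop testing
            break
        span = length
    return span


def make_examinee(capacity, backwards=False):
    """Toy examinee who is perfect up to `capacity` digits, then fails."""
    def respond(digits):
        if len(digits) > capacity:
            return []
        return digits[::-1] if backwards else list(digits)
    return respond


forwards = digit_span(make_examinee(7))
reverse = digit_span(make_examinee(6, backwards=True), backwards=True)
print(forwards, reverse)  # 7 6
```

The toy examinee is deliberately crude (real people fail gradually, not at a cliff edge), but it shows how little machinery the test needs.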

I hope you will agree that this is a simple test, easy to understand, and largely bereft of any intellectual content. All you need is: to know the names of single digits, and to understand the simple instructions and examples given so that you repeat the digits forwards, and in the later version of the test, backwards. In particular, if you can do digits forwards you reveal you know your digits and have some memory, and if you can do a short string backwards you reveal that you have some memory and you understand the idea of repeating digits backwards.

The test is not only bereft of intellectual content, but is also low on cultural content. Once you have learnt digit names you are ready to do the test. I assume that forwards and backwards are concepts understood by all cultures worthy of the name.

Initially, test constructors regarded the test as an optional extra, because test-retest reliabilities were low. Arthur Jensen pointed out that this was simply because not enough trials were used. Once extra trials are provided, Digit Span becomes a good measure of general intelligence, correlating with g at 0.71.  Of course, Wechsler being Wechsler, they have also included some new tasks in Digit Span, in which digits are read to the examinee and have to be remembered back in order of magnitude, but we can leave that out for the time being, since it does not affect the central comparison between digits forwards and backwards.

How does digit backwards have this profound effect? Short term memory is just an auditory store. Most of the intellectual demand comes from digits backwards. That simple little task of remembering the forward sequence, and then keeping it in mind while reading off the sequence in reverse order taxes the mind. Digit backwards spans are usually at least a digit shorter than digits forwards. If someone can remember 7 digits forward (the average adult score) but only 6 backwards (the average adult score), that is a 14% reduction in memory capacity. (At age 11 for white kids the reduction is 23% and for black kids 30%, as shown below.) Digits forwards are related to g, but digits backwards are even more loaded on g.
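The 14% figure is simply the drop from forwards to backwards expressed as a fraction of the forwards span. A minimal sketch of that arithmetic, with `span_reduction` as a made-up helper name:

```python
def span_reduction(forwards, backwards):
    """Proportional drop in span when going from forwards to backwards."""
    return (forwards - backwards) / forwards


# Adult averages quoted in the text: 7 forwards, 6 backwards.
print(f"{span_reduction(7, 6):.0%}")  # 14%
```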

How does this finding relate to the vexed question of group differences? Well, it is hard to give a plausible cultural explanation for the effect, unless you stretch the concept of culture to absurd lengths. Could there really be a culture in which there are numbers but no reversible operations? Even if there were a culture or putative sub-culture in which using numbers was discouraged, it should affect all digit tasks, not just digits backwards. (What name would one give to a culture in which number use is discouraged?)

If any group defined in genetic or cultural terms has a particular difficulty with digits backwards this is a strong indicator that they have difficulty with tasks as they get more intellectually demanding. The higher the g loading the more they should differ from brighter groups.

Hence the great interest in the most recent scores, to see if they conform to the usual pattern described by Jensen in The g Factor (p. 405, referring to work he did in 1975 with Figueroa, ref on p. 614). Over at Human Varieties, Dalliard has tried to replicate those results using data from CNLSY (these are the children of the female participants in NLSY79). Incidentally, this is a great follow-up survey: “My Mummy did your tests before I was born”. Gradually we are getting to understand the transmission of intelligence through the generations.

h-b-w ds results

The chart shows the increase in digit span with increasing age, and the nature of the gap between digits forwards and backwards in the different groups. This is clearer in the second table, which shows the gaps as Cohen’s d.

CNLSY digit span racial:ethnic gaps
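For anyone unfamiliar with Cohen’s d, it is the difference between two group means divided by a standard deviation. The sketch below uses the common pooled-standard-deviation version; whether Dalliard used exactly this variant is an assumption on my part, and the scores in the example are invented purely to show the calculation, not taken from CNLSY.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Invented spans for two tiny groups, for illustration only.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

A d of 0.5 means the group means sit half a standard deviation apart, whatever the raw units happen to be.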

Incidentally, the fact that Hispanics have a slightly lower digit forwards score than whites and blacks but reasonable digits backwards slightly reduces their gap between the two conditions.

Dalliard says: “That the black-white gap on forward digits is substantially smaller than on backwards digits is a robust finding confirmed in this new analysis. This poses a challenge to the argument that racial differences in exposure to the kinds of information that are needed in cognitive tests cause the black-white test score gap. The informational demands of the digit span tests are minimal, as only the knowledge of numbers from 1 to 9 is required. Forward digits is a simple memory test assessing the ability to store information and immediately recall it. The informational demands of backwards digits are the same as those of forward digits, but the requirement that the digits be repeated in the reverse order means that it is not simply a memory test but one that also requires mental transformation or manipulation of the information presented.”

It is good to have a replication of a well-established and informative finding. However, Dalliard has pushed the analysis further, with a factorial study which suggests that black kids have a slight short term memory advantage which is enough to overcome the g demands of digits forwards, but not enough to cope with the higher g demands of digits backwards. This is a new finding which could lead to further studies.

Read the whole thing here http://humanvarieties.org/2013/12/21/racial-differences-on-digit-span-tests/

Finally, the really engaging feature of digit span from a psychometric point of view is that it is a true scale with a true zero. If you cannot remember any digits, your score is zero and that corresponds to zero digits. If you can remember 4 or 5 or 6 or 7 digits those are real scores, and the intervals between them are identical. So, for purists, this is an interval scale with a true zero like the Kelvin scale, where 0 Kelvin is absolute zero. Nothing is colder than that. Age in years is also a true scale.

At this point, it would be normal to explain what psychologist S S Stevens called it in his 1946 proposed typology in Science. Why on earth should I do that? You already understand the notion of a true scale with a true zero, where the intervals are truly each as big as each other. What more do you need to know? If someone says that IQ isn’t a real measure because “a quotient is all relative” please tell them a thing or two about digit span.

Ratio. I didn’t want you to waste time looking it up.


  1. "What name would one give to a culture in which number use is discouraged?"

    The tea party?

    1. Ho ho. Very satirical.

    2. "Satirical" and nonsensical since it's tea partiers most likely to understand that overspending (and spending on that which shows little to no lasting return) are dangerous habits.

    3. Hey Pesta,
      Talking about bombshell, your study was also a mini-bombshell, right?
      Honestly, if you were planning to redo such analyses, that would be good news. Reaction time is a purer measure of intelligence than most psychometric tests. And I believe we need more studies like the one you did with your colleague.

      (p.s.: also, if my memory is correct, measurement errors and strategies constitute a big threat to RT-IQ correlation, and apparently, most studies can't remove these artifacts fully. What I mean, by this, is that your number of 50% mediation of BW IQ gap by RT is probably lower-bound estimate.)

    4. Anon. I was just kidding / it seemed like a softball lobbed over the middle of the plate...

      Meng. Thanks for your comments. I really appreciate that someone read my stuff. I've since been focused on State IQs, because the data are easy to collect, yet the results seem compelling.

      My newest shows rather strong correlations re race and other things:


  2. Dalliard did a good job here. But he has forgotten to mention the most salient hereditarian argument. Re-read Jensen (1998 p370) here.

    Several studies showed, in every age group, that the W-B difference on the FDS test is smaller (usually by about 0.5σ) than on the BDS test. Also, when black and white groups were matched on mental age (thus the blacks were chronologically older than the whites), the black and white means did not differ, either on FDS or on BDS. These results are not easily explained in terms of a qualitative cultural difference or some motivational factor. Rather, the results are most parsimoniously explained in terms of a difference in the black and white rates of development of whatever abilities enter into FDS and BDS. BDS obviously makes a greater demand on mental manipulation of the input in order to produce the correct output than does FDS. Hence BDS can be characterized as a more complex cognitive task than FDS. Further, a factor analysis of FDS and BDS scores obtained at five grade levels clearly showed (in separate analyses for blacks and whites) that two distinct factors are reflected in these tests, with the most salient loadings of FDS and of BDS found on different factors.

    The interpretation of these digit span factors is elucidated by observing the correlations of FDS and of BDS with the WISC-R Full Scale IQ and the Raven IQ. For both tests, the correlation between BDS and IQ is almost twice as large as the correlation between FDS and IQ. A factor analysis of FDS and BDS among a large battery of other tests showed that while both FDS and BDS are loaded on a memory factor and on g, BDS has the much larger g loading.

    Although these findings on the interaction of the W-B difference with FDS and BDS are interesting in their own right, their broader theoretical significance did not strike me fully until a short time later, as I was rereading (after about twenty years) Spearman’s major work, The Abilities of Man. I came upon a brief passage that had not previously caught my attention enough to have been remembered - undoubtedly because I had not before had in mind the question to which the passage was finally to prove so germane. Following a reference to an early study by psychologists at the University of Indiana that compared 2,000 white and 120 black schoolchildren on a battery of ten diverse tests, Spearman (p. 379) noted that the blacks scored, on average, about two years behind the whites in mental age, but the amount of the W-B difference varied across the ten tests, and was most marked in those tests that were known to be most saturated with g.

    When you look at the tables above, you see indeed that the BDS score of blacks aged 9 or 11 is nearly equal to that of whites aged 7 or 9. But the FDS scores still show some difference, interestingly to the advantage of blacks. That in itself is interesting if you remember Jensen's Level I-Level II theory, in which Level I refers to rote memory and Level II to mental transformation (or g).

    The only question that remains is whether or not the items on FDS and BDS are psychometrically biased (i.e., if you find systematic one-sided DIFs, which is the only condition where DIF can be equated with bias, and which is something most people don't necessarily know about). You can test that using logistic regression and IRT, probably two of the best methods available. I cannot perform IRT for now, but LR is well within my ability. I will do that in the near future.

  3. Well, it is hard to give a plausible cultural explanation for the effect, unless you stretch the concept of culture to absurd lengths. Could there really be a culture in which there are numbers but no reversible operations?

    Do all psychometrists have Asperger's?

  4. Please, can someone tell me what is the average score for someone aged 14?

    1. The short answer is that an average 14 year old would be expected to remember 5 digits forwards and 4 backwards. The adult average is more like 7 forwards and 6 backwards.

    2. My best was 21 backwards at that age in an official testing environment.

  5. any clear/specific interpretation in scoring digit span test? like score 2-4 digit indicates poor cognitive or something? and where can I get that?

  6. Just because most people fall within 7+/-2 doesn't mean 7 is the average. See for example the table at the bottom of this page: http://www.v-weiss.de/publ9-e.html
    I think this is similar to the WAIS norm, but I don't have a link for that.

  7. 1) its working mem, not auditory mem, working mem involves a diverse network of brain regions including frontal/pre-frontal regions, and is highly practiced in educational contexts during the early part of life (arithmetic etc.)
    2) just because something seems simple to you, doesn't mean that it is (this is actually the basis for many neuropsych tests)
    3) the n's differed across each sample, so these kids were sampled from different populations, it is not clear whether early samples were retained or was it new kids each time (given the size of the white samples, if its the former then practice effects would have skewed the results over time)
    4) there is a massive difference between the n's across racial groups suggesting that for the minorities, recruitment was a problem. This could be for many reasons, some of which may be related to poor performance, such as parental resources etc.
    5) The means of the minorities, across all ages fall within 1 sd of the total sample, so the sig findings are a function of sample size rather than reflecting a meaningful difference.

  8. 4) there is a massive difference between the n's across racial groups suggesting that for the minorities, recruitment was a problem. This could be for many reasons, some of which may be related to poor performance, such as parental resources etc.
    5) The means of the minorities, across all ages fall within 1 sd of the total sample, so the sig findings are a function of sample size rather than reflecting a meaningful difference.

    I rarely use the term "gorgeous", Dr. Thompson, but some of the comments beg for it.