Tuesday 24 February 2015

The Oscars for Intelligence

Last night the film business gave out prizes for actors, using a fallible voting system. Perhaps gross takings would be a more valid measure, though more vulgar. The awards are always open to question, not only because of undue influence and general silliness, but because actors can only perform, and we have no objective criteria for favouring one over another.

At least 25 years ago I went to talk on a BBC radio programme, You and Yours, and walked in to find that another of the interviewees was a famous actress, Jill Bennett, who had been married for 9 turbulent years to the playwright John Osborne, of Look Back in Anger fame. He did great things for the theatre and less good things for his many women. After he had left her, Jill sent him a lovely shirt for his birthday, with the inscription: “I want the contents back”. He did not return, and she committed suicide in 1990. Anyway, there I was, facing this paragon of London theatre, the new thespian royalty, and I blurted out: “And what is your interview about?” She replied: “Putting bums on seats”.

Whilst we should probably measure actors by the number of bums they have put on seats, what harsh measure should we apply to intellectuals? For historical comparisons, in Human Accomplishment: The Pursuit of Excellence in the Arts and Sciences 800 B.C. to 1950, Charles Murray uses renown: the extent to which thinkers are mentioned in encyclopaedias and scholarly references, and this works well. Some stand out: Galileo in astronomy; Darwin in biology; Newton and Einstein in physics; Pasteur in medicine; Euler in maths. Oscars for all of them.

To a certain extent we can get a contemporary estimate of intellectual power by using a variant of chess rankings. The beauty of chess for our purposes is that it is hard to play well, and even harder against a good opponent, and most of the time one person wins and another loses, with a few draws. There you have it: competitions generate a list of wins, and grandmasters play against grandmaster opponents who have themselves won most of their games. No quibbles: the champion meets the challenger, and the best player wins.

Instead of classical intelligence tests, can we conduct intelligence championships? For example, can we find the best chess players in the rankings, and note what characteristics they have in terms of other intellectual abilities? Chess rankings can be used to good effect to get real estimates of intelligence. For example, imagine a country where everyone is very strongly motivated to play chess, because it brings social and material advantages. The Government encourages chess playing across the nation, and encourages local, regional and national championships. Those chess players who do well get extra pay, housing, and a degree of freedom not allowed to other citizens. In such a setting, as in Soviet Russia from 1919 to almost the end of the century, who won the competitions?

You will see that La Griffe used chess as his first competition, and the Putnam maths competition as the second, and the winners are… Ashkenazis.


If life really is an intelligence test, then we should be able to get ability estimates from a broad range of behaviours, not just chess and maths, even if they are brief and interrupted segments of behaviour. A universal intelligence test should be capable of being applied anytime and anywhere:

Hernández-Orallo and Dowe (2010). Measuring universal intelligence: Towards an anytime intelligence test. Artificial Intelligence, 174, 1508–1539.

• The test should be able to measure the intelligence of any biological or artificial system that exists now or in the future.

• It should be able to evaluate both inept and brilliant systems, as well as very slow to very fast systems.

• The test may be interrupted at any time, producing an approximation to the intelligence score, in such a way that the more time is left for the test, the better the assessment will be.

• It utilises a measurement of machine intelligence based on Kolmogorov complexity (a measure of the computational resources needed to specify an object) and universal distributions, developed in the late 1990s as C-tests and compression-enhanced Turing tests.
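Kolmogorov complexity itself is uncomputable, but the length of a compressed string gives a practical upper bound, which is the intuition behind the C-tests mentioned above. A minimal sketch of that proxy (my own illustration using zlib, not the authors' actual procedure):

```python
import zlib

def compressed_length(s: str) -> int:
    """Bytes needed to store the zlib-compressed string: a rough
    upper bound on its Kolmogorov complexity."""
    return len(zlib.compress(s.encode("utf-8")))

# A highly regular string compresses far better than a patternless one,
# so it counts as 'simpler' under this proxy.
regular = "ab" * 50                       # 100 characters of pure repetition
irregular = "qzj3k9xv1mw8rt5ybl0ncd72hg"  # 26 characters with little structure
```

Under this measure `regular` comes out much simpler than `irregular`, despite being four times as long, which is exactly the property a complexity-based test item needs.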

But what if you don’t play chess, avoid maths competitions, and just like playing around with games on your computer? Can we get any ability estimates out of such a person, even if they won’t come in to be tested? So, has anyone tried to do this?

Han van der Maas and colleagues have made an excellent first step, developing a new computer adaptive intelligence test. They start with an item bank of over 500 maths problems, and then use an elegant technique derived from the Elo chess rating system. In tennis and chess tournaments players are matched with opponents of the same rating/ability; in adaptive testing, ratings are estimated ‘on the fly’ following the Elo system: “if I win my rating increases, the rating of my opponent decreases. If I win against a very good player my rating increases more”.

What these researchers have done is to make the items compete with the persons. If you pass the item, the item loses and you win. Your score goes up, the item score goes down. Eventually each person is sorted against each item, and that can be done again and again. Beautiful.
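The item-versus-person competition can be sketched in a few lines. This is an illustrative Elo-style update (the K-factor of 32 and the 400-point logistic scale are standard chess conventions, not necessarily the parameters van der Maas and colleagues use):

```python
def expected_score(person_rating: float, item_rating: float) -> float:
    """Probability the person answers the item correctly (logistic Elo curve)."""
    return 1.0 / (1.0 + 10 ** ((item_rating - person_rating) / 400))

def update(person_rating: float, item_rating: float,
           correct: bool, k: float = 32.0) -> tuple:
    """Treat the item as the opponent: a correct answer is a 'win' for
    the person, so their rating rises and the item's rating falls."""
    expected = expected_score(person_rating, item_rating)
    outcome = 1.0 if correct else 0.0
    delta = k * (outcome - expected)
    return person_rating + delta, item_rating - delta

# Beating a much harder item (rated 200 points above you) earns a
# bigger rating gain than beating an equally rated one.
p_hard, _ = update(1500, 1700, correct=True)
p_even, _ = update(1500, 1500, correct=True)
```

The key property, as described above, is that the update is zero-sum: whatever the person gains, the item loses, so items and people end up sorted on a single common scale.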

Testing time is cut in half, repeated testing and practising on the test are allowed and encouraged, and the test can be used for a wide range of abilities. In fact, the old Binet test took exactly this approach, with fewer items, and a bright tester instead of a computer. This approach, disguised as a game, tests not only maths but the intelligence-related subtests of: proportional reasoning, deductive reasoning, number reasoning, working memory and perceptual reasoning.

To give you an idea of the reach of this technique: there are 120,000 active users in 1,400 schools, responding to 45,000 items at the rate of 1,200,000 items per day (yes, 1.2 million), accumulating to 400 million item responses over 5 years. As you will no doubt appreciate, this raises intelligence testing to a new level.

Have a look at the PowerPoint lecture, as presented to us at the ISIR conference in Graz by Han van der Maas,


and then at the much more detailed paper, which lets you see much further into the system.


If you are considering doing some research on large samples you might like to contact the team about doing some collaborative work with them. We haven’t yet sorted out Kolmogorov complexity and universal distributions, but if you know someone who is interested in this, get in touch.


  1. Dear Dr. Thompson,

    I'm looking into intelligence tests lately, and I'm wondering how one might construct a test which is largely unaffected by test-retest gains - in other words, a test that one can't practice for. I know from previous research that a significant portion of the Flynn Effect comes from simple test-taking savvy, and I'm wondering if you know what kinds of subtests or items show little in the way of test-retest gain (unlike Raven matrices), but at the same time have high g-loadings (unlike simple reaction time).

    1. Dear Mark, Very good question. Doing better with practice, with familiar rather than unfamiliar materials, is the way we learn and build our skills. So far as I know, you cannot avoid test-retest effects, you can only measure them and allow for them. If you assume that very few things you experience leave no trace in memory (see next post when I finish it) then you are probably on the right track. Wechsler tests used to give a six month break to avoid re-test gains. Now they concede the break should be one to two years (the longer times for performance type tests, if I remember correctly). So, I don't think you can avoid some gains on most material. Have a look at my post on g loadings for the Wechsler, and then you will have to dig up some retest data to check my impressions. There might be a golden sweet spot of high g, low retest gains for you to utilise, but I cannot call one to mind at the moment. Time for others to chip in? Will tweet the question.

  2. I haven't seen a widespread mention of this paper regarding verbal abilities.

    Schoolbook Simplification and Its Relation to the Decline in SAT-Verbal Scores, Donald P. Hayes, 1996

    How much difference would it make on IQ tests?

  3. http://www.indiana.edu/~educy520/sec6342/week_07/hayes96.pdf Fascinating. Texts have been getting simpler. See Woodley on that topic. It could make a difference in intelligence measures aimed at showing historical changes, but would not make any visible difference to tests normed on representative people each decade, which is the conventional approach. In sum, if our language has become impoverished, IQ tests will not be the best way of showing that: historical comparisons of texts and examination papers would be better. A complicating factor is that literacy and higher education used to be available to a few, probably brighter individuals, and are now open to almost all above-average students, so a pure comparison is difficult.
