Some stories never die. They serve a purpose: to distract, explain away, assuage a fear or, in this particular case, to make us feel better about ourselves. It is a variant of the seductive story that the examiners did not mark your papers correctly, and that other examiners would have rated you more highly. This is always true, to some extent, because we can all shop around for an assessment which gives us more flattering results, like choosing the best photo and discarding the unfavourable depictions.
Last March I posted an item about “tests of rationality” being championed in a science magazine, which tried to generate interest by talking about “popular stupidity”. http://drjamesthompson.blogspot.co.uk/2013/03/popular-stupidy.html
Now in “The Psychologist”, a magazine published by The British Psychological Society, Keith Stanovich and Richard West have written an article “What intelligence tests miss” suggesting that intelligence tests neglect to measure “rationality”. They are trying to create a test of rationality using Kahneman and Tversky’s problems, together with others collected by the late lamented Robyn Dawes and subsequently brilliantly dissected by Gerd Gigerenzer. This latest escapade strikes me as the recycling of Gardner’s Multiple Intelligences, in the form of: Alternative Intelligences (Seriously and Rationally).
The hidden implication is that if you are smarting at a disappointing result on an intelligence test you might be better off taking a rationality test, which could give you a more accurate, or at the very least broader, assessment of your wide-ranging mental skills, not to say your fundamental wisdom.
IQ has gained a bad reputation. In marketing terms it is a toxic brand: it immediately turns off half the population, who are brutally told that they are below average. That is a bad policy if you are trying to win friends and influence people. There are several attacks on intelligence testing, but the frontal attack is that the tests are no good and best ignored, while the flanking attack is that the tests are too narrow, and leave out too much of the full panoply of human abilities.
The latter attack is always true, to some extent, because a one hour test cannot be expected to generate the complete picture which could be obtained over a week of testing on the full range of mental tasks. However, the surprising finding is that, hour for hour, intelligence testing is extraordinarily effective at predicting human futures, more so than any other assessment available so far. This is not entirely surprising when one realises that psychologists tried out at least 23 different mental tasks in the 1920s (including many we would find quaint today) and came to the conclusion that each additional test produced rapidly diminishing returns, such that 10 sub-tests were a reasonable cut-off point for an accurate measure of ability, and a key 4 sub-tests sufficed for a reasonable estimate.
So, when a purveyor of an alternative intelligence test makes claims for their new assessment, they have something of a mountain to climb. After a century of development, intelligence testers have an armoury of approaches, methods and materials they can bring to bear on the evaluation of abilities. New tests have to show that they can offer something over and above TAU (Testing As Usual). Years ago, this looked like being easy. There is still so much unexplained variance in ability that in the 60s there was great confidence that personality testing would add considerable explanatory power. Not so. Then tests of creativity were touted as the obvious route to a better understanding of ability. Not so. Then multiple intelligences, which psychology textbooks enthusiastically continue touting despite the paucity of supportive evidence. Not so. Then learning styles. Not so. More recently, emotional intelligence produced partial results, but far less than anticipated. Same story for Sternberg’s practical intelligence. The list will continue, like types of diets. The Hydra of alternative, more sympathetic, more attuned to your special abilities, sparkling new tests keeps raising its many heads.
What all these innovators have to face is that about 50% of the variance in performance across mental tasks can be accounted for by a common latent factor. This shows up again and again. For once psychology has found something which replicates!
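To give a rough feel for what that figure means, here is a minimal simulation of my own (not anyone's published analysis): ten sub-tests are generated so that each loads about 0.7 on a single latent factor, and the first principal component of their correlation matrix then recovers roughly half of the total variance.

import numpy as np

# Sketch only: simulate 10 sub-test scores that all load ~0.7 on one
# common latent factor, then see how much of the total variance the
# first principal component of the correlation matrix recovers.
rng = np.random.default_rng(0)
n_people, n_tests, loading = 5000, 10, 0.7

g = rng.standard_normal(n_people)                      # common latent factor
noise = rng.standard_normal((n_people, n_tests))
scores = loading * g[:, None] + np.sqrt(1 - loading**2) * noise

eigvals = np.linalg.eigvalsh(np.corrcoef(scores, rowvar=False))
print(f"share of variance on the first component: {eigvals[-1] / n_tests:.2f}")
# With loadings of about 0.7 this comes out at roughly 0.5, in line with
# the ~50% figure quoted above.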
The other hurdle is that nowadays there are very demanding legal requirements placed upon any test of intelligence. You have to have a properly representative sample of the nation, or nations, in which you wish to give the test. Nationally drawn samples of 2,000 to 2,500 are required. Not only that, but you generally have to double-sample minorities. You also have to show that the items are not biased against any group. This is difficult, because any large difference between the sexes or races is considered prima facie evidence of bias. Indeed, where pronounced, very specific differences between the mental abilities of the sexes or of racial groups have turned up, such findings have been discarded for the last 50 years, at least as far as intelligence testing is concerned.
The conceit of the new proposal is that rationality is a different mental attribute from problem solving in the broad sense. The argument is that IQ and performance on rationality tasks are poorly correlated (.20 to .35) in university students. To my eye, given the restriction of range (even at American universities, which take in a broad range of intellects in the first year), this is not a bad finding. I say this because the authors do not yet have a rationality test. They seem to be correlating scores on a many-item IQ test with the scores on a few pass-fail rationality problems. This lumpiness in the rationality measure needs to be sorted out before we can say that the two concepts are independent.
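Both points can be made concrete with a toy simulation (mine, not their data): start with two abilities correlated at .50 in the general population, keep only the top half on the first ability as a crude stand-in for university selection, and score the second ability as a single pass/fail item. The observed correlation then drops into roughly the range they report.

import numpy as np

# Illustrative simulation of restriction of range plus pass/fail scoring.
rng = np.random.default_rng(1)
n = 100_000
iq = rng.standard_normal(n)
rationality = 0.5 * iq + np.sqrt(1 - 0.5**2) * rng.standard_normal(n)

selected = iq > 0                                        # restriction of range
passed = (rationality[selected] > 0.5).astype(float)     # lumpy pass/fail item

print(f"population correlation:            {np.corrcoef(iq, rationality)[0, 1]:.2f}")
print(f"restricted, pass/fail correlation: {np.corrcoef(iq[selected], passed)[0, 1]:.2f}")
# The second figure lands in the .20-.35 region even though the underlying
# relationship is .50.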
In fact, when you read their 2009 paper it turns out that they did not give their subjects intelligence tests. They simply recorded what the students told them were their Scholastic Assessment Test (SAT) totals. I don’t wish to be too hard, since of course scholastic ability tests are largely determined by intelligence, but since the authors go on to talk about “what intelligence tests miss” I think they ought to say “what self-reported scholastic achievement test scores miss”. In fact, even that is wrong, because the word “miss” implies a fault in the original aim. So, what they should have called their later book is “some tasks don’t correlate very strongly with what university students self-report about their scholastic achievement test scores”. As you will note, I am in favour of catchy titles.
That aside, the authors note that if the “rationality” task allows you to guide your choice by doing a calculation (deciding which of two trays of marbles has the higher probability of producing a black marble, which gets a reward) then the correct choice is made by brighter students (SAT scores of 1174 versus 1137). This test provides only a pass/fail result, like so many of these “rationality” puzzles, so it does not easily fit into psychometric analysis.
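For the flavour of the task, here is a small sketch; the marble counts are invented for illustration and are not the ones used in the study. The “rational” answer is simply the tray offering the higher probability of a rewarded black marble, even if the other tray holds more black marbles in absolute terms.

from fractions import Fraction

# Illustrative marble counts only (not Stanovich and West's materials).
trays = {
    "small tray": {"black": 1, "total": 10},    # 1 in 10  = 10%
    "large tray": {"black": 8, "total": 100},   # 8 in 100 =  8%
}

for name, t in trays.items():
    p = Fraction(t["black"], t["total"])
    print(f"{name}: p(black) = {p} = {float(p):.2f}")

best = max(trays, key=lambda k: Fraction(trays[k]["black"], trays[k]["total"]))
print("higher probability of a reward:", best)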
By now, dear readers, you will have worked out the main difference between intelligence test items and rationality puzzles. The former are worked upon again and again so that they are as straightforward and unambiguous as possible. If a putative intelligence item is misleading in any way it gets dropped. Misleading items introduce error variance and obscure the underlying results. Also, if particular groups are more likely to be misled, then their lawyers can argue that the item is unfair to them. All those contested items do not make it to the final published test.
Rationality puzzles, on the other hand, can be as tricky as possible. They are not “upon oath”. If a particular symbol or word misleads, so much the better. If the construction draws the reader down the wrong path, or sets up an incorrect focus of attention, that is all part of the fun. Gigerenzer did some of the best work on this. He looked at the base rate problem beloved of previous investigators, and at all the difficulties caused by percentages with decimal points and all the rest of it, and then proposed a solution (this is unusual for psychologists). He tested his proposed solution (which was to present the problem in terms of natural frequencies, usually on a base of 1000 persons) and found that it got rid of virtually all of the “irrationality” problem. Much of the “irrationality” effect is due to the form of the problem not being unpacked properly. This is not a trivial matter, but it is not an insuperable one. For example, consider the question which Stanovich and West give as an example of irrationality.
A bat and a ball cost $1.10 in total. The bat costs $1 more than the ball. How much does the ball cost?
Most people say 10 cents. This makes sense, because this is the usual way you calculate: if you spend $1.10 on a bat and a ball, and the bat costs $1.00, then the ball costs $0.10. It is unusual and somewhat bizarre to put in the concept of “a certain amount more than another amount”. The usual answer of $0.10 would be right in most circumstances. This is a special circumstance, and a very unusual one, in that the concept of “$1 more than” is being used in what appears to be a simple calculation. Respondents use the usual format, without noticing the subtle format change. This change means that you have to work out a sum for the bat and the ball such that when you take the cost of the ball from the cost of the bat you are left with exactly $1. It cannot be 10 cents, because if you take 10c from $1.00 you are left with 90c. So, in this case the ball must cost 5c, so that when you take 5c from $1.05 you are left with exactly $1. It may strike you as a bit odd, and somewhat tricky and pedantic, and you would not be wrong in making this judgment.
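The arithmetic can be checked directly, in a small sketch of my own rather than anything from the article: with the ball priced at b, the bat is b + $1.00, and the pair must total $1.10, so b + (b + 1.00) = 1.10 and b = $0.05.

# Brute-force check of the bat-and-ball arithmetic: try every whole-cent
# price for the ball and keep the one satisfying both conditions.
for cents in range(0, 111):
    ball = cents / 100
    bat = ball + 1.00                        # the bat costs $1 more than the ball
    if abs((ball + bat) - 1.10) < 1e-9:      # together they must cost $1.10
        print(f"ball = ${ball:.2f}, bat = ${bat:.2f}")   # ball = $0.05, bat = $1.05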
In this particular case the question might be recast as follows.
A bat and a ball cost $1.10 in total. The bat costs $1 more than the ball, meaning that when you take the cost of the ball away from the cost of the bat you are left with exactly $1. How much does the ball cost?
Even the extra explanation might not do the trick, because the usual subtraction sum is uppermost in people’s minds, but they are not being irrational when they make the mistake. They fall for a trick, but they can learn the trick if they have to, or if it seems likely to be useful in the future. In my view the real world implications of this finding are almost zero, other than to highlight how some subtleties and ambiguities lead us astray (and are best avoided in standard examinations). As a sideline, if an aircraft cockpit contains similar ambiguities, they can be lethal, and must be removed for safety reasons.
Similarly, as already discussed, the Dawes base rate problem disappears when you use natural frequencies. Gigerenzer likened it to being confused about the colour of a car seen under sodium floodlights at night in a car park. In the daytime the usual colours were visible again. Strange problem formats (mathematical notation, symbolic logic notation, percentages which include decimal points, decimal points with many zeros, relative versus absolute risks, complicated visual displays in aeroplane cockpits, poorly set out controls in cars) impose an additional load on understanding. Most respondents take a short cut. As a rule of thumb, if you need lots of special training to operate a system, it is badly designed for humans.
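To show what the natural-frequency reformulation looks like in practice, here is a minimal sketch using the often-quoted mammography figures (prevalence 1%, sensitivity 80%, false-positive rate 9.6%) purely as an illustration; they are not the particular materials Dawes or Stanovich and West used. On a base of 1000 people the answer can be read off by simple counting, with no percentages or decimal points in sight.

# Natural-frequency version of a base rate problem, on a base of 1000 people.
base = 1000
sick = round(base * 0.01)                  # 10 people actually have the condition
well = base - sick                         # 990 do not

true_positives  = round(sick * 0.80)       # 8 of the 10 test positive
false_positives = round(well * 0.096)      # 95 of the 990 also test positive

positives = true_positives + false_positives
print(f"{true_positives} of the {positives} positive tests are genuine "
      f"(about {true_positives / positives:.0%})")    # roughly 8 out of 103, ~8%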
The Stanovich and West test of rationality has yet to be constructed, let alone tested on the general population. To show that the test was worth giving, it would be necessary to measure what additional benefits it provides over and above Testing As Usual. If the resultant Rationality Quotient proves to be very powerful in predicting human futures, then it can take over the lead position from intelligence testing. What is interesting to me is how much mileage they are getting out of attacking intelligence testing for “what it misses”. All they have done is to compare SAT scores with replications of some rationality tests. Described more modestly, I would be on their side, and interested in the results of their replications. They distinguish between the results on different tests, which provides a version of an item analysis. However, they do not show that some tests are better predictors of real life achievements than the SAT scores reported by their university students. And, once again, university students are not the only people in the world, nor are they representative of the mental abilities of the general population. Stanovich and West’s rationality test seems to be a case of premature self-congratulation.
What can one say about a test which has yet to be created, tested, published and compared with established measures of mental ability? Frankly, it would be premature to say anything except: Good luck.