Thursday 20 June 2013

Coming soon: The Flynn Effect evaluated


Few of you may remember the entertainment formerly known as the Cinema, but in that art form the audience sat eating popcorn and while waiting for the main feature to begin were entertained by seductive excerpts from other upcoming main features. Typically these “Coming Soon” mini-features were more entertaining than the films themselves, because the purveyors of the ads simply waded through the tedious films they were promoting and picked out the best bits for display. Cutting out the dross gave the resulting clips a sense of urgency and drama, often belied by the films themselves.

By virtue of the mutative process of intellectual intercourse the ads served another social purpose. The audience soon learned that the ads were better than the films, so if the ads were not very good they calculated the film itself would be awful, and mentally chose the best of the ads to guide them as to which films might, conceivably, be worth paying to see, whilst understanding that there were probably better things on television, and free to boot.

However, for those of you with an interest in human intelligence, any upcoming publication on the Flynn Effect is worth a look. The effect, the apparent relentless rise of intelligence, or at least the rise of IQ scores, deserves some attention. It implies, dear reader, that you have been getting brighter or, more likely, that your children will be brighter than you are. Any publications on the Flynn Effect are likely to bring a glow of pride to the average citizen. (Pride: a sensation, usually entirely unmerited, that you have achieved something of value).

The Flynn Effect is a feel-good movie, starring the dashing Jim Flynn as the antidote to all those worries that your vocabulary is not up to scratch. Although the journal Intelligence is not usually associated with dramatic entertainment, the special issue on the Flynn Effect is the exception. In this issue the whole Effect is Taken Apart and also Beaten To A Pulp and then, when all seems lost, in the very last tearful minutes it is nursed back to health by Jim Flynn himself.

In the best traditions of Hitchcock’s Psycho, I will not allow readers to come in during the last ten minutes of the movie, but the whole production should be showing in a cinema near you very soon.

Test your vocabulary Part 2


In the spirit of relentless self enquiry, the moment I finished the Vocabulary post on the cognitive challenge of learning words and using them accurately I came upon a demanding vocabulary test. It requires that you decide whether two words are the same or different.

The test does not lack challenge. It bills itself as a test for people of above average intelligence, so it is likely to maximize stereotype threat. As a consequence, there are two categories of failure. You may be discouraged because you are a sensitive soul who does not like the implication that the test is testing you, and realise to your embarrassment that you have blundered into an adult conversation while still wearing short pants, and that you are too dull to understand the concept of above average. Conversely, you may be discouraged by the simple fact that, after having started the test, you realise that your vocabulary is not quite as good as you imagined. Either through threat or a paucity of vocabulary, potential failure stares you in the face. Mercifully, there is no way to know your results unless you post them, yet another clear example of the distorting effects of publication bias.

On a more technical note, there are 200 items, which ought to provide a high alpha, but there are no reliability stats available, nor any population norms at the moment. Consequently, it would be wrong to call it a vocabulary test in the psychometric sense, but rather a vocabulary competition. The only benchmark I can find is that 165 out of 200 is considered a reasonable score, which allows you to be posted up on a list somewhere, in the quiet precincts protected by the teachings of the wise. I have yet to receive an embossed certificate or discreet invitation to a sumptuous gathering of wordy interlocutors, but I live in hope.


Tuesday 11 June 2013

Vocabulary: humanity’s greatest achievement?

Words are our greatest tools. They allow us to guide, warn, praise and admonish each other and to cooperate in complex tasks. Even restricting ourselves to oral traditions, we can pass down stories which preserve some of the wisdom of our ancestors. Once we master the art of reading and writing we can supersede the restrictions of memory and draw on the wisdom of successive generations. Knowledge is power, and the accumulated knowledge of many centuries is the most powerful of all.

Vocabulary acquisition is an enthralling process to study, as children learn the rules of grammar, and sometimes apply them with greater consistency than the idiosyncrasies of actual spoken adult language would deem correct, if you have gotted the point. Children may refer to little rodents as “mouses” because children have noticed a language rule, and applied it correctly. The English language is not always correct, for idiosyncratic and historical reasons. The English who goed West on the Mayflower were full of reformist zeal, and managed to improve the unwritten constitution by writing it down, and improved writing by removing some of the colourful miss-spellings of the English language and by making spelling more logical and less colorful, but they did not get round to a full purging of languistic errors and exceptions, or at the very least they have not gotten round to it yet (this is a recondite joke, because all English speakers at that time said “gotten” and then with time the English moved to “got” for the past tense, no doubt just to aggravate the colonials).

Some people have the simplistic notion that vocabulary must be determined by mere exposure to spoken language. That is necessary, but far from sufficient, as even children work out. They notice patterns, informal rules, and the contexts in which communication takes place. “The acquisition of meaning is based on the eduction of meaning from the contexts in which the words are encountered”. (So, even if the word “eduction” in the quotation from page 146 of Jensen’s “Bias in Mental Testing” is unfamiliar, you will not be surprised to deduce that it means “To assume or work out from given facts; deduce”). The meaning of a word is acquired in some context which permits at least some partial inference as to its meaning. By hearing or reading the word in different contexts, through a process of generalization, discrimination and eduction one can guess at the essence of the meaning of the word, so as to use it (experimentally) oneself the next time a similar context presents itself. Words move from being unfamiliar to familiar, from familiar but not really understood to being familiar and partly understood (at which stage the explanations given about the meaning of the word are threadbare and inaccurate), and from there to being explained by use of synonyms (though those can range from partial to full understanding as shown by the power of the explanations and definitions).

Testing vocabulary precisely is quite complicated, because you have to test how well subjects really understand the words in question. It is a bit like trying to find out whether people can really handle heavy machinery, as opposed to boasting about it. Typically vocabulary tests work up from common to rare words, and specify what sorts of definitions and explanations will get full points. On multiple choice questions, the use of distractor items often reveals that many people have misunderstood the meanings of words that are new to them. For example, some people who think they know how to define FATUOUS are distracted by the option large. An argument may be witless, silly and pointless, but not obese.  

A very short vocabulary test, which correlates 0.71 with IQ, is the ten word test in the General Social Survey (US). Can something so crude yield interesting results? Yes. Razib Khan has a very informative post on this. In my view, no survey should be conducted without including a test like this, which provides a very good estimate of intelligence.

A stab at using this test to calculate the intellectual demands of particular jobs is provided by The Audacious Epigone. Note the low score for academics.

Nonetheless, long vocabulary tests, or dynamic computer-administered tests which adapt quickly to your difficulty level, provide reasonable estimates of your total word store. In that sense, intelligence ranges from 0 to 45,000 words (the real upper limit if one avoids technical jargon), and one can put a single number on it, on a proper ratio scale. Rating people by the size of their word stores makes sense. Although 3000 words will provide a great deal of cost-effective and very useful communication, additional words bring conceptual benefits. Knowledge of the 3000 most frequent words in the English language will probably result in your understanding 95% of what is said to you, and knowledge of 5000 “word families” (the main word and its variants, like quick: quickly, quicker, quickest) should mean that you would be able to understand 99.9%. Why have more? The answer is that much of good thinking depends on a powerful vocabulary. Carpentry can be done with a standard tool set; cabinet making requires finer, more specialised tools.

As a rough guide, teenagers have about 12,000 words and college students 17,000. Older adults have 17,000 to 21,000 words, and a minority have many more.  Some conceited person referred to 20,000 words as being “the incoherence boundary”. I eschew such contemptuous judgements.

Bright children acquire vocabulary faster than duller children, and thus brighter adults have larger vocabularies, because they require fewer contexts to work out word meanings, and make more subtle discriminations in meanings between similar words. In 1962 Alice Heim, to whom a statue should be raised somewhere, designed a new test of verbal reasoning called “The word in context”.  Charming and intriguing, it featured an unfamiliar foreign word in a descriptive paragraph. Subsequent paragraphs gave more descriptive context, until the elusive meaning was potentially resolved. A brilliant idea, but the test took too long for it to be used in psychometry.

Most crucially, bright people often observe things and have thoughts about things before they learn the words they need to express them. They have the need before the word, so that when the word comes into view they seize on it with pleasure, and relief. Such words are treasured, and stick in the mind because of their elegant utility. Words for which we have no need are shapeless, and never stick in the mind.

So, a word is not just something we have heard, like a bird-song. A word is a cog in a meaning machine. To learn a word is to mine the essence of a context, to condense a cloud of implications into the condensate of definition.

You may wish to discuss this post with someone you love. Be warned: even in autochthonous pairings, discrepancies in terpsichorean accomplishments can lead to uxoricide.

Do not be captious after reading this post.

Saturday 8 June 2013

By the age of three, a clear gap in ability

Jason Malloy has been working on an interesting question, which is to determine whether there is a clear gap in ability between black and white three year old children in America.

Some years ago there was much talk in Britain about tests which found that all racial groups began schooling at age 5 with identical “school readiness” results, but then progressively diverged as the years went by, strongly suggesting that schools were failing to teach British black children properly. I was never able to get to the bottom of what these tests were, and what range of skills they covered, and whether there were ceiling effects on the tests, but it was clear that pre-school intelligence testing results would cast light on whether children of different genetic backgrounds really started school with the same level of ability.

Malloy found 29 usable studies containing 35 different samples of children born between 1936 and 2000. There is IQ data for 2569 black children and 2762 white children, age 3.

“A majority of these samples contain control groups of whites who were tested at similar times, and under similar conditions. When we compare the 20 samples with both blacks and whites, we get a difference of almost exactly 1 standard deviation: the black IQ is 85.4 and the white IQ is 100.8 (15.4 points/1.03σ).”
The figure shows the studies plotted out by year.
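
For readers who like to check the sums, the quoted gap can be reproduced in a couple of lines. This is only a sketch: the standard deviation of 15 is my assumption (the usual IQ convention), not something stated in the quotation, and the function name is made up for the purpose.

```python
def gap_in_sd(mean_a, mean_b, sd=15.0):
    """Express the difference between two group means in standard deviation units.

    sd=15.0 is an assumed value (the conventional IQ scale), not taken from the quote.
    """
    return (mean_a - mean_b) / sd


# Means quoted above for the 20 samples with both groups.
gap_points = round(100.8 - 85.4, 1)
gap_sigma = round(gap_in_sd(100.8, 85.4), 2)
print(gap_points, gap_sigma)
```

On that assumption the 15.4 point difference comes out at about 1.03 standard deviations, in line with the figures in the quotation.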

There is no particular trend (one early outlier study was excluded) and the pre-school intelligence gap looks pretty solid. The next post will discuss measurement issues. The predictive power of intelligence tests increases with subject age. Conventionally 7 years gives the first reliable indication of adult intelligence. 11 years is far better, but there is still more maturation to take place.

One more post to follow on this topic.  

Friday 7 June 2013

Patronage and the arts of courtly flattery


I have been reading my favourite blogs for years, without paying a penny for the privilege. The material was there, free, so I just read it. When some of my much-visited bloggers admitted they were facing destitution, I sent them some dollars every now and then. Life being what it is, I only did that when they reminded me that they could not continue writing without some income to support them. It was hit and miss. One day, reading Steve Sailer’s blog, I noticed a little “Flattr” button at the end of each post. Flattr is a way of making social micro-donations. You set aside a fixed sum which will be distributed each month according to which blogs you think deserve support. The more you flatter different writers by clicking on their “flattr” buttons the more finely the donations are distributed, but they never exceed your maximum monthly sum. Think of it as a way of paying to read a newspaper, but ensuring that your payment goes to those columnists whose writings you most value, and to no-one else.

And now, back to the program.

ORIGINAL PAPER: How clever were the Victorians? A comment on Woodley et al. (2013) by Elijah L. Armstrong

Abstract: Woodley et al. (2013) cite declines in simple reaction time as evidence of dysgenesis. In this paper it is conceded that these declines are strong evidence for a dysgenic trend. However, declines in g cannot be inferred from reaction time declines alone.

Woodley et al. (2013) are quite correct that the existence of a secular decline in reaction time suggests dysgenesis. However, the secular decline in reaction time is probably a poor quantification, per se, of the exact dysgenesis rate. Woodley (2012) argues that the Flynn effect may be caused by increasing specialization in cognitive abilities. If specialization has truly increased, one would expect to see a secular decline in certain such abilities. These abilities are probably, for the most part, not measured on typical g-loaded tests (Lynn, 1998 gives the example of farming ability), but it is nevertheless to be expected that some g-loaded tests will show a secular decline. It may be responded that reaction time’s shared variance with g is wholly genetic (Woodley et al. cite Rijsdijk et al., 1998 on this matter) and therefore changes in specialization will have a minor impact. However, even if the environmental influences on reaction time are different from those on IQ, there may still be considerable environmental influences. Moreover, reaction time influences mortality rates (Deary & Der, 2005). Declining reaction time independent of g fits into Woodley’s (2012) life-history model because there would be less pressure to develop a mortality-mediating ability in a less environmentally harsh environment. It should be noted as well that even though simple reaction time shows little or no training effect (Kida et al., 2005), there may be other processes that decrease simple reaction time, such as imprinting (Armstrong & Woodley, under review).

While Woodley et al. extract declines in g from the declines in reaction time (given a .54 correlation), simply multiplying the decline in reaction time by the g-loading is not sufficient to establish a decline in g (cf. Dickens & Flynn, 2001 for discussion of a similar issue). Using a similar procedure on IQ tests for the Flynn effect would imply high g gains (say, if performance on a test with a g-loading of 0.8 has increased by a d of 1, this procedure would imply that g has increased by a d of 0.8). However, the Flynn effect is not on g (Woodley, 2011, 2012a, 2012b; te Nijenhuis & van der Flier, 2013).
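
The procedure being criticised is easy to state in code, which also makes plain how little it assumes. The sketch below is only an illustration of the multiplication described in the paragraph above, using the paragraph's own hypothetical numbers; the function name is invented here and it is not Woodley et al.'s actual computation.

```python
def naive_implied_g_change(observed_d, g_loading):
    """Naive estimate: scale the observed change on a test by that test's g-loading.

    This is the shortcut the paragraph argues is NOT sufficient to establish a
    change in g; it is shown only to make the arithmetic explicit.
    """
    return observed_d * g_loading


# The hypothetical Flynn-effect case from the text: a gain of d = 1 on a test
# with a g-loading of 0.8 would "imply" a gain of 0.8 in g.
print(naive_implied_g_change(1.0, 0.8))
```

The same one-line multiplication applied to IQ gains yields a large rise in g, which is exactly the conclusion the cited literature rejects.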

A number of similar declines (approximately 1 SD since the Victorian era) on other highly g-loaded tests or abilities would corroborate Woodley et al.’s dysgenesis estimate. To the best of my knowledge, though, there are few tests that have shown a secular decline; the SATs have, but the population has grown increasingly representative (e.g., Williams and Ceci, 1997; Sailer, 2011a, 2012). Piagetian tasks show a decline (Shayer et al., 2007), and if the decline in g estimated from secular trends in Piagetian tasks is comparable to the decline in g estimated from secular trends in reaction time, this would corroborate a 1 SD dysgenesis estimate. Likewise, if the decline in IQ among wealthy countries that are no longer experiencing the Flynn effect (e.g., Sundet et al., 2004) was similar to the decline measured using reaction time, Woodley et al.’s estimate would be validated.

Finally, it should be noted that a g decline of 1 SD is difficult to believe (cf. Charlton, 2013; Flynn, 1987; Guha, 2001 for discussion of a similar issue).[1] A community with average levels of g 1 SD higher than modern populations would be supermen. Ashkenazi Jews, who are a tremendously successful ethnic group, appear to have IQs around 110 (e.g., Cochran et al., 2006; Lynn, 2011; Sailer, 2011b). Hence even the most intellectually successful ethnic group would have IQs five points lower than the Victorians, if Woodley et al. are correct. This process of devolution is made quite incredible by the fact that it is hypothesized to have occurred in only 130 years (Cochran, 2012).


Armstrong, E., and Woodley, M. A. The rule-dependence model explains the commonalities between the Flynn effect and IQ gains via retesting. Under review.

Charlton, B. (2013). "Extraordinary claims require extraordinary evidence" - with respect to the claim of intelligence decline since Victorian times. Retrieved from

Cochran, G., et al. (2006). Natural history of Ashkenazi intelligence. Journal of Biosocial Science, 38, 659-693.

Cochran, G. (2012). The long and short of it. Retrieved from

Deary, I., and Der, G. (2005). Reaction time explains IQ’s association with death. Psychological Science, 16, 64-69.

Dickens, W., and Flynn, J. R. (2001). Heritability estimates versus large environmental effects: The IQ paradox resolved. Psychological Review, 108, 346-369.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171-191.

Guha, S. (2001). A philosopher’s paradise––in inspired lunacy. Retrieved from

Rijsdijk, F. V., et al. (1998). The genetic basis of the relation between speed-of-information-processing and IQ. Behavioural Brain Research, 95, 77-84.

Lynn, R. (1998). In support of the nutrition theory. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ and related measures (pp. 207-215). Washington, D. C.: American Psychological Association.

Lynn, R. (2011). The Chosen People. Augusta, GA: Washington Summit Publishers.

Sailer, S. (2011a). SAT score changes by race since 1996. Retrieved from

Sailer, S. (2011b). Lynn on the Jews: Yes, it’s intelligence –– but there’s something else too. Retrieved from

Sailer, S. (2012). SAT and ACT: How hard are they scraping the bottom of the barrel and are they finding any diamonds in the rough? Retrieved from

Shayer, M., et al. (2007). Thirty years on – a large anti-Flynn effect? The Piagetian test Volume & Heaviness norms 1975–2003. British Journal of Educational Psychology, 77, 25-41.

Sundet, J. M., et al. (2004). The end of the Flynn effect? A study of secular trends in mean intelligence test scores of Norwegian conscripts during half a century. Intelligence, 32, 349-362.

te Nijenhuis, J., & van der Flier, H. (in press). Is the Flynn effect on g?: A meta-analysis. Intelligence.

Williams, W. M., and Ceci, S. J. (1997). Are Americans becoming more or less alike? Trends in race, class, and ability differences in intelligence. American Psychologist, 52, 1226-1235.

Woodley, M. A. (2011a). Heterosis doesn’t cause the Flynn effect: A critical examination of Mingroni (2007). Psychological Review, 118, 689-693.

Woodley, M. A. (2012a). The social and scientific temporal correlates of genotypic intelligence and the Flynn effect. Intelligence, 40, 189–204.

Woodley, M. A. (2012b). A life history model of the Lynn-Flynn effect. Personality and Individual Differences, 53, 152–156.

Woodley, M.A., et al. (in press) Were the Victorians cleverer than us? The decline in general intelligence estimated from a meta-analysis of the slowing of simple reaction time. Intelligence.

[1] Charlton’s discussion of this line of reasoning is critical.

Thursday 6 June 2013

Motion Quotient, and why you need a friend

Duje Tadin, who did the motion quotient work with one of his research team, explains that for proper results you need to use controlled conditions, because even the level of illumination in the room will affect the findings:

"Unfortunately, there is no good way to compare the results across computers. For example, "suppression" is weaker at low contrast, so somebody will have a weaker suppression looking at the video in a bright room than in a dark environment. Also the actual monitor frame rate will make a big difference. What can be compared is if two people look at the video under same conditions.
This provides the same info:

So, this is why you will need a friend. However, the results could potentially damage the friendship. I leave that dilemma to you.

Wednesday 5 June 2013

Shibboleth: Test your vocabulary (and your honesty)

Shibboleth is simply a word that you will mispronounce unless you know how a particular tribe, cohort or gang pronounces it. It serves as a password to an exclusive community. Naturally, like all human culture, it has a dark side. Pronounce the word wrong while trying to worm your way into such a secluded community, and you may be banished, or attacked for your impudence.

Here at Psychological Comments we try to be more cognitively demanding. Your pronunciation of words is of marginal interest. Your reaction times, on the other hand, have some predictive value, so we tend to pore over those to test the quality of our readers. HBDchick is in pole position on this measure, and you are encouraged to test your own.

However, why not test yourself on something very closely related to intelligence, and something profoundly human: your vocabulary? Sure, you will be doing an intelligence test, but why not? The link is below

A bold reader, Elijah Armstrong, is in pole position on this one with 32,800 words.

Best of all, this test requires the testee to be honest. You can make the number up by pretending to know the words offered up to you, but in that case you would be missing the point, and failing to understand yourself.

No such problem would ever afflict readers of this blog.

A very good morning to you all.

Tuesday 4 June 2013

The Motion Quotient and other distractions

Galton believed that cleverness would be associated with the capacity to make fine sensory discriminations. As usual, he was way ahead of his time, but he did not have the statistics available to analyse his results with sufficient power. Once Binet had taken the broad-brush educational approach by using a non-theoretical selection of mental tasks to develop intelligence quotients (skills expressed by comparison with peers of the same age) research on sensory intelligence lapsed. Only when under pressure to explain the underlying basis of intelligence in the 1960’s did researchers return with renewed interest to sensory measures like inspection time, choice reaction time, brain waves, and sensory nerve transmission times. They put together a reasonable case that whatever intelligence is, it can be measured weakly by physiological surrogates with little intellectual content, and minimal cultural influences.

Now a new test of sensation has entered the lists: Melnick et al., A Strong Interactive Link between Sensory Discriminations and Intelligence, Current Biology 23, 1013–1017, June 3, 2013.

Already dubbed the “Motion Quotient”, this procedure looks at the sensation of movement caused by displays of lines in either a small or large visual field. This is better seen in action than described in words, although the sequence of test examples below is not itself very well explained. The test items come too quickly for you to be able to record your answers, but they ask two questions: do the lines appear to move to the right or the left, and is it easier to make that judgement with a small focussed display or a larger screen version?

My somewhat confused answer was that I had little idea what was going on, but of course it was much easier to see the lines moving on a big display than a small central display. It was “bloody obvious”, to use a British colloquialism.

However, my impressions turn out to be of little consequence. As you would expect from any inspection task, the experimenters adaptively adjusted the stimulus duration to estimate the shortest exposure durations sufficient for threshold level performance. In other words, the experiment has to be calibrated to each individual person, and you cannot really work anything out about yourself by looking at the above demonstration.
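
To give a flavour of what “adaptively adjusting the stimulus duration” means, here is a toy staircase. It is emphatically not the procedure used by Melnick et al., whose details are not given here; it is a minimal sketch assuming a perfectly consistent observer who gets the answer right whenever the exposure is at or above a fixed threshold. All names and numbers (`estimate_threshold`, the 50 ms observer, the step sizes) are invented for illustration.

```python
def estimate_threshold(observer, start_ms=200.0, step_ms=40.0,
                       min_step_ms=1.0, n_trials=60):
    """Toy 1-up/1-down staircase (illustrative only).

    observer: function taking a duration in ms and returning True if the
    simulated subject judged the motion direction correctly.
    The duration is shortened after a correct answer and lengthened after an
    error; the step is halved whenever the outcome flips (a "reversal").
    """
    duration = start_ms
    last_correct = None
    history = []
    for _ in range(n_trials):
        correct = observer(duration)
        history.append(duration)
        if last_correct is not None and correct != last_correct:
            # Reversal: home in by using a finer step from now on.
            step_ms = max(step_ms / 2.0, min_step_ms)
        last_correct = correct
        duration = duration - step_ms if correct else duration + step_ms
        duration = max(duration, min_step_ms)
    # Average the last few presented durations as the threshold estimate.
    tail = history[-10:]
    return sum(tail) / len(tail)


# Hypothetical subject who needs at least 50 ms to see the motion.
def needs_50_ms(duration_ms):
    return duration_ms >= 50.0


print(estimate_threshold(needs_50_ms))
```

Real observers are noisy, so real procedures use more trials and more forgiving rules, but the principle is the same: the test hunts for each person's own threshold, which is why a fixed demonstration video tells you little.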

There is a lot to like in this study. They drew their volunteers from the general population, and tested them face to face on the Wechsler individual intelligence test, which is the best validated. That must have been time consuming. They know exactly how their result fits in with inspection time research, namely that they are getting a higher correlation than is usual in this often disappointing line of research. They have a convincing explanation as to why their somewhat more complex “grating stimulus” is a better, and more ecologically valid, test of sensory discrimination, namely that it tests the subject’s ability to detect the important signal from the unimportant but distracting background noise. They say: “Rapid processing is of limited utility unless it is restricted to the most relevant information”. Finally, they can show a correlation between their test and intelligence which is probably around 0.7, which is as good as the correlation between a subtest and the overall IQ figure.

So, why am I not celebrating their impressive result? First, we have been here before. The early results on inspection time looked almost as good. Second, one should be slightly on guard when the result is in line with what one wants. Third, one should be particularly on guard when the explanation for the results is given by clever researchers who have thought through their experiment carefully.

Nope, I have only one real gripe.


A warm welcome to Bobbing Bobcat HBDchick

HBDchick is not only fast but clever. Her reaction times are a very creditable 0.24 seconds, and she also knows how to do a screen grab so that the results are displayed for all to see. Respect!

My results, collected on the same tests, and with a Logitech wireless mouse, were announced on “Can I have a reaction” as being 0.29 on the BBC Sheep test on the day of posting, with an all-time best of 0.21 secs. The other results fell within that range. My Red Stoplight results were 0.29.

In the spirit of Twitter inanity, this morning my sheep test result was 0.286 secs (and yes, I had already drunk some coffee) and 0.28 for the stoplight test.

So, it would seem that HBDchick is in the vanguard, and I am following in her wake. All this makes sense: she was blogging long before me, blogs more frequently, and on a wider range of subjects. (Perhaps we should bump up the sample size before coming to any firm conclusions). Time for coffee. 

Monday 3 June 2013

Steve Sailer’s reaction times, his driving record and his intelligence

In yet another engaging and meditative post, Steve Sailer has given us his personal view on his driving errors, reaction times, and intelligence.

With commendable honesty, he shows that he is fully aware that we should not keep ourselves “above the audit”. Every observer is also observable, and must submit to enquiry. Not everyone knows that, or behaves as if it were true.

Like Steve, I depend on the kindness of strangers to make allowance for my driving errors, and I remember clearly my failures to scan the road both ways at apparently quiet junctions. I have (mostly) got rid of the delusion that I am an above average driver. Why does this view prove so popular? One very strong reason is that the distribution of driving errors does not conform to the standard normal curve. Gigerenzer covers this in his mini-chapter “Why most drivers are better than average” on page 214 of his book “Reckoning with risk”. (This book can be quoted to advantage on virtually any occasion).

Most people drive pretty safely. (They avoid errors, but also recall their prudent reactions with pride and attribute their errors to a temporary lapse, which they tend to forget). A minority of drivers keep getting into trouble. This includes many young men, a very few young women, those of any age who drink heavily, those who allow themselves to be distracted by phones and fellow passengers, and some who just cannot control their speeds. In the spirit of Lady Bracknell who admitted: “I myself am peculiarly susceptible to draughts” I should confess that I am peculiarly susceptible to open roads in bright sunshine, though that is not a frequent temptation in England. Mind you, on the sunny road back to London last evening under the dappled, yew-tree-tunnelled shade of Salisbury Plain, with not a car in sight, none of these prudent observations were uppermost in my mind.

Anyway, back to the distribution of errors: as a result of this dangerous minority, rather than 50 percent of drivers being above average, the true figure is probably that 63 percent are above the modal accident rate, and are justified in saying that they are, in the common parlance, “above average drivers”. Skewed distributions are difficult to describe in ordinary language, but depict a familiar social problem: that of accounting for behavioural minorities. (It takes us away from the main argument, but this is also true of the minorities who have more than 50 sexual partners, rather than the more usual, contemporary 10).
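
A small worked example shows how a skewed distribution lets a majority be better than average. The accident counts below are invented purely for illustration, and chosen so that the share happens to match the 63 percent quoted above; they are not Gigerenzer's data. The function name is likewise made up.

```python
from statistics import mean, median

# Hypothetical population of 100 drivers: most have no accidents,
# some have one, and a small minority account for most of the total.
accidents = [0] * 63 + [1] * 27 + [4] * 10


def share_better_than_mean(counts):
    """Fraction of drivers with fewer accidents than the arithmetic mean."""
    average = mean(counts)
    return sum(c < average for c in counts) / len(counts)


print("mean accidents:", mean(accidents))
print("median accidents:", median(accidents))
print("share better than the mean:", share_better_than_mean(accidents))
```

In this made-up population the mean is 0.67 accidents per driver while the median is 0, so every driver with a clean record, 63 in every 100, has fewer accidents than the average. The accident-prone few drag the mean up and leave the careful majority on the right side of it.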

When I discussed this finding with driving behaviour psychologists some years ago (the driving, not the sex) one gave me an evidence-based and crushing reply: he sent me the self-evaluations he had collected from learner drivers who had only just passed their driving test. This is the period in which there is a sharp spike in accidents and deaths, partly due to sheer inexperience, partly due to showing off to passenger friends after a drinking party at night. These novice drivers habitually rated themselves as being 7 or 8 out of 10, when in fact they were at that stage 3 or 4 out of 10.

This is yet another example of the Dunning-Kruger syndrome: over-confidence and under-competence, a thoroughly lethal combination. It is a cognitive bias in which the unskilled suffer from illusory superiority, and lack the competence and self-reflection to acknowledge their deficiencies. This is a very common disorder, and seems to be inversely related to self-esteem and intelligence. Brighter people note their errors, note their brighter competitors, and are grimly aware of all the stuff they ought to know, but haven’t got round to reading yet (they monitor the external world). Less bright people revel in their accomplishments. They have delusions of adequacy (they monitor their internal world). It is not the purpose of this blog to encourage public abuse, but after being subjected to any sustained burst of self-confident nonsense one is justified in muttering, very quietly to one’s self “Dunning-Kruger syndrome”. I append the reference as a public service to aggrieved citizens who might otherwise be tempted to violence.

Kruger, Justin; David Dunning (1999). "Unskilled and Unaware of It: How Difficulties in Recognizing One's Own Incompetence Lead to Inflated Self-Assessments". Journal of Personality and Social Psychology 77 (6): 1121–34. doi:10.1037/0022-3514.77.6.1121. PMID 10626367.

Now to Steve Sailer’s reaction times. As already discussed in this blog (Can I have a reaction…), you should google “BBC reaction time sheep test” and then all of us can get ourselves on a common baseline. Ignore any blogger who does not post their reaction time results. Equally, demand reaction times from would-be commentators on your blogs. (Note that there are artefacts: my standard laptop response key gives poorer results than a new wireless mouse, so if this really bugs you, buy the latest and most sensitive response key you can find).

As Jensen was at pains to point out, reaction times contain two elements: thinking time and movement time. In ordinary life the two are confounded. Faced with an obvious threat, if you keep both of these short you keep alive. In more tricky situations with various options to consider, thinking time becomes the great discriminator, and movement time less significant.

Perhaps Steve is right that Jensen complicated reaction times too much. He was attracted by the beauty of Hick’s Law (reaction time plotted against the log2 of the number of decision options, which gives a straight line) with which his results fitted quite well. Ian Deary, on the other hand, finds that simple reaction times predict lifespan, or at least take out a good chunk of the IQ/lifespan variance, suggesting that a common pathway gives us health, reaction speed and intelligence, to varying degrees.
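The beauty of Hick’s Law is easier to appreciate with numbers. What follows is a rough sketch, not Jensen’s procedure: the function name, the baseline of 0.2 seconds and the slope of 0.15 seconds per bit are assumptions chosen for illustration. (Hick’s own formulation is often written with log2 of the number of options plus one; the simpler form described above is used here.)

```python
import math

def hick_reaction_time(n_options, base=0.2, per_bit=0.15):
    """Predicted reaction time in seconds for a choice among n_options.

    base and per_bit are hypothetical illustrative values: the time for a
    single option, and the extra time per doubling of the options.
    """
    if n_options < 1:
        raise ValueError("need at least one option")
    return base + per_bit * math.log2(n_options)

# Each doubling of the options adds the same fixed increment.
for n in (1, 2, 4, 8):
    print(n, "options:", round(hick_reaction_time(n), 3), "seconds")
```

The point of the law is in the loop: going from two options to four costs the same extra time as going from four to eight, so thinking time grows with the information to be processed rather than with the raw number of choices.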

Sport, as I understand it, often involves throwing or hitting balls. Do not ask me why. As far as I am concerned, balls have never done me any harm, particularly when left alone. Propelled at velocity they can be dangerous. For some reason schools pick on serious readers and interrupt their studies by taking them outside and getting them to catch these objects. The trick, for those serious readers who can see the flying object in the first place, is to compute the ball’s parabolic trajectory and the place and time of landing, and thus accelerate themselves into the place where it is most likely to land just at the moment it does so.

Rather than attempting any of this, it would be simpler to note that reaction times, whilst showing a positive correlation, are not very strongly related to measures of intellect. Steve is not the first of my clever readers I have had to reassure on this point.

Steve makes a personal claim: “I'm a reasonably intelligent person”. Claims of this sort are not allowed in England, so I can only look at this American remark with bemusement and envy. However, according to the Dunning-Kruger effect, we cannot take such self-assessments at face value. It is pointless to ask Steve for his IQ measurements, since the intelligence quotient is a summary of a sample of intellectual tasks. It does not have a reified status. It is a predictor (one of the best we have, out of a rather weak bunch) but it is not “that which must be predicted”.   

A detailed look at the corpus of his postings, his analysis of data, responses to arguments and so on confirms his likely high intelligence in the usual meaning of that term: “a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts. Rather, it reflects a broader and deeper capability for comprehending our surroundings “catching on,” “making sense” of things, or “figuring out” what to do” (Gottfredson 1994). He also shows an interest in sports, but that is permissible in persons who are otherwise of good character.

Steve makes two additional intelligence-related claims: “There are two intellectual areas where I have very fast reflexes”.

The first area is being fast to get the joke in a movie. I agree that this can be a great skill, but it is a lonely one. It leads me to suggest a new IQ test. Skip movies which often have to spend time setting up the context. Take a selection of comics and measure the time between exposure and laughter. Failure to laugh at any three in a row gives you a beautifully embossed certificate of Failure, and a quick exit from the test. The items could be ranked in terms of a priori intellectual complexity, which would be a pleasurable task in itself, and then we could have a good linear scale comprehension test, admittedly rather slanted towards the upper right hand side of the bell curve. (In the spirit of further personal disclosure, today I came across an old copy of the Alice Heim AH5 test for university students, which was used in the 1960s. I can claim to have got a B in this test aged 19, though I doubt I could do that again without resting beforehand for several days. Have a look at some of the items if you can, without breaking copyright).

The second area is Quiz shows involving buzzers. This is a real intelligence test. Anyone who did not test their buzzer before participating fails! Other than that, “first to the buzzer” is the key feature of University Challenge on BBC2, with the proviso that if your answer is wrong ten points go to the other side. Speed matters in a quiz, but speed of thinking matters even more in real life, because faster processors are required to solve harder problems.

Steve says that he “doesn’t get reaction times” but of course he does, it is simply that he knows they are a poorer test of intellect than even something simple like digit span or a ten word vocabulary test. Whilst Steve is probably right about contemporary life, those who belittle reaction time measures are probably wrong about our hunter-gatherer past. In that era one presumes that reaction times were often a matter of life or death. Hence, it might be simplest to keep contemporary reaction time studies simple, and only administer one trial, with minimal warning, so that the test approximates most closely to real life. To my surprise I survived a French driver on a winding hillside road at dusk in the South of France some weeks ago, and was very pleasantly surprised by the speed with which I swerved to avoid him, my passengers less so.

Disclaimer: Some of my above statements are immodest. Modesty about one’s capacities is not only polite, but very probably has high survival value.

Saturday 1 June 2013

Educational attainment, intelligence, and the relentless cracking of the genetic code

Readers of this blog will be familiar with my views on behavioural scientists’ use of very small samples, from which they draw very large conclusions, in sharply opposing directions, very frequently. This makes for good headlines and weak science. So, it gives me great pleasure to read an article in Science about educational attainment which has a sample size of 101,069 persons, and then promptly checks its findings on a further sample of 25,490 other people. With one bound they propel themselves into the stratosphere of behavioural science: a massive sample of “discovery” and a very large “replication” sample. Psychologists, with some notable exceptions, generally limit themselves to a small “discovery” sample which they treat as if it were the entire universe, and leave replication to others.

GWAS of 126,559 Individuals Identifies Genetic Variants Associated with Educational Attainment. Science Xpress

You may think it churlish of me not to give the authors’ names, but as befits a collaborative study, the author list is the length of a short letter to Nature, and the 175 references are a longer paper in themselves.

The cautious naming of a treasured sample as being one of “discovery” is very wise. We use samples like a net cast into the sea to try to discover what fish are like in all oceans. Our conclusions as we pick through our fish will contain many elements which are characteristic of that particular catch, and not of all other catches. We have to dip the net into another sea at another time in order to better understand what creatures live in the oceans.

As you might suspect, the scientists in this paper are gene hunters. They know that the only way to try to make sense of the genetic code is to hit the problem with massive samples, thus squeezing out random characteristics and homing in on the real causes of variance. They found that three SNPs had genome-wide significance as regards educational attainment, and also found that a score derived from many hundreds of SNPs, each having a tiny but additive effect, accounted for 2% of the variance in educational attainment and 2.5% of the variance in cognitive function.
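For those wondering what “a score derived from many SNPs, each having a tiny but additive effect” amounts to in practice, here is a minimal sketch. It is not the authors’ pipeline: the function names, the weights and the toy genotypes are all hypothetical, and “variance accounted for” is taken here simply as the squared correlation between score and outcome.

```python
def polygenic_score(allele_counts, weights):
    """Add up each SNP's allele count (0, 1 or 2) times its small weight."""
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is needed per SNP")
    return sum(count * weight for count, weight in zip(allele_counts, weights))

def variance_explained(scores, outcomes):
    """Squared Pearson correlation between scores and outcomes."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_o = sum(outcomes) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(scores, outcomes))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_o = sum((o - mean_o) ** 2 for o in outcomes)
    return cov ** 2 / (var_s * var_o)

# Hypothetical weights for three SNPs and genotypes for four people.
weights = [0.02, -0.01, 0.03]
people = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
years_of_education = [16, 12, 14, 13]  # invented outcomes

scores = [polygenic_score(person, weights) for person in people]
print("scores:", [round(s, 3) for s in scores])
print("variance explained:", round(variance_explained(scores, years_of_education), 3))
```

No single weight does much, which is the whole point: the score only becomes informative when very many tiny effects are added together, and even then it captures a small slice of the variance.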

Can we now trumpet “The genes for IQ have been found”? The authors make no such error. With commendable caution they say that these areas of the genome are associated with health, cognitive and central nervous system expression, so they are worth following up, and that their study provides a benchmark for power analyses in social science genetics.

This is a very important study, and sets a high standard for others to follow. What does this mean for the genetics of intelligence?

Criterion heterogeneity is the technical term for the “rubber ruler” effect. Suppose we try to study intelligence by asking every school to name their most able student. Some schools in unfavoured catchment areas will nominate a student who would not be rated outstanding in a slightly better school with brighter students. Suppose we try to do better by measuring the number of years that the student spends getting educated. (We discount those rare students who are too bright to spend much time at school and leave early to found their own companies). As a rule of thumb, the brighter you are, the longer you spend in education, because you go to college, and may then continue to even higher degrees. The authors, no slouches, were wise to this problem, and used the International Standard Classification of Education Scale (1997) to calculate years of education, and whether or not the person went to college. Frankly, this is not all that much use when compared to a standard scale like a national exam with a grade point total, but they had to recruit over several countries and this was the best way to get things on a common metric, however crude.

The subjects were all Caucasian, i.e. white, and most of them were about 30 years of age, by which time they should have completed college. 23.1% had a college degree. The authors do not mention this, but since white IQ is 100 just about anywhere in the world, it suggests that those with IQ 111 and above were getting into college (on a scale with a mean of 100 and a standard deviation of 15, 23.1% of the white population have IQs of about 111 and above). Depending on your attitudes to further education you may see it as a great thing that persons at that level of intellect are in college, or a waste of money and a dreadful lowering of standards. To me it suggests that “college” covered a wide range of courses. The more demanding colleges recruit from those with IQs of 115 and above (top 16% of the population) and elite colleges require IQ 130 (top 2.2%). This trade-off between intellect and educational quality is depicted in a previous post “Social class and university entrance”.
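The percentages above all come from the bell curve, and anyone can check them. Here is a small sketch using Python’s standard library, on the assumption that IQ is normally distributed with a mean of 100 and a standard deviation of 15; the helper names are mine, not anything from the paper.

```python
from statistics import NormalDist

# Assumed IQ scale: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

def share_above(score):
    """Proportion of the population scoring above the given IQ."""
    return 1 - iq.cdf(score)

def cutoff_for_top(share):
    """IQ score that leaves the given proportion of people above it."""
    return iq.inv_cdf(1 - share)

print(f"IQ cut-off for the top 23.1%: {cutoff_for_top(0.231):.1f}")
for threshold in (115, 130):
    print(f"Share above IQ {threshold}: {share_above(threshold):.1%}")
```

The same two helpers will answer most questions of the form “how bright do you have to be to be in the top so-many percent”, provided you accept the normal curve as a fair description.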

However, not all is lost. The diligent authors found that the peace-loving Swedes had given all their military service conscripts a proper IQ test, and the very same genetic markers did a better job of predicting IQ for this subset, accounting for a princely 2.5% of the variance. So, the continuous measure of intelligence was slightly easier to predict than the lumpy and not so informative educational measures. By way of comparison with other personal characteristics, the same genetic analysis predicted 10% of the variance for height. By way of historical comparison, until the last two years the amount of variance of intelligence which could be explained by genetic analysis was zero.

Even larger samples with IQ measures and further analysis of the genetic code may well increase the intelligence variance accounted for. In all probability there are very many genes which contribute to what we call intelligence, all with slight but useful effects.

The hunt continues.