Tuesday 4 November 2014

Fractionating smoke and mirrors


In December 2012 I came across a silly study published in Neuron (Hampshire, Highfield, Parkin, & Owen, 2012) which had received credulous expressions of support in the media:

IQ tests have been used for decades to assess intelligence but they are fundamentally flawed because they do not take into account the complex nature of the human intellect and its different components, the study found.

The results question the validity of controversial studies of intelligence based on IQ tests which have drawn links between intellectual ability and race, gender and social class and led to highly contentious claims that some groups of people are inherently less intelligent than other groups.

Instead of a general measure of intelligence epitomised by the intelligence quotient (IQ), intellectual ability consists of short-term memory, reasoning and verbal agility. Although these interact with one another they are handled by three distinct nerve “circuits” in the brain, the scientists found.

In a post entitled “What makes a good IQ story” I made a few comments showing why this paper’s conclusions could not be relied upon. As you will detect, I did not think much of their work:

It would seem that there are persons walking about this earth, entirely unsupervised and with access to resources, but with any luck not to heavy machinery, who think that you can make statements about human cognition on the basis of 16 subjects. The level of insolent innumeracy makes one’s jaw drop. 16 persons do not humanity make.

How did this paper get such adulatory press coverage? It told a story that people wanted to believe. It ticked all the boxes required by wishful thinking. Cold fusion, anybody?

Please check that I am giving a fair summary of my opinions at that time by clicking on the link below:


Now, almost 2 years later, the back story can be revealed. Unknown to me, it turns out that, prior to publication, a group of distinguished researchers (Haier, Karama, Colom, Jung, and Johnson) had been asked by the editors of Neuron to comment on the paper. These researchers expressed substantial reservations about the methods and the conclusions of that paper. Nonetheless, it was published. They have corresponded with the lead author subsequently, but have now decided to publish their concerns in the hope that the authors will reply fully to their points, in the usual fashion of academic debate. In the sedate world of academia, this is a challenge, and Hampshire et al. will be expected to reply, point by point.

I can proudly say I thought the Hampshire paper was nonsense from the start, but it is instructive to hear knowledgeable researchers point out the problems in a detailed manner. The reason I was particularly critical is that a few weeks earlier I had listened to a talk by the lead referee Richard Haier at the ISIR conference in December 2012 and he had struck me as a careful and diligent researcher. He is the guy to go to if you want to distinguish between proper results and neuro-bollocks. He and his team had no difficulty answering my pointed questions about the need for reliability in conducting brain scans and the requirement to have large and properly representative subject samples. They did more than answer my questions: they showed that they were very aware of these difficulties which plague the field and lead to the “scan a brain, print a pretty picture, publish a paper” production line of nonsense, and were taking steps to ensure proper procedures and large representative samples. (Personally, I can also vouch for Colom and Johnson as sharp reviewers who can spot and correct errors in papers.)

The key to the paper was that Hampshire et al. had conducted a particular sort of factor analysis which suggested that their large sample of online testees did not show a general intelligence factor, but that a three-factor solution was the best fit to the data.

Haier et al. reply:

One of the most robust findings in all of psychology is the observation that virtually all tests of mental abilities, irrespective of content and task demands, are positively correlated with each other, leading to the concept of a general factor, designated as “g”.
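For readers who like to see the arithmetic, here is a minimal sketch of why a table of uniformly positive correlations produces a large first factor. The correlation matrix is invented for illustration (four hypothetical tests, not data from either paper), and the code extracts the first principal component by power iteration, which is a simpler stand-in for the factor-analytic methods discussed in the papers, not a reproduction of them.

```python
def first_component(corr, iterations=200):
    """Leading eigenvalue and loadings of a correlation matrix (power iteration)."""
    n = len(corr)
    v = [1.0] * n  # positive starting vector
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit-length vector v
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    # Loadings are the eigenvector scaled by the square root of the eigenvalue
    loadings = [x * eigenvalue ** 0.5 for x in v]
    return eigenvalue, loadings


# Hypothetical correlations among four invented tests (assumption, not real data)
corr = [
    [1.00, 0.50, 0.40, 0.30],
    [0.50, 1.00, 0.45, 0.35],
    [0.40, 0.45, 1.00, 0.40],
    [0.30, 0.35, 0.40, 1.00],
]

eigenvalue, loadings = first_component(corr)
share = eigenvalue / len(corr)  # proportion of total variance on the first component
print("loadings:", [round(x, 2) for x in loadings])
print("variance share:", round(share, 2))
```

With every correlation positive, every test loads on the first component with the same sign, and that component takes more than half of the total variance in this made-up example. How one then rotates or splits that variance is where the arguments start.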

Haier et al. then give an excellent explanation of factor analysis and its link with brain imaging research, and their paper is a great teaching resource. Like me, they find 16 undescribed persons too slim a basis for coming to any conclusions about brain function. Unlike me, they have gone into every problem with care and patience, explaining normal practice and the caution required in interpreting results. There is great advantage in reading the whole thing, but if you just want to see their review, skip to Appendix A, which contains their unpublished preview of the paper.
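To put a rough number on why 16 persons is a slim basis, here is a small sketch of the sampling uncertainty in a correlation coefficient, using the standard Fisher z approximation. The observed correlation of 0.5 is a hypothetical value chosen for illustration, the function name is my own, and the approximation assumes roughly bivariate normal data; it is not a reanalysis of anything in the Hampshire paper.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)               # Fisher transformation of r
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical observed correlation of 0.5 (assumption for illustration)
low16, high16 = correlation_ci(0.5, 16)
low1000, high1000 = correlation_ci(0.5, 1000)

print("n = 16:   %.2f to %.2f" % (low16, high16))
print("n = 1000: %.2f to %.2f" % (low1000, high1000))
```

With 16 subjects the interval runs from just above zero to roughly 0.8, which is to say the data are compatible with almost any positive relationship. With 1000 subjects the interval is less than a tenth wide.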


In summary, the Hampshire et al. paper was never in the running for the Thompson Prize for Plain Statistics. If authors cannot even give me basic statistics on the demographics of their subjects, I will conclude they are not treating the reader with respect.

On a broader front, in reviewing papers for this blog I will stick to my usual rule of thumb: if I can’t understand the statistics, that means they are either extremely good or very bad. If they are extremely good, I seek further readings on statistics from the bright young sparks of the research world. If they are very bad, I simply say that I would not rely upon them, and make some gentle comments giving the reasons for my reservations.

Cold fusion: a technique for getting headlines out of a badly controlled test tube.
