For some time I have been arguing, in the company of those few persons polite enough to listen, that much of what is being passed off as neuroscience (the centre for Psychopathy/Intelligence/Love/Hate/Racism has been found in the brain) is not good science.
That point was made a year or so ago by Prof Tim Shallice in a book review. (Shallice has been looking at brain-behaviour links for at least 40 years, and I remember going to his talks on memory disorders at the National Hospital, Queen Square, in the late 1960s.) He pointed out that there were three problems: small sample sizes, inconsistent methods and measures, and a lack of theory against which to test findings.
Now the British Psychological Society’s Research Digest (http://bps-research-digest.blogspot.co.uk/) has a piece entitled “Serious power failure threatens the entire field of neuroscience”. The article reports that Katherine Button and Brian Nosek argue that typical sample sizes in neuroscience are woefully small, and that as a consequence the median statistical power of a neuroscience study is 21 per cent. In other words, a typical study has only about a one in five chance of detecting an effect that is really there, so the vast majority (around 79 per cent) of real effects in brain science are probably being missed. More worrying still, when underpowered studies do uncover a significant result, the lack of power increases the chances that the finding is spurious.
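The link between low power and spurious findings can be made concrete with a little arithmetic. The sketch below uses the standard positive predictive value formula, PPV = power × R / (power × R + alpha), where R is the pre-study odds that a tested effect is real. The value R = 0.25 (one real effect for every four nulls tested) and the conventional alpha of 0.05 are purely illustrative assumptions of mine, not figures from the article.

```python
def positive_predictive_value(power, prior_odds, alpha=0.05):
    """Share of 'significant' results that reflect a real effect.

    power      : probability of detecting a real effect
    prior_odds : assumed odds that a tested effect is real (R)
    alpha      : false positive rate of the significance test
    """
    true_positives = power * prior_odds
    return true_positives / (true_positives + alpha)


# Hypothetical pre-study odds: 1 real effect per 4 null effects tested.
ASSUMED_PRIOR_ODDS = 0.25

for label, power in [("median neuroscience study", 0.21),
                     ("conventional 80% target", 0.80)]:
    ppv = positive_predictive_value(power, ASSUMED_PRIOR_ODDS)
    print(f"{label}: power={power:.2f}, PPV={ppv:.2f}")
```

Under these assumed odds, roughly half of the significant findings at 21 per cent power would be false, against one in five at 80 per cent power. Different assumptions about R shift the numbers, but not the direction of the effect.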
Button's team also turned their attention specifically to the brain imaging field. Based on findings from 461 studies published between 2006 and 2009, they estimate that the median statistical power in the sub-discipline of brain volume abnormality research is just 8 per cent. Hence my title, because I am probably correct in assuming that if you and I read any neuroscience results, we are most likely to read those with pretty pictures of the brain, because a coloured MRI image is delightfully precise, and cannot be wrong.
Another paper by Chris Chambers and Petroc Sumner documented how 241 fMRI studies involved 223 unique statistical analysis strategies, which to me suggests data cooking for publication purposes. Furthermore, most brain imaging papers do not provide enough methodological information to permit replication.
However, there are still parts of my personal rant not covered. Whilst sample sizes must be large enough to confer statistical power, the samples themselves also need to be properly representative of the population. It is a legal requirement for an intelligence test to show that it is standardised at a national level, matched for age bands, with a balance between urban and rural dwellers, and with good racial representation. Given the sensitivity of racial differences in intelligence, it is now usual to “double sample” racial minorities. This means that if 200 subjects are required in the standardisation sample to represent the population of African Americans, then 400 representative persons will be recruited and tested, simply to increase the confidence levels regarding the results for that group.
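What double sampling buys can be sketched with the usual confidence interval for a mean. The example assumes a standard deviation of 15 points (the conventional IQ scale) and a normal approximation; both are my illustrative assumptions rather than details of any particular test's standardisation.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd, n, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / sqrt(n)


# Assumed standard deviation on the conventional IQ scale.
ASSUMED_SD = 15.0

for n in (200, 400):
    half_width = ci_half_width(ASSUMED_SD, n)
    print(f"n={n}: group mean known to within +/- {half_width:.2f} points")
```

Doubling the group from 200 to 400 narrows the interval by a factor of 1/√2, about 29 per cent, rather than halving it, which is why extra confidence is expensive to buy.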
For an MRI result on an individual to be valid, in the way that an IQ result has to be valid (or be open to legal challenge), we are going to need to scan a representative sample of something like 1400 people. Getting the full age range will be essential, given age-related changes in intelligence. There will also have to be agreement on the protocol for obtaining and analysing the results. In addition, a proportion will need to be tested twice, a few months apart, to establish the reliability of the scanner and the extent of natural changes in the brain.
Anyway, after reading with great approval the story in the Research Digest on small sample sizes, I looked at the sample sizes in the other stories reported in this month’s edition.
Exploiting children's social instincts to boost their learning: 55 children in 3 conditions (about 18 per condition), 39 children in 2 conditions (about 20 per condition). Far too small to support the aims of the study.
Female political role models have an empowering effect on women: only 82 women, and 4 conditions, so about 20 women per condition, rather small.
Anxiously attached people are ace at poker and lie detection: no sample sizes given initially, then 35 real poker players. This is better than psychology volunteers. It is too small really, though perhaps not so bad considering that it attempted to get out into the real world.
Nine-month-olds prefer looking at unattractive (read: normal) male bodies: A bold title, but no sample sizes are given in the Research Digest story, nor in the abstract of the actual paper. On inspection of the paper itself, this bold conclusion is based on 18 nine-month-olds. A rather small sample, I think.
Investigating the love lives of the men and women who have no sense of smell: 32 patients, but the condition only occurs in 1:7,500 so it is a pretty good sample size, given the rarity of the condition, and we should cherish any insights we can gain.
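To put rough numbers on "too small", here is a back-of-envelope power calculation for the studies above that report a number of conditions. It assumes a simple comparison between two conditions, equal group sizes, a two-sided test at alpha = 0.05, and a "medium" standardised effect of d = 0.5. Those are all my assumptions, not details taken from the papers, and the normal approximation used here slightly flatters small samples.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, effect_size, alpha=0.05):
    """Normal-approximation power of a two-sided, two-group comparison."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Expected z statistic when the true standardised effect is effect_size.
    shift = effect_size * sqrt(n_per_group / 2)
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Hypothetical "medium" effect size (Cohen's d); not taken from the papers.
ASSUMED_EFFECT = 0.5

# (label, total participants, number of conditions) as reported above.
studies = [
    ("children's learning, first sample", 55, 3),
    ("children's learning, second sample", 39, 2),
    ("female political role models", 82, 4),
]

for label, total, conditions in studies:
    n = total // conditions  # whole participants per condition
    power = approx_power(n, ASSUMED_EFFECT)
    print(f"{label}: about {n} per condition, power roughly {power:.2f}")
```

On these assumptions each of the three designs has only about a one in three chance of detecting a medium-sized effect. Reaching the conventional 80 per cent would take something like 64 participants per condition.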
Of course, psychologists never pay any attention to psychology, apart from Daniel Kahneman, who noticed that he knew small samples were unreliable but kept on using them anyway, and made a career out of explaining why.
Next time you see a pretty MRI picture of the brain, look at the sample size, the sample representativeness, the protocol and the statistical assumptions before believing a single pixel of it.