For some time I have been arguing, in the company of those few persons polite enough to listen, that much of what is passed off as neuroscience ("the centre for Psychopathy/Intelligence/Love/Hate/Racism has been found in the brain") is not good science.
That point was made a year or so ago by Prof Tim Shallice in a book review (Shallice has been looking at brain-behaviour links for at least 40 years, and I remember going to his talks on memory disorders at the National Hospital, Queen Square, in the late 1960s), in which he pointed out three problems: small sample sizes, inconsistent methods and measures, and a lack of theory against which to test findings.
Now the British Psychological Society's Research Digest http://bps-research-digest.blogspot.co.uk/ has a piece entitled "Serious power failure threatens the entire field of neuroscience". The article reports Katherine Button and Brian Nosek arguing that typical sample sizes in neuroscience are woefully small, and that as a consequence the median statistical power of a neuroscience study is 21 per cent. In other words, the typical study has only a 21 per cent chance of detecting a real effect it is looking for, so the vast majority (around 79 per cent) of real effects in brain science are probably being missed. More worrying still, when underpowered studies do uncover a significant result, the lack of power increases the chances that the finding is spurious.
Button's team also
turned their attention specifically to the brain imaging field. Based on
findings from 461 studies published between 2006 and 2009, they estimate that
the median statistical power in the sub-discipline of brain volume abnormality
research is just 8 per cent.
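To see what those power figures mean for the believability of published findings, here is a minimal sketch (my own illustration, not anything from Button's paper) of the standard positive-predictive-value argument: the lower the power, the larger the share of "significant" results that are false alarms.

```python
# Sketch: why low power makes "significant" findings less trustworthy.
# PPV = P(the effect is real | the test came out significant).
# The prior (share of tested hypotheses that are actually true) is an
# arbitrary illustrative choice, not an estimate from the literature.

def ppv(power, alpha=0.05, prior=0.2):
    true_positives = power * prior          # real effects correctly detected
    false_positives = alpha * (1 - prior)   # null effects wrongly "detected"
    return true_positives / (true_positives + false_positives)

for p in (0.80, 0.21, 0.08):  # conventional target, neuroscience median, brain-volume median
    print(f"power = {p:.2f}  ->  PPV = {ppv(p):.2f}")
# power = 0.80  ->  PPV = 0.80
# power = 0.21  ->  PPV = 0.51
# power = 0.08  ->  PPV = 0.29
```

On these illustrative assumptions, barely half of the significant findings in a field running at 21 per cent power are real, and fewer than a third at 8 per cent.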
Hence my title, because I am probably correct in assuming that if you and I read any neuroscience results, we are most likely to read those with pretty pictures of the brain: a coloured MRI image looks delightfully precise, and surely cannot be wrong.
Another paper, by Chris Chambers and Petroc Sumner, documented how 241 fMRI studies involved 223 unique statistical analysis strategies, which to me suggests data being cooked for publication purposes. Furthermore, most brain imaging papers do not provide enough methodological information to permit replication.
However, there are still parts of my personal rant not covered. Whilst sample sizes need to be large enough to confer statistical power, they also need to be genuinely representative of the population. An intelligence test is legally required to show that it has been standardised at a national level, matched for age bands, with a balance between urban and rural dwellers, and with good racial representation. Given the sensitivity of racial differences in intelligence, it is now usual to "double sample" racial minorities. This means that if 200 subjects are required in the standardisation sample to represent the population of African Americans, then 400 representative persons will be recruited and tested, simply to tighten the confidence intervals around the results for that group.
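A rough sketch of what that doubling buys (my own arithmetic on an illustrative IQ-style scale, not a prescribed standardisation procedure): the standard error of a subgroup mean shrinks with the square root of the sample size, so doubling the group narrows the confidence interval by a factor of about 1.4.

```python
# Sketch: precision gained by double sampling a subgroup (illustrative
# numbers only; IQ-style scale with standard deviation 15).
import math

sd = 15.0
for n in (200, 400):
    se = sd / math.sqrt(n)   # standard error of the subgroup mean
    ci = 1.96 * se           # half-width of a 95% confidence interval
    print(f"n = {n}:  SE = {se:.2f},  95% CI = mean +/- {ci:.2f} IQ points")
# n = 200:  SE = 1.06,  95% CI = mean +/- 2.08 IQ points
# n = 400:  SE = 0.75,  95% CI = mean +/- 1.47 IQ points
```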
For an MRI result on an individual to be valid, in the way that an IQ result has to be valid (or be open to legal challenge), we are going to need to scan a representative sample of something like 1,400 people. Getting the full age range will be essential, given age-related changes in intelligence. There will also have to be agreement on the protocol for obtaining and analysing the results, and a proportion of the sample will need to be scanned twice, a few months apart, simply to establish the reliability of the scanner and the extent of natural changes in the brain.
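A minimal sketch of how that retest might be analysed (the numbers below are invented purely for illustration; a real standardisation study would use far more subjects, and typically an intraclass correlation rather than a plain Pearson r):

```python
# Sketch: test-retest reliability of a scan-derived measure.
# Fabricated data in arbitrary units, purely for illustration.
import statistics

session1 = [132, 125, 141, 118, 129, 136, 123, 130]
session2 = [130, 127, 139, 121, 128, 133, 125, 129]  # same people, months later

r = statistics.correlation(session1, session2)  # Pearson r (Python 3.10+)
print(f"test-retest r = {r:.2f}")  # close to 1 = a stable measure; a low r
                                   # means scanner noise or real brain change
```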
Anyway, after reading with great approval the story in
the Research Digest on small sample sizes, I looked at the sample sizes in the
other stories reported in this month’s edition.
Exploiting children's social instincts to boost their learning: 55 children in 3 conditions (which works out at about 18 per condition), 39 children in 2 conditions (about 20 per condition). Far too small to support the aims of the study (see the power sketch after this list).
Female political role models have an empowering effect on women: only 82 women across 4 conditions, so about 20 women per condition, rather small.
Anxiously attached people are ace at poker and lie detection: no sample sizes given at first, then 35 real poker players, which at least is better than the usual psychology volunteers. Still too small really, though perhaps forgivable in a study that attempted to get out into the real world.
Nine-month-olds prefer looking at unattractive (read: normal) male bodies: a bold title, but no sample sizes are given in the Research Digest story, nor in the abstract of the actual paper; on inspection of the paper itself, the bold conclusion rests on 18 nine-month-olds. Somewhat of a small sample, I think.
Investigating the love lives of the men and women who have no sense of smell: 32 patients, but the condition occurs in only about 1 in 7,500 people, so it is a pretty good sample size given the rarity of the condition, and we should cherish any insights we can gain.
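For calibration, here is a rough power calculation (using statsmodels; the effect sizes and the 80 per cent power target are conventional illustrative choices, not figures from the studies above) showing how many subjects per group a simple two-group comparison needs:

```python
# Sketch: subjects needed per group for a two-sample t-test at the
# conventional 80% power and alpha = 0.05. Effect sizes are Cohen's d.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.8, 0.5, 0.2):  # large, medium, small effects
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80)
    print(f"d = {d}: about {n:.0f} subjects per group")
# d = 0.8: about 26 subjects per group
# d = 0.5: about 64 subjects per group
# d = 0.2: about 393 subjects per group
```

Twenty or so subjects per condition only reaches 80 per cent power if the true effect is large, which is rarely a safe bet.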
Of course, psychologists never pay any attention to psychology, apart from Daniel Kahneman, who noticed that although he knew small samples were unreliable he kept on using them, and made a career out of explaining why.
Next time you see a pretty MRI picture of the brain,
look at the sample size, the sample representativeness, the protocol and the
statistical assumptions before believing a single pixel of it.
Comments:

"Gee, it sure is a good thing that Hampshire et al., those paragons of scientific integrity, didn't do any of that..."

"Regarding 'psychologists never pay any attention to psychology': should 'psychology' be 'statistics'?"

Reply: Yes, psychologists don't pay attention to statistics for psychological reasons, Kahneman argued. They thought their study was above all that.