Saturday 11 January 2014

Genes, false positives and sample sizes

It is a simple rule of thumb in psychology that your sample size should be five times larger than the number of variables studied. Indeed, it is a minimal requirement, though one that is often ignored. A ratio of 5 to 1 gives you a chance of finding a signal amongst the noise, but noise will still get the upper hand all too often.

Problem is, this was not always apparent in the early stages of DNA analysis, which provided as many points of comparison as a drunken surveyor stumbling round Stonehenge at the summer solstice. A modern genome-wide analysis can generate nearly 700,000 single-nucleotide polymorphisms (SNPs), plus 1 million imputed SNPs. Those are big numbers, particularly when your sample size is 2,329 twelve-year-olds for whom DNA and genome-wide genotyping were available.

The South London Plomin gang have had to admit defeat in their attempt to name and celebrate the genes for receptive language. That receptive language (vocabulary, semantics, syntax, and pragmatics) at age 12 is highly heritable is not in doubt. After all, it is a significant component of intelligence, which is also highly heritable. In the current study, the authors attempted to identify some of the genes responsible for the heritability of receptive language ability using a genome-wide association approach. They found that no SNP association met the demanding criterion of genome-wide significance once they corrected for multiple testing across the genome (p < 5 × 10⁻⁸). Even the strongest SNP association did not replicate in an additional sample of 2,639 twelve-year-olds.
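Where does the 5 × 10⁻⁸ figure come from? It is conventionally justified as a Bonferroni correction: a 5% family-wise error rate divided by roughly one million independent common-variant tests across the genome. The Python sketch below does that arithmetic. The one-million figure is an assumption (the usual rationale, not a number from the paper). The sketch also shows how many chance "hits" an uncorrected p < .05 would throw up across 700,000 genotyped SNPs.

```python
# Assumption: ~1,000,000 independent common-variant tests across the genome,
# the usual rationale for the genome-wide threshold (not a figure from the paper).
FAMILY_WISE_ALPHA = 0.05        # accepted chance of even one false positive
INDEPENDENT_TESTS = 1_000_000   # assumed number of independent SNP tests
GENOTYPED_SNPS = 700_000        # approximate genotyped SNP count from the post

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cut-off that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

threshold = bonferroni_threshold(FAMILY_WISE_ALPHA, INDEPENDENT_TESTS)
print(f"Per-SNP significance threshold: {threshold:.0e}")

# With no correction, p < .05 across the genotyped SNPs would be expected to
# yield this many chance 'hits' even if no SNP had any real effect.
uncorrected_hits = GENOTYPED_SNPS * FAMILY_WISE_ALPHA
print(f"Expected chance hits at uncorrected p < .05: {uncorrected_hits:,.0f}")
```

Dividing by the number of tests is crude, because neighbouring SNPs are correlated. That is exactly why the convention uses an estimate of *independent* tests rather than the raw SNP count.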

So, various headlines present themselves: “Receptive language not genetic” seems a clear favourite, with “Geneticists at a loss to explain how we understand language” a close runner. Of course, this overlooks the difference between heritability estimates (which show the extent of the genetic effect without identifying the mechanism) and genomic analysis (which attempts to identify the underlying code).

Of even more interest to science researchers is the following, unremarked, cultural difference. When psychologists publish a finding, they are usually satisfied with describing what they have found in their particular sample. They leave replication to someone else. Geneticists, on the other hand, usually include an attempted replication in the same paper, generally shooting down the original findings “in the sample of discovery”. Perhaps most of psychology is based on false positives derived from over-enthusiastic application of multiple comparisons in “samples of discovery”.
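The arithmetic behind that worry is simple. Suppose every null hypothesis in a study is true and each test is run at p < .05. Then the chance of at least one spurious "finding" is 1 − 0.95^m for m independent tests, which is about 64% for 20 tests. Here is a rough Python sketch. The variable counts are purely illustrative, and it assumes independent tests. Pairwise correlations among the same variables are not strictly independent, so treat the figures as approximate.

```python
from math import comb

ALPHA = 0.05  # conventional per-test significance level

def chance_of_false_positive(n_tests: int, alpha: float = ALPHA) -> float:
    """P(at least one 'significant' result) when every null is true,
    assuming the tests are independent."""
    return 1 - (1 - alpha) ** n_tests

def expected_false_positives(n_tests: int, alpha: float = ALPHA) -> float:
    """Average number of spurious 'significant' results across n_tests."""
    return n_tests * alpha

# Hypothetical studies that correlate every variable with every other one.
for n_variables in (5, 10, 20):
    n_tests = comb(n_variables, 2)  # number of distinct pairwise correlations
    print(
        f"{n_variables} variables -> {n_tests} correlations: "
        f"P(any false positive) = {chance_of_false_positive(n_tests):.2f}, "
        f"expected false positives = {expected_false_positives(n_tests):.1f}"
    )
```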

Genome-Wide Association Study of Receptive Language Ability of 12-Year-Olds

Nicole Harlaar; Emma L. Meaburn; Marianna E. Hayiou-Thomas; Oliver S. P. Davis; Sophia Docherty; Ken B. Hanscombe; Claire M. A. Haworth; Thomas S. Price; Maciej Trzaskowski; Philip S. Dale; Robert Plomin

Journal of Speech, Language, and Hearing Research Newly Published on December 23, 2013. doi:10.1044/1092-4388(2013/12-0303)

History: Received 17 Sep 2012, Revised 18 Feb 2013, Accepted 22 Apr 2013

http://jslhr.pubs.asha.org/article.aspx?articleid=1809251

The authors conclude that individual differences in receptive language ability in the general population do not reflect common genetic variants that account for more than 3% of the phenotypic variance. (The limit is set by the sample size combined with the stringent multiple-comparison criterion: smaller effects could have been missed.) They admit that the search for genetic variants associated with language skill will require larger samples and additional methods to identify and functionally characterize the full spectrum of risk variants.
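Why can a sample of 2,329 only rule out effects of about this size? A rough power sketch in Python shows the pattern. It assumes each SNP is tested as a simple correlation with a normally distributed language score and uses the Fisher z approximation. The effect sizes are illustrative. This is not necessarily the power analysis the authors themselves ran.

```python
import math
from statistics import NormalDist

def gwas_power(r_squared: float, n: int, alpha: float = 5e-8) -> float:
    """Approximate two-sided power to detect one SNP explaining r_squared of
    phenotypic variance, treating the test as a correlation (Fisher z)."""
    std_normal = NormalDist()
    z_crit = -std_normal.inv_cdf(alpha / 2)        # critical value, about 5.45
    r = math.sqrt(r_squared)
    z_effect = math.atanh(r) * math.sqrt(n - 3)    # expected test statistic
    # Probability the observed statistic lands beyond either critical value.
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)

DISCOVERY_N = 2_329  # discovery sample size reported in the post

for share in (0.005, 0.01, 0.02, 0.03):
    power = gwas_power(share, DISCOVERY_N)
    print(f"SNP explaining {share:.1%} of variance: power = {power:.2f}")
```

Under these assumptions, a single SNP explaining 3% of the variance would almost certainly have reached p < 5 × 10⁻⁸, but power falls away steeply below that. Hence the call for larger samples.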

By now you will know my own opinion, which is that psychological research would benefit from collaborative projects which boost representativeness, increase sample size considerably, and utilize a core set of agreed psychological measures. The chance of that happening when the promotion system favours the number of publications is very low: as low as the chance of reliably finding the genes for something in a small sample.

2 comments:

  1. "So, various headlines present themselves: 'Receptive language not genetic' seems a clear favourite, with 'Geneticists at a loss to explain how we understand language' a close runner. Of course, this overlooks the difference between heritability estimates (which show the extent of the genetic effect without identifying the mechanism) and genomic analysis (which attempts to identify the underlying code)."

    Well, you know, I file that under the George Carlin doctrine: people are stupid. It doesn't hurt that there is a motivated segment of the population that is trying to make the case that heredity doesn't matter and hence latches on to such findings like a baby to his mom's nipple. ;)

  2. Entertaining comedian. Whatever the motivation, there is a confusion between heritability estimates and genomic variance accounted for, so I am plugging away at explaining the difference.
