Psychological comments: Comparisons are onerous N=1,000,000

Wednesday, 20 August 2014

Can you remember back in ancient history when school exam questions said: “Compare and contrast”? I found this philosophically interesting, in that I was tempted to compare and contrast the epistemological foundations of comparing and contrasting. More to the point, can you remember back to your undergraduate days when you learnt that each contrast and comparison used up some of your luck? I have put this in a dramatic and personal form to capture the dismay I felt when I understood that at least one of the positive t test results I had so painfully calculated was probably a fluke. I decided it was always the twentieth one which had led me astray, the early ones having first mover advantage in capturing the explanatory narrative, and becoming cherished for ever after, the first-born causes.

The problems of multiple contrasts arise in any even mildly complicated data set. Consider a test with 100 items in which you choose to compare each item with each other item in a t test (4,950 comparisons in all). Doing multiple comparisons will throw up many spurious results, and you won’t know which results are false positives and which are true.

Now consider a test with 1000 items. Multiple comparisons will create a large number of errors of identification. There are ways of correcting for these multiple comparisons and contrasts, but they are always something of a patch and fix. The better strategy is to increase sample size.
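To put rough numbers on the two cases above, here is a minimal sketch. It assumes a per-test significance level of 0.05 (the "one in twenty" convention) and that every null hypothesis is true. The family-wise figure further assumes independent tests, which pairwise comparisons among the same items are not, so treat it as an illustration only. The Bonferroni threshold is shown as one example of the "patch and fix" corrections; the function name is mine.

```python
from math import comb

ALPHA = 0.05  # assumed per-test significance level (one false alarm in twenty)


def multiple_comparison_summary(n_items, alpha=ALPHA):
    """Summarise the burden of testing every pair among n_items."""
    n_tests = comb(n_items, 2)  # number of distinct item pairs
    return {
        "n_tests": n_tests,
        # Expected false positives if every null hypothesis is true.
        "expected_false_positives": n_tests * alpha,
        # Chance of at least one false positive, ASSUMING independent tests.
        "familywise_error": 1 - (1 - alpha) ** n_tests,
        # Bonferroni correction: per-test threshold that caps the
        # family-wise error rate at alpha.
        "bonferroni_alpha": alpha / n_tests,
    }


if __name__ == "__main__":
    for n in (100, 1000):
        summary = multiple_comparison_summary(n)
        print(
            f"{n} items: {summary['n_tests']} tests, "
            f"about {summary['expected_false_positives']:.0f} false positives expected, "
            f"Bonferroni threshold {summary['bonferroni_alpha']:.2e}"
        )
```

With 100 items there are 4,950 pairs, so roughly 248 "significant" results would be expected by chance alone; with 1000 items there are 499,500 pairs and roughly 25,000 chance findings.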

The genome has a very large number of “scores” of interest, some more obvious to identify and measure than others. Deciding what is score and what is junk is not a trivial matter. Finding false positives is easy; finding true positives which replicate is much harder. James Lee from the University of Minnesota told me in 2009 that his preliminary estimate of the required sample size suggested that 100,000 was a likely starting point for dependable results, but that it could be higher. A few years is a long time in genomic analysis, but now Steve Hsu has been thinking about this, and has published his conclusions, naming James Lee as one of the researchers whose work has influenced him.

http://arxiv.org/pdf/1408.3421v1.pdf

I describe some unpublished results concerning the genetic architecture of height and cognitive ability, which suggest that roughly 10k moderately rare causal variants of mostly negative effect are responsible for normal population variation. Using results from Compressed Sensing (L1-penalized regression), I estimate the statistical power required to characterize both linear and nonlinear models for quantitative traits. The main unknown parameter s (sparsity) is the number of loci which account for the bulk of the genetic variation. The required sample size is of order 100s, or roughly a million in the case of cognitive ability.
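A note on the arithmetic in that abstract: "100s" is to be read as 100 times s, where s is the sparsity, not as "hundreds". The sketch below simply multiplies it out, using the abstract's figure of roughly 10k causal variants for s. The function name and the default factor of 100 are illustrative, and the result is an order-of-magnitude figure rather than an exact requirement.

```python
def required_sample_size(sparsity, factor=100):
    """Order-of-magnitude sample size: factor times the number of causal loci.

    factor=100 follows the abstract's "of order 100s"; it is a rough
    multiplier, not a precise constant.
    """
    return factor * sparsity


if __name__ == "__main__":
    s = 10_000  # roughly 10k causal variants, as stated in the abstract
    print(required_sample_size(s))  # prints 1000000
```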

The paper is attractive for covering the background to the genetics of intelligence in a clear and succinct format. Steve Hsu talks about the reduced cost of sequencing the genome, which is speeding up research; the heritability of intelligence; the Flynn effect; exceptional intelligence; and additive genetic models.

One might say that to first approximation, Biology = linear combinations of nonlinear gadgets, and most of the variation between individuals is in the (linear) way gadgets are combined, rather than in the realization of different gadgets in different individuals.

I like the word gadgets. That is the sort of genetics I understand. Alleles be damned.

Pairs of individuals who were both below average in stature or cognitive ability tended to have more SNP changes between them than pairs who were both above average. This result supports the assumption that the minor allele (–) tends to reduce the phenotype value. In a toy model with, e.g., p = 0.1, N = 10k, an individual with average phenotype would have 9k (+) variants and 1k (–) variants. A below average (-3 SD) person might instead have 1100 (–) variants, and an above average individual (+3 SD) 900 (–) variants. The typical SNP distance between genotypes with 1100 (–) variants is larger than that for genotypes with 900 (–) variants, as there are many places to place the (–) alleles in a list of 10k total causal variants. Two randomly chosen individuals will generally not overlap much in the positions of their (–) variants, so each additional (–) variant tends to increase the distance between them.
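The toy model in that passage can be checked with a small simulation. This is only a sketch under my own simplifying assumptions, not Hsu's calculation: each genotype is represented as the set of positions carrying a (–) variant, the positions are chosen uniformly at random among 10k sites, and the "SNP distance" is taken to be the number of sites at which two genotypes differ. Under those assumptions the expected distance between two genotypes with k (–) variants each is 2k(1 - k/N).

```python
import random

N_SITES = 10_000  # total causal variants in the toy model


def random_genotype(n_minus, rng, n_sites=N_SITES):
    """Positions of the (-) variants, chosen uniformly at random (an assumption)."""
    return frozenset(rng.sample(range(n_sites), n_minus))


def snp_distance(a, b):
    """Number of sites where exactly one of the two genotypes carries a (-) variant."""
    return len(a ^ b)


def mean_pair_distance(n_minus, n_pairs=200, seed=0):
    """Average distance over n_pairs independently drawn pairs of genotypes."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_pairs):
        total += snp_distance(
            random_genotype(n_minus, rng), random_genotype(n_minus, rng)
        )
    return total / n_pairs


def expected_distance(n_minus, n_sites=N_SITES):
    """Analytic expectation: 2k(1 - k/N), since the expected overlap is k*k/N."""
    return 2 * n_minus * (1 - n_minus / n_sites)


if __name__ == "__main__":
    for k in (900, 1000, 1100):
        print(k, expected_distance(k), mean_pair_distance(k))
```

The analytic values are 1638 differing sites for pairs with 900 (–) variants and 1958 for pairs with 1100, so the below-average pairs sit further apart, as the quoted passage says.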

The content of the basic calculation as to how much any species can be improved underlies the work of animal and plant breeders. As leading population geneticist James Crow of Wisconsin wrote [14]:

The most extensive selection experiment, at least the one that has continued for the longest time, is the selection for oil and protein content in maize (Dudley 2007). These experiments began near the end of the nineteenth century and still continue; there are now more than 100 generations of selection. Remarkably, selection for high oil content and similarly, but less strikingly, selection for high protein, continue to make progress. There seems to be no diminishing of selectable variance in the population. The effect of selection is enormous: the difference in oil content between the high and low selected strains is some 32 times the original standard deviation.

Hsu’s point is to show that as regards intelligence, humans have not reached their upper limit.

His section on compressed sensing is interesting, but I cannot judge it, so leave that to you, dear reader. However, Hsu is clear that a sample size of a million persons will be required. On the upside, that should lead to genetic predictions of IQ accurate to about 8 IQ points. It would also lead to parents being able to choose the brightest of their fertilized eggs. Interesting times.

From the purely scientific perspective, the elucidation of the genetic architecture of intelligence is a first step towards unlocking the secrets of the brain and, indeed, of what makes humans unique among all life on earth.

15 comments:

Anonymous, 21 August 2014 at 05:21
The content of the basic calculation as to how much any species can be improved underlies the work of animal and plant breeders.

This doesn't apply to humans or anything other than breeds. Although humans as a species are homogeneous compared to other species (chimps separated by a river are less alike than abos and swedes), domesticated plants and animals are even more homogeneous. And so are their environments.

h^2 is meaningless outside a narrow range of both genomes and environments. Within a narrow range it is just a linear approximation to the GxE to P surface.

Herr professor doktor Hsu is very confused.
Anonymous, 21 August 2014 at 05:29
It's been two years and I haven't gotten anything from these BGI folks.

Steve had a blog entry "History will remember their names." Meaning he had data. Soon afterward he posted on requiring 1 m people. Obviously, their data was useless.

It's the same old story. Not a single allele has passed the test of reproducibility when it comes to an effect on behavior.

If Hsu gets his million from one city or one developed country he may be able to explain 20-30% of the variance. If his sample is truly random he'll be able to explain none of it.

The following article explains why MZA studies are meaningless. Namely, "apart" is a bad variable. The right variable is "apart-ness", which can be quantified. Of course, MZT/DZT and GCTA studies are meaningless, because they make simplifying assumptions, despite what you may have heard.

http://www.nytimes.com/1981/03/01/books/nature-vs-nurture-a-natural-experiment.html
Unknown, 21 August 2014 at 09:45
Since 1981 there have been many other studies of twins, with much larger sample sizes, showing a general relationship between consanguinity and intellect. This takes in more general forms of relatedness, such as being somewhat related to a group, as opposed to totally unrelated. Also, estimates of the genetic similarity of humans in general (their homogeneity) depend on including every base pair and giving it equal weight. There is not a one-to-one relationship between alleles and behaviour. Finally, nothing has come out of the BGI study so far, but I would rather they took time to avoid false positives, and am willing to wait.
Anonymous, 21 August 2014 at 16:18
to anonymous:
biology DOES apply to humans.
do you think the brain is the only organ to which heredity does not apply?

also, humans have MORE genetic (& phenotypic) diversity than chimps.
in fact, the only mammal with more "diversity" than humans is/are dogs.
Anonymous, 21 August 2014 at 16:50
I read just the opposite. I read that humans are less diversified than other mammals and that any eugenic large-scale experiment would lead to a further reduction of our genetic diversity.

Santoculto
Anonymous, 22 August 2014 at 17:00
no, humans are more variable on most characteristics by many standard deviations. e.g., height variation - pygmies vs. Bantus or Maasai = about 6 standard deviations. humans are more variable on physical (& mental) characteristics than most species!
Anonymous, 23 August 2014 at 06:29
Twin and family studies are meaningless not only because adoptive homes have small environmental variance, but also because the environments of adopted siblings (twins or not) are correlated.

The bottom line of the balance sheet:

Psychologists and behavior genetics researchers especially are incompetent boobs and bores. They're dumb.