Wednesday 27 April 2016

Genetics of racial differences in intelligence


As far as I know, nobody is funding studies of the genetics of racial differences in intelligence. Although research is being carried out on the genetics of intelligence generally, and the genetics of different racial groups generally, for some reason nobody makes the link.

An exception is Davide Piffer, who as far back as 2014 suggested a possible approach: find any of the genetic variants associated with intelligence, however weak and inconsistent they may be, and then look up the published literature to see how frequent those variants are in any racial group. If there are many such positive variants in a group they will be bright, and if there are fewer such positive variants they will be less bright.
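The method can be sketched in a few lines. This is a toy illustration with invented allele frequencies and population IQs (not Piffer's data or code): each population's polygenic score is simply the mean frequency of the trait-increasing alleles, which is then correlated with population IQ.

```python
# Toy illustration of the polygenic-score method with invented numbers
# (these are NOT real allele frequencies or Piffer's data). For each
# population the score is the mean frequency of the trait-increasing
# alleles; the scores are then correlated with population IQ.

def mean(xs):
    return sum(xs) / len(xs)

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical frequencies of four positive-effect alleles in three populations.
freqs = {
    "PopA": [0.55, 0.62, 0.48, 0.70],
    "PopB": [0.50, 0.55, 0.44, 0.61],
    "PopC": [0.40, 0.47, 0.38, 0.52],
}
iq = {"PopA": 105, "PopB": 99, "PopC": 85}  # invented population IQs

pops = sorted(freqs)
scores = [mean(freqs[p]) for p in pops]
r = pearson_r(scores, [iq[p] for p in pops])
print({p: round(s, 3) for p, s in zip(pops, scores)}, round(r, 3))
```

With only three populations and four alleles the correlation is nearly guaranteed to be high; the real analyses below use 20-plus populations, which is what gives the reported r values some bite.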

Here is the first account I gave of Piffer’s work in 2014:

So far, this post has drawn no comments. However, it might turn out to be a significant step forwards.

The next account was in 2015 showing the pattern based on 9 GWAS hits:

Now, in the wake of the most recent publication by Davies et al. (2016), which I covered in my last post, Davide Piffer has taken the data from that very paper in order to extend his work on racial differences.

Recent polygenic selection for educational attainment

The genetic variants identified by two large GWAS of educational attainment were used to test a polygenic selection model.

Average frequencies of alleles with positive (Beta) effect on the phenotype (polygenic scores) were compared across populations and racial groups using data from 1000 Genomes and ALFRED. Strong correlations between polygenic scores and population IQ were found (r>0.8). Moreover, the polygenic score obtained from the two independent GWAS exhibited a strong correlation (r= 0.83), even after pruning for linkage disequilibrium.

Factor analysis revealed that most alleles loaded on a single factor, which in turn was strongly correlated to population IQ.

Polygenic and factor scores survived control for phylogenetic autocorrelation, although the latter's net effect on population IQ was stronger (Betas = 0.361 and 0.861, respectively).

Results obtained from ALFRED data were similar and revealed a peak in polygenic and factor scores among East Asians (60.8% and 1.06, respectively) and a nadir among Africans and Native Americans (44.1% and 0.493).

Geographic distance from Eastern Africa (assuming an origin of modern humans there) was only weakly predictive of factor and polygenic scores (r= 0.21-0.29).

The aim of this study is to replicate the findings of Piffer (2015, 2013) that educational attainment and cognition GWAS hits have different frequencies across populations and thus were subject to different selection pressures. To this end, the hits from the latest GWAS on educational attainment (Davies et al., 2016) will be used in the analysis. This GWAS was carried out using the UK Biobank sample (N = 100K+). Over a thousand SNPs reached genome-wide significance (P < 5 × 10⁻⁸), but after controlling for linkage disequilibrium (genotypes were LD-pruned using clumping to obtain SNPs in linkage equilibrium, with an r² < 0.25 within a 200 bp window), only a few independent signals remained (Davies et al., 2016).
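The clumping step can be illustrated with a toy greedy version. This sketch is my own, with invented SNPs and LD values; it is not the pipeline Davies et al. used (which works on genotype data with dedicated tools), but it shows the logic: keep the most significant SNP, then discard any nearby SNP in strong LD with one already kept.

```python
# Toy greedy LD clumping: keep the most significant SNP, then drop SNPs
# within the window whose LD (r^2) with a kept SNP exceeds the threshold.
# SNP names, positions, p-values and LD values here are all invented.

def clump(snps, ld_r2, window, r2_max=0.25):
    """snps: list of (name, position, p_value); ld_r2: dict of pair -> r^2."""
    kept = []
    for name, pos, p in sorted(snps, key=lambda s: s[2]):  # most significant first
        ok = True
        for kname, kpos in kept:
            close = abs(pos - kpos) <= window
            r2 = ld_r2.get(frozenset((name, kname)), 0.0)
            if close and r2 >= r2_max:
                ok = False  # redundant with an already-kept signal
                break
        if ok:
            kept.append((name, pos))
    return [n for n, _ in kept]

snps = [("rsA", 100, 1e-12), ("rsB", 150, 1e-9), ("rsC", 5000, 1e-10)]
ld = {frozenset(("rsA", "rsB")): 0.9}  # rsB is in strong LD with rsA
print(clump(snps, ld, window=200))
```

Here rsB is dropped because it sits within the window and is in strong LD with rsA, leaving two independent signals; this is why a thousand genome-wide significant SNPs collapse to a handful of independent hits.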

The boxplot below shows the major continental groups as derived from the 1000 Genomes data.

[Boxplot of scores by 1000 Genomes population for four measures: Education attainment P.S., I.S. (Davies et al., 2016; N = 14); All_Ed_Att_2016 (N = 942); PS All Ind. (N = 16); Factor All Ind. Populations shown: US Blacks; Bengali, Bangladesh; Chinese Dai; Utah Whites; Chinese, Beijing; Chinese, South; Esan, Nigeria; British, GB; Gujarati Indian, Texas; Iberian, Spain; Indian Telugu, UK; Luhya, Kenya; Mende, Sierra Leone; Mexican in L.A.; Peruvian, Lima; Punjabi, Pakistan; Puerto Rican; Sri Lankan, UK; Toscani, Italy; Yoruba, Nigeria.]
The analysis of independent signals from two different GWAS revealed a significant overlap across the two genomic datasets. Using ALFRED and 1000 Genomes, the Rietveld et al. (2013) and Davies et al. (2016) polygenic scores were strongly correlated (r = 0.62 and 0.83, respectively). Both sets of GWAS hits were strong predictors of population IQ. The polygenic score (N = 14) computed from the new independent hits (Davies et al., 2016) had a strong correlation with population IQ (r = 0.82). A similar correlation was observed for the polygenic score created by combining all the independent hits (free of LD) from the two publications (N = 16): r = 0.843 with population IQ.

Factor analysis produced a factor that even more strongly correlated to population IQ (r= 0.89) and survived control for spatial autocorrelation. Indeed, the predictive value of this factor was not affected by partialling out Fst distances. The high Beta value (B=0.82) and the null effect of Fst distances (B= -0.16) are suggestive of polygenic selection on these SNPs, independent of noise due to migrations or drift.
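"Partialling out" genetic distance amounts to a partial correlation. The sketch below uses the standard first-order partial correlation formula with invented zero-order correlations, not the Fst-based values from the paper:

```python
# First-order partial correlation: the correlation between polygenic
# score (x) and population IQ (y) after removing the part each shares
# with a third variable z (e.g. a genetic-distance measure).
# The three zero-order correlations below are invented for illustration.

import math

def partial_r(r_xy, r_xz, r_yz):
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

r = partial_r(0.89, 0.30, 0.25)
print(round(r, 3))
```

If the score-IQ correlation barely moves when distance is partialled out, as in this example, the association is not simply a by-product of overall genetic relatedness between populations, which is the point the Fst control is meant to establish.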

Comparisons of mean frequencies across racial groups via one-way ANOVA produced either non-significant or marginally significant results; the addition of new GWAS hits is needed to provide a definitive picture.

A limitation of this study is the reliance on GWAS hits for a complex phenotype such as educational attainment, which shares the majority of additive genetic variation with general intelligence, but also other personality and health-related traits (Krapohl et al., 2014 and 2015).

Another more obvious limitation is the small number of (independent) SNPs used for this analysis. More GWAS of intelligence or educational attainment are needed to shed light on worldwide patterns of polygenic selection on cognitive abilities.

As the author says, this can only be considered a first step. However, the method has the merit of simplicity: if some variations in the genetic code are associated with intelligence, then groups that have more of those variations ought to be more intelligent. If they are not, then the link between these variants and intelligence can be called into question. Of course, it is possible that these are not the most important variants, and that they differ between racial groups for trivial reasons. If so, then the observed associations are an unusual coincidence. I think this is a method to watch. When even more genetic signals of intelligence are identified, however weak and tentative, this approach can be put to the test, and then improved or discarded.



Monday 25 April 2016

Genetics of mental ability: greater power

I distinctly remember hearing from a colleague in July 2011 that earlier that afternoon Ian Deary had presented a paper in London which claimed that 1% of intelligence could be explained by genomic analysis. Although both he and I were excited by this result, the general reaction was one of respectful scepticism. A proven link between genes and intelligence had never been achieved before, so the result was a 100% improvement on all that had gone before. Welcome as it was, it seemed too good to be true, and if confirmed, worthy of a prize.

Davies et al. (2011) Genome-wide association studies establish that human intelligence is highly heritable and polygenic.

General intelligence is an important human quantitative trait that accounts for much of the variation in diverse cognitive abilities. Individual differences in intelligence are strongly associated with many important life outcomes, including educational and occupational attainments, income, health and lifespan. Data from twin and family studies are consistent with a high heritability of intelligence, but this inference has been controversial. We conducted a genome-wide analysis of 3511 unrelated adults with data on 549 692 single nucleotide polymorphisms (SNPs) and detailed phenotypes on cognitive traits. We estimate that 40% of the variation in crystallized-type intelligence and 51% of the variation in fluid-type intelligence between individuals is accounted for by linkage disequilibrium between genotyped common SNP markers and unknown causal variants. These estimates provide lower bounds for the narrow-sense heritability of the traits. We partitioned genetic variation on individual chromosomes and found that, on average, longer chromosomes explain more variation. Finally, using just SNP data we predicted ∼1% of the variance of crystallized and fluid cognitive phenotypes in an independent sample (P=0.009 and 0.028, respectively). Our results unequivocally confirm that a substantial proportion of individual differences in human intelligence is due to genetic variation, and are consistent with many genes of small effects underlying the additive genetic influences on intelligence.

These gene hunters had gone many steps better than most psychologists. Psychology gets published with sample sizes of about 80, but gene hunters habitually report on 100,000 people. Even more important, instead of just reporting their results on the main sample, clocking up a publication, and then leaving replication to others while they bask in glory, they always take the sobering step of moving from the “sample of discovery” to the “sample of testing”. Two papers for the price of one: a harsh reality check, and best practice.

So, as regards the 2011 paper, the conventional way of reporting it in psychology would have been “genetics explains 40% to 51% of intelligence”. It is only when one moves from the sample of discovery to the sample of testing that it turns out that 1% can be explained in new samples, which is the acid test of having found a real relationship in nature.

What does the picture look like almost 5 years later?

Genome-wide association study of cognitive functions and educational attainment in UK Biobank (N=112151)

G Davies, RE Marioni, DC Liewald, WD Hill, SP Hagenaars, SE Harris, SJ Ritchie, M Luciano, C Fawns-Ritchie, D Lyall, B Cullen, SR Cox, C Hayward, DJ Porteous, J Evans, AM McIntosh, J Gallacher, N Craddock, JP Pell, DJ Smith, CR Gale and IJ Deary. Molecular Psychiatry (2016) 1-10.

People’s differences in cognitive functions are partly heritable and are associated with important life outcomes. Previous genome-wide association (GWA) studies of cognitive functions have found evidence for polygenic effects yet, to date, there are few replicated genetic associations. Here we use data from the UK Biobank sample to investigate the genetic contributions to variation in tests of three cognitive functions and in educational attainment. GWA analyses were performed for verbal–numerical reasoning (N= 36 035), memory (N= 112 067), reaction time (N= 111 483) and for the attainment of a college or a university degree (N= 111 114). We report genome-wide significant single-nucleotide polymorphism (SNP)-based associations in 20 genomic regions, and significant gene-based findings in 46 regions. These include findings in the ATXN2, CYP2D6, APBA1 and CADM2 genes. We report replication of these hits in published GWA studies of cognitive function, educational attainment and childhood intelligence. There is also replication, in UK Biobank, of SNP hits reported previously in GWA studies of educational attainment and cognitive function. GCTA-GREML analyses, using common SNPs (minor allele frequency 0.01), indicated significant SNP-based heritabilities of 31% (s.e.m. = 1.8%) for verbal–numerical reasoning, 5% (s.e.m. = 0.6%) for memory, 11% (s.e.m. = 0.6%) for reaction time and 21% (s.e.m. = 0.6%) for educational attainment. Polygenic score analyses indicate that up to 5% of the variance in cognitive test scores can be predicted in an independent cohort. The genomic regions identified include several novel loci, some of which have been associated with intracranial volume, neurodegeneration, Alzheimer’s disease and schizophrenia.

Molecular Psychiatry

advance online publication, 5 April 2016; doi:10.1038/mp.2016.45


Cognitive functions have important roles in human mental and physical well-being. Better cognitive function in youth is associated with lower risk of some psychiatric disorders and physical illness later in the life course, and with reduced mortality risk. The reverse is also true; some mental and physical illnesses are associated with a lowering of cognitive capabilities in youth and over the life course. Higher cognitive ability in youth is associated also with higher educational attainment and adult social position. Domains of cognitive functioning differ in their associations with ageing; some have trajectories of decline (for example, processing speed and some types of memory), whereas others (for example, knowledge-based tests) hold their levels better over the adult life course. Therefore, it is important to understand the causes of people’s differences in cognitive functions.

One source of cognitive differences is genetic variation. Cognitive functions have a substantial heritability. This has been found by using twin and family studies, and by molecular genetic methods, such as Genome-wide Complex Trait Analysis (GCTA-GREML), which estimates heritability based on common single-nucleotide polymorphisms (SNPs).

Some explanation is required regarding cognitive phenotypes. All tests of cognitive ability correlate positively, though not perfectly; that is, people who do well on one type of cognitive test tend to do well on the others. It is this regularity that is the basis for the construct of general cognitive ability, which is usually abbreviated to g. There are also separable domains of cognitive functioning. Differences in individual cognitive test score performances may be due to: (1) differences in general cognitive function described by the variance shared by all cognitive domains; (2) differences in test performance specific to a cognitive domain; and (3) differences specific to a particular test.

Twin and SNP-based GCTA-GREML studies have found that there is substantial heritability for general cognitive function, and also some heritability for cognitive domains and specific cognitive skills. They also find that there are significant genetic correlations among tests of different cognitive domains, and between cognitive abilities and education, which also shows substantial heritability.

Genome-wide association studies (GWAS) of cognitive functions have been successful in estimating SNP-based heritability, and in using summary GWAS data to make predictions of cognitive phenotypes in independent samples. However, they have been less successful in identifying the specific genetic variants that cause cognitive differences. The largest studies to date have been the CHARGE-Cognitive Working Groups studies and those on educational phenotypes by the Social Science Genetics Association Consortium. In a study of 53 949 individuals with data on general cognitive function, there were three genome-wide significant hits in three genomic regions, with the closest genes being APOE/TOMM40, AKAP6 and MIR2113.

In a study of 32 070 individuals with data on processing speed (mostly digit-symbol substitution-type tests) there was one genome-wide significant hit, near CADM2. In a study of 29 076 individuals with data on verbal declarative memory there were three genome-wide significant hits, near APOE and genes associated with immune response.

The present study directly addresses the limitations of previous molecular genetic studies of cognitive functions. It presents genome-wide association analyses of reasoning, processing speed, declarative memory, and educational attainment in the UK Biobank sample. The number of subjects is over 100 000 for most analyses. All participants took the same cognitive tests with the same instructions. All participants included in the current analysis were of white British ancestry. Genotyping was also standardised across the same arrays and QC procedures. The study addresses three important cognitive domains and educational attainment in a single report. These advantages are likely contributors to the relative success in finding many new genetic variants associated with cognitive functions.

Cognitive assessment. Verbal–numerical reasoning. Verbal–numerical reasoning was measured using a 13-item test presented on a touchscreen computer. The test included six verbal and seven numerical questions, all with multiple-choice answers, and had a time limit of two minutes in total. An example verbal item is: ‘If Truda’s mother’s brother is Tim’s sister’s father, what relation is Truda to Tim?’ (possible answers: ‘aunt/sister/niece/cousin/no relation/do not know/prefer not to answer’). An example numerical item is: ‘If 60 is more than half of 75, multiply 23 by 3. If not subtract 15 from 85’ (possible answers: ‘68/69/70/71/72/do not know/prefer not to answer’). The verbal–numerical reasoning score was the total score out of 13. The Cronbach α-coefficient for the 13 items was 0.62.

Here is the variance explained for each of the main cognitive variables:

[Table: GWAS and variance explained]

To my eye the memory test isn’t reliable, and only the verbal-numerical intelligence test is up to standard.

The most important novel contribution of the present study is the discovery of many new genome-wide significant genetic variants associated with reasoning ability, cognitive processing speed and the attainment of a college or university degree. The study provided robust estimates of the SNP-based heritability of the four cognitive variables and their genetic correlations. The study makes important steps toward genetic consilience, because several of the genomic regions identified by the present analyses have previously been associated in GWASs of general cognitive function, executive function, educational attainment, intracranial volume, neurodegenerative disorders and Alzheimer’s disease. The study was successful in using the GWAS results from UK Biobank to predict cognitive variation in new samples.

The SNP-based estimate of heritability for verbal–numerical reasoning (31%) was highly consistent with previous estimates based on a general cognitive ability phenotype that had been composed using three or more diverse cognitive tests.

Using the summary GWAS data from the present study to predict cognitive variation in independent samples (Supplementary Table S4) produced the largest R2 values in this field to date, with sometimes over 5% of variance explained, especially in the more crystallized cognitive functions such as vocabulary. Previously, values of 1 to 2% have been typical.

First, general cognitive ability, or strong indicators of it, tend to be more heritable than specific cognitive functions such as processing speed and memory. Second, tests of verbal ability and reasoning are among those tests that have higher loadings on the latent trait of general cognitive ability, and tests of memory and processing speed have lower loadings. Third, the RT and memory tests in UK Biobank were handicapped further by being very brief. The RT test included a far smaller number of trials than is typical for large surveys in the UK, which have used 40 trials in choice RT procedures. The memory test was based on the recall of a single 12-item matrix with six pairs of stimuli. This is both a brief and unusual type of test in the field of declarative memory; more is known about the psychometric characteristics and genetic foundations of declarative memory tests such as word list and paragraph recall. The test–retest correlation of the memory variable was particularly low (r = 0.15).

This accumulating evidence is consistent with the interpretation that, to some extent, educational attainments are a product of genetic contributions to cognitive ability, but with two emphatic qualifications. First, it is obvious that there are other—especially social—determinants of whether or not people achieve certain educational outcomes. Second, there is evidence that the variation in educational attainments that is caused by genetic differences is shared with traits other than intelligence, such as personality dimensions. Therefore, we predict that not all of the genome-wide significant hits associated with the attainment of a college or university degree in the present study will be associated with cognitive differences; some might be associated with personality and other heritable, education-relevant traits.



This is a strange Table to look at from the point of view of a clinical psychologist. Many of these tests are very familiar to me, and are part of the bread and butter of ability testing, but here are data I never expected to see, showing specific genome-ability links. It is worthy of its own T shirt.

This is a very important paper. It has doubled the amount of variance in mental ability that can be explained by analysis of the genome. Of course we hope for even greater power. UK Biobank data for 500,000 persons will soon become available, which will allow the detection of genetic signals with higher precision.

Although the fact that mental ability is heritable may seem news to some people today, heritability was generally understood by any farmworker in the 19th Century. They knew that, broadly speaking, characteristics in animals and people were inherited, even though they did not know at the genomic level precisely how that was achieved. As people drifted off the farm that natural observation was lost, and as schooling became more available many correctly concluded that education was important, and some incorrectly concluded that all differences between one person and another could be annulled by administering even more education. My experience of psychology was that nurture was given more attention than nature. Mainstream psychology recognised twin studies, but there was an implication that the findings applied to twins, and there weren't many of them, so the implications were minor. It was always slightly surprising when genetics was suggested as a cause of human differences. The late psychometrician Prof Paul Kline gave a lecture at the University of Exeter in which, with characteristic brio, he announced that almost all winners of the Grand National were descended from a single horse, which he named. I am not a gambling man, so I did not record the name for betting purposes. A lamentable error. Bloodlines have their uses, and I am betting on UK Biobank coming up a winner again and again.




Thursday 21 April 2016

Estonia, the abstract

For those of us with refined, and possibly even nerdish, tastes, Estonia is in the news. Not one but now two papers have been NIT-picking through the fine detail of the items used in the national intelligence test, and coming to conclusions about the contribution to the rise in intelligence test scores made by the capacity, or strategy, of considering higher levels of abstraction. One might even begin to posit a dimension: as children develop and cultures evolve, they move from specific instances to general principles. Perhaps the same is true of nations. They arise out of sheer necessity, a wall defensive against the outside world, where protection is offered at the cost of homage and tribute; then flourish as a sceptr’d isle which breeds a happy band of citizens who develop refined abstract thoughts; and then collapse when challenged by other tribes driven by sheer necessity. The rise and fall of civilisations may not be due to climate, illness, weapons, harvests, religion or politics, but simply to the inevitable drift from concentrating on pressing instances to formulating general abstractions.

Olev Must ⁎, Aasa Must, Jaan Mikk

Predicting the Flynn Effect through word abstractness: Results from the National Intelligence Tests support Flynn's explanation. Intelligence 57 (2016) 7–14

The current study investigates the Flynn Effect (FE) and its relation to abstract thinking ability. We compare two cohorts of Estonian students (1933/36, n = 888; 2006, n = 912) using the Concepts (Logical Selection) subtest of the Estonian adaptation of the National Intelligence Tests (NIT). The item presentation order of the subtest correlates with the abstractness of the words used in its items (r = .609). The different test results (right, wrong and missing answers) were analysed in order to make an estimate of the FE magnitude. The FE for abstract thinking ability of those samples was 1.06 Hedges' g (adjusted for guessing). The magnitude of the FE is dependent upon the degree of difficulty of the items (an item's difficulty is estimated by determining its abstractness and its familiarity to students). The more difficult part of the subtest (the second half) showed a FE = 1.80, whereas the easier part (the first half) showed a FE = .72. Word abstractness was a strong predictor of all the testing results in both cohorts (Beta = .700). The familiarity of words used in the test items has no correlation with the test results if word abstractness is controlled in both cohorts. Our findings support Flynn's explanation that the FE is primarily an indicator of the rise in abstract thinking ability.

The essence of this paper is the close analysis of item responses, which is as close to the real data as is possible to get. (It would be good to have concurrent fMRI brain scans). Jim Flynn proposed that a large part of the Flynn effect was an increase in abstract thinking. In olden times, a candidate asked to say what was similar about a man and a dog would reply “A man hunts with a dog” and might possibly from that say that both were hunters. In recent times candidates would be more likely to understand that both were animals.

The older sample (1933/36; N = 888) consists of students from grades 4 to 6, whose mean age was 13.3 (SD = 1.24) years. The more recent sample (2006, N = 912) consists of students from grades 6 to 8 with a mean age of 13.5 (SD = .93) years.
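The paper reports the Flynn effect in Hedges' g, the standardized mean difference with a small-sample bias correction. A minimal sketch follows, with invented cohort means and SDs (the paper's own estimate also adjusts for guessing, which is not modelled here):

```python
# Hedges' g: the difference between two group means divided by the
# pooled SD, times a small-sample bias correction. The cohort means
# and SDs below are invented for illustration.

import math

def hedges_g(m1, s1, n1, m2, s2, n2):
    # pooled SD -> Cohen's d -> small-sample correction factor
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    return d * (1 - 3 / (4 * (n1 + n2) - 9))

# hypothetical subtest means: 2006 cohort vs 1933/36 cohort
g = hedges_g(16.0, 4.0, 912, 12.0, 3.8, 888)
print(round(g, 3))
```

With samples this large the correction factor is nearly 1, so g is essentially Cohen's d; the correction matters mainly for small studies.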

Looking at the test words they measured word familiarity, word abstractness, and the responses, including the missing responses to items. I am rushing ahead, but since this paper deals with the same data as described in the previous post, I am going straight to the results.

[Figure: Abstractness and familiarity]


The data set has been very well studied. Incidentally, it supports Chris Brand’s hypothesis that part of the Flynn effect is that modern kids are willing to guess, whereas students in the past were more cautious, well behaved and even deferential. Modern kids are willing to chance their arm, and thus pick up some extra points (unless the marking system specifically penalises wrong answers). In a nutshell, modern students will make good psychologists but bad engineers. Never let children guess things that might fall down and crush them.

Over 7 decades the scores on each item of a test of abstractness have risen from 0.48 to 0.6, which is a sizeable improvement.

The other finding is that intelligence testing is a game of two halves.

The first half (12 items) makes use of words that have low levels of abstractness and are simple and familiar to students. The second half of the subtest (also 12 items) uses less familiar and more abstract words. In the NIT data the highest magnitudes of the FE are evident in the more difficult part of the A3, although the probability of guessing is also higher. If we were to use the previous algorithm to calculate the FE, then in the more difficult section it would be FE = 2 * .85 + .61 − .51 = 1.80. The data does not support the hypothesis that the younger cohort guessed more often than the older one. In the easier part of the A3, the FE = 2 * .16 + .09 + .31 = .72. It can be concluded that the FE may be more than 2 times greater (1.80) for items where the words are abstract and less familiar, than it is for items that are less abstract and easier to understand (.72).

Interesting that this comparison of the first half against the second half makes the point about the rise being due to increasing abstractness so succinctly.

The abstractness of the items strongly predicts the answer patterns of the A3 (the abstractness test). The regression analysis shown in Table 2 indicates a clear result pattern. If the presentation order of the items in the subtest is not taken into account, then it is word abstractness that becomes the main predictor of test scores for the A3. Right, wrong and missing answers can be predicted through word abstractness. The Beta of word abstractness ranged from −.682 (predicting 2 right answers in the 2006 cohort) to .775 (predicting 1 wrong answer in the 2006 cohort). It is easier to receive more points from easier items, and subsequently the test-takers more frequently received most of their points from the less-abstract items. This is the reason for the minus sign. This negative relationship in predicting 2 right answers was clear in the 1933/36 data as well as in the 2006 data (Beta accordingly −.638 and −.682). The situation changes, however, when attempting to predict the one-point score.

Altogether: the 2006 cohort was clearly more able to solve more subtest items (according to the 2-point criteria), while at the same time they took more risks with the highly abstract items. This strategy gave them an advantage over the older cohort.

Our results are in accordance with Flynn's explanation that the Flynn Effect is a demonstration of the rise in abstract thinking ability over time. The results also accord with Armstrong et al.'s (2016) empirical description of the abstractness vector of the NIT. The rise of abstract thinking also corresponds with Flynn's explanation that the teaching process at school develops this ability. It should be kept in mind that in the present study the younger cohort had been educated for 2 years more than the older. Thus it can be concluded that the A3 was relatively easy for the younger cohort, which is why the estimation for guessing in test-taking behaviour is relatively low.

Our findings also support Vygotsky's (1934/1987) theory of cultural–historical development, wherein Vygotsky made a distinction between everyday (spontaneous) concepts that emerge from everyday experience, and scientific concepts that are taught in school. The acquisition of scientific concepts corresponds to the rise in higher-level thinking and facilitates the development of abstract thinking abilities. According to Vygotsky's theory, child development is the acquisition of scientific concepts and the concomitant adoption of an abstract decontextualized systematic way of thinking. In this sense, the two additional years of schooling could have offered a significant advantage in abstract thinking ability for the 2006 cohort in comparison with the 1933/36 cohort.

Please read this paper together with the Armstrong paper I commented on in my previous post.

Final point: never throw away old data. Make sure the results are stored in a dry basement and carefully dusted at least once a decade.

Monday 11 April 2016

Instantiation and abstraction

What makes problems difficult? Indeed, what ever made events turn into problems? Usually, I assume, specific instances had to be confronted and dealt with: an approaching predator, an escaping prey, an edible nut that had to be opened. In all these instances a solution is required to a pressing problem. In time, some general principles may be discerned: perhaps those were discussed in campfire stories, or formed the preparation rituals of hunter-gatherers. Perhaps, more likely, people made them up as they went along.

Intelligence test items are very specific instances of problems. They are chosen to be unfamiliar, so that the test is always a real test, and not the exercise of specifically trained skills. Tests have to be kept secret, and defended from cheats. Tests only test problem-solving when the correct solutions are not known to the test taker. Tests must not only be unfamiliar, but preferably easily explained by using familiar concepts, ones known to almost everybody in that culture, or in any culture. Things can get bigger and smaller, go in front of or behind other objects, increase or reduce in number: that sort of thing. The mental habits of our species, as depicted on pottery, funerary objects, sculpture, buildings, jewellery, ornaments and dress. The all of it, as they say in Oxfordshire.

So, when a specific problem arises one examines the instance, and then attempts a solution. Is it helpful to be able to abstract general principles? Does abstraction assist, or is it better to concentrate on the individual task?

Into this hall of mirrors steps an international gang to bring us some jewels from Estonia, a Finnic country with high income and living standards, which has the additional benefit of having done proper intelligence testing in 1933, and has the results item by item. These can then be compared with the data for 2006, item by item. And what a gang: they come from the US, Korea, Brazil, Germany, Belgium and of course Estonia.

Elijah L. Armstrong, Jan te Nijenhuis, Michael A. Woodley of Menie, Heitor B. F. Fernandes, Olev Must, Aasa Must. A NIT-picking analysis: Abstractness dependence of subtests correlated to their Flynn effect magnitudes. Intelligence 57 (2016)

We examine the association between the strength of the Flynn effect in Estonia and highly convergent panel ratings of the ‘abstractness’ of nine subtests on the National Intelligence Test, in order to test the theory that the Flynn effect results in part from an increase in the use of abstract reference frames in solving cognitive problems. The vectors of abstractness ratings and Flynn effect gains (controlled for guessing) exhibit a near zero correlation (r = −.02); however, abstractness correlates positively with (and is therefore confounded by) g-loadings (r = .61). A General Linear Model is used to determine the degree to which the abstractness vector predicts the Flynn effect vector, independently of subtest g-loadings and the portion of the secular IQ gain due to guessing (the Brand effect). Consistent with the abstract reasoning model of the Flynn effect, abstractness positively predicts Flynn effect magnitudes, once controlled for confounds (sr = .44), which indicates an increasing tendency to utilize factors external to the items in order to abstract their solutions.

Flynn effects were derived from the difference in scores between 1933/36 and 2006 administrations of the National Intelligence Test to samples of Estonian schoolchildren (N = 890 for the older sample, 913 for the more recent sample). The Method of Correlated Vectors (MCV) was utilized to determine the effect of abstractness on the Flynn effect independent of both subtest g loadings and the Brand effect — or the portion of the secular gain in IQ that is due purely to the results of guessing.
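The Method of Correlated Vectors boils down to treating each subtest as one data point and correlating per-subtest vectors with each other. A minimal sketch, using made-up numbers for nine hypothetical subtests (not the paper's data):

```python
# Sketch of the Method of Correlated Vectors (MCV): each subtest
# contributes one entry per vector, and MCV correlates the vectors
# (e.g. Flynn gains against rated abstractness). Numbers are invented.
import statistics

def pearson(x, y):
    """Plain Pearson correlation between two equal-length vectors."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Illustrative vectors, one value per subtest (assumed, not from the paper):
abstractness = [76, 38, 35, 30, 28, 27, 25, 24, 22]            # rated 0-100
flynn_gain   = [0.9, 0.3, 0.2, 0.5, 0.1, 0.6, 0.4, 0.2, 0.3]   # SD units
g_loading    = [0.8, 0.7, 0.7, 0.5, 0.6, 0.4, 0.5, 0.6, 0.5]

print(round(pearson(abstractness, flynn_gain), 2))
print(round(pearson(abstractness, g_loading), 2))
```

With only nine data points, any such correlation is unstable, which is why the authors lean on effect sizes and directionality rather than significance tests.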

Their method has been to get raters to assess items for abstractness.

The 28 raters used to obtain the abstract thinking dependencies for each subtest were classified into the following categories: non-professionals (without degrees in psychology), graduate students, or professionals (N = 10 non-professionals, 5 graduate students, 13 professionals). Each rater rated the abstract thinking dependency of each subtest on a scale from 0–100, using a text vignette defining abstract thinking (Supplement 1) as a rating criterion. The text gave examples of three hypothetical test items heavily dependent on abstract thinking; one was drawn from Luria (1976), one from Flynn (2009), and one from Flynn (2012) in a discussion of Fox and Mitchum (2013). The raters used Form 2 of the British National Intelligence Test to rate the abstract thinking dependence of each subtest.

Here are the main results in one table:

Abstractness and Flynn effect

To nit-pick, they should have listed these by level of abstractness for ease of reading, but Analogies (76) is almost twice as abstract as all the other tests. Synonym–Antonym comes next (38), then Vocabulary (35). The most concrete test is Comparisons (22). This provides a useful metric with which to consider what makes items difficult.

The correlation between the size of the Flynn effect on a subtest (corrected for guessing) and its level of rated abstractness is −.02, or virtually zero. A large negative correlation exists between the guessing-corrected Flynn effect and g loadings (−.55), and a large positive correlation exists between g loadings and abstractness (.61). The Brand effect and g loadings correlate strongly and positively (.8). Modest magnitude correlations exist between abstractness and the Brand effect gains (.32), and between the Brand effect and corrected Flynn effect gains (−.42). None of the effect sizes are significant; however, null hypothesis significance testing is not appropriate for evaluating the substantiveness of these results, as the N is extremely small (9 subtests). More attention should be paid to both the magnitude of the effects, which range from small to large in magnitude (Cohen, 1988), and to the degree to which the directionality of the effects is consistent with explicit theoretical expectations.

It can be seen that abstractness now becomes a strong positive predictor of Flynn effect magnitude (r = .44), once controlled for the Brand effect and subtest g-loadings. Thus, our analysis supports the contention that abstract thinking may causally contribute to the Flynn effect. g loadings do not change as a predictor of the Flynn effect once controlled for abstractness and the Brand effect. The Brand effect residual becomes a mildly positive predictor of the Flynn effect (−.42 to .19). At the suggestion of a reviewer, we reran the analysis excluding subtest B4 as an outlier in terms of abstractness. The recalculated effects, included in Table 2, indicate that abstractness was greatly attenuated in effect size as a predictor of the Flynn effect, but the direction of correlations did not change.

Flynn effect abstractness ANOVA

The authors run through a list of possible issues regarding this work, but to my mind their main thesis stands. Jim Flynn was probably right that level of abstraction is part of the cause of the secular rise in intelligence test scores, without there being any notable commensurate rise in actual intelligence. It would appear that schoolchildren have learned an intellectual trick which helps them leapfrog from instances to general rules.

This is an important paper, which brings us closer to understanding the Flynn effect, and the nature of intelligence test items.

Thursday 7 April 2016

Book reviewing by Twitter

In the spirit of pre-registering a research project so that abject failure is known to all, and no negative finding is denied publication, I am hereby making open admission of my intention to summarise Robert Tombs's The English and Their History in a series of tweets. I aim to distil the essence, picking and polishing his pearls of wisdom, and modestly imagine I may thereby launch a new literary form, a contemporary Reader's Digest on amphetamines for hard-pressed, time-poor citizens.

On the plus side, I incline towards aphorism and have a long schooling in the arts of précis. Many school evenings were dedicated to learning the arts of brevity. On the negative, or perhaps simply realistic, side, the text in question runs to 2012 very closely printed pages, subtle and measured in its evaluation of historical events and their interpretation. It will take some time. I have also lost track of the beginning of my labours, so Twitter archaeologists are invited to help me aggregate individual tweets into a coherent consecutive order (a bit like the historian's task).

My technique is to read several chapters, making pencilled notes in the margins, and then tweet comments on earlier chapters in tranquillity. I am only on page 314 with my reading, and have tweeted no further than page 196, a task which is still in progress today. Mostly I am summarising, and sometimes editing and compressing different paragraphs and sections to present the unifying thought. It is review by selection, a notice about what I have noticed. It is in the historical tradition of history being “what one age finds of note in another”.

On a humanitarian note, should you never hear from me again, please make discreet enquiries as to my well-being.

Monday 4 April 2016

Boiling off the non-believers: Henry Harpending, RIP

Henry Harpending has died. He was best known for two brilliant publications with Greg Cochran: the 2005 paper “The natural history of Ashkenazi intelligence” and the 2010 book “The 10,000 year explosion: How civilization accelerated human evolution”. Both had a common theme: given selection, groups could evolve in different ways, according to what they required or valued in human nature. Henry was a gentleman who was not browbeaten by the social and political pressures of the academic world, who thought clearly, followed insights to their logical conclusions, and who did all this with great civility.

In October 2014 he offered me the chance to comment on a first draft of his paper “Plain and Simple: On a Novel Feature of Amish Personality”. He proposed that those who did not fit in with the particular Amish lifestyle were “boiled off” in that they left the community, leaving the true believers to become more like themselves, a process I likened to making a consommé. He was typically kind about my efforts, saying that he agreed the Amish were probably being bred for civility. He said “Civility is not a bad gloss for what they are being bred for. Greg always points out that every society selects for something where that “something” is a direction in a high dimension space. Of course our press has recently butchered Nicholas Wade for pointing out just that in his recent book.”

Looking at the post I see that he calculated the rate at which any group becomes more like it wishes to be, by taking great care about whom its members marry, and letting those who do not wish to follow that life leave the group. The example studied is of a particular religious group in America, the Amish, but selection for desired characteristics applies to all groups.

The next generation of children will have an average Amish Quotient 0.10 standard deviations greater than their parents did before emigration. The process of selective emigration repeats, so that the mean Amish Quotient increases by one tenth of a standard deviation per generation. With 25 years per generation, “Amishness” will increase by a full standard deviation in 10 generations or 250 years. This is substantial social evolution on a time scale of a few centuries.

So, for the English to become one standard deviation more English, only 250 years are required.
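The arithmetic in the quoted passage is simple enough to check in a few lines:

```python
# Selection arithmetic from Harpending's example: a gain of 0.10 SD per
# generation, at 25 years per generation, compounds to one full standard
# deviation after 10 generations, i.e. 250 years.
gain_per_generation = 0.10    # standard deviations per generation
years_per_generation = 25     # assumed generation length

generations_for_one_sd = 1.0 / gain_per_generation          # 10 generations
years_for_one_sd = generations_for_one_sd * years_per_generation  # 250 years

print(generations_for_one_sd, years_for_one_sd)  # prints 10.0 250.0
```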

In praise of a man bred for civility:

Friday 1 April 2016


Honesty is a core value, the ultimate test of deferred gratification. It has a quasi-religious quality to it, in that it requires a belief that someone will notice that you restrained yourself from the short-term convenience of lying: either an all-seeing deity or a body of right-thinking honest persons into whose quiet precincts one gains admission. Honesty is not the favoured strategy of people in a hurry. At a larger level, honesty is a measure of respect for society: although cheating provides personal advantage, it debases the society in which the cheater lives: if he steals a bicycle then everyone must carry a bicycle lock thereafter. Understanding that particular cause and effect relationship requires a modicum of intellect, self-restraint and long-term thinking.

Simon Gächter & Jonathan F. Schulz. Intrinsic honesty and the prevalence of rule violations across societies. Nature, Letter doi:10.1038/nature17160

Good institutions that limit cheating and rule violations, such as corruption, tax evasion and political fraud are crucial for prosperity and development. Yet, even very strong institutions cannot control all situations that may allow for cheating. Well-functioning societies also require the intrinsic honesty of citizens. Cultural characteristics, such as whether people see themselves as independent or part of a larger collective, that is, how individualist or collectivist a society is, might also influence the prevalence of rule violations due to differences in the perceived scope of moral responsibilities, which is larger in more individualist cultures.

If cheating is pervasive in society and goes often unpunished, then people might view dishonesty in certain everyday affairs as justifiable without jeopardising their self-concept of being honest. Experiencing frequent unfairness, an inevitable by-product of cheating, can also increase dishonesty. Economic systems, institutions and business cultures shape people’s ethical values, and can likewise impact individual honesty.

Unobserved in a cubicle, participants played a dice-rolling game for money. They were paid reasonable sums in their local currency according to how they said the dice fell, but had the opportunity to report better results than they actually obtained. Although the experimenters did not peer over their shoulders, the spirits of Pierre de Fermat, Blaise Pascal and Chevalier de Méré were watching, and honest persons were distinguishable after the event from those who cheated a bit (“justified dishonesty”) and those who cheated a lot (“full dishonesty”). For example, in this game throwing a six gained you nothing. How many in a national sample reported throwing a six?

Although individual dishonesty is not detectable, aggregate behaviour is informative. In an honest subject pool, all numbers occur with a probability of one-sixth and the average claim is 2.5 money units. We refer to this as the ‘full honesty’ benchmark. By contrast, in the ‘full dishonesty’ benchmark, subjects follow their material incentives and claim 5 money units.

Deviations from honesty

The authors ran their experiments from 2011 to 2015. Talk about dedication. Scrupulous, sea-green incorruptible honesty would result in 2.5 money units. Even citizens of decent countries stray from rectitude and award themselves 3.17 money units, a 0.67 tip for self-interest. Those from more corrupted polities claim 3.53 money units, or 1.03 money units more than they should. They are 54% more self-interested.

Our strategy was to conduct comparable experiments in 23 diverse countries with a distribution of PRV (prevalence of rule violations) that resembles the world distribution of PRV. In the countries of our sample, PRV in 2003 ranges from −3.1 to 2.0, with a mean of −0.7 (s.d. = 1.52). Thus, the distribution of PRV in our sample is approximately representative of the world distribution of PRV with a slight bias towards lower PRV countries. The countries of our sample also vary strongly according to frequently used cultural indicators such as individualism and value orientations.

So, although not all the world was tested, this is likely to be a representative sample.

Our participants, all nationals of the respective country, were young people with comparable socio-demographic characteristics (students; mean age of 21.7 (s.d. = 3.3) years; 48% females), who, due to their youth, had limited chances of being involved in political fraud, tax evasion or corruption, but might have been exposed to (or socialized into) certain attitudes towards (dis-)respecting rules.

Where do the cheats live? The authors give four ways of calculating honesty, and I have picked the percentage of honest people in each country as the most explicable metric for everyday Bayesian interaction with foreigners.


Where the honest people are

Avoid Tanzania and Morocco and head for Germany and Slovakia (which many of the citizens of Tanzania and Morocco are seeking to do).

On a topical note, given the referendum on British membership of the European Union, it would probably be better for the European Union to consist of Germany, Slovakia, Austria, Sweden, Poland, UK and Lithuania; but not Italy, and possibly not Spain. A real pity France was not included in the experiment, but you cannot always have what you want.

After four years of labour the authors have come to some conclusions, and here they are:

Given that the experiment holds the rules and incentives constant for everyone, the large differences across subject pools are also consistent with a cultural transmission of norms of honesty and rule following through the generations and a co-evolution of norms and institutions. Societies with higher material security, as measured by Government Effectiveness, tend to be more individualist, and more individualist societies tend to have less corruption. Consistent with this, we find that subject pools from individualist societies have lower claims than subject pools from more collectivist societies and also from more traditional societies and societies with survival-related values. Further econometric analyses developed in economic literature on culture and institutions applied to PRV support the argument that both the quality of institutions, as well as culture (individualism), are highly significantly (and likely causally) correlated with PRV.

Taken together, our results suggest that institutions and cultural values influence PRV, which, through various theoretically predicted and experimentally tested pathways, impact on people's intrinsic honesty and rule following. Our experiments from around the globe also provide support for arguments that for many people lying is psychologically costly. More specifically, theories of honesty posit that many people are either honest, or (self-deceptively) bend rules or lie gradually to an extent that is compatible with maintaining an honest self-image. Evidence for lying aversion and honest self-concepts has been mostly confined to western societies with low PRV values. Our expanded scope of societies therefore provides important support and qualifications for the generalizability of these theories—people benchmark their justifiable dishonesty with the extent of dishonesty they see in their societal environment.

This is a very good paper. The experiment is simple, the results compelling, the implications considerable. It bounces out of the experimental cubicle into socio-political and philosophical dilemmas. Will immigrants from corrupt countries adopt the values of the society they move to, or keep cheating? Will the current levels of rule-following in decent countries persist, or drift down to less honest global levels? Conversely, might growing affluence make honesty the next must-have consumer requirement across the world?

Consider the authors' remark that the different rates of cheating “are consistent with a cultural transmission of norms of honesty and rule following through the generations and a co-evolution of norms and institutions”. The mention of “through the generations” is very welcome. Of course, they might have said “are consistent with cultural transmission of norms of honesty and rule following through the generations and a co-evolution of norms and institutions, and also consistent with a contribution through genetic transmission of a propensity to behave in a pro-social manner”.

The authors may have thought it outside the bounds of their work to consider why societies are “individualist” or “collective”, or why some have developed reliable institutions and others haven't. Cheating, they say, depends on institutions and cultural values, but these are created and maintained by the people who live in the countries studied. What drives some peoples to relative honesty, strong institutions, civil calm and material wealth, and others to dishonesty, corruption, unrest and poverty?

I will be posting more about this topic, now that a distinguished colleague has spent some time crunching the relevant additional data. He has many of the answers. You know which way I am heading, but bear with me for a while.