Sunday 27 October 2013

Clocks back, time forwards, a traditional confusion


Last night, UK clocks were put back an hour, thus moving time forwards, allowing citizens an “extra hour” in bed but also leading, in some commentators’ minds, to shorter winter days. A team of specialist engineers is racing across the country to change 80 mechanical clocks, of which Big Ben is the most notable, and will probably only complete the task by the middle of this coming week. A nation of less specialist citizens will slowly get round to altering the twelve or so clocks in their households. Some clocks, on mobile phones and television sets, will change automatically; others, on kitchen walls and microwave cookers, won’t, so as to give confusion a proper British chance of making a mess of things. This haphazard debasement of time is done twice a year, so for at least half the time, half the timepieces may be wrong. Since I cannot be bothered to look up the manual, I keep my car clock on car time, which is right half the year, but different from that shown by the up-to-the-minute satnav. Long case clocks are another story. They tick-tock in a comfortable and reassuring manner whatever time they show.

Readers of the blog will know that time has not changed, nor has the earth’s spin, so this is merely a little local confusion, born of that great conspiracy, a total muddle. Greenwich Mean Time was a child of the railway age, and was not adopted until 1880, a full 55 years after the Stockton and Darlington railway opened. Uncoordinated local time zones conspired to kill many railway passengers. British Summer Time was born of the Great War, and has held sway since 1916 with some interruptions and alterations. It has very little purpose, but it has become a tradition, which is the sort of thing the English are good at.

The basic problem is that Northern countries are too far North. Days are too short in winter, and gloriously long in summer. Rational folk would organise things so that they could take most advantage of long summer days, and rearrange their timetables to deal with the very short winter days, maximising daylight for the journey to school by altering winter timetables. There is a way to achieve this general improvement, which goes by the confusing title of Double Summer Time. This is Greenwich Mean Time plus 2 hours. Confusingly, the more precisely measured Coordinated Universal Time (UTC) is the primary time standard by which the world regulates clocks and time, but for most purposes it is simply a more accurate GMT. This is probably a French conspiracy, but they are always a distraction, so no change there.

Anyway, if you put the clocks permanently to GMT+2 that works well for most countries. Why is this system not adopted? Well, it is rational, so it has that against it. Small but vocal interest groups pretend to be inconvenienced. Furthermore, to worry about large numbers of people doing stupid things is considered Un-British, the preserve of men who probably sleep in pyjamas. To my mind it speaks to a greater problem: the failure to distinguish between the things that need to be coordinated and those that don’t. Trains and planes and power stations and the like, which constitute tightly-coupled systems, would benefit from the clock never changing. Never ever. Everyone else should make their own arrangements, setting winter timetables however they please. In those instances the lack of coordination is an advantage: it helps smooth out traffic and electricity usage.

Thereby hangs the worst problem: that our need to be coordinated with each other divorces us from the passage of the sun and the moon, our ancient time keepers. Almost a year ago I posted about this in “Time’s Face”.

Looking now at the Emerald Sequoia virtual timepiece Mauna Kea I see that clock time and astronomical time are pretty well lined up, and the equation of time shows that they are about 17 minutes apart. The sun will go down shortly before 5 pm. Yesterday the notion was that it would have been doing so at almost 6 pm. Whatever our earthly calculations, the world still spins and the sun still shines. And now for the much anticipated hurricane.

Blow, winds, and crack your cheeks! rage! blow!
You cataracts and hurricanoes, spout
Till you have drench'd our steeples, drown'd the cocks!

Friday 25 October 2013

Genetics made very simple

There is a code which, properly interpreted and implemented, serves as a blueprint for living things, and is required for all living, reproducing things. Like all good codes it has a very simple underlying structure, in this case only 4 letters. For humans, those letters are repeated in a sequence 3 billion base pairs long. Cunning.

This allows the transmission of many billion different messages. It is somewhat more complicated than that, because the code comes in 23 different physical bundles, and patches of particular sequences may be close to each other or further apart, which is another clue as to what may be going on. For example, a section of the code can act as a “GO TO” instruction, making the operating system jump forwards or backwards to another sub-routine.

No code is entirely unbreakable. It would appear that genetic codes conform to a “Great Chain of Being”, though not as Aristotle conceived it. That supreme thinker classified organisms in relation to a linear natural scale according to complexity of structure and function, so that higher organisms showed greater vitality and ability to move. Evolution has done something similar, building up the ladder of life from very basic micro-organisms and then scaling up to greater integrative complexity. In that sense, living things are concatenations and elaborations of earlier solutions, which have been tweaked through natural selection, refinements being added to the code, usually making it a bit longer. Things which are useful survive and are replicated. We should not be too precious about the length of the code, since from that purely genetic perspective we are less complicated than the Paris Japonica plant and the marbled lungfish, the latter being elegant but not renowned for its cultural achievements.

The genetic code includes some mistakes (though mercifully few fatal errors), dead ends, bits which are repeated, things which seemed a good idea at the time, fixes to problems which are no longer problems, and general code, miles and miles of general code which the evolutionary process has not dropped because, to anthropomorphize, who knows what the hell it does, but it seems to work, for the time being at least, and it could be fatal to mess around with it. At one time it was called “junk DNA” but now it is being treated more respectfully, because it probably sorts out problems which we have not yet even identified.

Again, like all good codes, some parts seem to be much more important than others. For example, most military outposts send messages about the weather, food supplies and other housekeeping matters. References to surprise attacks tend to be less frequent, and more circumspect. 99.9% of our genetic code may be no more than saying “I am a human, and I am digesting properly”. The remaining minority fragments are responsible for determining if that particular human is inclined to iambic pentameters, hallucinations, or to living quietly and abstemiously in a tranquil suburb. A small difference in the code can have big effects on behaviour. Letter for letter, most Bibles are the same, but the Barker and Lucas royal version of 1631 omitted the word “not” from the commandment they rendered as “Thou shalt commit adultery”. I do not know to what extent this increased national adultery, but it would certainly have led to tension in small villages.

Most of the traits we are interested in, such as intelligence and personality, are probably influenced by very many genes of small effect. There is no single gene for intelligence, but there are shoals of them swimming in unison, though at present they are hard to find. Individual differences show up as personal variations in the genetic code: descants on the common melody that make us all human. These blips, called SNPs (single nucleotide polymorphisms, pronounced “snips”: single letter substitutions), give us our individual character and particular pattern of abilities. When the links in our code are very much like the links in a particular population then we are said to be in linkage equilibrium with that population. It shows that our ancient ancestral DNA comes from the same sources. It is similar to working out that a particular secret message comes from a particular foreign country, because it has the same general form. To hunt down an individual we are searching for the personal signal which stands out from the community noise. To search for an extended family we are looking for the common signal which distinguishes them from other distant extended families. Discriminant function analysis, cluster analysis, factor analysis: it is all the same difference. Find the central tendency, measure each deviation from that mean, and then plot out the discrepancies in whatever fashion makes most sense for the task in hand. Signal and noise, again and again.
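The recipe above, find the central tendency, measure each deviation from it, then inspect the discrepancies, can be sketched in a few lines. The trait scores below are invented for illustration, not real data:

```python
import statistics

# Hypothetical trait scores for a small group (illustrative numbers only)
scores = [94, 101, 88, 110, 97, 105, 92, 113]

mean = statistics.mean(scores)    # the central tendency
sd = statistics.pstdev(scores)    # the spread of the group

# Each person's "signal": their deviation from the common mean, in SD units
z_scores = [(s - mean) / sd for s in scores]
for s, z in zip(scores, z_scores):
    print(f"score {s}: {z:+.2f} SD from the mean")
```

Whether one then runs a discriminant function, a cluster analysis or a factor analysis, the raw material is the same: these standardised deviations from the mean.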

I am sure that genetics would be more palatable to the general public if it was subjected to a thorough linguistic makeover. Perhaps it is only me, but when geneticists use “allele” instead of variant, and “loci” instead of location I tend to turn the page and read about something else. It’s code, guys, it’s code. I don’t need to see the wriggly bits.

You may have found this explanation far too simple, but if you can bear to delve into the wet wriggly stuff, then I am told that the following might be helpful:

Introduction to Quantitative Genetics, Fourth Edition. Douglas S. Falconer and Trudy F.C. Mackay, 1996. ISBN-13: 978-0582243026. Everyone seems to recommend this book for its explanations and worked examples, though it is not the most up to date. Researchers look back at it almost as a sacred text, not that scientists believe in such things.

Behavioral Genetics, 6th Edition. Robert Plomin, John C. DeFries, Valerie S. Knopik and Jenae M. Neiderhiser, 2012. ISBN-13: 978-1429242158. Described as being the crossroads where psychology and genetics meet, it covers the traditional ground, and the newer authors give this edition an up-to-the-minute feel.

G is for Genes: The Impact of Genetics on Education and Achievement. Kathryn Asbury and Robert Plomin, 2013. ISBN-13: 978-1118482810.

This new book, yet to be published but available for Kindle, has a far more specific focus on the genetics of scholastic achievement. It bills itself as a DNA to ABC text, but also covers the genetics of sporting ability, which makes a change from the familiar territory of IQ, motivation, special education and social status. It also propounds ideas such as these: continuity is genetic and change is environmental; genes are generalists and environments are specialists; and the environments that matter most are those that are unique to individuals. It is written in a very readable format, probably intended for teachers. Depending on your background, it might be best to look at this one first.


From the point of view of psychology, all explanations compete on the same ground: Can the explanations lead to testable predictions? How well do those predictions match reality? Succinctly: Goodness of specification, goodness of fit.

As to genetics, just remember this: it is CODE, just code. All you have to do is crack it.




Wednesday 23 October 2013

All you ever wanted to know about intelligence, Part 5 (IQ is just a single number)


Candidly, I thought that 4 parts on an intelligence primer would be enough, but now my brightest interlocutors are flinging even more clever problems at me, so it is “once more unto the breach”.

A distinguished scholar tweets (they do such things nowadays): “Any attempt to reduce the vast panoply of human talents to a single number (IQ) is obviously naive and silly”.

Let us, as modern parlance has it, deconstruct this contribution, to which I gave a low mark, thus showing that single numbers are often useful summaries.

The implication is that those who measure intelligence are trying to reduce the vast panoply of human talents to a single number. I have met many of these researchers, and I detect no such urge. Skills are not reduced or constrained by trying to understand their underlying characteristics. Researchers are certainly seeking to understand what proportion of the variance in the vast panoply of human skills can be accounted for by a number of latent factors, thereby bringing order to what would otherwise be a maze of correlations. The answer, first given in 1904 and amply confirmed thereafter, is that half of the variance resides in one general factor. This is a major finding, and one of the best replicated in the behavioural sciences.

Paradoxically to some people, to ensure this finding you have to measure the broad range of intellects and the broad range of human skills: the whole panoply, in fact. It is precisely the wide range of skills which allows you to search for factors in common, and if broadening the range reduces the explanatory significance of any factors, then you have ample proof that such factors are insufficient explanations for the observed variance. If you cannot find replicable factors, then panoply rules.

In fact, what comes out is that g is one number and it explains a lot: scholastic achievement, occupational achievement, and lifespan to a significant and sizeable degree. Finding that is not silly or naive, it is a replicable fact, and the “positive manifold” of human skills shows that some people have more panoply than others. We have found that there is a common core, something akin to cortical horsepower which is somewhat like the central processor in a computer. Some core processors work faster than others, thus allowing more mental work to be done, leading to greater and more extensive intellectual outputs. The brain does not appear to be a package of separate modules vying for cranial space, each to the detriment of the other.

IQ is actually a slightly broader measure than g because, apart from the most reliable measure, Full Scale IQ, intelligence test results usually include three or four group factor IQs and also some specific scores in the form of the less reliable subtest results. So, when a person is tested individually on a face to face test they get one summary score, and four group scores, all based on ten subtests. Factor analytic techniques reduce those ten or more tests to the best single predictor, g. If you don’t like a single figure you can have the four group factors. These give more detail but are a little less stable. To get a view of detailed strengths and weaknesses, look at all ten subtests carefully, recognising that they will usually be less reliable than the overall total score. That overall IQ score is better in most circumstances, and gives the best predictions. If you are doing large scale international research, and want to use a properly validated way of extracting the common characteristics from an array of different tests, then use factor analytic techniques to extract the g factor.
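The extraction step can be illustrated with a toy calculation: take the first principal component of a subtest correlation matrix as a rough stand-in for g. The correlation matrix below is invented, not from any real battery, and proper g extraction uses full factor-analytic methods rather than this bare sketch:

```python
import numpy as np

# Hypothetical correlation matrix for four subtests: a "positive manifold",
# all correlations positive, as is observed in real test batteries
R = np.array([
    [1.00, 0.55, 0.50, 0.45],
    [0.55, 1.00, 0.60, 0.40],
    [0.50, 0.60, 1.00, 0.50],
    [0.45, 0.40, 0.50, 1.00],
])

# The first principal component serves as a crude proxy for the g factor
eigenvalues, eigenvectors = np.linalg.eigh(R)   # eigenvalues in ascending order
g_loading = eigenvectors[:, -1]                 # loadings on the largest factor
variance_explained = eigenvalues[-1] / eigenvalues.sum()

print(f"proportion of variance on the first factor: {variance_explained:.2f}")
```

With average intercorrelations around 0.5, roughly half the variance lands on the first factor, which is the Spearman result in miniature.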

By way of analogy, our intellects are like trees with a very broad and tall trunk, from which four boughs protrude, and from them many, many twigs from which one gets a panoply of leaves.

Since we are dealing with humans who walk about under light supervision, change marital partners frequently and even occupations and sometimes countries, we are lucky if, by using all the measures available to us, we can account for 25% of the variance in life achievements. Explaining 20-25% is as good as it gets in behavioural science at the moment. IQ is probably the best predictor of a relatively weak bunch, even better than social class or wealth in most longitudinal studies. That is an argument for another time, but whenever intelligence is measured it has a strong association with life outcomes. IQ is a powerful measure, relative to other social science predictor variables.
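The arithmetic linking a predictor to “variance accounted for” is simply the squared correlation, which a two-line sketch makes plain:

```python
# Variance accounted for is the square of the correlation coefficient
def variance_explained(r: float) -> float:
    return r ** 2

print(variance_explained(0.5))   # 0.25, the "as good as it gets" ceiling above
print(variance_explained(0.3))   # roughly 0.09, a more typical social-science predictor
```

The squaring is why a predictor correlating 0.5 with an outcome, impressive by social-science standards, still leaves three quarters of the variance unexplained.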

Why is there so much disparagement of predictors based on a single number? Perhaps it is a fundamental misunderstanding about the nature of description and prediction. A strong predictor is not an exhaustive equation which totally defines the nature of the organism, stripping it of all degrees of freedom. To illustrate the point, here are some other single numbers that the fearful might see as reducing vast panoplies:

Height: associated with health, status, income, social leadership and intelligence. At a pinch, you can estimate all those from a single surviving femur. Height does not determine everything, not even the shape of the rest of your body, but it has a partial influence. Tallness matters somewhat, but short people can write poetry.

Weight: associated with health, lifespan, status (now low, formerly high) income (ditto). Heavy weight is usually bad news, but some fat people are fit and healthy.

Wealth: associated with health, status, intelligence? Wealth effects are probably partly caused by intelligence leading to wealth. Children of wealthy parents are not necessarily intelligent.

Social class: associated with health, lifespan, intelligence? Social class of origin has an influence, but if you look at intelligence in children then the family of origin has less effect, and the ability level of the child more effect on later life success.

Taking social class as an example, are the presumed effects of social class on human outcomes lessened because a single number is used to summarise a broad category of status indicators? The number cannot hope to capture all the nuances of class membership, and is not intended to. If, for example, a single number for class correlates with a single number for lifespan, then we can at least ask whether class has an influence on longevity. (It does appear to, but only partially if one controls for intelligence).

Here is a list of other things that could be summarised by single numbers: popularity, journal impact factors, number of publications, number of references, penis length, breast size, brain volume, number of sexual partners, number of Twitter followers, and number of page views on a blog. Single numbers do not cover everything, but they indicate something, even though we often complain about them and want to add further measures.

Years ago, when teaching a BSc in social psychology, I used to ask the students on their first day: “What single snippet of information reveals most about you?” Social class, height, age and so on were all proposed. Then, in triumph, I would exclaim “Your postcode”. Even 15 years ago small area statistics were being extracted from the UK census and linked to postcodes (usually 6 to 7 alphanumeric characters). This short code included powerful predictor variables which gave accurate descriptions of the people living in a small area. Of course, there is always some resentment when we humans turn out to be predictable, but if we aim to be social scientists, we must rise above that.

The philosopher Gilbert Ryle once observed “one can simultaneously obey the laws of gravity and the rules of golf”. A single equation explains the fall of a ball towards the surface of the earth, and at the same time there are conventions about how the game of golf is to be played. Golf, as I understand it, depends on a single score, the lower the better, but that is another matter. There may be more to golf than could be contained in a single score, but I would not bet on it.

In conclusion, and in very succinct terms: A numerical description of one important facet is not thereby the determining and constraining definition of the entity.



By the way, throughout this discussion I have accepted the observation that human beings have a “vast panoply of talents”. There are certainly many very bright and talented people with many skills. Personally, I would not describe myself in that way, but it is nice to have goals to aspire to. Happy panoply to you all.

Tuesday 22 October 2013

On best understanding Nisbett and co.


Occasionally, curiosity leads one into byways, and thus disinters shards of memory lost in the grass of our neglect, letting ghosts speak. Here is the chain: a few days ago Greg Cochran posted a link to a 2007 opinion piece by Eric Turkheimer in which the latter said that “The important questions about the role of genetics in the explanation of racial differences in ability are not empirical, but theoretical and philosophical”. This suggested that neither a positive nor a negative empirical result would have an important effect. On the contrary, if no genetic investigation can explain intellectual differences between racial groups then that destroys the genetic hypothesis. It will be wrong, end of story. I also noted that his essay was written in 2007 so it was possible he had changed his mind.

Later that evening I realised I ought to check up on this possibility, and looked at Turkheimer’s personal website, on which he lists a recent publication in the September 2012 American Psychologist (DOI: 10.1037/a0029772): Nisbett, Aronson, Blair, Dickens, Flynn, Halpern, Turkheimer, “Group Differences in IQ Are Best Understood as Environmental in Origin”.

It is a minor gripe, but “best understood” strikes me as a curious phrase. What is wrong with “Group differences are environmental in origin”? That might be true, or partly true, but at least it is a claim that could be settled by fact. Even “the environmental explanation is the best fit with all the data” would have been clearer.

Anyway, there is much to comment on in this paper, which is written by distinguished authors in the field, but my attention was drawn to one line of text: Gains in sub-Saharan African countries of 0.50 to 0.70 SD in response to a few months of Western-style education have been reported for heavily g-loaded fluid intelligence tests (McFie, 1961). This dramatic claim is supported by what is by far the oldest reference in the paper. Apart from another reference from 1993, all the rest are from the present century. Nothing wrong with a reference being old, but I decided to look it up. Here is the abstract:

SUMMARY: Twenty-six African boys entering technical school were given a series of intellectual tests involving verbal, numerical, pictorial and constructional material, and also an ‘abstraction’ test (Weigl) and ‘memory for designs’ (Terman-Merrill). None showed qualitative differences from English subjects in their performance on verbal, numerical and abstraction tests. On the non-language tests their performance was slower than would be expected of English subjects, and they showed differences in their inaccurate orientation of drawn and constructed designs. At the conclusion of two years’ technical training, the subjects were retested on the same material. Significant changes were increases in scores on the non-language tests. These were associated with increased speed and accuracy of orientation, and also apparently with a more ‘synthetic’ approach toward visual material. It is suggested that an ability which may be poorly developed under these cultural conditions, and which may be increased by appropriate educational methods, is that of perceiving visual material as a whole (or ‘Gestalt’ perception).

From this abstract we see: 1) the sample size is small 2) the study was carried out in one country 3) the sample is specific, in that it is students entering a technical school who may well be above the local norm, and 4) the extent of technical training was two years and four months, not “a few months”.

When one reads the paper, it is apparent that the boys are between 16 and 19 and are entering what in Uganda would be tertiary training prior to taking up technical occupations like carpentry, motor maintenance and machine fitting. Retesting took place after 2 years and about 4 months. The 7 tests were in fact McFie’s adaptation of standard tests, including putting in some new material and utilising some new scoring methods, and significantly increasing the time allowed to complete items. This makes interpretation difficult. One can talk about change after education, but not locate the result properly in conventional intelligence testing. The paper is an indication of an effect, no more, carried out by a thoughtful researcher exploring some possible effects. Despite the very small sample, he includes some factor analytic results. These will not be stable when n=26 and tests=7, but they are a welcome addition as a statement of intent for later work.

Here are his descriptions of the tests:

The main group of tests corresponded to subtests of Wechsler's (1944) scale, with questions omitted which were obviously related to European cultural experience, and some added from other scales (e.g., the Terman-Merrill 1937):

A. Comprehension: six questions; maximum two points each.

B. Similarities: seven pairs; maximum two points each.

C. Arithmetic: six questions, one point each; two questions, two points each.

D. Picture Description: four photographs of scenes from African life; 1 point for enumeration or description, 2 for synthetic interpretation.

E. Picture Arrangement: one example and four test stories, taken from a popular African cartoon strip; maximum three points each.

F. Block Designs: Wechsler's designs 1-6, with time limits extended to 2' for designs 1-3, and 3' for designs 4-6.

G. Memory for Designs: as Terman-Merrill IX, 3, but scoring two, four or six points according to quality of reproduction of each design.

H. Weigl's Sorting Test: as described by Weigl in 1927; pass or fail recorded according to ability to sort both ways (cf. McFie and Piercy, 1952).

Criticisms are often made about the testing of intelligence in Africa. McFie was clearly giving his subjects every chance to score well by making the material culturally appropriate, which is good. He severely cut down the number of test items, which reduces reliability and range significantly; he made his own decisions on cultural bias which may or may not have been correct; and he extended the time limits, which damages the interpretation of results. The better procedure would have been to have recorded times up to some generous limit, and then shown how the results were affected by using the generous as opposed to the official time limits. McFie also changed the marking system. This is regrettable. Better to have kept the original and then compared it with his more generous system. The tests have lost their integrity, and can only serve as being broadly indicative.

By the way, in the heated atmosphere of debate about African intelligence, if such a paper was being put forward as a proof of low African intelligence it would rightly be rejected as flawed. The tests are not complete, include other material, have been altered by the examiner, and have had their scoring system and timings altered. This still allows before and after comparisons, but the results only weakly relate to the g loaded originals.

Here are the main results in a screen grab:


It is hard to make a judgment about what the results signify in terms of overall intelligence. Even if one looks only at the tests which most resemble full Wechsler subtests (Comprehension, Similarities, Arithmetic and Block Designs), the original test results are probably equivalent to Full Scale IQ 85, and they rise after over two years of technical education to FSIQ 89. This change is within the usual 4 point retest difference, but it is certainly suggestive of a training effect. If we knew more about the selection of the sample we could judge how this compared to the local norm. For example, if they are the brighter students, chosen for tertiary training, one might expect them to be one standard deviation above the national mean. That would correspond to being drawn from a population with a mean IQ of 70. If they are run of the mill students then the local average is IQ 85, which is very much better than average results for sub-Saharan Africans. As always, the representativeness of samples is crucial when trying to estimate the mean of a bell curve.
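The back-calculation from a sample mean to a presumed population mean is simple arithmetic on the conventional IQ scale (mean 100, SD 15); a sketch of the reasoning:

```python
# IQ scales conventionally use mean 100, SD 15
MEAN, SD = 100, 15

def iq_to_z(iq: float) -> float:
    """Convert an IQ score to standard deviation units."""
    return (iq - MEAN) / SD

def z_to_iq(z: float) -> float:
    """Convert standard deviation units back to the IQ scale."""
    return MEAN + z * SD

# If a selected sample averages IQ 85, and is assumed to sit one SD above
# its parent population, the implied population mean is:
sample_mean = 85
implied_population_mean = z_to_iq(iq_to_z(sample_mean) - 1.0)
print(implied_population_mean)   # 70.0
```

The same two functions also show why the run-of-the-mill reading gives IQ 85 for the local average: with no selection assumed, the sample mean and population mean coincide.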

Once these IQ results have been spelt out, which McFie did not do, his comment: “None showed qualitative differences from English subjects in their performance on verbal, numerical and abstraction tests” becomes particularly interesting and informative. Right from the start, the students knew what was expected of them, and understood the concept of the tests. They did not have an operating system incompatibility, although there was a power difference. (I make this point because some ill-advised commentators try to suggest that there is an African way of thinking which is fundamentally different from the European way of thinking, as profound as the difference between Microsoft and Apple operating systems). These students may have had a problem with the unfamiliar blocks, but they did not have a problem in realising that they had to make a copy of the block design.

Although the tests have been much altered, in terms of the original standard deviations the statistically significant gains are Picture Description 0.67, Block Designs 0.65 and Memory for Designs 0.63, and the total of all tests is 0.73. If you look at the narrow standard deviation for Picture Description, this made-up test lacks discriminative ability, as do Arithmetic and Comprehension. Either there weren’t enough items, and not enough hard items, or the group were already highly selected and homogeneous in ability on these particular skills. Fuller testing might have shown different results. My own reading is that this small study suggests that 2 years of technical education probably improves Block Design and Memory for Designs, but without control groups we cannot be sure.
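The gains quoted in original-SD units are standardised mean differences; a minimal sketch of that calculation, using invented before/after scores rather than McFie's data:

```python
import statistics

# Hypothetical before/after subtest scores for a small group
# (illustrative numbers only, not McFie's data)
before = [8, 10, 7, 9, 11, 8, 10, 9]
after = [10, 12, 9, 10, 12, 10, 11, 11]

# Effect size expressed in units of the original (pre-test) SD, as in the text
mean_gain = statistics.mean(a - b for a, b in zip(after, before))
baseline_sd = statistics.stdev(before)
effect_size = mean_gain / baseline_sd

print(f"gain: {effect_size:.2f} SD")   # gain: 1.24 SD
```

This is also where the narrow-SD complaint bites: dividing by a small baseline SD inflates the apparent effect, which is why a test that fails to discriminate can produce a flattering gain figure.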

In my view, the Nisbett et al. account is not a good representation of the paper. Here it is again:

Gains in sub-Saharan African countries of 0.50 to 0.70 SD in response to a few months of Western-style education have been reported for heavily g-loaded fluid intelligence tests (McFie, 1961).

It makes the results sound extensive, broadly effective across countries, and quickly and easily attained. How should they have reported the results? There are various options, but here is one suggestion:

McFie (1961) tested 26 Ugandan adolescent boys on 7 adapted Wechsler type tests after over two years of technical education, finding gains of 0.6 sd on 3 non-verbal tests.

It is 29 words to their 30 (and 143 versus 163 characters) and I think it captures the main results more accurately. What do you think?


So, where is the ghost to whom I referred earlier? John McFie was my first boss, a kind man who left a promising career at The National Hospital, Queen Square, London to work as a rural doctor in Africa. When he eventually returned, to England with his African wife, and to brain research at Guy’s Hospital, he hired me in 1968 and mentored me as we studied the cognitive effects of cortical injuries sustained in childhood. When Arthur Jensen’s paper came out in 1969 we both decided we would attack what we thought was Jensen’s suggestion of a genetic cause of African deficits on Block Designs by extending the work John had done in his 1961 paper. He was the senior author of the first paper I published, broadly arguing that Jensen’s findings were an artefact of cultural restrictions regarding constructional toys. Did we prove our case? Was Jensen convinced when I presented the results to him at a conference? More of that later.

Sunday 20 October 2013

Dominic Cummings’s Thoughts on Education and Political Priorities.


Dominic Cummings has written a very interesting “essay for 15-25 year olds” entitled: Some Thoughts on Education and Political Priorities.

It is well-argued and data rich, with an impressive scope. He includes many research findings on intelligence and education and social mobility. Von Neumann, Feynman, Kolmogorov, Turing, Newton, Poincaré and Thucydides act as front runners for contemporary researchers such as Daniel Kahneman, Robert Plomin, Stephen Hsu, Alex Wissner-Gross and others. The essay combines a mathematical perspective with classical thinking to great effect. It does not lack ambition: there are good sections on mining, fuel utilization and power generation, solar photovoltaic panels, digital fabrication, robots, developments in space exploration and sundry technical matters. There is much to enjoy for anyone who respects a factual approach to important social, technological and educational issues.

By way of example, in a footnote Cummings discusses Alex Wissner-Gross’s 2013 paper in Physical Review Letters, which attempts to describe intelligence as a fundamentally thermodynamic process, proposing that intelligence can spontaneously emerge from the attempt to maximise freedom of action in the future. Wissner-Gross built a software program, ENTROPICA, designed to maximise the long-term entropy production of any system it finds itself in. ENTROPICA then solved various problems, including intelligence test items, playing games, social cooperation, trading financial instruments, and ‘balancing’ a physical system. The key ingredient seems to be the maximisation of future histories. However, although the program models human and animal problem solving, it is not clear to me whether those outcomes arise from purely thermodynamic general principles. At the moment I count that as a criticism of my own understanding, and hope to post more on this research later.

As regards education, the main thrust of the essay seems to be the need for randomised controlled trials and evidence-based teaching. To judge outputs you need to know the inputs. The ability level of your pupils at entry is one such key indicator. Schools have to be judged on the educational value they add. Methods need to change, targets should be set more ambitiously, and there should be more schooling days in the year. Teaching should pay attention to proven ways of learning, not just traditional methods like lecturing, which burden short-term memory. Approaches which are tailored to the child are championed.

Here is his educational thesis in a nutshell: The education of the majority even in rich countries is between awful and mediocre. A tiny number, less than 1 percent, are educated in the basics of how the ‘unreasonable effectiveness of mathematics’ provides the ‘language of nature’ and a foundation for our scientific civilisation, and only a small subset of that <1% then study trans-disciplinary issues concerning the understanding, prediction and control of complex nonlinear systems. Unavoidably, the level of one’s mathematical understanding imposes limits on the depth to which one can explore many subjects. For example, it is impossible to follow academic debates about IQ unless one knows roughly what ‘normal distribution’ and ‘standard deviation’ mean, and many political decisions, concerning issues such as risk, cannot be wisely taken without at least knowing of the existence of mathematical tools such as conditional probability.
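As a minimal illustration of why those two terms matter (my own sketch, not from Cummings’s essay): IQ tests are conventionally scaled so that scores follow a normal distribution with mean 100 and standard deviation 15, after which percentiles follow directly.

```python
# Illustrative only: IQ scores scaled to a normal distribution
# with mean 100 and standard deviation 15 (the conventional scaling).
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Fraction of the population within one standard deviation of the mean
within_one_sd = iq.cdf(115) - iq.cdf(85)  # about 0.68

# Percentile rank of an IQ of 130 (two standard deviations above the mean)
pct_130 = iq.cdf(130)  # about 0.98

print(f"within 1 SD: {within_one_sd:.3f}")
print(f"IQ 130 percentile: {pct_130:.3f}")
```

Nothing deeper than secondary-school statistics, but without it phrases like “two standard deviations above the mean” convey nothing.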

Here is his charge sheet on English education in the last 30 years:

Large improvements in state-controlled test scores have not been matched by independent tests. Durham University found that the GCSE grades of pupils who sat PISA in 2009 were higher than those of pupils who got the same PISA maths score in 2006 (which they should not have been, of course).

A major study found a significant decline in the algebra and ratio skills of 14 year-olds between 1979 and 2009: fewer than 1/5 of 14 year olds can write 11/10 as a decimal.

Pupils in the top tier of countries (Shanghai, Singapore, et al) are about 1-2 years
ahead of English pupils in the PISA maths.

TIMSS (a more curriculum-focused international maths test) also shows England behind at primary school and the gap widening with age.

A 2011 BIS survey of adults found that only ~1/5 operate in maths at a level of a GCSE ‘C’ or better. Research from National Numeracy (February 2012) showed that
~½ of 16-65 year olds have at best the mathematical skills of an 11 year-old.

Before the EBac, only ~1/8 of those at a non-selective state school got a C or better GCSE grade in English, Maths, two sciences, history or geography, and a language (~½ of privately educated pupils do).

GCSEs are poor: cf. the Royal Society’s 2011 study of Science GCSEs, Ofqual’s April 2012 report, and damning analysis from the Institute of Physics and Royal Society of Chemistry. Shayer et al (2007) found that performance in a test of basic scientific concepts fell between 1976 and 2003.

Many University courses, including Cambridge sciences, had to lengthen from the 1990s to compensate for the general devaluation of exams.

Foreign languages are in crisis even in Oxbridge: forty years ago, interviews were conducted in the language, now tutors are happy if an applicant has read anything in the language. For a recent survey, cf. Coe (2013).

Only 1.7% of 15-year-olds in England achieved Level 6 in PISA 2009 maths tests compared with: Shanghai (27%), Singapore (16%), Hong Kong (11%), South Korea and Switzerland (8%), Japan (6%), Finland and Germany (5%). If one
looks at Levels 5 and 6, only 10% in England reach this level compared with: Shanghai (50%), Singapore (36%), Hong Kong (31%), South Korea (26%), Switzerland (24%), Finland (22%), Japan (21%). Given that those from independent
schools score >50 points higher than those from maintained schools, the tiny English 1.7% may include a large overrepresentation of independent schools and the performance of pupils in non-grammar state schools may be worse than these figures suggest.

During my involvement in education policy 2007-12, I never came across a single person in ‘the education world’ who raised the work of Robert Plomin and others on IQ, genetics and schools, and whenever I raised it people would either ignore it or say something like ‘well obviously IQ and genetics has no place in education discussions’. I therefore invited Plomin into the DfE to explain the science of IQ and genetics to officials and Ministers.

Here is the summary picture, which casts significant doubt on GCSE results:


Nothing particularly surprising about all of that, at least to those who have some knowledge of the intelligence literature. Not all of those differences may be entirely due to bad teaching. If one takes “school far” tests, based on the sorts of things any student is likely to know regardless of the particular curriculum, then Chinese and Japanese pupils do well on those, as well as on the “school near” tests that depend on specific teaching. The former are sometimes called ability or intelligence tests. The same intelligence differences can be found in occupational classes.

There is huge variation in school performance (on exams that are sub-optimal) among schools with the poorest children. In about a quarter of primaries, over a quarter of pupils leave not properly prepared for basic secondary studies (and few such pupils enjoy a turnaround at secondary school). Other primaries, including those in the poorest areas, have fewer than 5% of their pupils in such a desperate situation.

Consider a basic benchmark: getting four-fifths of pupils to at least a ‘C’ in English and Maths GCSE. A small minority of state schools achieve this, while others with
similar funding and similarly impoverished pupils struggle to get two-fifths to this level.

It is a sign of the fundamental problems with ‘education research’ that the Institute of Education is very hostile to research on genetics and education.

When people look at the gaps between rich and poor children that already exist at a young age (3-5), they almost universally assume that these differences are because of environmental reasons (‘privileges of wealth’) and ignore genetics.

It is reasonable to hope that the combination of 1) finding the genes responsible for cognitive abilities, 2) scientific research on teaching methods, and 3) the power of computers to personalise learning will bring dramatic improvements to education - but this will not remove genetic influence over the variation in outcomes or ‘close the gap between rich and poor’. ‘The good school ... does not diminish individual differences; it increases them. It raises the mean and increases the variance’ (Elliot Eisner, Stanford). Good schools, in the sense of ‘teaching children of different
natural abilities as well as possible’, will not ‘eliminate gaps’ - they will actually increase gaps between those of different abilities, but they will also raise floors and averages and give all children the opportunity to make the most of their genetic inheritance (personality as well as IQ).

Education is a political battlefield. It is seen as a very powerful influence on society, and the benefits of different types of education are not judged on merely pragmatic grounds but as part of a wider pattern of social change. From an empirical point of view this raises many problems. If you think some purveyors of education are better than others, this will be seen as taking a political stance. Any book which reports results and makes policy proposals cannot help but be political, and to be judged by those standards. Education systems tend to be evaluated by intentions, not outcomes.

In a pleasant contrast, here is a graph showing a real improvement in the mental tools at our disposal.


I note that the Apple IIe, which was my first home computer in 1982 or thereabouts, was operating at the same level of efficiency as the then mighty Cray 1 computer, and the Compaq 386 I had by the late 80s was very quickly surpassed. Current desktops operate at the power of the fastest supercomputers in 1993. More impressively, if you took the top 500 computers in 2008, their combined power was surpassed by the best single supercomputer in 2013. Smartphones are about as powerful as the best computers of 25 years ago. The point is not that my word processing speed has increased, but that analysis of all sorts became feasible, such that you can design a wide-body jet without using a wind tunnel, and get it to fly properly. Genome sequencing is becoming easier at a rate which is faster than Moore’s Law. You can also just about begin to interpret the genome, though that may take more power and much larger samples. Incidentally, much of the improvement in processing problems comes from better algorithms, not the chips. However, current computation cannot yet compete with the estimated 10^17 floating point operations per second, for the mere expenditure of 20 watts, achieved by the human brain. Modelling that will take a fair bit of computer power, possibly not available till after 2020.

In sum, Cummings has written a very interesting, informative and inspiring essay, which gathers together many of the ingredients required to think about our changing technical and organisational world. His parting gift is an Odyssean reading list for young students. His essay is an exciting read in the grand tradition of two cultures, covering a broad canvas with a sense of adventure and a love of discovery.

Look in my eyes, you sensitive clever person.


Every now and then some passing commentator says that intelligence tests are deficient because they do not provide sufficient assessments of warmth, understanding, and emotional sensitivity. Traditionally these have been considered aspects of personality, but because people want to be intelligent without taking a test which may reveal them to lack that quality, there has been much interest in the rebranding of these personality traits into “emotional intelligence”. This ranks somewhat higher than “gastro-intestinal intelligence” but even that latter digestive ability is something to fall back on if all else fails.

In the public relations campaign for “emotional intelligence”, personal characteristics such as restraint, patience, thoughtfulness, concern about others and suchlike are not considered to be just good manners, or aspects of good character, but evidence of a specific problem-solving ability: the capacity to understand other human beings. When researchers try to bring the concept of emotional intelligence into psychometric assessment, they indeed find that much of it is simply a personality variable. However, there is an interesting possible exception: the capacity to understand depictions of other people’s emotional states. There are some positive findings, though not yet, as far as I know, proper epidemiological studies combining the “emotional perceptiveness” measures with established intelligence and personality measures. Nonetheless, there seems to be a suggestion that understanding others is g loaded (see below).

It was with these thoughts in mind that I took up the opportunity to do an online test of reading emotions, “Reading the Mind in the Eyes”, by Simon Baron-Cohen et al. (1997 and 2001).


I assure you that I approached the task with the greatest sensitivity. The insensitive, brutal and very short answer is that I got 31/36 correct. The longer and more sensitive answer, or excuse, written immediately after getting the results, was as follows: “I have some criticisms, the first of which is that there should be a few trial items, so that you can calibrate how the test uses the description words. On that point, the errors should be graded as “close” errors or “far” errors. Close errors (the most commonly chosen alternative) should get a quarter point, equivalent to an informed guess. Personally, I think I could claim that my first response, marked as an error, was to mark the very first picture as “comforting” rather than the required “playful”. This resulted in my uttering an expletive, and very probably falling under stereotype threat. (Clinical psychologist found failing on a core competence, collapses into greater incompetence.) On strict methodological grounds (aka petulance) I claim 31.25 out of 36.”
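The partial-credit rule I am proposing can be written down explicitly. A sketch (my own, with invented answer words; the real test has its own item list):

```python
# Hypothetical partial-credit scoring for the Eyes test: a full point
# for a correct answer, a quarter point when the error was the most
# commonly chosen ("close") foil, nothing otherwise.
def score_item(answer, correct, closest_foil):
    if answer == correct:
        return 1.0
    if answer == closest_foil:
        return 0.25
    return 0.0

# 31 items right, one "close" error, four plain errors -> 31.25 / 36
responses = (
    [("playful", "playful", "comforting")] * 31
    + [("comforting", "playful", "comforting")]   # the close error
    + [("bored", "playful", "comforting")] * 4    # far errors
)

total = sum(score_item(a, c, f) for a, c, f in responses)
print(total)  # 31.25
```

Whether a quarter point is the right informed-guess credit is of course itself a methodological (aka petulant) choice.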

In fact, the full 2001 paper makes all clear. The revised test does have an introductory item which was not used in the online version, so it is not the author’s fault. Subjects were shown detailed word definitions with examples of usage, so that knocks another quibble on the head. Error rates for each word on the distractor items (foils) are properly listed, so petulant pedants can calculate their own, adjusted and face-saving score. Additionally, there are proper control groups, including an IQ matched control group. The gradient is: autistics 22 points, general population 26 points, students 28 points, and people with IQ 115 get 31 points. Leaving aside those with autism, the last three groups show an intelligence related gradient in the accuracy of their emotional judgments.

All in all, a good paper, with interesting material and good controls. Of course, as a clinical psychologist, I am sensitive to very subtle signs which could not possibly be depicted in an online test. Do we understand each other?

Wednesday 16 October 2013

Heritability estimates and the analysis of variance


This is a note about heritability estimates and the analysis of variance, but if you want the inessential background, Dominic Cummings has published an essay in The Guardian newspaper in which, among many things, he mentions the work of Robert Plomin showing a heritability estimate of 70% for scholastic attainment.

You can read about Plomin’s most recent work on this very blog:

Steve Jones then wrote about heritability estimates in The Daily Telegraph newspaper:

Dominic Cummings then replied:

Triggered by this current debate, I am writing a very short note to cover matters partly raised by Steve Jones which I think need some amplification. The analysis of variance depends upon the setting. If we look at scholastic achievement in present-day Britain, then Plomin’s paper shows that heritability estimates run as high as 68% of the variance. However, if on that basis we were to close all publicly funded schools, in the next decades it is likely (though not certain) that heritability estimates for scholastic attainment would decrease, because there would be an increase in the deleterious effects of the environment. It would move from being almost uniformly good, or good enough, to being very heterogeneous: some kids would get superb schooling and many would get none at all. In terms that R. A. Fisher might have used, if you plant different strains of wheat in uniformly well ploughed, well mixed, well fertilised soil, then the differences between the strains will be due to their inherent qualities, and not the vagaries of the soil. On the other hand, if the soil varies considerably then the yields will vary partly because of seed quality, and partly because of soil quality. The experimental method allows us to tease out these possible sources of difference by comparing the variance between strains with the variance within strains. The results will depend on the strains tested, the soils planted, and crucially on whether we look at one harvest or a whole series of harvests. Long series data are generally the most informative.
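The wheat comparison can be put into numbers (my own illustration, with made-up yields, not Fisher’s data): a one-way analysis of variance splits the total variation into a between-strains component and a within-strains component, and the heritability-style question is what share of the total the between-strains component claims.

```python
# Hypothetical one-way analysis of variance: yields for three wheat
# strains, each planted in four plots of uniform soil (invented numbers).
from statistics import mean

yields = {
    "strain_A": [4.1, 4.3, 4.0, 4.2],
    "strain_B": [5.0, 5.2, 4.9, 5.1],
    "strain_C": [4.6, 4.4, 4.7, 4.5],
}

grand = mean(v for plots in yields.values() for v in plots)

# Between-strains sum of squares: how far each strain mean sits
# from the grand mean
ss_between = sum(len(p) * (mean(p) - grand) ** 2 for p in yields.values())

# Within-strains sum of squares: plot-to-plot scatter inside each strain
ss_within = sum((v - mean(p)) ** 2 for p in yields.values() for v in p)

# Share of total variation attributable to strain. Uniform soil keeps
# ss_within small and pushes this ratio towards 1 - the analogue of a
# high heritability estimate in a uniformly good environment.
share = ss_between / (ss_between + ss_within)
print(f"between-strains share of variance: {share:.2f}")
```

Make the soil patchy (inflate the within-strain scatter) and the same strains yield a much lower ratio, which is the whole point about closing the schools.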

If we look at the variance between one decade and another, say the 1950s versus present-day Britain, we would begin to capture some of the cultural and educational changes over those sixty years. (It would depend on our having historically sound measures of attainment, not always easy to obtain.) That might show great historical improvements in scholastic attainment, which would count as an environmental effect. For example, increasing access to tertiary education should have a positive effect on student knowledge and abilities. It may also be subject to the law of diminishing returns.

A further complication is that the analysis of variance does not immediately pick up and display comparative means. It is just a ratio, after all. For example, if British school children have really become much more accomplished than their grandparents one needs to look at the means over that time period in order to determine that. (The OECD study suggests no difference, see previous post).

It might help if I were to draw all this in a diagram but for the time being I will stick to the medium of words. Earl Hunt goes into the statistics of this matter in his book “Human Intelligence” showing that in the nature/nurture debates about adoption studies some writers look at the gains in the means of intelligence (which do exist), and others look at the correlations between parental and child intelligence (which are substantial). Properly, we should look at both. We need to be scholars, not lawyers, as Hunt tartly observes.

To take another example, height is heritable and also influenced by diet in the longer term. Heights have increased in wealthy Europe over the last 60 years, and it is also still true that some European people are taller than others despite diets now being good throughout Europe. We have to be able to think about two things at once. The Dutch are the tallest nation, which is just as well, poor things. Their heads are at sea level, their feet on vulnerable reclaimed soil.

The analysis of variance depends on the context and historical setting. For example, the Dutch famine in 1944/5 had health effects on children born at the time, but not on their intelligence. It may not have lasted for long enough (for experimental purposes, that is; it was certainly too long for the victims). So, the longer term picture does not always show the environmental effects one expects, but one should always look for them. Usually, bad environments have big effects, but once the environment is OK, if not spectacular, heritability estimates tend to be high. Contrariwise, Flynn effects are continuing in some rich countries in the present day, so if they are due to nutrition then that is a bit odd, because nutrition was at a good standard within a few years of the end of the war. The special issue of Intelligence on the Flynn Effect will be available in December.

I am still reading Dominic Cummings’s book (which is full of lots of interesting stuff) but I doubt he really thinks that heritability estimates imply that we can ignore schools. What I have read so far is to the contrary: he wants to improve schools by making them use better techniques and stay open for more of the year. It doesn’t sound like he is ignoring the environment at all. He wants to improve public education and encourage excellence in teaching.

Looking ahead, if we manage to build a culture which provides uniformly excellent public education, with good nutrition and good standards of living for all, then heritability estimates might be even higher than 68%. Environmental variance will be reduced, and as long as the benign provision of excellent education lasts it will fall out of the equation, only to come back again when the supply of social support fails. On the bright side, while the Nirvana lasts, students may even learn the analysis of variance.

Monday 14 October 2013

All you ever wanted to know about intelligence (but were too bright to ask) Part 4 final


Finally, Deary goes on to discuss the “so-named Flynn effect, whereby the absolute scores on intelligence tests have been rising since testing started in the early-to-mid
20th century. The extent of the rise, its geographical distribution in the world, and especially its causes are all still being studied. Some hypothesise that better nutrition might explain some of the increase, and others put it down to society’s making more accessible and emphasizing the skills tested by intelligence tests.”

For a look at this issue, see “The Flynn Effect Re-evaluated” already in press at Intelligence, but to be collected together in a single special issue by December:

“On the biological side there is research showing that breast feeding is associated with a sizeable advantage in intelligence later in childhood. However, there is also some evidence that this is explained by the higher intelligence scores of the mothers who tend to breastfeed.” I think we have covered that one, for the time being at least.

“Adoption from a deprived to a more affluent setting is reported to be associated
with an intelligence advantage. There is still debate about the effectiveness
of intensive intervention programmes early in life, and whether any cognitive
advantages last or whether advantage accrues to social rather than cognitive
skills.” I will try to post about that sometime.

Deary ends with a plea: “Human intelligence is important; it matters in our lives.”

If you are still on speaking terms with your intelligent friend send them the 4 short posts but DO NOT ASK THEM WHETHER THEY AGREE. Leave them alone to think up their refutations. I mean, what is intelligence, really?

Here are his references:

Deary, I.J. (2012). Intelligence. Ann. Rev. Psychol. 63, 453–482.
Deary, I.J., Penke, L., and Johnson, W. (2010). The neuroscience of human intelligence differences. Nat. Rev. Neurosci. 11, 201–211.
Hunt, E. (2011). Human Intelligence (Cambridge: Cambridge University Press).
Nisbett, R. E., Aronson, J., Blair, C., Dickens, W., Flynn, J., Halpern, D. F., and Turkheimer, E. (2012). Intelligence: new findings and theoretical developments. Am. Psychol. 67, 503–504.
Salthouse, T.A. (2010). Major Issues in Cognitive Ageing (Oxford: Oxford University Press).

All you ever wanted to know about intelligence (but were too bright to ask) Part 3

Consequences of intelligence differences


”People who score better on intelligence tests tend to stay longer in education, to gain higher-level qualifications, and to perform better on assessments of academic achievement. Some of the correlations between intelligence scores at the end of primary school and academic results some years later are high, suggesting that it is not just a matter of education boosting intelligence. Also, educational attainment has a moderately high heritability, and a strong genetic correlation with intelligence. On the other hand, there is also evidence that education can provide a boost to
scores on tests of complex thinking, and some of these increments last into
old age. Therefore, there is probably a bidirectional causal association between intelligence and education.”

Social status and mobility

“People who score better on intelligence tests tend to go into more professional occupations (typically those with higher status) and to perform better
in the workplace. There is a positive association between intelligence test
scores in childhood and social position later in life: people who score higher
tend to be in more professional jobs, to live in less deprived areas, and to
have higher incomes.” By the way, this is not due to people publishing their intelligence test results in their job applications nor, usually, to employers conducting their own intelligence tests (though that would often be useful). These results are obtained by looking at historical intelligence test results years after the children have grown up and finished their professional careers. 

“The association is not perfect. Results show that, when it comes to attained social position in maturity, intelligence, education and parental background all count to
some extent. That is, there is some meritocracy and intelligence-driven social mobility, and there is also some social inertia.”

Intelligence seems to have a cumulative effect, and relates more strongly to occupational and social position later rather than earlier in adulthood. Even bright kids join rock bands for a while. Some do it part time for ever.

Intelligence and health

Intelligence is associated with better health and longer lifespans, but it is not entirely clear why. The early explanation was that more intelligent people learned quickly how to avoid health hazards. They gave up smoking sooner in life, bothered to read the medicine labels, and followed health advice generally. Now it seems possible that both intelligence tests and life itself test a general underlying bodily system integrity, a fundamental mens sana in corpore sano which, if you are lucky enough to have it, gives you health, intelligence and long life without much exertion on your part. Typical, isn’t it, that evolution doesn’t understand human concepts of fairness and equity? Also typical that many very good papers were written showing how intelligent people avoided health hazards, and now it turns out that those will have to be re-written.

Age related cognitive decline

Please read this section slowly, and with great care, because you may be asked questions about it later.

“There are declines in cognitive function even among people who do not develop dementia. Not all cognitive functions decline at the same rate. Some cognitive functions — often referred to as markers of crystallized intelligence — hold up well with age. These include vocabulary and general and specific knowledge. The cognitive
functions that tend to decline are called fluid intelligence. These tend to involve on-the-spot thinking with novel materials, and in situations in which past knowledge is of limited help. This includes abstract reasoning, spatial abilities, processing speed, and working and other types of memory.

“Not everyone experiences the same rate of cognitive decline, and there is a growing interest in the genetic and environmental (biological and social) determinants of people’s differences in age-related cognitive changes. Some of the more solid evidence exists for the following being cognitively protective: not having the APOE e4 allele, being physically more active and fit, and not smoking.”

Two main hypotheses have been advanced. The first is that some people have a “cognitive reserve”, such that their brains are better able to withstand damage, perhaps because a bigger brain provides redundancy, or because some people’s brains are more flexible in reorganizing networks to regain or retain cognitive functions. The second is the common system hypothesis: that age-related decline of different bodily systems is correlated, so that people experiencing faster cognitive declines might also be experiencing faster declines in sensory and some physical functions. This has led researchers to consider inflammation, oxidative stress, telomere length, and the hypothalamic-pituitary-adrenal axis as common causes of variance.

All you ever wanted to know about intelligence (but were too bright to ask) Part 2


What are the causes of differences in intelligence? Twin and adoption studies show that intelligence is about 50% heritable. Counter-intuitively, heritability estimates increase to 80% as you age and put many years between yourself and your family’s influence on you. It looks like genes take time to express themselves, or set in motion processes which take time to develop.

If you measure a diverse range of skills you can show that g is highly heritable, but there is less genetic influence that is specific to each domain. Heritabilities are probably higher in richer people, and lower in deprived groups (suggesting the environment has most effect when it is very bad).

There is no “gene for intelligence”, with the exception of the APOE gene, where individuals with one or two e4 alleles tend to have lower ability in old age, and declining cognition across their lifetimes. (Reassuring that my condition has a name.)

Genome-wide association studies haven’t come up with very much, yet. There is some molecular genetic evidence that some variance in intelligence is detected by single nucleotide polymorphisms. Applying genome-wide complex trait analysis (GCTA), between a quarter and a half of intelligence variance can be accounted for by variants in linkage disequilibrium with common SNPs. This analysis cannot identify the precise causal genes. It suggests that intelligence is highly polygenic, with large numbers of variants of small effect sizes.

The genetic correlation between intelligence measured in childhood and old age in the same individual is high: to a substantial extent the same genes cause higher intelligence in both childhood and old age. All these studies require very large sample sizes (125,000 is a good number) and allow major risks to be computed: in individuals who are at genetic risk for schizophrenia, but have not exhibited the disorder, cognitive ability in old age is lower than in those without schizophrenic genetic risk.

With regards to the much vaunted effects of the environment, twin studies suggest that the contribution of the shared environment (family for example) to intelligence is small to negligible by adulthood, and the remaining variance is due to individually created environments and error.

Brain correlates of intelligence differences

There is a general finding that there is a modest correlation of 0.30 between intelligence test scores and brain size, and a similarly sized correlation between intelligence and the general integrity of the brain’s white matter, as measured by diffusion tensor MRI. Sample sizes here are of the order of 500 persons. The association is largely accounted for by people’s differences in speed of processing. Cleverer brains seem to be more efficient. Consequently, they can think about complex problems faster and for longer, thus being more likely to find solutions.

You probably already knew that.

All you ever wanted to know about intelligence (but were too bright to ask) Part 1


There is a recently published Primer in Current Biology on Intelligence written  by Ian Deary, which is hidden behind a paywall, lest it be read by anybody other than a fully paid up current biologist. Strange world, the one in which scholars write things for nothing in order that other citizens should have to pay for the privilege of reading them. Such payments made some sense when publishing required printing, but rather less now, when the transmission of bytes is close to free.

Primer. Intelligence. Ian Deary. Current Biology Vol 23 No 16 R673  2013.

I will be using the article as a framework for a series of posts, taking one theme at a time, picking out some highlights and adding some extra bits. My target audience is bright people who don’t believe in intelligence. For a variety of reasons, they think it unseemly to acknowledge that they can think faster than others. This is not altogether stupid, because in many genocides it is the intelligent who get slaughtered. Excessive modesty may have some survival value.

“Some people are cleverer than others. It is a prominent and consistent way in which people differ from each other; the measurements we make of people’s cleverness produce scores that are correlated with important life outcomes; it is interesting to discover the mechanisms that produce these individual differences; and understanding these mechanisms might help to ameliorate those states in which cognitive function is low or declining.”

Deary distinguishes between cognitive psychologists who are trying to find out how the mind works and differential psychologists who mostly focus on how people differ in the workings of their minds. The latter try to show precisely the ways in which people differ, and try to discover the causes of those differences.  The two tribes don’t communicate very well. Cognitive psychologists, in my view, are missing a trick. A very brief vocabulary and/or digit span test or, with more time, a group intelligence test, would give them important data, and help place their results in the context of human differences.

Deary identifies four major sources of scepticism about intelligence:

1 The concept appears to be too general. People argue that they are better at some skills than others, and assume different modules are involved, such that we are all good at some specific mental skill.

2 Historical events in intelligence research which are discreditable. In the UK, the 11+ missed out people who later showed demonstrable talent; there have been cases of probable fraud in reporting results; overuse and over-interpretation of intelligence tests; controversies about intelligence differences between ethnic groups; and claims that “ordinary” intelligence has now been replaced by tests of “multiple” intelligences.

3 “It is possible that clever people develop a kind of cognitive noblesse oblige; they kind of know they have won the lottery on a valuable trait, but they think it is bad form to acknowledge it.”

4 They probably haven’t read good quality research on the topic.

I find that most of the hostility about intelligence comes from bright people, who keep up with the broad sweep of newspaper reports and popular books, but have not looked at good quality research. Despite this lack they are surprisingly vehement when they ridicule IQ.

So, it would appear that intelligence is most disparaged by the intelligent. “Define intelligence” they demand, with a knowing smile. Personally, I have found that the only answer is to show them a photo of George W. Bush. (But I am getting ahead of myself). There are three types of answer: a quip, an explanation, and a formula.

The Quip: “Intelligence is what you need when you don’t know what to do”. Carl Bereiter coined this elegant phrase. It captures the ultimate purpose of intelligence, which is to help you cope with the unknown. The best intelligence test is the puzzle to which no one knows the answer. For example, is there a detectable particle which gives objects mass?

The Explanation: “Intelligence is a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts. Rather, it reflects a broader and deeper capability for comprehending our surroundings — ‘catching on,’ ‘making sense’ of things, or ‘figuring out’ what to do.” Linda Gottfredson and 52 leading psychometricians agree with this explanation.

The Formula: g + group + specific skill + error, where g accounts for about 50% of the variance. (I have written this in English, but it should be displayed in eigenvalues).



So, let us look behind the formula (in English). “In 1904 Spearman found that people who perform well on one type of cognitive test tend to perform well on others. That is, if cognitive test scores are ordered so that better performance equals a higher score, the correlations between them are all positive. There is shared variation among all types of cognitive performance. Spearman called this shared/common variance g: an abbreviation for general intelligence. In the 100+ years since then, every study that has applied a diverse battery of cognitive tests to a decent-sized group of people with a mix of ability levels has re-discovered the same thing: there is some cognitive variance shared by all cognitive tests. Typically, if one applies principal components analysis, just under half of the total test score variance is accounted for by the first unrotated principal component.”
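A toy simulation (mine, not Deary's; the one-factor structure and the parameters are purely illustrative) shows how a single shared factor produces Spearman's all-positive correlations and a first unrotated principal component carrying roughly half the total variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_tests = 20_000, 8

# One general factor g, plus test-specific noise of equal variance,
# so g contributes about half of each test's variance.
g = rng.standard_normal(n_people)
scores = g[:, None] + rng.standard_normal((n_people, n_tests))

# The positive manifold: every inter-test correlation is positive.
corr = np.corrcoef(scores, rowvar=False)
print("all inter-test correlations positive:", (corr > 0).all())

# Share of total variance carried by the first (unrotated) principal component.
eigenvalues = np.linalg.eigvalsh(corr)  # ascending order
print("first PC share of variance:", round(eigenvalues[-1] / n_tests, 2))
```

With these made-up parameters the first component accounts for a little over half the variance; real test batteries, with unequal loadings and group factors, come out at "just under half".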

This finding of 50% of the variance in ability being due to g is matched by another finding:  IQ-type test scores are highly reliable, and highly stable. For example, when the same intelligence test is taken at age 11 years and repeated at almost 80, about 50% of the variance is stable.
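The arithmetic behind "about 50% of the variance is stable" is simply a squared correlation; assuming a test-retest correlation of roughly 0.7, which is about what Deary's long-interval follow-up studies report:

```python
# Variance explained is the square of the correlation coefficient.
# r = 0.7 is an approximate figure for the age-11 to old-age stability.
r = 0.7
print(f"shared (stable) variance: {r ** 2:.0%}")  # 49%
```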

So, half of intelligence is due to a common factor, and half of the variance is stable throughout life.

Unfortunately, by this stage in the argument, we will have lost probably half of the intelligent readers. They are still smiling at the idea of anyone defining their intelligence. Can you please send them this link?

Friday 11 October 2013

How illiterate is the OECD?


The OECD has conducted a study of adult skills in some of the wealthy countries of the world, and the UK papers are aghast at the results. The UK has done badly, with many heavily educated British youths knowing less than their more lightly educated elders. Cue for outrage, hurt feelings, and political posturing. If you look at the actual publication, you will find that the key OECD results have been written up in a corporate format: suitably uplifting photos of students staring intently at their homework lurk in the background as the Secretary General makes his opening remarks:

“If there is one central message emerging from this new survey, it is that what people know and what they do with what they know has a major impact on their life chances. The median hourly wage of workers who can make complex inferences and evaluate subtle truth claims or arguments in written texts is more than 60% higher than for workers who can, at best, read relatively short texts to locate a single piece of information. Those with low literacy skills are also more than twice as likely to be unemployed.”

Well, blow me down. Some people are brighter than others. This is the finding which has emerged from intelligence research over a century. Read Linda Gottfredson, “Why g Matters: The Complexity of Everyday Life” (1997), even if only page 117, for an explanation of the relationship between literacy, learning and intelligence.

Then, for an explanation of the relationship between intelligence at 11 and scholastic attainment at 16, read Ian Deary.

Finally, for an explanation of the relationship between intelligence and the time taken to learn a skill to an adequate standard, look at this summary from Gottfredson: “Men of low ability (10th to 30th percentiles) took about 12 to 24 months to catch up with men of higher ability (above the 30th percentile) who had only 3 months’ experience on the job.” (Schmidt and Hunter (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, Vol 124(2), 262-274; Vineberg & Taylor (1972). Performance in four army jobs by men at different aptitude (AFQT) levels, pp. 55-57.)

For this reason the US Army has always been allowed to use “ability” (intelligence) tests to select its recruits, and is allowed to reject low-ability applicants simply because it would take too long to teach them the necessary skills. The military are sitting on a treasure trove of data on how long it takes to train people at different levels of intelligence, and on how far recruits at various levels of intelligence can think for themselves beyond their training. They go for the brightest recruits every time. The higher the rate of unemployment, the better the class of recruits who present themselves for a well-paid, if sometimes dangerous, job.

The Army do not care what genetic group recruits come from. In that sense the military are race integrated because they are intelligence integrated: if you can make the grade, you get the job. The US armed forces don’t talk too much about intelligence because they fear they might be prevented from using their best weapon: IQ tests to get good quality people. They have government permission to use intelligence tests, and to reject those who are not intelligent enough, and they don’t want to lose that privilege, as many other public service employers have done. They keep their procedures very complicated and their results obscure.

Anyway, what else has been left out of the report, apart from human intelligence? The OECD view is that the problems arise from people not having the “skills”. Give them the skills and all will be well. That is true, but only given lots of time, patience and resources. As Gottfredson has pointed out, training skills in people of low ability is a very long-drawn-out process, and the training does not generalise easily to other skills. Her work on the Wonderlic personnel selection test (designed by researchers who did not believe in general intelligence) is, paradoxically, one of the best proofs of general intelligence. Training someone of low ability to do a particular task does not generalise very much at all to other tasks. You are better off getting someone who learns everything at a faster rate.

On the OECD results the UK is placed slightly below average in literacy. National results do not mean very much unless you analyse immigrants separately. PISA does this, and the scores show that immigrants score lower than locals even in the second generation, although the second generation is usually better than the first. When the rate of immigration is high, national “skill” levels drop (except in countries with low intelligence levels which import brighter foreigners to run things).

The much repeated finding that UK youngsters are no better than their elders turns out to be a bit misleading. Both young and old in the UK are within measurement error of each other. It is simply that British youngsters have not shown the gains shown by Korean youths. Not surprising. The British 1870 Education Act ensured access to education long ago. Korea achieved it recently.

I turned to the full report. This shows that the samples in each country were initially assessed regardless of nationality, which includes immigrants. Later in the report they are studied separately, but without separating recent arrivals, and without identifying the immigrant groups in question, despite their differences in ability, sometimes above but usually below the locals. As far as I can see, the report treats immigrants as a fungible commodity. There are later analyses somewhere in which personal and parental education are mentioned, but finding the real data in this publication is difficult and time-consuming.

The following are excerpts from the summary which may surprise for their obtuseness:

“Most of the variation in skills proficiency is observed within, not between, countries.”  Bell curve? Mean differences always smaller than individual differences?

“In all but one participating country, at least one in ten adults is proficient only at or below Level 1 in literacy or numeracy. In other words, significant numbers of adults do not possess the most basic information-processing skills considered necessary to succeed in today’s world.” Bell curve? Every distribution has a lower range?

The authors seem incapable of understanding the normal distribution of human abilities. I decided to skip their carefully crafted presentation, and have a look at the methods section. Here are some scattered findings on the way: the Russian sample omits Moscow; some countries have over-sampled minorities; sample sizes range from 4,500 to 27,000, so the authors are right to say we need to pay close attention to the standard errors of the estimates. In fact, I found out much later, in the Reader’s Companion, that almost all countries were at the 4 to 6 thousand sample size, which is OK but not great, and only Canada managed 27,000. Deary’s work on IQ and scholastic attainment included 70,000 children, and that was for just one academic paper. Basically, these researchers do not appear to have used entirely proper epidemiological samples, though they certainly used national registers. It is hard to find a single table which compares sample characteristics with population characteristics, let alone a chi-square test to identify discrepancies.

The correlation between proficiency in literacy and numeracy at the individual level for the entire sample is 0.87 (see Figure 2.9). This strongly suggests a common factor, but this is not discussed. Why not show a correlation matrix of the main cognitive variables and do a principal components analysis? Numeracy, they say, has a stronger relationship to wages than literacy does. Yes, because it is a better measure of intelligence: it is more demanding. These authors tend to list their results rather than try to understand them, and all the important matters are strung out in a series of addenda.

There are no mentions of “intelligence” in the text, but the word can be found in two of the references. Presumably the censor missed those, or simply had to concede that some researchers use the term. No mention of “genetics”. 76 uses of “ability”. 68 uses of “cognitive”. Note these code words for your corporate survival. You may have ability, you may even have cognitive ability, but woe betide you if you have intelligence.

“Across the countries involved in the study, between 4.9% and 27.7% of adults are proficient at the lowest levels in literacy and 8.1% to 31.7% are proficient at the lowest levels in numeracy. At these levels, adults can regularly complete tasks that involve very few steps, limited amounts of information presented in familiar contexts with little distracting information present, and that involve basic cognitive operations, such as locating a single piece of information in a text or performing basic arithmetic operations, but have difficulty with more complex tasks.”

That, in a nutshell, is the problem of the normal distribution of skills. You can shift the distribution downwards or upwards. The shape will change somewhat depending on what sort of factors are keeping people back (disease, malnutrition, social restrictions). You cannot get rid of variation. However you define the levels, and wherever you set the cut-offs, you will find a distribution of abilities. How you deal with such disparities is a social issue. You cannot educate people out of showing individual differences, not if you are honest about displaying the results. So, if you do not want “any child left behind” you will have to prevent all children from working at their own pace. The slowest pace will have to be imposed upon all. Finally, although the authors do not intend it, the description they give in the above paragraph is a good explanation of what it means that one person is more intelligent than another.
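The point about cut-offs can be made in a few lines (the numbers are illustrative only, on an IQ-style scale): shifting the whole distribution upwards shrinks the group below a fixed threshold but never empties it, and a percentile cut-off cannot be emptied at all, by definition.

```python
from statistics import NormalDist

# Hypothetical IQ-style scale: mean 100, sd 15, then the whole
# distribution shifted up by 10 points (e.g. by better schooling).
before = NormalDist(mu=100, sigma=15)
after = NormalDist(mu=110, sigma=15)

cutoff = 85  # an arbitrary fixed "Level 1"-style threshold
print(f"below cutoff before shift: {before.cdf(cutoff):.1%}")  # 15.9%
print(f"below cutoff after shift:  {after.cdf(cutoff):.1%}")   # 4.8%

# A relative cut-off can never be emptied: the bottom decile is always 10%.
print("score at the 10th percentile after the shift:",
      round(after.inv_cdf(0.10), 1))
```

However high the mean rises, some score always marks the 10th percentile; only the absolute level of skill at that percentile changes.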

More gems: “Foreign-language immigrants with low levels of education tend to have low skills proficiency. Immigrants with a foreign-language background have significantly lower proficiency in literacy, numeracy and problem solving in technology-rich environments than native-born adults, whose first or second language learned as a child was the same as that of the assessment, even after other factors are taken into account. In some countries, the time elapsed since arrival in the receiving country appears to make little difference to the proficiency of immigrants, suggesting either that the incentives to learn the language of the receiving country are not strong or that policies that encourage learning the language of the receiving country are of limited effectiveness.” Note that the differences in skills are attributed to language alone, and that lack of proficiency is treated as a matter of incentives and policies. Language may be part of the picture, but the authors do not consider that ability levels may vary between immigrants and locals regardless of language.

Dissatisfied, I turned as a last resort to the Reader’s Companion. 

“Read this one first”, I thought, but it was a disappointment. Finally, I realised I needed to read the Technical Report. At this stage I gave up, fearing, dear reader, that you would have lost interest long ago. For all I know, there are secret messages in the repetitive slabs of tabulated data. I could not find a humble table of correlation coefficients between the main measures, let alone a factor analysis. There are some regression lines for country data, which are most welcome. Otherwise, it is death by a thousand tabulations.

Frankly, this is less well described than the average social science paper, and that is saying something. The whole thing is back to front: policy implications and conclusions are proclaimed first, then more conclusions are trumpeted, then some findings are picked out, and then finally, way in the background, they reveal some of the things you need to know to figure out if they have got it right, or even vaguely right. Perhaps our standard sequence in academic papers makes sense after all: explain the problem, explain the subjects and the methods, describe the results, discuss them including explaining why they may be wrong. And avoid having anything to do with the production of corporate brochures.

This lump of a report is not all bad. One can compare one country with another, which is the sort of thing governments like doing. I really believe that somewhere in the mass of the extended report there may be good things. On a broader matter, I am in favour of people measuring skills. That makes sense, because employers need a skilled workforce, and economies prosper if skilled people are moved to where they can contribute most. The report contains descriptions of skill levels, and that is a good thing. Skills make sense, and if you say that someone has the skill to drive a car, but not the skill to service a car (another Gottfredson quip), that immediately makes sense to most people. We can distinguish between a driver and a mechanic. We can also understand that someone who can only handle one concept at a time should not be given the task of integrating disparate conceptual inputs. That cuts out being in the control room of most industrial processes. It does not preclude employment as a university teacher, where to manage one concept may lead to a successful career.

The problem with the “skills only” approach is that it strongly implies that it is only a matter of getting the right teacher and the right attitude, and then you can master all tasks. If only the OECD could help me with structural equation modelling! Even a small grant would make all the difference.

If you want to say anything useful at all about why people don’t have the required skills you have to have a measure of their ability on the one hand, and a measure of the effectiveness of teaching on the other. (In that way you can judge to what extent and for which pupils teaching makes a difference).  Absent either of those measures, you have an interpretive problem. Absent both, you have a muddle.

Tuesday 8 October 2013

80 years on: classical and operant conditioning: the genetics

No sooner do I admit that I find it hard to relinquish any idea I have put on a slide and lectured upon more than three times, than a message comes through about an ancient debate on the differences between classical and operant conditioning.

Naturally, I always had a slide which compared the visceral nervous system, high emotional tone and basic appetitive focus of classical Pavlovian conditioning with the lower emotional tone and higher cognitive focus of operant Skinnerian conditioning. Classical effects accounted for trauma, operant effects for ordinary learning. So I had attempted a very crude differentiation, but was aware that deeper work was going on, calling into question this particular divide and looking at what was happening from an expectations perspective, grounded in the thought that the animals were trying to work out the contingencies in all cases.

Now Björn Brembs, Professor of Neurogenetics at Universität Regensburg, says that, having stuck with this issue whilst most others had abandoned it, he has come up with a unifying explanation, backed up by genetics. He says:

“Operant and classical processes can be genetically separated, using the right behavioral experiments. What made these processes different was not how the animal was learning (i.e., operantly or classically), but what it learned (i.e., about external stimuli, e.g. Pavlov’s bell, or about their own behavior, e.g. pressing the lever in a Skinner box). Thus, in order to avoid confusion between the procedures (operant vs. classical) and the mechanisms, we had to come up with descriptive terms for the learning mechanisms. We arrived at ‘world-learning’ for the mechanism that detects and processes relationships in the external world and at ‘self-learning’ for the mechanism that detects and processes the consequences of an animal’s own behavior.”

Part of the resolution of the problem lay in realising that the response key in the Skinner box was acting as a Pavlovian conditioned stimulus for food, thus thoroughly confusing the picture. I am sure I gave the learning theory lecture at least fifteen times, and never thought of that, though I often had difficulty working through how the concepts mapped onto the broad range of human and animal learning.

I cannot give you more details, because the paper will be presented at the Winter Conference on Animal Learning and Behavior in February 2014. More details here. However, when the paper comes out, I will be the experimental animal. Will I embrace the new finding, or hold fast to my old slide (which I cannot find at the moment)?

I have written this note for two main reasons.

1 I often wonder if anyone follows up on old psychology dilemmas. We should be a progressive discipline. We won’t advance as a science if we just abandon difficult issues, so this is a very welcome finding, and very much worth a look.

2 On a somewhat disconsolate note, Björn Brembs accepts that he will be speaking to a few dozen people at most, those being the few who are still interested in the issue and have survived long enough to hear about a potential solution. Perhaps you can drum up some people for the conference.

Now, sit back and relax while I place you in the experimental box.

Loci and genetic groups: The Keyhole Problem


It may be belabouring a point, but some errors take on a life of their own, and are resistant to disproof. Presumably they meet some need, personal or social. We all have favourite arguments and treasured ideas. We tend to abandon such positions reluctantly. It takes an open mind to go with the evidence, particularly as it sways to and fro. The scientific ideal of enthusiastic open-mindedness to new ideas, followed by the dispassionate evaluation of those notions, is hard for all of us to achieve. In my case I am reluctant to abandon any position which I have depicted in a complicated slide and lectured on more than three times. Call it the Powerpoint Theory of Perverse Persistence.

At present people still argue that there cannot be real genetic groups similar to traditional races, because there is more variance within races (85%) than there is between races (15%). This argument was put forward by Lewontin in 1972, so the fact that it is still being discussed shows it has stayed in popular consciousness for 41 years. I should like to believe that some of my arguments might remain interesting for 41 weeks, but Lewontin’s is a meme which has survived with a vengeance.

Let us try to understand this statement by considering the traditional racial classifications of Black Pigmies and White Europeans. If the statement about variation is to be taken seriously, it means that there is more variability within Black Pigmies than there is between Black Pigmies and White Europeans. This is an odd assertion. Both groups have things in common which make it easy to distinguish one from the other. Skin colour, for one thing. This is why we refer to “white” Europeans and “black” Pigmies. So, is there more variation in skin pigmentation within whites than between Europeans and Pigmies? No. This simple point was raised by G. Cochran, and should have been enough to dispose of the matter. However, it is possible that followers of Lewontin might argue that skin is a special case, and that they are referring to other human characteristics. This is a significant concession. Presumably it means that skin must be considered part of the 15% which varies between groups more than it varies within groups. Perhaps the 15% contains most of the socially significant traits, such as personality and intelligence.

However, Cochran goes on to wonder whether Lewontin’s argument might apply to height, which is brought about by very many small genetic effects, rather than just a few genes as in the case of pigmentation. Not so. Pigmies are all short, and neighbouring Bantus are as tall as Europeans.  In a mixed population, part Bantu part Pigmy, height is determined by the proportion of Bantu ancestry. The Lewontin variance approach is found wanting.

In some exasperation Cochran writes: “So Lewontin’s argument does not work.  You can’t predict group differences in trait values from the distribution of genetic variation – except in the limiting case where all of the variation is within-group, which means that the two populations are genetically identical.  You know you can’t apply it to other traits, whether they are influenced by a few genes or by many.  It’s not essential to know _why_ it doesn’t work – the mere fact that its predictions don’t come true is reason enough to discard it.”

So, why don’t people discard the “more variation within races” argument? Why don’t all commentators discard it? Cochran continues:

“We do know why, though. Selection generates correlated genetic differences. Selection for increased height causes changes in the frequency of many alleles, in principle at all loci that influence height, although that is still a small subset of the genome.   What matter is the difference in that subset: the overall distribution of genetic variation tells you nothing.  Moreover, imagine that in the ancestral population, there were two alleles for each of those loci – a short allele with a frequency of 0.7 and a tall allele with a frequency of 0.3. Suppose that after selection for height, the frequency of each short allele was 0.3 and the frequency of the tall allele was 0.7.   This could significantly increase height. In that subset of the genome, about 85% of the variation between those two population is within-group  while 15% is between-group.”
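Cochran's 85/15 figure can be checked directly. For a single biallelic locus with the tall allele at frequency 0.3 in one population and 0.7 in the other, the standard apportionment of heterozygosity (an F_ST-style calculation; my sketch, not Cochran's) gives:

```python
# Apportion genetic variance for one biallelic locus across two equal-sized
# populations, using the standard F_ST-style heterozygosity decomposition.
# Frequencies are Cochran's hypothetical: tall allele at 0.3 vs 0.7.
p1, p2 = 0.3, 0.7

h_within = sum(2 * p * (1 - p) for p in (p1, p2)) / 2  # mean within-group heterozygosity
p_bar = (p1 + p2) / 2
h_total = 2 * p_bar * (1 - p_bar)                      # heterozygosity of the pooled population

within_share = h_within / h_total
between_share = 1 - within_share
print(f"within-group:  {within_share:.0%}")   # 84%
print(f"between-group: {between_share:.0%}")  # 16%
```

Even with about 85% of the variance at these loci lying within groups, every height-raising allele has moved in the same direction, so the group difference in the trait itself can still be substantial, which is exactly Cochran's point.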

In the words of the song, not the technical terminology of geneticists, it is a case of “You got the Right Key, but the wrong Key Hole”. By a process of selection the frequency distribution of Long Keys has changed, but the overall number of keys and locks has not changed. The change has come about because there are now more functional links between Long Keys and Key Holes, resulting in generations getting taller and taller.

Evidently, since the 85-15 variance argument persists, this explanation needs to be given several times in different forms. Imagine you are in charge of a jail, and hold the key to each cell. You are told that the inmates are of different races, or different religions, or have different views on the relative contributions of nature and nurture, or just vary considerably in height. Whatever the reason, they tend to assault each other during exercise periods. Your task is to let the inmates get exercise without rioting. Using one selection of keys you release only one set of prisoners. Perhaps it is the short prisoners. They exercise, in their short way. Once they are back in their cells you release the tall ones, and they exercise in their lofty way. Neither the number of keys nor the number of locks has altered, but anyone closely observing the exercise yard would notice a significant difference in the two sets of prisoners. It is the subset of activated key/lock combinations which has caused the changes in the prison population.

Can you please find someone who still believes the Lewontin argument, and try my version out on them?  I may need to find yet further ways to explain it.