Sunday 30 November 2014

Intelligence lost at 1.23 IQ points per decade


Michael Woodley of Menie spends much of his time tending his ancestral estate, pacing the linen-fold panelled rooms of the ancient house, warming his hands at the towering stone fireplace and meditating on the collapse of the aristocracy, the paucity of contemporary innovation and the lamentable and persistent downward drift of the national intellect. Now he sends me a barefoot runner with his latest manuscript, which I have read as the autumn mists creep across the Nadder valley, before penning this reply for the poor urchin to carry back to his master.

Young Woodley avers that, not only are we going to hell in a handcart, but we are doing so at a pace which he can predict with some accuracy (1.23 IQ points per decade), composed as it is of two dysgenic effects: the dull have been reproducing with greater fecundity than the bright (.39), and increasing paternal age has increased the rate of deleterious mutations (.84).

In the spirit of the age, and with an ever-present concern for your health and safety, those of you who are of nervous disposition or advancing paternal age should turn away, and listen to light classical music.

How fragile is our intellect? Estimating losses in general intelligence due to both selection and mutation accumulation. Michael A. Woodley of Menie. Personality and Individual Differences 75 (2015) 80–84

General intelligence is an adaptation to solving evolutionarily novel and domain general fitness problems, i.e. problems which occur on an irregular rather than predictable basis throughout the course of evolution and which are complex, requiring the recruitment and coordination of large numbers of specialized mechanisms in solving them (Geary, 2005; MacDonald, 2013).

it has been argued that deleterious mutations exhibiting small effects accumulate within a population – persisting within genomes for long periods of time before substantially inhibiting fitness, thus giving rise to individual differences in general intelligence (g) and other potentially mutation-sensitive traits, such as health and physical attractiveness (Miller, 2000a,b; Penke, Denissen, & Miller, 2007).

The Breeder’s equation (Fisher, 1929) is frequently employed in studies of this kind:

$$R = S \times h^2$$

In this equation, S constitutes the size of the selection pressure operating on IQ transformed into a phenotypic change (i.e. the degree to which the trait will change over a generation assuming no biological regression to the mean, or perfect heritability). h2 represents the additive heritability of IQ. The product of these two terms gives us the expected responsiveness to selection, or R, which in terms of IQ is scaled as a change in ‘genotypic IQ’, or the degree to which the underlying genetic potential for a certain level of IQ should decline per generation (Lynn, 2011).
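
As a minimal sketch of the Breeder's equation as defined above, the response R is the selection pressure S scaled by the additive heritability h². The values of S and h² below are hypothetical. They show only the arithmetic and are not Woodley's estimates.

```python
def response_to_selection(selection_differential: float, heritability: float) -> float:
    """Breeder's equation, R = S * h^2.

    selection_differential: phenotypic change per generation if heritability
        were perfect (S)
    heritability: additive heritability of the trait (h^2), between 0 and 1
    Returns the expected per-generation change in the genetic ('genotypic IQ') mean (R).
    """
    if not 0.0 <= heritability <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")
    return selection_differential * heritability


# Hypothetical inputs, for illustration only
S = -1.0   # selection pressure equivalent to -1 IQ point per generation
h2 = 0.6   # additive heritability of IQ
print(f"R = {response_to_selection(S, h2):.2f} genotypic IQ points per generation")
```

With these made-up numbers, only 60% of the phenotypic selection pressure carries through to the next generation's genetic potential.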

Woodley does a meta-analysis of 9 studies in the US and UK (chosen simply for their cultural similarity) to estimate this dysgenic effect on heritable g. This comes to an estimated decline in heritable g of 0.385 points per decade.

It may be possible to crudely estimate the impact of mutation accumulation on g. Ideal for this purpose is the study of Kong et al. (2012) in which the numbers of de novo mutations in offspring were counted and correlated with the age of their fathers. Kong et al. found a strong linear relationship between the two (r = .97) amongst a sample of 78 Icelandic parent–child trios. They estimated an average increase in the offspring’s number of de novo mutations at 2.01 per year of paternal age. At 35 years of age, i.e. one familial generation (Kong et al., 2012), fathers are producing offspring with an average of 70 de novo mutations. Using this estimate, it is possible to determine the relationship between the increase in de novo mutation and IQ-loss as a function of paternal age.

These calculations are a little more complex, because they require re-examination of previous “corrections” for birth order. Perhaps there is a book to be written about the assumptions which underlie all statistical adjustments in psychology papers, to be called “The Correction of Corrections”.

The meta-analytic aggregate estimate of the loss in heritable g due to selection (estimated in Section 2 at q = .39 points per decade) can be combined with the loss expected from mutation accumulation, which was estimated at .84 points per decade. As the latter estimate was derived using structural equations modelling, error has been controlled, therefore the loss due to mutation accumulation is symmetric in terms of reliability and validity with respect to the meta-analytic loss due to selection.

The sum of the selection and mutation accumulation losses (i.e. the overall dysgenic loss) is therefore 1.23 points of heritable g per decade, or 4.31 points per familial generation. If the 95% confidence interval for the decline estimate due to mutation accumulation (the paternal age effect; i.e. 1.53–.14 points per decade) is generalized to the sum of both estimates, this yields upper and lower bound decline values ranging from 1.92 to .53 points per decade, or 6.72 to 1.86 points per familial generation.
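
The arithmetic in the passage above can be checked with a short sketch. All the figures are taken from the text. A familial generation is 35 years, so per-generation values are the decadal values multiplied by 3.5. The text rounds 4.305 to 4.31 and 1.855 to 1.86. The mutation-accumulation interval is applied to the sum by adding the selection loss to both of its bounds.

```python
SELECTION_LOSS = 0.39          # loss due to differential fertility, points per decade
MUTATION_LOSS = 0.84           # loss due to mutation accumulation, points per decade
MUTATION_CI = (0.14, 1.53)     # 95% CI for the mutation-accumulation loss
GENERATION_YEARS = 35          # one familial generation, as in the text


def per_generation(loss_per_decade: float, years: float = GENERATION_YEARS) -> float:
    """Rescale a per-decade loss to a per-generation loss."""
    return loss_per_decade * years / 10


def dysgenic_summary() -> dict:
    total = SELECTION_LOSS + MUTATION_LOSS
    # Generalising the mutation CI to the sum shifts both bounds by the selection loss
    lower = SELECTION_LOSS + MUTATION_CI[0]
    upper = SELECTION_LOSS + MUTATION_CI[1]
    return {
        "total_per_decade": total,
        "total_per_generation": per_generation(total),
        "bounds_per_decade": (lower, upper),
        "bounds_per_generation": (per_generation(lower), per_generation(upper)),
    }


s = dysgenic_summary()
lo_d, hi_d = s["bounds_per_decade"]
lo_g, hi_g = s["bounds_per_generation"]
print(f"per decade:     {s['total_per_decade']:.3f} (bounds {lo_d:.3f} to {hi_d:.3f})")
print(f"per generation: {s['total_per_generation']:.3f} (bounds {lo_g:.3f} to {hi_g:.3f})")
```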

One potential objection to the finding that g is declining by the amount claimed here stems from the Flynn effect, which is associated with an average increase in IQ of three points per decade (Flynn, 2009). The Flynn effect is however least pronounced on the most heritable and also g loaded IQ subtests (Rushton & Jensen, 2010; te Nijenhuis & van der Flier, 2013), which indicates that secular IQ gains occur at the level of less heritable and narrow abilities, rather than on g. Selection effects and the effects of mutation load on IQ are however more pronounced on the most g loaded subtests (Peach et al., 2014; Prokosch, Yeo, & Miller, 2005; Woodley & Meisenberg, 2013). On this basis dysgenic effects and the Flynn effect could co-occur – with dysgenics reducing the level of heritable g, and various environmental improvements raising narrow abilities simultaneously, via their effects on non-g variance. This hypothesis has been termed the co-occurrence model (Woodley & Figueredo, 2013).

So, there is a prompt and visible effect caused by the liberal use of fertilizer (environmental improvements) giving us a gain of 3 IQ points per decade, and a less visible and insidious worsening of seed quality (mutation and differential fertility) losing us 1.2 IQ points per decade. All boats rise on the rising tide of affluence, but some are leaky.

Finally, Woodley links this finding to his work showing that reaction times are slowing up, which he sees as confirmation of the same dysgenic trend.

So, in reply to young Woodley, ever conscious that grim calculations of this sort might cause dismay to sensitive souls, I have placed another log on the fire, and sent him the cheerful and uplifting words of Walter Savage Landor:

I strove with none, for none was worth my strife.
Nature I loved and, next to Nature, Art:
I warm'd both hands before the fire of life;
It sinks, and I am ready to depart.

Wednesday 26 November 2014

Gypsy intelligence

I never spent much time thinking about gypsies. I had assumed that gypsies were gypsies, lived in caravans, bred horses and played violins in restaurants. People do stuff. It is probably better to earn money in a restaurant than to spend it there. With the passage of time I became more curious, particularly when definitional battles began to rage about travellers, itinerants, and the Romany peoples, with various spokespersons claiming priority in representing their interests, which often seemed in direct contradiction to other more settled people’s rights. Although all peoples are as ancient as other peoples in chronological fact, the Roma sometimes seemed to be claiming chronological priority, at least as far as their nomadic way of life was concerned.

So, it was with interest and some trepidation that I opened Jelena Cvorovic’s “The Roma: A Balkan Underclass” (Ulster Institute for Social Research, 2014; ISBN 978-0-9573913-9-0).

Cvorovic concentrates on the Serbian Roma, with whom she has worked for 10 years. I had previously seen a film she had produced, in which different gypsy leaders spent much of their interview time explaining that their particular group were the real thing, and that the other gypsy groups lacked racial purity, and were giving the true Roma a bad name. Somehow, this clashed with the narrative I was expecting, and was possibly willing to support, that they were a minority who had been given a hard time. The film showed disordered settlements, and children living in severe poverty, some giving every appearance of mental backwardness.

Books are a better medium than film to get into details (though the film certainly had an impact). Cvorovic gives the quick background: the Roma are socially excluded (and exclude themselves) with life expectancies 10 to 15 years lower than the European norm, high infant mortality, and an 80% unemployment rate. The Roma, Gypsies, Travellers, Cigani, Manouches, Sinti showed up in Europe from the North West of India between the ninth and fourteenth centuries. No-one knows why. There are an estimated 10 to 12 million living on the margins of European society, either in niche occupations or “living off the land” which in some cases means living off other people’s property. Their code of conduct minimizes contact with non-gypsy people, and particularly abjures marriage with non-gypsies.

With great craftiness they found that Europeans in the Middle Ages received them with Christian charity, and deduced that these kind Europeans would sympathise with Egyptians, who after all had left Egypt searching for the promised land, as the Bible explained. Hence, they called themselves Egyptians, from which eGypt-sies derives, and cast themselves as dispossessed dukes, kings and princes from that land. Christians required documentary proof that these early asylum seekers were legitimate, and the gypsies willingly proffered a forged document from King Sigismund of Hungary, which represented them as penitent pilgrims atoning for their ancestors in Egypt who had rejected Christianity. As a result of the sins of their ancestors they were reduced to wandering the earth as pilgrims seeking charity.

Call me naive, but I think this was an intelligent strategy, deficient as it may be in a moral sense. Incidentally, Roma morality is flexible on these sorts of matters: non-Roma are seen as unclean and polluting, interactions with them are to be avoided, and theft and crimes against non-Roma are not considered morally wrong.

I have left out much of the detail on the different Roma groups, but it is worth summarising the chapter on fertility. Cvorovic gives horrific figures for child mortality: 6 per 100 for Christian Orthodox Roma and 13 per 100 for Muslim Roma. By way of comparison, the highest regional under-5 death rates in the world are in Africa, at 90 per 1000, and the rate for Europe is only 12 per 1000. Hence the Muslim Roma have a child mortality equivalent to 130 per 1000, and all of these deaths occurred before the first birthday.

Roma mothers were younger than Serbian controls (23 vs 28) and, even at that early stage, had one third more children (2.42 vs 1.83), gave birth to underweight babies (2815 grams vs 3402 grams), had their first pregnancy far younger (19 vs 29) and had intercourse twice as often per week (5.4 vs 2.5). Maternity hospital staff classified 6.2% of Roma mothers as mentally retarded, 3.2% as deaf and mute and 1.6% as mentally ill. The most likely explanation is inbreeding within very restricted groups, with significant genetic problems, though as far as I could see that possibility is not directly discussed. Fertility differences are more striking when looking at “ever born” rates (3.7 vs 2.0), and more so in terms of grandchildren (3.16 vs 0.63) for those at the end of their childbearing years. The lower the intelligence, the higher the number of children (r = .509). Within the Roma there is confirmation of dysgenic trends: the lower the intelligence, the higher the number of grandchildren (r = .25).
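
The rate conversions in the paragraph above can be reproduced in a few lines. Following the text, this sketch treats the Roma child-mortality figures and the regional under-5 rates as directly comparable. That comparability is an assumption. The final line checks the "one third more children" ratio.

```python
def per_1000(deaths: float, per: float) -> float:
    """Convert a rate of `deaths` per `per` children to a rate per 1000."""
    return deaths * 1000 / per


rates = {
    "Christian Orthodox Roma": per_1000(6, 100),
    "Muslim Roma": per_1000(13, 100),
    "Africa (highest regional under-5 rate)": 90.0,
    "Europe": 12.0,
}

for group, rate in rates.items():
    ratio = rate / rates["Europe"]
    print(f"{group}: {rate:.0f} per 1000 ({ratio:.1f} x the European rate)")

# "One third more children": ratio of mean family sizes, Roma vs Serbian controls
print(f"family size ratio: {2.42 / 1.83:.2f}")
```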

Assume, if only for a moment, that the Roma are not, as they are painted, a dependent lot of good-for-nothings, but a plucky minority who have been set upon by Europeans, though not set upon so badly that they wish to return to India. In terms of cultural theory, if the locals despise you and won’t let you participate, then you stick to your own kind and your own ways, and do the jobs the locals will not do or cannot do, and charge them the highest prices they can afford. On that account, the Roma should have gone on to great things: specialist crafts, entertainment, controlling the music business, money-lending, gambling, casinos and the like. Their schools should have been hothouses of talent. Indeed, they should have turned out like European Jews.

On the contrary, assessments of their abilities are uniformly low. Cvorovic explains that Roma children are assessed pre-school, and about two thirds are diagnosed with “light mental retardation”. She gathers together published intelligence results, mostly using Wechsler tests, on reasonably sized samples and with local populations as comparison groups. After some 8 centuries one ought to be able to put aside the notion that the results are due to delayed acculturation. Adult Roma have intelligence scores very similar to the South Asian stock from which they separated centuries ago. Integration was not sought, and was successfully resisted when imposed; programs of improvement failed to have any impact, even under strict Communist command.

For a wide variety of samples the average adult IQs are in the IQ 70 range. There is variation in terms of the countries assessed but as a rule of thumb the scores appear to be two standard deviations below the local norms. This is a very sizeable difference.
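
This minimal sketch puts a two-standard-deviation gap in perspective. It assumes IQ scores are normally distributed with mean 100 and standard deviation 15 in the local population. That is the usual scaling, assumed here rather than taken from Cvorovic's tables. On that assumption a score of 70 sits at z = -2, and only about 2.3% of the local population scores lower.

```python
import math


def z_score(iq: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Distance of a score from the local mean, in standard deviation units."""
    return (iq - mean) / sd


def proportion_below(iq: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Normal cumulative distribution at `iq`, assuming normally distributed scores."""
    return 0.5 * (1 + math.erf(z_score(iq, mean, sd) / math.sqrt(2)))


print(f"z for IQ 70: {z_score(70):.1f}")
print(f"share of local population below 70: {proportion_below(70):.1%}")
```

If the Roma distribution had the same spread, the same arithmetic implies that only about 2.3% of Roma would score above the local mean of 100.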

Scholastic attainments are usually 1 standard deviation below the mean. However, Roma children seem to be streetwise, particularly on their home territories, an observation not investigated further. Their poor scholarship seems to be due to a mixture of low ability and a strong belief that education beyond primary school is of no interest or benefit. Their behaviour in school is often very disruptive. The table below shows English data for school exclusion.

This is an interesting book, drawing together many strands of research, and based on close study of gypsies in Serbia over a decade. I would have liked an index and a technical appendix on the research projects reported in the text.

To my mind it shows that if a group of immigrants stick to their own extended family for marriage partners, restrict contact with the host population to the absolute minimum, and stick to their own cultural practices, there is almost zero impact from living in Europe for almost 8 centuries. The climate has done nothing detectable to them for 32 generations, nor has the spurned European culture rubbed off on them by some osmotic process.

The contrast with European Jews is instructive: both are minorities with distinctive cultures and world views; both have inbred to some degree; both have been subject to prejudice, ostracism and very much worse; both have struggled to find a niche in Europe, and yet both have (mostly) remained in Europe. However, there the similarities end, and the differences multiply. European Jews venerated scholarship, the Roma cannot see its purpose. Jews made themselves useful at the highest levels of the economy, barely tolerated but sourly respected for their financial and scholarly acumen. Gypsies made themselves resented at the lowest levels of the economy (though some recently became metal recycling millionaires after the fall of Communist heavy industry) and little respected for wheeling and dealing. Here is a thematic apperception test: what made the difference?

Perhaps it was only a difference in root stock: Roma from India, Jews from Italy.

Although they have made very modest contributions to European culture, and even less to the economy, there is one way the Roma have met with contemporary approval: they have maintained their genetic and cultural purity for roughly 32 generations, the essence of multiculturalism.


Some key references.

Bafekr, S (1999) Schools and their undocumented Polish and “Romany Gipsy” pupils. International Journal of Educational Research, 31(4), 295-302.

Dumitrascu (1999) Intellectual development of gypsy families in Romania. In WJ Lonner and DL Dinnel (Eds), 14th International Congress of the International Association for Cross-Cultural Psychology, pp 173-187. Lisse, The Netherlands: Swets.

Rushton, JP, Cvorovic, J and Bons, TA (2007) General mental ability in South Asians: Data from three Roma (Gypsy) communities in Serbia. Intelligence, 35(1), 1-12.

Bakalar, P (2004) The IQ of gypsies in central Europe. Mankind Quarterly, 44(3-4), 291-300.

Kezdi, G and Kertesi, G (2011) The Roma/non-Roma test score gap in Hungary. American Economic Review, 101(3), 519-525.

Sunday 23 November 2014

Second blog birthday

I am aware that a second birthday should be celebrated with two candles, but this photo was taken on my actual birthday, in a restaurant where it is understood that customers come to celebrate survival, not to dwell on the accountancy of age, and where “Happy Birthday” is sung ironically, sotto voce by the head waiter, a discreet rendition intended to honour the celebrant without inconveniencing other diners.

I do not know how long most blogs last, but I am pleased to be able to celebrate the second birthday of “Psychological Comments”. I aim to get readers interested in intelligence research by commenting on published papers without fear or favour, and following arguments wherever they lead.

In the usual spirit of these authorial reflections, I aver that blogging represents an historically unique perspective on the act of writing. Never before has it been possible for an author to know within minutes if he has found any readers, and where they come from. For example, I might write about something, fully aware that my main audiences live in the US and UK, only to find that for several weeks running my second most avid readers were all in the Ukraine. What did I say that attracted attention in that nation, with so many travails to think about at the moment?

United States    148,534
United Kingdom    33,201
Ukraine           12,753
Germany            9,944
Canada             7,930
France             7,420
Australia          5,947
Russia             3,658
Turkey             3,078
Finland            2,722

However, it is nothing new to find that authors cannot predict what readers will like. In common with authors since the beginning of scribbling, I can put lots of time into an essay I fondly imagine to be pretty good, and get lukewarm audience figures. On the other hand, a note which I post with some embarrassment, believing it makes only a slight point, probably unworthy of attention, merely a fancy which crossed my mind, immediately attracts serious attention.

Here are the Top Ten:

1 Skills and demographic changes in the USA.

The United States of Mexico                                         3 Sep 2014,  1569

2 Familial risk factors account for criminality, not poverty per se.

Depraved on account of being deprived?                    25 Aug 2014, 1935

3 A measure of the academic climate as regards intelligence research.

Helmuth Nyborg gets Watson’d                                  14 Nov 2013, 2038

4 and 5 Two accounts which summarise the current position on intelligence

All you ever wanted to know about intelligence (bu... 14 Oct 2013, 2630

Intelligence in 2000 words                                           9 Dec 2013, 2097

6 The predictive power of attainments at age 7 on achieved social class at age 42

Give me a child until he is seven, and I will give...  20 May 2013, 3054

7 A detailed review of Nicholas Wade’s book.

“It’s the people, stupid”: a review of Wade’s “A T...  14 May 2014, 3070

8 Elijah Armstrong’s first paper, still riding strong

Flynn effect as a retesting, rule-based gain               2 Nov 2013,  3535

9 A re-working of data from Linda Gottfredson, interspersed with later work on high achievers, though pre-figured by Galton

The 7 tribes of intellect                                                2 Dec 2013,  4329

10 Despite all my efforts, my most read post is a very small item about the best way to depict sex differences in the variance of intelligence, a matter which was old hat in the 1960s but appears to be news today.

Are girls too normal? Sex differences in intellige...   8 Sep 2013, 5781

Of course, the 2013 favourites have had more time to build a readership, so there is some cultural lag, but there are 3 new entries which might flourish further next year.

From the start I used a stratagem to boost readership. I would send my intended posting to the relevant author upon whose work I was commenting, and ask them to correct any errors. I reckoned that this would improve the quality of my work, and recruit one university teacher and perhaps a research assistant to my readership. I hoped they would encourage other readers to look at my blog. In recent months, when I send out the email I often get a shy admission that the author has been reading the blog for a year, though they have never mentioned the fact in an email to me, nor even left a comment on the blog, not even an anonymous one. To such persons I say: please comment, even if only briefly. Readers really like to hear from authors. They are intrigued by the arguments and clarifications which result from such interventions, and mildly flattered that the author has responded directly. It brings them into the shared tutorial, rather than leaving them looking in on an abstruse debate.

If you as an author want to make a long reply to a post, particularly if you want to counter my criticisms, my policy is to post your replies without necessarily making further comment myself. I don’t want battles, simply a fair exchange of perspectives. Incidentally, if I have been very critical, I often hesitate to contact the author. In those cases I think it best not to bother them.

Twitter is a great help. I use it to announce each post, and to abstract some phrases in order to tempt readers to get into the detail. Although the Twittersphere has its own methods and logic, and précis sharpens the mind, and sometimes the wit, my use of it is to bring readers to the blog. One can transmit much in 140 characters, but not everything.

Naturally, I have developed all sorts of ideas about my readers: that they have read a great deal, know their subjects, vary from experts in the field to interested recent voyagers into these cognitive waters, but that they are all signed-up members of the empirical project. I am often led by you into commenting on particular publications, or am sent your recent papers or pre-prints. Great. Keep them coming.

Now a personal comment to Anonymous. I understand you may wish to remain anonymous. Could you please find an anonymous name other than Anonymous? It gets me confused as to which anonymous has said what, and whether they are fighting each other, anonymously.  Use your intelligence, dear Anonymous. Anonymity is preserved by such nomenclature as Reader 127, correspondent M or N, or even slightly jokey names like Random Word, Sharp Insight, and Loose Talk. As far as I know, having just thought them up, none of these names are copyright. Just give me a clue as to which anonymous is saying what. Your name is legion.

Thank you to all of you who have loyally re-tweeted my tweets about each blog post, which is especially kind when done by celebrated bloggers like HBDChick and Jayman and others, all of whom have their own blogs to tend. Commendations, mentions and re-tweets by figures like Steve Sailer, Charles Murray and Steven Pinker greatly assist me.

Twitter generated 18,314 page views; 12,952 and 3,888 so close to Twitter in total; then HBD Chick 3,606; iSteve 2,690; Marginal Revolution 1,591; and West Hunter 1,429, among others.

Last year I said: “Finally, I can claim that in one year 71,701 readers have given my words a look, as opposed to the modal 6 if I had published a paper.”

At the end of two years I have written 418 posts, which is 4 a week, come rain or shine. Page views all time history at the end of two years:

That is a big jump for me, and many thanks to all of you. Twitter followers have increased from 199 to 597. If you have any ideas to help me reach out to more researchers and students, please let me know. The people I am after understand the basic rules of evidence based arguments, and prefer focussed discussion to sweeping generalisations. They are doubtful, cautious, helpful and do not respond to traditional inducements to participate in anything. A very stealthy approach will be required.

Now I have to prepare myself for the ISIR conference in Graz, from whence I hope to bring you all the best papers, which will probably take me an entire month. $35 from several of you would help defray my costs of flying there; $25 would defray my costs in staying there; $15 would defray my costs in eating there, and $10 would defray the likely cost of a coffee and an Austrian pastry. Donate $5 each and I will be able to get there and back and still have the enthusiasm to make further forays to conferences on your behalf.

There’s a Donate button just below, on the right.

Friday 21 November 2014

Chimp digit span

At the same conference in Cambridge in July 1970 where I had lectured Arthur Jensen on the cultural explanation for black/white intelligence differences on Block Design and Object Assembly (he suggested these differences were unlikely to be cultural, and I argued that they were due to a lack of constructional toys in West Indian homes), I was also exposed to a stellar constellation of researchers, among whom David Premack made the most impact. He had been working with a clever chimpanzee called Sarah who had been taught language-like manipulation of symbolic objects. Sarah’s skills astounded us, and we searched for artefacts in the experimental design to account for her uncommon abilities. Those of cynical disposition muttered “It must be Helen Keller in a fur coat”. Someone of greater breadth of intellect asked him what his ultimate purpose was in teaching a monkey to handle language. Premack replied: “So as to teach her how to pray”. I and others thought it an ironic put down, but those who quizzed him later came back astounded. “He was serious. He is a Skinnerian mystic”.

His Premack principle (1959) stuck with me, though I could never get child psychologists to implement it (reward a subject’s low frequency behaviour by letting the subject carry out a high frequency behaviour). Anyway, Premack triggered lots of very interesting work, based on the belief that our monkey cousins were brighter than they were given credit for. In 1978 he also proposed the Theory of Mind, with associated tests. Clever cookie. Respect.

Marty Seligman was at a previous conference where David Premack first presented his results on chimp language logic, and told me this story, gratified he had been present to witness this moment in the history of psychology. As Premack gave his lecture, and showed again and again the proofs that Sarah was capable of understanding language-like conditional probability in the manipulation of object symbols, the behavioural cynics tore into him for a long time, trying to show he must be wrong. Finally, one hand was raised, and up stood Keith Hayes, who with Catherine Hayes had home schooled the young chimp Viki with their own child, and tried to get it to talk (for which task chimps do not have the appropriate vocal apparatus), all these heroic labours ending in abject failure with a mere 4 words mouthed. Facing Premack for the first remark, and then turning to the audience to deliver the second, he said: “Looking back at all the work I did, I realize how stupid I was to try to get a monkey to talk. However, listening to the last half hour of criticisms, I am relieved to find I am not the stupidest person in the room”.

Here is a film that Roberto Colom sent me. When given to humans, this physical, digit-span-like task is called Spatial Span on the Wechsler, and I am reliably told that it is very difficult to do, and even more difficult for a clinical psychologist to administer, because you have to concentrate hard to make sure you have demonstrated each sequence correctly, and then concentrate even harder when the client quickly taps out their repetition of the sequence (doing it fast before their memories fade, thus burdening the memory of the psychometrician trying to write down the results).

In the spirit of public education, I consented to be tested. First, you should note that at 10.45 pm, when my usual bedtime is 11 pm, I was not at my most alert. Second, I had been listening to the recordings of a mellifluous tenor who, aria by aria, was having a lowering and somnolent effect on my mood. Third, it is not considered good practice to be tested by one’s spouse.

5 forwards and 5 backwards, a scaled subtest score of 12. Long explanation follows: culturally biased test, stereotype threat, long-term effects of being brought up in a cultural backwater in South America, insufficient practice items, too late at night, lousy technique on my part which lacked any “chunking” strategies, marital competitiveness, and conflicting emotional undercurrents caused by doing the test in the kitchen, the scene of so many happy, nourishing family meals.

What a difficult test! No wonder Wechsler dropped it. See what a chimp makes of it.

Thursday 20 November 2014

How do you like your scores: raw or well done?

I do not think I could be considered a foodie. I enjoy good food, but that includes today’s lunchtime meal of bread, ham and cheese, then lemon cheese cake and raspberries. It was not elaborate, and the ingredients were not autochthonous. I doubt I could pass a blindfold test to distinguish this particular meal from similar breads, hams and cheeses. The meal was fine, and needs no further discussion.

Of more moment to me is how to deal with scores which arise from mental tasks. Here I have a strong preference for scores which are as raw as possible. This may be due to the teachings of Prof A.E. Maxwell, who said of his 1978 textbook, Basic Statistics: For Medical and Social Science Students, “It must be one of the simplest text books on elementary statistics ever written.”

He was used to working with data sets by hand, which was extremely slow (his thesis was based on one factor analysis which took him three years) but allowed him to see how the actual bumps and declivities in performance scores translated into the final summary statistics. However, he was not entirely without guile: when he suggested applying log transformations to skewed data, I was mildly shocked. The fact that data were skewed was, to my puritan mind, a material fact, an aspect of reality, and I did not want it erased from view by statistical trickery, however justified and openly admitted.

Naturally, although a log transform slightly cooks the data, like real cooking it brings indigestible observations within the purview of standard statistics, in that it helps meet the assumption that distributions are normal. A declared manipulation allows data processing, like pounding and slow-cooking tough, stringy meat to make it edible.
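
This minimal sketch uses only Python's standard library to show what a log transform does to skew. It draws hypothetical right-skewed data from a log-normal distribution, the sort of shape reaction times might produce, and compares skewness before and after taking logs. The data are simulated. They do not come from any study mentioned here.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness using population moments (no small-sample correction)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)


random.seed(1)
# Hypothetical right-skewed scores, e.g. reaction times in milliseconds,
# simulated from a log-normal distribution
raw = [random.lognormvariate(6.0, 0.5) for _ in range(5000)]
logged = [math.log(x) for x in raw]

print(f"skewness of raw scores:    {skewness(raw):.2f}")
print(f"skewness of logged scores: {skewness(logged):.2f}")
```

The raw series is clearly right-skewed, while the logged series is close to symmetric. The skew is still a fact about the raw scores; the transform changes only what the downstream statistics see.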

I could also see the beauty of factor analysis, bringing orderly simplification to a maze of correlations. Dennis Child used the simple explanatory method of discussing vectors of force to resolve the resultant line of movement of an object subject to individual forces. For example, if an object is pulled by two equal forces at right angles to each other, it will move in a line at 45 degrees to each of those forces. The two forces are real, and the resultant vector is the actual single path the object follows. A “factor” in a correlation matrix is the vector which results when all the variables have exerted their forces. The “loading” of each variable on the common factor (vector) shows how close each variable is to that larger, simplifying, resultant force.
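
Child's vector analogy can be checked in a few lines: two equal forces at right angles sum to a resultant at 45 degrees. The `loading` function is an illustrative addition and not part of Child's example. In the geometric picture of factor analysis, a variable's loading corresponds to the cosine of the angle between its vector and the factor. On that reading, each force here "loads" about 0.71 on the resultant.

```python
import math


def resultant(forces):
    """Sum 2-D forces given as (magnitude, direction in degrees); return the
    magnitude and direction of the single resultant force."""
    x = sum(m * math.cos(math.radians(a)) for m, a in forces)
    y = sum(m * math.sin(math.radians(a)) for m, a in forces)
    return math.hypot(x, y), math.degrees(math.atan2(y, x))


def loading(force_direction, resultant_direction):
    """Cosine of the angle between a force and the resultant: the share of
    a unit force that lies along the resultant's line."""
    return math.cos(math.radians(force_direction - resultant_direction))


# Two equal forces at right angles, as in Dennis Child's example
magnitude, direction = resultant([(1.0, 0.0), (1.0, 90.0)])
print(f"resultant: {magnitude:.3f} at {direction:.1f} degrees")   # 1.414 at 45.0 degrees
print(f"loading of each force: {loading(0.0, direction):.3f}")    # 0.707
```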

In that sense it is simpler and also more truthful to describe human abilities in terms of factors than to give a jumbled list of raw scores on many different tasks. g (for general intelligence) is the big vector which results from a whole lot of mental forces acting together. It is a distillation.

So, when people discuss the Flynn effect, they often argue about whether the effect “shows up on g”. If it does not, then there is an argument that the observed changes are IQ inflation rather than a real increase in ability. Dodgy tests, dodgy marking and distorted standardisation measures have confused us, say the g men. Rushton led this charge. Jan te Nijenhuis has continued it. I generally support this argument, and at the very least want to know if the gains are “hollow” as regards g. Flynn has a counter-argument, which is that g is not the criterion of whether a gain is real.

Allied to this discussion is the more technical one regarding “invariance”. Jelte Wicherts has done much on this, and Roberto Colom is right to draw attention to the changing g loading in his sample (see comments on Gignac paper). 

However, we need an explanation as to why digit span should have become less intellectually demanding. The task itself has not changed. It might have been more novel years ago, and then become humdrum, but commentators point out that increased use of mobile phones means that numbers no longer have to be remembered, though that is a very recent phenomenon.

Some individuals, as a consequence of massed practice, have learned to “chunk” digits into groups for better recall. Indeed, as was clear to George Miller decades ago, musicians and chess players would have to be doing something like that in order to hold very long sequences in their minds. However, such massed practice is by no means the norm, which is just as well, because it brings few advantages outside very specific domains, and does not generalise well.

In summary, if someone can show me a decade-long data series on a ratio scale such as digits recalled or seconds taken to respond to a signal, then I am very interested in the raw data, even in the minimally cooked form of means, standard deviations, and skewness and kurtosis. For that reason I am more influenced by the relatively unchanging means shown by Gignac than by g loadings per se.
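On a ratio scale such summaries can be read directly in digits. A minimal sketch, using invented spans for a single hypothetical cohort and population (rather than sample-adjusted) moment formulas:

```python
import statistics


def four_moments(xs):
    """Mean, SD, skewness and excess kurtosis, all computed as population moments."""
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    z = [(x - mean) / sd for x in xs]
    skew = sum(v ** 3 for v in z) / n
    excess_kurtosis = sum(v ** 4 for v in z) / n - 3.0
    return mean, sd, skew, excess_kurtosis


# Hypothetical longest spans backwards (digits recalled) for one cohort.
spans = [3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 8]
mean, sd, skew, kurt = four_moments(spans)
print(f"mean={mean:.2f} sd={sd:.2f} skew={skew:.2f} excess kurtosis={kurt:.2f}")
```

Running the same function over each year of a decade-long series would give the minimally cooked table asked for above, still expressed in digits rather than in standardised units.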

However, I think I need to give Roberto Colom more space on this topic, so will do that after a break for a movie.

Wednesday 19 November 2014

Immigrant scholastic progress: A parent writes


What is going on is simple, and obvious to anyone with eyes, and I saw it happening in my kids' primary school in a north-eastern city in England during the period of massive immigration in the 2000s.

The ability of the native population was on a bell curve, and the ability of immigrants was bimodal.

The Chinese (including Korean, Taiwanese, Hong Kong etc.) were all in the top maths group, and the top maths group was mostly Chinese, plus one Jewish boy and a couple of locals. The bottom maths group was, as I recall, entirely Pakistani, Bangladeshi, Arabic, and African. Indeed, the best students in most classes were usually Chinese, even though there were local kids of middle-class professional parents in a higher proportion than usual.

When we got the children's attainment test scores, we were given the scores for the whole class (anonymously) to give context, and I was astonished to see they were actually bimodal (for 90 children: not symmetrical, since there were a lot of highly able children, but with two peaks and a point of rarity between). I asked the teacher why, given that this is a very unusual distribution for a school, but she had no idea, and it seemed that nobody had noticed or commented on the fact. The proportion of immigrant children was high enough to produce this bimodal distribution.

The deep problem is that concepts such as 'immigration', and even more 'diversity', are actually calculated to conceal and confuse, by lumping together heterogeneous and indeed contrasting entities.

When we are forced to debate using these categories, things seem much more complicated than they really are, so people cannot follow and lose interest in the 'debate', which is exactly the intention.
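The "two peaks and a point of rarity between" described in this letter is what one would expect when a class combines two groups whose averages lie far enough apart. A minimal sketch, assuming (purely for illustration) two normally distributed groups of equal size and equal spread, counts the peaks in the combined distribution:

```python
from statistics import NormalDist


def count_peaks(mu1, mu2, sigma=1.0, weight=0.5, steps=2001):
    """Count local maxima of a two-group normal mixture evaluated on a fine grid."""
    a, b = NormalDist(mu1, sigma), NormalDist(mu2, sigma)
    lo = min(mu1, mu2) - 4 * sigma
    hi = max(mu1, mu2) + 4 * sigma
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    ys = [weight * a.pdf(x) + (1 - weight) * b.pdf(x) for x in xs]
    return sum(1 for i in range(1, steps - 1) if ys[i - 1] < ys[i] > ys[i + 1])


# Hypothetical groups whose means are one SD apart, then three SDs apart.
print(count_peaks(0.0, 1.0))
print(count_peaks(0.0, 3.0))
```

For two equal-sized groups with the same standard deviation, a dip between the peaks appears only once the group means are more than two standard deviations apart; closer than that, the mixture still looks like a single, if broadened, hump.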

Tuesday 18 November 2014

Have backward digits sunk Flynn?


Repeating digits forwards is easy, and weakly predictive (.46) of general intelligence. Repeating digits backwards is harder, and more strongly predictive (.58) of general intelligence. Reliabilities are good if you give at least two trials for each digit string length. The task produces scores on a real, ratio scale, with a true zero, and thus is unusual in psychometrics in providing absolute results.
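Assuming the .46 and .58 figures are correlations with g (loadings), squaring them gives the proportion of variance each task shares with g:

```latex
\begin{aligned}
\text{forwards: } 0.46^2 &= 0.2116 \approx 21\%\\
\text{backwards: } 0.58^2 &= 0.3364 \approx 34\%
\end{aligned}
```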

So, if there really is a Flynn effect, have digit spans increased over the last century, particularly digits backwards, the better test of intelligence? “No,” says Gilles Gignac from the bright blue skies of Perth, Australia. Not a glimmer of intellectual improvement since 1923. All this is as I had grimly expected. We shall all come to no good, just you mark my words.

Of course, perhaps digits backwards, demanding as they are, do not catch the full subtlety of Similarities or Vocabulary, or even Raven's Matrices. Let us dig around a little in this pre-publication paper, accepted by Intelligence.

In a careful approach, Gignac has gone back to the raw scores for longest digit spans forwards and backwards in the Wechsler intelligence test Digit Span subtest.




Gignac observes that if the Flynn effect is not acting on g and is not acting on short-term memory capacity, then it is hard to see how it can really be acting on a broad range of fluid intelligence skills over time.

Turning the screw, Gignac points out that at the beginning of the twentieth century few people had to remember telephone numbers, but now we are inundated with long mobile phone numbers, login codes and the like, so there is a strong cultural reason for digit spans to have increased, but they have not.

He considers carefully the various explanations and details for the findings which might temper his conclusions, but in the end he clearly feels that it is very hard to explain how the Flynn effect, derived from standardised scores, can be real when it does not show up on actual raw scores of short-term memory.

Progress in education, but not as we know it


I had written out a reply to comments left on a previous post by Nick Hassey, but feel that the comments and my reply raise general points, so should be given as an additional post. If you look back at the original post, and all the comments, that will give you the context. On the other hand, you might want to skip all that, and just read the general points.


Thanks for your comments. I have gone through what I see as your main points. I think we are in more agreement than disagreement, but there are still differences which I feel I need to explain further.

“Progress in education measures a child's absolute attainment conditional upon their previous score.”

No, that was the whole point of my essay. The main use of “progress” in education is to say how far along the path of learning a child has got. If you make progress in maths you know more maths, and you have progressed towards numeracy, full stop.

Defining progress as the residuals on a regression line is a different matter, though it is also interesting. If you use previous scholastic results to do the calculation, you get one particular regression line. If you use intelligence test results you will get another, better, regression line. (For example, you can get reasonable estimates of intelligence before 4 years of age, and thus before much serious schooling has taken place, so it would be a better variable to use. Assessments at 11 years of age are more precise, but you can claim that education has had some influence on those later estimates). If you add some other variables such as poverty into the regression mix you will get other regression lines, and each of those modulating variables will carry assumptions and change the picture somewhat.

The residuals from all those lines, however, are of mixed origin, and will contain error terms as a consequence of measurement errors (low reliabilities). They may be due to unmeasured variables like motivation, but they could also be due to presumed acculturation effects. As I recall it, the data set did not allow “years in UK” to be directly entered into the regression (Prof Burgess will be commenting on all this shortly) and that might have given us a better understanding of likely causes of the residuals. I think the “motivation” explanation is plausible but unsupported at the moment, and the acculturation explanation is even more plausible but not measured directly in this paper.

Different measures of progress lead to somewhat different conclusions. A pupil who arrives from overseas but makes progress from primary school (not much knowledge of England) to secondary school (has picked up more knowledge of England) hasn’t thereby really boosted education in England. That could only be argued if they ended up far better than the locals. That is the case for some immigrant groups like the Chinese, but not for others. The best results would be achieved by careful selection, not random selection.

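In the example you gave, describing a child who is in the top 80% at 11 and again at 16 as having made "no progress" isn't quite right: that child has learned a great deal between 11 and 16, even though their relative position is unchanged.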

You go on to cover the argument about predicting progress on the basis of prior attainments. In fact, I was at pains to point out that that particular conclusion would have been foolish. I was simply drawing attention to a feature of residuals on a regression line. If a pupil progresses up the schooling system they learn more and more, but if they progress as expected (exactly as the regression line predicts) then their residual will be zero. For this reason the regression approach to “educational progress” can sometimes mislead. I think it led most of the journalists covering the story to think that immigrant “progress” of itself boosts final achievement, whereas the national end result is increased by some groups and reduced by others. In fact, in PISA national comparisons most researchers pay attention to immigrants, and often measure their progress separately. Rindermann and I have completed a big international study on this, but are holding it back for the ISIR December conference #IQ2014. Hope to blog from there in mid-December.

It might be helpful to illustrate this general point with findings from the economic domain. Some poor sub-Saharan countries have shown faster economic growth in the last decade. This is often because they have been exporting raw materials to China. Their growth rate is much higher than “stagnant” Japan, which officially went into recession today. However, Japan is much richer than all sub-Saharan countries. “Rate of improvement” does not equate to having the highest level of actual wealth, nor do measures of adjusted educational progress necessarily imply better scholastic achievements in all immigrant groups.
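The gap between rate and level can be put into numbers. The figures below are hypothetical round numbers, not official statistics: an economy with income per head of 1,000 growing at 6% a year, set against one at 36,000 that does not grow at all.

```python
def years_to_catch_up(poor, rich, poor_growth, rich_growth=0.0):
    """Years until an economy growing at poor_growth (a fraction per year)
    first matches one growing at rich_growth."""
    if poor < rich and poor_growth <= rich_growth:
        raise ValueError("the poorer economy never catches up")
    years = 0
    while poor < rich:
        poor *= 1 + poor_growth
        rich *= 1 + rich_growth
        years += 1
    return years


# Hypothetical: 1,000 per head growing at 6% a year versus 36,000 standing still.
print(years_to_catch_up(1_000, 36_000, 0.06))
```

Even with the rich economy standing still, the fast grower needs more than 60 years to draw level.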

The reason expected progress is used here (and in education) rather than pure attainment is to try and isolate as much as possible the impact a particular school (or area) is having on the attainment of its pupils.

Of course. Burgess is using a familiar “value-added” measure of schools, and this is one way of seeing whether some schools are better than others. The question, however, is the best way to calculate progress. Pre-school cognitive ability is probably the most uncontaminated measure for judging school progress. Once we have a full genome for each child then that might become the gold standard. We are not there yet.

You then go on to discuss the “London effect”. I would not start from London, because it makes much more sense to look at the full sample before the particular towns, which have individual immigration histories. The full sample in this case is England, and that is why I quoted the Deary et al. (2007) paper. We know what causes scholastic attainment. In the main, it is prior cognitive ability, which correlated 0.81 with later examination achievement in that study. The advantage of cognitive measures is that they are less influenced by school teaching effects than curriculum-based assessments.

So, I would not get into any arguments about London or Birmingham effects until I had some cognitive measures to look at. Absent those measures, we would be fighting over scraps of variance.

You then go on to explain what Burgess is trying to do: showing that the presumed London effect is due to race and immigration status, not fancy teaching. I think this is probably right. I can only say “probably”, because cognitive measures are missing and are instead being inferred from previous performance, which may be partly modulated by lack of English in new immigrants.

As you can see children in London (and Birmingham) score above average, so their pure scholastic ability is high.

Again, Burgess’s argument is that if you control for racial composition, those apparent effects vanish. Better to look at the whole picture first, individual area variations later, and only if they depart from the general pattern. Looking at your argument, I think we are entirely in agreement on this!

Now, as to the “equivalents”, I didn’t bother with these, but should have, and should have made the adjustments as discussed in the paper. My major point was that, even within real GCSEs, we do not have full equivalence. If you look at the Deary results, there are many GCSE results which don’t require much intellect, but count just as much in most of the statistics. Schools are perfectly able to “game” the system, and many of them do. All the ways in which this can be done (selecting which pupils take which exams, which exams to take generally etc.) are worth a separate paper. I had suggested having a single examination in adulthood to evaluate achievement, as was done in the OECD study I mentioned. This would give us a socially interesting result, in that it allows us to understand occupational histories and later wealth. If you look at the links, you will see that I am a bit exasperated that the OECD don’t mention intelligence, but keep finding it again and again in their results.

Also, spending money on education does not always give results, certainly not above a reasonably low threshold. Andrew Sabisky showed that US schools were not providing good value for money relative to international expenditures.

Finally, as regards your arguments about Chinese and Indian students, here again we are in agreement, so I don’t think we have to agree at length!

However, at the end of your comments, I think we drift apart again:

Even controlling for ethnicity, gender, month of birth, economic background and the level children started at – London schools still do a better job of getting more pupils to the highest grades than we would expect.

You use the phrase “than we would expect”. Expectations are an elastic concept. I have strong reservations when any researcher does the traditional “corrections” for economic background or socio-economic status. Burgess does it, but so do most educational researchers. I thought I had posted about it many times, but my explanations have not been succinct enough, so it is good to try to put them down again in better form now.

Jensen called it the “sociologist’s fallacy”. If you “control” for socio-economic circumstances you assume that low intelligence or low application played no part in that person’s economic circumstances. That is, you are saying that every poor person is poor because of an external force, and ought to be compensated for it in the statistical treatment, despite the fact that low ability and lack of application are frequent causes of poverty.
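The fallacy can be shown with a toy simulation. Every coefficient here is invented: ability raises both SES and attainment, and SES has no direct effect on attainment at all. Regressing attainment on SES alone nonetheless produces a clear "SES effect", which shrinks to roughly zero once ability is entered alongside it.

```python
import random


def cov(x, y):
    """Population covariance of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def simulate(n=20_000, seed=1):
    """Toy model: ability drives both SES and attainment; SES has no direct effect."""
    rng = random.Random(seed)
    ability, ses, attainment = [], [], []
    for _ in range(n):
        g = rng.gauss(0, 1)
        ability.append(g)
        ses.append(0.5 * g + rng.gauss(0, 1))
        attainment.append(0.8 * g + rng.gauss(0, 0.6))
    return ability, ses, attainment


ability, ses, attainment = simulate()

# Regression of attainment on SES alone.
b_ses_alone = cov(ses, attainment) / cov(ses, ses)

# Regression on SES and ability together, solving the two-predictor normal equations.
s11, s22, s12 = cov(ses, ses), cov(ability, ability), cov(ses, ability)
s1y, s2y = cov(ses, attainment), cov(ability, attainment)
det = s11 * s22 - s12 ** 2
b_ses = (s22 * s1y - s12 * s2y) / det
b_ability = (s11 * s2y - s12 * s1y) / det

print(f"SES alone: {b_ses_alone:.2f}")
print(f"SES with ability in the model: {b_ses:.2f}, ability: {b_ability:.2f}")
```

The point is not that SES never matters, only that an SES "correction" made without an ability measure can credit to circumstance what belongs to ability.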

Here are some comments I made about immigrant results on 5 December 2013:

PISA have fallen for the sociologist’s fallacy that socio-economic status is entirely imposed externally. That is, that you are poor because the system is stacked against you, rather than that the system responds to how much you work and how much you save.  PISA have “corrected” for this. Some immigrants are poor because they have low skills and low ability. Some immigrants are poor because they have low skills and higher ability but haven’t been allowed to enter an open economy in their home country. Some immigrants have high skills and high ability and are rich. We need better calculations here. Plotting out the immigrant results by years of residence would make the effects easier to understand, as would identifying where these immigrants come from.

Furthermore, intelligence is a better predictor of social class of attainment than is social class of origin. A parent’s social class accounts for only 3% of the social class mobility of their children.  The ability of the individual child accounts for 13%. Simply talking about the apparent effects of class does not speak to the question of ability. Both need to be measured in the same samples, and then compared for predictive power.

Daniel Nettle covered this in an interesting 2003 paper, which I then used to calculate the social class composition of university entrants, according to how demanding the universities were in their entrance standards.

This was one of my earliest posts, which reminds me that I am almost at the blog’s second birthday.

Sunday 16 November 2014

Adopt a child, but discard an illusion

It is surely part of loving altruism to bring up someone else’s child. It is not entirely selfless, for adoptive parents seek to have children by the best means possible, and to reap the rewards of the love they give in return for the child’s love of them. But, and it is a big proviso, adoptive parents have to be sanguine about how much they can influence their adopted infant. All the loving and hoping in the world will not change the child’s abilities a single jot. The child remains someone else’s child, product of other seeds and eggs.

Kevin M. Beaver, Joseph A. Schwartz, Mohammed Said Al-Ghamdi, Ahmed Nezar Kobeisy, Curtis S. Dunkel, and Dimitri van der Linden. A closer look at the role of parenting-related influences on verbal intelligence over the life course: Results from an adoption-based research design. doi:10.1016/j.intell.2014.06.002

Beaver et al. have taken the Add Health sample (a representative sample of 90,000 schoolchildren collected in 1995 which included a sample of adoptees) and used it to have a very close look at the putative effects of adoption on intelligence. If nurture has any effect on intellect, adoptive parents should have an impact on the abilities of their adopted children. We already know that about 60% of the variation in intelligence is due to genetic factors (and that 56 to 94% of the covariance between SES and childhood intelligence is due to shared genetics), so it is a matter of great interest to find out what causes the remaining 40%. Being “socialized” or cultivated by specific family practices seems, prima facie, a possible cause of variation in mental ability. If families influence intelligence, they very probably influence behaviour too, which should show up on personality assessments. Hence the interest in trying to disentangle purely genetic from truly familial environmental factors.

Sample attrition over the years resulted in a final sample of 286 adoptees, a good number as these things go, given the comparative rarity of adoption today. Intelligence was measured using the Peabody Picture Vocabulary Test, which the authors correctly describe as assessing verbal intelligence, though it is also one of the best predictors of general intelligence. Eight parenting measures were used. By the way, you might want to think about what those measures should be. We assume that parenting does something, but what aspects of parenting? The sample was studied for both fathers’ and mothers’ disengagement, attachment, involvement and education. The measures included how much children talked with each parent, and how close they felt to them. Of course, education is also a part surrogate for intelligence, so it is hardly a pure environmental measure, though it is often incorrectly treated as such. All 8 measures were related to children’s intelligence in the whole sample, but very weakly, at betas of .14 or less, often much less. For the adopted sample the highest beta was .16 and that was for paternal education. So, overall, parenting does not have much effect on children’s intelligence.

The authors conclude: The results thus far have revealed that parenting measures tend to have very little influence on variation in IQ scores in adolescence and young adulthood.

This is a very instructive negative result.

The authors then carry out a one-egg two-egg comparison: we use the MZ difference-scores method, where difference scores for IQ and all of the covariates are estimated and the twin pair is the unit of analysis. By using this approach, genetic and shared environmental influences are held constant (i.e., the only reason that there would be differences between MZ twins is because of nonshared environmental influences) and, in line with the adoption design, the effects of the parenting measures are not confounded by genetic influences. Importantly, though, the MZ difference score directly models the parenting measures as nonshared environments. This is particularly salient because findings from behavioral genetic studies have shown that nonshared, not shared, environments are the most influential for adolescent and adulthood IQ (Plomin & Spinath, 2004).
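The difference-scores idea is simple to sketch. With invented data for six hypothetical MZ pairs, twin 2's IQ and parenting score are subtracted from twin 1's, and the IQ differences are regressed on the parenting differences. Fitting the line through the origin keeps the result the same whichever twin is labelled "twin 1", since swapping a pair flips the sign of both of its differences.

```python
def difference_scores(pairs):
    """Twin 1 minus twin 2 for each pair."""
    return [a - b for a, b in pairs]


def slope_through_origin(x, y):
    """Least-squares slope with no intercept: sum(xy) / sum(x^2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Hypothetical MZ pairs: (twin 1, twin 2) for IQ and for a parenting score.
iq_pairs = [(101, 100), (98, 100), (105, 102), (99, 100), (110, 110), (94, 95)]
parenting_pairs = [(3, 5), (4, 5), (6, 6), (5, 4), (7, 5), (2, 2)]

iq_diff = difference_scores(iq_pairs)
parenting_diff = difference_scores(parenting_pairs)
print(iq_diff, parenting_diff, slope_through_origin(parenting_diff, iq_diff))
```

In this toy example the slope is close to zero, the kind of null pattern Beaver et al. report for their real twins.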

“Nonshared” is not an easy concept. I was really warming to Beaver et al. and now they have put their foot in it and require nomological re-education.

I think they mean that “personally created niches” account for variance, not the “standard family environment” provided by parents. You will see in my post, linked above, that one person described to me how as a child she set up her own shed in the garden so that she could study on her own and avoid her disruptive parents. Apparently, this counts as a “non-shared” environment. I have difficulty believing that counting this as an environmental factor is being done with a straight face. The kid, being bright and sensible, decided she would do better in life if she avoided noisy family rows. It was her choice to move to the shed, and tidy it up and make it into a study. She created the environment: an act of creation, not a passive response to the woody smell of a garden shed.

Anyway, when MZ twins are studied, none of the differences they experienced in the way they were parented were significantly related to differences in intelligence.

In summary, this is a carefully presented analysis, showing an importantly negative result. The authors go through a number of possible explanations, and here is my immodest account of why they got their results. They had a much better sample than usual. They measured over a longer period than usual, from adolescence to young adulthood, which is when the finished product of family life hits the streets. They did a thorough job, and have shown that an expected effect of parenting was not present. Contrary to all expectation from strong environmentalism, the supposed formative effect of social class and family life does not accumulate: it diminishes as children leave home.

In the 60s we really thought that socio-economic status was like an artillery gun that fired a shell into the distance. The more wealthy, privileged and powerful the gun, the further the shell was fired (whatever the genetics of the child). It was all those books on the shelves, and the proper use of multisyllabic words at the dinner table that drummed ability into the crania of privileged brats. Take a child, any child, and put them at the dinner table (after a good wash and a medical examination) and by 7 they are on the road to mental adroitness, and by 17 surely ready to rise to the top and triumph over all. However, it turns out that by 17 kids are more like their real parents (whom they may never have met) than their adoptive parents. The “family pushes you forward by environmental means” hypothesis is not supported by the best available data.

The authors make a very good pair of final points:

While the results of the current study revealed convergence across all of the modeling strategies, we would advocate that the most methodologically defensible approach to use is a genetically sensitive research design. Two reasons inform this recommendation. First, there is ample empirical evidence showing that genetic influences account for a significant proportion of variance in IQ scores and in parenting measures (Jensen, 1998 and Kendler and Baker, 2007) and that at least some of these genetic influences overlap between IQ and family/parenting (e.g., Trzaskowski et al., 2014). What this necessarily creates is a prime example of a confounding variable, one that must be taken into account in order to rule out spuriousness. Second, even though the results from our non-genetically informative analysis revealed very little parenting influences on IQ, a large body of existing studies shows a very different pattern of results. It is quite possible that these significant parenting effects are simply due to genetic confounding (Harris, 1998) and the only way to know for certain is to employ methodologies capable of ruling out this explanation. Moving forward, therefore, studies need to more fully rule out genetic confounding before claiming that family and parenting influences represent causal contributors to IQ. Failure to do so will leave them open to attacks based on model misspecification and erroneous conclusions regarding the true effect of parental and family socialization effects on IQ.

In the plain language of this blog: when studying children’s progress into adult life, particularly their abilities and achievements, you must include measures of early IQ. If you don’t, you will misinterpret your results. If you misinterpret your results, this blog will hunt you down, and politely point out your error. Then you will be very embarrassed, and have difficulty explaining yourself, and your colleagues will shake their heads very sadly behind your back, whilst seeming to be supportive in public. In sum, it would be better to let intelligence measures play a part in your investigations.

Friday 14 November 2014

Immigrants, scholastic ability, and journalistic ability


The press have given considerable attention to a paper by Simon Burgess, Professor of Economics at Bristol “Understanding the success of London’s schools”. CMPO Working Paper Series No. 14/333, University of Bristol, October 2014.

The BBC splashed it as: Diversity 'key to London GCSE success'


The Guardian as: London’s GCSE success due to ethnic diversity in capital’s schools

The Daily Mail: Ethnic diversity 'boosts GCSE results': Cities with large numbers of children from immigrant backgrounds do better because they work harder. Schools in London and Birmingham have good results due to minority pupils. White British students make slower progress as they are less ambitious. Bristol University found that ethnic minorities have greater expectations

The Times of London used it in an editorial (12 November 2014) “High Class Immigrants: Research shows that migrants do well at school and help the locals too”. They admit the research has yet to be peer reviewed, so I am stepping into the breach, with nothing but the public interest at heart, as always.

As you will see, the Press have tended to suggest that success is due to diversity, and that immigrants are improving outcomes, rather than that schools are improving outcomes. Let us have a look at the actual paper, the link to it being shown below.

Prof Burgess argues:

We showed some time ago that ethnic minority pupils make better progress through school than white British pupils (see Wilson et al (2005, 2011) and Burgess et al (2009)). Given that these pupils typically live in more disadvantaged neighbourhoods and come from poorer families, their advantages must be less material than books, educational visits and computers. It is argued that ethnic minority pupils have greater ambition, aspiration, and work harder in school. This is the main argument here – London has more of these pupils and so has a higher average GCSE score than the rest of the country.

..there is a London premium in pupil progress of 9.8% of a standard deviation. I show that ethnic composition matters a great deal: in fact, differences in composition account for all of the gap. If London had the same ethnic composition as the rest of England, there would be no ‘London Effect’. Furthermore, there is no significant difference between the progress of white British pupils in London and in the rest of the country. Looking at conditional pupil progress, a London premium of 11% is also entirely eliminated by controls for ethnicity; this is also robust to conditioning on pupil and neighbourhood characteristics. Nor is this a new phenomenon: the London progress premium has existed for the last decade and is entirely accounted for by ethnic composition in each year.

Comment: “9.8% of a standard deviation” is hard to understand. In an intelligence test it would be equivalent to 1.5 IQ points. In terms of overall mean GCSE scores I calculate, from other data, that it would be 1.8 points out of an average of roughly 42.3 points.
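The conversion is straightforward once a standard deviation is fixed. Taking the IQ convention of SD = 15, and working backwards from the 1.8-point GCSE figure to the standard deviation it implies:

```latex
\begin{aligned}
0.098 \times 15 &= 1.47 \approx 1.5 \text{ IQ points}\\
\frac{1.8}{0.098} &\approx 18.4 \text{ GCSE points (implied SD)}
\end{aligned}
```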

There is nothing inherently different in the educational performance of pupils from different ethnic backgrounds, but the children of relatively recent immigrants typically have greater hopes and expectations of education, and are, on average, consequently likely to be more engaged with their school work. These results help to explain the ‘London Effect’; they do not explain it away. My argument is that the London effect is a very positive thing, but much of the praise for this should be allocated to the pupils and parents of London for creating a successful multi-ethnic school system. By the same token, there is less evidence that education policies and practices had a large part to play in terms of innovative policies.

The claim that “There is nothing inherently different in the educational performance of pupils from different ethnic backgrounds” does not accord with most research on scholastic ability, but that is what makes the paper more intriguing. It appears to be asserting something which is contradicted by the actual UK results as published by the relevant government statisticians, and does not accord with international data.

To get down to the detail, Burgess has used the standard “Best 8” procedure: the scores on the best 8 GCSE exams from the National Pupil Database 2012/13 are used to assess scholastic attainment. GCSE results are given in grades which coincidentally go from 8 for an A*, 7 for an A and so on down to 1, a system which loses all the fine detail of the actual percentage results, and also potentially penalises those brighter students who take many examinations (Burgess says he followed this procedure because it does not “over-reward” such students, though it blunts the achievements of bright and diligent students), all this without really controlling for course difficulty. Deary et al. (2007) give the full results, as well as the best 8, and show the detailed results for each major exam, which is very instructive. That paper shows, among other things, that individual sciences are taken by very few pupils. Sadly, the Deary et al. publication also has to use the crude grades, a regrettable consequence of the grading system. What is the point of examiners marking papers in detail and then the system trashing the results by reducing them to grades?
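The capping is easy to sketch. This minimal version uses the 8-to-1 grade points described above (official points tables may use a different scale) and shows why a pupil with eleven A* grades gets no more credit than one with eight.

```python
# Grade points as described above: A* = 8, A = 7, ... down to G = 1.
GRADE_POINTS = {"A*": 8, "A": 7, "B": 6, "C": 5, "D": 4, "E": 3, "F": 2, "G": 1}


def capped_best8(grades):
    """Sum of the points for a pupil's best eight GCSE grades."""
    points = sorted((GRADE_POINTS[g] for g in grades), reverse=True)
    return sum(points[:8])


# Hypothetical pupils: eight A* grades versus eleven A* grades.
print(capped_best8(["A*"] * 8), capped_best8(["A*"] * 11))
print(capped_best8(["A*", "A", "B", "C"]))
```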

But the main results of Deary et al. are salutary for any researcher seeking to talk about scholastic achievement. They did a 5-year prospective longitudinal study of 70,000+ English children looking at the association between psychometric intelligence at age 11 years and educational achievement in national examinations in 25 academic subjects at age 16. The correlation between a latent intelligence trait (Spearman's g from CAT2E) and a latent trait of educational achievement (GCSE scores) was 0.81. General intelligence contributed to success on all 25 subjects. Variance accounted for ranged from 58.6% in Mathematics and 48% in English to 18.1% in Art and Design. Girls showed no advantage in g, but performed significantly better on all subjects except Physics. This was not due to their better verbal ability. At age 16, obtaining five or more GCSEs at grades A–C is an important criterion. 61% of girls and 50% of boys achieved this. For those at the mean level of g at age 11, 58% achieved this; a standard deviation increase or decrease in g altered the values to 91% and 16%, respectively.
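Squaring the latent correlation shows how much of the variation in GCSE achievement is shared with g measured five years earlier, about two-thirds:

```latex
r^2 = 0.81^2 = 0.6561 \approx 66\%
```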

Simply stated, if you want to talk about the causes of GCSE results at 16 you ought to quote this paper and you ought to distinguish between psychometric intelligence at 11 and educational attainment thereafter.

Burgess then goes on to explain: “The best way to isolate the contribution of schools, and by extension a city-wide school system, is to analyse pupil progress: to see how well pupils do at GCSE taking account of their prior test scores before entering secondary schools. This necessarily focusses attention on secondary schools (see Greaves et al 2014 for a discussion of primary schools). The prior test scores are each pupil’s performance in the Key Stage 2 tests at age 11, in English, Maths and Science. I define pupil progress as the residual of a regression of GCSE capped 8 points score conditional on these KS2 test scores.” (my emphasis)
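To make the residual definition concrete, here is a minimal Python sketch. It simplifies Burgess's set-up by using a single KS2 composite as the predictor rather than separate English, Maths and Science scores, and the five pupils and their scores are invented purely for illustration.

```python
from statistics import mean

def ols_fit(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def pupil_progress(ks2, gcse):
    """'Progress' = actual GCSE score minus the score predicted from KS2."""
    intercept, slope = ols_fit(ks2, gcse)
    return [g - (intercept + slope * k) for k, g in zip(ks2, gcse)]

# Hypothetical pupils: a KS2 composite at 11 and a capped GCSE score at 16.
ks2 = [20, 25, 30, 35, 40]
gcse = [28, 40, 42, 50, 60]

for k, g, r in zip(ks2, gcse, pupil_progress(ks2, gcse)):
    print(f"KS2 {k}, GCSE {g}: progress {r:+.1f}")
```

In this made-up class the pupil who finishes with 40 points gets the biggest “progress” score (+3.4), while the pupil who finishes top with 60 points gets only +1.2: the residual measures distance from the regression line, not the level reached. Residuals from a least-squares fit with an intercept always sum to zero, so by construction some pupils must show “negative progress”.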

So, when Burgess talks about “progress” it is not progress as we might usually understand it, in the sense that students are judged by how far they have progressed from ignorance to knowledge (achievement), but how far they have progressed given their earlier achievements. A child who has done badly at primary school but has then improved at secondary school will be judged to have made greater “pupil progress” than a much higher-performing child who remains at the top of the class throughout their schooling. Burgess does a further version, “conditional pupil progress”, which includes a correction for poverty, as measured by being eligible for free school meals, but the main problem still stands. (Progress estimates are further complicated by the questionable assumptions of the “poverty correction”, but we have covered that many times before.)

The paper is not about the final achievement of the pupils, but about their progress when allowance is made for their prior ability (as demonstrated by early assessments at age 11). We seem to have confusion between “progress” (achieves a high standard) and “progress” (improves when earlier ability is taken into account). The former is of benefit to society (school leavers able to work or do further study); the latter is one way of estimating whether secondary schools add value, all things considered. Measuring cognitive ability on a “school far” test would be a better way of getting a baseline for later estimating added value in scholastic achievement but, unlike the Deary paper, this paper shows no cognitive estimates.

You might, at this point, wish to stop reading.

Burgess is trying to calculate the value that secondary schools add to children’s scholastic achievements. This is a valid exercise, particularly if those schools are talking up their achievements, and boasting without proofs. However, “pupil progress” does not entirely achieve that aim. In fact, the schools may be adding lots of value, but getting results which are in line with predictions based on ability.

To explain this, in words short enough to be understood by science journalists and the leader writers of The Times (at one time a paper of record), consider the following. A bright child, more scholastically able than 80% of their class, finishes primary school with good marks. The child doesn’t know very much yet, but is learning with each year of education. They go on to secondary school. At age 16 the child is still more scholastically able than 80% of their class, but has learned a great deal more. Using this type of pupil progress as a measure will make it seem as if they have made no progress at all. In the jargon, their residual from the regression line is zero. They are doing no better than expected, even though they actually show more knowledge and more developed skills. The progress measure does not show us what level they have progressed to, but only the distance they have travelled according to various assumptions, including assumptions about poverty.

Of course, only a fool would think that children hadn’t learned anything because they had progressed up the system at their usual speed.

Now consider a child who does less well at primary school, perhaps because they are slow to mature, or a recent immigrant. At 11 they are in the bottom 20% of the class. At secondary school they mature, or in the case of immigrants, learn English. Now they do better, and rise to the 30th percentile of the class, a massive 50% improvement. Bingo, they have made a brilliant contribution to the progress score for the school. If the school is being judged on this sort of “progress” it would be smart to find lots of young immigrants who have much to learn.
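The “massive 50% improvement” is a ratio of percentile ranks. A short Python sketch, assuming scores are normally distributed, converts the same move into standard deviation units and sets it beside an equally sized percentile move near the top of the class:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def percentile_to_z(pct: float) -> float:
    """Position, in SD units, of a given percentile on a normal curve."""
    return std_normal.inv_cdf(pct / 100)

def describe_move(before: float, after: float) -> tuple:
    """Return (relative gain in percentile rank, gain in SD units)."""
    relative_gain = (after - before) / before
    sd_gain = percentile_to_z(after) - percentile_to_z(before)
    return relative_gain, sd_gain

for before, after in [(20, 30), (80, 90)]:
    rel, sd = describe_move(before, after)
    print(f"{before}th to {after}th percentile: {rel:.1%} relative, {sd:.2f} SD")
```

On these assumptions the rise from the 20th to the 30th percentile is about a third of a standard deviation, while a rise from the 80th to the 90th percentile is a larger step (about 0.44 SD) even though it counts as only a 12.5% “improvement”.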

So, if cities like Birmingham and London have lots of recent immigrants, those groups will do poorly at primary school but, as they become acculturated to England, may subsequently do better. The White British locals will already be acculturated, so there is no progress for them to make on that front. This supposition is confirmed in figure F2 which shows that the higher the non-white population the higher the “progress” score.

Burgess goes on to explain that Birmingham shows a greater “London effect” than London. Of course, intelligence research usually shows that brighter people move to cities, but I think that this particular finding is an artefact of the progress measures being used in this paper. Having lots of recent immigrants will increase the likelihood of apparent progress at later ages.

If you look at the “large city versus rest of England” contrasts in the paper, Birmingham is far ahead, London slightly less so, Manchester a bit behind the national average, and Liverpool very close to it. It may be due to the proportion of Asian pupils (look at Table T5).

Burgess also explains that those immigrants who did not complete the 11 year old assessments (perhaps because they arrived as teenagers) were dropped from the analysis, so we cannot judge the progress of this minority. They tended to have lower GCSE scores, possibly because they were late to acculturate.

In fact, a real measure of progress would be to find a test or broad range of tests which could be given throughout early life and into adulthood. Perhaps a wide-ranging general knowledge and skills evaluation (roughly like those carried out by the OECD) would show how well a pupil had been prepared for earning a living.

You will note that the OECD is surprised and concerned to find that at the end of formal education large numbers of people cannot do very much in the economy.

There is something missing from the Burgess paper, which is to answer the question: How good are pupils’ scores at the end of secondary education? It is very hard to find the answer to this question in the paper. I think, but cannot be sure, that the answer is given in Appendix 3 on page 33, which may be further than most science journalists are willing to read, assuming they have read the paper at all.



Burgess has chosen to show all the GCSE totals in standardised scores, which makes them harder to interpret. Plain statistics are always preferable, and the actual scores would allow immediate comparison with other publications, whereas standard scores obscure those key benchmarks. The standardised scores also obscure the pass rates which, as we will see later, are a major cause of distortions in reporting school progress and racial gaps in achievement.

It seems that the highest achieving students are Pakistani, then Black Caribbean and White British and Bangladeshi, and the lowest performing are the Chinese. Obviously, I have made a mistake in reading this table, so I turned from the paper to the latest Government statistics for the relevant year, 2013.

One of the headlines is: Chinese pupils remain the highest attaining ethnic group. The percentage of Chinese pupils achieving 5 or more GCSEs at grade A* to C or equivalent including English and mathematics is 17.5 percentage points above the national average.

This is in line with everything we know about the intellectual and scholastic ability of the Chinese. I have apparently read the table back to front. However, if Appendix 3 is correct, then in the UK in 2013 75% of students got a pass mark, and had an equivalent IQ of 110. Time to go to other sources of final GCSE statistics.

Here is a posting on scholastic attainment in 2011. It gives the results for 2007 and 2011. Without intending to, it also shows how the gap between ethnic groups can be manipulated by making exams easier, or by counting the best results without requiring that they include English and Maths.

To save time, here are the scores in that posting:



To see how the more recent 2011 results look with an easy 58% pass rate, concentrate on the higher maroon bars on the right of each pair. The Chinese and Indians are ahead, the rest gradually falling near or behind the White British level, with Black Caribbeans last except for the small numbers of Roma. Now look at the earlier, harder 2007 results, with a 45% pass rate, marked in light purplish blue. Notice how the Chinese and Indians are still ahead but the other groups are in more difficulty. The scores are proportions passing, not the actual scores, which would show the Chinese even further ahead. Good trick, isn’t it?

La Griffe du Lion explained how this was done in 2004, and educators have not been shamed into dropping it. If you make the pass rate a little higher every year by making the test easier, for several years you will get an apparent closing of the gap, without any fundamental change in the scores. This is because the apparent percentage gap is a function of the two bell curves and the pass mark which is being used as a cut-off.



Consider two populations, the one shown above being better at scholastic achievement than the one shown below, such that 50% of the top group can pass an exam and only 16% of the bottom group pass that same exam. There is a mean difference in scholastic attainment, shown by the distance between the two means (with these pass rates, about one standard deviation). The newspaper headline figure, for those who are not used to looking at normal distributions, is that there is a 50% - 16% = 34 point gap. That makes a good headline, even though it depends on a particular pass rate, and it ignores the best measure, which is the mean difference shown in the figure above.
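Because pass rates depend on where the cut-off falls, it helps to turn them back into a mean difference. The Python sketch below assumes both groups are normally distributed with the same standard deviation, as in the figure. Under that assumption each pass rate fixes how far that group's mean sits above the pass mark, and the difference between the two is the mean gap in SD units, whatever the pass mark happens to be.

```python
from statistics import NormalDist

def mean_gap_from_pass_rates(p_high: float, p_low: float) -> float:
    """Mean difference (in SD units) implied by two groups' pass rates.

    Assumes both groups are normal with equal SDs (set to 1). For a group with
    mean mu and pass mark c, pass rate p = 1 - Phi(c - mu) = Phi(mu - c),
    so mu - c = Phi^-1(p); subtracting the two groups cancels c.
    """
    nd = NormalDist()
    return nd.inv_cdf(p_high) - nd.inv_cdf(p_low)

print(round(mean_gap_from_pass_rates(0.50, 0.16), 2))
print(round(mean_gap_from_pass_rates(0.977, 0.841), 2))
```

Both the 50%/16% split and a far more lenient 98%/84% split point to the same gap of about one standard deviation, even though their headline point gaps (34 and 14) look very different.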

At that point, if you are an educationalist with a political position, announce you are going to transform the educational system (as Bush did in Texas in the 90s). Now, without changing the schools or the teachers, change the pass rate slightly, either by making the exam slightly easier or just passing more children with lower marks, or a bit of both. Keep doing that every year and the apparent percentage gap will eventually come down: once the pass mark sits one standard deviation below the lower group's mean, about 98% of the top group and 84% of the bottom group pass, a 98% - 84% = 14 point gap (equivalently, failure rates of 2% and 16%). Of course, the actual ability levels have not changed, and the areas under the normal curve have not changed, but by moving the cut-off point and using the misleading point-gap statistic you can probably fool most journalists. (Once you have got to the end of the curve the trick runs out of steam so, flushed with success, you move to another school district at a higher salary and repeat the trick.)
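The trick can be simulated directly. In this Python sketch the two groups are normal curves one standard deviation apart (an assumption that matches the 50% and 16% figures), and only the pass mark moves:

```python
from statistics import NormalDist

high = NormalDist(mu=1.0, sigma=1.0)  # better-performing group
low = NormalDist(mu=0.0, sigma=1.0)   # group one SD lower

def pass_rate(group: NormalDist, cut: float) -> float:
    """Proportion of a group scoring above the pass mark."""
    return 1 - group.cdf(cut)

def point_gap(cut: float) -> float:
    """The headline 'percentage point gap' at a given pass mark."""
    return 100 * (pass_rate(high, cut) - pass_rate(low, cut))

# Lower the pass mark step by step; the two groups themselves never change.
for cut in (1.0, 0.5, 0.0, -0.5, -1.0):
    print(f"pass mark {cut:+.1f}: {pass_rate(high, cut):.0%} vs "
          f"{pass_rate(low, cut):.0%}, gap {point_gap(cut):.0f} points")
```

The point gap runs 34, 38, 34, 24 and then 14, while the mean difference stays at exactly one SD throughout. The gap widens slightly before it narrows: it is largest when the pass mark lies midway between the two means, so the easing has to continue past that point before the headline improves.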

I think it is time to attempt a summing up. As far as I can see, the whole of the UK press, in company with Prof Burgess some of the time, has misunderstood what was being measured and has drawn conclusions which are unsubstantiated, and very probably wrong. It is alarming to conclude this, so I welcome anyone who can point out my mistakes and misunderstandings. (I generally ask authors for comments anyway, and post up their replies without any further comment). I think they all got it wrong, utterly wrong, but I may be mistaken.

The proper and fully validated conclusion should be:

“Progress” measures do not equate to scholastic achievement, so this paper does not inform you about final achievements at age 16. If you want to find out about those, read government statistics, though those can be confusing. Furthermore, these data do not allow you to make assumptions about the amount of effort students are putting into their work (which was not measured), and whether immigrants are desperate to get ahead. They may be, but this paper cannot confirm that. The findings could well be an artefact of the progress measures used, because a low starting point leads to more progress, even if the end result is average.

I don’t do policy, but here is some advice for Head teachers and education authorities.

Schools will get good achievement results if they can get bright pupils. About 65% of the variance in scholastic attainment is due to prior intelligence. If Head teachers and education authorities want to be totally cynical, here is some advice: If you are allowed to give children intelligence tests, use those to select your pupils. Failing that, if you can look at their prior achievements, use those. If you are denied the right to choose on that basis, find children with well-educated parents, even if they are very poor. Try to pack your school with such children, whatever their race. The educational level of parents is an intelligence surrogate measure, and a better predictor than wealth. If you are not allowed to select on parental intelligence, pack in as many Chinese children as possible. Then select Indians/Asians of professional rank, and all Irish, and Whites. Avoid other groups, particularly Roma. Your school will look good in terms of final results. Be highly selective in your “diversity”. If inspectors come to call, show Chinese and Indian students staring down a microscope.
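The 65% figure appears to come from squaring the 0.81 latent correlation reported by Deary et al. (2007); that is my assumption about its source. A squared correlation is the proportion of variance the two latent traits share. The arithmetic, in Python:

```python
r = 0.81  # latent correlation, g at 11 with GCSE attainment at 16 (Deary et al., 2007)
shared = r ** 2      # proportion of variance the two latent traits share
other = 1 - shared   # proportion left to everything else
print(f"shared {shared:.1%}, other {other:.1%}")  # shared 65.6%, other 34.4%
```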

On the other hand, if you want to be even more cynical, and want to be judged not on the final achievements of your students, but on a measure of their progress, make sure you find children with a low starting point in primary school. Find any child whose parents are poor (because “adjusting” for poverty boosts their scores regardless of the cause of poverty). Pack the school with recent immigrants who cannot speak English and who have not adjusted to life in England. Their low scores will make you look good, because with every passing day they will watch TV, speak to English kids, walk the streets, read billboards and newspapers, and listen to the radio. As they acculturate it is likely they will do better at school, if not in absolute terms, then in the more elastic relative terms. For really dramatic results, try to avoid Chinese children. They are bright to begin with, and on your dodgy progress measures they won’t show much progress.

In conclusion, it is a great pity Prof Burgess’s paper did not contain any psychometrics, which would have fleshed out his argument. It is also a pity that he has allowed himself (some of the time) and his listeners (virtually all of the time) to conflate progress with achievement. He has sought to make a particular point: if student progress (allowing for previous achievement) is the criterion then the London effect is spurious, and probably due to immigrants doing better in secondary school than primary school. One cannot attribute to the quality of schooling results which are probably due to immigrants showing progress from a low level in primary schools to a higher level in secondary schools as they get used to the local white culture.

The progress-in-the-light-of-former-achievements measure should a) mention the proportions of recent and more established immigrants, and b) should be discussed in the more important context of the end results: GCSE results by ethnic group. What has happened is that Prof Burgess has drifted from making his first point into making an un-validated and incorrect second point: that immigrants boost school performance. The official statistics show that it depends on which immigrants. Chinese students will do wonders for attainments, Black Caribbeans far less so, Roma not at all.

Economics has been called the dismal science, but the reception given to this paper reveals the dismal level of science reporting in the United Kingdom, certainly as regards psychology.

Please reassure me I am not the only person in the world who detects fatal errors in the conclusions drawn so enthusiastically from this paper by so many journalists.

Comments please.

Deary, I. J., Strand, S., Smith, P. and Fernandes, C. (2007). Intelligence and educational achievement. Intelligence, 35(1), 13–21.

Thursday 13 November 2014

The Imitation of Intelligence

Benedict Cumberbatch starring as Alan Turing alongside Keira Knightley in The Imitation Game.


To the Science Museum last night, to see the London premiere of The Imitation Game, a film about Alan Turing. Museum staff took ages to manage the audience, and were at cross-purposes as to how to check that customers had bought their tickets. Collectively, they gave the Museum a bad name, and their procedures were no advertisement for science. It gave me time to look at steam engines, and to wonder if I have been too harsh on researchers who claim we are in the grip of dysgenics. The director was interviewed before the film, which was the wrong way round. It meant that many of his answers were of the form “as you will see in the film, but I won’t spoil it for you”. Had the interview been placed at the end, we could have asked specific questions. Perhaps the Science Museum staff have too high an opinion of themselves.

The film faced a difficult task: it had to show highly complex cryptanalytic techniques in a way which would engage a general audience. Discussing the actual techniques was out of the question, so artistic licence was used in industrial quantities, and in the end the story was that the Germans had built a fiendishly complicated machine and Turing, deeply misunderstood and hindered by colleagues who were fools, eventually built his own machine which solved the puzzle. If you are interested, read Wikipedia or any of the score of books on the subject. Hugh Sebag-Montefiore’s “Enigma” (2000) is good, but there are many others: Hinsley and Stripp “Codebreakers” (1993) is a good example which comes to hand.

Better still, spend a day at Bletchley Park. The code breakers used “cribs” much of the time: they knew that enemy weather stations would have to report the weather, submarines the positions of shipping, and that the very most important orders might include the sequence of letters “Hitler”. (I have encoded this sequence on an Enigma machine myself, wearing white gloves, not out of deference to the afore-named assassin, but as a courtesy to the curator of the Bletchley Park museum, to protect the machine I was using.) Insights into human behaviour (not provided by psychologists) made an important contribution: some German operators did not bother to carry out proper randomisation of initial setting letters and rotors, so it was worthwhile checking first whether an operator had been lazy, and only then going through the longer standard checking procedure which assumed full randomisation.
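One property of the Enigma machine made cribs especially useful: because of its reflector, it never enciphered a letter as itself. Any alignment of a guessed word under the ciphertext that puts a letter opposite itself can therefore be ruled out at once. The Python sketch below applies only that one rule; the ciphertext is invented, and “WETTER” (German for weather) is used as an illustrative crib.

```python
from typing import List

def possible_crib_positions(ciphertext: str, crib: str) -> List[int]:
    """Offsets at which the crib could lie under the ciphertext.

    Enigma never enciphered a letter as itself, so any alignment in which a
    crib letter coincides with the ciphertext letter above it is impossible.
    """
    ciphertext, crib = ciphertext.upper(), crib.upper()
    positions = []
    for start in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[start:start + len(crib)]
        if all(c != p for c, p in zip(window, crib)):
            positions.append(start)
    return positions

print(possible_crib_positions("ABWETTERCD", "WETTER"))
```

Here only offsets 0 and 4 survive; every other placement produces at least one letter-to-itself clash and is eliminated without any further work.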

Apart from attending lectures given by Hugh Alexander (sadly before he could talk very much about his code-breaking work) and Captain Jerry Roberts, I have also spoken to others who worked there at the time. A great pity that secrecy was maintained for too long, and much of the machinery destroyed. Probably Churchill’s greatest error, and that is saying something.

Of more interest is how the film depicts intelligence. Heartless, socially awkward, inept, boastful and untidy seem to have been Turing’s main characteristics in the eyes of the film makers. Also, able to do big sums easily. Finally, to put the nail in the coffin: no sense of humour. Compared to that, his homosexuality was a redeeming sin. The top team of code-breakers were made to look like fools by comparison, which was a crude way of showing his genius. With just a few additional minutes the director could have shown a sliver of the problems and solutions attempted by the team. He did not even deal with the concept of contradiction which was central to Turing’s technique, and useful to the story the film was telling. Hugh Alexander, Gordon Welchman (not depicted), Jack Good and Tommy Flowers (not depicted) were brilliant minds, and better scripts could have been provided for the code breaking team than the bully boy banalities and fisticuffs of the film. The mathematician Peter Hilton was portrayed, but as a mere cypher, and without a mention of his masterly palindrome, the product of a sleepless night:


When reading a palindrome from left to right, it is impossible to locate the “middle” until the whole sequence has been read. Please time how long it takes you to check that it is right, and to identify the keystone letter.
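The checking task can be sketched in Python. The function below ignores spaces, punctuation and case, tests the letters against their reversal, and reports the keystone letter when the number of letters is odd; as the paragraph says, the middle cannot be known until all the letters have been collected. The test phrase is a stock palindrome, not Hilton's.

```python
from typing import Optional, Tuple

def check_palindrome(text: str) -> Tuple[bool, Optional[str]]:
    """Return (is_palindrome, keystone letter or None if the count is even)."""
    # Collect every letter first: the middle is unknown until the end.
    letters = [ch.lower() for ch in text if ch.isalpha()]
    is_palindrome = letters == letters[::-1]
    keystone = letters[len(letters) // 2] if len(letters) % 2 == 1 else None
    return is_palindrome, keystone

print(check_palindrome("A man, a plan, a canal: Panama"))  # (True, 'c')
```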

The animations of the bombing of London were very good, probably the best scene in the film. The fearsomeness of the Blitz was well shown. The crossword-puzzling Londoners in their shelters were a brilliant idea. Keira Knightley played very well, a star performance. Cumberbatch did the very best with his character. He was believable and moving. The young Turing was also very well played. Charles Dance was a bit over the top as Commander Denniston, but was enjoying himself, and enjoyable to watch. Mark Strong as the MI6 leader Stewart Menzies was the most convincing. Good supporting parts throughout, and the spirit of the age well conveyed. A well made film.

What a pity that Turing’s mind was shown as part machine, and mostly autistic demon. The director said in his interview that he had got closer to understanding Turing, but I can only say he still had far to go. He used the device of casting Turing as struggling to pass the Turing test. That’s fine, but some of his concepts could have been animated and explained, without mathematics. The film did not even show that every key press caused one or more rotors to step by one twenty-sixth of a full rotation, before the electrical connections were made, which is what made it so hard to break.
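The stepping the film left out can be sketched as a simple odometer. This Python model is deliberately simplified: it assumes every rotor turns over its neighbour at the same point, ignores the wiring and plugboard, and omits the real machine's double-stepping of the middle rotor, which made the true cycle somewhat shorter. It shows only that the rotor positions, and so the electrical path, change with every key press.

```python
from typing import Tuple

ALPHABET_SIZE = 26
Positions = Tuple[int, int, int]  # (left, middle, right) rotor positions, 0-25

def press_key(positions: Positions) -> Positions:
    """Advance the rotor stack by one key press, odometer style.

    On the real machine this stepping happened before the letter was
    enciphered, so even the first letter typed used a new path.
    """
    left, middle, right = positions
    right = (right + 1) % ALPHABET_SIZE   # fast rotor: one 26th of a turn per key
    if right == 0:                        # it has just completed a turn: carry over
        middle = (middle + 1) % ALPHABET_SIZE
        if middle == 0:
            left = (left + 1) % ALPHABET_SIZE
    return left, middle, right

def cycle_length(start: Positions = (0, 0, 0)) -> int:
    """Key presses needed before the rotors return to their starting positions."""
    state, presses = press_key(start), 1
    while state != start:
        state, presses = press_key(state), presses + 1
    return presses

print(cycle_length())  # 17576, i.e. 26 ** 3, in this simplified model
```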

In fact, a contemporary of his whose lecture I listened to shortly before he died, Captain Jerry Roberts (UCL German 1941), said that Turing mostly sat quietly at his desk thinking, such that Roberts “doubted he was earning his corn”, though he revised his opinion later. I cannot think of a way of showing thinking, other than an animation of concepts interacting in a design, cascade or lines of force. The team that made the first TV version of Douglas Adams’s “The Hitchhiker’s Guide to the Galaxy” years ago did pretty well, by showing some of the ideas in a news headline type panel on the margins of the screen.

It is silly to expect too much of films, but the audience must have been left feeling that high intelligence was useful in times of need, but otherwise a curse. Tortured genius is just too good a parody for a film maker to turn down.