Tuesday, 6 October 2015

The Tetlock Forecast

I have admired Philip Tetlock since, almost 30 years ago, he reviewed a book I had just written which contained one big and so far untested prediction, and gave it by far the most detailed, insightful and helpful assessment it had received among many warm but perfunctory reviews, mildly adding references to a few papers which, when I followed them up, showed me exactly how much I had missed out. His kindness made his critical points far more effective. (In a subsequent lecture tour I met up with one of the international affairs experts he had mentioned, who offered to work with me, though in the end I went on to other things, and consequently made no revision of the book).

Tetlock, P.E. (1986). Review of J. Thompson, Psychological aspects of nuclear war. British Journal of Social Psychology, 25, 78-79.

Now the Press are picking up his work on super-forecasting, which has major implications for how we go about anticipating and planning for future events, supposedly one of the features of high intelligence. Bright people should be particularly good at forecasting, shouldn’t they?

Superforecasting: The Art and Science of Prediction. Philip Tetlock and Dan Gardner. Sep 29, 2015

What has Tetlock found? First, that most pundit forecasts are unfalsifiable. Even time travel would not tell you whether the predictions of these commentators had been met. They are at the low level of Nostradamus and contemporary journalism. Second, if you run a proper forecasting contest (not “will there be a stock market correction sometime soon” but “what will the Standard and Poor’s 500 index stand at on 31 December 2015”) most commentators are “too busy” to participate. They do the broad-brush stuff which gets well paid, not the nitty-gritty testable stuff that nerds do for fun.

In his 1953 essay on Tolstoy’s view of history, Isaiah Berlin drew a distinction which he intended to be no more than an intellectual game, though he later admitted that every classification throws light on something.

There is a line among the fragments of the Greek poet Archilochus which says: ‘The fox knows many things, but the hedgehog knows one big thing.’  Scholars have differed about the correct interpretation of these dark words, which may mean no more than that the fox, for all his cunning, is defeated by the hedgehog’s one defence. But, taken figuratively, the words can be made to yield a sense in which they mark one of the deepest differences which divide writers and thinkers, and, it may be, human beings in general. For there exists a great chasm between those, on one side, who relate everything to a single central vision, one system, less or more coherent or articulate, in terms of which they understand, think and feel – a single, universal, organising principle in terms of which alone all that they are and say has significance – and, on the other side, those who pursue many ends, often unrelated and even contradictory, connected, if at all, only in some de facto way, for some psychological or physiological cause, related to no moral or aesthetic principle. These last lead lives, perform acts and entertain ideas that are centrifugal rather than centripetal; their thought is scattered or diffused, moving on many levels, seizing upon the essence of a vast variety of experiences and objects for what they are in themselves, without, consciously or unconsciously, seeking to fit them into, or exclude them from, any one unchanging, all-embracing, sometimes self-contradictory and incomplete, at times fanatical, unitary inner vision. 
The first kind of intellectual and artistic personality belongs to the hedgehogs, the second to the foxes; and without insisting on a rigid classification, we may, without too much fear of contradiction, say that, in this sense, Dante belongs to the first category, Shakespeare to the second; Plato, Lucretius, Pascal, Hegel, Dostoevsky, Nietzsche, Ibsen, Proust are, in varying degrees, hedgehogs; Herodotus, Aristotle, Montaigne, Erasmus, Molière, Goethe, Pushkin, Balzac, Joyce are foxes.

(As you can see, Isaiah Berlin could write. He was also very kind, and a friend tells me stories about him, while pointing at the two prints Berlin gave him).

Tetlock has taken this distinction to heart as a classificatory system. Forecasters can have narrow, specialist expertise (hedgehogs) or a broad overview built from plagiaristic combinations of other people’s deep knowledge plus their own feelings (foxes).

After conducting many prediction contests, Tetlock finds that some people are particularly accurate, and deserve the accolade of being superforecasters. Superforecasters could assign probabilities 400 days before an event about as well as regular people could 80 days before it. Many superforecasters were public-spirited software engineers; indeed, software engineers are markedly over-represented among them.



A surprisingly large percentage of our top performers do not come from social science backgrounds. They come from physical science, biological science, software. Software is quite overrepresented among our top performers. If you looked at the personality profile of super-forecasters and super-crossword puzzle players and various other gaming people, you would find some similarities.    

The individual difference variables are continuous and they apply throughout the forecasting population. The higher you score on Raven’s matrixes the higher you score on active open-mindedness, the more interested you are in becoming granular, and the more you view forecasting as a skill that can be cultivated and is worth cultivating and devoting time to, those things drive performance across the spectrum, and whether you make the super-forecaster cut, which is rather arbitrary or not. There is a spirit of playfulness that is at work here. You don’t get that kind of effort from serious professionals for a $250 Amazon gift card. You get that kind of engagement because they’re intrinsically motivated; they’re curious about how far they can push this. 

Comment: I think this makes sense. These super-forecasters are probably counters, not chatterers, that is, STEM not Verbal, with high fluid intelligence. Software has to work, and there are many, many ways in which it can go wrong. Murphy’s Law applies. Programs have to be tested to flush out errors, and you have to simulate the special situations users will create which can make an untested system crash. This background makes software engineers cautious, humble, and supremely focussed on “on budget, on time”.

Tetlock tried to boost forecasting accuracy by means of his Good Judgment Project, and found that his training techniques could boost accuracy by 50-70% from the group average. The project does this in the following ways:



The test of fluid intelligence was Raven’s Matrices. I promise you I began writing this post without knowing that. What I thought would be a little break from intelligence research turns out to prove the adage that intelligence runs through human life like carbon through biology.

You may have heard about the wisdom of crowds. I am with Dryden (1668) when he said: “If by the people you understand the multitude, the hoi polloi, ’tis no matter what they think; they are sometimes in the right, sometimes in the wrong: their judgment is a mere lottery.” As a general rule, crowds are in favour of war at the beginning of wars, and against them if they drag on, which most of them do.



So, the wisdom of crowds depends on the intelligence of the crowds, or more precisely, it is boosted by paying extra attention to intelligent crowd members. Where opinions are polarised, then one option is to use an algorithm to combat the centralising and emasculating effect of those clashing perspectives. This helps get useful predictions out of crowds, but does not help super-forecasters (who probably know how to combine conflicting opinions anyway).
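A minimal sketch of the two ideas in the paragraph above: a weighted average that gives more say to crowd members judged more able, followed by one common form of “extremizing” transform, which pushes the pooled probability away from 0.5 to counter the centralising effect of averaging. The weights, the exponent `a` and the forecasts are made-up assumptions, and this is not necessarily the exact algorithm Tetlock’s team used.

```python
def weighted_mean(probs, weights):
    """Pool forecasts, giving more say to members with larger (hypothetical) ability weights."""
    total = sum(weights)
    return sum(p * w for p, w in zip(probs, weights)) / total


def extremize(p, a):
    """Push a probability away from 0.5 by scaling its log-odds by a.

    a > 1 makes the forecast more extreme; a = 1 leaves it unchanged (up to rounding).
    """
    return p ** a / (p ** a + (1 - p) ** a)


# Hypothetical crowd: five forecasts on one yes/no question, with made-up ability weights.
probs = [0.70, 0.80, 0.60, 0.75, 0.90]
weights = [1.0, 2.0, 0.5, 1.0, 1.5]

pooled = weighted_mean(probs, weights)
sharpened = extremize(pooled, a=2.0)  # a = 2.0 is an arbitrary illustrative choice
print(f"pooled: {pooled:.3f}  extremized: {sharpened:.3f}")
```

In practice the exponent would have to be tuned on past questions; too large a value turns sensible caution into overconfidence.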

An example of Kahneman-based predictive training is this rule of thumb: the likelihood of a subset should not be greater than the likelihood of the set from which the subset has been derived.
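The rule of thumb is the conjunction rule of probability: a subset (“A and B”) can never be more probable than the set it is drawn from (“A”), so P(A and B) ≤ min(P(A), P(B)). A tiny Python check, using hypothetical ratings in the style of Kahneman and Tversky’s “Linda” problem:

```python
def violates_conjunction_rule(p_set, p_subset):
    """True if a subset ("A and B") is rated more likely than the set it comes from ("A")."""
    return p_subset > p_set


def max_coherent_conjunction(p_a, p_b):
    """Upper bound on P(A and B): it can never exceed the smaller of P(A) and P(B)."""
    return min(p_a, p_b)


# Hypothetical ratings in the style of the "Linda" problem.
p_bank_teller = 0.10
p_feminist_bank_teller = 0.25  # a subset of bank tellers, yet rated more likely

print(violates_conjunction_rule(p_bank_teller, p_feminist_bank_teller))
print(max_coherent_conjunction(p_bank_teller, 0.60))  # 0.60 is a hypothetical P(feminist)
```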

What we’re trying to encourage in training is not only getting people to monitor their thought processes, but to listen to themselves think about how they think. That sounds dangerously like an infinite regress into nowhere, but the capacity to listen to yourself, talk to yourself, and decide whether you like what you’re hearing is very useful. It’s not something you can sustain neurologically for very long. It’s a fleeting achievement of consciousness, but it’s a valuable one and it’s relevant to super-forecasting.     

The beauty of forecasting tournaments is that they’re pure accuracy games that impose an unusual monastic discipline on how people go about making probability estimates of the possible consequences of policy options. It’s a way of reducing escape clauses for the debaters, as well as reducing motivated reasoning room for the audience.    
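“Pure accuracy games” need a scoring rule, and Tetlock’s tournaments scored forecasts with the Brier score: the mean squared difference between the probability given and what actually happened. The sketch below uses the simplest binary form (0 is perfect, 1 is the worst possible); Tetlock’s own reports typically use a variant that sums over both outcomes, giving a 0 to 2 range. The forecasts and outcomes are invented for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean squared gap between probability forecasts and outcomes (1 = happened, 0 = did not)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


outcomes = [1, 0, 1, 1, 0]               # hypothetical resolved questions
hedger = [0.5, 0.5, 0.5, 0.5, 0.5]       # never commits to anything
forecaster = [0.8, 0.2, 0.7, 0.9, 0.3]   # commits, and is mostly right

print(f"hedger: {brier_score(hedger, outcomes):.3f}")
print(f"forecaster: {brier_score(forecaster, outcomes):.3f}")
```

A permanent 50:50 hedge can never be proved wrong in the pundit’s sense, but under a scoring rule it is simply mediocre, which is exactly the escape clause tournaments remove.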

Regarding partisan pundits, Tetlock says:

High stakes partisans want to simplify an otherwise intolerably complicated world. They use attribute substitution a lot. They take hard questions and replace them with easy ones and they act as if the answers to the easy ones are answers to the hard ones. That is a very general tendency.

“Does my side know the answer?” is the really hard question. The easier one is, “Whom do I trust more to know the answer, my side or their side?” I trust my side more to know the answer. Attribute substitution is a profound idea, and it allows us to think we know a lot of things that we don’t know. The net result of attribute substitution among both debaters and audiences is it makes it very hard to learn lessons from history that we weren’t already ideologically predisposed to learn, because history hinges on counterfactuals.

Tetlock is now focussing on the societal impact of his findings, hoping to improve the predictions on which decisions are based. As he puts it: “The minimalist goal is to make it marginally more embarrassing to be incorrigibly close-minded, just marginally. The more ambitious goal is to make it substantially more embarrassing, and that requires talent and resources of the sort that academics like myself don’t possess. I don’t know how to create a TV show.”

Tetlock has some advice for improving forecasts. Like most advice it has some disappointments, in that a researcher close to the material understands in detail what is meant by “strike the right balance” but the phrase itself is of little help, simply an irritating truism.

Ten Commandments for Aspiring Super-Forecasters

1 Triage. Concentrate on questions which lie in the Goldilocks Zone between clocklike predictability and cloudlike unpredictability.

2 Break seemingly intractable problems into tractable sub-problems. How many potential mates will a man find in London? Halve the total population to get the number who are women, then keep only those in his age range, those who are single, those with a university degree, those he will find attractive, those who will find him attractive, and those who will be compatible, and you end up with 26 women out of a population of 6 million.

3 Strike the right balance between inside and outside views. How often do things of this sort happen in situations of this sort? When estimating the time taken to complete a project, take the employee estimate with a pinch of salt, and the client estimate as a correction factor.

4 Strike the right balance between over-reacting and under-reacting to evidence. The best forecasters tend to be incremental belief updaters, slightly altering probability estimates. They also know when to jump fast.

5 Look for clashing causal forces in each problem. Understand both thesis and antithesis, summarize both so you recognise how they will develop, then attempt synthesis.

6 Strive to distinguish as many degrees of doubt as the problem permits but no more.

7 Strike the right balance between under- and overconfidence, between prudence and decisiveness.

8 Look for the errors behind your mistakes but beware of rearview mirror hindsight biases.

9 Bring out the best in others and let others bring out the best in you.

10 Master the error-balancing bicycle.
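Commandment 2 is Fermi-style estimation: a count you cannot know directly becomes a starting number multiplied by a chain of rough proportions, each one a small question you can reason about. A minimal Python sketch follows; the proportions are placeholders chosen for illustration, not the figures behind the 26 quoted in the London example. Printing the running count shows which sub-question does most of the cutting.

```python
import math


def fermi_estimate(start, fractions):
    """Multiply a starting count by a chain of rough proportions, one per sub-question."""
    return start * math.prod(fractions.values())


# Placeholder proportions for illustration only (not the figures behind the 26 quoted above).
fractions = {
    "women": 0.5,
    "in his age range": 0.2,
    "single": 0.5,
    "university degree": 0.3,
    "he finds attractive": 0.05,
    "find him attractive": 0.05,
    "compatible": 0.1,
}

# Print the running count so the effect of each sub-question is visible.
count = 6_000_000
for label, share in fractions.items():
    count *= share
    print(f"{label:<22}{count:>12,.1f}")

estimate = fermi_estimate(6_000_000, fractions)
```

The virtue of the decomposition is less the final number than the fact that each step can be argued over, and corrected, separately.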

To get further into this, either read the book or look at his Edge masterclass (5 parts), in which he answers questions and responds to suggestions.
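Commandment 4 above, incremental belief updating, can be made concrete with Bayes’ rule in odds form: posterior odds equal prior odds multiplied by the likelihood ratio of the new evidence. Weak evidence (a ratio near 1) nudges a forecast slightly; strongly diagnostic evidence justifies a fast jump. This is a standard textbook formulation rather than anything specific to Tetlock’s training materials, and the starting forecast and likelihood ratios below are hypothetical.

```python
def update(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


p = 0.30  # hypothetical starting forecast
for lr in [1.2, 0.9, 1.3]:  # weak pieces of news: each barely favours one side
    p = update(p, lr)
    print(f"after weak evidence (LR {lr}): {p:.3f}")

jumped = update(p, 20.0)  # strongly diagnostic evidence: time to jump
print(f"after strong evidence (LR 20): {jumped:.3f}")
```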



This is a first look at an engaging and important problem: how to perceive the world accurately enough to work out what will happen next. Intelligent beings need to be accurate much of the time. If Alex Wissner-Gross (2013) is right, intelligence is a thermodynamic process, and can spontaneously emerge from any organism’s attempt to maximise freedom of action in the future. The key ingredient seems to be the maximisation of future histories.

The key to decision making is to keep one’s options open, the most important option being staying alive.

Wednesday, 30 September 2015

Does culture cultivate, or do you need a good plough?


The search for culture-free or culture-fair tests has proved endless, because “culture” can be used so broadly as to encompass virtually anything a human does. People live in society, and societies transmit the habits of previous generations. There was a time in the debates after Jensen’s 1969 paper when psychologists believed that they could estimate the cultural loading of a test by inspecting the items. Indeed, my very first published paper attacked Jensen for arguing that the Wechsler subtest of Block Design was relatively culture-free, such that black-white differences on that test were probably genetic, whereas I felt it depended on access to constructional toys.

How does one determine the cultural loading of an intelligence test item? A Dutch team have plunged into these waters (strictly speaking they are below sea level, but no matter) and have rated subtests thus: Cultural load was operationalized as the average proportion of items that were adjusted in each subtest of the WAIS-III when the scale was adapted for use in 13 countries (Georgas et al., 2003).  To my eye that is certainly a language adjustment, though I wonder whether it allows for the different availabilities of artefacts in the home (not that I can think of an easy way to measure that).

Kan, K.-J., Wicherts, J. M., Dolan, C. V., & van der Maas, H. L. J. (2013). On the nature and nurture of intelligence and specific cognitive abilities: The more heritable, the more culture dependent. Psychological Science, 24(12), 2420–2428.
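The cultural-load measure described above is simple to compute: for each subtest, take the proportion of its items that were adjusted in each country’s adaptation, then average those proportions across countries. A minimal Python sketch, assuming every adaptation keeps the same number of items; the item counts are invented for illustration and are not Georgas et al.’s figures.

```python
def cultural_load(items_adjusted, n_items):
    """Mean, across countries, of the proportion of a subtest's items adjusted in that adaptation."""
    if not items_adjusted:
        raise ValueError("need at least one country")
    proportions = [k / n_items for k in items_adjusted]
    return sum(proportions) / len(proportions)


# Hypothetical 30-item subtest, with invented counts of adjusted items in 13 adaptations.
adjusted = [12, 15, 10, 14, 18, 11, 13, 16, 9, 12, 15, 14, 17]
load = cultural_load(adjusted, n_items=30)
print(f"cultural load: {load:.3f}")
```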





On this measure, Vocabulary is the most culture-dependent subtest. At first glance that makes sense: the easiest way to learn a language is to be immersed in the culture that speaks it. However, that merely covers translating words from one language to another. Even within a culture, even the most ethnocentric citizens do not learn all the available words: intelligence is required for that.

From an early post on Vocabulary: Some people have the simplistic notion that vocabulary must be determined by mere exposure to spoken language. That is necessary, but far from sufficient, as even children work out. They notice patterns, informal rules, and the contexts in which communication takes place. “The acquisition of meaning is based on the eduction of meaning from the contexts in which the words are encountered”. (So, even if the word “eduction” in the quotation from page 146 of Jensen’s “Bias in Mental Testing” is unfamiliar, you will not be surprised to deduce that it means “To assume or work out from given facts; deduce”.) The meaning of a word is acquired in contexts which permit at least some partial inference as to its meaning. By hearing or reading the word in different contexts, through a process of generalization, discrimination and eduction one can guess at the essence of the meaning of the word, so as to use it (experimentally) oneself the next time a similar context presents itself. Words move from being unfamiliar to familiar, from familiar but not really understood to being familiar and partly understood (at which stage the explanations given about the meaning of the word are threadbare and inaccurate), and from there to being explained by use of synonyms (though those can range from partial to full understanding as shown by the power of the explanations and definitions).


The Methods section is explicit about how things were calculated, one step at a time: a model approach to be commended.


WAIS subtest heritabilities

The culturally loaded tests have higher heritabilities.

The authors conclude:

Each subtest’s proportion of variance in IQ shared with general intelligence was a function of cultural load: The more culture loaded, the higher this proportion. In addition, in adult samples, culture-loaded tests tended to have greater heritability coefficients than did culture-reduced tests, and there was a relationship between subtest’s proportion of variance shared with general intelligence and heritability. In child samples, these relationships were in the same direction, but correlations were small and insignificant.

They sound a cautionary note about the data, but their substantive point is:

A correlation between, for instance, g loading and heritability coefficient is in line with the hypothesis that the g factor is the most heritable factor (Jensen, 1998), but a test of the significance of this correlation does not provide the means to test whether the g factor is indeed the most heritable factor (Dolan & Hamaker, 2001). The method merely serves to evaluate competing theories of intelligence (Rushton & Jensen, 2009): A significant correlation denotes that a phenomenon exists that is in need of theoretical explanation. Theories that account for the correlation are stronger (with respect to this correlation) than are theories that do not account for it or are silent about it. The same line of reasoning holds for the correlations of cultural load with g loadings and heritability coefficients.
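The analysis treats each WAIS subtest as one data point and correlates, across subtests, vectors such as cultural load, g loading and heritability coefficient. A Pearson correlation over such vectors can be sketched in Python as below. The six subtest values are hypothetical placeholders, not the figures reported by Kan et al., and, as the quoted passage stresses, a positive correlation here is something to be explained, not a test of whether g is the most heritable factor.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# One entry per subtest; hypothetical values, NOT those reported by Kan et al.
culture_load = [0.45, 0.40, 0.30, 0.15, 0.10, 0.05]
g_loading = [0.80, 0.75, 0.70, 0.60, 0.65, 0.55]
heritability = [0.72, 0.60, 0.64, 0.48, 0.52, 0.46]

print(f"load vs g: {pearson(culture_load, g_loading):.2f}")
print(f"load vs heritability: {pearson(culture_load, heritability):.2f}")
print(f"g vs heritability: {pearson(g_loading, heritability):.2f}")
```

With only a handful of subtests, each correlation rests on very few points, which is one reason the authors’ call for replication matters.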

Having given their conclusions, the team then go against normal sequence and start a discussion.

Our result showing that culture-loaded knowledge tests (crystallized tests) are more strongly related to general intelligence than are culture-reduced cognitive processing tests (fluid tests) fits better with the idea that g loadings reflect societal demands (Dickens, 2008) than that they reflect cognitive demands (Jensen, 1987). Furthermore, in adult samples, our finding that the heritability coefficients of culture-loaded tests tend to be larger than those of culture-reduced tests calls for an explanation, given that this result does not follow from the subtest-complexity and investment hypotheses of g theory and fluid-crystallized theory.

After discussing some options they plump for genotype-environment covariance.

Because the acquisition of knowledge depends on cognitive processing, individuals who develop relatively high levels of cognitive-processing abilities tend to achieve relatively high levels of knowledge. High achievers are more likely to end up in cognitively demanding environments that encourage and facilitate the further development of a wide range of knowledge and skills. The contents and organization of these environments largely reflect societal demands. These societal demands thus influence the degree of dynamical interaction among cognitive processes and knowledge and, hence, their intercorrelations. In this way, the societal demands determine IQ-subtest loadings on the general factor of intelligence and, eventually, the degree to which broad-sense heritability coefficients of IQ subtests include the effects of (growing) genotype-environment covariance. In view of theoretical parsimony, we conclude that the assumption of a true causal g can be incorporated but that this is not required.

This paper presents interesting, counter-intuitive findings, which deserve replication on other samples and other psychometric tests. As to their favoured genotype-environment effect, I don’t see how bright people can obtain high levels of knowledge without being bright in the first place. They don’t develop intelligence, they have that ability in varying degree and use it to develop their knowledge to varying degrees. I am still working this out, but I think that ability is prior, and therefore more likely to be causal.

See what you think.

Monday, 28 September 2015

Blood Moon: Recalling Eratosthenes


Although the Observatory at Greenwich still be-straddles the globe as the origin of longitude, and thus of Time itself, London long ago ceased to be a good vantage point for examining the heavens. Coal conquered the world, but it besmirched London’s skies, and then the Clean Air Act coincided with ever-stronger electric street lighting, so light replaced soot as the celestial pollutant, fading distant stars. Of more moment, British skies are almost always covered with clouds, so astronomy is well-nigh impossible. And yet, and yet, last night the London sky was sparkling clear, so the whole supermoon lunar eclipse was visible for the seven stages from first to final contact.

At this numinous and transient moment I sleepily tried to explain to myself what was happening. The earth was travelling round the sun, but not at a speed sufficient to account for what I was observing. The earth was rotating on its axis, but rotation does not cause shadow. The moon was apparently fixed in the sky, yet it was slowly falling into a shadow caused by the nearest heavenly body, the Earth, and the atmosphere of that home planet was causing selective filtering of light-waves, taking out the shorter ones, and leaving the red.
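The filtering of shorter wavelengths can be put roughly into numbers. Under the Rayleigh approximation, scattering by air molecules scales as 1/λ⁴, so blue light is scattered out of the sunlight grazing through the Earth’s atmosphere several times more strongly than red, and the light refracted into the shadow and onto the Moon is reddened. The Python sketch below takes 450 nm and 700 nm as representative blue and red wavelengths; both values are illustrative choices.

```python
def rayleigh_ratio(short_nm, long_nm):
    """How many times more strongly wavelength short_nm is scattered than long_nm,
    assuming Rayleigh scattering (intensity proportional to 1 / wavelength**4)."""
    return (long_nm / short_nm) ** 4


ratio = rayleigh_ratio(450, 700)  # representative blue and red wavelengths, in nm
print(f"blue is scattered about {ratio:.1f} times more than red")
```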

If Eratosthenes, the third-ever Chief Librarian of the great library at Alexandria, had been standing beside me at my London window, he would have been the perfect teacher, had he not, as would have been more likely, been using the event to avoid chit-chat and make his own observations and calculations.


Astronomical events were the first and greatest of puzzles faced by our ancestors, the stuff of creation myths, superstitions, rituals and eventually sceptical surmise: the dawn of science. Astronomy required a leap of understanding: that the all too solid earth on which we stand might also be just one hurtling dot among the many visible (and invisible) ones in the skies. Tycho Brahe, the last of the naked-eye astronomers, chronicled the retrograde paths of the planets, but could not fully accept Copernicus’ interpretation of those wandering planets in terms of the Earth’s own orbit round the sun. The Copernican shift of perspective was a leap of Piagetian proportions, in which an observant maturing child eventually understands that what they see, and what a doll placed in an assembled mini-landscape sees, are not one and the same thing. The egocentric perspective of early childhood is attenuated by a growing appreciation of the perspectives of other minds. It is similar in intellectual status to a growing child noticing that kindergarten children are becoming smaller, but eventually realizing that this is only because he has grown bigger, not because new generations have shrunk.

How well do contemporary citizens understand astronomy? And by understanding I mean without looking it up and repeating it, only to fall back into ignorance later. If the Flynn Effect is real, then it should be far easier for average persons to understand eclipses, night and day, summer and winter. I do not have up-to-date data on pass rates, but here is an interesting finding from a random survey of British adults in 1992, which is 122 years after the 1870 Education Act and 449 years after the publication of Copernicus’ De revolutionibus orbium coelestium in 1543.

Durant, J., Evans, G., & Thomas, G. (1992). Public understanding of science in Britain: the role of medicine in the popular representation of science. Public Understanding of Science, 1, 161-182.



So, 30% of British adults thought that the observed passage of the sun in the sky meant that our star obligingly whizzed round us. 16% imagined it did so once a day, as per visual observation of the same apparent phenomenon. More recent findings gratefully received.

If I concentrate on matters involving events observable to our ancestors, and avoid calendrical calculations, science education websites show that popular misconceptions still include the following: that the phases of the moon are caused by the moon going into and out of the earth’s shadow; that the moon has a side which is in perpetual darkness; that the moon does not rotate; that the phases of the moon are completed in exactly the number of days it takes to complete its orbit of the earth; that the moon is somehow larger on the horizon than when it is high in the sky; that the four seasons are the result of the changing distance from the sun; and that heavier objects fall faster than lighter objects.
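The misconception that the cycle of phases takes exactly one orbit can be checked with a little arithmetic. The Moon returns to the same place among the stars in one sidereal month (about 27.32 days), but by then the Earth has moved on along its own orbit, so the Moon needs roughly two more days to line up with the Sun again. The relation is 1/synodic = 1/sidereal month - 1/sidereal year; in Python, using standard mean values:

```python
SIDEREAL_MONTH = 27.321661  # days: Moon returns to the same place among the stars
SIDEREAL_YEAR = 365.256363  # days: Earth returns to the same place among the stars

# The Earth moves on in its orbit, so the Moon must travel further to realign with the Sun.
synodic_month = 1 / (1 / SIDEREAL_MONTH - 1 / SIDEREAL_YEAR)
print(f"phases repeat every {synodic_month:.2f} days, not {SIDEREAL_MONTH:.2f}")
```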

I know that we no longer live on the land, and therefore are more distant from nature, and from the peregrinations of the sun and moon. I know that ignorance about astronomy is very largely a matter of education, but it is likely also that some education has been given but was not retained, because egocentric observation is deemed sufficient by many people. I know that I can make errors, and that many people make errors in simple Newtonian physics (imagining that if you drop an object when running it will describe an arc backwards to the ground, not forwards to the ground).  I know all this, but if we were really getting brighter over the last two centuries we would be able to work out much of this for ourselves, as Eratosthenes worked out the circumference of the world when he heard a casual remark about a well to the south of Alexandria where on one particular day of the year the sun shone down to the very bottom of the well.
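Eratosthenes’ reasoning fits in a few lines. If the Sun is directly overhead at one place (traditionally Syene, where it lit the bottom of the well) while at Alexandria, assumed to lie due north, it stands a measurable angle off vertical at the same noon, then the distance between the two places is that same fraction of 360° of the Earth’s circumference. The numbers used below, about 7.2° (a fiftieth of a circle) and 5,000 stadia, are those usually attributed to him through the later account of Cleomedes; the length of his stadion is disputed, so the result is left in stadia.

```python
def circumference(distance, angle_deg):
    """If two places on one meridian see the noon Sun angle_deg apart,
    the distance between them is angle_deg / 360 of the full circle."""
    return distance * 360 / angle_deg


estimate = circumference(5000, 7.2)  # stadia and degrees from the classical account
print(f"about {estimate:,.0f} stadia")
```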

Just for amusement, here are some Northern hemisphere university graduates explaining the astronomical causes of seasonal variations.


Here are some popular misconceptions tracked down and explained.



So, these were some thoughts whilst watching the super-moon lunar eclipse last night, among the blazing street lights of urban London. If by some magical process Eratosthenes had stood next to me I am sure that I would have best served him only by listening to him attentively.

Sunday, 27 September 2015


On 19 June the blog reached 500,000 and just a moment ago, 100 days later, it achieved 600,000. I am aware that most citizens will continue with their post-Sunday lunch nap, but I judge that my select group of readers will at least raise one eyebrow before returning to their even more serious reading. The total is far higher than I conceived possible when I began the blog two years and 10 months ago, when I felt lucky if I got 20 readers a day. The current daily page-view rate is roughly a thousand.




The snapshot of the past month shows a pronounced ISIR conference peak, and the most popular posts in that period are all about the conference.


The all-time greats remain the familiar old posts, with a few additions in the lower ranks:



Where do readers come from?



Readers are almost 6 times more likely to be in the US than the UK. The peak age for readers is in the 18-35 range, declining gradually to age 65+. So, the message is getting through to those who have most of their working life ahead of them, and might make decisions based on intelligence results. Readers are interested in science, news and politics. In terms of how they get to the blog, last month 4080 visitors came directly (loyal established readers?), 3912 through social media (loyal plus new readers?), 2822 through “organic search” (searchers for knowledge?) and 2634 through referral (new readers willing to admit they are searching for knowledge?).

There is a tendency for the longer essays to get correspondingly longer reading times, suggesting readers stick with the content. The item on whether Asians were bright but lacking in curiosity made people read for 6 minutes, whilst shorter conference announcements got half that duration of attention, all consistent with visitors being real readers.

Twitter is my hyper-active front-runner for the slower meditative page-turners of my blog. Précis is good for the mind. I have a few more followers (now at 1,262), which is welcome. Of course, I want the right sort of followers: those who contribute to knowledge, even when they just ask questions. I tweet sparingly (on average 3 or 4 a day) and virtually always only about blog posts or published work. I get 107 retweets per 100 tweets, and almost always respond to tweets with further answers.










Request: Could I ask university teachers and researchers to select one of their students, and get them to critique one of my posts, or to take up one of the suggestions for further research? It would be good to have more psychology student readers (not that I know how many of my readers are in that category) and I think that will come if psychology teachers get their students to have a look at the blog.

If you have any suggestions for getting more blog readers please let me know.

The Donate button is down on the bottom right. Just one gentle press does the trick. As a price guideline, $35 buys access to a pay-walled paper, $20 buys a printer ink cartridge, and $15 any number of coloured pencils. What more do I need, other than to keep up my enthusiasm, secure in the knowledge my readers are changing the world?

Thursday, 24 September 2015

Types of psychology lecture


It was said of Presidential Addresses at the British Psychological Society Annual Conference that they fell into three types:

Whither Now?

Patients I have Cured


These three encapsulated the philosophical, clinical and physiological traditions in the society.

With that in mind, I have looked back at the very recent International Society for Intelligence Research conference in Albuquerque to see if I can detect a classificatory structure, tripartite or otherwise. Here, without too much poetic licence, is a possible troika of themes:

Technical: use of statistics and modelling techniques, understanding the limitations and characteristics of particular intelligence measures, arguing about the hierarchical structure of intelligence

Correlational: Real-life associations with IQ, and examples of the predictive power of intelligence.

Genetic: the genetic underpinnings of intelligence and related behaviours.

The technical theme is very specialised. It makes crucial points about intelligence measures and how results can be modelled and analysed. Tracking down whether tests show “measurement invariance” is essential if you want dependable findings. Understanding all this is crucial for researchers. Speaking personally, I find some of the discussions about the structure of group factors less interesting.

The correlational theme is enormous in scope, and accounts for the bulk of published results. Intelligence runs through psychology like carbon through biology. All of human life is there. Intelligence is the most replicated result in psychology, and with the largest sample sizes, sometimes in the millions. There is much still to learn, and the results keep coming in.

The genetic theme is transformational. This is the leading edge of intelligence research. The tempo seems to be about one major publication every 2 or 3 months. Sample sizes are usually above 100,000 and sometimes 300,000. These papers usually find links between the genome and human behaviour which are statistically significant but moderate in effect size, and very probably caused by very many genes of small effect, and which also have effects on other things. I get forewarning of a few of these publications, and will comment further when papers in the review pipeline get published. Tracking down the genetics of intelligence is happening now, with impacts which most people don’t yet appreciate.
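The “very many genes of small effect” picture is what a polygenic score formalises: for each variant, multiply the number of copies of the effect allele a person carries (0, 1 or 2) by that variant’s estimated effect, then add everything up. The Python sketch below uses 1,000 randomly generated, hypothetical effect sizes to show the idea; it is not the scoring method of any particular study, and real scores are built from effect estimates obtained in large genome-wide association studies.

```python
import random


def polygenic_score(dosages, effects):
    """Sum over variants of (copies of the effect allele, 0 to 2) x (per-allele effect)."""
    return sum(d * b for d, b in zip(dosages, effects))


random.seed(1)
n_variants = 1000
# Hypothetical per-allele effects: many tiny ones, centred on zero.
effects = [random.gauss(0, 0.01) for _ in range(n_variants)]
person = [random.choice([0, 1, 2]) for _ in range(n_variants)]

score = polygenic_score(person, effects)
largest = max(abs(d * b) for d, b in zip(person, effects))
print(f"score: {score:.3f}; largest single-variant contribution: {largest:.3f}")
```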

The classificatory scheme is a mere sketch, and very open to counter-claims. It might be better to follow the classification of animals that Borges attributes to the “Celestial Emporium of Benevolent Knowledge”: Those that belong to the Emperor; Embalmed ones; Those that are trained; Suckling pigs; Mermaids; Fabulous ones; Stray dogs; Those that are included in this classification; Those that tremble as if they were mad; Innumerable ones; Those drawn with a very fine camel hair brush; Those that have just broken the flower vase; Those that resemble flies from a distance.

Perhaps all lectures should be judged by the criterion “Have they just broken the flower vase?”


Sunday, 20 September 2015

#ISIR15 ends, celestial carriages await


So ends a stellar conference.

One of the delights of a conference is to sit next to like-minded and knowledgeable confederates who feed me comments, evaluations and questions which need to be asked. The audience should have the last word. So, here is a selection of what such persons said to me about the talks.

James Lee’s talk went very well. At the beginning I thought it was an over-sell, but boy, the flow of his argument is terrifically clear.

I heard a lot of audience reaction after Steve Pinker's talk.  Comments like 'inspired, entertained, definitely going to try harder to write more clearly'.  There was quite a podium-rush after his talk - felt for his safety! Really good to have the hall ringing with laughter at the end of a long day.

PhD student Sephira Ryman gave a standout talk. She asked: since men and women have similar mean intelligence, yet women have smaller brains, are there other features that differ? She found that gray matter volume was more important for men, whereas white matter network connectivity was more important for women. Her sample of 244 persons provides evidence that men and women may arrive at their intelligence by slightly different means.

Paul Sackett and Nathan Kuncel utterly destroyed the idea that SAT tests do not predict college performance. Their "ginormous" dataset comprised over a million students. In a droll and data-rich talk, they left myths about the non-utility of standardised tests lying like road-kill on the highway of evidence.

PhD student Helen Davis gave a fascinating talk that contrasted the spatial abilities and mobility patterns of two traditionally-living (forager-horticulturalist) peoples: the Maya and the Tsimane.  The lifestyle and ecology differ between the societies and this is reflected in their spatial abilities and movement patterns. The typically found sex difference in spatial ability (men outperform women) was only found in the Maya where men travel greater distances to find food for their children.

Alice Dreger. Simply barnstorming. Brilliant, and packed with rich content.  We must keep in contact with her.

Tim Bates' talks are consistently a highlight of any meeting he speaks at. His careful replications, showing null results for famous memes that tear through the classroom like flu, are a pleasure to hear.

IN CONCLUSION – See you in St Petersburg, 15-17 July 2016


Russian and UK school kids



Elaine White (1,2), Margherita Malanchini (1,2), Dina Zueva (2), Olga Bogdanova (2), Yulia Kovas (1,2)

1 Goldsmiths, University of London, UK, e.white@gold.ac.uk.

2 Tomsk State University, Russia.

Research suggests that within any country, almost the whole spectrum of individual variation in academic achievement is observed in any school or classroom, with only a small portion of within-population variance attributable to differences across teachers, classes and schools (e.g. Asbury et al., 2008).

It may be that shared effects of class/teacher are weaker or stronger as a function of such factors as teacher training, curricula, educational norms, and cultural stereotypes (e.g. Kovas et al., 2013). As longitudinal research into teacher/classroom effects is limited to date and neglects the contribution of non-cognitive factors, this study investigates teacher/classroom effects on academic achievement at several points across the academic year in two countries.

This longitudinal study follows 622 Russian and UK secondary school students, aged 11-12, at several waves across one academic year. As the students have subject-specific teachers for the first time in their education, their classrooms can be compared for two subjects, maths and geography. The students, from 3 urban schools, completed a range of tests and self-report questionnaires during their maths lesson. Data were collected to assess cognitive and non-cognitive factors in relation to academic progress. The students’ school achievement data were also obtained.

We explore differences across the two countries, within and between classes, and across the two school subjects, as well as motivational factors. Preliminary results (from the first 3 waves) suggest that the measures of maths ability and maths self-efficacy are stable over time. A reciprocal relationship was found between maths ability and maths self-efficacy from time 1 to time 2, suggesting that higher performance increases self-efficacy and higher self-efficacy increases performance. This reciprocal relationship remains when controlling for IQ, and the link between ability at time 1 and self-efficacy at time 2 becomes stronger. A negative relationship between ability at time 2 and self-efficacy at time 3 is likely to be the result of performance feedback.

This research investigates potential differences between the Russian and UK education systems by comparing mathematics and geography classroom environments. Although the two subjects are taught and used differently, they share similar attributes. In both Russia and the UK, students have subject-specific teachers for the first time at secondary school. In primary school, UK students have one teacher for all subjects, but that teacher changes yearly, whereas Russian students keep the same teacher throughout the four years of primary education. The study therefore provides an ideal comparison of cognitive and non-cognitive factors across subjects and classroom environments. Identifying factors that moderate classroom effects is important for educational policy and provision.

Remember the 7 tribes of intellect


Take a dozen eggs. Better still, take several dozen eggs and compare them to another several dozen eggs. Eggs are eggs, and an omelette make.

However, from the individual differences perspective, humans differ. Brighter kids learn faster, about 5 times faster than their slower classmates. Take a whole school district and you will find a few children who learn 7 times faster, hence


Here’s a deal: we will improve our experimental designs if they will measure, even very briefly, the ability and personality of their experimental subjects. Even a simple brief vocabulary test, plus a digit span test or speeded coding task, would provide useful information, and if parents could be persuaded to do the same we would have a handle on a major source of unexamined variance in experimental designs.

As Sara says: There is a world outside of experimental designs


Sara A. Hart, Florida State University, hart@psy.fsu.edu.


A growing body of work suggests that the individual traits a child brings into an intervention project have an interactive effect on literacy learning. Even within intervention studies shown to be effective at the mean level, there are individual differences in how children respond to the intervention.

I contend that there are numerous (typically unmeasured) sources of these individual differences, and in this talk I will present data examining the role of both crystallized and fluid intelligence in predicting individual differences in response-to-intervention, with data pooled across multiple projects to allow generalization beyond any given intervention protocol. Integrative Data Analysis (IDA; Curran & Hussong, 2009) was used to create a pooled source of Project KIDS raw data from 545 kindergarten and first-grade children (mean age 5.6 years) who had previously been in the treatment group of one of three literacy-based randomized controlled trials.

IDA allows raw data from each project to be combined while controlling for heterogeneity such as age and project. Reading was measured as pre- and post-intervention scores on the Woodcock-Johnson Tests of Achievement Letter-Word Identification (LWID) subtest; crystallized intelligence was measured using a pre-test mean raw score across the KBIT-2 Verbal Knowledge and Riddles subtests; and fluid intelligence was measured using a pre-test raw score from the KBIT-2 Matrices subtest.

As a first step of IDA, moderated nonlinear factor analysis was used to create project-invariant scale scores for the constructs of interest. I then used Proc Mixed to calculate covariance-adjusted scores modelling change from pre-test to post-test on LWID, operationalizing “response-to-intervention”. Quantile regression was then used to model both crystallized and fluid intelligence as predictors of response-to-intervention.

The models indicated that both crystallized and fluid intelligence were statistically significant predictors across the distribution of response-to-intervention, although for both, the effect was significantly greater for the students who made the largest gains from the intervention.
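Quantile regression, which the abstract uses to show a larger intelligence effect among the children who gained most, fits a line to a chosen quantile of the outcome rather than to its mean, by minimising the asymmetric check (pinball) loss. The Python sketch below is a toy illustration under stated assumptions: the data are invented (not Project KIDS data), the fit is a brute-force grid search rather than the linear-programming solvers real statistical software uses, and `pinball_loss` and `fit_quantile_line` are hypothetical helper names. Because the spread of gains widens as ability rises, the fitted slope should grow from the 10th to the 90th percentile.

```python
# Toy illustration of quantile regression: choose the straight line that minimises
# the total check (pinball) loss, searching a coarse grid of intercepts and slopes.
# Real analyses use a proper solver; this only shows what the estimator optimises.


def pinball_loss(residual: float, tau: float) -> float:
    """Check loss: tau * r when r >= 0, (tau - 1) * r when r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def fit_quantile_line(x, y, tau, slopes, intercepts):
    """Return the (intercept, slope) pair on the grid with the lowest total check loss."""
    best_loss, best_a, best_b = float("inf"), None, None
    for b in slopes:
        for a in intercepts:
            loss = sum(pinball_loss(yi - (a + b * xi), tau) for xi, yi in zip(x, y))
            if loss < best_loss:
                best_loss, best_a, best_b = loss, a, b
    return best_a, best_b


# Hypothetical data (NOT from the study): gains spread out more as ability rises.
ability = [1, 1, 1, 2, 2, 2, 3, 3, 3]
gain = [1, 2, 3, 2, 4, 6, 3, 6, 9]

slope_grid = [i / 10 for i in range(0, 41)]        # 0.0 .. 4.0 in steps of 0.1
intercept_grid = [i / 10 for i in range(-20, 21)]  # -2.0 .. 2.0 in steps of 0.1

for tau in (0.1, 0.5, 0.9):
    a, b = fit_quantile_line(ability, gain, tau, slope_grid, intercept_grid)
    print(f"tau={tau}: intercept={a}, slope={b}")
```

With these invented data the search returns an intercept of 0.0 and slopes of 1.0, 2.0 and 3.0 for tau = 0.1, 0.5 and 0.9: the same predictor matters more at the top of the outcome distribution, which is the shape of result the abstract reports.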

These results indicate that brighter children do even better in an intervention that is effective for most students. Although this is certainly not surprising to the ISIR audience, child traits such as intelligence are not often included in determining response-to-intervention in education studies, and I argue that intelligence is an important moderator that should be considered. Beyond these specific findings, I will discuss how we will use these pooled data to explore many other sources of moderation of response-to-intervention, including other cognitive traits, behavioral traits, the environment and family history. This work will expand the understanding of how and why some children are more successful when receiving gold standard educational interventions.

Older fathers still have bright children


This is an interesting paper, but I note it refers to European populations. It may not hold true of societies in which many children are the product of older men accumulating many younger wives.



Ruben C. Arslan (1), Kai P. Willführ (2), Emma M. Frans (3), Mikko Myrskyla (4), Catarina Almqvist (3) & Lars Penke

1 Georg August University Göttingen, Germany, ruben.arslan@gmail.com.

2 MPI for Demographic Research, Rostock, Germany.

3 Karolinska Institutet, Stockholm, Sweden.

4 MPI for Demographic Research, Rostock, Germany.




Paternal age at offspring conception seems to be the main driver of single nucleotide de novo mutations (Kong et al., 2012). Different theories posit that intelligence is linked to mutation load, either as a fitness indicator or simply owing to its genetic complexity. Based on evolutionary genetic theory, we predicted negative paternal age effects on offspring fitness and on intelligence in the normal range. To investigate effects on fitness, we used church records from three pre-industrial Western populations and governmental data from 20th century Sweden. We used a sibling control design and accounted for confounds including maternal age, birth order and parental loss. Main analyses had an aggregate N > 1.3 million.

To investigate effects on intelligence, we compared siblings in the German Socio-Economic Panel (N = 1479). Furthermore, we were the first to adjust directly for measured parental intelligence, the most obvious confound, in data from the Minnesota Twin Family Study (N = 1898 twin pairs). We found clear support for mutational paternal age effects on offspring survival, mating and reproductive success. Weaker effects were found in 20th century Sweden, possibly indicating a diminished strength of purifying selection. However, we found no mutational paternal age effect on offspring intelligence, a result further corroborated by a Swedish study of half a million men (D’Onofrio et al., 2014).

Although paternal age effects seem to be an appropriate way to characterize the effect of de novo mutations on fitness, no effect was found on intelligence in the normal range. Genomic research supports this result. The inferred genetic architecture of intelligence does not seem to make it fragile and vulnerable to increases in paternal age-driven mutation or to decreases in purifying selection.
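The sibling control design in the Arslan et al. abstract compares children of the same parents, so anything constant within a family (parental genes, wealth, region) cannot masquerade as a paternal age effect. A minimal Python sketch of the core idea, assuming made-up records and a single predictor: subtract each family's mean paternal age and mean outcome, then fit an ordinary least-squares slope. The real analyses also adjusted for maternal age, birth order and parental loss, which this toy omits (here paternal age and birth order move together, which is exactly why such adjustment matters); `within_family_slope` is a hypothetical helper, not the authors' code.

```python
# Sibling (within-family) comparison: remove each family's mean before estimating
# the association, so stable family-level differences cannot drive the slope.
from collections import defaultdict


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def within_family_slope(records):
    """records: (family_id, paternal_age, outcome) tuples. Demean within family, then fit."""
    by_family = defaultdict(list)
    for fam, age, outcome in records:
        by_family[fam].append((age, outcome))
    xs, ys = [], []
    for members in by_family.values():
        mean_age = sum(a for a, _ in members) / len(members)
        mean_out = sum(o for _, o in members) / len(members)
        for age, outcome in members:
            xs.append(age - mean_age)
            ys.append(outcome - mean_out)
    return slope(xs, ys)


# Hypothetical records (NOT study data): family B has older fathers AND higher
# outcomes overall, but within each family the child of the older father scores lower.
records = [
    ("A", 25, 7.5), ("A", 35, 6.5),
    ("B", 30, 17.0), ("B", 40, 16.0),
]

pooled = slope([r[1] for r in records], [r[2] for r in records])
within = within_family_slope(records)
print(f"pooled slope: {pooled:.2f}, within-family slope: {within:.2f}")
```

In this toy the pooled slope is positive (0.30) because the family with older fathers also does better overall, while the within-family slope is negative (-0.10); isolating that second comparison is what the sibling design is for.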

Creativity and fluid intelligence


Finally, Brazilians take the stage, in the form of Paulistano Ricardo Primi, who gives a creative take on creativity, looking at the difficult-to-measure genre of figural drawing. Brazil needs to figure more in international intelligence research, particularly on the large matter of genetic differences, since Brazil’s history is very different from that of the US, and the contrast can provide a test of cultural explanations for black/white intelligence differences. That bigger project will have to wait, but here is what they have done on their drawing task, also using a bootstrap likelihood ratio test to compare model fit.



Ricardo Primi (1), Nelson Hauck-Filho (1), Tatiana de Cássia Nakano (2)

1 Universidade São Francisco, rprimi@mac.com.

2 PUC-Campinas.



This study examines the association between fluid intelligence and creativity. In divergent thinking tests it is common to observe that later responses tend to be more creative than earlier ones; this is called the serial order effect. A recent view of the role of executive function in divergent production predicts that subjects with high fluid intelligence will give creative responses right from the beginning of divergent thinking tasks. This implies a central role for executive functions: inhibiting common, less creative responses and managing interference during idea production.

Most studies of these relationships use verbal tasks. This research tests whether the relationship also holds for divergent production in figural drawing. Participants were 585 children and high school students aged 7 to 17 (mean = 11.11 years, SD = 2.02; 52.5% female). All participants provided demographic information on a self-report questionnaire and took a cognitive assessment battery (verbal, abstract, logical and numerical reasoning) supplemented by a creativity task, whose data we analyzed in the present study.

The creativity task consisted of 10 stimuli, which participants were required to complete using paper and pencil. Independent raters subsequently coded each resulting drawing on a scale from 1 to 5 to reflect the extent to which it met a set of criteria defining creative responses. Data analysis was conducted using Mplus 7.11. Factor growth mixture modelling was performed to detect groups with potentially differing patterns of performance (ratings) from the first to the last stimulus of the task.

Bayesian Information Criterion (BIC) and the Bootstrap Likelihood Ratio Test (BLRT) suggested that a three-class solution was a better fit to the data (entropy = .77) when compared to alternative 1-, 2-, and 4-class solutions. Latent classes revealed a large group (83.36%) of individuals with initially modest scores and descending performances along the 10 stimuli, as well as two small groups of individuals with high initial scores: one (12.52%) with a descending performance, and the other (4.12%) with a stable high performance across the whole task.

The last two groups had significantly higher Gf scores. This study shows that executive processes of top-down voluntary control are important components in the production of creative responses, and that intelligence, fluid intelligence in particular, plays a prominent role in creative idea production.
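The class-enumeration step in the Primi et al. abstract compared 1- to 4-class solutions by BIC and the bootstrap likelihood ratio test, and reported classification entropy. As a rough Python sketch of two of those quantities (the BLRT, which refits models to many bootstrap samples, is left out): BIC is -2 ln L + k ln n, with lower values preferred, and relative entropy is commonly defined as 1 minus the summed posterior-probability entropy divided by n ln K, so values near 1 indicate well-separated classes. The log-likelihoods and parameter counts below are invented purely to exercise the functions and are not the study's results; only n = 585 comes from the abstract.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion, -2 lnL + k ln(n); lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def relative_entropy(posteriors) -> float:
    """Relative entropy of class assignment: 1 = perfectly separated, 0 = no separation.

    posteriors: one row per person, holding that person's posterior class probabilities.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    # Sum of -p ln p over people and classes; p = 0 contributes nothing.
    total = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - total / (n * math.log(k))


# Invented fits for illustration only: (number of classes, log-likelihood, free parameters).
fits = [(1, -5200.0, 6), (2, -5100.0, 10), (3, -5050.0, 14), (4, -5045.0, 18)]
n = 585  # sample size reported in the abstract
for classes, loglik, k in fits:
    print(classes, round(bic(loglik, k, n), 1))
best = min(fits, key=lambda f: bic(f[1], f[2], n))
print("lowest BIC:", best[0], "classes")

# Two extreme posterior matrices for three classes.
sharp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
fuzzy = [[1 / 3, 1 / 3, 1 / 3]] * 3
print("entropy, sharp:", relative_entropy(sharp))
print("entropy, fuzzy:", relative_entropy(fuzzy))
```

With these made-up fits the 4-class model has a slightly higher log-likelihood, but its extra parameters cost more than they gain, so the 3-class model has the lowest BIC. Perfectly certain assignments give an entropy of 1.0, and completely uncertain ones give a value of essentially 0 (up to floating-point rounding); the reported .77 sits between these extremes.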