Monday 11 April 2016

Instantiation and abstraction

What makes problems difficult? Indeed, what ever made events turn into problems? Usually, I assume, specific instances had to be confronted and dealt with: an approaching predator, an escaping prey, an edible nut that had to be opened. In all these instances a solution is required to a pressing problem. In time, some general principles may be discerned: perhaps those were discussed in campfire stories, or formed the preparation rituals of hunter-gatherers. Perhaps, more likely, people made them up as they went along.

Intelligence test items are very specific instances of problems. They are chosen to be unfamiliar, so that the test is always a real test, and not the exercise of specifically trained skills. Tests have to be kept secret, and defended from cheats. Tests only test problem-solving when the correct solutions are not known to the test taker. Tests must not only be unfamiliar, but preferably easily explained by using familiar concepts, ones known to almost everybody in that culture, or in any culture. Things can get bigger and smaller, go in front of or behind other objects, increase or reduce in number: that sort of thing. The mental habits of our species, as depicted on pottery, funerary objects, sculpture, buildings, jewellery, ornaments and dress. The all of it, as they say in Oxfordshire.

So, when a specific problem arises one examines the instance, and then attempts a solution. Is it helpful to be able to abstract general principles? Does abstraction assist, or is it better to concentrate on the individual task?

Into this hall of mirrors steps an international gang to bring us some jewels from Estonia, a Finnic country with high income and living standards, which has the additional benefit of having done proper intelligence testing in 1933, and has the results item by item. These can then be compared with the data for 2006, item by item. And what a gang: they come from the US, Korea, Brazil, Germany, Belgium and of course Estonia.

Elijah L. Armstrong, Jan te Nijenhuis, Michael A. Woodley of Menie, Heitor B.F. Fernandes, Olev Must, Aasa Must. A NIT-picking analysis: Abstractness dependence of subtests correlated to their Flynn effect magnitudes. Intelligence, 57 (2016).

We examine the association between the strength of the Flynn effect in Estonia and highly convergent panel ratings of the ‘abstractness’ of nine subtests on the National Intelligence Test, in order to test the theory that the Flynn effect results in part from an increase in the use of abstract reference frames in solving cognitive problems. The vectors of abstractness ratings and Flynn effect gains (controlled for guessing) exhibit a near-zero correlation (r = −.02); however, abstractness correlates positively with (and is therefore confounded by) g-loadings (r = .61). A General Linear Model is used to determine the degree to which the abstractness vector predicts the Flynn effect vector, independently of subtest g-loadings and the portion of the secular IQ gain due to guessing (the Brand effect). Consistent with the abstract reasoning model of the Flynn effect, abstractness positively predicts Flynn effect magnitudes, once controlled for confounds (sr = .44), which indicates an increasing tendency to utilize factors external to the items in order to abstract their solutions.

Flynn effects were derived from the difference in scores between 1933/36 and 2006 administrations of the National Intelligence Test to samples of Estonian schoolchildren (N = 890 for the older sample, 913 for the more recent sample). The Method of Correlated Vectors (MCV) was utilized to determine the effect of abstractness on the Flynn effect independent of both subtest g loadings and the Brand effect — or the portion of the secular gain in IQ that is due purely to the results of guessing.
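A minimal sketch of the expected-value arithmetic behind a pure-guessing gain, assuming k equally likely answer options and blind guessing. For contrast it also applies the classical formula-scoring penalty of 1/(k − 1) point per wrong answer; that penalty is a textbook assumption, not a detail taken from the NIT or from the paper, and the function names are hypothetical.

```python
from fractions import Fraction


def expected_guess_gain(n_guessed: int, k_options: int, penalty: Fraction = Fraction(0)) -> Fraction:
    """Expected change in raw score from blindly guessing n_guessed items.

    Each guess is right with probability 1/k_options (+1 point) and wrong
    with probability (k_options - 1)/k_options (-penalty points).
    """
    p_right = Fraction(1, k_options)
    p_wrong = 1 - p_right
    return n_guessed * (p_right - p_wrong * penalty)


def formula_penalty(k_options: int) -> Fraction:
    """Classical correction-for-guessing penalty per wrong answer: 1/(k - 1)."""
    return Fraction(1, k_options - 1)


if __name__ == "__main__":
    # Number-right scoring: 10 blind guesses on five-option items.
    print(expected_guess_gain(10, 5))                      # 2
    # With the formula-scoring penalty, blind guessing gains nothing on average.
    print(expected_guess_gain(10, 5, formula_penalty(5)))  # 0
```

Under number-right scoring each blind guess on a five-option item is worth 0.2 points on average, so a population that leaves fewer items blank can raise its raw scores without any change in ability; under the penalty the expected gain from blind guessing is zero.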

Their method has been to get raters to assess items for abstractness.

The 28 raters used to obtain the abstract thinking dependencies for each subtest were classified into the following categories: non-professionals (without degrees in psychology), graduate students, or professionals (N = 10 non-professionals, 5 graduate students, 13 professionals). Each rater rated the abstract thinking dependency of each subtest on a scale from 0–100, using a text vignette defining abstract thinking (Supplement 1) as a rating criterion. The text gave examples of three hypothetical test items heavily dependent on abstract thinking; one was drawn from Luria (1976), one from Flynn (2009), and one from Flynn (2012) in a discussion of Fox and Mitchum (2013). The raters used Form 2 of the British National Intelligence Test to rate the abstract thinking dependence of each subtest.
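A panel's 0–100 ratings reduce to one abstractness score per subtest, and one common way to check that raters converge is Cronbach's alpha with raters treated as items. The sketch below uses a tiny made-up matrix (three hypothetical raters, four hypothetical subtests), not the study's data, and alpha is an assumption here; the paper's own agreement statistic may differ.

```python
from __future__ import annotations

from statistics import mean, pvariance

# Hypothetical ratings for illustration only: 3 raters x 4 subtests on the 0-100 scale.
EXAMPLE_RATINGS = [
    [70, 40, 30, 20],
    [80, 35, 40, 25],
    [75, 40, 35, 20],
]


def subtest_means(ratings: list[list[float]]) -> list[float]:
    """ratings[r][s] is rater r's score for subtest s; return the panel mean for each subtest."""
    return [mean(column) for column in zip(*ratings)]


def cronbach_alpha(ratings: list[list[float]]) -> float:
    """Cronbach's alpha with raters treated as 'items' and subtests as 'cases'."""
    k = len(ratings)
    rater_variances = [pvariance(row) for row in ratings]
    subtest_totals = [sum(column) for column in zip(*ratings)]
    return k / (k - 1) * (1 - sum(rater_variances) / pvariance(subtest_totals))


if __name__ == "__main__":
    print(subtest_means(EXAMPLE_RATINGS))
    print(round(cronbach_alpha(EXAMPLE_RATINGS), 2))
```

Raters who rank the subtests in nearly the same order push alpha towards 1, which is the sense in which a panel can be called convergent.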

Here are the main results in one table:

[Table: Abstractness and Flynn effect]

To nit-pick, they should have listed these by level of abstractness for ease of reading, but Analogies (76) are almost twice as abstract as all the other tests. Synonym-Antonym are next (38) and then Vocabulary (35). The most concrete test is Comparisons (22). This provides a useful metric with which to consider what makes items difficult.
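Re-sorting the four ratings quoted above is trivial, and the same few lines illustrate a caveat about "twice as abstract": the ordering of the ratings survives any increasing linear rescaling, but a ratio between two ratings is only meaningful if the ratings form a ratio scale. The +20 shift below is an arbitrary hypothetical transform, used only to show the point; the other five subtests' ratings are not given in the post.

```python
from __future__ import annotations

# Panel abstractness ratings quoted in the post (four of the nine subtests only).
RATINGS = {"Analogies": 76, "Synonym-Antonym": 38, "Vocabulary": 35, "Comparisons": 22}


def by_abstractness(r: dict[str, float]) -> list[tuple[str, float]]:
    """Return (subtest, rating) pairs ordered from most to least abstract."""
    return sorted(r.items(), key=lambda item: item[1], reverse=True)


def rescale(r: dict[str, float], a: float, b: float) -> dict[str, float]:
    """Apply an increasing linear transform a*x + b (a > 0), which an interval scale permits."""
    if a <= 0:
        raise ValueError("a must be positive to preserve order")
    return {name: a * x + b for name, x in r.items()}


def order(r: dict[str, float]) -> list[str]:
    """Subtest names from most to least abstract."""
    return [name for name, _ in by_abstractness(r)]


if __name__ == "__main__":
    for name, x in by_abstractness(RATINGS):
        print(f"{name:16s} {x}")
    shifted = rescale(RATINGS, 1.0, 20.0)  # hypothetical shift of the zero point
    print(order(shifted) == order(RATINGS))                             # True: order survives
    print(RATINGS["Analogies"] / RATINGS["Synonym-Antonym"])            # 2.0
    print(round(shifted["Analogies"] / shifted["Synonym-Antonym"], 2))  # 1.66: the ratio does not
```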

The correlation between the size of the Flynn effect on a subtest (corrected for guessing) and its level of rated abstractness is −.02, or virtually zero. A large negative correlation exists between the guessing-corrected Flynn effect and g loadings (−.55), and a large positive correlation exists between g loadings and abstractness (.61). The Brand effect and g loadings correlate strongly and positively (.8). Modest magnitude correlations exist between abstractness and the Brand effect gains (.32), and between the Brand effect and corrected Flynn effect gains (−.42). None of the effect sizes are significant; however, null hypothesis significance testing is not appropriate for evaluating the substantiveness of these results, as the N is extremely small (9 subtests). More attention should be paid to both the magnitude of the effects, which range from small to large in magnitude (Cohen, 1988), and to the degree to which the directionality of the effects is consistent with explicit theoretical expectations.
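The small-N caveat in the passage above can be checked with the standard t test for a Pearson correlation, t = r·√((n − 2)/(1 − r²)). With n = 9 subtests there are 7 degrees of freedom. To keep the sketch dependency-free, the two-tailed 5% critical value for 7 df (about 2.365) is hard-coded from standard t tables rather than computed; the function names are hypothetical.

```python
import math

# Two-tailed alpha = .05 critical t for df = 7, taken from standard t tables.
T_CRIT_DF7 = 2.365


def t_for_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation r against zero with n pairs."""
    df = n - 2
    return r * math.sqrt(df / (1 - r * r))


def critical_r(t_crit: float, n: int) -> float:
    """Smallest |r| reaching significance, given the critical t for df = n - 2."""
    df = n - 2
    return t_crit / math.sqrt(t_crit * t_crit + df)


if __name__ == "__main__":
    n = 9  # nine NIT subtests
    print(f"critical |r| = {critical_r(T_CRIT_DF7, n):.2f}")
    # Correlations involving abstractness or the corrected Flynn effect, as quoted above.
    for r in (-0.02, 0.32, -0.42, -0.55, 0.61):
        t = t_for_r(r, n)
        print(f"r = {r:+.2f}  t = {t:+.2f}  significant: {abs(t) > T_CRIT_DF7}")
```

With these inputs the smallest |r| that would reach p < .05 is about .67, so every correlation in the loop, including the .61 between g loadings and abstractness, falls short.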

It can be seen that abstractness now becomes a strong positive predictor of Flynn effect magnitude (r = .44), once controlled for the Brand effect and subtest g-loadings. Thus, our analysis supports the contention that abstract thinking may causally contribute to the Flynn effect. g loadings do not change as a predictor of the Flynn effect once controlled for abstractness and the Brand effect. The Brand effect residual becomes a mildly positive predictor of the Flynn effect (−.42 to .19). At the suggestion of a reviewer, we reran the analysis excluding subtest B4 as an outlier in terms of abstractness. The recalculated effects, included in Table 2, indicate that abstractness was greatly attenuated in effect size as a predictor of the Flynn effect, but the direction of correlations did not change.
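The jump from r = −.02 to a semipartial of about .44 can be reproduced, approximately, from the rounded pairwise correlations quoted earlier. The sketch below computes each predictor's semipartial correlation with the guessing-corrected Flynn effect, partialling the other two predictors out of that predictor only. Working from the correlation matrix alone is an assumption: the authors fitted a General Linear Model to the subtest vectors, and their exact computation may differ.

```python
import math

# Order: Flynn effect (guessing-corrected), abstractness, g loading, Brand effect.
NAMES = ["flynn", "abstractness", "g", "brand"]
# Rounded correlations across the nine subtests, as quoted in the post.
R = [
    [1.00, -0.02, -0.55, -0.42],
    [-0.02, 1.00, 0.61, 0.32],
    [-0.55, 0.61, 1.00, 0.80],
    [-0.42, 0.32, 0.80, 1.00],
]


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def semipartial(R, y, x, controls):
    """Correlation of y with the part of x not linearly predictable from the controls."""
    r_cc = [[R[i][j] for j in controls] for i in controls]
    r_xc = [R[x][c] for c in controls]
    r_yc = [R[y][c] for c in controls]
    w = solve(r_cc, r_xc)  # standardized weights regressing x on the controls
    cov_y_resid = R[y][x] - sum(wi * ryc for wi, ryc in zip(w, r_yc))
    var_resid = 1.0 - sum(wi * rxc for wi, rxc in zip(w, r_xc))
    return cov_y_resid / math.sqrt(var_resid)


if __name__ == "__main__":
    y = NAMES.index("flynn")
    predictors = [NAMES.index(p) for p in ("abstractness", "g", "brand")]
    for x in predictors:
        others = [p for p in predictors if p != x]
        print(NAMES[x], round(semipartial(R, y, x, others), 2))
```

With these rounded inputs the sketch prints about 0.44 for abstractness, −0.55 for g and 0.19 for the Brand effect, close to the figures in the quoted passage. Partialling only from the predictor (a semipartial rather than a partial correlation) keeps the Flynn effect vector itself untouched.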

[Table: Flynn effect and abstractness, ANOVA]

The authors run through a list of possible issues regarding this work, but to my mind their main thesis stands. Jim Flynn was probably right that level of abstraction is part of the cause of the secular rise in intelligence test scores, without there being any notable commensurate rise in actual intelligence. It would appear that schoolchildren have learned an intellectual trick which helps them leapfrog from instances to general rules.

This is an important paper, which brings us closer to understanding the Flynn effect, and the nature of intelligence test items.


  1. "the portion of the secular IQ gain due to guessing (the Brand effect)": golly, is that Chris Brand? I used to know him.

    "schoolchildren have learned an intellectual trick which helps them leapfrog from instances to general rules": where, I wonder, do they learn it?

    1. Chris Brand himself, still in Edinburgh, still very active. My phrase about the trick is a summary: through modern education and modern life, the theory goes, children now handle abstractions where before they concentrated on specific instances.

  2. I suspect that anyone who has participated in a mathematical olympiad, or something similar, will also tell you that computation contains many learnable, intellectual tricks which help one to leapfrog conceptual chunks of a problem. I can imagine that this is a part of the cause for the negative Flynning of that item; we likely don't train computation as much in the day of the calculator and the computer as we once did.

    1. Normal part of learning a skill. Bryan and Harter 1897

    2. I need more concrete examples of abstraction.

    3. The package is in the post.

  3. elijahlarmstrong, 11 April 2016 at 21:50

    A NIT-pick: Analogies is not "twice as abstract"; the rating is twice as high, but the abstractness rating is not necessarily a ratio scale; while it has an absolute zero, the intervals may not be identical.

    By the way, immediately after our paper was published, this one was too:

    1. Nit-pick accepted. Working on the Must paper now.

  4. Kind of makes me feel bad that most of my abilities come from my excellent ability to abstract, rather than simpler test areas like reaction time.

  5. Could someone spell out the Brand effect for me? Google is not helping. How do you measure guessing, and how does it relate to the g loadings of IQ subtests, to the Flynn effect, etc.?

    1. Must and Must have very good data on this. About to comment on their most recent paper.

    2. Brand Effect: earlier in the 20th Century, test-takers were overly cautious about guessing. They would have scored higher if they'd guessed more. Later in the 20th Century, this prejudice against guessing declined, leading to higher scores.

    3. They used to discourage guessing by penalizing wrong answers beyond a blank answer. When did they stop? I thought it was fairly recently.

    4. Michael A. Woodley of Menie, 17 April 2016 at 20:54

      I coined the term "Brand Effect" in a 2014 paper to describe the portion of the secular IQ gain that is due purely to guessing the answers to multiple-choice questions. In the absence of 'negative scoring' (i.e. subtracting points for wrong answers), guessing on items with, say, five answer options will, on average, get you a correct answer purely by chance once for every five questions guessed.

      People tend to use guessing as a form of ‘test-wiseness’ to a greater extent today than in the past, when people who couldn’t answer an item tended to leave it unanswered. I’m sure that many (younger) readers will recall being told by teachers in test prep something along the lines of “if you don’t know the answer, guess”.

      Chris deserves the credit for this: he was the first to predict (way back in 1987) a role for guessing in secular gains on multiple-choice tests like the Raven’s. Remarkably, hardly anyone seems to have taken him seriously until recently.

      Must and Must (2013) estimated the sensitivity to guessing on National Intelligence Test subtests by subtracting the numbers of wrong answers from the numbers of right answers, yielding a 'false positive' rate for each subtest. They found that the Brand Effect accounts for about a third of the secular gain on NIT scores in Estonia.

      My colleagues and I found that harder (more g-loaded) subtests were more likely to elicit guessing as a test-taking strategy (obviously). When a residual of the secular gain is computed controlling for the Brand Effect, you are left with what I termed the 'Authentic Flynn Effect'. This source of secular gains seems to be strongly associated with the specific ability-variance associated with subtests, consistent with the finding that g-saturation and subtest heritability are strongly positively related.


      Brand, C. R. (1987a). Intelligence testing: Bryter still and Bryter? Nature, 328, 110.

      Brand, C. R. (1987b). British IQ: Keeping up with the times. Nature, 328, 761.

      Must, O., & Must, A. (2013). Changes in test-taking patterns over time. Intelligence, 41, 791–801.

      Woodley, M. A., te Nijenhuis, J., Must, O., & Must, A. (2014). Controlling for increasing guessing enhances the independence of the Flynn effect from g: The return of the Brand Effect. Intelligence, 43, 27–34.

  6. I believe there is a kind of subconscious 'click': 'ordinary people' adapt (conform) without really knowing what they are conforming to. They are not self-questioning or metacognitive, so they are carried by the winds of the culture. As is the habit among other species, which tend to be less meta-conscious, most people just do not stop to think about what is happening to them, because they take their lives as completely natural. That is why terms like 'conspiracy theory' are so successful: people always take their shared (not perceived) reality as absolutely correct, and it is 'conspiracy theory' to think otherwise. Psychological resistance may have some positive effects on the mind, but it can also, paradoxically, make the mind less adapted to change.

    This seems to have happened to me: I wonder so much that I have given up giving myself over to the 'spontaneous' fluidity of events in social environments.

    When you do not think about something, that something tends to happen. The more energy you spend searching for an object, the smaller the chance of finding it. It is a very interesting paradox.

    So subconsciously memorizing the social and work rules seems as natural or spontaneous for most people as it is for most other species.

    People supposedly understand abstractions very well, especially verbal ones.

    How do we explain the discrepancy between older people (concrete thinking) and many younger people (though not all of them, of course, it seems)?

  7. And many people who are at least self-questioning ask good questions a priori, but they are likely to choose or search for wrong answers, or to come up with wrong answers themselves. Good questions, bad answers.

  8. Guessing itself is mental work. Better intuition means a better brain.

    1. Guessing is less g loaded than solving.

    2. "Guessing is less g loaded than solving."
      Certainly. But it is still a g-loaded component. Guessing/intuition is based on fuzzy logic. Pure random guessing is null, like the result of a lottery, with zero g load. Better guessing is brain work with some g load: for example, eliminating obviously wrong answers increases the odds of a correct choice. Solving is mental work at full power.

  9. I haven't yet read the papers (marking exam scripts!) but I will predict that there is one area of accomplishment where these groups of ultra-high IQ kids turned out to be deficient: having babies.

    From memory, in an earlier study from the US maths group, the women had about half a child each, on average.

    And what about the 'Outsiders', the ultra-intelligent misfits? They never get emphasized in these studies, but they are there, and they are very interesting (I know several of the type):