Friday, 27 June 2014

Measurement errors

 

One popular criticism of intelligence testing is that scores could be affected by motivation and levels of practice. By implication, those who are not motivated to take the test will do badly and will be unfairly judged, to the detriment of any society which uses intelligence test results as a ticket of admission to education or employment. By further implication, such lack of motivation may apply most strongly to those who are poorer and most dispirited for other reasons.

Test administrators know all that. They make sure that subjects understand the test and take and pass the practice items, and they encourage them before, during and after each test (guided by protocols as to what help and encouragement is permissible). They also ensure that at least 6 months elapse between face-to-face testing sessions, and that alternate forms are used if testing has to be conducted sooner. Hence psychometric reports talk about the person’s level of engagement, the amount of effort they show, and the specific problems they may have encountered. If there are significant problems, the results are either set aside or labelled as under-estimates, and further testing carried out later usually resolves the issue. Monitoring is easier in face-to-face testing, but item analysis gives some insight into lack of effort in group tests. Group tests often have more practice items, and care is taken to provide good quality test settings. By following all these procedures, practice effects and motivational differences are reduced, but not eliminated entirely. It is still possible that some low results may be due to low motivation, and also that some high results might be due to lucky guessing. How big could these effects be?

Assume for a moment that motivational and practice effects have an influence, and that to the true low scores of less able people must be added the false low scores of those who found the test boring, pointless, and not worth bothering about. People like me, for example. I would rather watch clothes dry on a cloudy day than take most intelligence tests.

If that were true, IQs would under-predict real life successes in things which were intrinsically interesting: getting good qualifications so as to get on in life, making money, and becoming famous.

If motivation were a major confounder, then correlations between IQ scores and real life scores would be low. However, IQ and real life are strongly correlated. For example, the largest recent study (Deary et al., 2007), of over 70,000 English children, found a correlation of r=0.81 between general intelligence measured at 11 years of age and GCSE scores at age 16. This is extremely high predictive power (accounting for about 66% of the variance, since 0.81 squared is 0.656). The colossal sample size gives us exceptional confidence in the robustness of the results. By way of comparison, most educational psychology publications have sample sizes of a few hundred, and are far less robust. As further proof of the common sense view that intelligence is involved in academic achievement, we can be even more precise about the impact of intelligence on different subjects. IQ scores on their own accounted for 58.6% of the variance in Mathematics results, 48% in English and only 18.1% in Art and Design, that subject being the least intellectually demanding (Deary et al., 2007).

I. J. Deary, S. Strand, P. Smith and C. Fernandes (2007) Intelligence and educational achievement. Intelligence 35, 1, pp13-21. (For private study, email the author at the University of Edinburgh and ask for a copy).
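The step from a correlation to "variance accounted for" is just squaring, and the reverse step is a square root. The minimal sketch below redoes that arithmetic for the figures quoted from Deary et al. (2007). The function names are illustrative only, and treating each subject figure as if it came from a simple two-variable linear relationship is an assumption made here for illustration, not a description of the paper's latent-trait models.

```python
import math


def variance_explained(r):
    """Proportion of variance shared by two variables with correlation r."""
    return r ** 2


def implied_correlation(variance_fraction):
    """Size of the correlation implied by a proportion of variance explained."""
    return math.sqrt(variance_fraction)


# Figures quoted in the text from Deary et al. (2007).
print(f"r = 0.81 -> {variance_explained(0.81):.1%} of variance")

subjects = [("Mathematics", 58.6), ("English", 48.0), ("Art and Design", 18.1)]
for subject, percent in subjects:
    r = implied_correlation(percent / 100)
    print(f"{subject}: {percent}% of variance -> r of about {r:.2f}")
```

Squaring 0.81 gives a little under 66% of the variance, and even the weakest subject, Art and Design, still implies a correlation above 0.4.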

Problems of motivation and practice also apply to scholastic examinations and to any procedures followed in job interviews. Varying motivation applies not just to IQ tests but to all measures: intelligence tests, scholastic tests, and work assessments. Nobody gets round measurement error, not even the Spanish Inquisition.

In summary: Assume some people’s IQ scores are reduced by lack of motivation. That will reduce the correlation between IQ and other real life measures. Yet IQ at 11 correlates 0.81 with scholastic attainment at 16. So if motivation is a problem, the true correlation is higher still.

If you prefer that as a Tweet:

If IQ scores are reduced by lack of motivation, but IQ at 11 correlates 0.81 with GCSEs at 16, then the real correlation is much higher.
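The argument in the summary is the classic attenuation argument: noise in a measure lowers its correlation with anything else. Here is a minimal sketch of it, using Spearman's correction for attenuation and a small simulation. Every number in it (a true correlation of 0.9, a noise standard deviation of 0.5, a reliability of 0.9) is a hypothetical value picked for illustration and not an estimate from any study. The sketch also assumes that the motivational noise is unrelated to the outcome itself; if motivation also drives the outcome, the conclusion need not hold.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def disattenuate(r_observed, reliability_x, reliability_y=1.0):
    """Spearman's correction: estimated correlation between error-free scores."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


def simulate(true_r=0.9, noise_sd=0.5, n=20000, seed=0):
    """Compare the correlation of true and of noisy scores with an outcome.

    true_r and noise_sd are hypothetical illustration values.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    # Outcome built to correlate true_r with ability in expectation.
    outcome = [true_r * a + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
               for a in ability]
    # Test score = ability plus motivational noise independent of the outcome.
    test_score = [a + rng.gauss(0, noise_sd) for a in ability]
    return pearson(ability, outcome), pearson(test_score, outcome)


r_true, r_observed = simulate()
print(f"correlation using true ability: {r_true:.3f}")
print(f"correlation using noisy scores: {r_observed:.3f}")
print(f"0.81 corrected for a hypothetical reliability of 0.9: "
      f"{disattenuate(0.81, 0.9):.3f}")
```

Under these assumptions the noisy test score always correlates less with the outcome than the underlying ability does, which is the point of the summary: an observed 0.81 would be a floor on the true figure, not a ceiling.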

15 comments:

  1. If real IQ was completely uncorrelated with life outcomes but motivation was highly correlated with life outcomes, the IQ test would be more correlated with life outcomes than real IQ. That exact scenario is unlikely but who knows whether real IQ or test IQ is more correlated with life outcomes.

  2. The hypothesis can be tested by comparing the predictions made by tests of intelligence and by tests of motivation. So far, tests of intelligence are the better predictors (and also better than self-rated intelligence).

  3. However, if IQ and personality are correlated then motivation may be an aspect of intelligence http://drjamesthompson.blogspot.co.uk/2013/07/intelligence-personality-and-self.html

    Replies
    1. I think intelligence and motivation are correlated because if you're intelligent enough to see when the benefits of hard work outweigh the costs, you'll be more likely to work harder, and the more intelligent you are, the more your hard work will pay off which will motivate you to work harder in the future.

      But I would never say that motivation is an aspect of intelligence. Instead I would say that intelligence is the ability to problem solve, but motivation defines what is a problem in the first place. So if there's no food and I'm motivated to eat, then the lack of food by definition is a problem, and my intelligence figures out a solution: order pizza.

    2. Yes, intelligent people are motivated to solve problems, because they find problems interesting, and are often rewarded by the pleasure of solving them.

  4. And a bit more on the confidence literature: http://drjamesthompson.blogspot.co.uk/2013/12/isir-confidence-and-achievement.html

  5. Could differences in motivation play a significant role in the Flynn Effect? I understand that psychologists giving (individually administered) IQ tests pay close attention to motivation and consider the scores of unmotivated people to be suspect, but does this apply to the norming samples too? Are the scores of unmotivated people counted when they are standardizing tests like the Wechsler? And if most people in the 1930s were unmotivated to try their best on these tests, would their lack of motivation have even been recognized if it was the norm for the time?

    Replies
    1. standardization test-ees & testers are fairly well-paid. (i.e., motivated:) standardization testers are on the look out for situations & behavior which could invalidate precious data.

      every once in a while something goes horribly awry: true story - standardization validity study - kid got high score on test A. week or 2 later kid gets low score on test B. discovered a month later during a routine eyeball scatterplot analysis - kid's datapoint stuck out like a sore thumb. much investigation over outlier ensued.

      turns out kid was pulled from a birthday party to take test B - kid could even see ongoing party from her window as she took test B:( needless to say, her data were not included.

      party motivation inversely correlates with test scores.

      test companies rigorously pursue anal-retentive methods for considering & ruling out the effects of everything upon everything else (&/or trust in randomness to even them out). standardization tests are long - testers are often told spread it out over 3 days, do an hour a day, etc. it's all been thought of & looked at to death, examined, evaluated, looked at sideways, rethought, reanalyzed, etc.

      but when is an outlier an outlier? how far out of line do the scores have to be before not including them? depends on the party, i guess.

      there is a positive correlation between IQ & motivation. researchers have looked into more different kinds of motivation with multi trait multi method studies than any reasonable person would ever want to know :)

      test standardization is a serious business - test companies dislike paying test cooperators for invalid data.

    2. Dear Panjoomby: Always a pleasure to have a true expert on board.

    3. "Dear Panjoomby: Always a pleasure to have a true expert on board."

      Indeed! Your excellent blog attracts a lot of very high quality people.

      As I read Panjoomby's knowledgeable comment, it got me wondering about the SAT. Unlike the excellent Wechsler scales which are individually administered so motivation can be carefully monitored, there's no way to monitor the million plus people who write the SAT every year.

      I would think there would be huge differences in motivation for that test. Some kids have been prepping since they were 3 with big plans for the Ivy League, while other kids go to poor schools and have no plans to go to college at all and just write the test on a lark, perhaps half-drunk, and given how long the test is said to be, probably lose motivation quickly.

      I used to think the SAT was a good measure of intelligence because that's what the research seems to show, but perhaps most of the research is comparing homogeneous samples attending the same schools.

      One red flag is that a few celebrities who I've always assumed to be highly intelligent turned out to have surprisingly low SAT scores. I realize such anecdotal evidence is scientifically meaningless because the celebs could be joking or trying to charm or amuse the public with self-deprecating populist lies, and even if they are true, the data is very selectively reported. But it does make me wonder about the SAT.

      A good example is Bill Cosby who went on to become one of the richest and most beloved people in America, in a field as seemingly g loaded as comedy, despite claiming to have got a combined SAT score (old scale) of only 500, which by my calculations, equates to an IQ around 80. He talks about it here:

      http://www.youtube.com/watch?v=E9qjxSzNiEg

  6. In group test data we usually don't have any data other than item responses, and sometimes latencies of response. Olev Must has good historical data which suggests that there may have been differences in guessing over the years, a hypothesis first proposed by Chris Brand. The picture on guessing rates is not consistent though. However, I doubt that effort is a major factor, though persistence on untimed tasks might be.

  7. One question is whether motivation matters in PISA-type tests. For example, Finland routinely does better on PISA tests than it does on IQ standardizations. It could be that Finland's schools really are better than the rest of the Caucasian world's schools. Or maybe the Finnish school system is effective at giving pep talks to Finnish students to try really hard, get a good night's sleep beforehand, and otherwise treat this low stakes test like a high stakes test.

    Replies
    1. I think that some countries might "game" the tests, just by giving them a lot of emphasis, though as you and others have pointed out, other countries have probably ensured that weaker students did not show up on the exam day. We need more data on the "anti cheating" checks on procedures and data.

  8. Why not say that IQ scores predict real-life outcomes both because they measure "g" and because they measure motivation, both of which are important?

    Replies
    1. Very possibly so. Hard to get objective measures of motivation, but it looks as if brighter students are motivated by the sheer interest of tackling difficult questions.
