Look in my eyes, you sensitive clever person.


Every now and then some passing commentator says that intelligence tests are deficient because they do not provide sufficient assessments of warmth, understanding, and emotional sensitivity. Traditionally these have been considered aspects of personality, but because people want to be intelligent without taking a test which may reveal them to lack that quality, there has been much interest in the rebranding of these personality traits into “emotional intelligence”. This ranks somewhat higher than “gastro-intestinal intelligence” but even that latter digestive ability is something to fall back on if all else fails.

In the public relations campaign for “emotional intelligence”, personal characteristics such as restraint, patience, thoughtfulness, concern about others and suchlike are not considered to be just good manners, or aspects of good character, but evidence of a specific problem-solving ability: the capacity to understand other human beings. When researchers try to bring the concept of emotional intelligence into psychometric assessment, they indeed find that much of it is simply a personality variable. However, there is an interesting possible exception: the capacity to understand depictions of other people’s emotional states. There are some positive findings, though not yet, as far as I know, proper epidemiological studies combining the “emotional perceptiveness” measures with established intelligence and personality measures. Nonetheless, there seems to be a suggestion that understanding others is g-loaded (see below).

It was with these thoughts in mind that I took up the opportunity to do an online test of emotion reading, “Reading the mind in the eyes”, by Simon Baron-Cohen et al. (1997 and 2001).


I assure you that I approached the task with the greatest sensitivity. The insensitive, brutal and very short answer is that I got 31/36 correct. The longer and more sensitive answer, or excuse, written immediately after getting the results, was as follows: “I have some criticisms. First, there should be a few trial items, so that you can calibrate how the test uses the description words. On that point, errors should be graded as ‘close’ or ‘far’. Close errors (the most commonly chosen alternative) should get a quarter point, equivalent to an informed guess. Personally, I think I could claim a close error for my very first response, which marked the first picture as ‘comforting’ rather than the required ‘playful’. This resulted in my uttering an expletive, and very probably falling under stereotype threat. (Clinical psychologist, found failing on a core competence, collapses into greater incompetence.) On strict methodological grounds (aka petulance) I claim 31.25 out of 36.”
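The partial-credit rule proposed above can be written out as a short sketch. The function name, its parameters, and the split of the five errors into one close and four far are assumptions made for illustration; neither the online test nor the paper scores responses this way.

```python
def adjusted_score(correct, close_errors, far_errors, close_credit=0.25):
    """Raw score plus partial credit for 'close' errors (hypothetical rule).

    correct      -- number of items answered with the target word
    close_errors -- errors that picked the most commonly chosen alternative
    far_errors   -- all other errors; these earn no credit
    close_credit -- credit per close error (a quarter point, as proposed)
    """
    if min(correct, close_errors, far_errors) < 0:
        raise ValueError("counts must be non-negative")
    return correct + close_credit * close_errors


# Assumed split: 31 correct, 1 close error, 4 far errors (36 items in total).
print(adjusted_score(31, 1, 4))  # 31.25
```

With every error counted as far, the function simply returns the raw score, so the adjustment can only ever help the petulant.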

In fact, the full 2001 paper makes all clear. The revised test does have an introductory item which was not used in the online version, so it is not the authors’ fault. Subjects were shown detailed word definitions with examples of usage, so that knocks another quibble on the head. Error rates for each word on the distractor items (foils) are properly listed, so petulant pedants can calculate their own adjusted and face-saving score. Additionally, there are proper control groups, including an IQ-matched control group. The gradient is: autistics 22 points, general population 26 points, students 28 points, and people with IQ 115 get 31 points. Leaving aside those with autism, the last three groups show an intelligence-related gradient in the accuracy of their emotional judgments.

All in all, a good paper, with interesting material and good controls. Of course, as a clinical psychologist, I am sensitive to very subtle signs which could not possibly be depicted in an online test. Do we understand each other?


  1. Cricket tests are deficient because they do not provide sufficient assessments of ability to kick a rugby ball.

  2. That test is a pet peeve of mine, and not just because I got an average score. Taking the test actually had me uttering expletives, too, and feeling total exasperation, mostly for the following reasons:

    * Several of the eyes in the test are recognizable. This invites associations with the person, their career, and the way one feels about that person (I judge Marilyn Monroe differently than I judge Elvis, and Elizabeth Taylor has them both beat as actors by a huge margin). Such associations are bound to taint the responses.

    * Several of the pictures turn out to be of people ACTING. This was very confusing to me, as I picked up on their underlying emotions even more than those they were trying to convey. Trying to match the conflicting information to the possible answers proved very difficult when you're compulsively honest and hesitate to pick the answer it is obvious that you are supposed to pick, because you know it is, at best, incomplete.

    * "Target words and foils were generated by the first two authors and were then piloted on groups of eight judges (four male, four female). The criterion adopted was that at least five out of eight judges agreed that the target word was the most suitable description for each stimulus and that no more than two judges picked any single foil. Items that failed to meet this criterion had new target words, foils, or both generated and were then repiloted with successive groups of judges until the criterion was met for all items." To borrow the words of Catharine Vetter Alvarez, who wrote about gender bias in the test: "In other words, the "correct" answers were generated by Simon Baron-Cohen and his co-author and then if the pilot groups were able to choose that answer from among the ones he gave them, it was considered correct.

    A less biased method would be to show items to a test group and have each person describe in their own words what emotion was being displayed. In other words, the initial set of "correct" answers should be generated by a consensus of a group rather than two people."

    * Again quoting Catharine: "Here's an interesting sentence from Simon Baron-Cohen in that same paper:

    "There is no objective method for identifying the underlying mental state from an expression."" Does this not render the entire test rather pointless?

  3. Thanks. Yes, using the pictures to generate emotion descriptions in a large sample of viewers would seem to be the better test. Nowadays it would be better to use high-quality video clips, because many emotions are read as transitions, often of very short duration. Time for you to do a replication?
