Monday 23 March 2015

Consensus catalepsy

While keeping me waiting, automated reply systems often prattle: Your opinion is important to us. Should it be? My opinion may be of no consequence, and they know it, or ought to.

Years ago a clinical psychologist friend attended an important meeting about national training standards in clinical psychology and found that psychoanalysts were also present. He laid out the evidence base for behavioural techniques being therapeutically effective and needing to be taught in preference to any theories without empirical foundation. To his surprise the lawyer Chairman said: “I am in the hands of you experts, and I value your opinion but I must equally value expert psychoanalytic opinion. Both expert opinions must be represented”. Implicit in the Chairman’s ruling was the view that every expert had their own body of expertise. The notion that controlled trials showed one approach to be objectively better than another cut no ice with him.

Niccolò Machiavelli recommended (first few paragraphs of Chapter 23, The Prince) that one should consult councillors only on specific matters in their area of competence:

Therefore a wise prince ought to choose the wise men in his state, and give to them only the liberty of speaking the truth to him, and then only of those things of which he inquires, and of none others; (my emphasis) but he ought to question them upon everything, and listen to their opinions, and afterwards form his own conclusions. With these councillors, separately and collectively, he ought to carry himself in such a way that each of them should know that, the more freely he shall speak, the more he shall be preferred; outside of these, he should listen to no one, pursue the thing resolved on, and be steadfast in his resolutions. He who does otherwise is either overthrown by flatterers, or is so often changed by varying opinions that he falls into contempt.

A prince ought always to take counsel, but only when he wishes and not when others wish; he ought rather to discourage every one from offering advice unless he asks it; (my emphasis) but, however, he ought to be a constant inquirer, and afterwards a patient listener concerning the things of which he inquired; also, on learning that any one, on any consideration, has not told him the truth, he should let his anger be felt.

Here is a modern take on the theme of evaluating advice. It is a very important paper, or at the very least an important step in researching whether a pair can learn to favour the more perspicacious observer of the two when coming to a judgment about a physical signal.

We tend to think that everyone deserves an equal say in a debate. This seemingly innocuous assumption can be damaging when we make decisions together as part of a group. To make optimal decisions, group members should weight their differing opinions according to how competent they are relative to one another; whenever they differ in competence, an equal weighting is suboptimal. Here, we asked how people deal with individual differences in competence in the context of a collective perceptual decision-making task. We developed a metric for estimating how participants weight their partner’s opinion relative to their own and compared this weighting to an optimal benchmark. Replicated across three countries (Denmark, Iran, and China), we show that participants assigned nearly equal weights to each other’s opinions regardless of true differences in their competence—even when informed by explicit feedback about their competence gap or under monetary incentives to maximize collective accuracy. This equality bias, whereby people behave as if they are as good or as bad as their partner, is particularly costly for a group when a competence gap separates its members.

Mahmoodi, Bang, Olsen, Zhao, Shi, Broberg, Safavi, Han, Ahmadabadi, Frith, Roepstorff, Rees and Bahrami (2015) Equality bias impairs collective decision-making across cultures. doi:10.1073/pnas.1421692112. PNAS Early E

The 13-author team led by Mahmoodi and including the redoubtable Chris Frith (who helped design the experiments) have done the unspeakable thing of showing that some people are more expert than others. This is an immense relief, because it avoids the stupid rule of thumb that so many people follow, which is that a person who is confident in their opinions is probably right and should be believed, thus giving the oxygen of resources to people who are too dull to comprehend their incompetence, and denying the bright-but-modest any say in matters of state. However, to the authors' dismay, although pairs of experimental subjects must have realised that one is more accurate than the other, they do not act on this fact to maximise their performance.

When making decisions together people tend to give everyone an equal chance to voice their opinion. To make the best decisions, however, each opinion must be scaled according to its validity.

The authors say: A wealth of research suggests that people are poor judges of their own competence—not only when judged in isolation but also when judged relative to others. For example, people tend to overestimate their own performance on hard tasks; paradoxically, when given an easy task, they tend to underestimate their own performance (the hard-easy effect). Relatedly, when comparing themselves to others, people with low competence tend to think they are as good as everyone else, whereas people with high competence tend to think they are as bad as everyone else (the Dunning–Kruger effect). In addition, when presented with expert advice, people tend to insist on their own opinion, even though they would have benefitted from following the advisor’s recommendation (egocentric advice discounting). These findings and similar findings suggest that individual differences in competence may not feature saliently in social interaction. However, it remains unknown whether—and to what extent—people take into account such individual differences in collective decision-making

The authors needed a simple task capable of close evaluation and of being replicated in different cultures. They recruited healthy adult males in Iran (15 pairs), Denmark (15 pairs) and China (19 pairs). Pairs of subjects had to decide individually (without conferring) whether they had seen a target in a visual display.

On each trial, two participants viewed two brief intervals, with a target in either the first or the second one. They privately indicated which interval they thought contained the target, and how confident they felt about this decision. In the case of disagreement (i.e., they privately selected different intervals), one of the two participants (the arbitrator) was asked to make a joint decision on behalf of the dyad, having access to their own and their partner’s responses. Last, participants received feedback about the accuracy of each decision before continuing to the next trial. Previous work predicts that participants would be able to use the trial-by-trial feedback to track the probability that their own decision is correct and that of their partner. We hypothesized that the arbitrator would make a joint decision by scaling these probabilities by the expressed levels of confidence, thus making full use of the information available on a given trial, and then combining these scaled probabilities into a decision criterion. In addition, to capture any bias in how the arbitrator weighted their partner’s opinion, we included a free parameter that modulated the influence of the partner in the decision process.

According to any sensible procedure the pair should have been able to work out, on the basis of feedback about their judgments, which of the two tended to get better results, and then favour the more accurate and perceptive observer so as to get optimal results as a two person team. However, they did not do so, but persisted in being friendly, egalitarian, and incompetent.
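To make the "sensible procedure" concrete, here is a minimal sketch of an arbitrator who keeps a running tally from the trial-by-trial feedback and, on disagreement, sides with whoever has been right more often. It is not the authors' model: it ignores the confidence ratings entirely, and the class name, the method names and the one-hit-one-miss starting tally are all my own assumptions for illustration.

```python
class CompetenceTracker:
    """Hypothetical arbitrator: tracks each dyad member's hit rate from feedback."""

    def __init__(self):
        # Assumed starting tally of one hit and one miss each,
        # so that nobody starts at 0% or 100%.
        self.correct = {"self": 1, "partner": 1}
        self.trials = {"self": 2, "partner": 2}

    def update(self, who, was_correct):
        """Record the feedback for one member on one trial."""
        self.trials[who] += 1
        self.correct[who] += int(was_correct)

    def accuracy(self, who):
        """Current estimate of how often this member is right."""
        return self.correct[who] / self.trials[who]

    def arbitrate(self, own_choice, partner_choice):
        """Joint decision: if the two disagree, follow the better track record."""
        if own_choice == partner_choice:
            return own_choice
        if self.accuracy("partner") > self.accuracy("self"):
            return partner_choice
        return own_choice
```

An arbitrator showing the equality bias would behave as if the two tallies were roughly the same, whatever the feedback had actually shown.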

Summarising 4 slightly different experiments the authors say: Remarkably, dyad members exhibited “equality bias”— behaving as if they were as good as or as bad as their partner—even when they (i) received a running score of their own and their partner’s performance, (ii) differed dramatically in terms of their individual performance, and (iii) had a monetary incentive to assign the appropriate weight to their partner’s opinion.

Why crash the plane by giving the weakest pilots equal access to the controls? The authors work through various hypotheses:

1) People may try to diffuse the responsibility for difficult group decisions, alternating between their own opinion and that of their partner when the decision is particularly difficult (high uncertainty), thus sharing the responsibility for potential errors.

2) The equality bias may arise from group members’ aversion to social exclusion, which invokes strong aversive emotions. The better performing member may have been trying to avoid ignoring their partner. 

3) The equality bias may arise because people “normalize” their partner’s confidence to their own. Although the better-performing members of each group were more confident, they also over-weighted the opinion of their respective partners and vice versa.

It may be that the task did not seem sufficiently complicated and important. The authors gave immediate feedback, offered money rewards, and manipulated the difficulty of the task so as to favour one participant, but the equality bias persisted. Another way of putting it is that the participants either could not work out, or thought it unseemly to act upon, the fact that one of the pair might have been more accurate than the other, and that the better observer's opinions and judgments should be preferred. Of course, dyads may be too intimate. Surely larger groups would have no hesitation in downgrading the views of an incompetent observer? Testable hypothesis, for next time.

The authors recruited subjects from three different cultures, and found identical results. Denmark may be immersed in Scandinavian consensuality, but that cannot explain the results from Iran. Amusingly, it strikes me that if the task could be done by internet it would be possible to have culturally mixed teams (Danish/Iranian, Danish/Chinese, Chinese/Iranian) competing against culturally homogeneous teams. This would give the implausible “diversity is strength” mantra a chance of showing that it had some empirical support, which currently is thin on the ground. Of course, if the equality bias is really universal it might be even stronger in the culturally mixed teams, who should be even more inclined to value equality, and thus inadvertently be more incompetent.

Typical experimentalists, the authors say absolutely nothing about their subjects, in the experimental tradition of showing that it is the experimental manipulation which counts. They give no IQ or personality measures, which is a dreadful loss. Even a short Raven’s Matrices, or Digit Symbol, plus a brief personality inventory could have transformed the discussion of the results. Consulting a comparative psychologist might have helped them.

The apparent universality of their findings either shows that the researchers have found a handicapping universal bias or that their technique gives no opportunity for one person to lose trust in another’s judgment. It will not be the first time that an experimental setup fails to replicate real life, because the task may be too trivial to make anyone take the risky step of disparaging another’s judgement. For example, if an overconfident chief pilot attempts to land on the wrong runway even the most deferential of co-pilots will probably clear his throat, and suggest they go around and try again. In theory Crew Resource Management gets round the Confident Incompetence problem, even in pairs of pilots, by training each to respect the other’s area of expertise, and getting the junior to question the senior’s actions. On the other hand, if this experimental effect generalises to more complicated, important real life tasks, then the authors are to be congratulated for revealing a terrible aspect of all team work: mediocrity of outcomes and consensus catalepsy.

Trying to be upbeat, the authors speculate that the equality bias simplifies the decision process by reducing the cognitive load to a simple direct comparison of confidence. Equality bias may facilitate social coordination, with participants trying to get the task done with minimal effort but inadvertently at the expense of their joint accuracy. Equality bias helps participants quickly converge on social norms to reduce disharmony and chaos in joint tasks. Humans tend to associate and bond with similar others, and equality bias may assist this process.

However, when a wide competence gap separates group members, the best strategy is that each opinion be weighted by its reliability. Otherwise equality bias can be damaging for the group. Indeed, previous research has shown that group performance in the task described here depends critically on how similar group members are in terms of their competence. This is another reason for wanting some IQ data on the dyads.
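The cost of equal weighting when competence differs can be illustrated with a textbook signal-detection sketch. This is not the model fitted in the paper: I assume the target is coded as +1 or -1, that each observer sees it through independent Gaussian noise with their own standard deviation (a larger sigma meaning a less competent observer), and that the pair decide by the sign of a weighted sum of their two impressions. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution


def solo_accuracy(sigma):
    """Chance that one observer alone picks the right interval."""
    return PHI(1.0 / sigma)


def equal_weight_accuracy(sigma_a, sigma_b):
    """Pair accuracy when both impressions simply count the same."""
    return PHI(2.0 / sqrt(sigma_a ** 2 + sigma_b ** 2))


def reliability_weight_accuracy(sigma_a, sigma_b):
    """Pair accuracy when each impression is weighted by 1 / sigma squared."""
    return PHI(sqrt(1.0 / sigma_a ** 2 + 1.0 / sigma_b ** 2))


if __name__ == "__main__":
    good, poor = 1.0, 3.0  # assumed noise levels, a wide competence gap
    print("better observer alone:", round(solo_accuracy(good), 3))
    print("equal weighting:      ", round(equal_weight_accuracy(good, poor), 3))
    print("reliability weighting:", round(reliability_weight_accuracy(good, poor), 3))
```

Under these assumptions, with matched observers the two rules give the same accuracy. With a wide gap, the equally weighted pair does worse than the better observer would have done alone, while the reliability-weighted pair does at least as well as that observer.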

The authors conclude: In the early years of the 20th century, Marcel Proust, a sick man in bed but armed with a keen observer’s vision, wrote “Inexactitude, incompetence do not modify their assurance, quite the contrary”. Indeed, our results show that, when making decisions together we do not seem to take into account each other’s inexactitudes. However, are people able to learn how they should weight each other’s opinions or are they, as implied by Monsieur Proust’s melancholic tone, forever trapped in their incompetence?

At the very least, this paper should be quoted widely, and with luck might even make people question strongly whether everyone’s opinion should be given equal weight.

Until then, just follow old Nick’s advice, which I paraphrase thus: consult only the competent.


  1. Usually we physicists and mathematicians are too busy to read Psychological Comments regularly. In the case of this article "Consensus Catalepsy" we dropped our pencils and took notice. Equality bias is an important concept for us to be aware of.

    One immediately asks questions such as: Can equality bias manifest itself in the hard sciences? In data collection by teams of scientists? What about in funding decisions, or in management decisions for large scientific projects?

    Next time I participate in a project review panel, I will be on the lookout for instances of harmful equality bias in the decision making process, and because of this article, will be better able to articulate the problem.

  2. Thank you for your comments. Group decision making research is a fraught business, and this new finding is only one part of the puzzle, but an interesting one because of the experimental methods used. Previous research has shown (probably) that group effects can sway individual judgment (the social conformity effect) and that groups can move towards blinkered thinking. What this method shows is a contrary problem, which is that politeness about opinions may interfere with a hard headed judgment as to the quality of the opinions. Physicists and mathematicians, whilst certainly being likely to be brighter than average, and a bit brighter than social scientists, will very probably be subject to the same biases. The moral, to my mind, is to try to rate the person giving the advice by their accuracy rate, and then adjust the composite score accordingly. Two problems come to mind which vitiate my own proposal: the ideals of science require that we should examine a hypothesis whoever puts it forward, and in physics getting a handle on whether a hypothesis has merit may take 40 years. Over to you guys.

  3. Dr. Thompson, I have a personal question (very off-topic)...

    Recently took the Wechsler (WAIS-IV), and to be honest, wasn't too happy with the scores. Only scored 120 on my Full Scale IQ (91st percentile), with each index score being around 120.

    My questions are:

    1. What does it mean if I scored low on the Similarities subtest? My vocabulary was 19/19 but my score on the Similarities was miserably low— 8/19 to be truthful. My test proctor said this had something to do with being unable to think abstractly, which was very surprising because I've always thought of myself as being an abstract thinker. What does Similarities test exactly? And can my low score have something to do with how the proctor gave the exam?

    2. Will such a score be prohibitively low for me to achieve my ultimate goal of becoming an engineer? I plan on attending a prestigious university where the average SAT is about 2100/2400.

    3. Speaking of the SAT, I scored 2260/2400 on that exam, but that was after taking a prep class. What explains the huge discrepancy between my WAIS score (barely one standard deviation above the mean) and my SAT (2.5 SD above the mean)? Thank you for any response.

    1. In terms of your future, I think your SAT score will be far more important. Similarities is a good test of abstraction and the capacity to look for higher order classifications. You may be literal minded and practical, which is bad news for a poet but on balance a benefit in an engineer. If all this bothers you, try a variety of high level intelligence tests: the AH5 (Alice Heim 5) for UK undergraduates, or some of the Mensa type tests. Or, better still, get an engineering placement somewhere as an intern and get some practical experience.

  4. My memory of decision-making in schoolboy rugby is as follows. Decisions to do with scrums and line-outs were made by the pack leader. Decisions in open play, on receipt of the ball, were made by the half-backs, particularly the fly-half. The captain would perhaps say something tactical at half-time, and would re-arrange the team if someone had to go off injured, or for any other reason. He also decided whether to kick for goal or for touch. A good pack leader, and a good skipper, might solicit opinions at half time, not usually by promoting general discussion but by aiming questions at particular players, as might be the prop forwards or the line-out jumpers. If you heard your opponents holding a general debate you tended to perk up, because it implied that they were in disarray.

    1. My memory of watching the school rugby players is that they discussed nothing. They did practice sessions which took up 6 days of the week, so probably felt that talk was a waste of breath. When I played as wing the spectators always yelled "corner flag" but since I was myopic that was of no help to me.

    2. "My memory of watching the school rugby players is that they discussed nothing": I may have been schooled in a more intellectual milieu than you.

  5. I should add that there were occasions for general discussion, but during a game wasn't one of them. The other decision maker, of course, was the teacher who selected the XV. I don't remember there being even the possibility of a debate with him.

  6. One of Parkinson's Laws was that a leader's cabinet tends to expand over time for political reasons (e.g., veterans want the head of Veterans Affairs to get cabinet rank). But inevitably a new Inner Cabinet of about a half dozen emerges within the excessively bloated Outer Cabinet. One reason might be "equality bias" -- discussions work better if we treat our co-discussants as equals, but our tendency to bloat our formal decision-making bodies with hacks means the real top guys need a new forum in private where they will consult with their true peers.

  7. Smart people always want to do things the easy way. It is up to the rest of us to help them build character.

  8. Every smart guy in a group project (or worse, working in pairs) has dreaded equality bias. I would think we incline towards mediocrity of outcomes to avoid social conflict.