Tuesday 26 August 2014

The Woodley Challenge

 

For some time now I have been getting tired of the “correlation is not causation” mantra. This slogan is true as far as it goes, but it tends to be used to argue that, however many correlations linking A with B are found in different circumstances, they will somehow never suffice to strongly suggest a causal link between A and B. On the contrary, I argue that correlation is a necessary feature of causation, but not a sufficient proof. I want to change the slogan to: Correlation is not always causation, but it helps find causes.

While thinking about all this I half-remembered a challenge set by Michael Woodley at the London Conference on Intelligence last April, so after getting the exact wording from him again, I thought I would bring it to a wider audience:

"Sure, correlation does not equal causation, but find me just one single instance of a causal relationship where there is no correlation (just one would suffice)."

As befits a challenge, I will be offering the traditional bottle of wine to the best instance. Woodley judges, I arbitrate if required, and provide the bottle of wine.

18 comments:

  1. Increasing CO2 in the atmosphere causes a rise in the average global temperature. Stated more carefully, increasing CO2 doesn't alter the energy that the earth receives as solar radiation, but does change the equilibrium energy balance by increasing "back radiation," i.e. the atmosphere's "slowing" of the emission of radiation to outer space in the infra-red portion of the spectrum. Over the long run, this can be confidently calculated to increase the average "temperature anomaly" by about 1.5 C per doubling of CO2 (reference).

    However, as all readers know, the earth's climate is not a simple system. Its climate (including the greenhouse effect) includes many feedbacks, both positive and negative. But the net effect of rising CO2 must be positive -- removing a greenhouse gas would hardly warm the planet.

    Despite this clear-cut causal relationship, the rise of the earth's temperature (its energy-balance point) has more or less paused for over a decade. See graphs and discussions here and here.

    This example is unlikely to garner me that bottle of Château Mouton Rothschild, because the temperature anomaly clearly increased over the second half of the 20th century, more or less in line with the consensus received wisdom. Still, the current pause is an instance of the decoupling of causation and correlation.

    1. Dear AMac, thanks for your example. As you say, the earth's climate is not a simple system. Your stated correlation is between CO2 concentration and air temperature, and a recent article in Nature strongly suggested that the decade-long pause was a temporary effect caused by the oceans absorbing the increased heat and transporting it deeper down. Apparently this will not last long, and then the longer-term rise, as you correctly stated, is very likely to continue. Furthermore, pace Fisher, we can show that heat is retained in artificially CO2-enriched laboratory atmospheres, so the causal mechanism is not in doubt. It is not my subject, but I assume that water, covering 71% of the world's surface, might be able to absorb extra heat for some time. Can I put you in the category of front runner so far, for an honorable mention?

    2. Judith points out that half of the presumed anthropogenic warming could be due to El Niño-like oscillations in the Atlantic, which is a new story for me. Interesting stuff.

  2. What about suppression effects (especially those not yet discovered)?

    e.g., from http://www.soc.iastate.edu/sapp/soc415SSRelationships.html

    "Consider this theoretical proposition: The greater the academic performance, the greater the job performance. For challenging jobs, there is a causal connection between academic performance and job performance. For routine jobs or those involving much repetition, however, there is little correlation between academic performance and job performance. The extraneous variable is boredom with the job. For routine or repetitive jobs, persons who excel academically have poor job performance because they tend to get bored."

    So the overall correlation between academic achievement (AA) and job performance (JP) could be zero, even though AA is causal (you can see a diagram of this at the link above).

    Assume also that no one has yet identified boredom as the suppressor variable. Thus, AA causes JP without a (direct) correlation.

    Not 100% committed to this example; just trying for your wine.

    1. Theoretical propositions don't even get a glass of municipal water! On the hypothetical example, I think that the link between scholastic achievement and job performance is pretty solid throughout the range of occupations, including very routine ones. Linda Gottfredson has looked at US Army training data for pretty simple tasks. However, I appreciate the effort. Thanks.

    2. I demand an appeal to Sir Woodley!

    3. I will see if I can raise him from his magnificent study in his stately abode.

    4. Structural equation models can deal with suppression effects better than classical multiple regression, because the latter does not give you the correlations between the predictors.

      This article is relevant, if you want to read more about it:

      Zhao, X., Lynch, J. G., & Chen, Q. (2010). Reconsidering Baron and Kenny: Myths and truths about mediation analysis. Journal of Consumer Research, 37(2), 197-206.

  3. Following up on that, our paper (using the 50 US states as the units of analysis) shows no correlation between state racial composition and voting for Obama, although I'd argue there is indeed a causal relationship here.

    Various well-being variables (including state IQ) have suppressed the direct correlation between race and Obama votes.

    For example, race correlates .088 with votes cast for Obama. After also controlling for IQ, the partial correlation rises to .302.

    https://campusdrive.csuohio.edu/Users/1001180/iq_obama_final.pdf?uniq=227991
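
The jump from .088 to .302 is a first-order partial correlation, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). A minimal Python sketch of that standard formula follows. The .088 and .137 inputs are the zero-order correlations quoted in this thread. The race-IQ correlation is not given anywhere here, so `r_race_iq = -0.70` is a made-up value. It is negative because that is the sign pattern under which controlling for IQ raises the race-vote correlation. The output is therefore not expected to reproduce the paper's .302.

```python
import math

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y after removing the linear influence of z from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_race_obama = 0.088  # zero-order correlation quoted above
r_iq_obama = 0.137    # zero-order correlation quoted later in the thread
r_race_iq = -0.70     # HYPOTHETICAL: not reported in the thread or taken from the paper

controlled = partial_corr(r_race_obama, r_race_iq, r_iq_obama)
print(f"race-Obama correlation controlling for IQ: {controlled:.3f}")
```

When r_xz and r_yz have opposite signs, the numerator grows and the denominator shrinks, so a near-zero raw correlation can become a clearly positive partial one.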

  4. The better a thermostat works, the better it will suppress the (actual, causal) influence of the outside temperature on the temperature inside the house.

    In my own research on a sample of U.S. cities, I was surprised by how low the correlation was between proximity to the Mexican border and the foreign-born share of the population. This correlation rose substantially when I controlled for historical population size (immigrants tend to move to cities that are already large). In this example the correlation was not zero, but I guess you could construct the sample so that it comes very close (for instance, by eliminating large cities close to the Mexican border).

  5. Sorry for the multiple posts; I'm thirsty:

    Likewise, IQ correlates .137 (not significant) with votes cast for Obama. After also controlling for race (% Black or Hispanic), the partial correlation is .33.

    It's widely accepted that the correlation between IQ and red states / blue states is an urban legend (see snopes.com). It's not, if one also controls for race!

  6. So BGI uses a negative control group? I guess you need a low-functioning baseline in addition to a normal group for comparison with the smart people. 'pdychologists' indeed...

  7. In econometrics there is a widely used technique for time series regression: the Granger causality test. It is a multiple regression, generally with two predictors, and is stated as follows: a variable X Granger-causes Y if Y can be better predicted by the lagged values of both X and Y than by the lagged values of Y alone. Put otherwise, you regress Yt on Yt-1 alone, then on Yt-1 and Xt-1 together (plus further lags if needed), and test whether adding the lagged X improves the prediction. It tells you whether past changes in X help predict later changes in Y. This is how I would advise doing the research, but it needs longitudinal samples. Time series analysis is different, and requires a strong assumption: stationarity. Because that assumption is usually violated, econometricians use detrending and differencing to transform the variables. However, none of this applies to cross-sectional psychological research, where the columns in the data are the variables and the rows are the subjects.

    There is also another technique widely known in econometrics that is considered able to answer the question of reverse causality: instrumental variable (IV) regression, usually estimated by two-stage least squares (2SLS). I don't know a lot about IV regressions, but my impression is that they are unlikely to answer the question of reverse causality. Maybe the technique has its uses, but I rarely see it applied in psychological research.

    That said, my feeling is that those people who generally use this expression "correlation is not causality" do not really understand what it implies. They just use this expression when they don't like the conclusion of the research. When the result is in line with their beliefs, they will just say "just as I expected, it's too obvious".
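
A minimal sketch of the one-lag Granger test described in this comment, in Python with only the standard library. The data are simulated under stated assumptions: x drives y with a one-step lag, y has no effect on x, and the coefficients, sample size and seed are arbitrary. The code compares the restricted regression of y_t on y_{t-1} with the unrestricted one that adds x_{t-1}, and reports the usual F statistic for one restriction.

```python
import random

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rss(y, rows):
    """Residual sum of squares of an OLS fit of y on the given regressor rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

def granger_f(x, y):
    """F statistic for 'x Granger-causes y' using one lag of each variable."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r, rss_u = rss(target, restricted), rss(target, unrestricted)
    n, k = len(target), 3
    return (rss_r - rss_u) / (rss_u / (n - k))  # one restriction, so q = 1

rng = random.Random(7)
x, y = [0.0], [0.0]
for _ in range(500):
    x.append(0.5 * x[-1] + rng.gauss(0, 1))                # x ignores y
    y.append(0.4 * y[-1] + 0.5 * x[-2] + rng.gauss(0, 1))  # y depends on lagged x

f_xy, f_yx = granger_f(x, y), granger_f(y, x)
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.1f}")
```

In practice one would choose the lag length, check stationarity first, and compare F with the F(1, n - 3) distribution. The statsmodels package offers `grangercausalitytests` in `statsmodels.tsa.stattools` for this. Note that the test establishes predictive precedence, not causation in the experimental sense.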

  8. Well, of course! Confounding factors can raise or lower a correlation, inflating it or bringing it to zero. This is Stats 101. Of course, if we talk about pure correlations, there will never be a decoupling of causation and correlation. But we can never be sure we've controlled for all the possible confounding variables. So I think this challenge is just ill-defined and not even worth a bottle of beer.

  9. Meng Hu and Merculinus, thank you for your comments. I agree with Meng Hu that the trope "correlation is not causality" tends to be used for associations people do not like. Longitudinal studies in psychology are rare but very instructive. For example, some of the National Longitudinal Survey of Youth results used by Charles Murray were later used to look at the second generation. This strengthens the case for a causal interpretation of the link between maternal IQ and later child achievement. On the point raised by Merculinus, "we can never be sure we've controlled for all of the possible confounding variables", I agree with the general principle, but I suppose that after controlling for most things (though not all things) I would be willing to say, on the balance of probabilities, that we had a probably causal relationship. That is, I would give it some weight in a Bayesian sense, and would want some convincing that the other, unspecified, possible confounders were likely to have an effect. I would not want it to degenerate into "it may be something else, anything else" without more solid evidence as to what the confounder actually is, how it operates, and why it might operate in this particular case.

    1. I agree with you, but my point is that the challenge is not precise enough to deserve more than a pint of beer. The challenge should be more clearly defined as: "find me just one single instance of a causal relationship where there is no correlation, after we've controlled for most of the possible confounding variables". But this leaves too much subjectivity as to when we think we've included most of the relevant variables. Half a pint to me for showing that the challenge cannot have a winner (of a bottle of wine)?

  10. Who said "Correlation is not causation, but it's the way to bet"?
