Friday, 13 June 2014

Are Scottish reaction times slowing up?

We have spoken about the Scots many times, a Northern tribe whose exploits I have championed. Now grim news comes in from Woodley, Madison and Charlton suggesting that, for women at least, Scottish reaction times over the last 40 years have been slowing, equivalent to a g-equivalent decline of -7.2 IQ points, or -1.8 points per decade. This cannot be due to any problems with Victorian pendulums, since the measures were taken with modern instruments. The rate of decline is steeper than the -1.2 points per decade derived from both sexes since Victorian times. Men, with their slower maturation, do not show such an effect.

Michael A. Woodley, Guy Madison, Bruce G. Charlton. Possible Dysgenic Trends in Simple Visual Reaction Time Performance in the Scottish Twenty-07 Cohort: A Reanalysis of Deary & Der (2005). Mankind Quarterly. In press.

In a 2005 publication, Deary and Der presented data on both longitudinal and cross-sectional aging effects for a variety of reaction time measures among a large sample of the Scottish population. These data are reanalyzed in order to look for secular trends in mean simple reaction time performance. By extrapolating longitudinal aging effects from within each cohort across the entire age span via curve fitting, it is possible to predict the reaction time performance at the start age of the next oldest cohort. The difference between the observed performance and the predicted one tells us whether older cohorts are slower than younger ones when age matched, or vice versa. Our analyses indicate a significant decline of 36 ms over a 40-year period amongst the female cohort. No trends of any sort were detected amongst the male cohort, possibly due to the well-known male neuro-maturation lag, which will be especially pronounced in the younger cohorts. These findings are tentatively supportive of the existence of secular declines in simple reaction time performance, perhaps consistent with a dysgenic effect. On the basis of validity generalization involving the female reaction time decline, the g equivalent decline was estimated at -7.2 IQ points, or -1.8 points per decade.
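The cohort-extrapolation logic described in the abstract can be sketched numerically. The following is a minimal illustration with invented numbers (three measurement waves per cohort and a quadratic aging curve), not the authors' code or the Twenty-07 data:

```python
# Minimal sketch of the cohort-extrapolation method described in the
# abstract. All numbers are invented for illustration; they are NOT the
# Twenty-07 data.
import numpy as np

# Longitudinal waves for a hypothetical later-born cohort: mean simple RT
# (ms) measured at several ages within that cohort.
later_born = {"ages": np.array([16.0, 24.0, 29.0]),
              "rt":   np.array([295.0, 283.0, 284.0])}
earlier_born_start_age = 36.0     # start age of the next oldest cohort
earlier_born_observed_rt = 288.0  # its observed mean RT at that start age

# Fit a quadratic aging curve to the later-born cohort's own waves, then
# extrapolate it to the earlier-born cohort's start age.
curve = np.poly1d(np.polyfit(later_born["ages"], later_born["rt"], 2))
predicted_rt = curve(earlier_born_start_age)

# If the later-born cohort is predicted to be slower at age 36 than the
# earlier-born cohort actually was at 36, the gap is the putative secular
# decline (a positive gap means RTs are slowing across cohorts).
secular_gap_ms = predicted_rt - earlier_born_observed_rt
print(f"predicted {predicted_rt:.1f} ms, observed {earlier_born_observed_rt} ms, "
      f"gap {secular_gap_ms:.1f} ms")
```

With these invented waves the extrapolated curve predicts a later-born performance of roughly 296 ms at age 36 against an observed 288 ms, i.e. a positive gap, which is the pattern the paper reports for the female cohorts.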

Get the whole thing here:


  1. Andrew Sabisky, 13 June 2014 at 11:05

    and yet I am reliably informed that choice RTs, which are more highly correlated with IQ, show no such effect. I am also unconvinced by the (most likely entirely post hoc) rationale behind splitting the sample by sex. It is all too easy to produce specious effects by torturing your data until they confess - capitalising on coincidences - and arbitrary splitting of samples is an extremely common way to do this (which is the sort of thing preregistration is supposed to prevent).

  2. @AS - Your criticism of data torturing is silly nonsense!

    This is a paper *replicating* the previous finding of simple reaction time slowing over time:

    Woodley, M.A., J. te Nijenhuis & R. Murphy. 2013. Were the Victorians cleverer than us? The decline in general intelligence estimated from a meta-analysis of the slowing of simple reaction time. Intelligence 41: 843-850.

    Simple reaction times were originally chosen to measure long-term changes in general intelligence because the test is simple and robust and there are data on simple RTs going back to the 1880s - this does not apply to choice RTs.

    So far, the data suggests that simple RTs have slowed a lot and since sRTs correlate with general intelligence, this means that general intelligence has declined significantly.

    If you want to dispute this, then find some decent data showing that sRTs have got faster over the past decades; or that slowing sRTs do not imply reducing intelligence - or some relevant data of some kind!

    At this point in the discussion, micro-methodological quibbling just demonstrates a lack of understanding or a fixed prejudice.

    1. This comment has been removed by the author.

    2. oh for heaven's sake, it is not micro-methodological quibbling to say that a form of RT which correlates MORE CLOSELY with IQ shows NO increase over time in the Exact. Same. Sample. that you and your co-authors are analysing here. Nor is it micro-methodological quibbling to point out the acknowledged finding that the supposed dysgenic effect is not shown amongst males, and to question the rationale behind splitting the sample.

      Does the whole sample, men and women taken together, show the effect? Answer that question, please. Also please answer honestly whether or not you expected a positive result in women but a null finding in men before you analysed the data, because if so I'll eat a whole milliner's shop.

  3. Michael A. Woodley, 13 June 2014 at 15:27

    [First part]

    Firstly, let's look at the issue of choice RT vs. simple RT. Choice RT is much more sensitive to training than is simple RT. Jensen (2006) notes the following:

    "SRT to light, with maximum S–R compatibility shows virtually no practice effect (i.e., improvement) in RT after the first 10 trials. For CRTs, however, the practice effect persists over at least 10,000 trials and the CRT is a decreasing linear function of the logarithm (log10) of the number (N) of trials." (p. 48).

    This means that whilst choice RT may be a better correlate of IQ in within-cohort comparisons than simple RT, it is likely to fail to invariantly measure g between cohorts compared cross-sectionally. This is especially the case given that modern environments contain copious opportunities to train choice RT, such as via driving, video games and other electronic media. We know that practice effect gains are completely hollow (te Nijenhuis et al. 2007). In order to be sensitive to dysgenic effects, an indicator must be strongly measurement invariant with respect to g across cohorts in cross-section. This means that a relatively weaker within cohort measure of g might be more sensitive to cross-sectional trends in g than a relatively stronger one, if the latter flunks measurement invariance. This is the case with the Raven’s, which is a very strong individual-differences measure of g, but clearly measures something entirely different between cohorts (Armstrong & Woodley, 2014; Fox & Mitchum, 2013). As choice RT flunks this requirement (there would be no persistent practice effect otherwise), the trends with respect to this indicator tell us absolutely nothing about what g might be doing. Simple RT on the other hand is ideal for this purpose.

    Secondly, sexes absolutely should NOT be combined in analyses of RT. I learned this the hard way: in response to my original Victorians paper (Woodley et al., 2013), all four critical commentaries pointed out that sexes should not be combined, owing to sex differences in the strength of the simple RT/IQ correlation, the size of the standard deviations, the differences in the means, etc. This is why in the reanalysis of the original data my co-authors and I utilized only the male data. Prof. Charlton predicted in personal communication to me that when attempting to use combined cross-sectional and longitudinal data to tease out secular trends, males would reveal little to no effect. This is on the basis that males are significantly lagged relative to females in terms of neuromaturation (Lenroot et al., 2007). Peak RT relates to neuromaturation, with RT performance rising up until the point at which maturation is achieved (e.g. van Damme & Crombez, 2009). After this point, RT begins to decline owing to ageing effects. Therefore, given a) the fact that males exhibit bigger standard deviations than females for most traits, b) the significant developmental lag, and c) the fact that RT performance is consequently rising for longer in males than in females, we have a solid basis for expecting a whole load of nothing when males of different ages are compared to females using this method. This was all explicitly predicted by Prof. Charlton before we even started looking for data with which to test this new method. The lack of any effect in the males is therefore a confirmatory prediction. Of course the developmental lag won't have any effect when males of the same age (at peak maturation, i.e. 25-30) are compared with one another on a purely cross-sectional basis, as in Woodley et al. (2014).

    1. Michael A. Woodley, 13 June 2014 at 15:28

      [Second part]

      Thirdly, confirming the same effect over and over again using different methods and different traits is not called capitalizing on chance; it is referred to as multitrait-multimethod validation (please read Campbell & Fiske, 1959). Prof. Charlton is right about the inanity of micro-methodological criticism. I suggest that you start trying to differentiate the nomological forest from the micro-methodological trees.


      Armstrong, E. L., & Woodley, M. A. (2014). The rule-dependence model explains the commonalities between the Flynn effect and IQ gains via retesting. Learning and Individual Differences, 29, 41–49.

      Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56,

      Fox, M. C., & Mitchum, A. L. (2013). A knowledge based theory of rising scores on “culture-free” tests. Journal of Experimental Psychology: General, 142, 979–1000.

      Jensen, A. R. (2006). Clocking the mind: mental chronometry and individual differences. Amsterdam: Elsevier.

      Lenroot, R. K., Gogtay, N., Greenstein, D. K. et al. (2007). Sexual dimorphism of brain development trajectories during childhood and adolescence. Neuroimage, 36, 1065-1073.

      te Nijenhuis, J., van Vianen, A. E. M., & van der Flier, H. (2007). Score gains on g loaded tests: No g. Intelligence, 35, 283–300.

      van Damme, S. & Crombez, G. (2009). Measuring attentional bias to threat in children and adolescents: a matter of speed? Journal of Behavior Therapy and Experimental Psychiatry, 40, 344-351.

      Woodley, M.A., te Nijenhuis, J., & Murphy, R. (2014). Is there a dysgenic secular trend towards slowing simple reaction time? Responding to a quartet of critical commentaries. Intelligence, 46, 131-147.

      Woodley, M.A., te Nijenhuis, J., & Murphy, R. (2013). Were the Victorians cleverer than us? A decline in general intelligence estimated from a meta-analysis of the slowing of simple reaction time. Intelligence, 41, 843-850.

    2. Is there any evidence at all to suggest that CRT is an invalid measure of between-cohorts differences? And that driving and video games etc actually do constitute a practice effect for CRT? I might as well argue that the decreased modern participation in competitive sport ought to show up in increased CRT. Are you then arguing that we ought to see a Flynn Effect for CRT? This is all assumption and no substance.

    3. Michael A. Woodley, 13 June 2014 at 18:51

      [First part]

      This is getting bizarre. There is an established literature on the effects of video games in boosting choice RT performance (e.g. Dye et al., 2009; Green et al., 2010). Simply Googling "choice reaction time" and "video games" would have led you to these papers (and more). As for driving - again, we are talking about a large and established literature connecting the two, also easily accessible online. One 2014 source even states the following:

      "Driving is a type of choice reaction test. Drivers must evaluate various stimuli and make different decisions depending on individual circumstances."

      Ergo, the more you drive, the more you practice your choice RT.

      Note that the component of choice RT that is improved by practice is NOT the actual reaction time itself, but decision time. People can be trained to make faster decisions, which has the effect of increasing overall performance on a joint measure of decision and reaction time such as choice RT. By contrast there is virtually no decision time component to simple RT - it taps pure speed, hence it cannot be effectively trained.
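The decomposition argued for above can be made concrete with a toy model. This is purely illustrative: the parameter values and the gain-per-log10-trials figure are invented (loosely echoing the Jensen quotation), not a validated model of reaction time.

```python
import math

# Toy decomposition (invented parameters): any RT = a fixed sensory-motor
# component plus a decision-time component. Practice shrinks only the
# decision component, log-linearly in trials, per the Jensen quotation;
# simple RT has essentially no decision component, so practice leaves it flat.
BASE_MS = 250.0        # hypothetical sensory-motor + movement time
DECISION_MS = 180.0    # hypothetical untrained decision component of choice RT
GAIN_PER_LOG10 = 20.0  # invented training gain per log10(trials)

def choice_rt(practice_trials: int) -> float:
    decision = max(DECISION_MS - GAIN_PER_LOG10 * math.log10(1 + practice_trials), 0.0)
    return BASE_MS + decision

def simple_rt(practice_trials: int) -> float:
    return BASE_MS  # no decision component: practice does not help

# A heavily practised cohort looks faster on choice RT but identical on
# simple RT, which is the claimed reason simple RT is the more
# cross-cohort-invariant indicator.
print(choice_rt(0), choice_rt(10_000))
print(simple_rt(0), simple_rt(10_000))
```

Under this toy model a cohort with heavy choice-RT practice (driving, video games) improves substantially on choice RT while its simple RT is untouched, which is the measurement-invariance point at issue.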

      If choice RTs are improving (and the results of the Deary and Der paper imply, by our method, that this is very likely to be the case), it means that we can strongly infer the increased presence of factors that are improving decision speed. Increased use of electronic media and driving are two solidly plausible candidates (with the latter being more likely than the former to have played a role in this particular cross-section). An alternative possibility is that secular increases in extraversion (Twenge, 2001) could be driving the effect, as more extraverted cohorts will make snappier decisions than ones that are less so. There is literature showing that extraverts exhibit faster decision times than introverts (starting with Shanmugan, 1965; see also Eysenck, 2009). I therefore explicitly and confidently predict that the Flynn effect on choice RT will not be on g, but will instead either be a practice effect, or a consequence of secular trends in personality, or some combination of the two, and will therefore be concentrated on decision time. Thus it will flunk measurement invariance with respect to g in a way that is absolutely typical of Flynn effects on other measures (e.g. Wicherts et al., 2004).

    4. Michael A. Woodley, 13 June 2014 at 18:52

      [Second part]

      I suggest reading Figueredo and Berry (2002) on how telling 'just-not-so' stories does not a compelling rebuttal make. The take-home message is that an adequate counter to the findings presented by my colleagues and me must be able to:

      a) Account for the overall pattern of the results parsimoniously (i.e. by not invoking dozens of different hypotheses to account for what we can account for using one or a small number of complementary hypotheses).

      b) Make new predictions, which if confirmed would lead us to strongly prefer the alternative model.

      Saying that something is 'just not so' is simply a way of protecting some 'hard core' belief that is resistant to being overturned or updated for extra-scientific reasons. Lakatos termed this approach to doing science ‘degenerative’ with good reason.


      Dye, M. W. G., Green, C. S., & Bavelier, D. (2009). Increasing speed of processing with action video games. Current Directions in Psychological Science, 18, 321-326.

      Twenge, J. (2001). Birth cohort changes in extraversion: a cross-temporal meta-analysis, 1966-1993. Personality and Individual Differences, 30, 735-

      Eysenck, H. J. (2009). The biological basis of personality. New Jersey: Transaction Publishers (Third printing).

      Figueredo, A. J., & Berry, S. C. (2002). “Just not so” stories: Exaptations, spandrels, and constraints. Behavioral and Brain Sciences, 25, 517-518.

      Green, C. S., Pouget, A., & Bavelier, D. (2010). Improved probabilistic inference as a general learning mechanism with action video games. Current Biology, 20, 1573-1579.

      (2014). How does aging affect reaction times.

      Shanmugan, T. E. (1965). Personality, severity of conflict and decision time. Journal of the Indian Academy of Applied Psychology, 2, 13-22.

      Wicherts, J. M., Dolan, C. V., Hessen, D. J., Oosterveld, P., van Baal, G. C. M., Boomsma, D. I., & Span, M. M. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32, 509-537.

  4. @James T - BTW some readers might find it easier to understand the preliminary numbskull pen-and-paper analysis I did for my own interest before the Big Statistical Guns of Woodley and Madison were brought to bear on the data:

  5. This comment has been removed by the author.

  6. This exchange is better than Holland 5 Spain 1

    1. Quite.

      Michael, thank you so much for your detailed replies. I appreciate them greatly. Your model of cohort effects for CRT makes very specific predictions which it should be fairly easy to check, so when I've got round to doing that perhaps James would be amenable to publishing the results here?

  7. I think Woodley is correct about genetic intelligence declining; however, in my humble opinion, this article overestimates the decline in the same way his last one did. Both inappropriately double the effect size by dividing it by the (synthetic) g loading of reaction time. I explain why I consider this inappropriate here:
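For readers who want to see the disputed correction in miniature, here is the validity-generalization arithmetic. The RT decline in SD units and the 0.54 g loading are assumed round numbers chosen so the corrected figure lands near the published -7.2 points; they are not values quoted from the paper.

```python
# The step under dispute: divide the observed RT decline (in SD units) by
# the assumed g loading of simple RT, then convert to IQ points.
SD_POINTS = 15.0     # IQ-metric standard deviation
rt_decline_d = 0.26  # assumed 40-year RT decline in SD units (illustrative)
g_loading = 0.54     # assumed (synthetic) g loading of simple RT (illustrative)

corrected = (rt_decline_d / g_loading) * SD_POINTS  # paper-style estimate
uncorrected = rt_decline_d * SD_POINTS              # the commenter's preference

print(f"corrected: {corrected:.1f} points; uncorrected: {uncorrected:.1f} points; "
      f"inflation: {corrected / uncorrected:.2f}x")
```

Dividing by a loading of 0.54 scales the estimate up by 1/0.54, about 1.85x, which is the near-doubling the comment objects to; the disagreement is over whether that scaling is a legitimate correction for imperfect measurement or an inflation.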

  8. Simple googling works for answering that guy and his concerns about cross-sections by sex, huh?

    Eventually you'll figure out that simple reaction time can vary by ~100 ms depending on temperature in healthy adult humans. These effects are known to subsets of the medical community, show up in random literature occasionally even if not read or cited by psychometricians, and likewise are known in some niche professional practices like military training.

    I've not seen anyone anywhere give a single satisfactory account of experimental methodology around any of the Victorian-era or otherwise "old" historical data that alleviates the problem of huge variance due to environmental conditions. I think we understand this is pretty much because they did not follow consistently controlled, blinded, randomized, or well-documented practices - nobody's fault, and nothing anyone can do about it now - but that just makes the data unreliable, with a high likelihood of spurious results. Sure, there is a small chance there is a clever way to correct for this and get something useful. At the least, maybe someone will get around to running a study of their own with a thousand present-day subjects to fully explore the extent of the predicted effects, heteroskedasticity with fully representative samples of the entire general population, and so on.