Psychological comments: ORIGINAL PAPER: "A response to Prof Rabbitt – The Victorians were still cleverer than us" by Woodley, te Nijenhuis and Murphy

Sunday, 19 May 2013


A response to Prof Rabbitt – The Victorians were still cleverer than us

By Michael Woodley, Jan te Nijenhuis, and Raegan Murphy

Professor Rabbitt has responded to our interpretation of the secular trend in simple reaction time speeds first detected by Silverman (2010) and validated by us (Woodley, te Nijenhuis & Murphy, 2013). We would like to thank Professor Rabbitt for his interest in our work and for being one of the first to contribute substantially to the scientific discussion that our paper started. Rabbitt makes several interesting criticisms; here, however, we will show that they do not constitute sufficient grounds to reject the reality of the secular slowing of simple reaction time.

Firstly, Rabbitt argues that instrumentation designed to measure simple reaction time was historically quite inaccurate, especially in the pre-1970s era, when he argues that the inaccuracy was on the order of 100 or so ms. Rabbitt then states, paradoxically, that a reading of 200 ms might therefore fall anywhere between 200 and 299 ms, which assumes a bias of 99 rather than 100 ms, and also that the instrumentation would consistently ‘round down’ reaction time estimates. In actuality, an inaccuracy of 100 or so ms would yield an error of about 50 ms either way, assuming that the error was normally distributed and that there was no tendency for it to be skewed in one direction rather than the other. Rabbitt provides no evidence for such a tendency towards rounding down; he merely states it as a fact, apparently based on personal experience with pre- and post-1970s instrumentation.

Secondly, Rabbitt argues that method variance across studies employing different instrumentation makes direct comparison of their means problematic. He illustrates this with reference to the use of warning signals; to the intensities, durations and rise-times of signals from different light sources (bulbs, fluorescent tubes, LEDs, computer monitors, etc.); and to response keys that might have been ‘sticky’ to differing degrees across apparatus.

Thirdly, Rabbitt argues that the presence of only two data points from the Victorian era in our studies means that we can “… leave aside an important question whether there is any sound evidence that creativity and intellectual achievements have declined since the Great Victorian Flowering”.
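The first point above turns on which error model is assumed. The sketch below contrasts Rabbitt's ‘round down’ reading (a 200 ms reading means a true value from 200 ms up to just under 300 ms) with the symmetric ±50 ms interval assumed in this response. It only illustrates the two interval definitions; it does not model any actual instrument, and the function names are hypothetical.

```python
# Hypothetical illustration: the interval of possible true values implied by
# one reading under two error models, for an instrument with 100 ms resolution.


def truncation_bounds(reading_ms: float, resolution_ms: float) -> tuple[float, float]:
    """'Round down' model: true value lies in the half-open interval [reading, reading + resolution)."""
    return (reading_ms, reading_ms + resolution_ms)


def symmetric_bounds(reading_ms: float, resolution_ms: float) -> tuple[float, float]:
    """Symmetric model: true value lies within +/- resolution/2 of the reading."""
    half = resolution_ms / 2
    return (reading_ms - half, reading_ms + half)


if __name__ == "__main__":
    reading, resolution = 200.0, 100.0
    print("round-down model:", truncation_bounds(reading, resolution))
    print("symmetric model: ", symmetric_bounds(reading, resolution))
```

Under the round-down model every reading understates the true value; under the symmetric model the same 100 ms of imprecision is split evenly, which is the assumption behind the ±50 ms ranges used in this response.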

In addressing the first of Rabbitt’s claims, we are skeptical about the suggested level of inaccuracy in pre-1970s instrumentation (such as Galton’s apparatus and the electro-mechanical Hipp chronoscope). True millisecond resolution had been achieved far earlier than Rabbitt claims, namely in 1908 (Haupt, 2001), and instruments before that were typically accurate to at least a hundredth of a second. It is not obvious why decent resolution (perhaps on the order of a hundredth of a second) would have been beyond the grasp of someone of Galton’s mental stature and notoriously obsessive attention to detail (Rose & Rose, 2011). His apparatus, described in an 1889 paper (Galton, 1889), employed a half-second pendulum whose swing duration could be estimated using very basic mathematics. The pendulum was released at the same moment as a white paper disk, which served as the stimulus, was concealed; depressing a key caught the pendulum, registering the reaction-time score. Similarly, the much more sophisticated Hipp chronoscope, with its electro-mechanical clutch-based mechanism, was capable of true millisecond resolution (Haupt, 2001). In any case, the issue of true millisecond resolution is rendered moot by the fact that we are dealing with the means of large numbers of individuals measured by Galton and others in multi-trial experiments. Resolutions of hundredths of a second would seem to suffice in such samples (Haupt, 2001).

These observations aside, there is a far more substantive problem with Rabbitt’s primary claim: even assuming a normally distributed 100 ms level of inaccuracy, most pre-1970 studies still yield upper-bound means for simple reaction time that are shorter than the sample-size-weighted ‘true millisecond resolution’ mean of post-1970 studies.

Table 1

Reaction time means for five pre-1970 studies used in Woodley et al. (2013), with error ranges assuming up to 100 ms of measurement imprecision (±50 ms)

Study	Reported mean (combined and N-weighted for the sexes where available)	Error range (±50 ms)
Galton (1890s)	184.3 ms	134.3-234.3 ms
Thompson (1903)	208 ms	158-258 ms
Seashore et al. (1941)	197 ms	147-247 ms
Seashore et al. (1941)	203 ms	153-253 ms
Forbes (1945)	286 ms	236-336 ms

Weighted mean of post-1970 studies = 264.1 ms
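The error ranges in Table 1, and the count of studies whose upper bound still falls below the post-1970 mean, can be reproduced with a few lines of Python. This is a sketch under the text's own assumption of a symmetric ±50 ms error; the names `TABLE_1`, `error_range` and `count_upper_below` are hypothetical.

```python
# Reported means from Table 1 (ms). The +/-50 ms half-width reflects the
# text's assumption of ~100 ms imprecision split evenly either way.
TABLE_1 = [
    ("Galton (1890s)", 184.3),
    ("Thompson (1903)", 208.0),
    ("Seashore et al. (1941)", 197.0),
    ("Seashore et al. (1941)", 203.0),
    ("Forbes (1945)", 286.0),
]
POST_1970_WEIGHTED_MEAN = 264.1  # ms, as reported beneath Table 1
HALF_WIDTH = 50.0  # ms


def error_range(mean_ms: float, half_width: float = HALF_WIDTH) -> tuple[float, float]:
    """Return (lower, upper) bounds for a reported mean under symmetric error."""
    return (mean_ms - half_width, mean_ms + half_width)


def count_upper_below(rows, reference: float) -> int:
    """Count studies whose upper bound is still shorter than the reference mean."""
    return sum(1 for _, mean in rows if error_range(mean)[1] < reference)


if __name__ == "__main__":
    for study, mean in TABLE_1:
        lower, upper = error_range(mean)
        flag = "below" if upper < POST_1970_WEIGHTED_MEAN else "above"
        print(f"{study:<24}{mean:6.1f} ms  range {lower:6.1f}-{upper:6.1f} ms  upper bound {flag} reference")
    print(count_upper_below(TABLE_1, POST_1970_WEIGHTED_MEAN), "of", len(TABLE_1), "upper bounds below 264.1 ms")
```

Only the Forbes (1945) upper bound (336 ms) exceeds the 264.1 ms reference, matching the four-of-five result discussed in the text.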

Based on Table 1, and assuming a normally distributed 100 ms inaccuracy, the upper-bound estimate falls below the post-1970 ‘true millisecond resolution’ mean in four out of five cases (the exception being Forbes, 1945). The probability of this being a chance result is easily calculated. Let us assume a 50% chance that an instrument would produce a mean whose upper-bound estimate falls above the post-1970 mean. The probability of four studies all producing means whose upper bounds are lower is then 0.5 × 0.5 × 0.5 × 0.5 = 6.25%. In other words, the probability that this is a chance finding is small. If we add the systematic review of Ladd and Woodworth (1911), which found a mean of 192 ms for 19th- and early 20th-century samples, and whose hypothetical upper bound (242 ms) also falls below the weighted post-1970 mean, the probability of this being a chance finding falls to 0.5 raised to the fifth power, or 3.125%.
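The probabilities quoted above follow from multiplying independent 50% chances. A minimal sketch reproducing that arithmetic, under the text's stated assumption that each study independently has a 50% chance of falling below the post-1970 mean (the constant and function names are hypothetical):

```python
from fractions import Fraction

# Assumption taken from the text: each study independently has a 50% chance
# of yielding an upper bound below the post-1970 weighted mean.
P_BELOW = Fraction(1, 2)
POST_1970_MEAN_MS = 264.1
LADD_WOODWORTH_UPPER_MS = 192 + 50  # reported 192 ms mean plus the 50 ms half-width


def joint_probability(n_studies: int, p: Fraction = P_BELOW) -> Fraction:
    """Probability that n independent studies ALL fall below the reference mean."""
    return p ** n_studies


if __name__ == "__main__":
    # 242 ms < 264.1 ms, so Ladd and Woodworth (1911) counts as a fifth study.
    assert LADD_WOODWORTH_UPPER_MS < POST_1970_MEAN_MS
    for n in (4, 5):  # four Table 1 studies, then adding Ladd and Woodworth (1911)
        p = joint_probability(n)
        print(f"{n} studies: {p} = {float(p):.4%}")
```

Using `Fraction` keeps the results exact (1/16 and 1/32), so the figures 6.25% and 3.125% come out without floating-point rounding.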

Secondly, and again assuming high inaccuracy, why are the results of the pre-1970 studies more likely to be overestimates than underestimates of the true values? Consider the sources of bias that Rabbitt describes. Sticky keys might require more force to register a response. This was more likely to have been a problem in earlier studies employing cruder instruments, such as mechanical or hybrid electro-mechanical apparatus, than in computer-based ones. For earlier studies, then, the bias would have run in the opposite direction to that described by Rabbitt: sticky keys would necessarily lengthen rather than shorten reaction time estimates.

Visual signals that are long in duration, intense and quick to rise typically elicit faster (or maximal) reaction times (Kosinski, 2012). Galton’s apparatus used a purely mechanical signal in the form of a paper disk, which could be made to disappear by operating levers, prompting the subject to depress a key and halt the swing of a half-second pendulum. The signal duration was therefore indefinite, persisting until the apparatus was reset. Nor is it easy to dispute the high visibility of such a signal, assuming a well-lit laboratory. Subsequent studies employing the Hipp chronoscope, such as Thompson (1903) and the studies described in Ladd and Woodworth (1911), would have used light sources. Thompson (1903), for example, used a Geissler tube suspended against a black background, which reportedly produced a “flash of pale purple light” that was “thrown out sharply” (p. 8). Geissler tubes are plasma-discharge or fluorescence-based illumination sources. Fluorescent light sources exhibit extremely rapid rise-times compared with filament-based incandescent bulbs (Sivak, Flannagan, Sato, Traube & Aoki, 1993).

Whilst signal duration in these early studies using light sources as stimuli is indeed problematic, the departure from optimal conditions is towards shorter signals (i.e. brief flashes), which would lengthen rather than shorten reaction time estimates. It is long-duration visual signals that permit accurate recovery of the fastest (maximal-speed) reaction times (Kosinski, 2012). Once again, any measurement error in these earlier instruments would tend to skew the estimates towards longer rather than shorter latencies.

What of the issue of warning signals? As Silverman (2010, p. 41) reports, there is very little evidence that warning signals actually affect recorded reaction time latencies, especially when the ensuing stimulus is unpredictable, as was the case in all studies used in our and Silverman’s analyses. It is unlikely that Galton used a warning signal in his one-trial-per-person study. Thompson (1903), however, did use an auditory warning signal in her study, which involved multiple trials per person. The difference in means between the two studies is extremely small (23.7 ms, using the means in Table 1), and in the opposite direction to that predicted if warning signals shortened reaction times. This strengthens Silverman’s conclusion that warning signals make little difference.

We agree with Rabbitt, and with Jensen (2011), that method variance can be a substantial problem when comparing different studies, especially those using different instrumentation. However, Rabbitt seems to have missed the point of the meta-analytic nature of our own and Silverman’s studies. Indeed, Silverman (2010) explicitly set out to address method variance using a stringent set of seven inclusion rules (p. 41) coupled with a detailed meta-analytic search. The rules were chosen so that every study in the comparison set would match Galton’s study on as many dimensions as possible. The stringency of these rules substantially reduces method variance across studies; the trade-off, however, is that the number of usable studies is also massively reduced. Our meta-regression ultimately demonstrates the power of a properly conducted meta-analysis in this regard, as we found no significant role for moderators in explaining the secular trend towards slower simple reaction times. There is scatter around the regression line, but that is exactly what meta-analytic theory predicts: all data points lying on or very close to the regression line would be an extremely unlikely outcome for a meta-analysis (see Hunter & Schmidt, 2004).

Finally, what of sound evidence for the greater accomplishments of 19th-century Western populations relative to contemporary ones? This important issue has been addressed quantitatively using historiometry: the historical study of human progress or individual personal characteristics, using statistics to analyze references to geniuses, their statements, behavior and discoveries in relatively neutral texts (Simonton, 1984). Historiometric research into innovation rates and into the lives and accomplishments of eminent individuals (geniuses) has shown that the per capita rate (i.e. events per billion of the population per year) of significant innovations, and also of geniuses in science and technology, peaked in the late 19th century after a long period of increase, and declined throughout the 20th century (Huebner, 2005; Murray, 2003).
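The per capita rate used in this literature normalizes an event count by both population size and time. A minimal sketch of that normalization; the function name and the inputs in the usage example are hypothetical and are not figures from Huebner (2005) or Murray (2003):

```python
def per_capita_rate(events: int, population: float, years: float) -> float:
    """Events per billion people per year.

    population is the (average) population over the period, in persons;
    years is the length of the period.
    """
    if population <= 0 or years <= 0:
        raise ValueError("population and years must be positive")
    return events / (population / 1e9) / years


if __name__ == "__main__":
    # Made-up inputs for illustration only: 30 significant innovations
    # over 10 years in an average population of 1.5 billion.
    print(per_capita_rate(30, 1.5e9, 10))
```

Dividing by population as well as by time is what allows a period with a small population and few events to register a higher rate than a later period with more events but a much larger population.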

What is a significant innovation? It is simply one that is conspicuously different from anything that came before – so much so that multiple encyclopedists and compilers of inventories of innovation are likely to independently note it. Examples include the development of the plough, the steam engine, splitting the atom and putting a man on the moon. By contrast, the iPhone 5 is not a significant innovation relative to its earlier incarnations, and is unlikely to be considered one by contemporary historians of science and technology. Similarly geniuses can be rated via the degree to which these same sources reference them. A ‘convergence’ criterion based on prominence across encyclopedias not only allows us to reasonably quantify the frequencies of significant innovations and geniuses throughout the history of civilization, but also allows us to rank those innovations and individuals by importance. This historiometric technique, like many extremely useful ideas, has its origins in the writings of Galton (1869).
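The convergence criterion described above can be illustrated by counting how many sources mention each item. This is a deliberately simplified sketch with hypothetical source lists and a hypothetical function name; it is not the scoring procedure actually used by Murray (2003) or Huebner (2005), which involve further steps not described here.

```python
from collections import Counter


def convergence_ranking(sources: dict[str, set[str]], min_sources: int) -> list[tuple[str, int]]:
    """Rank items by how many sources mention them.

    sources maps a source name to the set of items it lists; using sets means
    each source counts an item at most once. Items mentioned by fewer than
    min_sources sources are dropped.
    """
    counts = Counter(item for items in sources.values() for item in items)
    kept = [(item, n) for item, n in counts.items() if n >= min_sources]
    # Sort by descending mention count, then alphabetically for stable ties.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical source lists, for illustration only.
    sources = {
        "encyclopedia_A": {"steam engine", "plough", "iPhone 5"},
        "encyclopedia_B": {"steam engine", "plough", "moon landing"},
        "inventory_C": {"steam engine", "moon landing", "splitting the atom"},
    }
    print(convergence_ranking(sources, min_sources=2))
```

The threshold filters out items noted by only one compiler, and the mention count then serves as a crude ranking of importance.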

In conclusion, whilst Rabbitt’s criticisms are interesting, they are clearly insufficient grounds for rejecting the central claim of our paper: that the secular trend towards increasing simple reaction time latency is robust and translates into a decline of 1.23 IQ points per decade, or 14.1 points since Victorian times.
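The two figures in the conclusion are linked by a simple division. As a consistency check only (not a reproduction of the meta-regression), the per-decade and total declines imply the time span below; the variable names are hypothetical.

```python
# Figures as stated in the conclusion above.
LOSS_PER_DECADE = 1.23       # IQ points per decade
LOSS_SINCE_VICTORIAN = 14.1  # IQ points since Victorian times

# Dividing the total loss by the loss per decade gives the implied time span.
decades = LOSS_SINCE_VICTORIAN / LOSS_PER_DECADE
print(f"implied span: {decades:.2f} decades, about {decades * 10:.0f} years")
```

A span of roughly 115 years is broadly consistent with a baseline in Galton's late-19th-century data and an endpoint in the early 21st century, although the exact endpoints used in the paper are not stated here.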

References

Forbes, G. (1945). The effect of certain variables on visual and auditory reaction times. Journal of Experimental Psychology, 35, 153–162.

Galton, F. (1869). Hereditary genius. London, UK: Macmillan Everyman's Library.

Galton, F. (1889). An instrument for measuring reaction time. Report of the British Association for the Advancement of Science, 59, 784–785.

Haupt, E. J. (2001). Laboratories for experimental psychology: Göttingen’s ascendancy over Leipzig in the 1890s. In R. W. Rieber & D. K. Robinson (Eds.), Wilhelm Wundt in history: The making of a scientific psychology (pp. 205–250). New York, NY: Kluwer Academic.

Huebner, J. (2005). A possible declining trend for worldwide innovation. Technological Forecasting and Social Change, 72, 980–986.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Thousand Oaks, CA: Sage.

Jensen, A. R. (2011). The theory of intelligence and its measurement. Intelligence, 39, 171–177.

Kosinski, R. J. (2012). A literature review on reaction time. http://biae.clemson.edu/bpc/bp/lab/110/reaction.htm

Ladd, G. T., & Woodworth, R. S. (1911). Physiological psychology. New York, NY: Scribner.

Murray, C. (2003). Human accomplishment: The pursuit of excellence in the arts and sciences, 800 BC to 1950. New York, NY: Harper Collins.

Rose, H., & Rose, S. (2011). The legacies of Francis Galton. The Lancet, 377, 1397.

Seashore, R. H., Starmann, R., Kendall, W. E., & Helmick, J. S. (1941). Group factors in simple and discrimination reaction times. Journal of Experimental Psychology, 29, 346–394.

Silverman, I. W. (2010). Simple reaction time: It is not what it used to be. The American Journal of Psychology, 123, 39–50.

Simonton, D. K. (1984). Genius, creativity and leadership: Historiometric inquiries. Cambridge, MA: Harvard University Press.

Sivak, M., Flannagan, M. J., Sato, T., Traube, E. C., & Aoki, M. (1993). Reaction times to neon, LED, and fast incandescent brake lamps. The University of Michigan Transportation Research Institute, Report No. UMTRI-93-37.

Thompson, H. B. (1903). The mental traits of sex. An experimental investigation of the normal mind in men and women. Chicago, IL: The University of Chicago Press.

Woodley, M. A., te Nijenhuis, J., & Murphy, R. (2013). Were the Victorians cleverer than us? The decline in general intelligence estimated from a meta-analysis of the slowing of simple reaction time. Intelligence. doi:10.1016/j.intell.2013.04.006

14 comments:

B.B., 19 May 2013 at 16:22
Where can I read Professor Rabbitt's article?

B.B.
Unknown, 19 May 2013 at 20:07
http://deevybee.blogspot.co.uk/
It should be the first post on the list, with a good photo of Galton's lab
Elijah Armstrong, 19 May 2013 at 21:55
As I note in my response to Woodley et al. (pending peer-review in PAID), -1.23 points per decade is extraordinarily high. As Herrnstein and Murray pointed out, 3 IQ points worth of decline means a 42% decline in people with 130+ IQs. Hence in the last 30 years, if Woodley et al. are right, the number of people who have IQs of over 130 (by 1980s standards) has about halved! I don't think this is possible, quite frankly.
Anne, 20 May 2013 at 09:37
Great Britain experienced a huge outflow of people emigrating to North America and Australia once steamships were running reliably. I wonder if poor but bright people were more likely to leave.
Jan te Nijenhuis, 20 May 2013 at 10:34
@Elijah Armstrong,
In Victorian times there was an explosion of creativity, but the number of people who went to university or studied to become an engineer was very small. Research budgets were minuscule. People worked long hours to be able to put food on the table and there was little time left for discoveries.
However, in our time most people with the IQ to go to university also do. Western countries are astonishingly rich: there are billions of dollars and euros for research. There is much more spare time to spend on your hobbies and fascinations. So, shouldn't we have dozens more per capita big inventions than the Victorians?
Jan te Nijenhuis, 20 May 2013 at 10:37
@Frau Katze [great name!],
Indeed, many British emigrated to North America, Canada, South Africa, New Zealand, and Australia. However, if you study the IQ scores of the people in the colonies you will see that their scores are very similar to the scores in the UK. This suggests that over a longer period of time the immigrants were representative of British society.
dearieme, 21 May 2013 at 16:02
" ... a significant innovation ... is simply one ... that multiple encyclopedists and compilers of inventories of innovation are likely to independently note [it]."

Since encyclopaedists are forever pillaging each others' work the chances of "independently" occurring are slight.

Anyhow, the statement is a mere assertion that might be better phrased as "I haven't thought of a better way to measure this so I'm going to claim that this is the best, perhaps the only, way to do it." But how can you test it? If you can't the claim is unscientific.

"Similarly geniuses can be rated via the degree to which these same sources reference them." Come now. You'd probably be as well using the Einstein's Wall method. On Einstein's wall hung portraits of Newton, Faraday and Clerk Maxwell, so the four greatest physicists are those three plus Albert himself: this too is a windy assertion but at least has the advantage of avoiding the bogus scientism of the count-the-references-in-the biggest-books-in-the-library technique
deevybee, 24 May 2013 at 17:30
Prof Rabbitt has posted a reply as a postscript to the original post. You can find it if you scroll down on this post http://bit.ly/10XBWKK
Flint, 24 May 2013 at 18:14
Keep in mind... Woodley may be better known for his studies of Sea Serpent Taxonomy.

http://www.cryptomundo.com/cryptozoo-news/crypto-pinnipeds/
http://www.forteantimes.com/reviews/books/3169/in_the_wake_of_bernard_heuvelmans.html
http://publicationslist.org/M.A.Woodley
http://scholar.google.co.uk/citations?user=mmoY0-kAAAAJ&hl=en
http://www.youtube.com/watch?v=FbPb-etyiG0#t=2m18s

When he isn't working on taxonomies of Sea Serpents for Crypto-zoology; he is trying to argue that race among humans constitute subspecies with impact on intelligence, democratization and GDP.

"M A Woodley (2010) Is Homo sapiens polytypic? Human taxonomic diversity and its implications Medical Hypotheses 74: 1. 195-201
Abstract: The term race is a traditional synonym for subspecies, however it is frequently asserted that Homo sapiens is monotypic and that what are termed races are nothing more than biological illusions. In this manuscript a case is made for the hypothesis that H.sapiens is polytypic, and in this way is no different from other species exhibiting similar levels of genetic and morphological diversity. First it is demonstrated that the four major definitions of race/subspecies can be shown to be synonymous within the context of the framework of race as a correlation structure of traits. Next the issue of taxonomic classification is considered where it is demonstrated that H.sapiens possesses high levels morphological diversity, genetic heterozygosity and differentiation (FST) compared to many species that are acknowledged to be polytypic with respect to subspecies. Racial variation is then evaluated in light of the phylogenetic species concept, where it is suggested that the least inclusive monophyletic units exist below the level of species within H.sapiens indicating the existence of a number of potential human phylogenetic species; and the biological species concept, where it is determined that racial variation is too small to represent differentiation at the level of biological species. Finally the implications of this are discussed in the context of anthropology where an accurate picture of the sequence and timing of events during the evolution of human taxa are required for a complete picture of human evolution, and medicine, where a greater appreciation of the role played by human taxonomic differences in disease susceptibility and treatment responsiveness will save lives in the future."
