A response to Prof Rabbitt – The Victorians were still cleverer than us
By Michael Woodley, Jan te Nijenhuis, and Raegan Murphy
Professor Rabbitt has reacted to our interpretation of
the secular trend in simple reaction time speeds first detected by Silverman (2010),
and validated by us (Woodley, te Nijenhuis & Murphy, 2013). We would like
to thank Professor Rabbitt for his interest in our work and for being one of
the first to substantially contribute to the scientific discussion that was
started by our paper. Rabbitt makes several interesting points of criticism;
here we will show, however, that these do not constitute sufficient grounds to
reject the reality of the secular slowing of simple reaction time.
Firstly, Rabbitt argues that the level of inaccuracy
in instrumentation designed to measure simple reaction time was historically
quite high, especially in the pre-1970s era where he argues that it was on the
order of 100 or so ms. Rabbitt then goes on to state paradoxically that a
reading of 200 ms might therefore fall between 200 and 299 ms, which assumes a
bias of 99 rather than 100 ms, and also that the instrumentation would
consistently ‘round down’ reaction time estimates. In actuality a bias of 100
or so ms would yield an average bias of 50 ms either way, assuming that the
error due to bias was normally distributed, and that there was no tendency for
biases to be skewed in one direction rather than in the other. Rabbitt does not
provide any evidence for such a tendency towards rounding down – he merely states
this as a fact apparently based on personal experience with pre- and post-1970s
instrumentation.
Secondly, Rabbitt argues that method variance across
studies employing different instrumentation makes direct mean-wise comparison
of results problematic. He illustrates this via reference to the use of warning
signals along with the signal intensities, durations and rise-times of
different light sources (such as bulbs, fluorescent tubes, LEDs, computer
monitors, etc), and also with respect to response keys that might have been
non-uniformly ‘sticky’ across different apparatus.
Thirdly, Rabbitt argues that the presence of only two
data points from the Victorian era in our studies means that we can “… leave
aside an important question whether there is any sound evidence that creativity
and intellectual achievements have declined since the Great Victorian
Flowering”.
In addressing the first of Rabbitt’s claims, we are
skeptical about the suggested level of inaccuracy in pre-1970s era
instrumentation (such as Galton’s apparatus and the electro-mechanical Hipp
chronoscope). True millisecond resolution in measurement had been achieved far
earlier than Rabbitt claims, namely in 1908 (Haupt, 2001), with instruments
prior to that being typically accurate to at least a hundredth of a second. It
is not obvious why decent resolution (perhaps on the order of a hundredth of a
second) would not have been within the grasp of someone of Galton’s mental
stature and notoriously obsessive attention to detail (Rose & Rose, 2011).
His apparatus was described in an 1889 paper and employed a half-second
pendulum, whose duration could be estimated using very basic mathematics. Its
release occurred concomitantly with the concealing of a white paper disk, which
functioned as the stimulus - depressing a key facilitated its capture,
registering the reaction-time score. Similarly, the much more sophisticated Hipp
chronoscope, with its electro-mechanical clutch-based mechanism, was capable of
true millisecond resolution (Haupt, 2001). The issue of true millisecond
resolution is at any rate rendered moot in light of the fact that we are
dealing with the means of a large number of individuals measured by Galton and
others in multi-trial type experiments. Resolutions of hundredths of a second
would seem to suffice in such samples (Haupt, 2001).
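The point that coarse resolution matters little for group means can be illustrated with a minimal simulation sketch. The sample size, the normal distribution of true reaction times (mean 190 ms, SD 30 ms) and the 10 ms recording resolution are illustrative assumptions only; none of these values are taken from Galton's data.

```python
import random
import statistics


def simulate_mean_recovery(n_subjects=3000, resolution_ms=10, seed=0):
    """Compare the mean of 'true' reaction times with the mean of the same
    values after recording them at a coarse resolution.

    All parameters are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    # Hypothetical true reaction times in milliseconds.
    true_rts = [rng.gauss(190.0, 30.0) for _ in range(n_subjects)]
    # Record each value to the nearest multiple of the instrument resolution.
    recorded = [round(rt / resolution_ms) * resolution_ms for rt in true_rts]
    return statistics.mean(true_rts), statistics.mean(recorded)


true_mean, recorded_mean = simulate_mean_recovery()
print(f"true mean: {true_mean:.2f} ms, recorded mean: {recorded_mean:.2f} ms")
```

Under these assumptions the individual rounding errors are symmetric and largely cancel, so the two means differ by far less than the instrument resolution.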
These observations aside, there is a far more
substantive problem with Rabbitt’s primary claim, namely that, even assuming a
normally distributed 100 ms level of inaccuracy, the preponderance of pre-1970
studies still reveal upper bound means for simple reaction time that are
shorter in duration than the sample size weighted ‘true millisecond resolution’
mean of post-1970 studies.
Table 1. Reaction time means for five pre-1970 studies used in Woodley et al. (2013), along with estimates of error due to sub-100 ms measurement imprecision
Reported mean (combined and N-weighted for the sexes where available) | Error range assuming 50 ms either way
184.3 ms (Galton, 1890s) | 134.3-234.3 ms
208 ms (Thompson, 1903) | 158-258 ms
197 ms (Seashore et al., 1941) | 147-247 ms
203 ms (Seashore et al., 1941) | 153-253 ms
286 ms (Forbes, 1945) | 236-336 ms
Weighted mean of post-1970 studies = 264.1 ms
Based on Table 1, assuming a normally
distributed 100 ms inaccuracy, the upper estimate falls below the post-1970
‘true millisecond resolution’ mean in four out of five cases (the exception
being the study of Forbes, 1945). The cumulative odds of this being a chance
result can easily be calculated. Let us assume a 50% chance that the
instruments would produce a mean value whose upper-bound estimate falls above
that of the post-1970 studies. The odds of four studies producing
means whose upper bounds are all lower are equal to 0.5*0.5*0.5*0.5, or 6.25%. In other
words, the probability that this is a chance finding is small. If we add to
this the systematic review of Ladd and Woodworth (1911), which found a mean
of 192 ms for 19th- and early 20th-century samples, and
whose hypothetical upper mean (242 ms) also falls below the weighted post-1970
mean, the cumulative odds of this being a chance finding fall to 3.125%.
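The arithmetic in this paragraph can be reproduced with a short sketch. The means and the 264.1 ms post-1970 figure come from Table 1; the variable and function names are hypothetical, and the 50% per-study chance is the assumption stated in the text rather than an estimated quantity.

```python
POST_1970_WEIGHTED_MEAN_MS = 264.1  # from Table 1
HALF_WIDTH_MS = 50.0                # 100 ms inaccuracy, 50 ms either way

# Reported pre-1970 means from Table 1 (ms).
pre_1970_means_ms = {
    "Galton (1890s)": 184.3,
    "Thompson (1903)": 208.0,
    "Seashore et al. (1941), first mean": 197.0,
    "Seashore et al. (1941), second mean": 203.0,
    "Forbes (1945)": 286.0,
}


def upper_bound(mean_ms, half_width_ms=HALF_WIDTH_MS):
    """Upper end of the error range around a reported mean."""
    return mean_ms + half_width_ms


def chance_probability(n_studies, p_single=0.5):
    """Probability that n independent studies all fall below the post-1970
    mean, assuming each does so with probability p_single."""
    return p_single ** n_studies


# Studies whose upper-bound estimate is still below the post-1970 mean.
below = [
    study
    for study, mean_ms in pre_1970_means_ms.items()
    if upper_bound(mean_ms) < POST_1970_WEIGHTED_MEAN_MS
]

print(f"{len(below)} of {len(pre_1970_means_ms)} upper bounds fall below the post-1970 mean")
print(f"four studies: {chance_probability(4):.4%}")
print(f"adding Ladd and Woodworth (upper bound {upper_bound(192.0)} ms): {chance_probability(5):.4%}")
```

The sketch treats the studies as independent, which is the same simplification the calculation in the text makes.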
Secondly, and again assuming high inaccuracy, why are
the results of the pre-1970 studies likely to be overestimates rather than
underestimates of the true values? Let’s look at the sources of bias that
Rabbitt describes. Sticky keys might require more force in order to register
a result. This was more likely to have been a problem in the case of earlier
studies employing cruder instruments, such as mechanical or hybrid
electro-mechanical apparatuses, rather than computer-based ones, for example.
This suggests that the bias would have been in the opposite direction for
earlier studies to that described by Rabbitt. Sticky keys would necessarily
lengthen rather than shorten reaction time estimates. Long-duration visual
signals, and also ones that are more intense and exhibit rapid rise-times,
typically elicit faster (or maximal) reaction times (Kosinski, 2012). Galton’s
apparatus used a purely mechanical signal in the form of a paper disk, which
could be made to disappear via the operation of levers, thus triggering the
subject to depress a key and halt the swing of a half-second pendulum. The
signal duration was therefore indefinite – persisting until the point at which
the apparatus would be reset. It is hard to argue against the high visibility
of such a signal either, assuming a well-lit laboratory. Subsequent studies
employing the Hipp chronoscope such as Thompson (1903) and the studies
described in Ladd and Woodworth (1911) would have employed light sources.
Thompson (1903) for example employed a Geissler tube suspended against a black
background which was reported as producing a “flash of pale purple light” that
was “thrown out sharply” (p. 8). Geissler tubes are plasma-discharge or
fluorescence-based illumination sources. Fluorescent light sources exhibit
extremely rapid rise-times compared to filament-based incandescent bulbs, for
example (Sivak, Flannagan, Sato, Traube & Aoki, 1993).
Whilst the issue of signal duration in these early
studies employing light sources as stimuli is indeed problematic, the
suboptimal tendency is towards shorter duration signals (i.e. brief flashes),
which would lengthen rather than shorten reaction time estimates. It is
long-duration visual signals that permit the recovery of accurate maximal
reaction time latencies (Kosinski, 2012). Once again, any measurement error in
these earlier instruments would tend to skew the estimates towards higher rather
than lower latencies.
What of the issue of warning signals? As Silverman
(2010, p. 41) reports, there is very little evidence that warning signals
actually make a difference to recorded reaction time latencies, especially when
the ensuing stimulus is unpredictable, as was the case in all studies employed
in our and Silverman’s analyses. It is unlikely that Galton utilized a warning
system in his single person-single trial study. Thompson (1903), however, did
use an audio warning system in her study involving multiple trials per person.
The difference in the means between the two studies is extremely small (18.7
ms), and in the opposite direction to that predicted by the theory that the
presence of a warning signal reduces
the latency of reaction time means. This strengthens Silverman’s conclusion
that employing warning signals makes little difference.
We agree with Rabbitt, and also Jensen (2011), who
both argue that method variance between studies can be a substantial problem
when it comes to comparing between different studies, especially those using
different instrumentation. However, Rabbitt seems to have missed the point of
the meta-analytic nature of our own and Silverman’s study. Indeed, the study of
Silverman (2010) set out to explicitly address the issue of method variance
using a stringent set of seven inclusion rules (p. 41) coupled with a detailed
meta-analytic search. The rules were selected on the basis that all studies
included in the comparison set should be as closely matched with respect to
Galton’s study on as many dimensions as possible. The stringency of these rules
means that method variance across studies is substantially reduced; however, the
trade-off is that the number of potentially usable studies is also massively
reduced. Our meta-regression ultimately demonstrates the power of a properly
conducted meta-analysis in this regard as we found no significant role for
moderators in explaining the secular trend towards increasingly slow simple
reaction time performance. There is scatter around the regression line, but
that is exactly what meta-analytical theory predicts. All data points being on
or very close to the regression line is an extremely unlikely outcome for a
meta-analysis (see Hunter & Schmidt, 2004).
Finally, what of the issue of sound evidence for the
greater accomplishments of 19th-century Western populations relative
to contemporary ones? This is an important issue that has been addressed
quantitatively using historiometry, which is the historical study of human
progress or individual personal characteristics, using statistics to analyze
references to geniuses, their statements, behavior and discoveries in
relatively neutral texts
(Simonton, 1984). Historiometric research into innovation rates and the
lives and accomplishments of eminent individuals (geniuses) has shown that the
per capita rate (i.e. events per billion of the population per year) of
significant innovation and also geniuses in science and technology peaked in
the late 19th century, after a long period of increase. Throughout
the 20th century there was a decline (Huebner, 2005; Murray, 2003).
What is a significant innovation? It is simply one
that is conspicuously different from anything that came before – so much so
that multiple encyclopedists and compilers of inventories of innovation are
likely to independently note it. Examples include the development of the
plough, the steam engine, splitting the atom and putting a man on the moon. By
contrast, the iPhone 5 is not a significant innovation in comparison with its
earlier incarnations, and is unlikely to be considered as such by
contemporary historians of science and technology. Similarly, geniuses can be
rated via the degree to which these same sources reference them. The use of a
‘convergence’ criterion based on prominence across encyclopedias not only
allows us to reasonably quantify the frequencies of significant innovation and
geniuses throughout the history of civilization, but it also allows us to rank
those same innovations and individuals in terms of importance. This
historiometric technique, like many extremely useful ideas, has its origins in
the writings of Galton (1869).
In conclusion, whilst Rabbitt’s criticisms are
interesting, they are clearly insufficient grounds for rejecting the central
claims made in our paper – namely that the secular trend in increasing simple
reaction time latency is robust and translates into a decline of 1.23 IQ
points per decade, or 14.1 points since Victorian times.
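The two figures in the conclusion are linked by simple arithmetic, which the following sketch makes explicit. It assumes a constant (linear) rate of decline over the whole period; the implied time span is derived from the two quoted figures and is not a value stated in the text.

```python
DECLINE_PER_DECADE = 1.23  # IQ points per decade, as quoted above
TOTAL_DECLINE = 14.1       # IQ points since Victorian times, as quoted above

# Time span implied by the two figures under a constant rate of decline.
implied_decades = TOTAL_DECLINE / DECLINE_PER_DECADE
implied_years = implied_decades * 10

print(f"implied span: {implied_decades:.2f} decades (about {implied_years:.0f} years)")
```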
References
Forbes, G. (1945). The effect of certain variables on visual and auditory reaction times. Journal of Experimental Psychology, 35, 153–162.
Galton, F. (1869). Hereditary genius. London, UK: Macmillan Everyman's Library.
Galton, F. (1889). An instrument for measuring reaction time. Report of the British Association for the Advancement of Science, 59, 784–785.
Haupt, E. J. (2001). Laboratories for experimental psychology: Göttingen’s ascendancy over Leipzig in the 1890s. In R. W. Rieber & D. K. Robinson (Eds.), Wilhelm Wundt in history: The making of a scientific psychology (pp. 205–250). New York, NY: Kluwer Academic.
Huebner, J. (2005). A possible declining trend for worldwide innovation. Technological Forecasting and Social Change, 72, 980–986.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Thousand Oaks, CA: Sage.
Jensen, A. R. (2011). The theory of intelligence and its measurement. Intelligence, 39, 171–177.
Kosinski, R. J. (2012). A literature review on reaction time. http://biae.clemson.edu/bpc/bp/lab/110/reaction.htm
Ladd, G. T., & Woodworth, R. S. (1911). Physiological psychology. New York, NY: Scribner.
Murray, C. (2003). Human accomplishment: The pursuit of excellence in the arts and sciences, 800 BC to 1950. New York, NY: Harper Collins.
Rose, H., & Rose, S. (2011). The legacies of Francis Galton. The Lancet, 377, 1397.
Simonton, D. K. (1984). Genius, creativity and leadership: Historiometric inquiries. Cambridge, MA: Harvard University Press.
Sivak, M., Flannagan, M. J., Sato, T., Traube, E. C., & Aoki, M. (1993). Reaction times to neon, LED, and fast incandescent brake lamps. The University of Michigan Transportation Research Institute, Report Number UMTRI-93-37.
Seashore, R. H., Starmann, R., Kendall, W. E., & Helmick, J. S. (1941). Group factors in simple and discrimination reaction times. Journal of Experimental Psychology, 29, 346–394.
Silverman, I. W. (2010). Simple reaction time: It is not what it used to be. The American Journal of Psychology, 123, 39–50.
Thompson, H. B. (1903). The mental traits of sex. An experimental investigation of the normal mind in men and women. Chicago, IL: The University of Chicago Press.
Woodley, M. A., te Nijenhuis, J., & Murphy, R. (2013). Were the Victorians cleverer than us? The decline in general intelligence estimated from a meta-analysis of the slowing of simple reaction time. Intelligence. doi:10.1016/j.intell.2013.04.006
Where can I read Professor Rabbitt's article?
B.B.
http://deevybee.blogspot.co.uk/
It should be the first post on the list, with a good photo of Galton's lab
As I note in my response to Woodley et al. (pending peer-review in PAID), -1.23 points per decade is extraordinarily high. As Herrnstein and Murray pointed out, 3 IQ points worth of decline means a 42% decline in people with 130+ IQs. Hence in the last 30 years, if Woodley et al. are right, the number of people who have IQs of over 130 (by 1980s standards) has about halved! I don't think this is possible, quite frankly.
Great Britain experienced a huge outflow of people emigrating to North America and Australia once steamships were running reliably. I wonder if poor but bright people were more likely to leave.
People leave to flee a creditor, the policeman or a wife. People leave to find cheap land or less competition. People leave because cousins have already left. It takes all sorts.
@Elijah Armstrong,
In Victorian times there was an explosion of creativity, but the number of people that went to university or studied to become an engineer was very small. Research budgets were minuscule. People worked long hours to be able to put food on the table and there was little time left for discoveries.
However, in our time most people with the IQ to go to university also do. Western countries are astonishingly rich: there are billions of dollars and euros for research. There is much more spare time to spend on your hobbies and fascinations. So, shouldn't we have dozens more per capita big inventions than the Victorians?
Hi Jan,
I don't think that Herrnstein and Murray are right that social stratification by IQ has increased tremendously. Somewhat, yes, but not tremendously. (I also suspect that in Victorian times it was easier for "outsiders" to enter science. Michael Faraday today would be dismissed. To some extent this compensated for the fact that many bright people did not go to university.)
Anyway, if you are right, since 1980 - not since the Victorian era, but since 1980 - people with IQs over 130 have about halved. People with IQs over 160 are less common by a factor of three! This is hard to believe. I guess you could say that since 1980 dysgenics has been less severe, but this is only true for Scandinavia; in the US, it's probably worse.
The barrier to novelty is much higher now than it was during the Victorian era. The amount of material that has to be learnt in a field to be considered an expert and to actually participate on the frontier is much higher now than it was then. All the easy stuff has been discovered, and is now taught to undergraduates. After two semesters of basic physics, chemistry, and biology many undergraduates know far more than most intelligent scientists in the Victorian era.
You know take thermodynamics, how hard was it to do the first experiments on entropy? I mean, a child could do the same experiment. What's required to do an experiment in particle physics? A PhD and a large machine worth more than most scientists will make in a lifetime.
Something like the law of diminishing returns suggests itself.
@Frau Katze [great name!],
Indeed, many British emigrated to North America, Canada, South Africa, New Zealand, and Australia. However, if you study the IQ scores of the people in the colonies you will see that their scores are very similar to the scores in the UK. This suggests that over a longer period of time the immigrants were representative of British society.
" ... a significant innovation ... is simply one ... that multiple encyclopedists and compilers of inventories of innovation are likely to independently note [it]."
Since encyclopaedists are forever pillaging each others' work the chances of "independently" occurring are slight.
Anyhow, the statement is a mere assertion that might be better phrased as "I haven't thought of a better way to measure this so I'm going to claim that this is the best, perhaps the only, way to do it." But how can you test it? If you can't the claim is unscientific.
"Similarly geniuses can be rated via the degree to which these same sources reference them." Come now. You'd probably be as well using the Einstein's Wall method. On Einstein's wall hung portraits of Newton, Faraday and Clerk Maxwell, so the four greatest physicists are those three plus Albert himself: this too is a windy assertion but at least has the advantage of avoiding the bogus scientism of the count-the-references-in-the biggest-books-in-the-library technique
Prof Rabbitt has posted a reply as a postscript to the original post. You can find it if you scroll down on this post http://bit.ly/10XBWKK
Keep in mind... Woodley may be better known for his studies of Sea Serpent Taxonomy.
http://www.cryptomundo.com/cryptozoo-news/crypto-pinnipeds/
http://www.forteantimes.com/reviews/books/3169/in_the_wake_of_bernard_heuvelmans.html
http://publicationslist.org/M.A.Woodley
http://scholar.google.co.uk/citations?user=mmoY0-kAAAAJ&hl=en
http://www.youtube.com/watch?v=FbPb-etyiG0#t=2m18s
When he isn't working on taxonomies of Sea Serpents for Crypto-zoology; he is trying to argue that race among humans constitute subspecies with impact on intelligence, democratization and GDP.
"M A Woodley (2010) Is Homo sapiens polytypic? Human taxonomic diversity and its implications Medical Hypotheses 74: 1. 195-201
Abstract: The term race is a traditional synonym for subspecies, however it is frequently asserted that Homo sapiens is monotypic and that what are termed races are nothing more than biological illusions. In this manuscript a case is made for the hypothesis that H.sapiens is polytypic, and in this way is no different from other species exhibiting similar levels of genetic and morphological diversity. First it is demonstrated that the four major definitions of race/subspecies can be shown to be synonymous within the context of the framework of race as a correlation structure of traits. Next the issue of taxonomic classification is considered where it is demonstrated that H.sapiens possesses high levels morphological diversity, genetic heterozygosity and differentiation (FST) compared to many species that are acknowledged to be polytypic with respect to subspecies. Racial variation is then evaluated in light of the phylogenetic species concept, where it is suggested that the least inclusive monophyletic units exist below the level of species within H.sapiens indicating the existence of a number of potential human phylogenetic species; and the biological species concept, where it is determined that racial variation is too small to represent differentiation at the level of biological species. Finally the implications of this are discussed in the context of anthropology where an accurate picture of the sequence and timing of events during the evolution of human taxa are required for a complete picture of human evolution, and medicine, where a greater appreciation of the role played by human taxonomic differences in disease susceptibility and treatment responsiveness will save lives in the future."
Like Sea Serpents.