Neuroskeptic asks whether there is an ordering of hyperbolic statements, and Matthew Hankins notes the popularity of “paradigm shift”.
Is the hyperbolic inversely related to fatuity?
Peer review now has sacrosanct status. It is seen as equivalent to quality control in licensed medicine: a guarantee that the product will do you no harm, and that it may very probably do you a lot of good. It is sold as the gold standard, separating the precious metal from the dross, ensuring that everything which goes through the review process is of the highest standard.
This perspective is beloved of academic publishers, whose authors write for nothing (indeed, they are indentured labourers in academia) and whose reviewers review for nothing, and then the publications are sold for extortionate sums. $35 for one academic paper? You could buy a meal, a newspaper, a magazine, a romantic novel and still have change for several coffees.
Worse, anonymous reviewers can exercise power without responsibility: “the prerogative of the harlot throughout the ages”. They can bitch, spit, claw and slash, till the original work is in tatters. The supplicant, seeking promotion or mere survival, concedes all, and puts his name to a paper the reviewers have written for him, making him say things he does not believe, and commonly, cannot stomach. As the published papers accumulate he advances up the academic ladder, and looks forward to getting his revenge, either on his reviewers if he has found out who they are, or on his worst rivals, the bright young things snapping at his heels. The cycle of disparagement and suppression of contrary imaginations continues.
It is not all bad news. Some papers are rightly rejected; many are improved; some reviewers are kind-hearted, encouraging, helpful; it is even possible that some of the standard expressions of authorial gratitude to nameless reviewers are heartfelt. Anonymous review encourages honesty as well as spite. Sharp criticism may lead to great scholarly effort. It may also lead to some authors taking up farming, to the great benefit of academia, if not always to farming.
However, there is a quicker way to do all this. The authors could circulate their paper to friends, and incorporate some of their suggestions. Then they could post it up on an open access website and invite reviews, thus getting several different public perspectives. It would be a more open and complete procedure. It would also be much faster. It would still be peer review, but with accountability and with far better metrics. The reviewers would be able to build up a profile: fair minded/usually fair minded/harsh/poisonous. Reviews could be counted, assessed for quality as above, and counted towards academic output in an open way. How authors struggled to deal with criticisms would also be there for all to see. Above all, no author could ever complain that one of their ideas was strangled at birth because of the psychopathy of a few anonymous critics.
This posting was not peer reviewed. Would you like to do so now?
Yesterday afternoon, to the Old Refectory at UCL to attend a celebration of the work of Prof Graham Scambler, a long-time colleague and friend. Five lectures on social theory and health, with an audience composed of fellow sociologists, former students, and his family and grandchildren.
The pleasures of academia come from discovery and influence: finding out new things; hearing from a student that they were inspired by a lecture or book; noting that a paper written long ago still has some impact, or that a new journal has finally established an academic niche. In academia such feedback is often much delayed, partly by publication processes which may run to a year and partly by the slow rate at which new publications find their way into student textbooks. At this particular celebration the former students who gave talks had achieved professorial rank, but still remembered their origins and their path to increased understanding. Although several joked about his vast library, only one speaker mentioned that Graham was primarily an intellectual, but the dread word passed quickly without causing embarrassment to English sensibilities. In a sense all his students had been drawn into his ambit by a single finding from his PhD thesis, which was that the social and personal impact of being diagnosed as epileptic was often greater than the medical severity of the condition.
Reflecting on the talks, Graham noted that he had somewhat marred the event by not yet being dead. He spoke about a central dilemma of sociology, which is the tension between investigating social forces and trying to change them. Psychology does not lack applied practitioners, but sociology is awkwardly poised between those who advise governments (at least one speaker was involved in health policy and grant allocation) and those, like Graham, who between publications want to man the barricades.
As a coda, as the speakers gave their accounts, 40 years of academic life flashed by: the realisation in the early 70s that traditional medical education had many shortcomings from the patient's point of view, leading eventually to the sudden recruiting of sociologists and psychologists to try to make a difference; all this sullenly accepted by medical schools which doubted that the experiment would work, and resented the reductions in their teaching hours. The students were almost uniformly male, and thought of medicine as a refined form of rugby. To defend themselves, the new entrants wrote textbooks and set exam questions based on them. Our group at the Middlesex Hospital Medical School (two psychologists, two sociologists) took the unusual step of collaborating on a book, "The Experience of Illness" (1984), which brought together psychological and sociological perspectives (to give you a flavour: "The interview is the one thing which distinguishes medicine from veterinary surgery") and which in turn launched a dozen monographs. We won the initial battles, recruited students to our Intercalated BSc courses, marked exams, started research. And now, as the decades have passed, behavioural sciences courses have got shorter, the lecturers fewer, exam time far shorter, and vacant posts are often not replaced. The wave has passed.
Then drinks in the Haldane room and the inevitable halting ramble through city streets of a small gaggle of academics trying to find their way to an Indian restaurant which was just round the corner, somewhere. In all, a very English celebration: low key, friendly and irreverent, and no evasion of differences amidst wry, amused reflection.
In the words of the 1968 Mary Hopkin song: “Those were the days, my friend, we thought they’d never end”, the days of hope and very earnest lectures which were going to change the face of medicine.
Digit Span must be one of the simplest tests ever devised. The examiner says a short string of digits at the rate of one digit a second in a monotone voice, and then the examinee repeats them. The examiner then tries a string which is one digit longer, and continues in this fashion with longer and longer strings of digits until the examinee fails both trials at that particular length. That determines the number of digits forwards.
Then the examiner explains that he will say a string of digits and the examinee has to repeat them backwards, that is, in reverse order. For example, 3 – 7 is to be said back to the examiner as 7 – 3. This continues until the examinee fails two trials at a particular length, which determines the number of digits backwards.
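For those who like procedures pinned down precisely, here is a minimal sketch of that stopping rule in Python. The item generator and the toy examinee are my own illustrative stand-ins; the real test uses fixed, standardised digit strings.

```python
import random

def digit_span(trials, examinee, target=lambda s: s):
    """trials: dict mapping length -> two digit strings per length.
    examinee: function simulating the examinee's spoken answer.
    target: identity for digits forwards, reversal for digits backwards."""
    span = 0
    for length in sorted(trials):
        passed = [examinee(s) == target(s) for s in trials[length]]
        if not any(passed):      # failed both trials at this length: stop
            break
        span = length            # credit the longest length with a pass
    return span

def make_trials(max_len=9):
    # Hypothetical item generator, for illustration only.
    return {n: ["".join(random.choice("123456789") for _ in range(n))
                for _ in range(2)]
            for n in range(2, max_len + 1)}

trials = make_trials()
# Toy examinee who can hold at most 7 digits:
forwards = digit_span(trials, lambda s: s if len(s) <= 7 else "")
# Toy examinee who can reverse at most 6 digits:
backwards = digit_span(trials, lambda s: s[::-1] if len(s) <= 6 else "",
                       target=lambda s: s[::-1])
print(forwards, backwards)   # 7 6
```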
I hope you will agree that this is a simple test, easy to understand, and largely bereft of any intellectual content. All you need is to know the names of the single digits, and to understand the simple instructions and examples given, so that you repeat the digits forwards and, in the later version of the test, backwards. In particular, if you can do digits forwards you reveal that you know your digits and have some memory, and if you can do a short string backwards you reveal that you have some memory and that you understand the idea of repeating digits backwards.
The test is not only bereft of intellectual content, but is also low on cultural content. Once you have learnt digit names you are ready to do the test. I assume that forwards and backwards are concepts understood by all cultures worthy of the name.
Initially, test constructors regarded the test as an optional extra, because test-retest reliabilities were low. Arthur Jensen pointed out that this was simply because not enough trials were used. Once extra trials are provided, Digit Span becomes a good measure of general intelligence, correlating with g at 0.71. Of course, Wechsler being Wechsler, they have also included some new tasks in Digit Span, in which digits are read to the examinee and have to be repeated back in ascending order of magnitude, but we can leave that out for the time being, since it does not affect the central comparison between digits forwards and backwards.
How do digits backwards have this profound effect? Short term memory is just an auditory store; most of the intellectual demand comes from digits backwards. That simple little task of remembering the forward sequence, and then keeping it in mind while reading off the sequence in reverse order, taxes the mind. Digits backwards spans are usually at least a digit shorter than digits forwards. If someone can remember 7 digits forwards (the average adult score) but only 6 backwards (the average adult score), that is a 14% reduction in memory capacity. (At age 11 the reduction is 23% for white kids and 30% for black kids, as shown below.) Digits forwards is related to g, but digits backwards is even more heavily loaded on g.
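The percentage figures are nothing more than the relative shrinkage from the forwards span to the backwards span:

```python
def reduction(fwd, bwd):
    # Relative shrinkage from the forwards span to the backwards span.
    return 100 * (fwd - bwd) / fwd

print(reduction(7, 6))  # the adult means above -> ~14%
```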
How does this finding relate to the vexed question of group differences? Well, it is hard to give a plausible cultural explanation for the effect, unless you stretch the concept of culture to absurd lengths. Could there really be a culture in which there are numbers but no reversible operations? Even if there were a culture or putative sub-culture in which using numbers was discouraged, it should affect all digit tasks, not just digits backwards. (What name would one give to a culture in which number use is discouraged?)
If any group defined in genetic or cultural terms has a particular difficulty with digits backwards this is a strong indicator that they have difficulty with tasks as they get more intellectually demanding. The higher the g loading the more they should differ from brighter groups.
Hence the great interest in the most recent scores, to see if they conform to the usual pattern described by Jensen in The g Factor (p. 405, referring to work he did in 1975 with Figueroa; ref on p. 614). Over at Human Varieties, Dalliard has tried to replicate those results using data from the CNLSY (these are the children of the female participants in NLSY79). Incidentally, this is a great follow-up survey: "My Mummy did your tests before I was born". Gradually we are getting to understand the transmission of intelligence through the generations.
The chart shows the increase in digit span with increasing age, and the nature of the gap between digits forwards and backwards in the different groups. This is clearer in the second table, which shows the gaps as Cohen's d.
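For anyone unfamiliar with the metric, Cohen's d is just the gap expressed in pooled standard deviation units. A minimal sketch in Python (the numbers are made up for illustration, not taken from Dalliard's tables):

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    # Gap between two groups in pooled standard deviation units.
    pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                       / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled

# Illustrative values only:
print(cohens_d(7.0, 2.0, 500, 6.0, 2.0, 500))  # d = 0.5
```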
Incidentally, the fact that Hispanics have a slightly lower digit forwards score than whites and blacks but reasonable digits backwards slightly reduces their gap between the two conditions.
Dalliard says: “That the black-white gap on forward digits is substantially smaller than on backwards digits is a robust finding confirmed in this new analysis. This poses a challenge to the argument that racial differences in exposure to the kinds of information that are needed in cognitive tests cause the black-white test score gap. The informational demands of the digit span tests are minimal, as only the knowledge of numbers from 1 to 9 is required. Forward digits is a simple memory test assessing the ability to store information and immediately recall it. The informational demands of backwards digits are the same as those of forward digits, but the requirement that the digits be repeated in the reverse order means that it is not simply a memory test but one that also requires mental transformation or manipulation of the information presented.”
It is good to have a replication of a well-established and informative finding. However, Dalliard has pushed the analysis further, with a factorial study which suggests that black kids have a slight short term memory advantage which is enough to overcome the g demands of digits forwards, but not enough to cope with the higher g demands of digits backwards. This is a new finding which could lead to further studies.
Read the whole thing here http://humanvarieties.org/2013/12/21/racial-differences-on-digit-span-tests/
Finally, the really engaging feature of digit span from a psychometric point of view is that it is a true scale with a true zero. If you cannot remember any digits, your score is zero and that corresponds to zero digits. If you can remember 4 or 5 or 6 or 7 digits those are real scores, and the intervals between them are identical. So, for purists, this is an interval scale with a true zero like the Kelvin scale, where 0 Kelvin is absolute zero. Nothing is colder than that. Age in years is also a true scale.
At this point, it would be normal to explain what the psychologist S. S. Stevens called this type of scale in the typology he proposed in Science in 1946. Why on earth should I do that? You already understand the notion of a true scale with a true zero, where the intervals are truly each as big as each other. What more do you need to know? If someone says that IQ isn't a real measure because "a quotient is all relative", please tell them a thing or two about digit span.
Ratio. I didn’t want you to waste time looking it up.
Readers will know that I sometimes toy with the idea of writing a “Boost your IQ” book, which will also have an associated training course, expensive test materials, and very possibly lengthy seminars in international beach resorts. Trouble is, one would have to write the damn thing.
Then, whilst going through Linda Gottfredson’s website on another matter, I remembered she had written a very good “Instant Expert” piece in 2011 for the New Scientist, which covers all the main findings: the different types of intelligence; what intelligence tests measure; what is intelligence; quantifying intelligence; age effects; brain localisation; what makes someone smart; nature and nurture; realising your assets; simplifying your world; boosting brainpower (YES); cognitive enhancement; and are we getting smarter.
This publication is guaranteed to boost your intelligence, so long as you accept that increasing your knowledge might count as boosting your crystallized intelligence. What is more, it is freely available on her website. I understand that, on the basis of effort justification, you would like to pay a large sum of money and do N-back training for 20 hours, but why not take the intelligent short cut, and spend 20 minutes reading it, and then send it to a colleague who claims not to understand the concept?
Much shorter version:
Say, through the mischance of illness, birth injury or genetic disorder, 1% of all children are born with something which damages body and mind, such that they look odd in some way, and fall below IQ 70.
Say, through the normal variation of genetic inheritance some children with normal undamaged bodies and minds also fall below IQ 70.
Then in the case of white kids the proportions will be 1% funny looking and 2% normally backward, so 1 in 3 looking odd.
In the case of black kids the proportions will be 1% funny looking and 16% normally backward, so about 1 in 17 looking odd.
Black kids below IQ 70 will usually be normal looking, white kids less frequently so.
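Those proportions follow directly from the normal curve. A quick sketch (assuming, as above, a 1% organic rate, group means of 100 and 85, and a standard deviation of 15; requires scipy):

```python
from scipy.stats import norm

def below_70(mean, sd=15):
    # Proportion of a normal IQ distribution falling below 70.
    return norm.cdf((70 - mean) / sd)

organic = 0.01  # assumed rate of injury, illness or genetic disorder

for label, mean in [("white", 100), ("black", 85)]:
    normal_backward = below_70(mean)
    odd_looking = organic / (organic + normal_backward)
    print(f"{label}: {normal_backward:.0%} normally backward, "
          f"1 in {round(1 / odd_looking)} of the sub-70 group looking odd")
# white: 2% normally backward, 1 in 3 of the sub-70 group looking odd
# black: 16% normally backward, 1 in 17 of the sub-70 group looking odd
```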
Arthur Jensen was always a paid up member of the select club of psychologists who actually give intelligence tests, as well as writing about intelligence. He was an educational psychologist who believed that every child could learn, and wanted them to have a cafeteria of learning choices, not an inflexible school set meal. He was fascinated by human intelligence, and bewildered by the vituperation of those who were not, and who shouted down his observations. He was thorough in his work, honest in giving his opinions, and steadfast in letting the results have pride of place. He was also very bright. Like many bright people, he assumed that you were probably bright, which was kind of him.
His first observation was to admit that when he tested some children he came out of the assessment session convinced that they were bright. Then he would add up the scores and find that their results were pretty modest. Intrigued by his mistake, he worked out that they had been socially skilled, had presented themselves well, and had fooled him with their charm. Typically, he went on to say that he did not doubt that they would do well in life, because aspects of character other than intelligence can have an influence on success. Equally, he did not cover up his mistake in estimating intelligence, but reported it with interest. He was certainly not a person to boast "I can tell a person's IQ at a glance". Early on he noted that presentation during testing was not always an accurate guide to actual mental ability.
Then he took his observations further. He noted that when he tested Black kids of say, IQ 70 they came across as normal and their behaviour in the playground appeared normal. That is, they related pretty well, had the sorts of interests that other children had, and seemed to be street wise. Seemed to be. They weren’t always able to explain the rules and scoring systems of the games they played, so it depended on how closely you examined their understanding. White kids at IQ 70 were often slightly odd. They were sometimes funny looking in their appearance, and more difficult to make a relationship with. They were often somewhat naive.
One explanation for the difference is that intelligence tests do not accurately measure black children's intelligence. Jensen wrote a book on this topic in 1980, "Bias in Mental Testing", so you can look up the data and his argument in that text. In a nutshell, he found that the tests did not underestimate black intelligence. In fact, they very slightly over-estimated it. He was also doubtful that there were such separate things as black intelligence or white intelligence. In his view there was human intelligence, and the results showed that both people and groups differed in how much they had. (That is something of a simplification, because he also showed there were some differences, not least in the distribution of full scale scores, with black respondents having a more slender distribution, white respondents a broader and more "normal" distribution.)
What other explanations are possible, other than test bias? Jensen argued that black children of IQ 70 were normal. It was not an illusion. Assuming a black IQ mean of 85, IQ 70 is but one standard deviation below the black mean. Nothing special, and not especially bad from that population's point of view. A full 16% of black kids in the US fall more than one standard deviation below the black mean. They were normal black kids. In fact, in terms of normality alone, though not in terms of ability, they were rather like white kids of IQ 85, who are one standard deviation below the white IQ mean of 100. Nothing special. Normal white kids. A full 16% of white kids fall more than one standard deviation below the white intelligence mean, but only about 2% of white kids fall below two standard deviations, IQ 70, whereas 16% of black kids do (or did, see below).
In summary, IQ 70 is minus 2 sigma for whites but only minus 1 sigma for blacks. Being below IQ 70 is rare for white kids (2%) and pretty common for black kids (16%). Jensen pointed out that there were two routes to mental retardation: 1) simply being at the lower part of the intelligence distribution; and 2) having something wrong with your brain. So, some white kids are retarded because of some injury or illness or genetic disorder, which also makes them “funny looking”, plus some are just naturally dull. A larger proportion of black kids are naturally dull, and some (proportionately fewer) have had an injury or illness or genetic disorder.
Hence, the difference in apparent normality is real, and is explained by a careful understanding of normal variation in each population.
Can we test this explanation? Apart from checking all the facts (which appear to be correct) we have a new development in the last decade or two. It seems that average black IQ in the US is now about 90. If that is so, fewer black children will be 2 standard deviations below the mean. IQ 70 will be 20 points below the mean for them, rather than 15 points below the mean, so probably only 9% of black children will fall below that particular cut-off point.
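The 9% figure is simply the normal tail below 20/15 = 1.33 standard deviations:

```python
from scipy.stats import norm

# With a black mean of 90 and SD 15, IQ 70 sits 1.33 SD below the mean:
print(norm.cdf((70 - 90) / 15))  # ~0.091, i.e. about 9%
```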
As a consequence the proportion of funny looking black kids in the under IQ70 range should have gone up a bit in the last two decades, not because black kids are getting genetic disorders, but because there will be somewhat fewer normal looking backward black children. There will still be proportionately more funny looking white kids than black kids below IQ 70, but the different rates will not be so striking as before.
By the way, people with IQ 70 can do lots of things. Humans are spectacularly intelligent even at 2 sigma below the Greenwich Population Mean. A great deal can be achieved, even in a group who, compared to everyone else, are considered to be at high risk.
Finally, the observation that a child can have difficulties either because they are naturally dull or because they have experienced some adventitious insult to their otherwise normal abilities, is not new. Here is a researcher estimating the number of backward children in a population and expressing himself in forthright language:
“We have seen that there are 400 idiots and imbeciles, to every million of persons living in this country; but that 30 per cent of their number, appear to be light cases, to whom the name of idiot is inappropriate. There will remain 280 true idiots and imbeciles, to every million of our population. No doubt a certain proportion of them are idiotic owing to some fortuitous cause, which may interfere with the working of a naturally good brain, much as a bit of dirt may cause a first-rate chronometer to keep worse time than an ordinary watch. But I presume, from the usual smallness of head and absence of disease among these persons, that the proportion of accidental idiots cannot be very large”.
Who will be the first to provide the name of the author, the title of the work, and the page number?
Can I just check a few issues about the blog?
Does the “Follow by email” function work? I assumed it would be helpful, but as far as I can see from the summary statistics, only 22 people are registered. If you have tried to register and encountered a problem, I will try to fix it.
Does the search function work? It had a phase a month or two back when there seemed to be a problem, but it seems to have fixed itself.
Last, can you find the topics you are looking for? I don’t particularly want to move to another platform, but if topic search is difficult, I might have to consider it.
And, of course, any other feedback you might like to give, particularly about topics which need comment.
Testing intelligence used to be a simple business. The patients used their wits, and the psychologists used their instruction manuals. Some instruction and practice was required, because psychologists had to learn the instructions to be given for each test at every stage, including the prompts; learn how to record the answers and also time them with an old mechanical stopwatch; do all this when the material in front of you on the patient's side was upside down and left-right inverted; record any response which was out of the ordinary; and keep the patient cheerful and engaged throughout. To help you, the presentation booklets had discreet little numbers for you to read, if only to check that you were presenting the right problem. There were also recording forms to jot down the results, and prompts about how many failures patients were allowed before you moved briskly to the next subtest.
Block design, object assembly, picture completion and picture arrangement all required some kit, which had a tendency to get battered or lost. Coding required a form on the back of the test record booklet, and a cardboard overlay to mark the results quickly.
A mechanical stopwatch, I should explain, was a large, heavy, metallic chronometer which never ran out of batteries, and was easy to use. Multiple lap time analysis was not an option, nor were nano-seconds, so error rates in recording times were low. More sophisticated testers were provided with a chronometer wrist watch, so that timing could be done discreetly, without the person noticing it and getting too anxious. I was taken on a special journey by my boss to a specialist watch shop in the City of London in order to get the numbered chronometer placed on my wrist. It was a Moeris Antimagnetic, Swiss-made watch, and it still works well.
A psychologist of modest intellect could be trained to use all these materials in a matter of weeks, and then they were tested on a patient or two by a senior psychologist, after which they were considered competent to begin their testing careers.
In the old days of testing, psychologists tested lots of people, so they started taking short cuts. They boiled the instructions down to the sensible minimum, having found out that the basic short form generally makes more sense than the elaborate longer one. Then they started cutting out tests, on the "bang for your buck" basis. Bluntly, how long does it take to get a result out of each subtest? Some are easy to give and score. Others require setting out lots of material, gathering it back again, and working through complicated scoring systems. Those tests tended to be left in the bottom drawer of the desk. Psychologists may be dull but they are not stupid.
Eventually researchers worked out statistically derived short forms in which 4 key subtests gave roughly the same result as the full 10. Roughly. Any psychologist who was in a hurry plumped for those. Of course, the error term was larger, but pragmatism ruled. As a consequence, a very large number of intelligence test results are not done properly, in that they are not based on the full test. It is hardly surprising that scores on later re-testing may differ, particularly when psychologists pick and choose which tests to include out of the 10, according to their own interests and theories. Short form testing also increases the apparent variability of the results, leading some gullible psychologists into thinking that it was wrong to calculate an overall result, when in fact that overall result had the higher validity. Nobody gets round sampling errors, not even the Spanish Inquisition.
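The widening error term is easy to make concrete with the Spearman-Brown formula. A sketch, assuming (purely for illustration) an average inter-subtest correlation of .5 and an IQ standard deviation of 15:

```python
import math

def composite_reliability(k, avg_r):
    # Spearman-Brown: reliability of a composite of k subtests whose
    # average intercorrelation is avg_r.
    return k * avg_r / (1 + (k - 1) * avg_r)

def sem(reliability, sd=15):
    # Standard error of measurement in IQ points.
    return sd * math.sqrt(1 - reliability)

avg_r = 0.5  # assumed average inter-subtest correlation
for k in (4, 10):
    rel = composite_reliability(k, avg_r)
    print(f"{k} subtests: reliability {rel:.2f}, SEM {sem(rel):.1f} IQ points")
# 4 subtests: reliability 0.80, SEM 6.7 IQ points
# 10 subtests: reliability 0.91, SEM 4.5 IQ points
```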
When new tests came on the market they usually provided extremely interesting approaches with extremely complicated and bulky equipment. Take Digit Span, for example, which tests short term memory. This now comes in a more complicated form, but might be useful. Then, in the Wechsler memory tests, someone decided to have a sequence tapping test of “spatial memory”. You were required to set out the provided array of blue blocks welded onto a plastic tray, and then tap out a sequence of moves with your finger, which the patient had to copy. No problem when one or two moves were required. However, when the tapping sequence was 7 different positions long, it was difficult to be sure one had tapped out the sequence correctly, and then baffling when the patient tapped out the sequence back again so quickly that you could not be sure you had recorded it correctly. That test has been quietly dropped. One cannot have the punters showing up the psychologists.
However, the search for the quick test that gives a valid result continues. The task is not a trivial one. Here are the g loadings of the Wechsler Adult Intelligence Scale subtests, simply as a guideline on the competitive psychometric landscape which confronts any developer of a new intelligence subtest. These are taken from Table 2.12 of the WAIS-IV manual.
Good measures of g: Vocabulary .78, Similarities .76, Comprehension .75, Arithmetic .75, Information .73, Figure Weights .73, Digit Span .71.
Fair measures of g: Block Design .70, Matrix Reasoning .70, Visual Puzzles .67, Letter-Number Sequencing .67, Coding .62, Picture Completion .59, Symbol Search .58.
A poor measure of g: Cancellation .44, though it remains one of the optional subtests.
Each of the subtests, particularly the top 7, is a serviceable measure of g; none takes more than 10 minutes, and most take less. They provide plenty of psychometric bang for your testing-time buck. With a bit of practice in memorising the scoring criteria, you can almost mark up the vocabulary score as you go along.
So, here is the ultimate intelligence test item for intelligence testers. Can you think of a task which is quicker and easier than the best Wechsler subtests, but has higher predictive utility?
While thinking about that, would you like to take a non-Wechsler vocabulary test, just for private amusement and to provide a quick general intelligence measure that you can keep to yourself?
Test construction used to be a sober business. Every ten or fifteen years a new version of an established test would come out, and test-giving psychologists bought the new version. They often complained about it, on the grounds of cost, and on the grounds of having to learn new material when they had grown familiar with the old version, and knew all its characteristics intimately. They also noticed the occasional improvement.
The benefits of sticking to what you know were explained to me by an anaesthetist years ago. He told me that out of the many anaesthetics on the market he restricted himself to the best three, and mostly used just one. In this way he learnt everything he could about how it worked and how his patients reacted to it. By paying great attention to the patient’s medical history, and also the family history of allergies, he learned how to anaesthetise his patients. In special cases he would sometimes use the other two anaesthetics, singly or in combination. In that way he got good results, and fewer nasty surprises. For that reason, I would rather have a skilled surgeon with a somewhat blunt knife, than expose myself to a very sharp knife in the hands of a dull surgeon.
Even the old Stanford-Binet, which bundled together verbal and non-verbal items and which the Wechsler tests rendered obsolete, could, in the hands of a skilled practitioner, give you the child's intellectual level very quickly, and pretty accurately. Testers knew which items to skip. They were the pioneers of dynamic testing, now the domain of computer administered tests.
Of course, new tests kept coming out, and they often caused some excitement. There were tests which tested things which had never been tested before. Tests which tested old things in new ways. Tests which tested new things in a new way and displayed the results in new ways. (The latter were particularly popular.) Also, many, many tests which were not tests. They were tests of learning "styles", creativity, and sundry other abilities, but they were not intelligence tests. Or so the descriptions of the tests asserted.
Test publishers quickly realised that they made profits every time they published a new test (particularly if you had to go on a training course to give the test). Once that new test became even slightly popular, new test enthusiasts could silence any critic by saying “Have you done the training?”. In that way anyone with critical faculties was either rendered mute, or had to pay for both test and training, after which some conceded that one of the subtest items was passable. Of course, you may rightly wonder what sort of psychologist needs to be trained to give a test, when every single examination of a person’s mental capabilities has to follow the same inevitable steps: item construction, piloting, item selection, test explanation, scoring systems, standardisation, item analysis, construction of scaled scores, construction of group ability scores, construction of overall ability scores, and so on. By all means give psychologists a very good grounding in psychometrics, even if it takes six months to a year, but if you have to have a training course for each test, then both the test and the psychologist are in difficulties.
Naturally, most of these tests fell by the wayside. There is a positive manifold in human skills, so whatever the label on the test, the test-taker’s brain has to solve the items, and general intelligence comes into play. Once you have a measure of general intelligence and, say, two group factors, you get diminishing returns when you push further into special skills, and you also tend to pick up more error variance.
Now a few words about the grand-daddy of individually administered "clinical" intelligence tests, the Wechsler tests. They are the gold standard for intelligence testing. First of all, they are pretty good, which is why they got their dominant position. The material was sensibly selected and well organised. It provided a full scale IQ based on 10 subtests and two well known and well understood sub-scales, Verbal and Performance, each composed of 5 of the subtests. Sales were brisk, and most clinicians could call on roughly 40 years of data on a common basis, with only minor updating at each re-standardisation. For once psychology was moving towards replicability of results, and you could even begin to do comparisons across generations, all of whom had done roughly the same test, and whose full scale IQs were directly comparable.
So, Wechsler decided to mess everything up. They broke the 10 subtests into 4 subscales, composed of 3 or 2 subtests each. You do not have to know much about sampling theory to realise that the error term for each will be wider than for a 5-subtest scale. Two subtests do not a factor make. The search for apparent detail in factors came at a cost in terms of accuracy. When you make allowance for the reality that many psychologists do not give the full test, you have a prescription for psychologists writing long reports about many subscale results, with fragile support for their interpretations. There are now 4 subscales about which one can speculate, where formerly there were 2. Good for business. Your chance of finding a special strength (my client's genius has been under-estimated) or a special weakness (my client's genius has been damaged by your negligence) has been doubled at a stroke. Good for business.
Then, add in a few other tests which are g loaded, but not as g loaded as intelligence tests. For example, tests of memory. Even here, Wechsler has designed memory tests with known correlations with intelligence, so you can calculate how big a discrepancy has to be before one can argue that a client’s memory is poorer than their intelligence would predict. There is wiggle room, but not much. However, there are other tests of memory, so those can be used in addition with more chance of finding apparent discrepancies.
Then add in several other tests with variable g loadings. Tests of executive function are the most popular. With each additional test your chance of finding a significant discrepancy rises. By giving the percentile rank for each test result you can convince almost every reader of the report that the person’s abilities are highly variable. You can then use this variability (based on improper sampling) to argue that calculating a full scale IQ would be “meaningless”. Like adolescent poets, such clinicians adore meaninglessness. Of course, no attainment result is meaningless. Each contributes information about the person’s abilities, and also contains an error term. The trick is to maximise the former and decrease the latter. Pooling the result of several well-sampled tests helps achieve that.
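The rising chance of a spurious discrepancy is simple multiplication. A sketch, assuming independent comparisons each tested at the conventional 5% level:

```python
# Probability of at least one spuriously "significant" discrepancy when
# k independent comparisons are each tested at the 5% level.
alpha = 0.05
for k in (1, 5, 10, 20):
    print(k, round(1 - (1 - alpha) ** k, 2))
# 1 0.05 | 5 0.23 | 10 0.4 | 20 0.64
```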
Now, the Wechsler team are not responsible for the plethora of special tests, but their foray into “factors” based on 3 or 2 subtests was not a good precedent. It has led to a confusion among some clinical psychologists about the factorial structure of intelligence. Wechsler must have gambled that producing a large number of factor scores on the basis of a small number of subtests was what the market wanted, and they relied on the professionalism of testers to give the full 10 tests, and then give precedence to the best founded score, which is Full Scale intelligence.
The current situation is like inviting every Olympic athlete to compete in the decathlon, but then allowing them to drop some of the events and to ask for prizes on the basis of a quasi-random selection of their best 2 or 3 events. The decathlon is what the Wechsler test required: 10 core tests for a full result. We should return to that simple standard if, like trying to find the best all-round athletes, we want to find the best all-round minds.