Psychological comments: No sex differences in Romania

Friday, 28 October 2016


I am slowly learning the perverse art of headline writing, but retain an inherent allegiance to telling the truth: I am sure that there are the usual sex differences in Romanian men and women, as indicated in the traditional costumes above, but apparently no consistent differences in intelligence. A null result is as important as a positive result, so this finding must enter the mix for us to ponder. Does it show something specific about one country, or something general about our methods, or both?

Dragos Iliescu, Alexandra Ilie, Dan Ispas, Anca Dobrean, Aurel Ion Clinciu. Sex differences in intelligence: A multi-measure approach using nationally representative samples from Romania. Intelligence Volume 58, September–October 2016, Pages 54–6

http://dx.doi.org.libproxy.ucl.ac.uk/10.1016/j.intell.2016.06.007

https://drive.google.com/file/d/0B3c4TxciNeJZcENSTl9tRTZIc1k/view?usp=sharing

Interestingly, the intelligence tests standardised in Romania cover the full range: almost as if no intellectual measure had been left out. Whatever the finding, one cannot easily quibble that another test would have shown a different result.

However, the Lynn hypothesis is that boys are late to mature, so it is only at adult ages that male advantage shows itself. The SON test goes up to 8 years, so is not relevant. The WISC-IV goes up to 17 years so is partially relevant. The Raven test covers the full age range, so is relevant:

Sample sizes are small, which reduces the chance of “significance” but out of 13 age bands 10 show male advantage to some small degree. Advantage Lynn.

For the 12 adult groups on the MAB-II the story collapses. Overall IQ favours men for 10 out of 12, but only one is significant, the rest tiny. Performance IQ shows male advantage for 10 out of 12, but most are infinitesimal, so forgettable.

For GAMA there are 14 adult age groups, of which 11 show male advantage, but mostly tiny ones, only 3 being significant.

For IST there are 10 adult age groups, of which 2 show male advantage, and only the female advantage is significant.
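For readers who want to put a number on this kind of vote counting, here is a minimal sketch of an exact two-sided sign test on the band counts quoted above. It is my own illustration, not an analysis from the paper. It assumes that every age band not counted as showing male advantage shows female advantage (no ties), that the bands are independent, and it ignores the size of each difference, which is exactly the weakness of counting. The function name `sign_test_p` is hypothetical.

```python
from math import comb


def sign_test_p(k: int, n: int) -> float:
    """Exact two-sided sign-test p-value for k 'male advantage' bands out of n.

    Null hypothesis: each band is equally likely to favour either sex (p = 0.5).
    Assumes no tied bands; ties would have to be dropped before calling this.
    """
    tail = min(k, n - k)  # size of the smaller tail
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2**n
    return min(1.0, 2 * one_sided)


# Band counts as quoted in the post: (bands favouring males, total bands).
counts = {
    "Raven": (10, 13),
    "MAB-II Overall IQ": (10, 12),
    "GAMA": (11, 14),
    "IST": (2, 10),
}

for test, (k, n) in counts.items():
    print(f"{test}: {k}/{n} bands favour males, sign-test p = {sign_test_p(k, n):.3f}")
```

A sign test treats a tiny difference and a large one alike, so it is at best a rough companion to the effect sizes, not a substitute for them.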

Looking at the individual test results as a whole the picture is, as the authors imply, unconvincing on the male advantage hypothesis, even among those tests that cover adults.

However, almost all these tests do not report the raw scores, which is a considerable problem in ability testing. Why not? Well, many intelligence tests have idiosyncratic scoring systems according to the material used, number of items, additions for quick completion, reductions for partial errors, and so on. So the real raw scores are changed into scaled scores, and those scaled scores may be drawn from different tables according to age. There is some scope for blurring reality. It should not affect sex differences, but the change from raw to scaled scores is not something easy to track down. This certainly has an impact on Flynn effect calculations. Looking at the raw scores on coding tasks or digits forwards and backwards for each age (where the raw score is a real ratio scale) would be very interesting, which should knock on the head any residual doubts.

If you inspect the torrent of individual results in the paper, there is little evidence of any consistent pattern of sex differences. The sample sizes for each age band are respectable though not large, so it was with some relief that I turned to their overall meta-analysis of the results in Table 7, though that table is a little hard to read. A positive Cohen’s d score reveals a male advantage. The Q score is the chi-square test of heterogeneity, with the degrees of freedom in brackets. The I² statistic corrects Q for its degrees of freedom and gives the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error.

However, to test Lynn’s hypothesis we should have a Table 8 which restricts itself to adults aged 17 and over, across the whole adult age range. This would be interesting.

The authors say: The only two scores with a significant (though small) effect are the Raven (d = 0.11, p < 0.01), and the Performance subscore of the SON-R (d = 0.12, p < 0.01), both in favor of males. In the case of the SON-R, medium heterogeneity is signalled by the data: Q(5) = 10.01, p < 0.10, I² = 50.04, i.e. 50% of the total variability in this set of effect sizes are due to between-subsamples variability (true heterogeneity). In the case of the Raven scores, heterogeneity is not present: Q(22) = 21.34, ns., I² = 0.00; i.e. all variability in effect size estimates is due to sampling error within subsamples.
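The I² figures in that passage can be checked from the quoted Q values. The sketch below uses the standard Higgins and Thompson definition, I² = max(0, (Q − df) / Q) × 100. I am assuming that this is the formula the authors used, and the function name `i_squared` is my own. Because the Q values quoted are rounded, the result may differ from the published figure in the second decimal place.

```python
def i_squared(q: float, df: int) -> float:
    """Percentage of variability in effect sizes attributed to heterogeneity.

    Standard definition: I^2 = max(0, (Q - df) / Q) * 100.
    When Q is no larger than its degrees of freedom, I^2 is truncated to zero.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Q statistics and degrees of freedom as quoted from the paper.
reported = [
    ("SON-R Performance", 10.01, 5),
    ("Raven", 21.34, 22),
]

for label, q, df in reported:
    print(f"{label}: Q({df}) = {q}, I^2 = {i_squared(q, df):.2f}%")
```

The Raven result shows why I² comes out as exactly zero there: Q is smaller than its degrees of freedom, so the observed spread of effect sizes is no more than sampling error alone would produce.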

Of course, as Richard Lynn found out, the Wechsler may have been fiddled with a bit to brush away some sex differences, but I doubt that can have been the case for all the other measures, particularly the Raven, designed long ago.

The authors do not bother to remark on something which caught my eye: the Wechsler Intelligence Scale for Children shows a lot of heterogeneity on Full Scale IQ, Verbal IQ and Perceptual Reasoning IQ. The Multidimensional Aptitude Battery and Intelligence Structure tests also show a fair amount of heterogeneity, compared with none for the Raven test. Of course, Richard Lynn might argue that the children’s scale does not prove anything, but that the adult form (not used here) would do so.

The authors conclude: The random and non-replicable pattern of differences observed in the current research seems to support the conclusion that any sex mean or variance differences are likely spurious and the result of sampling or measurement errors than substantive and stable effects. This conclusion is supported for both general intelligence and second-level (more specific) abilities (e.g. performance vs. reasoning, verbal vs. performance, fluid vs. crystallized).

Cautiously, they admit: The current study has a number of limitations. First, even though all the 6 samples on which we report data are carefully selected nationally representative samples, they are not comparable in volume to some of the samples on which data was reported in other studies, such as Deary et al. (2003), or Lohman and Lakin (2009). Therefore, while they make an important contribution for an understudied culture, they may only have a limited impact on the international state of knowledge. Second, some of the tests used in the current research were developed to be as sex neutral as possible. At least for the WISC-IV and SON-R, item bias was examined both by trained judges and through item analysis, and the GAMA and MAB-II were developed with the clear objective of minimizing adverse impact by gender. This may have affected the results and contributed to our null effect conclusion.

My comment: “sex neutral” sounds impeccable, but the general drift of test construction is towards sex difference suppression.

Their final word: Research on group differences in intelligence is a politically charged topic with important societal consequences. Therefore, we strongly encourage researchers examining group differences in intelligence to pay close attention to the quality of the samples used and make efforts for increasing their representativeness.

In fact, I think the authors have done very well. They have set out results from many intelligence tests, not just one, on a good national sample. No, it is not the whole nation, as with the Scottish data. No, there was not a meta-analysis of the adult data separately (though it probably would not come up with much), but overall it certainly gives pause to the acceptance of the sex difference findings in other work.

Is it all down to Romania, and some special sex-difference-annulling culture, as so sedulously sought by some people? Has Romania achieved what the Nordics strived for but could not attain? Although I believe in exceptional countries, as an outside observer I cannot find anything in Romania’s long and rich history which leads me to believe that sex differences were deliberately diminished. However, Romanian readers are invited to send me further and better particulars.

19 comments:

akarlin, 28 October 2016 at 22:37
There's a big problem with this study which I pointed out when Scott Alexander blogged about it:

How exactly were the samples obtained? The great thing about school based tests is that it typically includes the whole spectrum of abilities. Getting busy successful adults (>IQ) and lumpenprole dregs (<IQ) to sit the tests is harder.

"The normative sample was selected in such a way as to maximize representativeness on age, sex, urban vs. rural residence and geographic region, from a sample of 4417 participants, which were tested in-home and in-school by trained operators."

So yes, this sounds “problematic.” You also need representativeness on income, occupational prestige, etc.

Anonymous, 29 October 2016 at 08:37
It's not that there are no differences (can't prove the null), it's that there is no evidence that there are differences. Unless they did Bayesian stats or equivalence testing.

Santoculto, 29 October 2016 at 13:38
Or there are no average differences, but this study doesn't seem to have analysed outliers, OR NOT...

yup, I hate reading...

OR Romanians have fewer mathematical-verbal tilts.

dearieme, 29 October 2016 at 15:46
I wonder whether any county records have survived from the days of the eleven-plus. They would have names, sex, date of birth, and (I presume) results from the IQ and attainment tests. You'd now have a roughly fifty year follow-up of how those people did in life. Or will the records have been destroyed long since?

They would have two advantages over the Scottish snap-shot; the records might cover more than a decade of results, and would presumably cover not only Scotland but also E & W and NI.

Santoculto, 30 October 2016 at 11:29
Romanian brain drain may have had some impact.

Anonymous, 30 October 2016 at 14:14
a sneaky method: one can reduce the chance of finding differences by removing outliers (thus reducing variability: less variability to partition into explained/unexplained = less chance of finding something that's really there). also, perhaps most likely to find real differences in 3D/spatial tasks rather than 2D nonverbal tasks (a la Raven), yet 3D/spatial tasks are rarely measured (e.g., there's no 3D on the current wechsler:)

Unknown, 30 October 2016 at 17:40
Don't think that any outliers would have been removed.

Santoculto, 30 October 2016 at 21:26
I do not think they have removed the outliers; they just looked at the overall averages, and the results were these minor differences between the sexes. They did not remove the outlier groups, they just did not analyze them separately.

For example, to analyze the overall averages of the sexes in the United States without regard to the outliers. It seems to me that the differences are already visible, and become significant between the outliers.

dearieme, 31 October 2016 at 16:34
O/T: found in a comment on the Greg C site.
http://emilkirkegaard.dk/en/wp-content/uploads/From-Terman-to-Today-A-Century-of-Findings-on-Intellectual-Precocity.pdf

dearieme, 31 October 2016 at 16:36
Come to think of it, there might have been something equivalent, if for rather older children, for youngsters applying to gymnasiums and lycées.

Larry, San Francisco, 1 November 2016 at 04:48
I thought the stylized fact (do psychologists use this term?) was that average IQ was equal but that the variance for men was higher, causing significantly more men to have the highest IQs (or at least math ability).