Friday 26 September 2014

Should educationalists be streamed?

There has long been a debate as to whether educationalists should be streamed, so that the brighter practitioners should not be held up by the slower pace of their less able colleagues. The contrary view is that educationalists of different levels of ability should be mixed together, so that the clever ones can lead the intellectually impaired to better things. It is not clear where the Institute of Education stands on this important policy matter.

This debate is remarkably similar to the question of whether children should be streamed in schools. Before all else, do a thought experiment: when you stream children, what result would count as success? Certainly if all streamed children do better than un-streamed children, that would count as a clear win. It would show that “correct pace” teaching was good for all. However, what if bright children race ahead whenever they do not have to wait for their less bright peers? Should that be counted as a success, a partial success, or a failure? The economic and cultural contribution of the brightest minds appears to be considerably greater than that of average citizens, so it might be best to give them a clear run, and settle accounts later with redistributive taxation. On the other hand, if you value the mean level of achievement for the group as a whole, then brighter children should be held back to encourage the others.

On the issue of streaming, Samantha Parsons & Sue Hallam, both at the Institute of Education, have written “The impact of streaming on attainment at age seven: evidence from the Millennium Cohort Study”, Oxford Review of Education, 24 September 2014.

Their work has been prominently reported, which is a good thing. It is based on a very good sample, which is also a good thing. Most citizens will read the newspaper accounts only, so here is the Guardian headline as a guide:

School streaming helps brightest pupils but nobody else, say researchers: Splitting classes by ability undermines efforts to help disadvantaged children, finds research into English primaries

So much for what the public will read and believe to have been proved. What does the actual study reveal?

The Millennium sample is a good size, is representative, and has an increased representation of minority, poor and immigrant groups. The sample is somewhat better than the population averages. The sample studied in the paper was N=2544, of whom 83% were not streamed. The sample size is fine by social science standards, and much better than the modal values in publications, though negligible compared to the 70,000+ in the Deary et al (2007) education paper.

What is less satisfactory is that the authors base their study on Key Stage 1 assessments, made when the children are 7. These ratings are done by teachers on the basis of “informal tests”. I do not know whether these are actual tests with published characteristics, or just an overall impression. They also use an earlier baseline teacher assessment called the Foundation Stage Profile. For children at school in England these assessments are made on the basis of the teacher’s accumulating observations and knowledge of the whole child.

Seven years of age is rather early to come to any conclusions about teaching methods. From a psychometric point of view, this is the earliest age at which we can get an indication of whether children have reading problems of any significance. It is also a little hard to believe that 7-year-olds have achievements in science. These teacher assessments are somewhat weak, and insensitive to actual differences in ability. I have looked at them in relation to court cases, and would not put too much reliance on them. As a rule of thumb, if you want to know how well teachers teach, do not rely on teachers’ assessments of progress. Use national examinations marked by others.

Now we turn to the crux of the paper: the difference between schools that stream and schools that don’t. We need to know whether schools that stream differ from those that don’t in terms of parental background, child ability, and other teaching methods. In particular, we need to know whether the scholastic achievements of children in the un-streamed schools have the same means and standard deviations as those of children in the streamed schools. Otherwise the overall scores of un-streamed and streamed children may differ for reasons that have nothing directly to do with streaming.

For example, schools which find they have a very broad range of child abilities (large standard deviation) might have to do streaming; schools with a narrower range of abilities (low standard deviation) might not bother. We need to check that a fair comparison is being made.
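A toy simulation can show why this check matters. It is a sketch under loudly hypothetical assumptions: schools differ only in the spread of pupil ability, the most heterogeneous schools choose to stream (the 1.25 cut-off is an invented rule), and streaming itself has no effect at all. Even so, the streamed pupils end up with a wider spread, a lower 10th percentile and a higher 90th percentile, so a naive comparison would appear to show streaming “widening the gap”.

```python
import numpy as np

rng = np.random.default_rng(2014)

# Hypothetical: each school has its own spread (SD) of pupil ability, mean 0.
n_schools, pupils_per_school = 2000, 30
school_sd = rng.uniform(0.5, 1.5, size=n_schools)

# Hypothetical decision rule: only the most heterogeneous schools stream.
streams = school_sd > 1.25

# Attainment = ability; streaming has NO effect anywhere in this simulation.
ability = rng.normal(0.0, school_sd[:, None], size=(n_schools, pupils_per_school))

streamed = ability[streams].ravel()
unstreamed = ability[~streams].ravel()

for label, x in [("streamed", streamed), ("un-streamed", unstreamed)]:
    p10, p90 = np.percentile(x, [10, 90])
    print(f"{label:12s} n={x.size:6d} mean={x.mean():+.2f} sd={x.std():.2f} "
          f"p10={p10:+.2f} p90={p90:+.2f}")
```

The two groups have essentially the same mean but different standard deviations, which is exactly the difference that has to be ruled out before any gap is attributed to streaming.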

The results in Fig 1 suggest that those who were streamed (17% of this sample) were duller and more variable than the majority who were un-streamed. Looking within the streamed children, the brightest are only a little above the average of the un-streamed majority. Case proved that streaming is not worth it? Not at all.

This is yet another case where very simple statistics would be a great help. Showing the actual distributions of the Key Stage 1 total scores for the streamed 17% and the un-streamed 83% would be useful. The streamed children are out-numbered nearly five to one. 222 children were in the ‘top’ stream, 130 in the ‘middle’ stream and 94 in the ‘bottom’ stream. These are reasonable numbers, but hardly substantial ones. We must check that the decision to stream children is not influenced by student heterogeneity. As far as I can see, these checks have not been done.
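The group sizes are worth a quick check. The short sketch below uses only the counts quoted above and the overall N of 2544; the three streams sum to 446 children, about 17.5% of the sample, which fits the reported 17%.

```python
# Counts as quoted in the post; N_TOTAL is the analysed sample size.
N_TOTAL = 2544
stream_counts = {"top": 222, "middle": 130, "bottom": 94}

n_streamed = sum(stream_counts.values())
n_unstreamed = N_TOTAL - n_streamed

print(f"streamed:    {n_streamed} ({n_streamed / N_TOTAL:.1%})")
print(f"un-streamed: {n_unstreamed} ({n_unstreamed / N_TOTAL:.1%})")
print(f"ratio un-streamed : streamed = {n_unstreamed / n_streamed:.1f} : 1")

# Share of the streamed children in each stream.
for name, n in stream_counts.items():
    print(f"  {name:6s} stream: {n:4d} ({n / n_streamed:.0%} of streamed)")
```

The bottom stream, at 94 children, is small enough that its mean and spread will carry wide confidence intervals.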

The authors have done regression analyses so as to predict the Key Stage 1 scores. This potentially obscures the position, in that it denies us a clear contrast between the streamed and un-streamed groups. Instead, you have to try to derive these differences from the beta coefficients.

The authors note: Standardised regression coefficients do not directly indicate the effect of a unit change in the outcome, they rather represent change in terms of standard deviations. The predictor with the biggest regression coefficient is the most important predictor of the outcome, regardless of the direction of the relationship.
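To make the point about standardised coefficients concrete: a standardised coefficient is the raw slope rescaled by the ratio of the predictor’s SD to the outcome’s SD, β = b·s_x/s_y, so it reads as “SDs of outcome per SD of predictor”. The sketch below uses made-up data; the variable names (`fsp`, `income`, `ks1`) and all the numbers are hypothetical stand-ins, not the paper’s data. It fits the same regression on raw and on z-scored variables and confirms the two sets of coefficients agree after that rescaling.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000

# Hypothetical predictors on very different scales.
fsp = rng.normal(80, 15, n)            # stand-in for a prior teacher-rated score
income = rng.normal(30_000, 8_000, n)  # stand-in for household income
ks1 = 0.05 * fsp + 0.00002 * income + rng.normal(0, 1, n)

def ols(y, *xs):
    """Ordinary least squares slopes (an intercept is fitted, then dropped)."""
    X = np.column_stack([np.ones_like(y), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

def z(v):
    """Standardise to mean 0, SD 1."""
    return (v - v.mean()) / v.std()

raw = ols(ks1, fsp, income)
beta_std = ols(z(ks1), z(fsp), z(income))

# Rescaling the raw slopes by sd(x)/sd(y) reproduces the standardised betas.
rescaled = raw * np.array([fsp.std(), income.std()]) / ks1.std()

print("raw slopes:          ", raw)
print("standardised betas:  ", beta_std)
print("rescaled raw slopes: ", rescaled)
```

The raw income slope looks negligible only because income is measured in pounds; on the standardised scale the two predictors can at least be compared, which is why the authors report betas.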

One little-reported conclusion: The child’s earlier academic performance, as measured by the Foundation Stage Profile (FSP) score, was identified as the most significant predictor of later academic attainment as measured by KS1 performance.

Another little-reported conclusion: Among the family socio-economic characteristics, parental education remained significantly associated with the KS1 outcomes, after controlling for all other variables in the model. Household income appeared to be an independent risk factor for overall KS1 performance, as did lone parenthood for KS1 maths attainment.

Comment: The first part of this conclusion is what Heiner Rindermann found in many international samples: parental education is more important than parental wealth. That raises the possibility that unmeasured genetic factors make a contribution.

Although the authors have not provided what I regard as a proper comparison between schools, they surprisingly say:

These differences have developed over a short period of time, since the children began compulsory schooling. The findings support the divergence hypothesis (e.g. Linchevski & Kutscher, 1998) which is of particular concern given that prior teacher rated ability at age five was taken into account, along with a range of child and family and school factors.

I am not persuaded on the basis of this paper that “these differences have developed” as a consequence of schooling. I will of course check to see what further analyses they may have done. There might be no differences in standard deviations between the two groups, so it may be a moot point.

Under “Implications” they write: The evidence from this and earlier research demonstrates that streaming does not of itself raise attainment for all children (e.g. Barker Lunn, 1970; Ferri, 1971) and widens the gap between low and high attaining pupils. Schools need to take this into account when planning the ability grouping structures that they adopt.

I do not think they can argue that on the basis of their results. They have already said that the prior measures of the Foundation Stage Profile account for a large part of the variance in children’s attainments. The foundation profile has a large gaping hole in it (see below). They have not fully explored the reasons for possible differences between the streamed and un-streamed children, such as the possibility that schools stream precisely where there are wide differences in ability.

What dog did not bark in the night? There are no cognitive ability measures reported. None. Why do so many authors fail to consider that intelligence may be a factor in educational attainment? Why leave this out, when it can be measured quickly, and always accounts for a significant proportion of educational outcomes?
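The cost of leaving out a direct ability measure can be illustrated with a simulation. Everything in it is hypothetical: a latent ability drives both the chance of being in a streaming school and attainment at 7; the prior measure is ability plus an equal amount of noise (a stand-in for a weak teacher rating); and streaming has no true effect. The logistic constants are arbitrary, chosen only so that a minority of children are streamed. This is ordinary residual confounding from a mismeasured covariate, offered as an illustration of the argument and not as a re-analysis of the Millennium Cohort data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical model: latent cognitive ability drives everything.
ability = rng.normal(0, 1, n)
# A weak prior measure (think of a teacher rating): ability plus as much noise.
prior_rating = ability + rng.normal(0, 1.0, n)
# Lower-ability children are more likely to be in streaming schools.
p_stream = 1 / (1 + np.exp(1.6 + 0.8 * ability))
streamed = (rng.random(n) < p_stream).astype(float)
# Attainment at 7 depends on ability only: streaming has NO true effect here.
attainment = ability + rng.normal(0, 0.5, n)

def ols(y, *xs):
    """Ordinary least squares slopes (an intercept is fitted, then dropped)."""
    X = np.column_stack([np.ones_like(y), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

naive = ols(attainment, streamed)[0]                      # no controls
with_rating = ols(attainment, streamed, prior_rating)[0]  # noisy control
with_ability = ols(attainment, streamed, ability)[0]      # true control

print(f"'streaming effect', no controls:          {naive:+.2f}")
print(f"'streaming effect', controlling rating:   {with_rating:+.2f}")
print(f"'streaming effect', controlling ability:  {with_ability:+.2f}")
```

Controlling for the noisy rating shrinks the spurious “streaming effect” but does not remove it; only the direct ability measure brings it back to roughly zero.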

Finally, here is my summary:

Although sample sizes are small and the prior measures of ability are weak, those prior abilities are the best predictors of attainments at age 7, and although we cannot be sure that streamed schools haven’t got a wider range of abilities than un-streamed schools, nonetheless it looks as if streaming does not lift the overall abilities of students.

Snappy headlines are one of my most evident whole-person special skills.



I. J. Deary, S. Strand, P. Smith and C. Fernandes (2007). Intelligence and educational achievement. Intelligence, 35(1), 13-21.


  1. Not sure about streaming in elementary school - much of what I have read suggests that benefits of this approach are more important from early adolescence on? And, those benefits are disputed.

    But, working in school systems where streaming is taboo on ideological grounds, often until year 11, I marvel at the poor maths teachers who must prepare material for 16-year-olds who range in ability from mild-moderate ID all the way up to those preparing for university. The chimera of maximum freedom and maximum equality, I guess.

    It is nearly impossible to fully consider the willful foolishness of some education systems in these and other matters. I greatly fear that competitor nations are going to leave some of us in the dust because of our wrong-headed ignoring of the science around how people actually learn.

    Again, the aim is the illusion of maximum freedom, with equality. I have actually sat in a room with 100+ teachers, and been told "We don't have equality of input, we must strive for equality of output", meaning that the goal of the education system was to reduce the differences between students over time. The dream (I guess, although it would not have been stated in these terms) was to both shift the whole bell-curve to the right, and to markedly reduce the spread of scores. In academic achievement (rather than cognitive ability) terms, the bell-curve has instead been shifted left, the spread of scores remains the same, and the frequency of scores at the right-tail has decreased.

    One example. When maths and science achievement is celebrated in the system I am referring to, perusal of the photos of students so celebrated (in the local papers) skew 2/3 to females rather than males. So, either the dominance of males at the elite level in those endeavors elsewhere has been challenged and overcome by brilliant teaching, or.......

  2. Yes, I think there is a large element of wishful thinking.
