Goodness of fit is an alluring concept. First, it is a good name for a statistical procedure, suggesting that one is on the side of the angels, and above all the devilish tricks of statisticians. Second, it describes how well a set of observations fits a theoretical model. The smaller the discrepancy between the observed values and those expected under the model, the better the fit.
Question is, fit with what? Chi square simply tells you the extent to which a particular frequency of observed values fits what would be expected from, usually, a chance model. It depends on a model, which in turn depends on a set of assumptions. In simple cases the chance model is fine. In more complicated cases the expected frequencies are somewhat harder to calculate.
However, that is by no means the main problem. Non-statisticians use a different heuristic, and count the number of points of concordance between a narrative and a set of observations. It is goodness of fit only in the sense of a comfortable and convincing similarity. Chi square it ain’t.
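The statistical version of goodness of fit can be made concrete with a toy example. The numbers below are invented for illustration (a die rolled 60 times against a fair-chance model of 10 per face); only the chi-square formula itself is standard.

```python
# Chi-square goodness of fit for a hypothetical die rolled 60 times.
# Observed counts are made up for illustration; the chance model
# expects 10 of each face.
observed = [12, 8, 11, 9, 13, 7]
expected = [10] * 6

# Sum of (observed - expected)^2 / expected over all categories.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))  # → 2.8
```

With 5 degrees of freedom the 5% critical value is about 11.07, so a statistic of 2.8 means these counts fit the chance model comfortably. That, and not a count of narrative concordances, is what the statistician means by "fit".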
For example, in buying a car one might make a list of desirable characteristics, and then measure the extent to which each car “ticks the right boxes”. Car manufacturers know this, so they construct the list for you, and then reveal the perfect fit: “Our car has doors, and you wanted doors, so 1 point to us” and so on. Fitting the facts to a narrative often follows a similar, self-serving, confirmation bias. People tend to count the points of concordance without looking at individual probabilities.
In trying to make sense of the mysterious loss of Malaysian Airlines MH370 many people want to start with the narrative. In a well-known example of a simple explanation: if there had been a fire on board, and if that fire damaged communication systems, and if the pilots had set a course to the nearest safe airport, then they might have been overcome by fumes and carried on flying in the same automatic westerly direction until either the fire consumed the plane, or the fuel ran out. Most convincingly, the map of the safe airport showed it had the characteristics the narrative required: an approach over water to a particularly long airstrip on which you would wish to land if you were on fire. Spooky.
Of course, it is better to assemble the facts first, and display them with their error terms. Some of those basic facts, and the error terms, have been difficult to track down. Not all of them. Inmarsat plotted two possible trajectories, with associated error bands, which appear to give objective guidance, even though they are based on very new types of inference. More confusing were the timelines of events, and now even the cockpit-to-control-tower conversations, some of which have reportedly been challenged because of translation errors. The Malaysian Government read out an urgent note about apparent debris spotted by a Chinese satellite, but gave very large dimensions which turned out to be wrong.
The Bayesian approach would be to look at each of the assumptions in terms of probabilities, and then establish confidence limits for those probability estimates. The chain of assumptions contains some that can vary semi-independently, and others which appear to cancel each other out. If a plane has crashed somewhere, it is not likely to keep "pinging" in a way that can be received by an antenna on a satellite, since the capacity to "ping" depends on having an intact system with a power supply.
Equally, if a camera on a satellite shows something floating on the sea (or just under the surface) then the significance of that depends, first, upon the error rate of interpretation (that the signal relates to something solid, and not a pattern of waves) and, more importantly, on a probabilistic judgment about whether the something is likely to be part of a plane or a container, pallet, or glutinous conglomeration of pallets, plastic bags, bottles, trainers and yellow plastic ducks (all of which have polluted the oceans and improved our detailed knowledge of sea currents).
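The bookkeeping for a chain of assumptions can be sketched very simply. The probability ranges below are entirely invented for the fire narrative, and multiplying them assumes the assumptions are independent, which, as noted above, they are only partly; treat the result as a rough bound, not an estimate.

```python
from math import prod

# Hypothetical (low, high) probability bounds for each link in the
# fire narrative. The values are illustrative, not estimates.
assumptions = {
    "fire on board": (0.01, 0.05),
    "fire disables comms but not autopilot": (0.10, 0.50),
    "pilots divert, then succumb to fumes": (0.05, 0.30),
}

# Under an independence assumption, the joint probability of the whole
# chain lies between the product of the lows and the product of the highs.
low = prod(lo for lo, hi in assumptions.values())
high = prod(hi for lo, hi in assumptions.values())
print(f"joint probability between {low:.1e} and {high:.1e}")
```

The point of the exercise is not the numbers but the shape of the result: a narrative built from three individually plausible steps can still have a small joint probability, which is exactly what counting points of concordance conceals.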
Doctors face these sorts of dilemmas every time they cannot reach a diagnosis. They generally ask for more tests, which may illuminate or delay the decision. Privately, they often use a frequency table, on the basis that frequent things occur frequently, and that is the best and most defensible guide to action.
Consider the following data from Flight Global showing why planes have been lost during level flight, which is usually the safest part of a plane journey: Sabotage 13, Loss of control 8, Airframe 8, Explosion or fire 4, Collision 4, Hijack 2, Ditching 1, Power loss 1, Shot down 1, Unknown 4 (includes MH370).
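Turning those counts into the doctors' frequency table is a one-liner: each prior probability is just a cause's count over the total. A minimal sketch, using the Flight Global figures above:

```python
# Flight Global counts of losses during level flight (from the text).
counts = {
    "Sabotage": 13, "Loss of control": 8, "Airframe": 8,
    "Explosion or fire": 4, "Collision": 4, "Hijack": 2,
    "Ditching": 1, "Power loss": 1, "Shot down": 1, "Unknown": 4,
}

total = sum(counts.values())  # 46 losses in all, including MH370
priors = {cause: n / total for cause, n in counts.items()}

# Print the causes from most to least frequent.
for cause, p in sorted(priors.items(), key=lambda kv: -kv[1]):
    print(f"{cause:18s} {p:.1%}")
```

Sabotage comes out at roughly 28%, with loss of control and airframe failure at about 17% each, which is why frequent things occurring frequently makes sabotage the starting suspect.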
In terms of prior probability you would go for sabotage as the primary suspect, followed by loss of control or airframe failure. Given that no debris was found at the point of last transmission, or nearby, it seems there was no bomb, no loss of control, no airframe disintegration, no explosion, probably no fire, no collision (pretty sure of that), no ditching, no power loss and no shooting down (unless there is one hell of a cover-up).
Looks like hijack, in the sense of hijack by pilot for reasons unknown. However, out of 45 planes falling down from level flight before this one, hijack accounted for 2 and unknown for 3. So, looks like unknown, possibly hijack. Time to send some satellites to look at the debris to the West of the Maldives, just as a control case.
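The elimination argument above amounts to striking the ruled-out causes from the frequency table and renormalizing what is left. A sketch, using the counts before MH370 (so "Unknown" is 3, not 4):

```python
# Causes surviving the no-debris-at-last-transmission argument,
# with MH370 itself excluded from the Unknown count.
surviving = {"Hijack": 2, "Unknown": 3}

total = sum(surviving.values())
posterior = {cause: n / total for cause, n in surviving.items()}
print(posterior)  # → {'Hijack': 0.4, 'Unknown': 0.6}
```

Which is the conclusion in the paragraph above in numerical dress: unknown edges out hijack, 3 to 2, once everything else is crossed off.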
Finally, at what level of cost will governments begin to lose interest? I predict by the 35th day after the disappearance, when the black box pinger stops, everything will be scaled back, and the searchers will return to the statistics lab, until a bit of the tailplane shows up years later in a fishing net.