Most of what the media make a fuss about concerning health or diet should not be believed.
It should not be believed even when it cites peer-reviewed articles or official guidelines. All too often the claims are based on misuse of statistics and an abuse of common sense.
That little rant was set off by a piece in the august New York Times: “Pollution leads to greater risk of dementia among older women, study says”.
Alarms were triggered:
“Older women”: Only among older and not younger? Women but not men?
The original article did not improve my mood:
The pollution actually studied was “fine particulate matter, P.M. 2.5, 2.5 micrometers or smaller in diameter”: What about 2.5 to 3, say? Or 3 to 4? And so on.
“Women with the genetic variant APOE4, which increases the risk of Alzheimer’s disease, were more likely to be affected by high levels of air pollution”:
Is this asserting that there’s synergy? That the combined effect is not just the added effects of the two factors? That pollution is not just an independent risk factor but somehow is more effective with APOE4 carriers? So what about APOE3 or APOE2 carriers?
The New York Times piece mentioned some other studies as well:
“[P]renatal exposure to air pollution could result in children with greater anxiety, depression and attention-span disorders”.
“[A]ir pollution caused more than 5.5 million premature deaths in 2013”.
With that sort of assertion, my mind asks, “How on earth could that be known?”
What sort of study could possibly show that? What sort of data, and how much of it, would be required to justify those claims?
So, with the older women and dementia, how were the observational or experimental subjects (those exposed to the pollution) distinguished from the necessary controls that were not exposed to pollution? Controls need to be just like the experimental subjects (in age, state of health, economic circumstances, etc.) with the sole exception that the latter were exposed to pollution and the controls were not.
For the controls not to be exposed to the pollution, obviously the two groups must be geographically separate. Then what other possibly pertinent factors differed between those geographic regions? How was each of those factors controlled for?
In other words, what’s involved is not some “simple” comparison of polluted and not polluted; there is a whole set of possibly influential factors that need somehow to be controlled for.
The more factors, the larger the needed number of experimental subjects and controls; and the required number of data points increases much more than linearly with the number of variables. Even just that realization should stimulate much skepticism about many of the media-hyped stories about diet or health. Still more skepticism is called for when the claim has to do with lifestyle, since the data then depend on how the subjects recall and describe how they have behaved.
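To make that concrete, here is a minimal illustrative sketch (my own, not taken from any of the studies discussed): if every confounding factor is split into just three levels and every combination of levels must contain enough subjects to support a comparison, the required sample size balloons as factors are added.

```python
# Illustrative only: how the number of subgroups ("strata") that must each be
# adequately populated grows as more confounding factors are controlled for.
# Assumes, purely for illustration, that every factor is split into 3 levels
# and that each stratum needs at least 30 subjects to support any comparison.

LEVELS_PER_FACTOR = 3
MIN_SUBJECTS_PER_STRATUM = 30

for n_factors in range(1, 9):
    n_strata = LEVELS_PER_FACTOR ** n_factors           # combinations multiply
    min_subjects = n_strata * MIN_SUBJECTS_PER_STRATUM
    print(f"{n_factors} factors -> {n_strata:6d} strata -> "
          f"at least {min_subjects:9,d} subjects")
```

With eight three-level factors there are already 6561 subgroups, far more than any study of a few thousand subjects could hope to populate.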
The dementia article was published in Translational Psychiatry, an open-access journal from the Nature publishing group. The study had enrolled 3647 women aged between 65 and 79. That is clearly too small a number for all possibly relevant factors to have been controlled for. Many details make that more than a suspicion, for example, “Women in the highest PM2.5 quartile (14.34–22.55 μg m−3) were older (aged ≥75 years); more likely to reside in the South/Midwest and use hormonal treatment; but engage less in physical activities and consume less alcohol, relative to counterparts (all P-values <0.05 …)” — in other words, the highest exposure to pollution was experienced by subjects who differed from controls and from other subjects in several ways besides pollution exposure.
At about the same time as the media were hyping the dementia study, there was also “breaking news” about how eating enough fruit and vegetables protects against death and disease, based on the peer-reviewed article “Fruit and vegetable intake and the risk of cardiovascular disease, total cancer and all-cause mortality — a systematic review and dose-response meta-analysis of prospective studies”.
Meta-analysis means combining different studies, the assumption being that the larger amount of primary data can make conclusions stronger and firmer. However, that requires that each of the individual studies being drawn on is sound and that the subjects and circumstances are reasonably comparable in all the different studies. In this case, 95 studies reported in 142 publications were analyzed. Innumerable factors need to be considered — the specific fruit or vegetable (one cannot presume that apples and pears have the same effect, nor cauliflower and carrots); and the effects of different amounts of what is eaten must somehow be taken into account. There are innumerable variables, in other words, permitting considerable skepticism about the claims that “An estimated 5.6 and 7.8 million premature deaths worldwide in 2013 may be attributable to a fruit and vegetable intake below 500 and 800 g/day, respectively, if the observed associations are causal” and that “Fruit and vegetable intakes were associated with reduced risk of cardiovascular disease, cancer and all-cause mortality. These results support public health recommendations to increase fruit and vegetable intake for the prevention of cardiovascular disease, cancer, and premature mortality.” Skepticism is yet more called for since health and mortality are influenced to a great extent by genetics and geography, which were not controlled for.
The authors deserve credit, though, for the clause, “if the observed associations are causal”. What everyone should know about statistics is that correlations, associations, never prove causation. That law is almost universally ignored as the media disseminate press releases and other spin from researchers and their institutions, implying that associations are meaningful about what causes what.
It is easy enough to understand why considerable skepticism should be exercised with claims like those about mortality and diet or about dementia and pollution, simply because studies to test these claims properly would need to include much larger numbers of subjects. But an even greater reason to doubt such claims, as well as claims about newly approved drugs and treatments, is that the statistical analyses commonly used are inherently flawed, most particularly by a quite inadequate criterion for statistical significance.
Almost universally in social science and in medical science, statistical significance is defined as p≤0.05: the probability that results at least as striking as those observed would arise by random chance alone, if there were no real effect, is less than 5%, in other words less than 1 in 20.
Several things are wrong with that. Among the most serious are:
- That something is not a coincidence, not owing to random chance, does not tell us what it is owing to, what the cause is. It is not necessarily the experimenter’s hypothesis, yet that is the assumption made universally with this type of statistical analysis.
- 1 in 20 is a very weak criterion. It means that even where there is no real effect at all, 1 in every 20 tests will come out “statistically significant” purely by chance. Do 20 such studies, and on average one of them will be “statistically significant” even though the finding is spurious (see the simulation sketched after this list).
- That something is statistically significant does not mean that the effect is meaningful.
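A minimal simulation sketch (my own illustration, not from any of the papers discussed) makes that second point concrete: twenty “studies” in which the two groups are drawn from the same population, so that any “significant” result is spurious by construction.

```python
# Minimal illustration of point 2: twenty studies compare two groups drawn
# from the SAME population, so the null hypothesis is true by construction
# and every "significant" result is necessarily spurious. With the usual
# p <= 0.05 threshold, on average about one of the twenty slips through.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
false_positives = 0
for study in range(20):
    group_a = rng.normal(loc=0.0, scale=1.0, size=100)
    group_b = rng.normal(loc=0.0, scale=1.0, size=100)  # same distribution
    result = stats.ttest_ind(group_a, group_b)
    if result.pvalue <= 0.05:
        false_positives += 1

print(f"'Significant' results among 20 studies of no real effect: {false_positives}")
```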
To illustrate the third point: after I had a TIA (transient ischemic attack, minor stroke), the neurologist automatically prescribed the “blood thinner” Plavix, clopidogrel, as lessening the risk of further strokes. I am wary of all drugs since they all have “side” effects, so later I searched the literature and found that Plavix is statistically significantly better at decreasing risk than is aspirin, p = 0.043, better than p≤0.05. However, the event rates found were just 5.83% compared with 5.32%: to my mind not at all a significant difference, and not enough to compensate for the greater risk of “side” effects from clopidogrel than from aspirin, which has been in use for far longer by far more people without discovery of seriously dangerous “side” effects. (Chemicals don’t have two types of effect, main and side, those we want and those we don’t want. “Side” effects are just as real as the intended effects.)
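The practical triviality is easy to quantify with back-of-the-envelope arithmetic (an illustrative sketch, assuming the two quoted percentages are the event rates in the two treatment arms):

```python
# Back-of-the-envelope arithmetic (illustrative, assuming the two quoted
# percentages are the event rates in the two treatment arms): even a
# "statistically significant" difference can be practically tiny.
rate_worse = 0.0583   # event rate in one arm (5.83%)
rate_better = 0.0532  # event rate in the other arm (5.32%)

arr = rate_worse - rate_better   # absolute risk reduction
nnt = 1 / arr                    # number needed to treat for one fewer event
rrr = arr / rate_worse           # relative risk reduction

print(f"Absolute risk reduction: {arr:.2%}")   # ~0.51 percentage points
print(f"Number needed to treat:  {nnt:.0f}")   # ~196 patients per event avoided
print(f"Relative risk reduction: {rrr:.1%}")   # ~8.7%
```

On those assumptions, roughly 200 people would have to take the “better” drug rather than the other for a single additional event to be prevented.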
Many statisticians have pointed out, for many years, what is wrong with the p-value approach to statistics and its use in social science and in medical science. More than two decades ago, an editorial in the British Medical Journal pointed to “The scandal of poor medical research” [i], with incompetent statistical analysis one of the prime culprits. Matthews [ii] has explained point 1 above clearly. Colquhoun [iii] explains that p ≤ 0.05 yields wrong conclusions even more often than 1 in 20 times: “If you use p=0.05 to suggest that you have made a discovery, you will be wrong at least 30% of the time”. Gigerenzer [iv] has set out in clear detail the troubles with the commonly used p-value analysis.
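Colquhoun’s figure can be reproduced with a simple false-discovery-rate calculation in the spirit of his paper (an illustrative sketch; the assumed prior probability of a real effect and the assumed statistical power are my illustrative choices, not numbers from the studies discussed above):

```python
# False-discovery-rate arithmetic in the spirit of Colquhoun's argument.
# Assumed, for illustration: 10% of tested hypotheses are actually true,
# studies have 80% power, and the significance threshold is p <= 0.05.
n_tests = 1000
prior_true = 0.10   # fraction of hypotheses that are really true (assumption)
power = 0.80        # probability of detecting a real effect (assumption)
alpha = 0.05        # conventional significance threshold

true_effects = n_tests * prior_true
null_effects = n_tests - true_effects

true_positives = true_effects * power    # 80 genuine discoveries
false_positives = null_effects * alpha   # 45 spurious "discoveries"

false_discovery_rate = false_positives / (true_positives + false_positives)
print(f"Fraction of 'discoveries' that are wrong: {false_discovery_rate:.0%}")  # ~36%
```

Under those assumptions, more than a third of the “discoveries” declared at p ≤ 0.05 are wrong, which is the point of Colquhoun’s “at least 30%”.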
Nevertheless, this misleading approach continues to be routine, standard, because it is so simple that many researchers who have no real understanding of statistics can use it. Among the consequences is that most published research findings are false [v] and that newly approved drugs have had to be withdrawn sooner and sooner after their initial approval [vi].
Slowly the situation improves as systemic inertia is penetrated by a few initiatives. A newly appointed editor of the journal Basic and Applied Social Psychology (BASP) announced that p-value analyses would no longer be required [vii], and soon after that they were actually banned [viii].
In the meantime, however, tangible damage is being done by continued use of the p-value approach in the testing and approval of prescription drugs, which adds to a variety of deceptive practices routinely employed by the pharmaceutical industry in clinical trials; see, for example, Ben Goldacre, Bad Pharma: How Drug Companies Mislead Doctors and Harm Patients (Faber & Faber, 2013); Peter C. Gøtzsche, Deadly Medicines and Organised Crime: How Big Pharma Has Corrupted Healthcare (Radcliffe, 2013); David Healy, Pharmageddon (University of California Press, 2012). Gøtzsche and Healy report that prescription drugs, even when “properly” used, are the 3rd or 4th leading cause of death in developed countries.
***************************************************************************
[i] D. G. Altman, “The scandal of poor medical research”, BMJ, 308 (1994) 283
[ii] R. A. J. Matthews, “Facts versus Factions: The use and abuse of subjectivity in scientific research”, European Science and Environment Forum Working Paper (1998); pp. 247-82 in J. Morris (ed.), Rethinking Risk and the Precautionary Principle, Oxford: Butterworth (2000)
[iii] David Colquhoun, “An investigation of the false discovery rate and the misinterpretation of p-values”, Royal Society Open Science, 1 (2014) 140216; http://dx.doi.org/10.1098/rsos.140216
[iv] Gerd Gigerenzer, “Mindless statistics”, Journal of Socio-Economics, 33 (2004) 587-606
[v] John P. A. Ioannidis, “Why most published research findings are false”, PLoS Medicine, 2 (#8, 2005) 696-701; e124
[vi] Henry H. Bauer, Dogmatism in Science and Medicine: How Dominant Theories Monopolize Research and Stifle the Search for Truth, McFarland, 2012, Table 5 (p. 240) and text pp. 238-42
[vii] David Trafimow, Editorial, Basic and Applied Social Psychology, 36 (2014) 1-2
[viii] David Trafimow & Michael Marks, Editorial, Basic and Applied Social Psychology, 37 (2015) 1-2; comments by the Royal Statistical Society and at https://www.reddit.com/r/statistics/comments/2wy414/social_psychology_journal_bans_null_hypothesis/