The evolving insignificance of significance
When I was an impressionable undergraduate in environmental
sciences at the beginning of the 1980s, statistics was just hitting the main
fashion runways of biology and especially ecology. We didn't know why, but we were pummeled with Fisherian statistical training that required studying, cover to cover, the works
of authors such as Sokal, Rohlf, and Zar.
We learned techniques such as analysis of variance and its variants and the
algebra of factor analyses, and without computers, I might add. Computing was taking off, so my cohort and
those just ahead of us teaching statistics soon became well-trained users of
SAS and SPSS. I worked in a research
group with a variety of graduate students who discussed at length topics such
as ANOVA, regression, and the emerging applications of multiple linear
regression and PCA in biology and ecology.
These were heady days when such statistics were, as far as I knew, the perceived
new elemental particle of ecology.
It wasn't until I became a graduate student in the mid-1980s
that I realized that asking “why” had a wider required application than just my
research. I asked my mentors why
statistics had become the all-consuming fashion in ecology. The best answer, or at least the one I
understood to underlie the fashion, was the perceived, absolute need for ecology
to become a “hard” science like physics.
Ecology was considered “soft”, also described as not rigorously absolute,
and therefore the discipline was perceived to border on non-science, and this
was not acceptable. The explanation
included the statements: ecology needed to move away from its natural history
roots; we know enough about the natural world already; and, ecology needs to get
structured, synthesize, and this rigorous mathematics is the ticket to
salvation. I didn't understand who would
make such determinations nor did I understand fashion (as pictures from the
time prove), but I did respect the status of my mentors and as a former
mathematician I understood the rigorous nature of numbers.
I was an acceptable mathematician and computer programmer,
so I fit well within this intensifying fashion and helped many colleagues
including mentors with their struggles with these statistics. My science grew as I churned through the
modern scientific method of questions, hypotheses, predictions, and tests of
predictions. But I also watched several
of my more senior mentors struggle to adjust to Fisher’s statistics, which were creating
a fundamental, philosophical change in how we studied natural environments.
For about ten years I acceptingly immersed myself in the
forced application of applied statistics in my natural history studies, which
included both the biological and physical sciences of natural ecosystems. I was publishing papers, theses, and reports
with peer-reviewed and accepted statistical analyses. “Why” questions endlessly nag me because of my
nature, but the “why these statistics” question began to overwhelm me with my
own recurring and other published results of statistical analyses that simply
affirmed the obvious or where significance was biologically, physically, or
chemically irrelevant for the ecosystem.
It took those ten years for me to begin to truly comprehend the
statistics we were using in the environmental sciences and, importantly, the insignificance
of statistical significance.
That knowledge came from battles fought in my trenches of
ecological research, i.e., the required long, hard hours in unrelenting weather
collecting data about living creatures and their natural habitats, replicating such
among cohorts, and then the battles with editors and reviewers arguing that 1-2
populations or years of replication is the first and probably the last set of
data that we will achieve towards answering the asked question. The take home messages of my battles were: Fisherian
statistical zealotry was the environment we had built; fashion is more
important than content; conform or be cast out.
I was in a minority group asking challenging questions about
the foundations of the statistics driving our science. The majority were caught up in the power and
magic of numbers. Those who could
understand the analyses were the fashion gurus.
They used the mystification of mathematics, because the vast majority of
humans don’t get mathematics, to drive our science ruthlessly through a maze of
confusion that produced very little advancement and definitely not the
Nobel Prize-winning inspirations most believed they were going to produce at
any moment (this was the unwritten conclusion expressed in their publications
and conference presentations). But this
was a lesson in “fashion”, a false human construct (yet interestingly arising
from most probably an evolutionary process of selection) and, in addition, my community
of research was male-dominated, so the essence of megalomania was well entrenched. It was not in the best interests of the gurus
to understand or inform on their weaknesses, e.g., the true assumptions of
statistical analyses and probability theory, or to tame the zealots.
The simple and definitive flaw for Fisher-based statistics
as an elemental particle of environmental sciences is the natural environment
itself. Statistics was created within the
bounded world of numbers. Natural
environments are variable, and boundaries, while seemingly apparent, are in fact rather
difficult and maybe impossible to define.
Moreover, living creatures exist only because of variability of form across
all scales from genes to individuals to populations to communities to
ecosystems. You don’t have to be a
scientist to realize the incongruity of applying an analytical process based on
rigid boundaries to a highly variable system with indefinable boundaries. This realization eluded me in the beginning because
fashion can be alluring, and that became my first hard lesson in science.
I now teach what I consider to be hard-won rules about statistics
in the environmental sciences. Natural
variability crushes all assumptions of any currently used Fisher-based statistical
analysis. Attempts to address this fact,
i.e., “tests of assumptions”, are used only to mollify your wrong application of
the analysis. I present a theorem: If you
need statistics, then you should have designed a better experiment
(somebody else’s statement, most probably that of the physicist/chemist Lord Ernest Rutherford);
and its corollary: If you need statistics
to prove it is significant, then it isn't significant. My students now should be talking about
probabilities and multiple working hypotheses.
All this is not intended to instigate a movement to abolish Fisher’s
statistics from the environmental sciences.
Indeed these statistics have some very useful purposes when applied
properly and when the limits of analyses are well described. In the realm of environmental regulation,
constraints of our legal system force heavy dependence on understanding the
state of “normal”, variability associated with normal, and defining what is not
normal, thus requiring structured applications of statistical
analyses. Perhaps Fisher’s greatest gift
to the environmental sciences is the impact on what is now the foundation for
sampling variable systems, i.e., the necessity to randomly sample among and
within all possible habitats/locations (e.g., stratified random sampling). There is no reason not to use tools such as
analysis of variance or linear-based regression to help you understand and then
explain a hypothesis, but that can’t be the only tool in your toolbox. Statistics’ “null hypothesis” hasn't been as
kind to science because it created widespread confusion about the logic of “falsifiability”,
which underlies our modern scientific method (another debatable statement, but
the two concepts are not related). But in
the not too distant future, Bayesian statistics and probability theory might prove
a saviour of environmental science once this community of researchers re-learns
there are no absolutes in the natural world.
I’m not sure if my world of science and research is changing
its fashion, but there are fewer demands from journal editors and their
reviewers to include meaningless statistical analyses and such demands are more
easily refuted during the review process.
I still receive editor-level rejections based on the proposition that my
studies in natural history lack sufficient replication. Rejection based on poor judgement is
difficult to accept, but you learn that in the short term, fashion is more
important than content and revolutions don’t happen overnight. Anyway, I published those articles in
journals with better impact factors, which are another statistic created for
fashion and a topic for a future essay.
Some interesting reading on this topic:
Cohen, J. 1994. The earth is round (p < .05). American Psychologist 49:997-1003.
Fisher, R.A. 1956. Statistical methods and scientific inference. New York, NY: Hafner.
Lawrence, P.A. 2007. The mismeasurement of science. Current Biology 17(15):R583-R585.
Pigliucci, M. 2002. Are ecology and evolutionary biology ‘soft’ sciences? Annales Zoologici Fennici 39:87-98.
The American Psychological Association’s Task Force on
Statistical Inference: http://www.apa.org/science/leadership/bsa/statistical/tfsi-initial-report.pdf
An interesting essay on the reality of Fisherian
statistics:
http://www.creative-wisdom.com/computer/sas/math_reality.html
Great post, Allen. I share your frustration with how often blind use of Fisherian null hypothesis tests has been substituted for careful logical reasoning in ecological data analysis. As described in the Cohen (1994) reference you provided in your list of interesting reading on this topic, many of the problems associated with null hypothesis testing are related to the consistent use of an arbitrary statistical decision-making threshold (the significance level alpha = 0.05).
I've recently been working on this problem and I've developed a simple method for calculating study-specific significance levels that are tied to biological relevance and minimize Type I and Type II errors, subject to their relative costs. I feel that use of this approach would vastly improve interpretations made from null hypothesis tests in ecology.
Here's a paper providing a general description of the approach and instructions on how to apply it:
Mudge, JF, LF Baker, CB Edge and JE Houlahan. 2012. Setting an optimal α that minimizes errors in null hypothesis significance tests. PLoS ONE 7(2): e32734.
Here's a paper applying the approach to re-evaluate a decade of decisions made under Canada's Environmental Effects Monitoring program:
Mudge, JF, TJ Barrett, KR Munkittrick and JE Houlahan. 2012. Negative Consequences of Using α = 0.05 for Environmental Monitoring Decisions: A Case Study from a Decade of Canada’s Environmental Effects Monitoring Program. Environmental Science and Technology 46: 9249-9255.
Here's a brief overview intended to describe the problem and proposed solution for a layperson audience interested in statistics:
Baker, LF, and JF Mudge. 2012. Making statistical significance more significant. Significance 9(3): 29-30.
Thoughtful consideration needs to be the foundation of data interpretation in ecological research. I hope to encourage this for users of null hypothesis tests by providing an approach for setting study-specific significance levels that minimize Type I and relevant Type II errors through a priori consideration of what effects would be considered biologically relevant to detect.
Joe Mudge
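The idea in the comment above, choosing a study-specific significance level that balances Type I and Type II errors at a biologically relevant effect size, can be illustrated with a rough sketch. This is not the implementation from the papers cited above. It assumes, purely for illustration, a one-sided two-sample z-test with a known standard deviation of 1, and the function name `optimal_alpha` and its parameters (`effect_size`, `n_per_group`, `cost_ratio`, `steps`) are hypothetical.

```python
from statistics import NormalDist


def optimal_alpha(effect_size, n_per_group, cost_ratio=1.0, steps=10000):
    """Grid-search the alpha that minimizes a cost-weighted average of
    Type I (alpha) and Type II (beta) error probabilities.

    Illustrative assumptions (not taken from the cited papers):
    - one-sided two-sample z-test with known standard deviation of 1
    - effect_size is the smallest difference in means considered
      biologically relevant, in standard-deviation units
    - cost_ratio is the cost of a Type I error relative to a Type II error
    """
    std_normal = NormalDist()
    # Mean of the test statistic if the relevant effect really exists.
    shift = effect_size * (n_per_group / 2) ** 0.5
    best = None
    for i in range(1, steps):
        alpha = i / steps
        z_crit = std_normal.inv_cdf(1 - alpha)
        # Probability of missing the relevant effect at this alpha.
        beta = std_normal.cdf(z_crit - shift)
        combined = (cost_ratio * alpha + beta) / (cost_ratio + 1)
        if best is None or combined < best[0]:
            best = (combined, alpha, beta)
    return best[1], best[2]


if __name__ == "__main__":
    alpha, beta = optimal_alpha(effect_size=1.0, n_per_group=20)
    print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Under these assumptions and equal error costs, the search settles where alpha and beta are roughly equal, and the resulting alpha depends on sample size and the chosen effect size rather than being fixed at 0.05. Raising `cost_ratio` pushes the chosen alpha lower.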
Hey thanks for the comment. I'll enjoy reading your papers - send them along.
Allen