Monday 24 August 2020

Statistics and Ecologists Today

Statistics and Ecologists Today: More from the “Emperor Has No Clothes Chronicles”

In my opinion, the question most often asked by ecology practitioners today is “what statistical method should I use in my study?”.  Why? Because whether you are just beginning to learn about ecology, e.g., you are a first-year graduate student preparing your proposal, or you are further on and thinking about publishing in a peer-reviewed journal, the pressure is on to understand and decide upon your statistical approach.  It is a modern paradigm that statistics are fundamental in ecology, i.e., most likely your supervisor and your journal editors/reviewers will demand that statistical analyses be included in your publication.  The question of the method to use is both good and bad.  One of the paradigm’s great outcomes is the training of ecologists to design effective and meaningful studies (see also Prof. Krebs’s thoughts - https://www.zoology.ubc.ca/~krebs/ecological_rants/on-defining-a-statistical-population/#comments).  However, one massive failure is the mushrooming of complex statistical approaches and easy software packages that are neither as effective nor as meaningful as some ecologists wish them to be.

The Covid-19 lock-down let me catch up on some statistics papers I’d tucked into my “to-read” folder.  My younger colleagues are very bright and doing very complex statistical analyses that aren’t easy to understand, and I’m reviewing work using these techniques for journals and funding agencies, so I wanted to invest time exploring and hopefully learning more about these emerging statistical approaches and applications.  There are many excellent papers describing methods and applications that are clearly written by intelligent people who have spent time thinking about statistical approaches in ecology and, more generally, biology.  I’ve been reading about AAN, AIC, Bayes, CV-R2, GAM, GLMM, LLM, PLS, RDA, and RF, among others.

My first conclusion is that there is a direct, positive correlation between the abundance and complexity of data arising from emerging sampling tools, e.g., remotely sensed data in my world, and the abundance and complexity of statistical analyses, e.g., Lortie et al. (2020).  I posit that the correlation began about the time that SAS and its 2000-page manuals hit our desks in the 1980s (SAS 1989).  Here is a great quote that summarizes statistics in biology more broadly today:  “The suite of statistical tools available to biologists and the complexity of biological data analyses have grown in tandem…The availability of novel and sophisticated statistical techniques means we are better equipped than ever to extract signal from noisy biological data… [statistical] models are powerful yet complex tools.” (Harrison et al. 2018).  The quote is true regarding today’s much larger and more complex data sets and the complex statistical analyses, analytical approaches, and packages; however, the phrase “noisy biological data” glosses over the fact that it is the fundamental nature of biology to be messy and stay messy.  My second conclusion is that if you want to explain the fiery heat in your spicy chili, then trying to count and find patterns among the chili molecules isn’t a great investment of your time - “Black holes are simpler…But even if those equations could be solved for immense aggregates of atoms, they wouldn’t offer the enlightenment that scientists seek.” (https://aeon.co/ideas/black-holes-are-simpler-than-forests-and-science-has-its-limits).

It is the inherent nature of living things to be and stay messy.  All biological systems must have continuous variability and opportunities, random or not, to break moulds; otherwise selection for survival in a given environment can’t occur and life and lineages end.  This is evolution and, broadly, natural selection with some mutations thrown in along the way.  It is this dynamic variability of living systems that jams up biology as it tries to fit into classical, physics-based definitions of the natural world (e.g., Egler 1986; Pigliucci 2002).  Biology has one law for certain (for now at least):  there will be lots of fluxing and variability and the occasional, often unpredictable, mutation.  Stability is a major discussion point in ecology, but it is a temporal illusion because if you wait a few or 1,000,000 years, change will happen.

The rise of numeracy in biology is a great thing and there is no arguing that numbers are important and useful, especially in ecology.  The problem is that mathematics is bounded and, as a consequence, it doesn’t always get along with the especially “noisy” data of ecology.  Counting, measuring, and summarizing are cornerstones of ecology.  Correlations and relations between measured factors and comparisons among groups are all very useful for developing an understanding of ecological systems.  The clash comes when ecologists, seeing their complex data, become enthralled by the complex number-busters of mathematics, e.g., statistics.

The clash occurs because 99% of mathematicians don’t understand that 99% of the rest of the world doesn’t get math.  Then add to this the rise of the machines.  Today there isn’t much in the way of statistical analyses that anyone with a few moments of ‘online help’ can’t do, e.g., the rise of “R” (Lortie et al. 2020; and a useful overview is https://blog.eduonix.com/software-development/rise-r-programming-language-usefulness-data-science/).  The story-line has been as follows: there is very beautiful mathematics (more on this later), it is translated through a machine with the virtual pressing of a button and with instant results, there is a growing throng of intelligent “applied statistics” crusaders, and voilà, we have the perfect recipe for disaster.  The crusaders are smart people, know there is a math issue, and sincerely also advise ecologists to ‘consult a statistician’.  But this is one of the disconnects: normal people (the 99%) don’t understand that mathematicians can’t conceive of a system not bounded by equations, and they are very happy that anyone is interested in mathematics and will dive into any math problem presented.

Taking a few steps back, I’m a math nerd and have been since I was 8 years old and calculating the least expensive set of groceries on a hand-held, pocket counter while wheeling the cart through the store.  I went to university to study mathematics, did two years in Canada’s elite mathematics programme, realized that math had other uses (leading to my mostly ecology career), and 30 years later I now teach statistics to undergraduate and graduate students in the environmental sciences.  I have been in the community of mathematicians and, while I am generalizing about them for literary purposes (apologies to my math friends), I live at the mathematics-ecology nexus and I have happily added to the mathematics-ecology mash-up during my career, especially early on.

Math is cool even if 99% of us don’t get it.  Watching your hard work and collected data become a statistically significant regression or show statistically significant differences supporting your original hypothesis is a powerful moment, especially early in your career.  Turning your very large and complex, interconnected data set into a 2-dimensional principal component space is amazing.  These analyses, among many math applications, have moved ecology far beyond counting and measuring, and the math can be very informative for advancing our understanding of complex systems.  Math gives ecologists many useful tools.  But – and there is a big but – mathematics has rules and ecological systems flout every one of those rules.  Ecology, i.e., the study of living things and their environments, is inherently variable across space and time, within and among individuals, families, groups, populations, species, communities, ecosystems, and at many more levels we can’t yet comprehend.  The biological and environmental information collected today will vary later today, be different tomorrow, and so on.  Natural systems want to change and have to change, but math needs stability and it has boundaries.

I describe our current situation as statistics running amok over ecology at the beckoning of ecology.  The math is beautiful and the people applying it and creating software to perform the complex computations are more intelligent than I, but the ever-increasing birds-nests of statistical analyses are mostly unnecessary, as other people more intelligent than I have pointed out (see for example, Murtaugh 2009 and Amrhein et al. 2019).  Imagine I need to get from my home on the east coast of Canada to the west coast some 5,400 km away.  I used to drive a 1976 Chevy Nova, which was the most standard car design on the road for a couple of decades.  I could successfully achieve my goal by driving that car across Canada with some simple assumptions that hold true: I have a paper map, and in every town there is gas and a mechanic who can fix any problem.  Alternatively, I could drive a somewhat hypothetical, but close to reality, automobile of today that is self-driving, GPS linked and controlled, electrically fueled, and so on.  My assumptions are that self-driving is possible on all roads, none of my computer systems fail, the GPS satellites are detectable, Siri isn’t leading me astray, the computers that run the remote things don’t fail, and so on.  I also assume I can get these things fixed, but anyone who drives a lesser-imagined car today knows you can’t get it fixed unless you are at a big city dealer with a service computer plug-in for your car.  My analogy in statistical language: a simple t-test may not look pretty in today’s psychedelic statistical landscape, but it achieves the same result.

I’m also sensing a negative impact on the advancement of ecology because we get distracted creating and promoting more and more complex statistical analyses and software.  It is an ever-deepening rabbit hole because our inherently complex ecological systems are far beyond our current ability to comprehend, and creating more complex statistical models and computational processes will never advance our understanding of the original question in ecology.  Trying to extract a signal we won’t recognize from guaranteed-to-be-increasingly-complex and noisy data is a never-ending do-loop.

My take-home message is what I try to instill in my young learners in the environmental sciences: “If your experiment needs statistics, you ought to have done a better experiment.”  The statement is most often attributed to E. Rutherford, and I explain that he (or whoever) wasn’t slamming statistics but was appealing for better experimental design.  I follow with Curry’s Corollary: “If you need statistics to tell something is significant, then it is not significant”.  This is about the natural variability you will face, that there is no magic bullet to overcome it, and that a clear question and a good sampling design are the foundation you need to find, or get close to, a solution, including which, if any, statistical approach you choose to use in your studies.  Statistics is just a tool, like many that we use in the environmental sciences.  It has value, but it remains just one tool in your toolbox.  I also emphasize ad nauseam, including with journal editors, that a far better tool is a well-thought-out figure showing an effect/no effect.

I encourage and teach the use of mathematics, especially statistics, in all the environmental sciences.  These disciplines should be very grateful because statistics’ greatest gift has been teaching us to think about our questions, sampling design, and the interpretation and presentation of data (there are many useful guides, e.g., Kass et al. 2016; Zuur and Ieno 2016).

A final note to readers.  There aren’t many references herein on purpose.  If you feel the need for a rebuttal, then you will already have many, very effective references at your fingertips to slam into your response.  Or you will get my point, smile, and contemplate your next study design a priori over a beverage of your choice.

Allen

References:

Amrhein V, Greenland S, McShane B.  2019.  Scientists rise up against statistical significance.  Nature 567(7748):305-307.

Egler FE.  1986.  Physics envy in ecology.  Bulletin of the Ecological Society of America 67:233-235.

Harrison XA, Donaldson L, Correa-Cano ME, Evans J, Fisher DN, Goodwin CED, Robinson BS, Hodgson DJ, Inger R. 2018. A brief introduction to mixed effects modelling and multi-model inference in ecology. PeerJ 6:e4794.

Pigliucci M.  2002.  Are ecology and evolutionary biology ‘soft’ sciences?  Annales Zoologici Fennici 39:87-98.

Kass RE, Caffo BS, Davidian M, Meng X-L, Yu B, Reid N.  2016.  Ten simple rules for effective statistical practice. PLoS Comput Biol 12(6):e1004961.

Lortie CJ, Braun J, Filazzola A, Miguel F.  2020.  A checklist for choosing between R packages in ecology and evolution. Ecol Evol 10:1098-1105.

Murtaugh PA.  2009.  Performance of several variable‐selection methods applied to real ecological data. Ecology Letters 12:1061-1068.

SAS Institute Inc.  1989.  SAS/STAT user's guide, version 6.  4th ed. SAS Institute Inc., Cary, N.C.

Zuur AF, Ieno EN.  2016.  A protocol for conducting and presenting results of regression‐type analyses. Methods Ecol Evol 7: 636-645.

Saturday 14 March 2020

Defining Oneself

Each year my new graduate students take a course that requires a synopsis of their supervisor.  Each time I’m asked to describe myself, to define who I am, and I struggle to know what to say.  Curiously, my son pointed this out to me: “when people ask, you always skirt around a description of what you do”.  When my PhD student insisted I describe myself in a few lines for a project he was working on, I felt it was time to put some effort into understanding my apparently enigmatic storyline.

My first answer when asked who I am is typically, “I’m a natural historian”.  For my academic colleagues and my students, this statement prompts a look of uncertainty because the term evokes an image of a Victorian naturalist or monk collecting rocks, plants, or animals and describing them in detail, and maybe the differences within and among them.  So here is the long-winded explanation that goes with my natural historian self-moniker.

I’m a professor of biology, forestry, and environmental management, and very clearly ensconced in the world of modern science.  I am an academic studying animals (fish and invertebrates) and their habitats (rivers, lakes, coastal zones).  I sit in a Department of Biology most days.  As a biologist, I’m in a community that has spent 50+ years divorcing itself from its “natural history” legacy (see for example Ricklefs 2012) as the community pushed very aggressively to be a “hard science”, sometimes referred to as physics envy (e.g., Egler 1986).  For my biology colleagues and students who are well-trained, a.k.a. indoctrinated, claiming to be a natural historian is tantamount to heresy.  As an aside, check out an interesting read on science as a religion by Manson (2016) and the many references therein.  Back to my colleagues and students: surely you must be an “ecologist”, even if you lower your position in the community by calling yourself an “applied” ecologist because much of your science relates to answering questions that need answers today (e.g., Curry and Devito 1996; Monk et al. 2011; Freedman et al. 2012; Lento et al. 2018; O’Sullivan et al. 2019).  The term “applied”, when used in the academic, biological sciences community, is considered the lowest of castes because it is not “theoretical”, the status all biologists must now strive to achieve.  Only theoretical approaches will advance biology as a “hard science”; therefore, you are expected to do this for the community and your standing or rank is thusly judged.  This is a legitimate predicament for my biology students: they know their boss has a very successful career, but how is that possible as a “heretic” and, most importantly for them, “how will that impact me, my studies, and my career?”.

I also live in the academy of physical sciences, e.g., my NSERC Discovery Grant comes from the Geosciences group (NSERC Discovery is the pinnacle of recognition for Canadian academics).  Describing my life is generally easy for these colleagues and students because they are not hung up on a desire to be something else like my biology-type associates.  The dilemma for this community is the view that biology is a “soft”, lower caste of science.  How is it possible that a successful scientist could also be involved in such soft science, and for the students, “how will that impact me, my studies, and my career?”.

It is this apparent complexity of lives that confuses who I am to people outside science.  I am a professor, which invokes the typical “you work eight months of the year as a teacher”, which is partially true (some professors follow that model).  Once we dance around this for a few moments, I then get to describe what I “teach”.  I sometimes say I’m a biologist, which is generally understood – medicine, right?  I study fish, which is then quickly interpreted as “you are a marine biologist”, and some days I am.  I explain that most of my work is in freshwater, but that only confuses people because to them the fish and fish-like creatures of TV and movies are Jaws, Free Willy, Flipper, etc.  I can’t often talk about invertebrates because these are, when I am lucky, just insects (and they don’t live in water, do they?), or they are seafood.  Hydrology is easier to explain because I can talk about flooding and invoke the popular topic of climate change.  Trying to explain that I study how water flows across and through landscapes is too deep, so how cutting down trees impacts stream temperatures is usually a good storyline.

Right now, my two biggest projects involve a large dam removal and the regional scale hydrology of New Brunswick (my province in Canada).  Dam removal is pretty easy to understand, unless I get asked for more details because it is hard to explain in one sentence the breadth of my work from the ecology of fish, invertebrates, and macrophytes to the hydrodynamic modelling of rivers and engineering of fish passage.  The regional hydrology study explanations are quickly consumed and transformed into a conversation led by the questioner about the loss of buffer zones, too large cut blocks, poor forest roads, and always, industry is bad.

Which brings me back to my description of myself as a natural historian.  I choose that description because I want to invoke the image of Charles Darwin or Charles Lyell.  Not because they were great scientists whom I think I am like (they are way out of my league), but because their era’s detailed studies and descriptions of the natural world are what I do.  I’m interested in the very mundane, day-to-day structures and processes of the natural world, and what happens when we alter these.  I sometimes use a battlefield analogy to describe my work, an analogy where many may aspire to be majors and generals leading the way, but somebody still has to do the dirty work in the trenches, on the beaches, and door to door; in other words, I do the grunt work.

I’ve described myself as an explorer too.  I will go to the difficult places few or no others have gone before.  Eventually others may follow as pioneers and settlers.  I liked to turn over rocks as a kid because there are very intriguing things and creatures to discover.  It turns out the same discoveries occur when you turn over the “rocks” of our science world.  You discover that many of our modern ideas aren’t actually ours and we rarely care to acknowledge the science that came before us.  Our study of the hydrology of landscapes is rapidly expanding with amazing new tools such as remote sensing, with large and fine scale maps of surface and sub-surface attributes, and stable isotopes that give us an idea of water age.  Long forgotten is the same work written eloquently by, among others, Noel Hynes – The Stream and Its Valley (Hynes 1975), Tom Winter – Hydrologic Landscapes (Winter 2001), and Jack Stanford/James Ward – Hyporheic Corridors (Stanford and Ward 1993).  And in the biological sciences, Charles Darwin wrote about many and arguably most of the new ideas proposed as “modern” biology theories (see among many reviews, Boero 2015).

So, the next time you hear me being asked “Hey Allen, what do you do?”, know that all of these many storylines are streaming through my head as I try to decide which is the most appropriate response for the situation.  In the end, I remain a proud natural historian who is happy to use modern tools to explore nature, turn over rocks, get dirty, and find the best answers for today’s challenges today. 

REFERENCES
Boero, F.  2015.  From Darwin's Origin of Species toward a theory of natural history.  F1000prime reports, 7.
Curry, R.A. and K.J. Devito.  1996.  Hydrogeology of brook trout (Salvelinus fontinalis) spawning and incubation habitats: implications for forestry and land use development. Canadian Journal of Forest Research 26:767-772.
Egler, F.E.  1986.  Physics envy in ecology.  Bulletin of the Ecological Society of America 67:233-235.
Freedman, J.A., R.A. Curry, and K.R.M. Munkittrick.  2012.  Stable isotope analysis reveals anthropogenic effects on fish assemblages in a temperate reservoir. River Research and Applications 28:1804-1819.
Hynes, H.B.N.  1975.  The stream and its valley.  SIL Proceedings 1922-2010 19:1-15.
Lento, J., M.A. Gray, A.J. Ferguson, and R.A. Curry.  2018.  Establishing baseline biological conditions and monitoring metrics for stream benthic macroinvertebrates and fish in an area of potential shale gas development. Canadian Journal of Fisheries and Aquatic Sciences 999:1-15.
Manson, M.  2016. “The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life.”  Harper Collins. 
Monk, W.A., D.L. Peters, R.A. Curry, and D.J. Baird.  2011.  Quantifying trends in indicator hydroecological variables for regime-based groups of Canadian rivers. Hydrological Processes 25:3086-3100.
O'Sullivan, A.M., T. Linnansaari, and R.A. Curry.  2019.  Ice Cover Exists (ICE): A quick method to delineate groundwater inputs in running waters for cold and temperate regions. Hydrological Processes 33:3297-3309.
Ricklefs, R.E.  2012.  Naturalists, natural history, and the nature of biological diversity (American Society of Naturalists Address).  The American Naturalist 179:423-435.
Stanford, J.A., and J.V. Ward.  1993.  An ecosystem perspective of alluvial rivers: connectivity and the hyporheic corridor.  Journal of the North American Benthological Society 12:48–60.
Winter, T.C.  2001.  The concept of hydrologic landscapes 1.  Journal of the American Water Resources Association 37:335-349.