Reflections on Statistical Non-Significance

Statistical Hypothesis testing does an OK job at avoiding proving the presence of effects, but it does a mediocre job (or worse) at disproving them. There are a lot of reasons for this, poor training among them, but it is largely systemic. I spent my Thanksgiving morning watching the “Vanishing of the Bees,” and my mind kept drifting to thoughts of Type II error. I know. I can grasp the obvious … maybe I need a break.

I don’t have any biological expertise in evaluating, in detail, the research on either side of the fascinating Colony Collapse Disorder debate, but I am always suspicious of negative findings of any kind unless I can read the research. In the case of this documentary, they claim (a claim that is perhaps biased) that pesticides were determined to be safe after administering a fairly large dose to an adult bee, and determining that the adult bee did not die during the research period. Was that enough? I can’t speak to the biology/ecology research, but it got me thinking about Type II.

We know well the magnitude of the risk we face in committing Type I, and it is trained into us to the point of obsession. When meeting analysts wearing this obsession on their sleeve, reminding everyone who will listen, leveling their wrath on marketing researchers daring to use exploratory techniques, I am often tempted to ask about controlling for Type II. I am often underwhelmed with the reply. There are just so many things that can go wrong when you get a non-significant result. Although I wrote about something similar in my most recent post, I’m am compelled to reduce my thoughts to writing again:

1) The effect can be too small for the sample size. Ironically, the problem is usually the opposite. Often researchers don’t have enough data even thought the effect is reasonably big. In this case, I was persuaded by the documentary’s argument that bee “birth defects” would be a serious effect. Maybe short term adult death was not subtle enough. More subtle would require more data.

2) The effect can be delayed. My own works doesn’t involve bees, but what about the effect of marketing? Do we always know when a promotion will kick in? Are we still experiencing the effects of last quarter’s campaign? Does that cloud our ability to measure the current campaign? Might the effects overlap?

3) The effect could be hidden in an untested interaction (AKA your model is too simple). The bee documentary proposed an easy to grasp hypothesis – that the pesticide accumulates over time in the adult bee. Maybe a proximity * time interaction? We may never know, but was the sample size sufficient to test for interactions, or was Power Analysis done assuming only main effects. Since they were studying bee autopsies the sample size was probably small. I don’t know the going rate for a bee autopsy, but they are probably a bit expensive since the expertise would seem rare.

4) Or its hidden in a tested interaction (AKA your model is too complex). I had a traumatic experience years ago when a friend asked me what “negative degrees of freedom” were. Since she was not able to produce a satisfactory answer to a query regarding her hypothesized interactions, her dissertation committee required here to “do all of them”. Enough said. It was horrible.

5) The effect might simply be, and what could be more obvious, not hypothesized. This, we might agree, is the real issue regarding the adult bee death hypothesis. It may not have been the real problem at all.

Statistics doesn’t help you find answers. Not really. It only helps you prove a hypothesis. When you are lucky, you might be able to disprove one. Often, we have to simply “fail to prove”. In any case, I recommend the documentary. Now that I’ve been able to vent a bit about Type II, I should watch it again and focus more of my attention on the bees.

3 comments on “Reflections on Statistical Non-Significance

  1. Keith,

    A little suspicion never hurts. It’s interesting that you say you’re always suspicious of negative findings. Since negative findings – no effect – are rarely published, I would think that positive findings are even more deserving of suspicion. We’re more likely to see a false report of a positive finding than a negative one.


  2. Meta,

    Thanks for your comment. I think you are right. We don’t get to see most of these, but I think Type I gets more attention. When negative findings are part of a broader study, I think some analysts struggle with how to report it.

    In this case, nothing happened, so it was deemed to be safe. A close friend, a physiologist, is driven up the wall whenever she reads that “exercise didn’t help mitigate the disease/pain etc.”. I’ve told her the researcher probably didn’t really have the data to say that. “We failed to find a significant exercise effect” would probably be safer.


  3. Taleb in The Black Swan clarifies this issue by mentioning that a commonly used acronym in Medicine is NED (No Evidence of Disease). There is, he points out, no acronym END (Evidence of No Disease). Many doctors, apparently, confuse the two, just as many research authors seem to confuse the two.

Leave a Reply

Your email address will not be published. Required fields are marked *


* Copy This Password *

* Type Or Paste Password Here *

388 Spam Comments Blocked so far by Spam Free Wordpress

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>