When we get a statistical result, we too often jump immediately to the conclusion that the finding “is statistically significant” or “is not statistically significant.” While that is literally true, since we use those words to describe below .05 and above .05, it does not imply that there are only two conclusions to draw about our finding. Have we ruled out the possible ways that our statistical result might be tricking us?
Things to think about if it is below .05
Real: You might have a Real Finding on your hands. Congrats. Consider the other possibilities first, but then start thinking about who needs to know about your finding.
Small Effect: Your finding is Real, but is of no practical consequence. Did you definitively prove a result with an effect so small that there is no real world application of what you have found? Did you prove that a drug lowers cholesterol at the .001 level, but the drug only lowers it at a level so small that no Doctor or patient will care? Is your finding of a large enough magnitude to prompt action or to get attention?
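To make the Small Effect point concrete, here is a minimal sketch using only Python's standard library. The numbers are invented for illustration (a 0.5 mg/dL drop in cholesterol, a standard deviation of 30, and 100,000 patients per group), and the helper name `two_sample_z` is my own. It uses a simple z approximation with equal group sizes and a common standard deviation, so treat it as a rough picture rather than a proper analysis.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_diff, sd, n_per_group):
    """Return (two-sided p value, Cohen's d) for a difference in means.

    Assumes equal group sizes, a common known standard deviation,
    and a normal approximation. All inputs here are hypothetical.
    """
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    cohens_d = mean_diff / sd  # standardized effect size
    return p_value, cohens_d


# Hypothetical drug trial: tiny effect, enormous sample.
p_value, cohens_d = two_sample_z(mean_diff=0.5, sd=30.0, n_per_group=100_000)
print(f"p = {p_value:.5f}, Cohen's d = {cohens_d:.3f}")
```

The p value lands well below .001, yet the standardized effect is under two hundredths of a standard deviation. Reporting the effect size next to the p value is what lets the audience judge whether anyone should care.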
Poor Sample: Your data is not representative of the population. There is nothing you can do about that at this point. Are you sure you have a good sample? Did you start with a ‘Sampling Frame’ that accurately reflects the population? What was your response rate on this particular variable? Would the finding hold up if you had more complete data? Have you checked to see if the respondent and non-respondent status on this ‘significant’ variable is correlated with any other variable you have? Maybe you have a census, or you are Data Mining. If so, are you sure you should be focused on p values?
Rare Event: You have encountered that 5% thing. It is going to happen. The good news is we know how often it is going to happen. If you are like everyone else, you probably are operating at 95% confidence, and then each test, by definition, has a 5% chance of coming in below .05 from random forces alone when there is no real effect. So you have a dozen findings. Which ones are real? Was choosing 95% Confidence a deliberate and thoughtful decision? Have you ensured that Type I error will be rare? If you have a modest sample size, did you choose a level of confidence that gave you enough Statistical Power (see below)? If you are doing lots of tests (perhaps Multiple Comparisons), did you take this into account, or did you use 95% confidence out of habit?
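The arithmetic behind the "dozen findings" worry can be sketched in a few lines. This assumes the tests are independent and that every null hypothesis is true, which is a simplification. The function names are mine, and Bonferroni is just one common (and fairly blunt) adjustment, not the only option.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests,
    assuming every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


n_tests = 12
fwer = familywise_error_rate(0.05, n_tests)
per_test = bonferroni_alpha(0.05, n_tests)
print(f"Chance of at least one false positive in {n_tests} tests: {fwer:.3f}")
print(f"Bonferroni per-test threshold: {per_test:.4f}")
```

With twelve tests at .05, the chance of at least one spurious "finding" is roughly 46%, which is a long way from rare.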
Too Liberal: You have violated an assumption which has made your result Liberal. Your p value only appears to be below .05. For instance, did you use the usual Pearson Chi-Sq when Continuity Correction would have been better? Maybe Pearson was .045, Likelihood Ratio was .049, Continuity Correction was .051. Did you choose wisely? Did you use Independent Samples T-Test when a non-parametric test would have been better? Having good Stats books around can help, because they will often tell you that a particular assumption violation tends to produce Liberal results. You could always consider a Monte Carlo simulation or Exact Test, and make this problem go away. (An interesting ponderable is to ask if we are within a generation of abandoning distributional assumptions as ordinarily outfitted computers get more powerful?)
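As one illustration of the Exact Test route, here is a rough sketch of a two-sided Fisher exact test for a 2x2 table, written with only the standard library. The function name and the example table are hypothetical. The two-sided rule used here (add up every table that is no more probable than the observed one) is a common convention, but not the only one, so a stats package may report a slightly different number.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]].

    With the margins held fixed, the top-left cell follows a
    hypergeometric distribution. The p value sums the probability of
    every table that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = table_probability(a)
    tolerance = 1 + 1e-9  # guard against floating point ties
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * tolerance
    )


# Hypothetical small table where chi-square approximations are shaky.
p_exact = fisher_exact_two_sided(8, 2, 1, 5)
print(f"Exact two-sided p = {p_exact:.4f}")
```

Because the p value is computed directly from the possible tables, there is no large-sample approximation to violate, which is the appeal when cell counts are small.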
Things to think about if it is above .05
Negative Finding: You might have disproven your hypothesis. (I know that you have ‘proven’ your ‘Null Hypothesis’, but does anyone talk that way outside of a classroom?) Congrats might be in order. Consider the other possibilities and then start thinking about who needs to know about your negative finding. If it is the real thing, a negative finding could be valuable. Be careful, however, before you shout that the literature was wrong. Make sure it is a bona fide finding.
Power: You may simply have lacked enough data. Did you do a Power Analysis before you began? Was your sample size commensurate with your number of Independent Variables? Did you begin with a reasonable amount of data, but attempt every interaction term under the sun? Did you thoughtlessly include effects like 5 way interactions without measuring the impact that it had on your ability to detect true effects? If you aren’t sure what a Power Analysis is, it is best that you describe your negative results using phrases like: “We failed to prove X”, not “We were able to prove that the claim of X, believed to be true for years, was disproved by our study (N=17)”. You can also Google Jacob Cohen’s wonderful “Things I Have Learned (So Far)” to learn more about Power Analysis. I mention it in my Resources section, and it has influenced my thinking for years. Its influence is certainly present in this post.
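For a feel of what a Power Analysis tells you, here is a back-of-the-envelope sketch for comparing two group means. It uses a normal (z) approximation rather than the exact t-based calculation, assumes equal group sizes, and takes the effect size as Cohen's d. The function names and the chosen numbers are mine, for illustration only.

```python
from math import ceil, sqrt
from statistics import NormalDist

NORMAL = NormalDist()


def approximate_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means
    (normal approximation, equal group sizes)."""
    critical = NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size_d * sqrt(n_per_group / 2)
    return NORMAL.cdf(shift - critical) + NORMAL.cdf(-shift - critical)


def approximate_n_per_group(effect_size_d, power=0.80, alpha=0.05):
    """Approximate group size needed to reach the requested power."""
    z_alpha = NORMAL.inv_cdf(1 - alpha / 2)
    z_power = NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


# A medium effect (d = 0.5) with a tiny study versus a reasonable one.
power_small = approximate_power(0.5, n_per_group=8)
power_decent = approximate_power(0.5, n_per_group=64)
needed = approximate_n_per_group(0.5)
print(f"Power with 8 per group:  {power_small:.2f}")
print(f"Power with 64 per group: {power_decent:.2f}")
print(f"Approximate n per group for 80% power: {needed}")
```

With eight people per group, a real medium-sized effect would be missed most of the time, so a p value above .05 from a study that small says very little about whether the effect exists.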
Poor Sample: Your data is not representative of the population. This one can get your p value to move, incorrectly, in either direction.
Too Conservative: You have violated an assumption which has made your result Conservative. Your p value only appears to be above .05. Did you use an adjusted test in an instance when no adjustment was needed? Did you use Scheffe for Multiple Comparisons, but aren’t quite sure how to justify your choice? Most assumption violations make our tests lean Liberal, with p values coming in too low, but the opposite can occur.
This list has served me well for a long time. Always best to report your findings thoughtfully. Statistics, at first, seems like a system of Rule Following. It is more subtle than that. It is about extracting meaning, and then persuading an audience with data. Without an audience, there would be no point. They deserve to know how certain (or uncertain) we are.