Stop abusing statistical significance


I just made my first edit on Wikipedia, on the article on 'statistical power'. Here's the old text, with the deleted parts in bold:

There are times when the recommendations of power analysis regarding sample size will be inadequate. Power analysis is appropriate when the concern is with the correct acceptance or rejection of a null hypothesis. In many contexts, the issue is less about determining if there is or is not a difference but rather with getting a more refined estimate of the population effect size. For example, if we were expecting a population correlation between intelligence and job performance of around .50, a sample size of 20 will give us approximately 80% power (alpha = .05, two-tail). However, in doing this study we are probably more interested in knowing whether the correlation is .30 or .60 or .50. In this context we would need a much larger sample size in order to reduce the confidence interval of our estimate to a range that is acceptable for our purposes. These and other considerations often result in the true but somewhat simplistic recommendation that when it comes to sample size, "More is better!"

However, huge sample sizes can lead to statistical tests becoming so powerful that the null hypothesis is always rejected for real data. This is a problem in studies of differential item functioning.


Leaving the cost of collecting data aside, larger (appropriately collected) samples are ALWAYS BETTER. At the end of the day, if your sample is *too* large (for example, if your statistical software restricts the amount of data you can load into it and you don't need the extra information anyway), you can always draw a smaller random sample from your larger random sample. So the 'more is better' recommendation is simple, but not simplistic.
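To make the point about precision concrete, here is a minimal sketch of how the confidence interval around a correlation of .50 narrows as the sample grows. It uses the standard Fisher z-transform approximation; the function name `fisher_ci` and the sample sizes are my own choices for illustration, not anything taken from the Wikipedia article.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation r
    observed in a sample of size n, via the Fisher z-transform."""
    z = math.atanh(r)                       # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)             # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Build the interval on the z scale, then map back to the r scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Illustrative sample sizes (assumed, not from the original article).
for n in (20, 100, 1000, 10000):
    lo, hi = fisher_ci(0.5, n)
    print(f"n={n:>6}: 95% CI = ({lo:.3f}, {hi:.3f}), width = {hi - lo:.3f}")
```

With n = 20 the interval is far too wide to tell .30 from .60; only the larger samples pin the estimate down, which is exactly why more data helps.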

The last paragraph reveals a fundamental misconception about statistical significance that refuses to go away. If the effect of an independent variable on the dependent variable is zero, using a very large sample will result in an estimated effect that is 0 to many decimal places; as the sample size increases further, the estimate will get ever closer to *exactly* zero. A huge sample only leads to the null being rejected when the true effect is not exactly zero, and in that case the estimate itself tells you how tiny the effect is. NEVER USE STATISTICAL SIGNIFICANCE AS A PROXY FOR PRACTICAL SIGNIFICANCE. I have no clue whether large sample sizes have been seen as a problem in the past in studies of differential item functioning, but if that is the case then the researchers are idiots.
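A quick simulation shows the difference between the two kinds of significance. This is only a sketch under assumptions of my own: observations are drawn from a normal distribution with standard deviation 1, the null hypothesis is that the mean is zero, and the "tiny" effect of 0.02 standard deviations is a made-up number. The helper `simulate` is hypothetical too.

```python
import math
import random
from statistics import NormalDist


def simulate(true_effect, n, seed=0):
    """Draw n observations from N(true_effect, 1) and return
    (estimated effect, two-sided p-value) for a z-test of mean = 0."""
    rng = random.Random(seed)
    xs = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)   # sample variance
    z = mean / math.sqrt(var / n)                      # test statistic
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return mean, p


# 0.0 is a true null; 0.02 is an assumed, practically negligible effect.
for effect in (0.0, 0.02):
    for n in (100, 10_000, 400_000):
        est, p = simulate(effect, n)
        print(f"true effect={effect:<5} n={n:>7}: estimate={est:+.4f}, p={p:.4f}")
```

When the true effect is zero, the estimate shrinks toward zero as n grows, and a test at the 5% level still rejects only about 5% of the time regardless of sample size. When the true effect is 0.02, the p-value collapses at large n, but the estimate stays at roughly 0.02, and it is the estimate (with its confidence interval) that tells you whether the effect matters.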

Here is another post on problematic applications of statistical significance.

2 comments:

  1. Gabriel M. Says:

    Michael G., from http://yetanothersheep.blogspot.com/, has several posts on this, including one with a debate in the comments with a McCloskey fan.

    He also has another post on misconceptions surrounding heteroskedasticity.

  2. datacharmer Says:

    Thanks Gabriel, I'll have a look.