Why We Prefer the Unbiased Estimator of Sample Variance Over the Biased One

The choice between using an unbiased estimator of sample variance and a biased estimator often hinges on the goals of statistical inference and the properties of these estimators. This article explores the reasons for preferring the unbiased estimator over the biased one, focusing on key concepts such as interpretability, consistency, long-term performance, and the bias-variance tradeoff.

Key Concepts

Unbiased Estimator

An estimator is unbiased if its expected value equals the parameter it estimates. For sample variance, the unbiased estimator is given by:

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

where \( \bar{x} \) is the sample mean. This estimator is unbiased for the population variance \( \sigma^2 \).
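Unbiasedness is easy to check by simulation: average the estimator over many independent samples and the average should settle on the true variance. The following sketch (function names are illustrative, not from any library) does this with the standard-library `random` module:

```python
import random

def sample_variance_unbiased(xs):
    """Unbiased sample variance: divide by n - 1 (Bessel's correction)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Average s^2 over many samples of size 10 from Normal(0, 2),
# whose true variance is sigma^2 = 4.
random.seed(0)
estimates = [
    sample_variance_unbiased([random.gauss(0, 2) for _ in range(10)])
    for _ in range(20_000)
]
print(sum(estimates) / len(estimates))  # close to 4.0
```

Even though each sample has only 10 observations, the long-run average of \( s^2 \) matches \( \sigma^2 \), which is exactly what unbiasedness means.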

Biased Estimator

A biased estimator may not equal the true parameter on average. The biased estimator of sample variance is:

\[ s_b^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

This estimator is biased because its expected value is:

\[ E[s_b^2] = \frac{n-1}{n} \sigma^2 \]

which is less than \( \sigma^2 \).
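The shrinkage factor \( \frac{n-1}{n} \) can also be seen empirically. A minimal sketch (the function name is illustrative): with \( n = 5 \) and \( \sigma^2 = 1 \), the long-run average of the \( 1/n \) estimator should be \( \frac{5-1}{5} = 0.8 \), not 1.

```python
import random

def sample_variance_biased(xs):
    """Biased sample variance: divide by n instead of n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

random.seed(1)
n = 5
estimates = [
    sample_variance_biased([random.gauss(0, 1) for _ in range(n)])
    for _ in range(50_000)
]
print(sum(estimates) / len(estimates))  # close to (n - 1) / n = 0.8
```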

Mean Squared Error (MSE)

The Mean Squared Error of an estimator is defined as:

\[ \text{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right] = \text{Var}(\hat{\theta}) + \text{Bias}(\hat{\theta})^2 \]

where \( \hat{\theta} \) is the estimator and \( \theta \) is the true parameter.
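The variance-plus-squared-bias decomposition can be verified numerically. The sketch below (all names are illustrative) estimates both components for the unbiased estimator; since its bias is zero, essentially all of its MSE comes from variance:

```python
import random

def mse_decomposition(estimator, sample_size, true_value, trials=50_000):
    """Estimate MSE, variance, and squared bias of `estimator` by simulation.

    `estimator` maps a list of draws from Normal(0, 1) to a number.
    Illustrative sketch only; not taken from any library.
    """
    random.seed(2)
    values = [
        estimator([random.gauss(0, 1) for _ in range(sample_size)])
        for _ in range(trials)
    ]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    bias_sq = (mean - true_value) ** 2
    return variance + bias_sq, variance, bias_sq

def s2_unbiased(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

mse, var, bias_sq = mse_decomposition(s2_unbiased, sample_size=10, true_value=1.0)
print(mse, var, bias_sq)  # bias_sq is ~0, so MSE is essentially all variance
```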

Why Use the Unbiased Estimator

Interpretability

An unbiased estimator has a direct interpretation: on average, it equals the true parameter. This is particularly important in inferential statistics, where the goal is to make statements about population parameters based on sample statistics.

Consistency

The unbiased estimator of variance is consistent, meaning that as the sample size increases, it converges in probability to the true population variance. The biased estimator is also consistent, since its bias of \( -\sigma^2/n \) vanishes as \( n \) grows; the difference is that the biased estimator systematically underestimates the variance at every finite sample size.
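Consistency of both estimators is easy to observe: as the sample grows, the two estimates converge to the true variance and to each other, since they differ only by the factor \( \frac{n-1}{n} \). A small sketch (the helper name is illustrative):

```python
import random

def s2(xs, unbiased=True):
    """Sample variance with either the 1/(n-1) or the 1/n denominator."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1 if unbiased else n)

random.seed(3)
for n in (10, 100, 10_000):
    xs = [random.gauss(0, 3) for _ in range(n)]  # true variance = 9
    print(n, s2(xs, unbiased=True), s2(xs, unbiased=False))
# Both columns approach 9 as n grows, and the gap between them,
# a factor of (n - 1) / n, becomes negligible.
```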

Long-term Performance

As the sample size grows, the difference in MSE between the two estimators diminishes, because the bias of the \( 1/n \) estimator shrinks at rate \( 1/n \). The unbiased estimator, meanwhile, remains valid at every sample size, which makes it the more reliable choice in inferential settings where large samples are often the goal.

Bias-Variance Tradeoff

While the biased estimator may have a smaller MSE for small sample sizes, it suffers from a systematic bias. In practice, this bias can lead to incorrect conclusions about the variability in the data, especially in hypothesis testing and confidence interval estimation.
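The small-sample MSE advantage of the biased estimator can be checked directly. Assuming normal data with \( n = 5 \) and \( \sigma^2 = 1 \), standard results give an MSE of \( \frac{2}{n-1} = 0.5 \) for the unbiased estimator versus \( \frac{2n-1}{n^2} = 0.36 \) for the biased one; the Monte Carlo sketch below (helper names are illustrative) reproduces this:

```python
import random

def s2(xs, denom):
    """Sum of squared deviations from the mean, divided by `denom`."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / denom

random.seed(4)
n, trials, true_var = 5, 100_000, 1.0
mse_unbiased = mse_biased = 0.0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    mse_unbiased += (s2(xs, n - 1) - true_var) ** 2
    mse_biased += (s2(xs, n) - true_var) ** 2
print(mse_unbiased / trials, mse_biased / trials)
# The biased (1/n) estimator has the smaller MSE here:
# roughly 0.5 vs 0.36, matching the theoretical values above.
```

Despite this, the systematic underestimation of the \( 1/n \) estimator is exactly what causes trouble in hypothesis tests and confidence intervals, which is why the unbiased form is the default.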

Conclusion

Ultimately, the choice to use the unbiased estimator of sample variance is driven by its interpretability, consistency, and the importance of unbiasedness in statistical inference. While the biased estimator may exhibit a smaller MSE in specific scenarios, the long-term reliability and theoretical properties of the unbiased estimator make it the preferred choice in most statistical applications.