Conclusion


Throughout this module, we have explored the statistics that can be recorded to summarize a sample. We've discussed which statistics are relevant to record, depending on:

  • the type of variable
  • the number of populations
  • the characteristics and structure of the data

We've also demonstrated that the value of a statistic varies from sample to sample. A sampling distribution describes all of the values that a statistic could take across different samples. For some statistics, such as sample means ($\bar{x}$'s) and sample proportions ($\hat{p}$'s), we can determine the theoretical sampling distribution. For many other statistics, we can rely on simulation instead, using the population data if we have it, or using the sample data to approximate the sampling distribution. We specifically examined how properties of the sampling distribution vary depending on characteristics including:

  • the number of repetitions (samples)
  • the sample size
  • the type of statistic recorded
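As a reminder of how simulation builds a sampling distribution when the population is available, here is a minimal Python sketch. It assumes a hypothetical, right-skewed population generated at random (not real data), and the helper name `simulate_sampling_distribution` is illustrative only. It repeatedly draws samples without replacement, records a statistic for each, and compares the spread of the resulting distribution for two sample sizes.

```python
import random
import statistics

def simulate_sampling_distribution(population, sample_size, repetitions,
                                   stat=statistics.mean, seed=0):
    """Hypothetical helper: draw `repetitions` random samples of size
    `sample_size` (without replacement) and record `stat` for each."""
    rng = random.Random(seed)
    return [stat(rng.sample(population, sample_size)) for _ in range(repetitions)]

# Hypothetical right-skewed population of 10,000 values (illustration only).
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1 / 50) for _ in range(10_000)]

for n in (5, 50):
    means = simulate_sampling_distribution(population, n, repetitions=2000)
    print(f"n={n:>3}: center={statistics.mean(means):.1f}, "
          f"spread (SD)={statistics.stdev(means):.2f}")
```

Changing `sample_size`, `repetitions`, or `stat` (for example `statistics.median`) lets you explore each of the characteristics listed above.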

The most important characteristics of sampling distributions that we observed were their:

  • shape
  • center
  • variability (or spread)
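To connect these three characteristics to the theoretical result for sample proportions, the sketch below simulates $\hat{p}$ and compares its simulated center and spread with $p$ and $\sqrt{p(1-p)/n}$, then prints a crude text histogram to show the shape. The values $p = 0.3$ and $n = 100$ are assumptions chosen for illustration, and each observation is drawn independently (equivalent to sampling from a very large population).

```python
import math
import random
import statistics
from collections import Counter

def simulate_p_hat(p, n, repetitions, seed=1):
    """Hypothetical helper: simulate sample proportions when a fraction p
    of the (very large) population are 'successes'."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(repetitions)]

p, n = 0.3, 100  # assumed population proportion and sample size
p_hats = simulate_p_hat(p, n, repetitions=5000)

# Center: compare the mean of the simulated p-hats with p.
print("center:", round(statistics.mean(p_hats), 3), "theory:", p)
# Spread: compare the SD of the simulated p-hats with sqrt(p(1-p)/n).
print("spread:", round(statistics.stdev(p_hats), 4),
      "theory:", round(math.sqrt(p * (1 - p) / n), 4))

# Shape: group p-hats to the nearest 0.05 and draw one '#' per 50 samples.
shape = Counter(round(v * 20) / 20 for v in p_hats)
for value in sorted(shape):
    print(f"{value:.2f} {'#' * (shape[value] // 50)}")
```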

Once we have a sampling distribution, we can then use it to calculate probabilities associated with the statistic.
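For example, a probability such as $P(\hat{p} \ge 0.35)$ can be estimated as the fraction of simulated samples at or above the cutoff. The minimal sketch below assumes $p = 0.3$ and $n = 100$ purely for illustration, and checks the simulated answer against the exact binomial calculation, which applies here because the number of successes in each sample is binomial under these assumptions.

```python
import math
import random

def simulate_p_hat(p, n, repetitions, seed=3):
    """Hypothetical helper: simulate sample proportions from a very large
    population in which a fraction p are 'successes'."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(repetitions)]

p, n, k_min = 0.3, 100, 35   # assumed values; cutoff is p-hat >= 35/100
cutoff = k_min / n
p_hats = simulate_p_hat(p, n, repetitions=5000)

# Probability estimated from the simulated sampling distribution.
prob_simulated = sum(ph >= cutoff for ph in p_hats) / len(p_hats)

# Exact probability: P(X >= k_min) for X ~ Binomial(n, p).
prob_exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
                 for k in range(k_min, n + 1))

print(f"simulated: {prob_simulated:.3f}, exact: {prob_exact:.3f}")
```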

Often, we don't actually know the true population, so we have to rely on our sample data to estimate the sampling distribution. Although this approximation relies on having appropriate data, it does allow us to evaluate the uncertainty associated with statistics measured from samples.
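One common way to approximate a sampling distribution from sample data alone is to resample the observed sample with replacement (bootstrapping). The sketch below is a minimal illustration under stated assumptions: the "observed" sample is hypothetical, generated at random to stand in for real data, and `bootstrap_distribution` is an illustrative helper name. It compares the bootstrap estimate of the spread of $\bar{x}$ with the familiar $s/\sqrt{n}$.

```python
import random
import statistics

def bootstrap_distribution(sample, stat=statistics.mean, repetitions=5000, seed=2):
    """Hypothetical helper: resample `sample` with replacement (same size)
    and record `stat` for each resample."""
    rng = random.Random(seed)
    n = len(sample)
    return [stat(rng.choices(sample, k=n)) for _ in range(repetitions)]

# Hypothetical observed sample of 40 values standing in for real data.
data_rng = random.Random(7)
sample = [data_rng.gauss(100, 15) for _ in range(40)]

boot_means = bootstrap_distribution(sample)
se_boot = statistics.stdev(boot_means)                   # spread of bootstrap means
se_formula = statistics.stdev(sample) / len(sample) ** 0.5  # s / sqrt(n)

print(f"bootstrap SE: {se_boot:.2f}, s/sqrt(n): {se_formula:.2f}")
```

The bootstrap distribution is centered near the sample mean rather than the unknown population mean, which is why it describes uncertainty (spread and shape) rather than recovering the true center.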