Overview of Statistical Inference


In Modules 8-11, we focused on machine learning methods: building models that learn from the available data and can then be applied to new data.

In Modules 12 and 13, we’re going to switch to a different framework for working with data. Instead of building a generalizable model to apply to new data, we’ll use the data we have (a sample) to make statements about the underlying population, while accounting for the uncertainty that comes with this process.

By the end of Module 12, you should be able to:

  • Define the population of interest and determine how the sample was generated from it
  • Use simulation to generate the distribution of possible values of a sample mean or sample proportion (percentage) across many possible samples
  • State the Central Limit Theorem and explain how it applies to sampling distributions
  • Describe the properties of sampling distributions of sample means and sample proportions
  • Extend the sampling distribution to more complex scenarios, including when we are interested in comparing two populations
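As a rough illustration of the Module 12 objectives, here is a minimal simulation of sampling distributions. It assumes Python with NumPy, and the population is a synthetic, hypothetical right-skewed one (not course data); the helpers `simulate_sample_means` and `simulate_sample_props` and the cutoff of 100 are illustrative choices. The sketch compares the simulated sample means and sample proportions with what the Central Limit Theorem predicts: centered at the population value, with standard deviation sigma / sqrt(n) for means and sqrt(p(1 - p) / n) for proportions.

```python
import numpy as np

# Synthetic, hypothetical population of 100,000 right-skewed values (for illustration only)
rng = np.random.default_rng(seed=12)
population = rng.exponential(scale=50.0, size=100_000)

def simulate_sample_means(pop, n, reps, rng):
    """Draw `reps` random samples of size n without replacement; return each sample's mean."""
    return np.array([rng.choice(pop, size=n, replace=False).mean() for _ in range(reps)])

def simulate_sample_props(pop, cutoff, n, reps, rng):
    """Same idea, but return the proportion of each sample that exceeds `cutoff`."""
    return np.array([(rng.choice(pop, size=n, replace=False) > cutoff).mean() for _ in range(reps)])

n, reps = 40, 5_000
means = simulate_sample_means(population, n, reps, rng)
props = simulate_sample_props(population, cutoff=100, n=n, reps=reps, rng=rng)

mu, sigma = population.mean(), population.std()   # population mean and SD (ddof=0)
p = (population > 100).mean()                     # population proportion above the cutoff

# CLT: sampling distributions centered at mu and p, with SDs sigma/sqrt(n) and sqrt(p(1-p)/n)
print(f"means: center {means.mean():.2f} vs mu {mu:.2f}; "
      f"SD {means.std():.2f} vs {sigma / np.sqrt(n):.2f}")
print(f"props: center {props.mean():.3f} vs p {p:.3f}; "
      f"SD {props.std():.3f} vs {np.sqrt(p * (1 - p) / n):.3f}")
```

Even though the population is skewed, with n = 40 each simulated value should typically land close to its theoretical counterpart. Rerunning with a much smaller n (say 5) tends to leave the sample means visibly skewed, which is one way to see why the Central Limit Theorem is stated for "large enough" samples.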

After working through the material in Module 13, you should also be able to:

  • Identify common population characteristics (parameters) about which we would like to make statements
  • Calculate a range of reasonable values (a confidence interval) for an unknown population characteristic of interest
  • Decide whether a statement about a population characteristic of interest seems reasonable (a hypothesis test)
  • Apply and communicate about these procedures in a variety of settings and for a variety of characteristics
  • Carry out these inference procedures using either simulated or theoretical sampling distributions

Many of the concepts in these two modules were first introduced in Module 5: Polling, Confidence Intervals, and the Normal Distribution. Over the next two modules, we will formalize and expand on them.

Some of the questions that we’ll work to answer in these two modules include:

  • Has the average Airbnb host been a host for longer than 5 years and 9 months?
  • What is the median price for the population of all Chicago Airbnb listings?
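For a characteristic like the median, where there may be no simple formula for the sampling distribution, one common simulation-based approach is the bootstrap percentile interval. This is a minimal sketch, assuming Python with NumPy; the prices are synthetic, hypothetical values rather than real Chicago Airbnb data, `bootstrap_medians` is an illustrative helper, and the modules may use a different procedure.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Hypothetical sample of 200 nightly prices in USD: synthetic data, NOT real Chicago listings
prices = np.round(rng.lognormal(mean=np.log(120), sigma=0.6, size=200), 2)

def bootstrap_medians(sample, reps, rng):
    """Resample `sample` with replacement `reps` times; return the median of each resample."""
    n = len(sample)
    idx = rng.integers(0, n, size=(reps, n))  # each row holds one bootstrap resample's indices
    return np.median(sample[idx], axis=1)

boot = bootstrap_medians(prices, reps=10_000, rng=rng)

# 95% percentile interval: the middle 95% of the simulated (bootstrap) medians
lower, upper = np.percentile(boot, [2.5, 97.5])
print(f"sample median: {np.median(prices):.2f}")
print(f"95% bootstrap percentile interval: ({lower:.2f}, {upper:.2f})")
```

The interval is only as trustworthy as the sample: if the listings were not drawn in a way that represents the population, resampling them cannot fix that.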

Our goal is to build a framework that you can apply to new and different scenarios, including ones where you want to learn about characteristics other than a mean or a proportion, such as a median.

Let's get started!