Overview of Statistical Inference


In Modules 8 and 9, we’ll answer questions with data. Our underlying goal is to make statements about a population based on the data available to us, while accounting for the uncertainty in these generalizations.

At the end of Module 8, you should be able to:

  • Define the population of interest and determine how the sample was generated from the population of interest
  • Generate a distribution of possible values for means or proportions (percents) calculated from possible samples through simulations
  • State the Central Limit Theorem and explain how it applies to sampling distributions
  • Describe properties of these sampling distributions of sample means and sample proportions
  • Extend the sampling distribution to more complex scenarios, including when we are interested in comparing two populations
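
As a preview of the simulation idea in the list above, here is a minimal sketch in Python that uses only the standard library. The population is synthetic and purely hypothetical, and the function name `simulate_sample_means` is our own choice for illustration, not something defined by the course. The sketch draws many samples, records each sample mean, and compares the resulting distribution to the population.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated random samples (without replacement) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]

# Hypothetical, right-skewed population (assumption: made up for illustration only).
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1 / 5) for _ in range(10_000)]

# Simulated sampling distribution of the sample mean for samples of size 50.
sample_means = simulate_sample_means(population, sample_size=50, num_samples=2_000)

print("Population mean:        ", statistics.mean(population))
print("Mean of sample means:   ", statistics.mean(sample_means))
print("Population std. dev.:   ", statistics.stdev(population))
print("Std. dev. of the means: ", statistics.stdev(sample_means))
```

In a run of this sketch, the sample means should center near the population mean and vary much less than the individual population values do, which is the kind of property of sampling distributions that Module 8 describes.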

After working through the material in Module 9, you should also be able to:

  • Identify some common characteristics about which we would like to make statements
  • Calculate a range of reasonable values for an unknown population characteristic of interest
  • Decide whether a statement about a population characteristic of interest seems reasonable
  • Communicate about and apply these processes to a variety of settings and a variety of characteristics
  • Apply statistical inference procedures based on either simulated or theoretical sampling distributions
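
One possible way to calculate a range of reasonable values from a simulated sampling distribution is a bootstrap percentile interval. The sketch below is an assumption on our part rather than the specific procedure Module 9 will use; the sample data are synthetic, and the function name `bootstrap_percentile_interval` and its parameters are hypothetical.

```python
import random
import statistics

def bootstrap_percentile_interval(sample, stat=statistics.mean,
                                  num_resamples=2_000, level=0.95, seed=0):
    """Resample the data with replacement and return a percentile interval."""
    rng = random.Random(seed)
    n = len(sample)
    # Compute the statistic on each resample, then sort the results.
    resampled_stats = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(num_resamples)
    )
    # Cut off an equal share of the resampled statistics in each tail.
    tail = (1 - level) / 2
    lower = resampled_stats[int(tail * num_resamples)]
    upper = resampled_stats[int((1 - tail) * num_resamples) - 1]
    return lower, upper

# Hypothetical sample of 100 observations (assumption: made up for illustration only).
data_rng = random.Random(2)
sample = [data_rng.gauss(70, 10) for _ in range(100)]

lower, upper = bootstrap_percentile_interval(sample)
print("Sample mean:", statistics.mean(sample))
print("Approximate 95% range of reasonable values:", (lower, upper))
```

Passing `stat=statistics.median` instead would apply the same resampling idea to a median, which hints at how a simulation-based approach can extend to characteristics other than a mean or a proportion.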

Many of the concepts in these two modules were introduced in Module 5: Polling, Confidence Intervals, and the Normal Distribution. We will formalize and expand upon many of these concepts in the next two modules.

Some of the questions that we’ll work to answer in these two modules include:

  • Has the average Airbnb host been a host for longer than 5 years and 9 months?
  • What is the median price for the population of all Chicago Airbnb listings?

We hope to build a framework that you can apply to new and different scenarios, including those where you are interested in a characteristic other than a mean or a proportion.

After these two modules focused on inference, we'll then shift our framework for working with data towards machine learning. We'll introduce how other variables in the data can help us to understand our variable of interest. We'll also consider how to build a generalizable model that can learn from available data and apply to new data.

Let's get started!