Overview of Statistical Inference


Throughout Modules 8-11, we focused on machine learning methods. In machine learning, the goal was to build a model that learns from the available data and can then be applied to new data.

In Modules 12 and 13, we switch to a different framework for working with data. Instead of building a generalizable model to apply to new data, we will use our available data to make statements about the underlying population, while accounting for the uncertainty involved in this process.

By the end of Module 12, you should be able to:

Define the population of interest and determine how the sample was generated from that population
Use simulation to generate the distribution of possible sample means or sample proportions (percentages) across possible samples
State the Central Limit Theorem and explain how it applies to sampling distributions
Describe the properties of sampling distributions of sample means and sample proportions
Extend sampling distributions to more complex scenarios, such as comparing two populations
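As a rough illustration of the simulation objective above, here is a minimal sketch in Python. It assumes a hypothetical population of 10,000 values coded as 1 ("yes") or 0 ("no"), so the mean of each sample is a sample proportion. The function name `simulate_sample_means` and all numbers are illustrative choices, not part of the course materials.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, num_samples, seed=0):
    """Draw many random samples (without replacement) and record each sample's mean."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(num_samples)]

# Hypothetical population: 30% of values are 1 and 70% are 0,
# so each sample mean is a sample proportion.
population = [1] * 3000 + [0] * 7000

# 1,000 simulated samples of size 100 approximate the sampling distribution.
sample_props = simulate_sample_means(population, sample_size=100, num_samples=1000)

# The center should be near the population proportion (0.3), and the spread
# near sqrt(0.3 * 0.7 / 100), roughly 0.046.
print(round(statistics.mean(sample_props), 3))
print(round(statistics.stdev(sample_props), 3))
```

Plotting a histogram of `sample_props` would show the roughly bell-shaped sampling distribution that the Central Limit Theorem describes.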

After working through the material in Module 13, you should also be able to:

Identify common population characteristics about which we would like to make statements
Calculate a range of reasonable values for an unknown population characteristic of interest
Decide whether a statement about a population characteristic of interest seems reasonable
Apply and communicate these processes across a variety of settings and characteristics
Carry out statistical inference procedures using either simulated or theoretical sampling distributions

Many of the concepts in these two modules were first introduced in Module 5: Polling, Confidence Intervals, and the Normal Distribution. In the next two modules, we will formalize and expand upon them.

Some of the questions that we’ll work to answer in these two modules include:

Has the average Airbnb host been a host for longer than 5 years and 9 months?
What is the median price for the population of all Chicago Airbnb listings?

Our goal is to build a framework that you can apply to new and different scenarios, including ones where you want to learn about characteristics other than a mean or a proportion, such as the median price above.

Let's get started!
