Module 7: Understanding and Wrangling Data

This module provides an introduction to the interconnected nature of the data science pipeline. We consider how to ask research questions, pursue research goals, prepare data, and communicate results.

Module 8: Populations, Samples, and Statistics

How do the values of a statistic vary from sample to sample, and how does that variation depend on the sampling scheme? This module answers these questions by examining populations, samples, and statistics.
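As a small preview of the first question, the sketch below draws repeated simple random samples from a made-up population and looks at how much the sample mean moves around. The population (the integers 1 to 1000), the sample sizes, and the helper name `sample_means` are all assumptions chosen for illustration, not course material.

```python
import random
import statistics

def sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated simple random samples (without replacement)
    and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]

# Hypothetical population: the integers 1 through 1000.
population = list(range(1, 1001))

means_n10 = sample_means(population, sample_size=10, num_samples=500)
means_n100 = sample_means(population, sample_size=100, num_samples=500)

# The spread of the sample means shrinks as the sample size grows.
spread_n10 = statistics.stdev(means_n10)
spread_n100 = statistics.stdev(means_n100)
print(f"spread of means, n=10:  {spread_n10:.1f}")
print(f"spread of means, n=100: {spread_n100:.1f}")
```

Changing how the samples are drawn (for example, sampling only from part of the population) would change both the center and the spread of these means, which is the sense in which the answer depends on the sampling scheme.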

Module 9: Statistical Inference for Populations

When we only have information from a sample, what can we say about the underlying population? We will look at two inferential techniques: one that determines a range of reasonable values for an unknown population parameter, and one that decides between two competing theories about that parameter.
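To give a flavor of the first technique, here is a minimal sketch of a range of reasonable values (often called a confidence interval) for a population mean, using a normal approximation. The simulated sample, the 95% level, and the function name `mean_confidence_interval` are assumptions made for this example only.

```python
import math
import random
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate interval for the population mean, assuming the
    sample is large enough for a normal approximation."""
    n = len(sample)
    center = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Critical value from the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * standard_error, center + z * standard_error

# Hypothetical sample: 200 simulated measurements.
rng = random.Random(1)
sample = [rng.gauss(50, 10) for _ in range(200)]

low, high = mean_confidence_interval(sample)
print(f"95% interval for the mean: ({low:.2f}, {high:.2f})")
```

The same interval hints at the second technique: a proposed value for the population mean that falls well outside the interval is poorly supported by the sample.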

Module 10: Linear Regression

This module introduces linear regression models for predicting a quantitative response variable from a set of predictor variables.
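For a sense of what fitting such a model involves, the sketch below fits a least-squares line with a single predictor. The module itself covers models with several predictors; the one-predictor case, the tiny dataset, and the function name `fit_line` are simplifying assumptions for illustration.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: predictor x and quantitative response y.
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.1, 7.0, 9.2, 10.8]

intercept, slope = fit_line(xs, ys)
prediction = intercept + slope * 6  # predicted response at x = 6
print(f"y = {intercept:.2f} + {slope:.2f} * x")
print(f"prediction at x = 6: {prediction:.2f}")
```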

Module 11: Logistic Regression and Classification

In this module, we introduce the logistic regression model for predicting a categorical response variable with two distinct values. We discuss how to fit and evaluate a logistic regression model for machine learning purposes and how to use this model as the basis of a classifier.
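The idea of fitting a logistic regression model and then using it as a classifier can be sketched as follows. This is an illustrative version under stated assumptions: one predictor, plain gradient descent on the log-loss, a 0.5 probability threshold, and invented data (hours studied versus pass/fail). The names `fit_logistic` and `classify` are hypothetical.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit intercept and slope by gradient descent on the average log-loss."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(intercept + slope * x) - y for x, y in zip(xs, ys)]
        intercept -= learning_rate * sum(errors) / n
        slope -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return intercept, slope

def classify(x, intercept, slope, threshold=0.5):
    """Predict 1 when the fitted probability reaches the threshold."""
    return 1 if sigmoid(intercept + slope * x) >= threshold else 0

# Hypothetical data: hours studied and whether the exam was passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

intercept, slope = fit_logistic(hours, passed)
print(classify(0.5, intercept, slope), classify(5.0, intercept, slope))
```

Moving the threshold away from 0.5 trades one kind of classification error for the other, which is part of what evaluating a classifier involves.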

Module 12: Feature Selection and Cross-Validation Techniques

What does it mean to overfit a predictive model? How does an overfit model impact our ability to pursue machine learning goals? We explore ways of searching for the combination of explanatory variables that best meets our machine learning goals for a predictive model.
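Cross-validation guards against overfitting by always evaluating a model on rows it was not fitted to. The sketch below shows only the splitting step of k-fold cross-validation; the function name `k_fold_indices`, the 10-row dataset, and the choice of 5 folds are assumptions for illustration.

```python
import random

def k_fold_indices(num_rows, k, seed=0):
    """Yield (train_indices, test_indices) pairs so that each row
    lands in exactly one test fold."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

# Hypothetical dataset with 10 rows, split into 5 folds.
splits = list(k_fold_indices(num_rows=10, k=5))
for train_idx, test_idx in splits:
    print("train:", sorted(train_idx), "test:", sorted(test_idx))
```

In practice, each candidate set of explanatory variables would be fitted on the training rows and scored on the test rows of every fold, and the averaged scores compared across candidates.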

Module 13: More Machine Learning Methods

This module takes a deeper dive into selected machine learning methods. It offers additional material for motivated students who want to broaden their knowledge of machine learning.