Module 7: Understanding and Wrangling Data
This module provides an introduction to the interconnected nature of the data science pipeline. We consider how to ask research questions, pursue research goals, prepare data, and communicate results.
7-00
» Your Data Science Journey - From Beginning to End
7-01
» Review of Data Basics
7-02
» Answering Questions Using Data
7-03
» Cleaning and Preparing Data
7-04
» Missing Data
7-05
» Reshaping and Merging Data
7-06
» Summarizing Variables with Statistics, Tables, & Plots
7-07
» Measurement Errors
7-08
» Deeper Dive in Data Cleaning
Module 8: Populations, Samples, and Statistics
How do values of a statistic vary from sample to sample? How does that variation depend on the sampling scheme? This module on populations, samples, and statistics answers these questions.
8-00
» Overview of Statistical Inference
8-01
» Populations
8-02
» Samples
8-03
» Describing a Sample with Visualizations and Statistics
8-04
» Sampling Distributions
8-05
» Sampling Distribution Properties
8-06
» Sampling Distribution for Two Populations
8-07
» Simulations for Difference Data
8-08
» Calculating Probability for Statistics
8-09
» Deeper Dive into Underlying Theory
8-10
» Conclusion
Module 9: Statistical Inference for Populations
When we only have information from a sample, what can we say about the underlying population? We look at two inferential techniques: confidence intervals, which give a range of reasonable values for an unknown population parameter, and hypothesis tests, which decide between two competing theories about that parameter.
9-00
» Overview
9-01
» Population Parameters and Sample Statistics
9-02
» One Hypothesis Testing Example
9-03
» Hypothesis Testing Framework
9-04
» Confidence Intervals
9-05
» Traditional Procedures for Inference
9-06
» Name That Scenario
9-07
» Conclusion
Module 10: Linear Regression
This module shows how to predict a quantitative response variable from a set of predictor variables using a linear regression model.
10-00
» Predicting Airbnb Prices for New Datasets
10-01
» Single Variable Descriptive Analytics and Data Manipulation
10-02
» Describing Associations between Two Variables
10-03
» Describing Associations between Three Variables
10-04
» Fitting a Multiple Linear Regression Curve
10-05
» How to Incorporate Categorical Explanatory Variables
10-06
» Interpreting your Model's Slopes
10-07
» Interaction Terms
10-08
» A Machine Learning Technique for Finding Good Predictions for New Datasets
10-09
» Evaluating your Linear Regression Model for Machine Learning and Interpretation Purposes
10-10
» Sampling Distributions for Regression
10-11
» Inference for Regression
10-12
» Airbnb Research Goal Conclusion
10-13
» Variable Transformations
Module 11: Logistic Regression and Classification
In this module, we introduce the logistic regression model for predicting a categorical response variable with two distinct values. We discuss how to fit and evaluate a logistic regression model for machine learning purposes and how to use this model as the basis of a classifier.
11-00
» Introduction
11-01
» Instagram Classifier Introduction
11-02
» Introducing Logistic Regression
11-03
» Odds and Probability
11-04
» Fitting a Logistic Regression Model
11-05
» Multiple Logistic Regression
11-06
» Making Predictions
11-07
» Slope and Intercept Interpretations
11-08
» Evaluating your Logistic Regression Model
11-09
» Classification with Logistic Regression
11-10
» Inference for Logistic Regression
Module 12: Feature Selection and Cross-Validation Techniques
What does it mean to overfit a predictive model? How does an overfit model affect our ability to pursue machine learning goals? We explore methods for finding the combination of explanatory variables that best meets the machine learning goals of a predictive model.
12-00
» Introduction
12-01
» Overfitting vs. Underfitting to a Dataset
12-02
» Finding a Parsimonious Model
12-03
» Overview of Feature Selection Techniques
12-04
» Backwards Elimination Algorithm
12-05
» Forward Selection Algorithm
12-06
» Breast Cancer Research Introduction
12-07
» Regularization Techniques
12-08
» Cross-Validation Techniques
12-09
» Principal Component Regression
12-10
» Feature Selection for Logistic Regression
12-11
» Conclusion
Module 13: More Machine Learning Methods
This module takes a deeper dive into selected machine learning methods, offering additional material for motivated students who want to broaden their knowledge of machine learning.
13-00
» More Machine Learning
13-01
» Decision Trees
13-02
» Random Forests
13-03
» Neural Networks
13-04
» Comparing Machine Learning Models