Module 7: Understanding and Wrangling Data
This module provides an introduction to the interconnected nature of the data science pipeline. We consider what it means to pursue research goals and ask research questions effectively with data. Given that a beginning-to-end data science analysis often involves many decisions, what are some best practices for communicating our research findings? Finally, what are some ways in which we might clean and manipulate a dataframe for further analysis?
-
7-00
» Your Data Science Journey - From Beginning to End -
7-01
» Review of Data Basics -
7-02
» Answering Questions Using Data -
7-03
» Cleaning and Preparing Data -
7-04
» Missing Data -
7-05
» Reshaping and Merging Data -
7-06
» Summarizing Variables with Statistics, Tables, & Plots -
7-07
» Measurement Errors -
7-08
» Deeper Dive in Data Cleaning
Module 8: Linear Regression
This module introduces how a linear regression model can be used and evaluated for machine learning purposes. We discuss how to predict a numerical response variable given a set of numerical and/or categorical explanatory variables.
-
8-00
» Predicting Airbnb Prices for New Datasets -
8-01
» Single Variable Descriptive Analytics and Data Manipulation -
8-02
» Describing Associations between Two Variables -
8-03
» Describing Associations between Three Variables -
8-04
» A Machine Learning Technique for Finding Good Predictions for New Datasets -
8-05
» Fitting a Multiple Linear Regression Curve -
8-06
» How to Incorporate Categorical Explanatory Variables -
8-07
» Interpreting your Model's Slopes -
8-08
» Evaluating your Linear Regression Model for Machine Learning and Interpretation Purposes -
8-09
» Interaction Terms -
8-010
» Airbnb Research Goal Conclusion -
8-011
» Variable Transformations
Module 9: Feature Selection and Cross-Validation Techniques
What does it mean to overfit a predictive model? How does an overfit model impact our ability to pursue machine learning goals? One way to overfit a predictive model is by including too many explanatory variables that don't bring 'enough' predictive power to the model. In this section we explore ways of measuring whether or not an explanatory variable brings 'enough' predictive power to a predictive model. We also explore ways of attempting to find the optimal combination of explanatory variables that best meets our machine learning goals for a predictive model.
-
9-00
» Introduction -
9-01
» Overfitting vs. Underfitting to a Dataset -
9-02
» Finding a Parsimonious Model -
9-03
» Overview of Feature Selection Techniques -
9-04
» Backwards Elimination Algorithm -
9-05
» Forward Selection Algorithm -
9-06
» Breast Cancer Research Introduction -
9-07
» Regularization Techniques -
9-08
» Cross-Validation Techniques -
9-09
» Principal Component Regression -
9-010
» Conclusion
Module 10: Logistic Regression and Classification
In this module we introduce the logistic regression model, which is one of the most common models for predicting a categorical response variable with two distinct values. We discuss how to fit and evaluate a logistic regression model for machine learning purposes. Furthermore, we discuss how to use a logistic regression model as a classifier and how to evaluate the performance of a classifier model. Finally, we implement the feature selection techniques that we introduced in Module 9 to attempt to find the optimal combination of explanatory variables that yields the best classifier performance for machine learning purposes.
-
10-00
» Introduction -
10-01
» Instagram Classifier Introduction -
10-02
» Introducing Logistic Regression -
10-03
» Odds and Probability -
10-04
» Fitting a Logistic Regression Model -
10-05
» Multiple Logistic Regression -
10-06
» Making Predictions -
10-07
» Slope and Intercept Interpretations -
10-08
» Evaluating your Logistic Regression Model -
10-09
» Classification with Logistic Regression -
10-010
» Feature Selection
Module 11: More Machine Learning Methods
-
11-00
» More Machine Learning -
11-01
» Decision Trees -
11-02
» Random Forests -
11-03
» Neural Networks -
11-04
» Comparing Machine Learning Models
Module 12: Populations, Samples, and Statistics
-
12-00
» Overview of Statistical Inference -
12-01
» Populations -
12-02
» Samples -
12-03
» Describing a Sample with Visualizations and Statistics -
12-04
» Sampling Distributions -
12-05
» Sampling Distribution Properties -
12-06
» Sampling Distribution for Two Populations -
12-07
» Sampling Distributions for Regression -
12-08
» Simulations for Difference Data -
12-09
» Calculating Probability for Statistics -
12-010
» Deeper Dive into Underlying Theory -
12-011
» Conclusion
Module 13: Statistical Inference for Populations
-
13-00
» Overview -
13-01
» Population Parameters and Sample Statistics -
13-02
» One Hypothesis Testing Example -
13-03
» Hypothesis Testing Framework -
13-04
» Confidence Intervals -
13-05
» Traditional Procedures for Inference -
13-06
» Inference for Regression -
13-07
» Inference for Logistic Regression -
13-08
» Name That Scenario -
13-09
» Conclusion