Introduction



In this module we will introduce a new kind of predictive model called a logistic regression model. A logistic regression model is very similar to a linear regression model: in both cases we seek an optimal intercept and a set of slopes, each corresponding to a given explanatory variable (or indicator variable), in a linear equation of the following form.

$$f(response\ variable) = \hat{\beta}_0 + \hat{\beta}_1x_1 + ...+\hat{\beta}_px_p$$

The main difference between a linear regression model and a logistic regression model is the type of response variable that we want to predict. Specifically,

  • a linear regression model aims to predict a numerical response variable.
  • a logistic regression model aims to predict a categorical response variable with two levels.
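To make the contrast concrete, here is a minimal sketch. It assumes the standard logistic regression form, in which the function $f$ on the left-hand side of the equation is the log-odds of the response, so the linear equation can be converted into a probability between 0 and 1. The intercept and slope values are made up for illustration and are not fitted to any data.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted to data).
intercept = -1.0
slope = 0.5


def linear_predictor(x):
    """Right-hand side of the equation: intercept + slope * x."""
    return intercept + slope * x


def predicted_probability(x):
    """Treat the linear predictor as log-odds (an assumption of this sketch)
    and map it to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(x)))


for x in [0, 2, 10]:
    # The linear predictor is unbounded; the probability stays between 0 and 1.
    print(x, linear_predictor(x), round(predicted_probability(x), 3))
```

A linear regression model would report the linear predictor itself as its prediction, which can be any real number. The logistic version instead yields a probability that an observation belongs to one of the two levels.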

Logistic Regression for Machine Learning

A logistic regression model can be used to answer a variety of research questions or pursue a variety of research goals. In module 13 we'll discuss how to use a logistic regression model to answer inference-based research questions. Specifically, we'll learn how to evaluate whether we have enough evidence to suggest that there is an association between an explanatory variable and a categorical response variable in a large population dataset that we don't actually have "in our hands".

In this module (module 10), we'll show how we can use a logistic regression model for machine learning purposes. Specifically, we'll show how you can build and use a logistic regression model to predict a categorical response variable (with two levels) for new datasets. For instance, the module 10 research goal that we'll discuss in section 2 will involve predicting whether or not an Instagram account is fake.
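As a rough preview of the "build, then predict for new data" workflow, the sketch below fits a one-variable logistic regression model to a tiny made-up dataset and then predicts labels for new observations. Everything here is an assumption for illustration: the data are invented (not real Instagram data), the explanatory variable `x` is hypothetical, the model is fit with plain gradient ascent on the average log-likelihood (statistical software typically uses other optimizers), and the 0.5 cutoff is just one possible choice of threshold.

```python
import math

# Made-up toy data for illustration only (NOT real Instagram data).
# x: a hypothetical numeric explanatory variable; y: 1 = "fake", 0 = "real".
x_train = [0, 1, 2, 3, 4, 5, 6, 7]
y_train = [0, 0, 1, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, n_iter=5000):
    """Estimate an intercept and a slope by gradient ascent on the
    average log-likelihood (one simple fitting approach among several)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        # Difference between each observed label and its predicted probability.
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(residuals) / n
        b1 += learning_rate * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1


b0_hat, b1_hat = fit_logistic(x_train, y_train)

# Predict for new observations that were not used to fit the model.
x_new = [0.5, 6.5]
for x in x_new:
    p = sigmoid(b0_hat + b1_hat * x)
    label = "fake" if p >= 0.5 else "real"  # assumed 0.5 cutoff
    print(f"x = {x}: predicted probability of fake = {p:.3f} -> {label}")
```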

Module Outline

  • Model Basics: In sections 3, 4, 5, and 6 we'll discuss the basics of how we go about fitting a logistic regression model.
  • Model Predictions: In section 7 we'll discuss how to make predictions with a logistic regression model.
  • Model Interpretations: In section 8 we'll discuss how to interpret the intercept and slopes of a logistic regression model.
  • Model Evaluation: In section 9 we'll discuss how to evaluate a given model for a variety of different purposes, including:
    • Suitability of the model (9.1)
    • Predictive power of the model (9.2)
    • Ability to trust model predictions (9.3)
    • Ability to trust slope interpretations (9.4)
  • Using a Model for Classification: In section 10, we'll discuss how to use a logistic regression model for the purposes of classification. Then we'll talk about how to evaluate a classifier's performance and pick out the one that best meets our research goal.
  • Feature Selection: In section 11, we'll discuss how to use feature selection techniques to help us select the "best" set of explanatory variables to use in our logistic regression model.
  • Cross-validation: Finally, in section 12 we'll discuss how to use cross-validation techniques to help us best assess how well our chosen model will predict fake vs. real accounts for new datasets.