More Machine Learning


In the previous modules, we explored regression, one of the most common machine learning techniques. Machine learning involves developing algorithms (or models) that learn from available data and use what they have learned to predict or estimate an output.

Machine learning has two primary goals: making predictions for new data, and understanding the underlying processes that connect our inputs to our outputs so that we can learn more about the world.

We saw how we could meet both of these goals (or at least start to) while creating models in our last few sections. In that sense, we've already been using machine learning.

There are two main branches of machine learning: supervised learning and unsupervised learning. Supervised learning is used when the output is known for the training data, which lets us focus on prediction or classification. Unsupervised learning is used when the output is unknown for the training data, with clustering as one common aim. Here, we'll focus on supervised learning and introduce a few different machine learning techniques for classification. We will only scratch the surface of classification within one branch of machine learning, so know that there are many other techniques that could be employed.
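To make the distinction concrete, here is a minimal sketch in plain Python. The data are invented for illustration (a single made-up feature per account and made-up labels), and the two helper functions, predict_nearest and two_means, are hypothetical names, not techniques this module necessarily uses. The supervised function relies on the known labels; the unsupervised one never sees them.

```python
# Toy data, invented for illustration only: one numeric feature per account
# (an assumed "number of posts") and a known label for each training account.
train_x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
train_y = ["fake", "fake", "fake", "real", "real", "real"]


def predict_nearest(x, xs, ys):
    """Supervised: return the label of the closest labeled training point."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[nearest]


def two_means(xs, n_iter=10):
    """Unsupervised: split points into two clusters without using any labels.

    Assumes xs contains at least two distinct values.
    """
    c0, c1 = min(xs), max(xs)  # start the two centers at the extremes
    assign = []
    for _ in range(n_iter):
        # Assign each point to the nearer center, then recompute the centers.
        assign = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in xs]
        group0 = [x for x, a in zip(xs, assign) if a == 0]
        group1 = [x for x, a in zip(xs, assign) if a == 1]
        c0 = sum(group0) / len(group0)
        c1 = sum(group1) / len(group1)
    return assign


# Supervised: predict a label for a new account using the known labels.
supervised_prediction = predict_nearest(2.5, train_x, train_y)

# Unsupervised: group the same accounts using only the feature values.
cluster_ids = two_means(train_x)

print(supervised_prediction)
print(cluster_ids)
```

Note that the clustering step returns only group numbers. Deciding what each group means (for example, whether a group corresponds to fake accounts) is left to the analyst, which is one practical difference between the two branches.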

In this module, we'll demonstrate how different techniques can be used to answer the same question, and how each of these techniques can give a slightly different answer.

Research Question

We will continue exploring the Instagram data in this section, with the goal of predicting which Instagram accounts are real accounts and which are fake accounts.

Note that although we are applying these techniques to a categorical response variable, many of them can also be applied to situations where the response variable is quantitative.