Your Data Science Journey - From Beginning to End


From Data Science Discovery to Data Science Explorations

Welcome back to the world of data science! In Data Science Discovery, Karle Flanagan and Wade Fagen-Ulmschneider gave you a brief introduction to a breadth of exciting beginner-level data science tools that you used to discover hidden insights about datasets. You learned how to do the following:

By the end of this Data Science Explorations course, Tori Ellison and Julie Deeke will bring all of these tools together to complete a beginning-to-end data science project. We frame these projects by what research question we would like to ask or what research goal we would like to pursue, based on a compelling research motivation. With this in mind, we carefully take into account all the many decisions that we as data scientists can make when completing the project and how these decisions might uniquely affect the question or goal that we're pursuing. The typical end of a data science project involves some means of communicating these insights effectively to your desired audience.

In addition, we will explore new and more advanced machine learning and inference techniques that can help us answer and pursue a greater breadth of research questions and research goals with data!

Research Motivation

Real vs. Fake Instagram Accounts

Let's take a look at the screenshot below of the Instagram user wanderingggirl.

Source: Mediakix

Question: Do you think that this account is real or fake?

What was it about this person's account that made you think that it was fake or real? Was it the pictures that look like stock photos? Was it the number of posts this person had made? Was it the number of posts that this person had made in relation to the number of followers that they have? There was most likely a combination of factors that helped you make your decision.

Dataset

Let's explore this research topic further with the following dataset. A researcher collected an (assume random) sample of 60 Instagram accounts that they determined to be real and an (assume random) sample of 60 Instagram accounts that they determined to be fake. You can learn more about how this dataset was collected here.

A table of variables for the first and last 5 Instagram accounts out of 120

Research Motivation Influences your Analysis

Thinking about the Data Science Discovery tools that we've learned, you might already be thinking: wouldn't it be cool to use this dataset to build a classifier that can predict whether an Instagram account is real or fake based on the remaining six variables in the dataset?!

As we'll learn in Data Science Explorations, there are actually countless decisions that we could make when it comes to how we go about building this classifier. Each of these decisions could lead to vastly different interpretations or results.

So what are some ways we can make the best, most informed decisions when it comes to building this classifier? One thing we should do is think about who would be motivated to build or use this classifier, and what their motivation for using it would be.

Corporate advertisers increasingly rely on "influencers" to advertise their products to the influencers' large followings of (hopefully real) accounts. However, "fake influencers" have been known to pay for "fake accounts" to follow them in an attempt to artificially boost their follower counts. Corporations who invest in "fake influencers" to advertise their products are thus likely to end up wasting money, as fewer "real" people are actually being exposed to their products.

Thus, corporate advertisers are likely to want to invest in an effective classifier that can predict how many of an influencer's followers are real.

Article: What are fake influencers and how can you spot them?

IDEALLY, we as data scientists would be able to build a classifier that can identify fake vs. real accounts with perfect accuracy. However, this may not always be the case; trade-offs are often present and tough decisions must be made.

  • For instance, one corporate advertiser might prefer a classifier in which only a small proportion (1%) of fake accounts is incorrectly predicted to be real, but they are OK with a larger proportion (20%) of real accounts being incorrectly predicted to be fake.
Table for a classifier that mispredicts 1% of fake accounts, and mispredicts 20% of real accounts
  • On the other hand, another corporate advertiser might prefer a classifier in which only a small proportion (1%) of real accounts is incorrectly predicted to be fake, but they are OK with a larger proportion (20%) of fake accounts being incorrectly predicted to be real.
Table for a classifier that mispredicts 20% of fake accounts, and mispredicts 1% of real accounts

Thus, a complete data science analysis should include a thorough discussion and consideration of who would find the results of this analysis useful and how this influences your research goals and design.

Similarly, if the purpose of your analysis is to answer a research question, rather than to pursue a goal, thinking carefully about who would find the answer to your research question useful can help inform how to go about answering your question. There can similarly be many ways to use data science to try to answer a research question. In fact, some of these methods may end up giving you opposing answers!

The Data Science Pipeline

Furthermore, your overarching research motivation is not the only thing you would want to consider when it comes to most effectively deciding amongst all of the many ways that exist to answer a research question or pursue a research goal. In fact, every step we take and every insight we gain along the way in what we call "the data science pipeline" (shown below) can influence the decisions that we make later on (i.e., downstream) in the pipeline.

Graphic summary of the Data Science Pipeline (icons created with Flaticon.com)

We use the "data science pipeline" to describe many of the common elements of a full data science analysis. Note that not every analysis uses all of these elements. Also note that a full data science analysis is not always a linear process, going from left to right. Sometimes, discoveries made further along in the pipeline send us back to an earlier part of the pipeline (like the question we're trying to answer) to start again and provide a more effective, sensible answer to the original question.

These are considered to be the main elements of the data science pipeline.

  1. Formulating a research question and/or goal
  2. Collecting the data that can answer this question and/or pursue this goal
  3. Data Management – considering how your dataset will be stored and extracted
  4. Data Cleaning/Representation – considering how/if your data needs to be cleaned or put into a different format for further analysis
  5. Descriptive Analytics – describing your dataset with summary statistics, visualizations, and other sophisticated techniques
  6. Predictive Analytics – using your dataset to make predictions
  7. Inferential Statistics – using your dataset to answer questions about a larger population based on a random sample drawn from that population
  8. Prescriptive Analytics – using your dataset to make optimal decisions

Question vs. Dataset

For instance, as you might have discovered in Data Science Discovery, the type of dataset that we have in our hands can influence the type of research questions that we are able to answer with it.

Question: For instance, which of the following research questions do you think that we can use our Instagram dataset to answer?

  1. Is there an association between the number of accounts a particular account follows (i.e., number_of_follows) and being/not being a fake account (i.e., account_type) in this dataset?
  2. Does an account being fake (see account_type) make it more likely for the account to follow more people (see number_of_follows) in this dataset?
  3. Do we have sufficient evidence to suggest that there is an association between the number of accounts a particular account follows (i.e., number_of_follows) and being/not being a fake account (i.e., account_type) in the population of all Instagram accounts?
  4. Do we have sufficient evidence to suggest that an account being fake (see account_type) makes it more likely for the account to follow more people (see number_of_follows) in the population of all Instagram accounts?

What you might have noticed is that this dataset was NOT collected using random assignment. That is, we did not randomly assign our Instagram accounts to a treatment (say being fake) and a control (say being real). Therefore, we can NOT answer a research question that implies that some aspect of our explanatory variable (i.e., fake vs. real) causes some effect on our response variable (i.e., number of follows).

Because the phrase "makes it more likely" implies this causal relationship between our explanatory variable and our response variable, we cannot use THIS dataset to answer questions (2) or (4).

However, we can answer question (1)! Evaluating if there is or is not an association between two variables in a dataset that we have in our hands is always something that we can do with the use of summary statistics, visualizations, or more sophisticated descriptive analytics techniques. However, there are actually many ways in which we can try to answer this question and potentially many different answers we could arrive at. We'll look at some examples in section 3.3 of how we might do this below.

Also note that it IS possible to answer question (3) using this dataset that we have in our hands. Question (3) actually refers to two much larger datasets (i.e., populations): ALL fake Instagram accounts and ALL real Instagram accounts. However, given that our actual dataset is comprised of two random samples from these two populations of interest, we can use it to make a valid inference about the nature of both populations. We'll look at some examples of how we might do this in section 3.4 below.

Goal/Question vs. Dataset – Which Comes First?

Given how the nature of our dataset can influence the type of research questions that we can answer with it, full data science analyses usually come in one of two flavors.

  1. Start with a research question/goal that you had in mind first, and then collect and curate the dataset that will help you answer/pursue this specific research question/goal.
  2. Start with a dataset that you already have first, and then try to generate interesting questions/goals that you are able to answer/pursue with this dataset.

Effective Communication

Remember that the typical end of a full data science analysis involves effectively communicating the answers to your research questions to your desired audience. Thus, even if you choose to start with a dataset first and answer interesting questions as you go, when composing your final report/presentation, you should state the research questions/goals that you will answer/pursue at the very beginning to help effectively guide the reader through the rest of your report.

Descriptive Analytics

3.3.1. Example

Research Goal:

Suppose that our research goal is to first assess whether there is an association between account_type and number_of_follows in this dataset (question (1) in 3.1), and then, if there is an association, to clearly and thoroughly communicate the nature of this association.

The ways that we might go about evaluating whether there exists an association between two variables in a given dataset depend on the types of variables that we're dealing with. Notice that the number_of_follows variable is a numerical variable, while the account_type variable is a categorical variable with two levels (real and fake).

Below we'll showcase three different ways one might try to go about pursuing this research goal. Try to think about which one is best. What is it that makes one of these approaches better than the others?

Attempt #1: Comparing Just Two Means

Given that we're dealing with a numerical and a categorical variable, we may decide to calculate the mean number of follows for fake accounts (853.93) and the mean number of follows for real accounts (704.60).
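
As a quick sketch of how these two means might be computed (assuming the dataset has been loaded into a pandas DataFrame, here hypothetically named insta, with the column names used above; the file name is also just illustrative):

```python
import pandas as pd

# Hypothetical file name; assume columns "account_type" ("fake"/"real")
# and "number_of_follows" as described above.
insta = pd.read_csv("instagram_accounts.csv")

# Mean number of follows for each account type
print(insta.groupby("account_type")["number_of_follows"].mean())
# fake ~ 853.93, real ~ 704.60 (per the values reported above)
```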

Based on just these two statistics, we may then make the following hasty, non-robust conclusion.

Conclusion #1: "Because the average number of accounts followed by fake accounts is different from the average number of accounts followed by real accounts, we can claim that there is an association between account_type and number_of_follows in this dataset.
Furthermore, because the mean number of follows for fake accounts (853.93) is GREATER THAN the mean number of follows for real accounts (704.60), we can conclude that fake accounts generally follow MORE people than real accounts in this dataset."

Attempt #2: Comparing Just Two Medians

Why did we call Conclusion #1 above not robust? Well, let's consider another way that we might pursue the same research goal. Again, given that we're dealing with a numerical and a categorical variable, we may decide instead to calculate the median number of follows for fake accounts (163) and the median number of follows for real accounts (470).
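
The calculation mirrors the previous sketch, with the mean swapped out for the median (again assuming the hypothetical insta DataFrame):

```python
# Median number of follows for each account type
print(insta.groupby("account_type")["number_of_follows"].median())
# fake = 163, real = 470 (per the values reported above)
```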

Conclusion #2: "Because the median number of accounts followed by fake accounts is different from the median number of accounts followed by real accounts, we can claim that there is an association between account_type and number_of_follows in this dataset. Furthermore, because the median number of follows for fake accounts (163) is LESS THAN the median number of follows for real accounts (470), we can conclude that fake accounts generally follow FEWER people than real accounts in this dataset."

Attempt #3: More Robust Conclusion Making

Notice how our two conclusions above were saying two different things! So how do we decide which conclusion is more sensible, if at all?

To answer our main research question, we can think of the number_of_follows values in this dataset as comprising two distinct numerical distributions:

  • the distribution of the number of accounts followed by fake accounts
  • the distribution of the number of accounts followed by real accounts.

Recall that when dealing with a numerical variable, there are actually many more things that we can use to describe its distribution in addition to just measures of center (like the mean and median).

Describing Numerical Distributions

  1. Measures of Center
    • Mean
    • Median
  2. Measures of Spread
    • Standard Deviation
    • IQR (Interquartile Range)
    • Range
  3. Shape
    • Modality
    • Skew
  4. Any Outliers

We can use the side-by-side boxplot visualization below to see the median, IQR, skew, and any outliers of these two distributions of fake and real account follow counts.

Side-by-side boxplots showing the distribution of the number of follows for real and fake Instagram accounts
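
One way such a plot might be produced (seaborn is just one plotting option, and insta is the same hypothetical DataFrame name used in the earlier sketches):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Side-by-side boxplots of number_of_follows for fake vs. real accounts
sns.boxplot(data=insta, x="account_type", y="number_of_follows")
plt.ylabel("number_of_follows")
plt.title("Number of follows by account type")
plt.show()
```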

The plot above gives us a more expansive understanding of the nature of the association between the account_type and number_of_follows variables in this dataset. By pursuing a more thorough, expansive way of answering this research question, we are more likely to reach a conclusion that is less likely to be at odds with another data scientist's approach, provided they too are attempting to answer the question in a thorough, expansive way.

Conclusion #3:

  • Comparing Medians vs. Means

Because at least one of these distributions (the fake accounts) is not symmetric, the mean is not going to be a good way of comparing the centers of the two account types. This is because the median tends to reside towards the center of the distribution (regardless of skew or outliers), while the mean tends to get dragged out in the direction of the skew and/or outliers. Thus, the median provides a more informative summary of the center of both distributions.

  • Comparing Medians

Therefore, because the median number of accounts that fake accounts follow (163) is smaller than the median number of accounts that real accounts follow (470), we might be tempted to conclude "fake accounts generally follow fewer accounts in this dataset".

  • Comparing IQR Overlap

However, notice how there is no separation between the two IQR boxes in the two boxplots above. Thus, while the median number of follows for fake accounts is smaller than it is for real accounts, because the middle 50% of observations for both distributions (i.e., the IQRs) overlap so much, this difference of medians is only slight when compared to the general spread of the distributions. Thus, "we cannot conclude that there is a strong association between account_type and number_of_follows in this dataset."

However, we can conclude the following:

  • The spread of the number of accounts that fake accounts follow is larger.
  • The number of accounts that fake accounts follow tends to be skewed to the right, whereas the number of accounts that real accounts follow tends to be more symmetric (with the exception of a few large outliers).
  • The median number of accounts that fake accounts follow is smaller than it is for real accounts. However, this difference of medians is small in comparison to the spread of the two distributions.

3.3.2 Why is robust research question-answering important?

Because there often exist many competing approaches with which to answer an oftentimes vaguely stated research question, getting in the practice of answering your research questions and pursuing your research goals in as robust a way as possible can have a lot of benefits for both you and society. When we say robust in this context, we mean that your conclusions are more likely to be validated by the approaches and conclusions of other researchers who are also trying to answer the same research question in a robust way.

A single elephant stood in front of a body of water

For instance, two ants looking at different parts of an elephant when faced with the research question "what is this?" may provide two very wrong answers if they approach the question by only looking at the few elements of the elephant right in front of their faces. However, by taking a more expansive, robust approach to answering this question and communicating with each other effectively, the two ants together are more likely to arrive at an answer that is closer to the truth.

Inferential Statistics

Recall how we discussed in 3.1 that since our dataset is comprised of two random samples of our populations of interest (i.e., ALL fake Instagram accounts and ALL real Instagram accounts), we are able to answer the following research question with this dataset.

Research Question: Do we have sufficient evidence to suggest that there is an association between account_type and number_of_follows for the population of all Instagram accounts?

As with most research questions we might ask, there are many ways that we might attempt to answer this question. And yet again, some of these methods may give us opposing answers!

However, remember that ideally we would like our approach to be robust AND to map back onto our main research motivation for asking the question in the first place.

For example, here are two ways in which we might attempt to answer this research question.

3.4.1. A Two Sample t-Test Approach

Setting Up a Hypothesis Test

We can use a two-sample t-test to help answer this question. Specifically, we can use a hypothesis test of the null and alternative hypotheses given below.

\(H_0: \mu_{fake}=\mu_{real}\)

\(H_A: \mu_{fake}\neq \mu_{real}\)

  • \(\mu_{fake}\) is the average number of accounts that ALL fake accounts follow, and
  • \(\mu_{real}\) is the average number of accounts that ALL real accounts follow

Implicit Interpretation

These two hypotheses map back onto our research question in the following way. One might make the interpretation that:

  • if \(H_0: \mu_{fake}=\mu_{real}\) (i.e., the fake and real population mean number_of_follows are the same), then there is no association between account_type and number_of_follows in the population of all Instagram accounts.
  • if \(H_A: \mu_{fake}\neq \mu_{real}\) (i.e., the fake and real population mean number_of_follows are different), then there is an association between account_type and number_of_follows in the population of all Instagram accounts.

Answering the Research Question

Recall that in hypothesis testing, if the p-value is less than our specified significance level \(\alpha\) then we say that we have sufficient evidence to suggest the alternative hypothesis.

Thus…

  • if the p-value that we calculate from our dataset is less than our pre-selected \(\alpha\), then the answer to our research question using this approach is YES.
  • if the p-value that we calculate from our dataset is greater than or equal to our pre-selected \(\alpha\), then the answer to our research question using this approach is NO.
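
As a sketch of how this test might be carried out in code (reusing the hypothetical insta DataFrame from earlier; scipy's ttest_ind with equal_var=False runs a Welch two-sample t-test, which is one common choice):

```python
from scipy import stats

fake_follows = insta[insta["account_type"] == "fake"]["number_of_follows"]
real_follows = insta[insta["account_type"] == "real"]["number_of_follows"]

# Two-sample (Welch) t-test of H0: mu_fake = mu_real vs. HA: mu_fake != mu_real
t_stat, p_value = stats.ttest_ind(fake_follows, real_follows, equal_var=False)

alpha = 0.05                 # pre-selected significance level
print(p_value < alpha)       # True -> answer is YES, False -> answer is NO
```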

3.4.2. Another Possible Approach

Notice how our approach made use of a particular implicit interpretation about how to measure an association between a categorical variable (account_type) and a numerical variable (number_of_follows). Specifically, this interpretation assumes that if the two means are different, then there is an association.

As we saw in 3.3, there can actually be many ways of measuring an association between a numerical and a categorical variable, rather than just comparing the means.

Setting Up a Hypothesis Test

For instance, we may have wanted to evaluate a different set of hypotheses:

\(H_0: Median_{fake}=Median_{real}\)
\(H_A: Median_{fake}\neq Median_{real}\)

  • \(Median_{fake}\) is the median number of accounts that ALL fake accounts follow and
  • \(Median_{real}\) is the median number of accounts that ALL real accounts follow

Implicit Interpretation

Similarly, these two hypotheses map back onto our research question in the following way. One might make the interpretation that:

  • if \(H_0: Median_{fake} = Median_{real}\) (i.e., the fake and real population median number_of_follows are the same), then there is no association between account_type and number_of_follows in the population of all Instagram accounts.
  • if \(H_A: Median_{fake} \neq Median_{real}\) (i.e., the fake and real population median number_of_follows are different), then there is an association between account_type and number_of_follows in the population of all Instagram accounts.

Answering the Research Question

Similarly, we can answer our research question with this approach, where:

  • if the p-value that we calculate from our dataset is less than our pre-selected \(\alpha\), then the answer to our research question using this approach is YES.
  • if the p-value that we calculate from our dataset is greater than or equal to our pre-selected \(\alpha\), then the answer to our research question using this approach is NO.
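
Unlike the t-test, there is no single canonical test for comparing medians; one illustrative option is a permutation test that uses the difference in sample medians as its test statistic. The sketch below reuses fake_follows and real_follows from the earlier t-test sketch and is only one of several reasonable ways to carry out such a test:

```python
import numpy as np

rng = np.random.default_rng(0)
fake = fake_follows.to_numpy()
real = real_follows.to_numpy()

observed = np.median(fake) - np.median(real)   # observed difference in medians
combined = np.concatenate([fake, real])

n_perm = 10_000
count = 0
for _ in range(n_perm):
    rng.shuffle(combined)                      # randomly relabel the accounts
    diff = np.median(combined[:len(fake)]) - np.median(combined[len(fake):])
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / n_perm                       # two-sided permutation p-value
print(p_value < 0.05)                          # True -> YES, False -> NO
```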

3.4.3. Approach Robustness

So which approach above is a more robust, expansive way of attempting to answer our somewhat vague research question? One might even ask: which approach is going to get us closer to an understanding of the truth?

As you journey through more statistics and data science classes, you'll learn that there exist many pros and cons to both approaches with respect to this goal.

Benefit of Approach #1

One benefit of approach #1 is that it is more mathematically precise and easier to calculate.

Benefit of Approach #2

However, given that at least one of the sample distributions of number_of_follows for fake and real accounts is not symmetric, it is most likely the case that at least one of the corresponding population distributions is not symmetric either. Thus, comparing population medians in approach #2 is more likely to capture a more informative measure of center.

Data Cleaning

In 3.1 of this section we discussed how the way in which a dataset was collected can impact other parts of the data science pipeline, like the types of research questions that we can answer with this dataset.

Generally speaking we say that if a sample dataset is representative of a larger population that it was drawn from, then we can use this sample dataset to make an inference about this larger population. One of the best ways to attempt to ensure that a sample dataset is representative of this larger population is to randomly sample it from the population that you're trying to make an inference about. But is random sampling enough to ensure that our sample is representative of the population?

3.5.1. Fake Accounts and Profile Pictures

Let's choose to just focus on the fake accounts for now. We'll use row filtering to create a new dataframe sample of just the fake accounts. Let's see how many of these accounts do and do not have profile pictures.
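
A minimal sketch of this row filtering and counting, again assuming the hypothetical insta DataFrame from the earlier sketches and a hypothetical column name has_profile_pic (the dataset's actual column name for profile pictures isn't spelled out above):

```python
import matplotlib.pyplot as plt

# Keep only the fake accounts (row filtering)
fake_accounts = insta[insta["account_type"] == "fake"]

# Count how many fake accounts do / do not have a profile picture
pic_counts = fake_accounts["has_profile_pic"].value_counts()
print(pic_counts)

# Bar plot of those counts
pic_counts.plot(kind="bar")
plt.show()
```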

It looks like about half of these fake accounts have a profile picture and about half do not.

3.5.2. Number of Follows and Number of Followers for Fake Accounts

Let's also take a look at the relationship between the number_of_follows and number_of_followers variables for just the fake accounts in this dataset. Because these are two numerical variables, a scatterplot will most likely be the best plot to visualize the relationship between them.

With the exception of two outliers with a high number of followers and a low number of follows, we see that the relationship between these two variables in this dataset is positive, linear, and moderately strong. We can see this positive linear relationship summarized by a fairly high correlation of 0.73 between these variables.

Because there is a fairly strong linear relationship between these two numerical variables, fitting the simple linear regression line below is a good way to describe the relationship in this dataset.

$$\widehat{\text{number_of_followers}} = 68.72 + 0.27 \text{ number_of_follows}$$
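
A sketch of how this scatterplot, correlation, and fitted line might be produced (scipy's linregress is one of several options; the variable names are the hypothetical ones from the earlier sketches):

```python
import matplotlib.pyplot as plt
from scipy import stats

x = fake_accounts["number_of_follows"]
y = fake_accounts["number_of_followers"]

# Scatterplot of followers vs. follows for the fake accounts
plt.scatter(x, y)
plt.xlabel("number_of_follows")
plt.ylabel("number_of_followers")
plt.show()

# Correlation and simple linear regression
print(x.corr(y))                                    # about 0.73
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print(intercept, slope)                             # about 68.72 and 0.27
```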

3.5.3. Fake Accounts and Profile Pictures and Missing Data

BUT, now let's suppose that we collected this same dataset of fake accounts, but we found that for some reason there were quite a lot of missing values in the number_of_follows variable. We'll talk more about how to identify and deal with missing values in a later section.
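
Even before diving into those details, a quick check like the following is a common first look at how much data is missing (again using the hypothetical fake_accounts DataFrame):

```python
# Number of missing values in each column of the fake-accounts DataFrame
print(fake_accounts.isna().sum())

# Or just for the column in question
print(fake_accounts["number_of_follows"].isna().sum())
```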

As with every other part of the data science pipeline, there are many decisions that you can make when it comes to data cleaning. One of the most common data cleaning procedures is figuring out what to do with missing values if your dataset has them. Here are two common approaches. Let's also consider how these two approaches might impact our ability to answer two research questions in a robust, effective way.

3.5.4. Approach #1 : Dropping the Rows with Missing Data

Suppose that we were again interested in answering the following research question.

Research Question

"Do we have sufficient evidence to suggest that there is an association between number_of_follows and account_type the population of all Instagram accounts?"

Again, IF we knew that our sample dataset was representative of our population of all Instagram accounts, then we would be able to use a hypothesis testing approach like we did in 3.4 to give a valid answer to this question.

However, now we're assuming that our fake and real Instagram samples (while both randomly sampled from the populations of ALL fake Instagram accounts and ALL real Instagram accounts) had some missing values, particularly among the fake accounts.

One approach to dealing with missing values is to simply drop all rows in the dataframe that have a missing value. Let's do this below.
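
A one-line sketch of this approach with pandas (hypothetical names as before):

```python
# Drop every row that has at least one missing value
fake_dropped = fake_accounts.dropna()
```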

Now let's create another barplot visualizing the number of fake accounts that have and don't have a profile picture.
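
This barplot can be made the same way as before, just on the cleaned DataFrame from the previous sketch:

```python
# Recount profile pictures among the remaining (non-dropped) fake accounts
pic_counts_dropped = fake_dropped["has_profile_pic"].value_counts()
pic_counts_dropped.plot(kind="bar")
plt.show()
```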

Notice how there are far fewer fake accounts with a profile picture in this cleaned dataset than before, while the number of fake accounts without a profile picture stayed the same. This suggests that there might have been a non-random pattern behind which fake accounts had missing values. This non-random pattern would thus make us suspicious that our cleaned fake-account sample is not actually representative of the population of all fake accounts. Thus, even though our two samples of fake and real accounts were randomly sampled, because of these missing values and the approach that we used to deal with them (i.e., just dropping the rows), we unfortunately cannot use our dataset to give a valid, robust answer to our main research question.

3.5.5. Approach #2 : Imputing the Missing Values

Suppose that we were interested in the following new research goal.

Research Goal

"Describe the relationship between the number_of_follows variable and the number_of_followers variables in this dataset."

Notice how this research goal involves the number_of_follows variable, which now has some missing values. Another approach to dealing with missing values is to impute all missing values with a specified number. A common number that data scientists use for imputation is 0.

We do this below.
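
A sketch of this imputation step with pandas (hypothetical names as before):

```python
# Replace every missing number_of_follows value with 0
fake_imputed = fake_accounts.copy()
fake_imputed["number_of_follows"] = fake_imputed["number_of_follows"].fillna(0)
```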

Now, let's use the same approach that we used in 3.5.2 to describe the relationship between the number_of_follows and number_of_followers variables in this dataset. We plot this data in a scatterplot again. However, notice how several new outliers with 0 number_of_follows have shown up in the left portion of the graph. Because of our particular missing-value imputation approach, we can visually see that the relationship between the number_of_follows and number_of_followers variables in this dataset is now weaker.


Notice how the correlation coefficient has also dropped to 0.599, further showcasing how this relationship has become weaker as a result of our data cleaning decision.

Predictive Analytics

Suppose that we had the following research goal.

Research Goal

Predict the number_of_followers a fake Instagram account will have based on its number_of_follows.

If we were to fit a linear regression model with our cleaned dataset where we imputed the missing values with 0, we would get a different line of best fit.

$$\widehat{\text{number_of_followers}} = 153.77 + 0.26 \text{ number_of_follows}$$

Would you consider this data cleaning technique the most effective way of pursuing this research goal? What data cleaning techniques do you think might work better?


General Guiding Principles

In this section we've discussed how there can be many different decisions we can make when progressing through the data science pipeline to answer a research question or pursue a research goal. We've talked about how your research motivation and insights that you've learned in other parts of the data science pipeline can influence the way in which you go about providing robust answers/approaches to your questions/goals.

But in general, what are some other guiding principles that you want to consider when pursuing a beginning-to-end data science analysis? The American Statistical Association offers a set of ethical guidelines for best statistical practice that you can check out.

Here are some key principles that are good to consider when performing your data science analysis and explaining your results. Try to think about what positive (or perhaps negative) impacts pursuing these principles in your analyses might have on your career success as a data scientist, as well as on the relationship between the data science community and society in general.

  1. Reproducibility: How easy would it be for another researcher to follow the steps that you have provided in your report and get the same answer/insights as you?
  2. Transparency: Are you upfront about all the decisions that you made in your analysis?
  3. Well-motivated: Are you able to clearly explain why the results of your analysis are important?
  4. Robust: Are you seeking a thorough, expansive understanding of how to go about answering your question or pursuing your goal? How likely is it that other data scientists pursuing this same goal/question (also in a robust way) will arrive at your same conclusion?
  5. Well-explained: Did you explain each of the decisions that you made along the data science pipeline? How easy is it for your reader to understand your results, why they're important, and the logic behind your decisions?
  6. Responsibility: Do you understand the models/methods that you are using and when they should and should not be used? Have you checked the appropriate conditions?

Machine Learning vs. Inference

After we complete this introductory module (module 7) of Data Science Explorations, the remaining modules will come in two flavors: machine learning and inference.

  • Modules 8-11 will dive deeper into machine learning techniques.
  • Then modules 12-13 will dive deeper into inference.

These terms "machine learning" and "inference" get thrown around quite often in the world of data science, so you might be asking yourself: what is the difference? How do I know if I am conducting machine learning vs. inference? There are actually some data science models (like a linear regression model) that you've built in Data Science Discovery that could technically qualify as both.

However, generally you know when you are conducting a machine learning analysis vs. an inference analysis based on the goal that you are pursuing.

Machine Learning Goal: In machine learning, the general purpose of your analysis is to pursue the best possible predictions of some response variable using your explanatory variables. Machine learning generally involves the use of more sophisticated algorithms to pursue the descriptive analytics and the predictive analytics elements of the data science pipeline. The two main types of machine learning are supervised and unsupervised learning.

  • Predictive Analytics = Supervised Learning
    • In a supervised learning algorithm, your response variable is given. So you design a model (like a linear regression model) that best predicts this given response variable.

  • Descriptive Analytics = Unsupervised Learning
    • In an unsupervised learning algorithm, your response variable is NOT given. However, it is assumed that some sort of meaningful response variable still exists; you just don't have it. Nevertheless, your goal is still to best predict these unknown response variable values that you don't have.

For instance, in k-means clustering, you are assuming that there is some meaningful cluster label (i.e., a response variable value) that should be assigned to each observation. But you don't happen to know what this "right" cluster label is ahead of time. Nonetheless, the k-means clustering algorithm tries to predict what this best cluster label would be. Because an unsupervised learning algorithm like k-means clustering doesn't come with an "answer key" (i.e., the "right" cluster labels), we need to use more sophisticated evaluation techniques to evaluate how well k-means did at predicting these "right" cluster labels that we don't actually know.
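
As a concrete (and purely illustrative) sketch, here is what a minimal k-means workflow might look like with scikit-learn; the choice of features and of two clusters is arbitrary here and is not meant as a recommended analysis of this dataset:

```python
from sklearn.cluster import KMeans

# No response variable is given; cluster accounts on two numeric features.
features = insta[["number_of_follows", "number_of_followers"]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)   # the algorithm's predicted cluster labels

print(labels[:10])   # cluster label assigned to the first ten accounts
```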

Inference Goal: When conducting inference, the general purpose of your analysis is to use a representative sample drawn from a much larger population dataset (one that you don't have) to try to understand and infer characteristics of that population. We can see that inference shows up as inferential statistics in the data science pipeline.