Neural Networks


Neural networks (often shortened to neural nets) were originally designed to simulate the networks of neurons in the brain. The idea is that neurons receive stimuli from surrounding neurons within the brain, causing them to activate. Once a neuron is activated, it fires and in turn activates other neurons along the same path.

Within machine learning, neural networks are designed so that the inputs send signals through a path, producing an output that corresponds to a prediction for the response variable. The path that a neural network takes may be simple but could also be quite complex. Linear regression, for example, is a simple form of a neural network in which the inputs are connected linearly to the output: each feature is multiplied by a coefficient (learned when the model is fit) and the results are combined to produce the output. More complex paths allow for interactions between inputs, so that the effect of one variable can depend on the values of multiple other variables, and that relationship can in turn depend on the original value of the variable itself. This is referred to as having multiple layers of neurons through which the input stimuli travel to generate the final output. The fitted network can continue to be updated as additional data arrive.
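
To make the idea of layers concrete, here is a minimal sketch of a single observation passing through one hidden layer to produce an output. The weights, biases, and layer sizes are made up for illustration; they are not the model fit below.

import numpy as np

# Hypothetical weights for a network with 3 inputs, one hidden layer of
# 2 neurons, and a single output; in practice these are learned from data
x = np.array([0.5, -1.2, 3.0])        # one observation's features
W1 = np.array([[0.2, -0.4],
               [0.7, 0.1],
               [-0.3, 0.5]])          # input-to-hidden weights
b1 = np.array([0.1, -0.2])            # hidden-layer biases
W2 = np.array([0.6, -0.8])            # hidden-to-output weights
b2 = 0.05                             # output bias

hidden = np.maximum(0, x @ W1 + b1)   # ReLU activation of the hidden layer
output = hidden @ W2 + b2             # linear combination of the hidden neurons
print(output)

With the hidden layer removed, the computation collapses to output = x @ w + b, which is exactly the linear regression form described above.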

Fitting a Neural Network

We will again use our Instagram data. For this example, we will fit a neural net through scikit-learn. Note that for more advanced neural networks, you might opt to fit one with additional machine learning packages, such as PyTorch or Keras.

We'll start with our data prepared as it has been for the previous pages. Again, we won't separate our data into training and testing data for this illustration, although this would be done for most analyses.
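
As a rough sketch of what that preparation might look like, the setup could resemble the following; the file name and column names here are placeholders, not the actual data.

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file and column names, for illustration only
insta = pd.read_csv('instagram.csv')
X = insta.drop(columns=['account_type'])   # predictor features
y = insta['account_type']                  # 0 = fake, 1 = real (assumed coding)

# For most analyses we would also hold out a test set, e.g.:
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)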

We will then fit our neural network using MLPClassifier. Note that other arguments can be used to adjust the structure and behavior of the neural network that we fit to the data.

from sklearn.neural_network import MLPClassifier

# Three hidden layers of 8 neurons each, with ReLU activations, fit using
# the Adam optimizer for up to 500 iterations
mlp = MLPClassifier(hidden_layer_sizes=(8, 8, 8), activation='relu', solver='adam', max_iter=500)
mlp.fit(X, y)

# Compare the predicted classes to the observed account types
pred_y = mlp.predict(X)
pd.crosstab(pred_y, y)
account_type   0   1
row_0
0             35  11
1             25  49

Our resulting classifier does not achieve perfect performance. We see that 36 of our 120 observations are classified incorrectly (either false positives or false negatives); a quick check of this count appears below. We can also look a little more closely at the resulting probabilities.
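
To verify the misclassification count directly, one option (using the pred_y and y objects defined above) is:

from sklearn.metrics import accuracy_score

# 11 + 25 = 36 misclassified out of 120
print((pred_y != y).sum())           # number of misclassified accounts
print(accuracy_score(y, pred_y))     # overall accuracy: 84/120 = 0.7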

# Predicted probabilities for each class, one column per class
y_prob = mlp.predict_proba(X)
y_prob = pd.DataFrame(y_prob)
y_prob.columns = ['prob_fake', 'prob_real']
y_prob.head()
   prob_fake  prob_real
0   0.008599   0.991401
1   0.009402   0.990598
2   0.121067   0.878933
3   0.000000   1.000000
4   0.000971   0.999029
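
The column order of predict_proba matches the class labels stored on the fitted model, so it is worth confirming that our prob_fake and prob_real names line up. Assuming fake accounts are coded 0 and real accounts 1, a quick check is:

# Columns of predict_proba follow the order of mlp.classes_;
# with fake coded 0 and real coded 1, we expect [0, 1]
print(mlp.classes_)

# Each row of predicted probabilities sums to 1
print(y_prob.sum(axis=1).head())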

For this example, we'll generate two graphs to visualize the predicted probabilities for the real and the fake Instagram accounts.

import matplotlib.pyplot as plt

# The first 60 rows are the real accounts; plot their estimated
# probabilities of being real
y_prob.iloc[0:60, 1].hist()
plt.xlabel('Estimated Probability of a Real Account')
plt.ylabel('Frequency')
plt.title('Histogram of Predicted Probabilities for Real Accounts')
plt.show()

Histogram of the estimated probabilities that an Instagram account is real for real accounts.

# The last 60 rows are the fake accounts; plot their estimated
# probabilities of being fake
y_prob.iloc[60:120, 0].hist()
plt.xlabel('Estimated Probability of a Fake Account')
plt.ylabel('Frequency')
plt.title('Histogram of Predicted Probabilities for Fake Accounts')
plt.show()

Histogram of the estimated probabilities that an Instagram account is fake for fake accounts.

From this output, we can observe that the neural network struggled to classify some of our Instagram accounts, in some cases performing very poorly. However, we can also tell that a large number of accounts were easy to identify as either real or fake, as demonstrated by the spike near 1 in each histogram; a predicted probability near 1 corresponds to an account that would almost always be classified correctly.
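
One rough way to quantify those spikes, using the y_prob DataFrame built above, is to count the accounts predicted with probability above some cutoff (the 0.9 used here is an arbitrary choice):

# Count accounts predicted with high confidence; the 0.9 cutoff is arbitrary
confident_real = (y_prob.iloc[0:60, 1] > 0.9).sum()    # real accounts
confident_fake = (y_prob.iloc[60:120, 0] > 0.9).sum()  # fake accounts
print(confident_real, confident_fake)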

Advantages and Drawbacks of Neural Networks

Neural networks are very flexible. They can be applied to many different types of problems and domains, including problems based on images (e.g., identifying whether a given image shows a backpack or not), medical diagnosis, finance, and quantum chemistry. Neural networks can also be applied to many different statistical tasks, including feature selection, function approximation (regression), and classification. They can also model complex scenarios and relationships between variables, allowing the model to be flexible in how each variable is included.

However, each of these advantages comes with its own drawbacks. In some fields, neural networks may require a large amount of data to build an accurate model. This data requirement may render the model unsuitable or unusable within the scope of a project, especially if gathering that much data means the model cannot provide predictions when they would be most useful. Additionally, fitting a neural network can require substantial computational resources (both processing time and memory), especially in the case of big data. Finally, neural networks are another black box technique: it is challenging to use them to understand much about the underlying phenomenon driving the results. That said, some connections and inferences have been drawn from fitted neural networks, allowing them to assist in making real-world conclusions.

There are many decisions to make when fitting a neural network. While this is not necessarily an advantage or a drawback in itself, fitting a neural network is not a simple process. Being aware of the variety of options helps ensure that the chosen model is appropriate (or the most appropriate available); one common way to compare candidate settings is sketched below.
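
As one illustration, scikit-learn's GridSearchCV can refit the model under several candidate settings and select the best by cross-validation; the particular values below are arbitrary illustrations, not recommendations.

from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Candidate settings to compare; these values are illustrative only
param_grid = {
    'hidden_layer_sizes': [(8,), (8, 8), (8, 8, 8)],
    'activation': ['relu', 'tanh'],
    'alpha': [0.0001, 0.01],   # strength of the L2 penalty
}

# Refit the model under each combination, scored by 5-fold cross-validation
search = GridSearchCV(MLPClassifier(max_iter=500), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)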