Fitting a Multiple Linear Regression Curve


In Data Science Discovery we briefly touched upon how a linear regression model finds a "line of best fit" for our training dataset. In our example, we are seeking the best intercept and slopes $\hat{\beta}_0,\hat{\beta}_1,...,\hat{\beta}_8$ for our linear regression model.

$\hat{y}=\hat{\beta}_0+\hat{\beta}_1 x_1+\hat{\beta}_2 x_2+\cdots+\hat{\beta}_8 x_8$

But how do we know when we've selected the best intercept and slopes $\hat{\beta}_0,\hat{\beta}_1,...,\hat{\beta}_8$? There are actually many ways that a given set of intercept and slope values may be deemed best.

However, one of the most popular is the ordinary least squares (OLS) method, which says that the best intercept and slopes are the ones that minimize the sum of the squared residuals in the training dataset. In other words,

$f(\hat{\beta}_0,\hat{\beta}_1,...,\hat{\beta}_8)$

$=\sum_{i=1}^n(\text{residual}_i)^2 =(\text{residual}_1)^2+...+(\text{residual}_n)^2$

$=\sum_{i=1}^n(\text{observed } y_i - \text{predicted } y_i)^2 =(\text{observed } y_1 - \text{predicted } y_1)^2+...+(\text{observed } y_n - \text{predicted } y_n)^2$

$=(y_1 - \hat{y}_1)^2+...+(y_n - \hat{y}_n)^2$

$=\left(y_1 - (\hat{\beta}_0 +\hat{\beta}_1x_{1,1}+...+\hat{\beta}_8x_{8,1})\right)^2+...+ \left(y_n - (\hat{\beta}_0+\hat{\beta}_1x_{1,n}+...+\hat{\beta}_8x_{8,n}) \right)^2$
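
To see this objective in code, here is a minimal sketch (not from the course materials) of how the sum of squared residuals could be computed with numpy for any candidate intercept and slopes; the function name sum_squared_residuals and the arrays X and y are placeholders assumed for illustration.

import numpy as np

def sum_squared_residuals(beta0, betas, X, y):
    # beta0: candidate intercept
    # betas: candidate slopes, one per explanatory variable (length 8 here)
    # X:     (n, 8) array of explanatory variable values, one row per listing
    # y:     (n,) array of observed responses
    y_hat = beta0 + X @ betas        # predicted y for every training row
    residuals = y - y_hat            # observed minus predicted
    return np.sum(residuals ** 2)    # the OLS objective f(beta0, ..., beta8)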

For illustrative purposes, let's temporarily switch gears to the task of predicting price with just a single explanatory variable, beds. The model we are trying to fit is then a simple linear regression model, because it has just one explanatory variable.

$\widehat{price}=\hat{\beta}_0+\hat{\beta}_1\, beds$

df_train[['price','beds']]
          price  beds
    1193     85   2.0
    1392    602   4.0
    338     335   5.0
    1844    395   2.0
    359     166   2.0
    ...     ...   ...
    1316    130   2.0
    3363    120   2.0
    2591    159   1.0
    3291    275   2.0
    1596     50   2.0

    1648 rows × 2 columns

This means that our goal would be to find the optimal values of $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the function below, whose graph is the 3-dimensional surface shown beneath it.

$f(\hat{\beta}_0,\hat{\beta}_1)$
$=\left(85 - (\hat{\beta}_0+\hat{\beta}_1(2))\right)^2+...+\left(50 - (\hat{\beta}_0+\hat{\beta}_1(2))\right)^2$

[Figure: 3-dimensional surface of $f(\hat{\beta}_0, \hat{\beta}_1)$, highlighting its minimum point]
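
As an illustrative sketch, we could also evaluate this sum-of-squares surface ourselves over a grid of candidate intercepts and slopes, using the price and beds columns of df_train (the grid ranges below are assumptions chosen just for demonstration):

import numpy as np

y = df_train['price'].to_numpy()
x = df_train['beds'].to_numpy()

# Candidate intercepts and slopes to evaluate (ranges assumed for illustration)
beta0_grid = np.linspace(-100, 200, 61)
beta1_grid = np.linspace(-50, 150, 61)
B0, B1 = np.meshgrid(beta0_grid, beta1_grid)

# f(beta0, beta1) = sum over listings of (y_i - (beta0 + beta1 * x_i))^2
F = ((y[:, None, None] - (B0 + B1 * x[:, None, None])) ** 2).sum(axis=0)

# The grid cell where the surface is lowest approximates the optimal values
i, j = np.unravel_index(F.argmin(), F.shape)
print('approximate best intercept:', B0[i, j])
print('approximate best slope:    ', B1[i, j])

The coarse grid only approximates the true minimum, but the lowest point of this surface is what the figure above depicts.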

The process of finding these optimal values of the intercept and slope requires multivariate calculus, which is beyond the scope of this course. However, scikit-learn's LinearRegression() will find these optimal values for us.

For instance, running the code below, the optimal intercept is found to be $\hat{\beta}_0^*=40.86$ and the optimal slope is found to be $\hat{\beta}_1^*=66.61$.

df_train = X_train.copy()
df_train['price']=y_train
df_train.head()
          neighborhood        room_type  accommodates  bedrooms  beds  price
    1193  Logan Square  Entire home/apt             5       2.0   2.0     85
    1392  Logan Square  Entire home/apt             8       4.0   4.0    602
    338      Lake View  Entire home/apt            12       4.0   5.0    335
    1844     West Town  Entire home/apt             5       2.0   2.0    395
    359      Lake View  Entire home/apt             4       2.0   2.0    166
df_test = X_test.copy()
df_test['price']=y_test
df_test.head()
             neighborhood        room_type  accommodates  bedrooms  beds  price
    2592   Near West Side  Entire home/apt             2       1.0   1.0    179
    139         Lake View     Private room             1       1.0   1.0     63
    471         Lake View  Entire home/apt             2       1.0   1.0     61
    2015        West Town  Entire home/apt             5       2.0   2.0    122
    3324  Near North Side  Entire home/apt             6       2.0   3.0    425
from sklearn.linear_model import LinearRegression

# Fit an ordinary least squares simple linear regression of price on beds
model = LinearRegression()
model.fit(X_train[['beds']], y_train)
print('slope: ', model.coef_)
print('intercept: ', model.intercept_)
    slope:  [66.6196058]
    intercept:  40.859249687151646
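
As a quick sanity check (a sketch, not part of the course code), the same values can be recovered by solving the least squares problem directly with numpy:

import numpy as np

# Design matrix: a column of ones for the intercept, plus the beds column
A = np.column_stack([np.ones(len(X_train)), X_train['beds'].to_numpy()])
coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
print('intercept:', coef[0])   # should match model.intercept_
print('slope:    ', coef[1])   # should match model.coef_[0]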

Hence, our fitted ordinary least squares simple linear regression model is the following:
$\widehat{price}=40.86+66.61\, beds$
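
For example, this fitted model would predict that a listing with 3 beds has a price of about $40.86+66.61(3)=240.69$.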