Fitting a Multiple Linear Regression Curve
In Data Science Discovery we briefly touched upon how a linear regression model finds a "line of best fit" for our training dataset. In the case of our example, we are seeking to find the best intercept and slopes $\hat{\beta}_0,\hat{\beta}_1,...,\hat{\beta}_8$ for our linear regression model.
$\hat{y}=\hat{\beta}_0+\hat{\beta}_1 x_1+\hat{\beta}_2 x_2+...+\hat{\beta}_8 x_8$
But how do we know when we've selected the best intercept and slopes $\hat{\beta}_0,\hat{\beta}_1,...,\hat{\beta}_8$? There are actually many ways that a given set of intercept and slope values may be deemed best.
However, one of the most popular methods is the ordinary least squares (OLS) method, which says that the best intercept and slopes are the ones that minimize the sum of the squared residuals in the training dataset. In other words,
$f(\hat{\beta}_0,\hat{\beta}_1,...,\hat{\beta}_8)$
$=\sum_{i=1}^n(\text{residual}_i)^2 =(\text{residual}_1)^2+...+(\text{residual}_n)^2$
$=\sum_{i=1}^n(\text{observed } y_i - \text{predicted } y_i)^2 =(\text{observed } y_1 - \text{predicted } y_1)^2+...+(\text{observed } y_n - \text{predicted } y_n)^2$
$=(y_1 - \hat{y}_1)^2+...+(y_n - \hat{y}_n)^2$
$=\left(y_1 - (\hat{\beta}_0 +\hat{\beta}_1x_{1,1}+...+\hat{\beta}_8x_{8,1})\right)^2+...+ \left(y_n - (\hat{\beta}_0+\hat{\beta}_1x_{1,n}+...+\hat{\beta}_8x_{8,n}) \right)^2$
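To make this concrete, the objective can be evaluated directly for any candidate set of coefficients. Below is a minimal NumPy sketch (not part of the original analysis): `sum_squared_residuals` is a hypothetical helper, and `X` and `y` stand in for an array of the eight explanatory-variable values and the observed prices.

```python
import numpy as np

def sum_squared_residuals(betas, X, y):
    """Evaluate the OLS objective f(beta_0, beta_1, ..., beta_8).

    betas : length-9 array -- the intercept followed by the 8 slopes
    X     : (n, 8) array of explanatory-variable values
    y     : length-n array of observed prices
    """
    y_hat = betas[0] + X @ betas[1:]   # predicted y for every training row
    residuals = y - y_hat              # observed minus predicted
    return np.sum(residuals ** 2)      # sum of squared residuals

# Toy usage: 3 rows, 8 explanatory variables, all coefficients set to zero
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(3, 8))
y_toy = np.array([85.0, 602.0, 335.0])
print(sum_squared_residuals(np.zeros(9), X_toy, y_toy))
```

Different choices of the coefficients give different values of this function; OLS picks the intercept and slopes that make it as small as possible.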
For illustrative purposes, let's temporarily switch gears to the task of predicting price with just a single explanatory variable, beds. The linear regression curve we are trying to fit would then be a simple linear regression model, because we have just one explanatory variable.
$\hat{price}=\hat{\beta}_0+\hat{\beta}_1 beds$
df_train[['price','beds']]
|  | price | beds |
|---|---|---|
| 1193 | 85 | 2.0 |
| 1392 | 602 | 4.0 |
| 338 | 335 | 5.0 |
| 1844 | 395 | 2.0 |
| 359 | 166 | 2.0 |
| ... | ... | ... |
| 1316 | 130 | 2.0 |
| 3363 | 120 | 2.0 |
| 2591 | 159 | 1.0 |
| 3291 | 275 | 2.0 |
| 1596 | 50 | 2.0 |
1648 rows × 2 columns
This means that our goal is to find the optimal values of $\hat{\beta}_0,\hat{\beta}_1$ that minimize the function below, whose graph is a 3-D surface like the one shown.
$f(\hat{\beta}_0,\hat{\beta}_1)$
$=(85 - (\hat{\beta}_0+\hat{\beta}_1(2)))^2+...+(50 - (\hat{\beta}_0+\hat{\beta}_1(2)))^2$

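One way to build intuition for this surface is to evaluate $f(\hat{\beta}_0,\hat{\beta}_1)$ over a grid of candidate intercepts and slopes. The sketch below is illustrative only: it assumes the df_train used above (with its price and beds columns) is available, and the grid bounds are arbitrary choices.

```python
import numpy as np

beds = df_train['beds'].to_numpy()
price = df_train['price'].to_numpy()

# Grid of candidate intercepts (b0) and slopes (b1); bounds chosen arbitrarily
b0_grid = np.linspace(-100, 200, 201)
b1_grid = np.linspace(-50, 150, 201)
B0, B1 = np.meshgrid(b0_grid, b1_grid)

# f(b0, b1) = sum over listings of (price_i - (b0 + b1 * beds_i))^2
f = np.zeros_like(B0)
for x_i, y_i in zip(beds, price):
    f += (y_i - (B0 + B1 * x_i)) ** 2

# Location of the smallest value of f on the grid
i, j = np.unravel_index(np.argmin(f), f.shape)
print('approximate intercept:', B0[i, j])
print('approximate slope:', B1[i, j])
```

A grid search like this only gives an approximate minimizer; the exact minimizer is what the calculus below (and scikit-learn) delivers.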
The process of finding these optimal values of the intercept and slope requires multivariate calculus, which is beyond the scope of this course. However, the scikit-learn LinearRegression() model will return these optimal values for us.
Using the training data below, the optimal intercept is found to be $\hat{\beta}_0^*\approx 40.86$ and the optimal slope is found to be $\hat{\beta}_1^*\approx 66.62$.
df_train = X_train.copy()
df_train['price'] = y_train
df_train.head()
|  | neighborhood | room_type | accommodates | bedrooms | beds | price |
|---|---|---|---|---|---|---|
| 1193 | Logan Square | Entire home/apt | 5 | 2.0 | 2.0 | 85 |
| 1392 | Logan Square | Entire home/apt | 8 | 4.0 | 4.0 | 602 |
| 338 | Lake View | Entire home/apt | 12 | 4.0 | 5.0 | 335 |
| 1844 | West Town | Entire home/apt | 5 | 2.0 | 2.0 | 395 |
| 359 | Lake View | Entire home/apt | 4 | 2.0 | 2.0 | 166 |
df_test = X_test.copy()
df_test['price'] = y_test
df_test.head()
|  | neighborhood | room_type | accommodates | bedrooms | beds | price |
|---|---|---|---|---|---|---|
| 2592 | Near West Side | Entire home/apt | 2 | 1.0 | 1.0 | 179 |
| 139 | Lake View | Private room | 1 | 1.0 | 1.0 | 63 |
| 471 | Lake View | Entire home/apt | 2 | 1.0 | 1.0 | 61 |
| 2015 | West Town | Entire home/apt | 5 | 2.0 | 2.0 | 122 |
| 3324 | Near North Side | Entire home/apt | 6 | 2.0 | 3.0 | 425 |
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train[['beds']], y_train)
print('slope: ', model.coef_)
print('intercept: ', model.intercept_)
slope:  [66.6196058]
intercept:  40.859249687151646
Hence, our fitted ordinary least squares simple linear regression model is the following:
$\hat{price}=40.86+66.62\, beds$
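As a quick sanity check, predictions from the fitted model object should match this equation. The snippet below is a small illustration and assumes the model fit above is still in scope; the 3-bed example listing is made up.

```python
import pandas as pd

# Predict the price of a hypothetical 3-bed listing with the fitted model ...
new_listing = pd.DataFrame({'beds': [3.0]})
print(model.predict(new_listing))

# ... and confirm it matches the fitted equation written out by hand
print(model.intercept_ + model.coef_[0] * 3.0)
```

Both lines print roughly the same value, since model.predict simply evaluates the fitted intercept and slope on the new beds value.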