Sunday, September 8, 2019

Classification and Regression Using Supervised Learning 8 (Multivariable regression)

In the previous post, we discussed how to build a regression model for a single variable. In this post, we will see how to build a regression model with multidimensional data. See the following program:

import numpy as np
from sklearn import linear_model
import sklearn.metrics as sm
from sklearn.preprocessing import PolynomialFeatures

# Input file containing data
input_file = 'data_multivar_regr.txt'

# Load the data from the input file
data = np.loadtxt(input_file, delimiter=',')
X, y = data[:, :-1], data[:, -1]

# Split data into training and testing
num_training = int(0.8 * len(X))
num_test = len(X) - num_training
# Training data
X_train, y_train = X[:num_training], y[:num_training]
# Test data
X_test, y_test = X[num_training:], y[num_training:]

# Create the linear regressor model
linear_regressor = linear_model.LinearRegression()
# Train the model using the training sets
linear_regressor.fit(X_train, y_train)

# Predict the output
y_test_pred = linear_regressor.predict(X_test)

# Measure performance
print("Linear Regressor performance:")
print("Mean absolute error =", round(sm.mean_absolute_error(y_test,
y_test_pred), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test,
y_test_pred), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test,
y_test_pred), 2))
print("Explained variance score =",
round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))


When we run the program, it prints the performance metrics shown below:

Linear Regressor performance:
Mean absolute error = 3.58
Mean squared error = 20.31
Median absolute error = 2.99
Explained variance score = 0.86
R2 score = 0.86
------------------
(program exited with code: 0)

Press any key to continue . . .


Our program starts by importing all the required packages. Then we define the input file containing the data. I'm using data_multivar_regr.txt here, but you can substitute your own file:

input_file = 'data_multivar_regr.txt'

Next we use NumPy's loadtxt() function to load the data from the input file, taking the last column as the target variable and the remaining columns as the input features:

data = np.loadtxt(input_file, delimiter=',')
X, y = data[:, :-1], data[:, -1]
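
Each row of the input file holds the comma-separated feature values followed by the target value in the last column. As a minimal sketch of what this slicing does (the first row is the data point referenced later in this post; the second is made up for illustration):

import numpy as np

# Hypothetical two-row dataset: three features plus a target per row
sample = np.array([[7.66, 6.29, 5.66, 41.35],
                   [1.10, 2.20, 3.30, 10.00]])
print(sample[:, :-1])  # every column except the last -> the feature matrix X
print(sample[:, -1])   # the last column only -> the target vector y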


As we did when building the regression model for a single variable, we split the data into training and test sets:

num_training = int(0.8 * len(X))
num_test = len(X) - num_training


X_train, y_train = X[:num_training], y[:num_training]
X_test, y_test = X[num_training:], y[num_training:]
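
This slicing simply assigns the first 80% of the rows to training and the remaining 20% to testing. As an aside, scikit-learn's train_test_split does the same job while also shuffling the rows; a minimal sketch (note that shuffling changes which rows land in the test set, so the metrics below would come out slightly different):

from sklearn.model_selection import train_test_split

# Shuffled 80/20 split; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)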

Next we create and train the linear regressor model:

linear_regressor = linear_model.LinearRegression() 
linear_regressor.fit(X_train, y_train)
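
Once fitted, the model's learned parameters can be inspected through its coef_ and intercept_ attributes, with one coefficient per input feature:

# Inspect the fitted model: one weight per feature, plus an intercept
print("Coefficients:", linear_regressor.coef_)
print("Intercept:", linear_regressor.intercept_)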

Then we predict the output for the test dataset:

y_test_pred = linear_regressor.predict(X_test)

Finally, our program computes the performance metrics and prints them:

print("Linear Regressor performance:")
print("Mean absolute error =", round(sm.mean_absolute_error(y_test,
y_test_pred), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test,
y_test_pred), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test,
y_test_pred), 2))
print("Explained variance score =",
round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))


Now let's create a polynomial regressor of degree 10 and train it on the training dataset. We'll then take a sample data point and see how to perform prediction with it. The first step is to transform the training data (and the data point) into polynomial features, as shown in the code below:

# Build degree-10 polynomial features from the training data
polynomial = PolynomialFeatures(degree=10)
X_train_transformed = polynomial.fit_transform(X_train)
# Transform a sample data point with the same fitted feature mapping
datapoint = [[7.75, 6.35, 5.56]]
poly_datapoint = polynomial.transform(datapoint)
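
Note how quickly a degree-10 expansion grows the feature space: for n input features, PolynomialFeatures generates C(n + 10, 10) terms, which for the three features in this dataset is C(13, 10) = 286. A quick check:

# Each 3-feature row expands to 286 polynomial terms at degree 10
print(X_train_transformed.shape)  # expected: (num_training, 286)
print(poly_datapoint.shape)       # expected: (1, 286)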


If we look closely, this data point is very close to the data point on line 11 of our data file, which is [7.66, 6.29, 5.66] with an output of 41.35. So a good regressor should predict an output close to 41.35. Next, create a new linear regressor object and fit it to the polynomial features, then perform the prediction with both the linear and the polynomial regressor to see the difference:

# Fit a linear model on the polynomial features
poly_linear_model = linear_model.LinearRegression()
poly_linear_model.fit(X_train_transformed, y_train)
# Predict the sample point with both models
print("\nLinear regression:\n", linear_regressor.predict(datapoint))
print("\nPolynomial regression:\n", poly_linear_model.predict(poly_datapoint))
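
As an extra sanity check (not part of the original program), the polynomial model can be scored on the whole test set just as we scored the linear model, reusing the names above. Keep in mind that a degree-10 fit can easily overfit, so its test-set score may well be worse than the linear model's:

# Transform the test features with the already-fitted PolynomialFeatures
y_test_poly_pred = poly_linear_model.predict(polynomial.transform(X_test))
print("Polynomial R2 score =", round(sm.r2_score(y_test, y_test_poly_pred), 2))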


Now let's run the program and see the output. It should be:

Linear Regressor performance:
Mean absolute error = 3.58
Mean squared error = 20.31
Median absolute error = 2.99
Explained variance score = 0.86
R2 score = 0.86

Linear regression:
 [36.05286276]

Polynomial regression:
 [41.44966348]
------------------
(program exited with code: 0)

Press any key to continue . . .


We can see that the polynomial regressor's prediction of 41.45 is much closer to the expected value of 41.35 than the linear regressor's 36.05.