In the previous post, we discussed how to build a regression model for a single variable. In this post, we will see how to build a regression model with multidimensional data. See the following program:
import numpy as np
from sklearn import linear_model
import sklearn.metrics as sm
from sklearn.preprocessing import PolynomialFeatures
# Input file containing data
input_file = 'data_multivar_regr.txt'
# Load the data from the input file
data = np.loadtxt(input_file, delimiter=',')
X, y = data[:, :-1], data[:, -1]
# Split data into training and testing
num_training = int(0.8 * len(X))
num_test = len(X) - num_training
# Training data
X_train, y_train = X[:num_training], y[:num_training]
# Test data
X_test, y_test = X[num_training:], y[num_training:]
# Create the linear regressor model
linear_regressor = linear_model.LinearRegression()
# Train the model using the training sets
linear_regressor.fit(X_train, y_train)
# Predict the output
y_test_pred = linear_regressor.predict(X_test)
# Measure performance
print("Linear Regressor performance:")
print("Mean absolute error =", round(sm.mean_absolute_error(y_test,
y_test_pred), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test,
y_test_pred), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test,
y_test_pred), 2))
print("Explained variance score =",
round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))
When we run the program, it prints the performance metrics shown below:
Linear Regressor performance:
Mean absolute error = 3.58
Mean squared error = 20.31
Median absolute error = 2.99
Explained variance score = 0.86
R2 score = 0.86
Our program starts by importing all the required packages. Then we define the input file containing the data. I'm using data_multivar_regr.txt, but you should substitute your own file:
input_file = 'data_multivar_regr.txt'
Next we use NumPy's loadtxt() function to load the data from the input file, then separate the feature columns (X) from the target values in the last column (y):
data = np.loadtxt(input_file, delimiter=',')
X, y = data[:, :-1], data[:, -1]
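To make the column slicing concrete, here is a minimal sketch that loads a tiny in-memory sample instead of a file. The three rows are invented for illustration and only assume the same layout as the post's file: comma-separated feature columns followed by one target column.

```python
import io
import numpy as np

# Hypothetical three-row sample standing in for the real data file:
# three feature columns followed by one target column.
sample_text = """1.0,2.0,3.0,10.0
4.0,5.0,6.0,20.0
7.0,8.0,9.0,30.0"""

# np.loadtxt accepts a file-like object as well as a filename.
data = np.loadtxt(io.StringIO(sample_text), delimiter=',')

# All columns except the last are features; the last column is the target.
X, y = data[:, :-1], data[:, -1]

print(X.shape)  # (3, 3)
print(y)        # [10. 20. 30.]
```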
As we did when building the single-variable regression model, we split the data into training and testing sets, using the first 80% of the rows for training and the rest for testing:
num_training = int(0.8 * len(X))
num_test = len(X) - num_training
X_train, y_train = X[:num_training], y[:num_training]
X_test, y_test = X[num_training:], y[num_training:]
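The slicing above takes rows in file order. If your own file happens to be sorted in some way, that can bias the split. As an optional variation (not part of the original program), the sketch below shuffles the rows first and then applies the same 80/20 slicing. The synthetic data and the seed are assumptions made purely for illustration.

```python
import numpy as np

# Synthetic stand-in data: 10 samples, 3 features (not the post's dataset).
X = np.arange(30, dtype=float).reshape(10, 3)
y = np.arange(10, dtype=float)

# Optional step: shuffle the rows so the split does not depend on file order.
rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
order = rng.permutation(len(X))
X_shuffled, y_shuffled = X[order], y[order]

# Same 80/20 slicing as in the post, applied to the shuffled rows.
num_training = int(0.8 * len(X))
X_train, y_train = X_shuffled[:num_training], y_shuffled[:num_training]
X_test, y_test = X_shuffled[num_training:], y_shuffled[num_training:]

print(X_train.shape, X_test.shape)  # (8, 3) (2, 3)
```

Indexing X and y with the same permutation keeps every feature row paired with its own target value.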
Next we create and train the linear regressor model:
linear_regressor = linear_model.LinearRegression()
linear_regressor.fit(X_train, y_train)
Then we predict the output for the test dataset:
y_test_pred = linear_regressor.predict(X_test)
Finally, our program computes the performance metrics and prints them:
print("Linear Regressor performance:")
print("Mean absolute error =", round(sm.mean_absolute_error(y_test,
y_test_pred), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test,
y_test_pred), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test,
y_test_pred), 2))
print("Explained variance score =",
round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))
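To show what these five metrics actually measure, the following sketch recomputes each one by hand with NumPy and compares it to the scikit-learn result. The target and prediction values are made up for illustration and are not taken from the post's dataset.

```python
import numpy as np
import sklearn.metrics as sm

# Made-up targets and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred

# Mean and median of the absolute errors, and mean of the squared errors.
mae = np.mean(np.abs(errors))
mse = np.mean(errors ** 2)
medae = np.median(np.abs(errors))

# R2: 1 minus (residual sum of squares / total sum of squares).
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# Explained variance: 1 minus (variance of the errors / variance of the targets).
explained_variance = 1 - np.var(errors) / np.var(y_true)

print("MAE:", mae, "vs sklearn:", sm.mean_absolute_error(y_true, y_pred))
print("MSE:", mse, "vs sklearn:", sm.mean_squared_error(y_true, y_pred))
print("MedAE:", medae, "vs sklearn:", sm.median_absolute_error(y_true, y_pred))
print("R2:", r2, "vs sklearn:", sm.r2_score(y_true, y_pred))
print("EVS:", explained_variance, "vs sklearn:",
      sm.explained_variance_score(y_true, y_pred))
```

R2 and explained variance differ only when the errors have a nonzero mean, which is why the two scores in the output above are identical after rounding.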
Now create a polynomial regressor of degree 10 and train it on the training dataset. Let's take a sample data point and see how to perform prediction. The first step is to transform it into polynomial features, as shown in the code below:
polynomial = PolynomialFeatures(degree=10)
X_train_transformed = polynomial.fit_transform(X_train)
datapoint = [[7.75, 6.35, 5.56]]
poly_datapoint = polynomial.transform(datapoint)
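It can help to see what the polynomial transformation produces. The sketch below uses a small two-feature point and degree 2 (both chosen only to keep the output readable) and then counts how many columns a degree-10 expansion of three features generates. It also follows the usual scikit-learn pattern of fitting the transformer once and calling transform() on new points.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Small illustrative point with two features: x0 = 2, x1 = 3.
X_small = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=2)
expanded = poly.fit_transform(X_small)

# Columns are: 1, x0, x1, x0^2, x0*x1, x1^2
print(expanded)  # [[1. 2. 3. 4. 6. 9.]]

# A new point reuses the already fitted transformer via transform().
new_point = poly.transform(np.array([[1.0, 4.0]]))
print(new_point)  # [[ 1.  1.  4.  1.  4. 16.]]

# Degree 10 on three features yields many more columns.
poly10 = PolynomialFeatures(degree=10).fit(np.zeros((1, 3)))
print(poly10.n_output_features_)  # 286
```

With 286 polynomial terms, a degree-10 model has a lot of freedom to fit the training data, which is worth keeping in mind when judging its predictions.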
If we look closely, this data point is very close to the data point on line 11 of our data file, which is [7.66, 6.29, 5.66]. So a good regressor should predict an output close to 41.35. Create a linear regressor object and fit it on the polynomial features. Then run the prediction with both the linear and polynomial regressors to see the difference:
poly_linear_model = linear_model.LinearRegression()
poly_linear_model.fit(X_train_transformed, y_train)
print("\nLinear regression:\n", linear_regressor.predict(datapoint))
print("\nPolynomial regression:\n",
poly_linear_model.predict(poly_datapoint))
Now let's run the program and see the output. It should be:
Linear Regressor performance:
Mean absolute error = 3.58
Mean squared error = 20.31
Median absolute error = 2.99
Explained variance score = 0.86
R2 score = 0.86
Linear regression:
[36.05286276]
Polynomial regression:
[41.44966348]
We can see that the polynomial regressor's prediction (41.45) is much closer to 41.35 than the linear regressor's (36.05).
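A single data point is only a spot check. To compare the two regressors more fairly, you can score both on the held-out test set. The sketch below does this on synthetic data with a known quadratic relationship, so it does not reproduce the post's numbers. The data, the seed, the degree of 2, and the use of make_pipeline are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with a known quadratic relationship (not the post's dataset).
rng = np.random.default_rng(42)  # arbitrary fixed seed
X = rng.uniform(-3.0, 3.0, size=(200, 3))
y = X[:, 0] ** 2 + X[:, 1] * X[:, 2] + rng.normal(0.0, 0.1, size=200)

# Same 80/20 slicing as in the post.
num_training = int(0.8 * len(X))
X_train, y_train = X[:num_training], y[:num_training]
X_test, y_test = X[num_training:], y[num_training:]

# Plain linear model on the raw features.
linear_model_demo = LinearRegression().fit(X_train, y_train)

# The pipeline expands the features and fits the linear model in one object,
# so new data is transformed consistently at prediction time.
poly_model_demo = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model_demo.fit(X_train, y_train)

r2_linear = r2_score(y_test, linear_model_demo.predict(X_test))
r2_poly = r2_score(y_test, poly_model_demo.predict(X_test))
print("Linear R2 on test set:", round(r2_linear, 2))
print("Polynomial R2 on test set:", round(r2_poly, 2))
```

On your own data, running the same comparison for several degrees is a simple way to check whether a high degree such as 10 really generalizes better than a lower one.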