Saturday, August 31, 2019

Classification and Regression Using Supervised Learning 2 (Label encoding)

While performing classification we often deal with labels which can be in the form of words, numbers, or something else. The machine learning functions in sklearn expect them to be numbers. So if they are already numbers, then we can use them directly to start training. But this is not usually the case as in real world, labels are in the form of words, because words are human readable. We label our training data with words so that the mapping can be tracked. To convert word labels into numbers, we need to use a label encoder. Label encoding refers to the process of transforming the word labels into numerical form. This enables the algorithms to operate on our data. The following program will define some sample labels and create the label encoder object:

import numpy as np
from sklearn import preprocessing

# Sample input labels
input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']

# Create label encoder and fit the labels
encoder = preprocessing.LabelEncoder()
encoder.fit(input_labels)

# Print the mapping
print("\nLabel mapping:")
for i, item in enumerate(encoder.classes_):
    print(item, '-->', i)

The output of the program is shown below which print the mapping between words and numbers:

Label mapping:
black --> 0
green --> 1
red --> 2
white --> 3
yellow --> 4
------------------
(program exited with code: 0)

Press any key to continue . . .

The class sklearn.preprocessing.LabelEncoder() Encode labels with value between 0 and n_classes-1. It can be used to normalize labels. It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.The fit() Fit label encoder.

Now let's encode a set of randomly ordered labels to see how it performs. See the following program:

import numpy as np
from sklearn import preprocessing

# Sample input labels
input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']

# Create label encoder and fit the labels
encoder = preprocessing.LabelEncoder()
encoder.fit(input_labels)

# Print the mapping
print("\nLabel mapping:")
for i, item in enumerate(encoder.classes_):
print(item, '-->', i)

# Encode a set of labels using the encoder
test_labels = ['green', 'red', 'black']
encoded_values = encoder.transform(test_labels)
print("\nLabels =", test_labels)
print("Encoded values =", list(encoded_values))


The output of the program is shown below which prints encoded values:

Label mapping:
black --> 0
green --> 1
red --> 2
white --> 3
yellow --> 4

Labels = ['green', 'red', 'black']
Encoded values = [1, 2, 0]

------------------
(program exited with code: 0)

Press any key to continue . . .

Our next program decodes a random set of numbers:

import numpy as np
from sklearn import preprocessing

# Sample input labels
input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']

# Create label encoder and fit the labels
encoder = preprocessing.LabelEncoder()
encoder.fit(input_labels)

# Print the mapping
print("\nLabel mapping:")
for i, item in enumerate(encoder.classes_):
print(item, '-->', i)

# Encode a set of labels using the encoder
test_labels = ['green', 'red', 'black']
encoded_values = encoder.transform(test_labels)
print("\nLabels =", test_labels)
print("Encoded values =", list(encoded_values))

# Decode a set of values using the encoder
encoded_values = [3, 0, 4, 1]
decoded_list = encoder.inverse_transform(encoded_values)
print("\nEncoded values =", encoded_values)
print("Decoded labels =", list(decoded_list))

The output of the program is shown below which prints encoded values:

Label mapping:
black --> 0
green --> 1
red --> 2
white --> 3
yellow --> 4

Labels = ['green', 'red', 'black']
Encoded values = [1, 2, 0]

Encoded values = [3, 0, 4, 1]
Decoded labels = ['white', 'black', 'yellow', 'green']

------------------
(program exited with code: 0)

Press any key to continue . . .

The inverse_transform() function transform labels back to original encoding. From the output we can check the mapping to see that the encoding and decoding steps are correct.


Share:

0 comments:

Post a Comment