After implementing a machine learning algorithm, we need to find out how effective the model is. The criteria for measuring effectiveness depend on the dataset and on the metric we choose, and different machine learning problems call for different performance metrics. For example, if a classifier is used to distinguish between images of different objects, we can use classification metrics such as average accuracy, AUC, etc. The metric we choose to evaluate our machine learning model is important because it determines how the performance of the algorithm is measured and compared. Following are some of these metrics:
Confusion Matrix
It is used for classification problems where the output can be one of two or more classes, and it is the simplest way to summarize the performance of a classifier. A confusion matrix is a table with two dimensions, "Actual" and "Predicted"; the combinations of these two dimensions give the counts of "True Positives (TP)", "True Negatives (TN)", "False Positives (FP)", and "False Negatives (FN)".
In a binary confusion matrix, 1 denotes the positive class and 0 denotes the negative class.
Following are the terms associated with the confusion matrix (a short code sketch follows the list):
- True Positives: TPs are the cases where the actual class of the data point is 1 and the predicted class is also 1.
- True Negatives: TNs are the cases where the actual class of the data point is 0 and the predicted class is also 0.
- False Positives: FPs are the cases where the actual class of the data point is 0 but the predicted class is 1.
- False Negatives: FNs are the cases where the actual class of the data point is 1 but the predicted class is 0.
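As a minimal sketch (assuming scikit-learn is available and using made-up label arrays purely for illustration), the four counts can be read off a binary confusion matrix like this:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth and predicted labels: 1 = positive, 0 = negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows correspond to actual classes, columns to predicted classes
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```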
Accuracy
The confusion matrix itself is not a performance measure as such, but almost all performance metrics are based on it. One of them is accuracy. In classification problems, it may be defined as the number of correct predictions made by the model divided by the total number of predictions made. The formula for calculating accuracy is as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
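A quick sketch of the same formula, again with hypothetical label arrays and scikit-learn's accuracy_score:

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels: 1 = positive, 0 = negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Fraction of predictions that match the actual labels: (TP + TN) / (TP + TN + FP + FN)
print(accuracy_score(y_true, y_pred))
```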
Precision
It is often discussed in the context of document retrieval, where it may be defined as how many of the returned documents are actually relevant; more generally, it is the fraction of predicted positives that are truly positive. Following is the formula for calculating precision:
Precision = TP / (TP + FP)
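As an illustrative sketch with the same hypothetical labels, scikit-learn's precision_score computes this directly:

```python
from sklearn.metrics import precision_score

# Hypothetical labels: 1 = positive, 0 = negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Of everything predicted positive, the fraction that is actually positive: TP / (TP + FP)
print(precision_score(y_true, y_pred))
```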
Recall or Sensitivity
It may be defined as how many of the actual positives the model correctly identifies. Following is the formula for calculating the recall/sensitivity of the model:
Recall = TP / (TP + FN)
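A minimal sketch using the same hypothetical labels, with scikit-learn's recall_score:

```python
from sklearn.metrics import recall_score

# Hypothetical labels: 1 = positive, 0 = negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Of all actual positives, the fraction the model found: TP / (TP + FN)
print(recall_score(y_true, y_pred))
```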
Specificity
It may be defined as how many of the actual negatives the model correctly identifies; it is the counterpart of recall for the negative class. Following is the formula for calculating the specificity of the model:
Specificity = TN / (TN + FP)
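scikit-learn has no dedicated specificity function, but since specificity is just recall measured on the negative class, one way to sketch it (again with hypothetical labels) is:

```python
from sklearn.metrics import recall_score

# Hypothetical labels: 1 = positive, 0 = negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Specificity = recall computed on the negative class: TN / (TN + FP)
print(recall_score(y_true, y_pred, pos_label=0))
```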
This is the end of today's post; next time we'll talk about the class imbalance problem.