A decision tree is a binary tree flowchart where each node splits a group of observations according to some feature variable.
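To make the idea of a node "splitting a group of observations" concrete, here is a minimal pure-Python sketch of a single split and of Gini impurity, one common measure of how mixed a group is. The helper names `gini` and `split`, the threshold, and the four toy rows are our own illustrative assumptions, not part of any library or of this post's dataset.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure group, 0.5 for a 50/50 two-class group."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

def split(rows, labels, feature_index, threshold):
    """Send each label left if its row's feature value is <= threshold, else right."""
    left, right = [], []
    for row, label in zip(rows, labels):
        (left if row[feature_index] <= threshold else right).append(label)
    return left, right

# Hypothetical toy observations: [height, length of hair]
rows = [[180, 5], [175, 8], [160, 30], [155, 35]]
labels = ['Man', 'Man', 'Woman', 'Woman']

# One node: split on feature 1 (length of hair) at an assumed threshold of 20
left, right = split(rows, labels, feature_index=1, threshold=20)
print(gini(labels), gini(left), gini(right))
```

A tree-building algorithm would try many feature and threshold combinations and keep the one whose child groups are least mixed, then repeat the process inside each child.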
In this post, we build a Decision Tree classifier that predicts whether a person is a man or a woman. We use a very small data set of 19 samples, each described by two features: 'height' and 'length of hair'.
To visualize the classifier, we need pydotplus and graphviz. Graphviz is a tool for drawing graphs described in DOT files, and pydotplus is a Python interface to Graphviz's DOT language. pydotplus can be installed with pip, and Graphviz with your system's package manager.
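Before going further, it can help to confirm that both pieces are actually reachable from Python. The sketch below uses only the standard library; the function name `check_visualization_tools` is our own, and it assumes the Graphviz command-line program is called `dot` and is expected on the PATH.

```python
import importlib.util
import shutil

def check_visualization_tools():
    """Report whether the pydotplus module and the Graphviz `dot` binary can be found."""
    return {
        'pydotplus': importlib.util.find_spec('pydotplus') is not None,
        'dot': shutil.which('dot') is not None,
    }

status = check_visualization_tools()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If `dot` is reported missing, writing the tree to an image file is likely to fail even when pydotplus itself imports fine.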
import collections
import pydotplus
from sklearn import tree
from sklearn.metrics import classification_report
# train_test_split lives in model_selection; sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.model_selection import train_test_split
Now, we need to provide the dataset as follows:
# X must hold one [height, length of hair] row for each of the 19 labels in Y.
Y = ['Man','Woman','Woman','Man','Woman','Man','Woman','Man','Woman','Man','Woman','Man','Woman','Woman','Woman','Man','Woman','Woman','Man']
data_feature_names = ['height','length of hair']
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.40, random_state=5)
After providing the dataset, we need to fit the model, which can be done as follows:
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X,Y)
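Note that the fit above trains on all of X and Y, so the train/test split computed earlier goes unused. A minimal sketch of how the held-out rows could be used to evaluate the model is shown here. The data is a hypothetical toy set, not this post's dataset, and the `stratify` argument and `random_state=0` are our own choices to keep both classes present in each split.

```python
from sklearn import tree
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical toy data, NOT the post's dataset: [height, length of hair]
X_demo = [[180, 5], [175, 8], [182, 3], [170, 10], [178, 6],
          [160, 30], [155, 35], [165, 28], [158, 40], [162, 33]]
Y_demo = ['Man'] * 5 + ['Woman'] * 5

# Hold out 40% of the rows; stratify keeps the class balance in both parts
X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=0.40, random_state=5, stratify=Y_demo)

demo_clf = tree.DecisionTreeClassifier(random_state=0)
demo_clf.fit(X_tr, Y_tr)          # learn only from the training rows
Y_pred = demo_clf.predict(X_te)   # predict the rows the model has not seen

# Per-class precision, recall and F1 on the held-out rows
print(classification_report(Y_te, Y_pred))
```

With only a handful of samples the report is noisy, but the pattern is the same for larger data: fit on the training part, score on the test part.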
Prediction can be made with the help of the following code:
prediction = clf.predict([[133,37]])
print(prediction)
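To see which rules stand behind a prediction without any extra tooling, scikit-learn can also print a fitted tree as indented text with `tree.export_text`. The sketch below again uses hypothetical toy data rather than this post's dataset, and the sample `[158, 32]` is an arbitrary example.

```python
from sklearn import tree

# Hypothetical toy data, not the post's dataset: [height, length of hair]
X_demo = [[180, 5], [175, 8], [182, 3], [160, 30], [155, 35], [165, 28]]
Y_demo = ['Man', 'Man', 'Man', 'Woman', 'Woman', 'Woman']
feature_names = ['height', 'length of hair']

demo_clf = tree.DecisionTreeClassifier(random_state=0).fit(X_demo, Y_demo)

# Indented text view of the learned rules; no Graphviz required
rules = tree.export_text(demo_clf, feature_names=feature_names)
print(rules)

# Predict one new observation and show the class probabilities behind it
sample = [[158, 32]]
predicted = demo_clf.predict(sample)
probabilities = demo_clf.predict_proba(sample)
print(predicted, dict(zip(demo_clf.classes_, probabilities[0])))
```

`predict_proba` returns one column per class in the order of `demo_clf.classes_`, which is why the two are zipped together for printing.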
We can visualize the decision tree with the help of the following Python code:
dot_data = tree.export_graphviz(clf,feature_names=data_feature_names,out_file=None,filled=True,rounded=True)
graph = pydotplus.graph_from_dot_data(dot_data)
colors = ('orange', 'yellow')
edges = collections.defaultdict(list)
# Group the child node ids under their parent node
for edge in graph.get_edge_list():
    edges[edge.get_source()].append(int(edge.get_destination()))
# Fill each parent's two children with the two colors
for edge in edges:
    edges[edge].sort()
    for i in range(2):
        dest = graph.get_node(str(edges[edge][i]))[0]
        dest.set_fillcolor(colors[i])
graph.write_png('decision_tree.png')  # file name chosen for illustration
Running the code above gives the prediction ['Woman'] and creates the following decision tree:
We can change the feature values passed to the prediction to test the classifier on other inputs. This is the end of today's post. Next, we shall look into the Random Forest Classifier.