Wednesday, June 26, 2019

Building a Python Neural Network in TensorFlow

For our neural-network example in TensorFlow, we will use the same network that we used to implement an XOR gate in pure Python in our previous post. I've already covered TensorFlow: a Python-friendly application framework and collection of functions designed for AI uses, especially neural networks and machine learning. It provides a convenient, user-friendly Python front end while executing the actual computation in high-performance C++ code.

Keras is an open-source neural-network library that enables fast experimentation with neural networks, deep learning, and machine learning. It can be installed using the popular pip command:

pip install keras
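
Since the program below imports Keras from inside TensorFlow (tensorflow.keras), you will also want TensorFlow itself installed, which can likewise be done with pip:

pip install tensorflow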

Keras provides an excellent and intuitive set of abstractions and functions, whereas TensorFlow provides the efficient underlying implementation. The five steps to implementing a neural network in Keras with TensorFlow are:

1. Load and format your data

This step is pretty trivial in our model but is often the most complex and difficult part of building the entire program. We have to look at our data (whether an XOR gate or a database of factors affecting diabetic heart patients) and figure out how to map our data and the results to get to the information and predictions that we want.

2. Define your neural network model and layers

Defining our network is one of the primary advantages of Keras over other frameworks. We construct a stack of the neural layers we want our data to flow through. Remember, TensorFlow is just that: our matrices (tensors) of data flowing through a stack of neural layers. Here we choose the configuration of our neural layers and activation functions.


3. Compile the model

Next we compile our model, which hooks up our Keras layer model with the efficient underlying implementation (what Keras calls the back end) to run on our hardware. We also choose what we want to use for a loss function.

4. Fit and train your model


This is where the real work of training our network takes place. We determine how many epochs we want to run. The fit call also accumulates a history of what happens across all the epochs, and we will use this history to create our graphs.

5. Evaluate the model

Here we run the model to predict the outputs from all the inputs.

Our program using TensorFlow, NumPy, and Keras for the two-layer neural network is shown below:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers import Activation, Dense
import numpy as np

# X = input of our 3 input XOR gate
# set up the inputs of the neural network (right from the table)
X = np.array(([0,0,0],[0,0,1],[0,1,0],
            [0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]), dtype=float)
# y = our output of our neural network
y = np.array(([1], [0], [0], [0], [0],
              [0], [0], [1]), dtype=float)


model = tf.keras.Sequential()

model.add(Dense(4, input_dim=3, activation='relu',
    use_bias=True))
model.add(Dense(1, activation='sigmoid', use_bias=True))

model.compile(loss='mean_squared_error',
        optimizer='adam',
        metrics=['binary_accuracy'])

print(model.get_weights())

history = model.fit(X, y, epochs=2000,
                    validation_data=(X, y))


model.summary()


# printing out to file
loss_history = history.history["loss"]
numpy_loss_history = np.array(loss_history)
np.savetxt("loss_history.txt", numpy_loss_history,
        delimiter="\n")

binary_accuracy_history = history.history["binary_accuracy"]
numpy_binary_accuracy = np.array(binary_accuracy_history)
np.savetxt("binary_accuracy.txt", numpy_binary_accuracy, delimiter="\n")


print(np.mean(history.history["binary_accuracy"]))

result = model.predict(X).round()

print(result)


As you may have noticed, this code is much simpler than the two-layer model we built in pure Python in our previous post. That is thanks to TensorFlow/Keras. Let's walk through our code:

First, we import all the libraries we will need to run our example two-layer model. Note that TensorFlow includes Keras by default. We also use NumPy as the preferred way of handling matrices:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers import Activation, Dense
import numpy as np


Next we implement step 1 of the five steps we discussed before: load and format your data. In this case, we just set up the truth table for our XOR gate in terms of NumPy arrays. This can get much more complex when we have large, diverse, cross-correlated sources of data.

# X = input of our 3 input XOR gate
# set up the inputs of the neural network (right from the table)
X = np.array(([0,0,0],[0,0,1],[0,1,0],
              [0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]), dtype=float)
# y = our output of our neural network
y = np.array(([1], [0], [0], [0], [0],
              [0], [0], [1]), dtype=float)


Then we implement step 2 of the five steps we discussed before: define your neural-network model and layers. This is where the real power of Keras shines. It is very simple to add more neural layers and to change their size and their activation functions. We are also applying a bias to our activation functions (relu in this case, with our friend the sigmoid for the final output layer), which we did not do in our pure Python model.

model = tf.keras.Sequential()
model.add(Dense(4, input_dim=3, activation='relu',
                use_bias=True))
#model.add(Dense(4, activation='relu', use_bias=True))
model.add(Dense(1, activation='sigmoid', use_bias=True))
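
For reference, the two activation functions we are using here are easy to write down by hand. This little sketch (separate from our program) shows what relu and sigmoid do to a value:

import numpy as np

# relu passes positive values through unchanged and zeroes out negatives
def relu(z):
    return np.maximum(0.0, z)

# sigmoid squashes any value into the range (0, 1)
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(relu(np.array([-2.0, 0.5])))      # [0.  0.5]
print(sigmoid(np.array([-2.0, 0.5])))   # [0.11920292 0.62245933]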


Next we implement step 3 of the five steps we discussed before: compile your model. We are using the same loss function that we used in our pure Python implementation, mean_squared_error. New to us is the optimizer: Adam, a method for stochastic optimization, is a good default choice. It provides a way of efficiently descending the gradient applied to the weights of the layers.

One thing to note is what we are asking for in terms of metrics. binary_accuracy means we are comparing the outputs of our network to either a 1 or a 0. We will see values of, say, 0.75, which, since we have eight possible outputs, means that six out of eight are correct. It is exactly what we would expect from the name (a short worked sketch follows the compile call below).

model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['binary_accuracy'])
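
To make the binary_accuracy metric concrete, here is a small hand-computed sketch (the prediction values are made up for illustration): round each network output to 0 or 1 and average how many of them match the targets.

import numpy as np

predictions = np.array([0.9, 0.2, 0.6, 0.1, 0.4, 0.3, 0.2, 0.8])  # hypothetical network outputs
targets     = np.array([1,   0,   0,   0,   0,   0,   0,   1])    # the XOR truth-table outputs
binary_accuracy = np.mean(np.round(predictions) == targets)
print(binary_accuracy)   # 0.875 -- seven of the eight rounded outputs match their targets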


Here we print out all the starting weights of our model. Note that they are assigned with a default random method, which we can seed (to do the same run with the same starting weights time after time), or we can change the way they are initialized.

print(model.get_weights())
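
If you want reproducible runs, you can seed the random number generators before building the model. A minimal sketch, noting that the exact TensorFlow call depends on your version:

import numpy as np
import tensorflow as tf

np.random.seed(42)       # seed NumPy's random number generator
tf.random.set_seed(42)   # TensorFlow 2.x; on 1.x use tf.set_random_seed(42)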

Now we implement step 4 of the five steps we discussed before: fit and train your model. We chose the number of epochs so that we would converge to a binary accuracy of 1.0 most of the time. Here we load the NumPy arrays for the input to our network (X) and the expected output of the network (y). The validation_data parameter compares the outputs of the trained network against these values after each epoch and generates val_loss and val_binary_accuracy, which are stored in the history variable along with the training metrics.

history = model.fit(X, y, epochs=2000,
                    validation_data=(X, y))
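
If you want to see exactly which quantities were recorded, you can print the keys of the history dictionary (the exact names can vary slightly between TensorFlow versions):

print(history.history.keys())
# e.g. dict_keys(['loss', 'binary_accuracy', 'val_loss', 'val_binary_accuracy'])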


Here we print a summary of our model so we can make sure it was constructed the way we expected.

model.summary()
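
The Param # column in the summary is easy to check by hand: the first Dense layer has 3 inputs × 4 neurons + 4 biases = 16 parameters, and the output layer has 4 inputs × 1 neuron + 1 bias = 5, for a total of 21 trainable parameters.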

Next, we print out the values from the history variable that we would like to graph.

# printing out to file
loss_history = history.history["loss"]
numpy_loss_history = np.array(loss_history)
np.savetxt("loss_history.txt", numpy_loss_history,
delimiter="\n")
binary_accuracy_history = history.history["binary_accuracy"]
numpy_binary_accuracy = np.array(binary_accuracy_history)
np.savetxt("binary_accuracy.txt", numpy_binary_accuracy, delimiter="\n")


Finally we implement step 5 of the five steps we discussed before: evaluate the model. Here we run the model to predict the outputs from all the inputs of X, using the round function to make them either 0 or 1. Note that this replaces the criterion we used in our pure Python model, which was <0.1 = "0" and >0.9 = "1". We also calculate the average of the binary_accuracy values across all the epochs, but that number isn't very useful, except that the closer to 1.0 it is, the faster the model succeeded.

print(np.mean(history.history["binary_accuracy"]))
result = model.predict(X).round()
print(result)
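
Keras also provides model.evaluate, which runs the inputs through the network and reports the final loss and metric values directly; a brief sketch of how we could use it here (not included in the program above):

loss, accuracy = model.evaluate(X, y, verbose=0)
print("final loss:", loss)
print("final binary accuracy:", accuracy)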


When we run the program we get the following output:

Epoch 2000/2000
8/8 [==============================] - 0s 500us/sample - loss: 0.0515 - binary_accuracy: 1.0000 - val_loss: 0.0514 - val_binary_accuracy: 1.0000
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 4)                 16
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 5
=================================================================
Total params: 21
Trainable params: 21
Non-trainable params: 0
_________________________________________________________________
0.7860625
[[1.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [1.]]
------------------
(program exited with code: 0)

Press any key to continue . . .


We see that by epoch 2,000 we had achieved the binary accuracy of 1.0, as hoped for, and the results of our model.predict function call at the end match our truth table. The plot below shows the loss function and binary accuracy values plotted against the epoch number as the training progressed.


We can see from the above plot that the loss function decreases along a much smoother, nearly linear curve when the training succeeds. This has to do with the choice of activation (relu) and optimizer (Adam). Another thing to remember is that we will get a (somewhat) different curve each time because of the random initial values of the weights. Seeding the random number generator makes each run identical, which makes it easier to optimize your performance. Lastly, when the binary accuracy goes to 1.00 (at about epoch 1556), our network is fully trained in this case.

Now let's add another layer to our neural network and change it to a three-layer neural network. To do so, add the following line:

model.add(Dense(4, activation='relu', use_bias=True))

Now we have a three-layer neural network with four neurons per layer as shown below:

model.add(Dense(4, input_dim=3, activation='relu',
                use_bias=True))
model.add(Dense(4, activation='relu', use_bias=True))
model.add(Dense(1, activation='sigmoid', use_bias=True))


Run the program and it'll give the following output:

Epoch 2000/2000
8/8 [==============================] - 0s 375us/sample - loss: 0.0146 - binary_accuracy: 1.0000 - val_loss: 0.0145 - val_binary_accuracy: 1.0000
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 4)                 16
_________________________________________________________________
dense_1 (Dense)              (None, 4)                 20
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 5
=================================================================
Total params: 41
Trainable params: 41
Non-trainable params: 0
_________________________________________________________________
0.9185
[[1.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [1.]]
------------------
(program exited with code: 0)

Press any key to continue . . .


We can see that now we have three layers in our neural network. The following plot shows the results of the three-layer training:


From the plot we can see that it converges to a binary accuracy of 1.00 at about epoch 916, much faster than epoch 1556 in our two-layer run. The loss function is also less linear than the two-layer run's.

Here I am ending today's post. Till we meet again, as practice, run your own experiments to get a good feel for the way your results will vary with different parameters, layers, and neuron counts.