Sunday, June 30, 2019

AI in the Cloud, the next big thing!

I'll start with a brief overview of the cloud. People usually define the cloud as software or services that run on the Internet rather than on your local machine. This is correct to a degree, but nothing really runs "on the Internet"; it runs on machines that are connected to the Internet. Understanding that in-the-cloud software runs on servers, and is not “just out there,” quickly demystifies the cloud and its functions.

If you have two computers networked together and use one of them as a data server, you have your own “cloud.”

This goes for basic services like storing your data in the cloud, but much more than storage is available in the cloud, and that is where it gets really interesting.

The advantage of using the cloud is that you can use services and storage unavailable on your local network and, in one of the most important game changers of cloud computing, you can ramp your usage up and down dynamically as your computing needs change.

Using the cloud requires Internet access. Not necessarily 100 percent of the time (you can fire off a cloud process and then come back to it later), but you do need connections some of the time. This limits the cloud in applications such as self-driving cars that aren’t guaranteed to have good Internet access all the time. Interestingly, this “fire and forget” mode is useful for IOT (Internet of Things) devices where you don’t want to stay connected to the net all the time for power considerations.

So, how do you use the cloud? 

That depends on the service and vendor, but in machine-learning applications the most common way is to set up Python on a computer that calls cloud-based functions and applications. All cloud vendors provide examples.

What is a great consumer example of cloud usage? 


The Amazon Echo with Alexa. It listens to you, compresses the speech data, sends it to the Amazon AWS cloud, translates and interprets your request, and then sends back a verbal response or a command, say, to turn your lights on.

The top cloud providers for AI are:
  • Google Cloud
  • Amazon Web Services
  • IBM Cloud
  • Microsoft Azure

Google Cloud

The Google Cloud is probably the most AI-focused cloud provider. One can gain access to TPUs (tensor processing units) in the cloud, which, like Google's Edge TPU accelerator stick, can accelerate our AI applications. Much of the Google Cloud's functionality reflects the core skill set of the company: search.

For example, the Cloud Vision API can detect objects, logos, and landmarks within images.
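To make that concrete, here is a rough sketch of what calling the Cloud Vision API from Python might look like. It assumes the google-cloud-vision client library is installed and that the GOOGLE_APPLICATION_CREDENTIALS environment variable points at a service-account key file; photo.jpg is just a placeholder file name.

from google.cloud import vision

# Create a client; credentials come from GOOGLE_APPLICATION_CREDENTIALS
client = vision.ImageAnnotatorClient()

# Read a local image and wrap it for the API
with open("photo.jpg", "rb") as f:
    content = f.read()
image = vision.types.Image(content=content)  # newer client versions expose this as vision.Image

# Label (object) detection; the same client also offers logo_detection() and landmark_detection()
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, label.score)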

Other featured services of the Google Cloud include video content search and speech-to-text/text-to-speech packages (think Google Home, which is very much like Amazon Alexa). Like Amazon and Microsoft, Google is using its own AI-powered applications to create new services for customers to use.

Amazon Web Services

Amazon Web Services (AWS) is focused on taking their consumer AI expertise and supplying this expertise to businesses. Many of these cloud services are built on the consumer product versions, so as Alexa improves, for example, the cloud services also improve.

Amazon not only has text and natural-language offerings, but also machine-learning visualization and creation tools, vision recognition, and analysis.

IBM Cloud

Over the past few years the IBM Cloud has gotten a reputation for being hard to use. One big reason was that there were so many different options on so many different platforms that it was almost impossible to figure out where to start.

In the past couple of years, it has gotten much better. IBM merged its three big divisions (IBM BlueMix cloud services, SoftLayer data services, and the Watson AI group) into one group under the Watson brand. There are still over 170 services available, so it is still hard to get going, but there is much better control and consistency over the process.

Their machine-learning environment is called the Watson Studio and is used to build and train AI models in one integrated environment. They also provide huge searchable knowledge catalogs and have one of the better IOT (Internet of Things) management platforms available.

One of the cool things they have is a service called Watson Personality Insights that predicts personality characteristics, needs, and values from written text.

Microsoft Azure

Microsoft Azure has an emphasis on developers. They break down their AI offerings into three categories:

1. AI services
2. AI tools and frameworks
3. AI infrastructures

Similar to Amazon and Google, their AI applications are built on consumer products that Microsoft has produced. Azure also has support for specialized FPGAs (field-programmable gate arrays: hardware that can be reconfigured by programming) and has built out the infrastructure to support a wide variety of accelerators.

Microsoft is one of the largest, if not the largest, customers of the Intel Movidius chips. They have products for machine learning, IoT toolkits and management services, and a full and rich set of data services including databases, support for GPUs and custom-silicon AI infrastructure, and a container service that can turn your in-house applications into cloud apps.

AI and Cloud Computing Together Are Reshaping IT Infrastructure

It should come as no surprise that many small to midsize business owners take pride in overseeing every aspect of their business. Sometimes, however, this hampers productivity and growth, especially when servers, data, and software applications become too much to manage in-house. That is why combining artificial intelligence with cloud computing turns out to be a viable option.

Artificial intelligence, first contemplated and theorized in the 1950s, is about the ability of machines to perform intellectual tasks. Since then, the field has branched out into sub-disciplines such as machine learning and deep learning. Computing, meanwhile, has swung between centralized mainframe systems and personalized, power-to-the-people, do-it-yourself PCs, with cloud computing as the latest swing back toward centralization. Both technologies are bound up with the Internet's inexorable rise.

Competition has raised the bar, which means businesses need to consider an unprecedented number of measures to keep up. In the present scenario, AI-optimized application infrastructure is in vogue: more and more vendors are introducing IT platforms featuring pre-built combinations of storage, compute, and interconnect resources that can accelerate and automate AI workloads.

Because AI is still an unfamiliar discipline to many professionals, AI hardware and software stacks are complicated to tune and maintain. As a result, AI workloads such as data ingestion, preparation, modeling, training, and inferencing require optimization.

Lastly, AI-infused scaling, acceleration, automation, and management features have become a core competitive differentiator. Let’s delve into the details:


AI-ready computing platforms: 

As I said before, artificial-intelligence workloads are gaining momentum like never before, and operations are being readied to support them. Vendors are launching compute, storage, hyperconverged, and other platforms to match. At the hardware level, AI-ready storage/compute integration is becoming a core requirement for many enterprise customers.





Infrastructure optimization tools: 

With the incorporation of AI, there has been an inexorable rise in self-healing, self-managing, self-securing, self-repairing, and self-optimizing infrastructure. AI's growing role in the management of IT, data, applications, services, and other cloud infrastructure stems from its ability to automate and accelerate many tasks more scalably, predictably, rapidly, and efficiently than manual methods alone.

Now before you incorporate AI, it is essential to ensure that all your computing platforms are ready for AI workloads. The key elements include:


  • Hyperconverged infrastructure, which provides high-end support for flexible scaling of compute, storage, memory, and interconnect in the same environment
  • Multi-accelerator AI hardware architectures, often combining Intel CPUs with NVIDIA GPUs, FPGAs, and other optimized chipsets, that handle substantial AI workloads within broader application platforms
  • Memory-based architectures, from ultra-high memory bandwidth and storage-class memory to direct-attached Non-Volatile Memory Express (NVMe), that minimize latency and speed data transfers
  • Embedded AI that drives optimized, automated, and dynamic data storage and workload management across distributed fabrics within 24×7 DevOps environments


Conclusion

At present, there is a massive need for more intelligent IT infrastructure. With growing workloads, an increased pace of innovation, exponential data growth, and ever more users and agents in the system (IoT devices, machine agents), conventional IT methods can no longer cope with the rising demands. Just look at the AI-first cloud model, which offers:


  • Support for mainstream AI frameworks
  • GPU-optimized infrastructure
  • Management tools
  • AI-first infrastructure services
  • Integration with mainstream PaaS services


There's just one catch: you've got to start somewhere. Ideas and opportunities don't just materialize out of thin air. All in all, artificial intelligence brings a unique flair that can positively transform the next generation of cloud computing platforms.



Thursday, June 27, 2019

Using MatPlotLib to graph the loss and the accuracy for ML algorithms

In this post we are going to run our base code from the previous post again and do some analysis of the run using MatPlotLib. Assuming MatPlotLib is installed (if not, pip3 install matplotlib), we'll make the following changes to the code from the previous post:

1. Add the history variable to the output of the model.fit to collect data.
2. Add MatPlotLib commands to graph the loss and the accuracy from our epochs.
3. Add figure displays for our two individual image tests.

The following program incorporates the above-mentioned changes:

#import libraries
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
import tensorflow as tf
from tensorflow.python.framework import ops
from tensorflow.examples.tutorials.mnist import input_data
from PIL import Image
# Import Fashion MNIST
fashion_mnist = input_data.read_data_sets('input/data', one_hot=True)
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
train_images = train_images / 255.0
test_images = test_images / 255.0
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(28,28)))
model.add(tf.keras.layers.Dense(128, activation='relu' ))
model.add(tf.keras.layers.Dense(10, activation='softmax' ))
model.compile(optimizer=tf.train.AdamOptimizer(),
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=5)

# Get training and test loss histories
training_loss = history.history['loss']
accuracy = history.history['acc']
# Create count of the number of epochs
epoch_count = range(1, len(training_loss) + 1)
# Visualize loss history
plt.figure(0)
plt.plot(epoch_count, training_loss, 'r--')
plt.plot(epoch_count, accuracy, 'b--')
plt.legend(['Training Loss', 'Accuracy'])
plt.xlabel('Epoch')
plt.ylabel('History')
plt.show(block=False);
plt.pause(0.001)
test_loss, test_acc = model.evaluate(test_images, test_labels)
#run test image from Fashion_MNIST data
img = test_images[15]
plt.figure(1)
plt.imshow(img)
plt.show(block=False)
plt.pause(0.001)
img = (np.expand_dims(img,0))
singlePrediction = model.predict(img,steps=1)
print ("Prediction Output")
print(singlePrediction)
print()
NumberElement = singlePrediction.argmax()
Element = np.amax(singlePrediction)
print ("Our Network has concluded that the image number '15' is a "
+class_names[NumberElement])

print (str(int(Element*100)) + "% Confidence Level")
print('Test accuracy:', test_acc)
# read test dress image
imageName = "Dress28x28.JPG"
testImg = Image.open(imageName)
plt.figure(2)
plt.imshow(testImg)
plt.show(block=False)
plt.pause(0.001)
testImg.load()
data = np.asarray( testImg, dtype="float" )
data = tf.image.rgb_to_grayscale(data)
data = data/255.0
data = tf.transpose(data, perm=[2,0,1])
singlePrediction = model.predict(data,steps=1)
NumberElement = singlePrediction.argmax()
Element = np.amax(singlePrediction)
print(NumberElement)
print(Element)
print(singlePrediction)
print ("Our Network has concluded that the file '"+imageName+"' is a
"+class_names[NumberElement])
print (str(int(Element*100)) + "% Confidence Level")
plt.show()


When we run this program the following plots are obtained:

[Figure 0: training loss and accuracy per epoch | Figure 1: Fashion_MNIST test image 15 | Figure 2: the Dress28x28.JPG test image]


The window labeled Figure 0 shows the accuracy data for each of the five epochs of the machine learning training, and you can see the accuracy slowly increases with each epoch. The window labeled Figure 1 shows the test picture used for the first recognition test (it found a pair of trousers, which is correct), and finally, the window labeled Figure 2 shows the dress picture, which is still incorrectly identified as a bag and our output window shows this:

10000/10000 [==============================] - 0s 43us/sample - loss: 0.3492 - a
cc: 0.8756
Prediction Output
[[4.1143132e-05 9.9862289e-01 6.8796166e-06 1.1105506e-03 2.1581816e-04
  8.0114615e-10 2.3793352e-06 3.6482924e-11 2.4894479e-07 3.7706857e-10]]

Our Network has concluded that the image number '15' is a Trouser
99% Confidence Level
Test accuracy: 0.8756
8
0.9999794
[[2.7322454e-07 4.4263427e-08 2.3696880e-07 1.3007481e-08 5.6515717e-08
  3.5395464e-11 1.9993138e-05 1.4521572e-13 9.9997938e-01 5.2564192e-13]]
Our Network has concluded that the file 'Dress28x28.JPG' is a Bag
99% Confidence Level
------------------
(program exited with code: 0)

Press any key to continue . . .


Here I am ending this post. I hope you are beginning to understand the theory behind the models we have used; you should now have the ability to build and experiment with making machines learn. See you soon with a new topic. Till then, keep practicing and learning Python, as Python is easy to learn!


More on Machine Learning

We developed algorithms and programs that can learn things about data and about sensory input and apply that knowledge to new situations. However, our machines do not “understand” anything about what they have learned. They have just accumulated data about their inputs and have transformed that input to some kind of output.

Even if the machine does not “understand” what it has learned, that does not mean that we  cannot do impressive things using these machine-learning techniques that will be discussed in this post.

Ever thought about what it means for a machine to learn something?

Well, if a machine can take inputs and, by some process, transform those inputs into useful outputs, then we can say the machine has learned something. This is a very broad definition: by writing a simple program to add two numbers, you have taught that machine something. It has learned to add two numbers.

We’re going to focus in this post on machine learning in the sense of the use of algorithms and statistical models that progressively improve their performance on a specific task. Most of our goal setting (training the machine) will be done with known solutions to a problem: first training our machine and then applying the training to new, similar examples of the problem.

Although I've mentioned the types of machine-learning algorithms in my earlier post on machine learning, I'd like to repeat in this post that there are three main types of machine-learning algorithms:

1. Supervised learning: This type of algorithm builds a model of data that contains both inputs and outputs. The data is known as training data. This is the kind of machine learning we show in this post.

2. Unsupervised learning: For this type of algorithm, the data contains only the inputs, and the algorithms look for the structures and patterns in the data.

3. Reinforcement learning: This area is concerned with software taking actions based on some kind of cumulative reward. These algorithms do not assume knowledge of an exact mathematical model and are used when exact models are unavailable. This is the most complex area of machine learning, and the one that may be used mostly in the future.

Creating a Machine-Learning Network for Detecting Clothes Types

It's time to build a TensorFlow/Keras machine-learning application using the freely available Fashion-MNIST (Modified National Institute of Standards and Technology) database, which contains 60,000 fashion products from ten categories. The data comes in a 28x28 pixel format, with 6,000 items in each category. The categories are:

0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot

Our first task is getting the data, the Fashion-MNIST dataset. Downloading it takes a while the first time; after we run the program once, it uses the copy of the Fashion-MNIST data already on our computer.

The next step is to train our machine-learning neural network using all 60,000 images of clothes: 6,000 images in each of the ten categories. After training comes the testing of our network. Our trained network will be tested three different ways:

1) a set of 10,000 test photos from the Fashion_MNIST data set;
2) a selected image from the Fashion_MNIST data set; and
3) a photo of a woman’s dress.

The code for our network is shown below:

#import libraries
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg 
import seaborn as sns
import tensorflow as tf
from tensorflow.python.framework import ops
from tensorflow.examples.tutorials.mnist import input_data
from PIL import Image

# Import Fashion MNIST
fashion_mnist = input_data.read_data_sets('input/data',
        one_hot=True)

fashion_mnist = tf.keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) \
        = fashion_mnist.load_data()




class_names = ['T-shirt/top', 'Trouser',
        'Pullover', 'Dress', 'Coat',
        'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']


train_images = train_images / 255.0

test_images = test_images / 255.0


model = tf.keras.Sequential()

model.add(tf.keras.layers.Flatten(input_shape=(28,28)))
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dense(10, activation='softmax'))


model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])


model.fit(train_images, train_labels, epochs=5)

# test with 10,000 images
test_loss, test_acc = model.evaluate(test_images, test_labels)

print('10,000 image Test accuracy:', test_acc)

#run test image from Fashion_MNIST data

img = test_images[15]
img = (np.expand_dims(img,0))
singlePrediction = model.predict(img,steps=1)
print ("Prediction Output")
print(singlePrediction)
print()
NumberElement = singlePrediction.argmax()
Element = np.amax(singlePrediction)

print ("Our Network has concluded that the image number '15' is a "
        +class_names[NumberElement])
print (str(int(Element*100)) + "% Confidence Level")



As usual, we start by importing all the libraries needed to run our example two-layer model. Next we load our data as shown by the code below. (Note that the tf.keras.datasets.fashion_mnist loader is what actually supplies the data here; the preceding read_data_sets call is immediately overwritten.)

# Import Fashion MNIST
fashion_mnist = input_data.read_data_sets('input/data',
one_hot=True)
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) \
= fashion_mnist.load_data() 


We also give some descriptive names to the ten classes within the Fashion_MNIST data.

class_names = ['T-shirt/top', 'Trouser',
'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']


Then we change all the images to be scaled from 0.0–1.0 rather than 0–255.

train_images = train_images / 255.0
test_images = test_images / 255.0


The next step is to define our neural-network model and layers. It is very simple to add more neural layers and to change their sizes and their activation functions. Each Dense layer also applies a bias along with its activation function: relu for the hidden layer and softmax for the final output layer.

model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(28,28)))
model.add(tf.keras.layers.Dense(128, activation='relu' ))
model.add(tf.keras.layers.Dense(10, activation='softmax' ))


Then comes compiling our model. Here we used the loss function sparse_categorical_crossentropy. It is used when we have assigned a different integer for each clothes category as we have in this example. ADAM (a method for stochastic optimization) is a good default optimizer. It provides a method well suited for problems that are large in terms of data and/or parameters. Sparse categorical crossentropy is a loss function used to measure the error between categories across the data set. Categorical refers to the fact that the data has more than two categories (binary) in the data set. Sparse refers to using a single integer to refer to classes (0–9, in our example). Entropy (a measure of disorder) refers to the mix of data between the categories.

model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
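To see concretely what this loss measures, here is a tiny NumPy sketch (not part of the program, with made-up numbers) that computes the sparse categorical crossentropy for one prediction; Keras does this per sample and averages over the batch.

import numpy as np

# Softmax output of the network over the 10 clothing classes (made-up values)
probs = np.array([0.05, 0.80, 0.02, 0.03, 0.02,
                  0.02, 0.02, 0.02, 0.01, 0.01])
true_label = 1                        # the "sparse" label: just the integer class id (Trouser)
loss = -np.log(probs[true_label])     # cross-entropy for this single sample
print(loss)                           # about 0.22; a perfect prediction would give 0.0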


To fit and train our model, we chose the number of epochs as only 5 due to the time it takes to run the
model for our examples. Feel free to increase! Here we load the NumPy arrays
for the input to our network (the database train_images).

model.fit(train_images, train_labels, epochs=5)

For evaluation of the model, the model.evaluate function is used. It runs the trained network over the 10,000-image test set and returns test_loss and test_acc, which tell us how well the network handles data it has never seen:

# test with 10,000 images
test_loss, test_acc = model.evaluate(test_images,
test_labels)
print('10,000 image Test accuracy:', test_acc)


When we run the program we get the following output:

Epoch 1/5
60000/60000 [==============================] - 44s 726us/step - loss: 0.5009 -
acc: 0.8244
Epoch 2/5
60000/60000 [==============================] - 42s 703us/step - loss: 0.3751 -
acc: 0.8652
Epoch 3/5
60000/60000 [==============================] - 42s 703us/step - loss: 0.3359 -
acc: 0.8767
Epoch 4/5
60000/60000 [==============================] - 42s 701us/step - loss: 0.3124 -
acc: 0.8839
Epoch 5/5
60000/60000 [==============================] - 42s 703us/step - loss: 0.2960 -
acc: 0.8915
10000/10000 [==============================] - 4s 404us/step
10,000 image Test accuracy: 0.873


The test results show that with our two-layer neural machine-learning network, we are classifying 87 percent of the 10,000-image test database correctly. We upped the number of epochs to 50 and increased this to only 88.7 percent accuracy: lots of extra computation with little increase in accuracy.

Testing a single test image

The next task is to test a single image (shown in the figure below) from the Fashion_MNIST database.


This is implemented as shown below:

#run test image from Fashion_MNIST data
img = test_images[15]
img = (np.expand_dims(img,0))
singlePrediction = model.predict(img,steps=1)
print ("Prediction Output")
print(singlePrediction)
print()
NumberElement = singlePrediction.argmax()
Element = np.amax(singlePrediction)
print ("Our Network has concluded that the image number '15' is a "
+class_names[NumberElement])
print (str(int(Element*100)) + "% Confidence Level")

Here are the results from a five-epoch run:

Prediction Output
[[1.2835168e-05 9.9964070e-01 6.2637120e-08 3.4126092e-04 4.4297972e-06
7.8450663e-10 6.2759432e-07 9.8717527e-12 1.2729484e-08 1.1002166e-09]]


Our Network has concluded that the image number '15' is a Trouser
99% Confidence Level


The result shows it correctly identified the picture as a trouser. Remember, however, that we only had an overall accuracy level of 87 percent on the test data.

Testing on external pictures

To accomplish this test, we took a dress, hung it up on a wall (see the figure below), and took a picture of it with a phone.



Next we converted it to a resolution of 28 x 28 pixels, down from the 3024x3024 pixels that came straight off the phone (see the figure below).
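A minimal sketch of that conversion using Pillow is shown below; the original file name here is hypothetical, while Dress28x28.JPG is the file the program actually reads.

from PIL import Image

photo = Image.open("DressOriginal.JPG")        # hypothetical name for the 3024x3024 phone picture
small = photo.resize((28, 28), Image.LANCZOS)  # downsample to the 28x28 size Fashion_MNIST uses
small.save("Dress28x28.JPG")                   # the file used by the code below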



The following code is for arranging the data from our JPG picture to fit the format required by TensorFlow.

# run Our test Image
# read test dress image
imageName = "Dress28x28.JPG"

testImg = Image.open(imageName)
testImg.load()
data = np.asarray( testImg, dtype="float" )
data = tf.image.rgb_to_grayscale(data)
data = data/255.0
data = tf.transpose(data, perm=[2,0,1])
singlePrediction = model.predict(data,steps=1)
print ("Prediction Output")
print(singlePrediction)
print()
NumberElement = singlePrediction.argmax()
Element = np.amax(singlePrediction)
print ("Our Network has concluded that the file '"
+imageName+"' is a "+class_names[NumberElement])
print (str(int(Element*100)) + "% Confidence Level")


Let's run the program to see the results. We put the Dress28x28.JPG file in the same directory as our program and ran a five-epoch training run. Here are the results:

Prediction Output
[[1.2717753e-06 1.3373902e-08 1.0487850e-06 3.3525557e-11 8.8031484e-09
7.1847245e-10 1.1177938e-04 8.8322977e-12 9.9988592e-01 3.2957085e-12]]


Our Network has concluded that the file 'Dress28x28.JPG' is a Bag
99% Confidence Level


The result shows that our neural-network machine-learning program, after training on 60,000 pictures, 6,000 of them dresses, concluded at a 99 percent confidence level that the dress is a bag.

Let's increase the training epochs to 50 and rerun the program. Here are the results from that run:

Prediction Output
[[3.4407502e-33 0.0000000e+00 2.5598763e-33 0.0000000e+00 0.0000000e+00

0.0000000e+00 2.9322060e-17 0.0000000e+00 1.0000000e+00 1.5202169e-39]]

Our Network has concluded that the file 'Dress28x28.JPG' is a Bag
100% Confidence Level


The dress is still a bag, but now our program is 100 percent confident of it. This illustrates one of the problems with machine learning: being 100 percent certain that a picture is of a bag when it is really a dress is still 100 percent wrong. What is the real problem here?

Probably the neural-network configuration is just not good enough to distinguish the dress from a bag. We saw that additional training epochs didn't seem to help at all, so the next things to try are increasing the number of neurons in our hidden layer, using a CNN (convolutional neural network), and using data augmentation (increasing the training samples by rotating, shifting, and zooming the pictures).
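As a sketch of the data-augmentation idea, here is one way it might look with the Keras ImageDataGenerator that ships with this version of TensorFlow; it assumes train_images has been reshaped to (samples, 28, 28, 1) as in the CNN code further below.

import tensorflow as tf

# Randomly rotate, shift, and zoom the training images to create extra samples
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=10,       # rotate up to 10 degrees
    width_shift_range=0.1,   # shift horizontally up to 10% of the width
    height_shift_range=0.1,  # shift vertically up to 10% of the height
    zoom_range=0.1)          # zoom in or out up to 10%

# Training from the augmented stream (fit_generator in this TensorFlow version;
# newer versions accept the generator directly in model.fit):
# model.fit_generator(datagen.flow(train_images, train_labels, batch_size=32), epochs=5)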

We changed the model layers in our program to use the four-convolutional-layer model shown below. CNNs work by scanning images and analyzing them chunk by chunk, say with a 5x5 window that moves by a stride length of two pixels each time until it spans the entire image. It's like looking at an image using a microscope; we only see a small part of the picture at any one time, but eventually we see the whole picture.
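As a quick sanity check on how such a window shrinks the image (the code below actually uses 3x3 windows with a stride of one plus 2x2 max pooling), the output size of an unpadded convolution or pooling step is (input - window) / stride + 1 in each dimension:

def conv_output_size(input_size, window_size, stride=1):
    # spatial size after an unpadded convolution or pooling step
    return (input_size - window_size) // stride + 1

print(conv_output_size(28, 3))       # 26: after the first 3x3 convolution
print(conv_output_size(26, 3))       # 24: after the second 3x3 convolution
print(conv_output_size(24, 2, 2))    # 12: after 2x2 max pooling with stride 2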

The CNN model code has the same structure as the last program. The only significant change
is the addition of the new layers for the CNN network as shown below:

#import libraries
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
import tensorflow as tf

from tensorflow.python.framework import ops
from tensorflow.examples.tutorials.mnist import input_data
from PIL import Image
# Import Fashion MNIST
fashion_mnist = input_data.read_data_sets('input/data',
one_hot=True)
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) \
= fashion_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser',
'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
train_images = train_images / 255.0
test_images = test_images / 255.0
# Prepare the training images
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1)
# Prepare the test images
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1)
model = tf.keras.Sequential()
input_shape = (28, 28, 1)
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu',
input_shape=input_shape))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Dropout(0.25))
model.add(tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(0.25))

model.add(tf.keras.layers.Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Dropout(0.25))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(512, activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(10, activation='softmax'))
model.compile(optimizer=tf.train.AdamOptimizer(),
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5)
# test with 10,000 images
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('10,000 image Test accuracy:', test_acc)
#run test image from Fashion_MNIST data
img = test_images[15]
img = (np.expand_dims(img,0))
singlePrediction = model.predict(img,steps=1)
print ("Prediction Output")
print(singlePrediction)
print()
NumberElement = singlePrediction.argmax()
Element = np.amax(singlePrediction)
print ("Our Network has concluded that the image number '15' is a "
+class_names[NumberElement])
print (str(int(Element*100)) + "% Confidence Level")


When we run this program, the results are as follows:

10,000 image Test accuracy: 0.8601
Prediction Output
[[5.9128129e-06 9.9997270e-01 1.5681641e-06 8.1393973e-06 1.5611777e-06
7.0504888e-07 5.5174642e-06 2.2484977e-07 3.0045830e-06 5.6888598e-07]]


Our Network has concluded that the image number '15' is a Trouser


The key number here is the 10,000-image test accuracy. At 86 percent, it was actually lower than our previous, simpler machine-learning neural network (87 percent). Why did this happen?

This is probably a case of overfitting the training data. A CNN model such as this one has a very large number of trainable parameters, and that capacity can lead to overfitting, which means the trained network recognizes the training set better and better but loses the ability to generalize to new test data.
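One simple way to watch for overfitting is to pass the test set as validation data and compare the per-epoch training and validation accuracy. A sketch of that change to the fit call is shown below; with this version of TensorFlow the history keys are 'acc' and 'val_acc' (newer versions use 'accuracy' and 'val_accuracy').

# Track validation accuracy alongside training accuracy during training
history = model.fit(train_images, train_labels, epochs=5,
                    validation_data=(test_images, test_labels))

print(history.history['acc'])      # training accuracy per epoch
print(history.history['val_acc'])  # validation accuracy per epoch; a widening gap suggests overfitting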

Choosing the machine-learning neural network to work with your data is one of the major decisions you will make in your design. However, understanding activation functions, dropout management, and loss functions will also deeply affect the performance of your machine-learning program.

Optimizing all these parameters at once is a difficult task that requires research and experience. Here I am ending this post and in the next post we'll run our base code again and do some analysis of the run using MatPlotLib. Till we meet again keep practicing and learning Python as Python is easy to learn!

Wednesday, June 26, 2019

Building a Python Neural Network in TensorFlow

For our neural-network example in TensorFlow, we will use the same network that we used to implement an XOR gate with Python in our previous post. I've already covered TensorFlow, a Python-friendly application framework and collection of functions designed for AI uses, especially neural networks and machine learning. It uses Python to provide a convenient, user-friendly front end while executing those applications with high-performance C++ code.

Keras is an open-source neural-network library that enables fast experimentation with neural networks, deep learning, and machine learning. It can be installed using the popular pip command:

pip install keras

Keras provides an excellent and intuitive set of abstractions and functions, whereas TensorFlow provides the efficient underlying implementation. The five steps to implementing a neural network in Keras with TensorFlow are:

1. Load and format your data

This step is pretty trivial in our model but is often the most complex and difficult part of building the entire program. We have to look at our data (whether an XOR gate or a database of factors affecting diabetic heart patients) and figure out how to map our data and the results to get to the information and predictions that we want.

2. Define your neural network model and layers

Defining our network is one of the primary advantages of Keras over other frameworks. We construct a stack of the neural layers we want our data to flow through. Remember, TensorFlow is just that: our matrices (tensors) of data flowing through a neural-network stack. Here we choose the configuration of our neural layers and activation functions.


3. Compile the model

Next we compile our model, which hooks up our Keras layer model to the efficient underlying implementation (what Keras calls the back end) so it can run on our hardware. We also choose what we want to use for a loss function.

4. Fit and train your model


This is where the real work of training our network takes place. We determine how many epochs we want to go through. The fit call also accumulates a history of what happens across all the epochs, and we will use this history to create our graphs.

5. Evaluate the model

Here we run the model to predict the outputs from all the inputs.

Our program using TensorFlow, NumPy, and Keras for the two-layer neural network is shown below:

import tensorflow as tf

from tensorflow.keras import layers

from tensorflow.keras.layers import Activation, Dense

import numpy as np

# X = input of our 3 input XOR gate
# set up the inputs of the neural network (right from the table)
X = np.array(([0,0,0],[0,0,1],[0,1,0],
            [0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]), dtype=float)
# y = our output of our neural network
y  = np.array(([1], [0],  [0],  [0],  [0],
             [0],  [0],  [1]), dtype=float)


model = tf.keras.Sequential()

model.add(Dense(4, input_dim=3, activation='relu',
    use_bias=True))
model.add(Dense(1, activation='sigmoid', use_bias=True))

model.compile(loss='mean_squared_error',
        optimizer='adam',
        metrics=['binary_accuracy'])

print (model.get_weights())

history = model.fit(X, y, epochs=2000,
        validation_data = (X, y))


model.summary()


# printing out to file
loss_history = history.history["loss"]
numpy_loss_history = np.array(loss_history)
np.savetxt("loss_history.txt", numpy_loss_history,
        delimiter="\n")

binary_accuracy_history = history.history["binary_accuracy"]
numpy_binary_accuracy = np.array(binary_accuracy_history)
np.savetxt("binary_accuracy.txt", numpy_binary_accuracy, delimiter="\n")


print(np.mean(history.history["binary_accuracy"]))

result = model.predict(X ).round()

print (result)


As you may have noticed, this code is much simpler than the two-layer model we built in pure Python in our previous post. This is due to TensorFlow/Keras. Let's walk through our code:

First, we import all the libraries we will need to run our example two-layer model. Note that TensorFlow includes Keras by default. Here also NumPy is used as the preferred way of handling matrices:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers import Activation, Dense
import numpy as np


Next we implement step 1 of the 5 steps we discussed before: load and format your data. In this case, we just set up the truth table for our XOR gate in terms of NumPy arrays. This can get much more complex when we have large, diverse, cross-correlated sources of data.

# X = input of our 3 input XOR gate
# set up the inputs of the neural network (right from the table)
X = np.array(([0,0,0],[0,0,1],[0,1,0],
[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]), dtype=float)
# y = our output of our neural network
y = np.array(([1], [0], [0], [0], [0],
[0], [0], [1]), dtype=float)


Then we implement step 2 of the 5 steps we discussed before: define your neural-network model and layers. This is where the real power of Keras shines. It is very simple to add more neural layers, and to change their size and their activation functions. We are also applying a bias to our activation function (relu in this case, with our friend the sigmoid for the final output layer), which we did not do in our pure Python model.

model = tf.keras.Sequential()
model.add(Dense(4, input_dim=3, activation='relu',
use_bias=True))
#model.add(Dense(4, activation='relu', use_bias=True))
model.add(Dense(1, activation='sigmoid', use_bias=True)) 


Next we implement step 3 of the 5 steps we discussed before: compile your model. We are using the same loss function that we used in our pure Python implementation, mean_squared_error. New to us is the optimizer: ADAM (a method for stochastic optimization) is a good default that provides a method for efficiently descending the gradient applied to the weights of the layers.

One thing to note is what we are asking for in terms of metrics. binary_accuracy means we are comparing the outputs of our network to either a 1 or a 0. We will see values of, say, 0.75, which, since we have eight possible outputs, means that six out of eight are correct. It is exactly what we would expect from the name.

model.compile(loss='mean_squared_error',
optimizer='adam',
metrics=['binary_accuracy'])
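As a small illustration of the metric (plain NumPy, with made-up predictions):

import numpy as np

# The eight XOR truth-table targets and a hypothetical set of rounded network outputs
y_true = np.array([1, 0, 0, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1, 0, 1])   # two of the eight outputs are wrong
print(np.mean(y_true == y_pred))              # 0.75, i.e. six out of eight correct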


Here we print out all the starting weights of our model. Note that they are assigned with a default random method, which we can seed (to do the same run with the same starting weights time after time) or we can change the way they are added.

print (model.get_weights())

Now we implement step 4 of the 5 steps we discussed before: fit and train your model. We chose the number of epochs so we would converge to a binary accuracy of 1.0 most of the time. Here we load the NumPy arrays for the input to our network (X) and the expected output of the network (y). The validation_data parameter is used to compare the outputs of the trained network in each epoch; it generates the val_acc and val_loss values stored in the history variable for each epoch.

history = model.fit(X, y, epochs=2000,
validation_data = (X, y))


Here we print a summary of our model so we can make sure it was constructed in the way expected.

model.summary()

Next, we print out the values from the history variable that we would like to graph.

# printing out to file
loss_history = history.history["loss"]
numpy_loss_history = np.array(loss_history)
np.savetxt("loss_history.txt", numpy_loss_history,
delimiter="\n")
binary_accuracy_history = history.history["binary_accuracy"]
numpy_binary_accuracy = np.array(binary_accuracy_history)
np.savetxt("binary_accuracy.txt", numpy_binary_accuracy, delimiter="\n")


Finally we implement step 5 of the 5 steps we discussed before: evaluate the model. Here we run the model to predict the outputs from all the inputs of X, using the round function to make them either 0 or 1. Note that this replaces the criteria we used in our pure Python model, which was <0.1 = "0" and >0.9 = "1". We also calculate the average of all the binary_accuracy values across the epochs, but the number isn't very useful, except that the closer it is to 1.0, the faster the model succeeded.

print(np.mean(history.history["binary_accuracy"]))
result = model.predict(X ).round()
print (result)


When we run the program we get the following output:

Epoch 2000/2000
8/8 [==============================] - 0s 500us/sample - loss: 0.0515 - binary_a
ccuracy: 1.0000 - val_loss: 0.0514 - val_binary_accuracy: 1.0000
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 4)                 16
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 5
=================================================================
Total params: 21
Trainable params: 21
Non-trainable params: 0
_________________________________________________________________
0.7860625
[[1.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [1.]]
------------------
(program exited with code: 0)

Press any key to continue . . .


We see that by epoch 2,000 we had achieved the binary accuracy of 1.0, as hoped for, and the results of our model.predict function call at the end matches our truth table. The plot below shows the results of the loss function and binary accuracy values plotted against the epoch number as the training progressed.


We can see from the above plot that the loss function is a much smoother linear curve when it succeeds. This has to do with the activation choice (relu) and the optimizer function (ADAM). Another thing to remember is we will get a different curve (somewhat) each time because of the random number initial values in the weights. Seed the random number generator to make it the same each time we run it. This makes it easier to optimize your performance. Lastly, when the binary accuracy goes to 1.00 (about epoch 1556) our network is fully trained in this case.
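A minimal sketch of that seeding, assuming the TensorFlow 1.x API used in this post (TensorFlow 2.x renames the second call to tf.random.set_seed):

import numpy as np
import tensorflow as tf

np.random.seed(42)       # fix any NumPy-side randomness
tf.set_random_seed(42)   # fix TensorFlow's random weight initialization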

Now let's add another layer to our neural network and change it to a three-layer neural network. To do so add the following line:

model.add(Dense(4,  activation='relu', use_bias=True))

Now we have a three-layer neural network with four neurons per layer as shown below:

model.add(Dense(4, input_dim=3, activation='relu',
use_bias=True))
model.add(Dense(4, activation='relu', use_bias=True))
model.add(Dense(1, activation='sigmoid', use_bias=True))


Run the program and it'll give the following output:

Epoch 2000/2000
8/8 [==============================] - 0s 375us/sample - loss: 0.0146 - binary_a
ccuracy: 1.0000 - val_loss: 0.0145 - val_binary_accuracy: 1.0000
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 4)                 16
_________________________________________________________________
dense_1 (Dense)              (None, 4)                 20
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 5
=================================================================
Total params: 41
Trainable params: 41
Non-trainable params: 0
_________________________________________________________________
0.9185
[[1.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [1.]]
------------------
(program exited with code: 0)

Press any key to continue . . .


We can see that now we have three layers in our neural network. The following plot shows the results of the three layer training:


From the plot we can notice that it converges to a binary accuracy of 1.00 at about epoch 916, much faster than epoch 1556 from our two-layer run. The loss function is less linear than the two-layer run’s.

Here I am ending today's post. Till we meet again, as practice, run your own experiments to get a good feel for the way your results will vary with different parameters, layers, and neuron counts.