
Deep Learning

Introduction

Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semi-supervised or unsupervised.

There is a massive range of possible applications where Deep Learning can be deployed; these include:

  • Automatic speech recognition
  • Image recognition
  • Visual art processing
  • Natural language processing
  • Drug discovery and toxicology
  • Customer relationship management
  • Recommendation systems
  • Bioinformatics
  • Health diagnostics
  • Image restoration
  • Financial fraud detection


There are six types of artificial neural networks currently in common use:

  • Recurrent Neural Network (RNN) – Long Short-Term Memory
  • Convolutional Neural Network
  • Feedforward Neural Network – Artificial Neuron
  • Radial basis function Neural Network
  • Kohonen Self-Organizing Neural Network
  • Modular Neural Network

The first two are the most widely used.

Introduction to Machine Learning vs Deep Learning

Before starting, let us look at two closely related terms: Machine Learning and Deep Learning. The following diagram shows the key difference pictorially:

Mlvsdl.png

With Machine Learning the approach works like the top half of the picture above. You would have to design a feature extraction algorithm, which generally involved a lot of heavy mathematics (complex design), wasn't very efficient, and didn't perform well at all (the accuracy just wasn't suitable for real-world applications). After doing all of that, you would also have to design a whole classification model to classify your input given the extracted features.

With Deep Learning networks we can perform feature extraction and classification in one shot, which means we only have to design one model. This also means that we have a lot more layers (usually) and parameters with which to refine our model to an optimal point.

Machine Learning

  • + Good results
  • + Quick to train
  • - Need to try different features and classifiers to achieve best results
  • - Accuracy plateaus

Deep Learning

  • + Learns features and classifiers automatically
  • + Accuracy keeps improving as more data is added
  • - Requires very large data sets
  • - Computationally intensive / expensive

Introduction to Python and its libraries

Python is a general-purpose, high-level programming language that is widely used in data science and for producing deep learning algorithms. This brief tutorial introduces Python and libraries such as NumPy and TensorFlow.

Deep structured learning or hierarchical learning or deep learning in short is part of the family of machine learning methods which are themselves a subset of the broader field of Artificial Intelligence.

Machine learning deals with a wide range of concepts, listed below:

  • supervised
  • unsupervised
  • reinforcement learning
  • linear regression
  • cost functions
  • overfitting
  • under-fitting
  • hyper-parameter, etc.

In supervised learning, we learn to predict values from labelled data. One ML technique that helps here is classification, where the target values are discrete; for example, cats and dogs. Another technique that can help is regression, where the target values are continuous; for example, stock market data can be analysed using regression.
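
As a minimal illustration of regression (a sketch written for this page with invented data; it is not part of the original examples), we can fit a line to labelled data with NumPy and then predict a continuous value for an unseen input:

import numpy as np

# Labelled training data: inputs x and continuous targets y (roughly y = 2x + 1).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=50)

# Fit a straight line by least squares: find slope and intercept.
A = np.stack([x, np.ones_like(x)], axis=1)
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict a continuous value for an unseen input.
print("predicted value at x=4:", slope * 4 + intercept)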

Introduction to Neural Network

A typical neural network has anything from a few dozen to hundreds, thousands or even millions of artificial neurons called units arranged in a series of layers, each of which connects to the layers on either side. Some of them, known as input units, are designed to receive various forms of information from the outside world that the network will attempt to learn about, recognise, or otherwise process. Other units sit on the opposite side of the network and signal how it responds to the information it's learned; those are known as output units. In between the input units and output units are one or more layers of hidden units, which, together, form most of the artificial brain. Most neural networks are fully connected, which means each hidden unit and each output unit is connected to every unit in the layers on either side. The connections between one unit and another are represented by a number called a weight, which can be either positive (if one unit excites another) or negative (if one unit suppresses or inhibits another). The higher the weight, the more influence one unit has on another. (This corresponds to the way actual brain cells trigger one another across tiny gaps called synapses.)

Information flows through a neural network in two ways. When it's learning (being trained) or operating normally (after being trained), patterns of information are fed into the network via the input units, which trigger the layers of hidden units, and these, in turn, arrive at the output units. This common design is called a feedforward network. Not all units "fire" all the time. Each unit receives inputs from the units to its left, and the inputs are multiplied by the weights of the connections they travel along. Every unit adds up all the inputs it receives in this way and (in the simplest type of network) if the sum is more than a certain threshold value, the unit "fires" and triggers the units it's connected to (those on its right).

For a neural network to learn, there has to be an element of feedback involved—just as children learn by being told what they're doing right or wrong. In fact, we all use feedback, all the time. Think back to when you first learned to play a game like ten-pin bowling. As you picked up the heavy ball and rolled it down the alley, your brain watched how quickly the ball moved and the line it followed and noted how close you came to knocking down the skittles. Next time it was your turn, you remembered what you'd done wrong before, modified your movements accordingly, and hopefully threw the ball a bit better. So you used feedback to compare the outcome you wanted with what actually happened, figured out the difference between the two, and used that to change what you did next time ("I need to throw it harder," "I need to roll slightly more to the left," "I need to let go later," and so on). The bigger the difference between the intended and actual outcome, the more radically you would have altered your moves.

Below is a neural network example with inputs on the left, a hidden layer, and an output layer. In code, we will try to mimic this theory.

NeuralNetwork.jpg


  • input layer: brings the initial data into the system for further processing by subsequent layers of artificial neurons.
  • hidden layer: a layer in between input layers and output layers, where artificial neurons take in a set of weighted inputs and produce an output through an activation function.
  • output layer: the last layer of neurons that produces given outputs for the program.

An artificial neural network consists of artificial neurons or processing elements and is organised in three interconnected layers: input, hidden (which may include more than one layer), and output.

The input layer contains input neurons that send information to the hidden layer. The hidden layer sends data to the output layer. Every neuron has weighted inputs (synapses), an activation function (defines the output given an input), and one output. Synapses are the adjustable parameters that convert a neural network to a parameter system.

The weighted sum of the inputs produces the activation signal that is passed to the activation function to obtain one output from the neuron. The commonly used activation functions are linear, step, sigmoid, tanh, and rectified linear unit (ReLU) functions.

Activations.jpg
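
These activation functions can be sketched directly in NumPy (an illustrative snippet written for this page, not part of its original code):

import numpy as np

def linear(z):
    return z                             # identity: output equals the weighted sum

def step(z):
    return np.where(z > 0, 1.0, 0.0)     # fires only above a threshold (0 here)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # squashes the sum into (0, 1)

def tanh(z):
    return np.tanh(z)                    # squashes the sum into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)            # rectified linear unit

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (linear, step, sigmoid, tanh, relu):
    print(f.__name__, f(z))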


Neural networks learn things in exactly the same way, typically by a feedback process called back-propagation (sometimes abbreviated as "backprop"). This involves comparing the output a network produces with the output it was meant to produce and using the difference between them to modify the weights of the connections between the units in the network, working from the output units through the hidden units to the input units—going backwards, in other words. In time, back-propagation causes the network to learn, reducing the difference between actual and intended output to the point where the two exactly coincide, so the network figures things out exactly as it should.
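
As a toy illustration of this feedback process (a sketch with invented numbers, not the page's own code), here is back-propagation on a single sigmoid neuron: the difference between actual and intended output is pushed backwards through the chain rule to adjust the weight and bias:

import numpy as np

w, b = 0.5, 0.0          # weight and bias
x, target = 1.5, 1.0     # one training example and its intended output
lr = 0.1                 # learning rate

for _ in range(50):
    z = w * x + b                         # weighted sum of inputs
    out = 1.0 / (1.0 + np.exp(-z))        # sigmoid activation
    error = out - target                  # actual minus intended output
    # Chain rule: d(error^2/2)/dw = error * sigmoid'(z) * x
    grad_w = error * out * (1.0 - out) * x
    grad_b = error * out * (1.0 - out)
    w -= lr * grad_w                      # adjust weights, working backwards
    b -= lr * grad_b

print("output after training:", 1.0 / (1.0 + np.exp(-(w * x + b))))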

Gradient Descent

Gradient descent can be pictured as looking for the easiest way down a mountainside: you look for the gentlest route down. In reality it is a highly mathematical procedure, but for most programmers this is hidden.

Grad-descent2.jpg


From the programming point of view, gradient descent is an iterative method. We start with some set of values for our model parameters (weights and biases) and improve them slowly.

To improve a given set of weights, we try to get a sense of the value of the cost function (described below) for weights similar to the current weights (by calculating the gradient). Then we move in the direction which reduces the cost function.

Grad-descent.jpg


By repeating this step thousands of times, we’ll continually minimise our cost function.

Gradient descent and its variants are the main optimisation technique used to fit neural network weights to training data sets.

This includes the important distinction between batch and stochastic gradient descent, and approximations via mini-batch gradient descent; today all of these are often referred to simply as stochastic gradient descent.

  • Batch Gradient Descent. The gradient is estimated using all examples in the training data set.
  • Stochastic (Online) Gradient Descent. The gradient is estimated using a single example at a time from the training data set.
  • Mini-Batch Gradient Descent. The gradient is estimated using small subsets (mini-batches) of examples from the training data set.

The mini-batch variant is offered as a way to achieve the speed of convergence offered by stochastic gradient descent with the improved estimate of the error gradient offered by batch gradient descent.

  • Larger batch sizes slow down convergence.
  • Smaller batch sizes offer a regularising effect due to the introduction of statistical noise in the gradient estimate.
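
To make this concrete, here is a mini-batch gradient descent sketch in NumPy (written for this page with invented data and parameter values): it fits y = w*x + b by estimating the mean-squared-error gradient from a small random batch at each step.

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=1000)
y = 3.0 * x - 0.5 + rng.normal(0, 0.1, size=1000)   # true w = 3.0, b = -0.5

w, b = 0.0, 0.0
lr, batch_size = 0.1, 32

for step in range(500):
    idx = rng.integers(0, len(x), size=batch_size)   # sample a mini-batch
    xb, yb = x[idx], y[idx]
    err = (w * xb + b) - yb                          # batch prediction error
    grad_w = 2.0 * np.mean(err * xb)                 # MSE gradient w.r.t. w
    grad_b = 2.0 * np.mean(err)                      # MSE gradient w.r.t. b
    w -= lr * grad_w                                 # move against the gradient
    b -= lr * grad_b

print("w, b:", w, b)   # should approach 3.0 and -0.5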

Loss and accuracy

Deep learning neural networks are trained using the stochastic gradient descent optimisation algorithm.

As part of the optimisation algorithm, the error for the current state of the model must be estimated repeatedly. This requires the choice of an error function, conventionally called a loss function, that can be used to estimate the loss of the model so that the weights can be updated to reduce the loss on the next evaluation.

The maths function Mean Square Error (MSE) is the most commonly used regression loss function. MSE is the average of the squared distances between our target values and our predicted values: MSE = (1/n) * Σ(yᵢ − ŷᵢ)².
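
For instance (a tiny invented example):

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # target values
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # predicted values

mse = np.mean((y_true - y_pred) ** 2)
print(mse)   # (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375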

Neural network models learn a mapping from inputs to outputs from examples and the choice of loss function must match the framing of the specific predictive modelling problem, such as classification or regression. Further, the configuration of the output layer must also be appropriate for the chosen loss function.

As we keep applying training data to our neural network model, we need to measure how close we are to achieving our goal; with a suitable model we should see something like the following:

Acc-loss.jpg


Tensorflow

  • https://www.tensorflow.org/

Google's TensorFlow is a Python library; as of version 2, the Keras API is built in and eager execution is enabled by default.

This library is a great choice for building commercial-grade deep-learning applications.

TensorFlow grew out of an earlier library, DistBelief V2, which was part of the Google Brain Project. TensorFlow aims to extend the portability of machine learning so that research models can be applied to commercial-grade applications.

Much like the Theano library, TensorFlow is based on computational graphs, where a node represents persistent data or a math operation and edges represent the flow of data between nodes as a multidimensional array, or tensor; hence the name TensorFlow.

The output from an operation or a set of operations is fed as input into the next.
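
For example (a small sketch written for this page, with invented values), two chained operations in TensorFlow 2, where the tensor produced by one op flows into the next:

import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # 2x2 tensor (a node holding data)
b = tf.constant([[1.0], [0.5]])             # 2x1 tensor

c = tf.matmul(a, b)        # matrix multiply: the output of one operation...
d = tf.nn.relu(c - 2.0)    # ...is fed as input into the next
print(d.numpy())           # [[0.], [3.]]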

Even though TensorFlow was designed for neural networks, it works well for other nets where computation can be modelled as a data flow graph.

TensorFlow also uses several features from Theano, such as common sub-expression elimination, auto differentiation, and shared and symbolic variables.

Different types of deep nets can be built using TensorFlow, such as convolutional nets, autoencoders, RNTNs, RNNs, RBMs, DBMs/MLPs and so on.

MNIST Dataset Overview

This example uses the MNIST handwritten digits. The dataset contains 60,000 examples for training and 10,000 examples for testing. The digits have been size-normalized and centred in a fixed-size image (28x28 pixels) with values from 0 to 255.

In this example, each image will be converted to float32, normalized to [0, 1] and flattened to a 1-D array of 784 features (28*28).


from __future__ import absolute_import, division, print_function

import tensorflow as tf
from tensorflow.keras import Model, layers
import numpy as np

# MNIST dataset parameters.
num_classes = 10 # total classes (0-9 digits).
num_features = 784 # data features (img shape: 28*28).

# Training parameters.
learning_rate = 0.1
training_steps = 2000
batch_size = 256
display_step = 100

# Network parameters.
n_hidden_1 = 128 # 1st layer number of neurons.
n_hidden_2 = 256 # 2nd layer number of neurons.

# Prepare MNIST data.
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Convert to float32.
x_train, x_test = np.array(x_train, np.float32), np.array(x_test, np.float32)
# Flatten images to 1-D vector of 784 features (28*28).
x_train, x_test = x_train.reshape([-1, num_features]), x_test.reshape([-1, num_features])
# Normalize images value from [0, 255] to [0, 1].
x_train, x_test = x_train / 255., x_test / 255.

# Use tf.data API to shuffle and batch data.
train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_data = train_data.repeat().shuffle(5000).batch(batch_size).prefetch(1)

# Create TF model.
class NeuralNet(Model):
    # Set layers.
    def __init__(self):
        super(NeuralNet, self).__init__()
        # First fully-connected hidden layer.
        self.fc1 = layers.Dense(n_hidden_1, activation=tf.nn.relu)
        # Second fully-connected hidden layer.
        self.fc2 = layers.Dense(n_hidden_2, activation=tf.nn.relu)
        # Output layer, one logit per class.
        self.out = layers.Dense(num_classes)

    # Set forward pass.
    def call(self, x, is_training=False):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.out(x)
        if not is_training:
            # tf cross entropy expect logits without softmax, so only
            # apply softmax when not training.
            x = tf.nn.softmax(x)
        return x

# Build neural network model.
neural_net = NeuralNet()

# Cross-Entropy Loss.
# Note that this will apply 'softmax' to the logits.
def cross_entropy_loss(x, y):
    # Convert labels to int 64 for tf cross-entropy function.
    y = tf.cast(y, tf.int64)
    # Apply softmax to logits and compute cross-entropy.
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=x)
    # Average loss across the batch.
    return tf.reduce_mean(loss)

# Accuracy metric.
def accuracy(y_pred, y_true):
    # Predicted class is the index of highest score in prediction vector (i.e. argmax).
    correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.cast(y_true, tf.int64))
    return tf.reduce_mean(tf.cast(correct_prediction, tf.float32), axis=-1)

# Stochastic gradient descent optimizer.
optimizer = tf.optimizers.SGD(learning_rate)
# Optimization process. 
def run_optimization(x, y):
    # Wrap computation inside a GradientTape for automatic differentiation.
    with tf.GradientTape() as g:
        # Forward pass.
        pred = neural_net(x, is_training=True)
        # Compute loss.
        loss = cross_entropy_loss(pred, y)
        
    # Variables to update, i.e. trainable variables.
    trainable_variables = neural_net.trainable_variables

    # Compute gradients.
    gradients = g.gradient(loss, trainable_variables)
    
    # Update W and b following gradients.
    optimizer.apply_gradients(zip(gradients, trainable_variables))

# Run training for the given number of steps.
for step, (batch_x, batch_y) in enumerate(train_data.take(training_steps), 1):
    # Run the optimization to update W and b values.
    run_optimization(batch_x, batch_y)
    
    if step % display_step == 0:
        pred = neural_net(batch_x, is_training=True)
        loss = cross_entropy_loss(pred, batch_y)
        acc = accuracy(pred, batch_y)
        print("step: %i, loss: %f, accuracy: %f" % (step, loss, acc))

Program output:

step: 100, loss: 2.031049, accuracy: 0.535156
step: 200, loss: 1.821917, accuracy: 0.722656
step: 300, loss: 1.764789, accuracy: 0.753906
step: 400, loss: 1.677593, accuracy: 0.859375
step: 500, loss: 1.643402, accuracy: 0.867188
step: 600, loss: 1.645116, accuracy: 0.859375
step: 700, loss: 1.618012, accuracy: 0.878906
step: 800, loss: 1.618097, accuracy: 0.878906
step: 900, loss: 1.616565, accuracy: 0.875000
step: 1000, loss: 1.599962, accuracy: 0.894531
step: 1100, loss: 1.593849, accuracy: 0.910156
step: 1200, loss: 1.594491, accuracy: 0.886719
step: 1300, loss: 1.622147, accuracy: 0.859375
step: 1400, loss: 1.547483, accuracy: 0.937500
step: 1500, loss: 1.581775, accuracy: 0.898438
step: 1600, loss: 1.555893, accuracy: 0.929688
step: 1700, loss: 1.578076, accuracy: 0.898438
step: 1800, loss: 1.584776, accuracy: 0.882812
step: 1900, loss: 1.563029, accuracy: 0.921875
step: 2000, loss: 1.569637, accuracy: 0.902344
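
The loop above only reports metrics on training batches. Since x_test and y_test were prepared earlier, a final held-out evaluation can be added with the same helpers (a short sketch, not part of the original listing):

# Evaluate the trained model on the held-out test set.
pred = neural_net(x_test, is_training=False)
print("test accuracy: %f" % accuracy(pred, y_test))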


Software repositories

  • https://github.com/Hvass-Labs/TensorFlow-Tutorials
  • https://github.com/aymericdamien/TensorFlow-Examples
  • https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/tutorials