Monday, April 1, 2019

Machine Learning for Image Classification - TensorFlow Tutorial

This month's meetup was a tutorial on using machine learning to do image classification with TensorFlow.
We also had a short talk looking deeper at the last session's sentiment analysis.


Barney's image classification slides are at: (pdf). David's notebook on sentiment analysis is at: (link).

A video of the talks is at: (youtube).


A Deeper Look at Sentiment Analysis

At the previous session we explored simple approaches to sentiment analysis, in particular the lexical approach of summing scores associated with words known to be positive or negative.



David dug deeper into that approach and found that documents given a positive or negative overall score actually contain many words scored both positively and negatively. The following histogram illustrates this.
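The lexical approach can be sketched in a few lines of Python. The tiny lexicon below is invented for illustration, not the one David used, but it shows how a document with a clearly negative overall score can still contain positively scored words:

```python
# Illustrative lexical sentiment scoring: sum the scores of words known
# to be positive or negative. This toy lexicon is made up for the example.
LEXICON = {"good": 1, "great": 2, "excellent": 3,
           "bad": -1, "awful": -2, "terrible": -3}

def sentiment_scores(text):
    """Return the per-word scores found in the text, in order."""
    return [LEXICON[w] for w in text.lower().split() if w in LEXICON]

def overall_score(text):
    """Sum the word scores into a single document-level score."""
    return sum(sentiment_scores(text))

review = "the plot was good but the acting was awful and the ending terrible"
print(sentiment_scores(review))  # both positive and negative words present
print(overall_score(review))     # yet the overall score is a single number
```

The point David made is visible even here: the single summed score hides the mix of positive and negative words inside the document.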


David's explorations remind us that it is important to:
  • understand your data and not apply analysis blindly
  • understand the limits and weaknesses of an algorithm
  • remember that a statistical answer isn't complete without a measure of "confidence"

You can find more of his code and results in his bitbucket.


Image Classification - Automating Manual Processes

Barney started with a very compelling scenario: a business manually sorting paper - invoices, cash claims, letters. The work itself is boring, very slow, prone to fatigue and error, and not a good use of people's time.

A natural question occurred to him - could that manual process of classifying documents be automated?


Barney looked to modern neural network based machine learning methods which have proven very successful at image classification.

This illustration shows a neural network learning to classify images from a data set as one of three particular characters.


Neural networks learn by adjusting link weights between nodes that make up layers of nodes. Given a training example, the error in its prediction is used to update link weights by a small amount to try to improve that prediction. Over many training examples, a neural network can get better and better at classifying a given image.

Although it is tempting to build a neural network from scratch, in many industrial applications it makes sense to use architectures that have been proven suitable for a given task. The neural network architecture for image classification will likely be different to one for natural language prediction.

Barney discussed Google's Inception network, a large network optimised for image classification.


There are some excellent articles online that explain the history and rationale of Google's Inception networks:



The deep (and wide) Inception network is trained on 1.2 million images, and training it from scratch is only practical with the large compute power available to organisations like Google.

Barney explained that we don't need to train such large complex networks from scratch - we can take advantage of the training that Google has done and simply extend that training to our own data. This is called transfer learning.


In essence the start (left part) of the network has learned to pick out features that help it with the task of image classification. We retain this learning and only train the small end of the network to focus on learning our own subset of images, making use of the same features learned from the huge training data.
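In Keras code, transfer learning looks roughly like the following sketch. Note the assumptions: MobileNetV2 stands in here for the heavier Inception network, and the three output classes are an invented example (e.g. three document types), not Barney's actual setup:

```python
import tensorflow as tf

# Transfer learning sketch: reuse a network pre-trained on ImageNet and
# train only a small new classification head on our own classes.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3),
    include_top=False,        # drop the original 1000-class classifier
    weights="imagenet")       # keep the learned feature extractor
base.trainable = False        # freeze the pre-trained "start" of the network

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),  # e.g. 3 document types
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(our_images, our_labels, epochs=5)   # trains only the new head
```

Because the base is frozen, only the small final layer's weights are updated, so training needs far less data and compute than training the whole network.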

You can read more about transfer learning here:



Barney's results were very promising and have generated significant excitement about automating the manual business process. He continues to develop and refine his solution.

Barney touched on an important aspect of automation - the impact on people and employment. His view is that businesses should focus on shifting people away from boring, low-skilled tasks towards more challenging and creative work - work that the very same people previously employed can do.


TensorFlow Walk-Through

Barney walked us through a Python notebook which demonstrates the training of a simple network, using the popular TensorFlow machine learning framework, to classify images of fashion items from the Fashion-MNIST dataset.

The online Colab notebook, which you can run and which is very well commented, is here:




The central elements of the process are:

  • import Tensorflow and a higher-level API Keras which makes describing and using neural networks easier
  • import the MNIST fashion data set of 60,000 training images and 10,000 test images
  • convert the grey-scale image data from 0-255 to 0-1
  • construct a 3-layer network, the input layer of which is 28x28 to match the image size:

  • the network model is "compiled" with a loss-function and a method for adjusting the weights, commonly called gradient descent of which there are many options:
  • the neural network is then trained for 5 passes (epochs) over the 60,000 image training data set:
  • after training, we check how well the network has been trained, by testing it on the 10,000 image test set:
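The steps above can be sketched in code. This is a minimal version of the standard Keras Fashion-MNIST example, not Barney's exact notebook - the hidden layer size and choice of optimiser are assumptions:

```python
import tensorflow as tf

# Load Fashion-MNIST: 60,000 training and 10,000 test images (28x28 grey-scale).
(train_x, train_y), (test_x, test_y) = tf.keras.datasets.fashion_mnist.load_data()

# Scale pixel values from the range 0-255 down to 0-1.
train_x, test_x = train_x / 255.0, test_x / 255.0

# Three-layer network: flattened 28x28 input, one hidden layer, 10 class outputs.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# "Compile" with a loss function and a gradient-descent-style optimiser.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train for 5 epochs over the 60,000 training images.
model.fit(train_x, train_y, epochs=5)

# After training, check the network against the 10,000 unseen test images.
loss, accuracy = model.evaluate(test_x, test_y)
print(accuracy)   # typically somewhere around 0.87-0.88
```

Keeping the test images separate from the training images is what makes the final accuracy figure an honest measure of how well the network generalises.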

That score of 0.8778 means the neural network correctly classified roughly 88% of the 10,000 test images - an excellent initial result!

Barney also explored an important aspect of image classification. He first showed an example of an ankle boot and the result of a network prediction. It is clear that the network has a very high belief that the boot image is indeed of an ankle boot.


He then showed us a more interesting example. Here is the network confidently but incorrectly classifying a sneaker as a sandal. 


A different run shows the network correctly classifying the sneaker but the outputs of the network are high for both sandal and sneaker. The confidence isn't so clear cut.


Looking at this confidence is a useful enrichment to understanding the otherwise simple output of a network.
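One simple way to surface that confidence is to compare the top two output probabilities rather than just taking the highest. The output vectors below are invented for illustration, mimicking the ankle boot and sneaker/sandal cases above:

```python
import numpy as np

# Fashion-MNIST class names, in the dataset's label order.
CLASSES = ["t-shirt", "trouser", "pullover", "dress", "coat",
           "sandal", "shirt", "sneaker", "bag", "ankle boot"]

def describe(probs):
    """Return the top two classes and the gap between their probabilities."""
    order = np.argsort(probs)[::-1]       # class indices, highest first
    top, second = order[0], order[1]
    return CLASSES[top], CLASSES[second], probs[top] - probs[second]

# A decisive prediction: almost all probability on one class.
confident = np.array([0, 0, 0, 0, 0, 0.02, 0, 0.01, 0, 0.97])
print(describe(confident))    # ankle boot beats sandal by a wide margin

# A borderline prediction: sneaker and sandal both score highly.
borderline = np.array([0, 0, 0, 0, 0, 0.41, 0, 0.48, 0.11, 0])
print(describe(borderline))   # sneaker wins, but only just
```

A small gap between the top two outputs is exactly the "confidence isn't so clear cut" situation Barney showed - a signal worth checking before trusting the prediction.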



Overall, Barney's walk-through demonstrated the key stages of machine learning and highlighted some key issues, such as keeping training and test data distinct, and understanding the confidence of a prediction.


Local Apps, AI in the Cloud

Barney then explained a useful architectural approach of having a lighter local app, perhaps a web app running on a smartphone, backed by a machine learning model hosted in the cloud which benefits from larger compute resources.


As a fun example, Barney gave a live demo of a smartphone web app that took a photo of a scone and used a cloud-hosted pre-trained model to determine whether it was a Cornish or Devon scone!



Conclusion

Barney succeeded in conveying the key steps applicable to most machine learning exercises, whilst also showing how easy modern tools and technology make this process.

Both Barney and David also highlighted that although the tools and algorithms appear impressive and confident, it is important to look beneath the simple outputs to understand the confidence of those answers. David did this with sentiment analysis and Barney illustrated this with image classification.


Quite a few members said they were inspired to try the tools themselves.