Machine Learning – self-education

Lately I have become very interested in Artificial Intelligence and Machine Learning. In my spare time I have been following courses on Machine Learning and Deep Learning. At this link you can find the certificates I earned from online courses. To practice what I have learned and to improve my knowledge, I started some image classification projects using the MNIST dataset. In particular, I chose to train a Convolutional Neural Network (CNN) to classify handwritten digits. More details below.


Personal Project: Convolutional Neural Network (CNN) for handwritten digit classification

Author: Claudio Paliotta

Period: 2018

Objective: Build and train a CNN to classify handwritten digits, first handling only digits written in black on a white background, then handling both black digits on a white background and white digits on a black background.

Description: The tools used for this project are:

  1. Python – a high-level programming language that is well suited to neural network applications. In particular, I used TensorFlow, an open-source library for numerical computation that is particularly useful for Machine Learning applications;
  2. MNIST dataset – contains 60,000 images for training and 10,000 images for testing. The images are grayscale, 28×28 pixels. The MNIST dataset is a subset of the larger NIST dataset.

The source code for the project can be found in my GitHub repository.

This project was divided into two phases. Phase 1 concerned the definition of a CNN that could classify the MNIST images with good accuracy. Phase 2 concerned a slightly harder problem: also handling images with inverted colors, that is, digits written in white on a black background.

Phase 1

The MNIST images are 28×28 pixels with a white background. A typical MNIST image is shown below.

Typical MNIST image

The problem I dealt with in Phase 1 is the definition of a CNN that is able to classify handwritten digits of the type shown above.

To address this problem, I defined a CNN made of two convolutional layers and two fully connected layers. The structure of the CNN is summarized here:

  1. first convolutional layer: characterized by 16 filters of size 7×7 pixels;
  2. second convolutional layer: characterized by 36 filters of size 7×7 pixels;
  3. first fully connected layer: 64 fully connected neurons activated by the Leaky ReLU activation function;
  4. second fully connected layer: 32 fully connected neurons activated by the Leaky ReLU activation function;
  5. dropout layer: the dropout probability is set to 0.4.
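To get a feel for the size of this network, the number of trainable parameters can be estimated with a few lines of Python. This is only a rough sketch: the filter counts, filter sizes and neuron counts come from the list above, while the 2×2 pooling after each convolutional layer, the "same" padding, the one-bias-per-unit convention and the final 10-way output layer are my assumptions for illustration and may differ from the actual code.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter for a square kernel (bias is assumed)."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_inputs, n_outputs):
    """Weights plus one bias per neuron for a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


# Values taken from the list above.
conv1 = conv_params(kernel=7, in_channels=1, out_channels=16)
conv2 = conv_params(kernel=7, in_channels=16, out_channels=36)

# ASSUMPTIONS (not stated in the text): "same" padding and a 2x2 pooling
# step after each convolutional layer, so 28 -> 14 -> 7 pixels per side,
# followed by a final output layer with one neuron per digit class.
POOL = 2
N_CLASSES = 10
side = 28 // POOL // POOL
flattened = side * side * 36

fc1 = dense_params(flattened, 64)
fc2 = dense_params(64, 32)
output = dense_params(32, N_CLASSES)

total = conv1 + conv2 + fc1 + fc2 + output
print(conv1, conv2, fc1, fc2, output, total)
```

Under these assumptions most of the parameters sit in the first fully connected layer, not in the convolutional layers.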

The following image gives a visual overview of the aforementioned structure. (figure to be added)

The optimization of the CNN uses the following settings:

  1. Batch size = 64 images
  2. Learning rate = 0.0001
  3. Number of iterations = 5000
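As a quick sanity check on these settings, the sketch below works out how many images the network sees during training and shows one possible way to draw a mini-batch. The three settings come from the list above. The assumption that all 60,000 training images are used, the helper name sample_batch and the random sampling strategy are mine, for illustration only.

```python
import numpy as np

# Settings from the list above.
BATCH_SIZE = 64
LEARNING_RATE = 1e-4
NUM_ITERATIONS = 5000

# ASSUMPTION: all 60,000 MNIST training images are available for training.
TRAIN_SIZE = 60_000

images_seen = BATCH_SIZE * NUM_ITERATIONS   # images processed in total
epochs = images_seen / TRAIN_SIZE           # passes over the training set


def sample_batch(images, labels, batch_size, rng):
    """Hypothetical helper: draw a random mini-batch without repetition."""
    idx = rng.choice(len(images), size=batch_size, replace=False)
    return images[idx], labels[idx]


# Dummy data with the MNIST image shape, only to exercise the helper.
rng = np.random.default_rng(0)
dummy_images = np.zeros((1000, 28, 28), dtype=np.float32)
dummy_labels = np.zeros(1000, dtype=np.int64)
batch_images, batch_labels = sample_batch(dummy_images, dummy_labels, BATCH_SIZE, rng)

print(images_seen, round(epochs, 2), batch_images.shape)
```

With these numbers the training corresponds to a little more than five passes over the training set.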

As a result, we obtain an accuracy of 96.9%.
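For reference, accuracy here means the fraction of images whose predicted digit matches the true label. A minimal NumPy sketch of this computation follows. The function name accuracy and the toy scores are hypothetical, and the project itself may compute the metric differently, for example with TensorFlow operations.

```python
import numpy as np


def accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    predictions = np.argmax(scores, axis=1)
    return float(np.mean(predictions == labels))


# Toy example with 4 samples and 3 classes (made-up numbers).
scores = np.array([
    [0.1, 2.0, 0.3],   # predicted class 1
    [1.5, 0.2, 0.1],   # predicted class 0
    [0.0, 0.1, 3.0],   # predicted class 2
    [0.9, 0.8, 0.1],   # predicted class 0
])
labels = np.array([1, 0, 2, 1])

acc = accuracy(scores, labels)
print(acc)
```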

Finally, I checked whether the algorithm is able to predict the digit in an image that is not part of the training set. In particular, I tested the following image. (figure to be added)

In this case the algorithm performs well. However, when the colors of the image are inverted, the performance is poor. In fact, if I feed in the following image (figure to be added)

I obtain a wrong prediction.

Phase 2

Phase 2 begins where Phase 1 finished. As we have seen, the algorithm of Phase 1 is not able to classify digits on a black background. This is due to the dataset used for training: all the MNIST images are characterized by a white background, so the parameters of the CNN are never trained to recognize images with different features. In other words, we do not teach the CNN what to do when the background is not white.

To overcome this, we can also train the CNN on images with white handwritten digits on a black background, which covers the case that the algorithm of Phase 1 cannot handle. To do so, we invert the colors of half of the MNIST images in the training set and then use these images to train the CNN. Note that we also invert the colors of half of the test images, so that the evaluation of the CNN's performance is fair.

The CNN used is the same as for Phase 1. The result is more than positive: we obtain an accuracy of 97.1%.
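The inversion of half of the images described above could be sketched as follows. This is a minimal NumPy illustration, not the code from the repository: it assumes pixel values scaled to the range [0, 1], so that inverting an image means computing 1 - pixel, and it picks the half to invert at random. The helper name invert_half is hypothetical.

```python
import numpy as np


def invert_half(images, rng):
    """Return a copy of `images` with a random half color-inverted.

    ASSUMPTION: pixel values lie in [0, 1], so inversion is 1 - pixel.
    Also returns the indices of the inverted images.
    """
    mixed = images.copy()
    n = len(images)
    idx = rng.choice(n, size=n // 2, replace=False)
    mixed[idx] = 1.0 - mixed[idx]
    return mixed, idx


# Random stand-in data with the MNIST image shape (not real MNIST digits).
rng = np.random.default_rng(42)
batch = rng.random((10, 28, 28)).astype(np.float32)

mixed, inverted_idx = invert_half(batch, rng)
print(len(inverted_idx), mixed.shape)
```

The same function can be applied to the training set and to the test set, so that both contain the two kinds of background in equal proportion.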

The following image is fed into the CNN, and the prediction is correct, as it was in Phase 1.

(figure to be added)

Now we also feed in the same image as above, but with inverted colors. This time the CNN is able to predict the digit in the image without error.

(figure to be added)

This page is still under construction. For questions, email claudiopaliotta@ieee.org