MNIST Dataset
In traditional programming, the programmer is able to articulate rules and conditions in their code that
their program can then use to act in the correct way. This approach continues to work exceptionally well
for a huge variety of problems.
Image classification, which asks a program to correctly classify an image it has never seen before into
its correct class, is nearly impossible to solve with traditional programming techniques. How could a
programmer possibly define the rules and conditions needed to correctly classify a huge variety of
images, especially images that they have never seen?
Deep learning excels at pattern recognition by trial and error. By training a deep neural network with
sufficient data, and giving the network feedback on its performance during training, the network can
identify, through a huge amount of iteration, its own set of conditions by which it can act in the correct
way.
In the history of deep learning, the accurate image classification of the MNIST dataset
(http://yann.lecun.com/exdb/mnist/), a collection of 70,000 grayscale images of handwritten digits from 0
to 9, was a major development. While today the problem is considered trivial, doing image classification
with MNIST has become a kind of "Hello World" for deep learning.
When working with images for deep learning, we need both the images themselves, usually denoted as
X , and the correct labels (https://developers.google.com/machine-learning/glossary#label) for these
images, usually denoted as Y . Furthermore, we need X and Y values both for training the model,
and then a separate set of X and Y values for validating the performance of the model after it has
been trained. Therefore, we need 4 segments of data for the MNIST dataset:
1. x_train: images used for training the neural network
2. y_train: correct labels for the x_train images, used to evaluate the model's predictions during training
3. x_valid: images set aside for validating the performance of the model after it has been trained
4. y_valid: correct labels for the x_valid images, used to evaluate the model's predictions after training
The process of preparing data for analysis is called Data Engineering (https://medium.com/@rchang/a-
beginners-guide-to-data-engineering-part-i-4227c5c457d7). To learn more about the differences
between training data and validation data (as well as test data), check out this article
(https://machinelearningmastery.com/difference-test-validation-datasets/) by Jason Brownlee.
Tensors are mathematical objects from linear algebra that are used to represent multidimensional
data. They support the same arithmetic operations that are already familiar from vectors and matrices,
for example.
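As a quick illustration (not part of this notebook's own code), a minimal sketch of tensor arithmetic with TensorFlow might look like this:

import tensorflow as tf

# Two rank-2 tensors (matrices) of the same shape
a = tf.constant([[1., 2.], [3., 4.]])
b = tf.constant([[5., 6.], [7., 8.]])

print(a + b)            # element-wise addition, just like matrices
print(a * 2.0)          # scalar multiplication
print(tf.matmul(a, b))  # matrix multiplication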
Among the many helpful features that Keras provides are modules containing helper methods for
many common datasets (https://www.tensorflow.org/api_docs/python/tf/keras/datasets), including
MNIST.
With the mnist module, we can easily load the MNIST data, already partitioned into images and labels
for both training and validation:
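A minimal sketch of this loading step (naming the validation split x_valid and y_valid, which is an assumption about the variable names used in the rest of this notebook) might be:

from tensorflow.keras.datasets import mnist

# Load MNIST, already split into training and validation (test) sets
(x_train, y_train), (x_valid, y_valid) = mnist.load_data()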
We stated above that the MNIST dataset contained 70,000 grayscale images of handwritten digits. By
executing the following cells, we can see that Keras has partitioned 60,000 of these images for training,
and 10,000 for validation (after training), and also, that each image itself is a 2D array with the
dimensions 28x28:
In [5]: x_train.shape
Out[5]: (60000, 28, 28)
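Assuming the validation images were loaded as x_valid (as in the sketch above), we can check their shape the same way:

x_valid.shape   # expected to be (10000, 28, 28)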
Furthermore, we can see that these 28x28 images are represented as collections of unsigned 8-bit
integer values between 0 and 255, each value corresponding to a pixel's grayscale intensity, where 0 is
black, 255 is white, and all other values fall in between:
In [7]: x_train.dtype
Out[7]: dtype('uint8')
In [8]: x_train.min()
Out[8]: 0
In [9]: x_train.max()
Out[9]: 255
In [10]: x_train[0]
Out[10]: array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3,
18, 18, 18, 126, 136, 175, 26, 166, 255, 247, 127, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 30, 36, 94, 154, 170,
253, 253, 253, 253, 253, 225, 172, 253, 242, 195, 64, 0, 0,
... (output truncated; remaining rows of the 28x28 array not shown)
Using Matplotlib (https://matplotlib.org/), we can render one of these grayscale images in our dataset:
In [11]: import matplotlib.pyplot as plt
         image = x_train[4]
         plt.imshow(image, cmap='gray')
In this way we can now see that this is a 28x28 pixel image of a 9. Or is it a 4? The answer is in the
y_train data, which contains correct labels for the data. Let's take a look:
In [20]: y_train[4]
Out[20]: 9
In deep learning, it is common that data needs to be transformed to be in the ideal state for training. For
this particular image classification problem, there are 3 tasks we should perform with the data in
preparation for training:
1. Flatten the image data, to simplify the image input into the model
2. Normalize the image data, to make the image input values easier to work with for the model
3. Categorize the labels, to make the label values easier to work with for the model
Though it's possible for a deep learning model to accept a 2-dimensional image (in our case 28x28
pixels), we're going to simplify things to start and reshape
(https://www.tensorflow.org/api_docs/python/tf/reshape) each image into a single array of 784
continuous pixels (note: 28x28 = 784). This is also called flattening the image.
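A minimal sketch of this reshaping step, assuming the x_train and x_valid arrays from the loading sketch earlier, might be:

# Flatten each 28x28 image into a single row of 784 pixel values
x_train = x_train.reshape(60000, 784)
x_valid = x_valid.reshape(10000, 784)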
We can confirm that the image data has been reshaped and is now a collection of 1D arrays containing
784 pixel values each:
In [22]: x_train.shape
Out[22]: (60000, 784)
In [23]: x_train[0]
Out[23]: array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 18, 18, 18,
126, 136, 175, 26, 166, 255, 247, 127, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 30, 36, 94, 154, 170, 253,
253, 253, 253, 253, 225, 172, 253, 242, 195, 64, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 49, 238, 253, 253, 253,
253, 253, 253, 253, 253, 251, 93, 82, 82, 56, 39, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 219, 253,
253, 253, 253, 253, 198, 182, 247, 241, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
... (output truncated; remaining values of the 784-element array not shown)
Normalizing the Image Data
Deep learning models are better at dealing with floating point numbers between 0 and 1. Converting
the integer pixel values into floating point values between 0 and 1 is called normalization
(https://developers.google.com/machine-learning/glossary#normalization), and the simple approach we
will take here to normalize the data is to divide all of the pixel values (which, if you recall, are between
0 and 255) by 255:
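A minimal sketch of this normalization step, assuming the flattened x_train and x_valid arrays from above, might be:

# Dividing the uint8 values by 255 yields float64 values between 0.0 and 1.0
x_train = x_train / 255
x_valid = x_valid / 255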
We can now see that the values are all floating point values between 0.0 and 1.0 :
In [25]: x_train.dtype
Out[25]: dtype('float64')
In [26]: x_train.min()
Out[26]: 0.0
In [27]: x_train.max()
Out[27]: 1.0
Categorical Encoding
Consider for a moment: if we were to ask what 7 - 2 is, stating that the answer is 4 is closer than
stating that the answer is 9. However, for this image classification problem, we don't want the neural
network to learn this kind of reasoning: we just want it to select the correct category, and to understand
that if we have an image of the number 5, guessing 4 is just as bad as guessing 9.
As it stands, the labels for the images are integers between 0 and 9. Because these values represent a
numerical range, the model might try to draw some conclusions about its performance based on how
close to the correct numerical category it guesses.
Therefore, we will do something to our data called categorical encoding. This kind of transformation
modifies the data so that each value becomes a collection spanning all possible categories, with the
actual category for that particular value set to true.
As a simple example, consider if we had 3 categories: red, blue, and green. For a given color, 2 of these
categories would be false, and the other would be true:
Actual Color    Is Red?    Is Blue?    Is Green?
Red             True       False       False
Green           False      False       True
Blue            False      True        False
Green           False      False       True
Rather than use "True" or "False", we could represent the same using binary, either 0 or 1:
Actual Color    Is Red?    Is Blue?    Is Green?
Red             1          0           0
Green           0          0           1
Blue            0          1           0
Green           0          0           1
This is what categorical encoding is: transforming values which are intended to be understood as
categorical labels into a representation that makes their categorical nature explicit to the model. Thus, if
we were using these values for training, we would convert...
values = ['red', 'green', 'blue', 'green']
... which a neural network would have a very difficult time making sense of, instead to:
values = [
[1, 0, 0],
[0, 0, 1],
[0, 1, 0],
[0, 0, 1]
]
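Keras provides a utility for this kind of encoding. A minimal sketch of the step that encodes the labels, assuming 10 categories for the digits 0 through 9 and the y_train and y_valid names from earlier (the variable name num_categories is illustrative), might be:

import tensorflow.keras as keras

num_categories = 10

# One-hot encode the integer labels into vectors of length 10
y_train = keras.utils.to_categorical(y_train, num_categories)
y_valid = keras.utils.to_categorical(y_valid, num_categories)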
Here are the first 10 values of the training labels, which you can see have now been categorically
encoded:
In [30]: y_train[0:10]
Out[30]: array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]], dtype=float32)
Creating the Model
With the data prepared for training, it is now time to create the model that we will train with the data.
This first basic model will be made up of several layers and will be composed of 3 main parts:
1. An input layer, which will receive the data in some expected format
2. Several hidden layers, each made up of many neurons whose weights will be adjusted during training and contribute to the network's guesses
3. An output layer, which will depict the network's guess for a given image
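A minimal sketch of creating the model with Keras's Sequential class, which lets us add layers to it one after another, might be:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Start with an empty sequential model; layers will be added in order below
model = Sequential()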
Next, we will add the input layer. This layer will be densely connected, meaning that each neuron in it,
and its weights, will affect every neuron in the next layer. To do this with Keras, we use Keras's Dense
(https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) layer class.
The units argument specifies the number of neurons in the layer. We are going to use 512 (chosen
from experimentation). Choosing the correct number of neurons is what puts the "science" in "data
science" as it is a matter of capturing the statistical complexity of the dataset. Try playing around with
this value later to see how it affects training and to start developing a sense for what this number
means.
We will learn more about activation functions later, but for now, we will use the relu activation
function, which in short, will help our network to learn how to make more sophisticated guesses about
data than if it were required to make guesses based on some strictly linear function.
The input_shape value specifies the shape of the incoming data which in our situation is a 1D array
of 784 values:
In [34]: model.add(Dense(units=512, activation='relu', input_shape=(784,)))
Now we will add an additional densely connected layer. These layers give the network more parameters
to contribute towards its guesses, and therefore, more subtle opportunities for accurate learning:
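A minimal sketch of this hidden layer, again using 512 neurons and assuming the model and Dense import from above, might be:

# A second densely connected layer; no input_shape is needed because
# Keras infers it from the previous layer
model.add(Dense(units=512, activation='relu'))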
Finally, we will add an output layer. This layer uses the softmax activation function, which will result in
each of the layer's values being a probability between 0 and 1, with all of the layer's outputs adding up
to 1. In this case, since the network is making a guess about a single image belonging to 1 of 10
possible categories, there will be 10 outputs. Each output gives the model's guess (a probability)
that the image belongs to that specific class:
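A minimal sketch of the output layer, with one unit per digit class, might be:

# 10 outputs, one probability per digit class (0 through 9)
model.add(Dense(units=10, activation='softmax'))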
In [37]: model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               (None, 512)               401920
 dense_1 (Dense)             (None, 512)               262656
 dense_2 (Dense)             (None, 10)                5130
=================================================================
Total params: 669706 (2.55 MB)
Trainable params: 669706 (2.55 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Note the number of trainable parameters. Each of these can be adjusted during training and will
contribute towards the trained model's guesses.
The final step we need to do before we can actually train our model with data is to compile
(https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#compile) it. Here we specify a loss
function (https://developers.google.com/machine-learning/glossary#loss), which will be used for the
model to understand how well it is performing during training. We also specify that we would like to track
accuracy while the model trains:
In [39]: model.compile(loss='categorical_crossentropy', metrics=['accuracy'])
Now that we have prepared training and validation data, and a model, it's time to train our model with
our training data, and verify it with its validation data.
"Training a model with data" is often also called "fitting a model to data." Put this latter way, it highlights
that the shape of the model changes over time to more accurately understand the data that it is being
given.
When fitting (training) a model with Keras, we use the model's fit
(https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) method. It expects the following
arguments:
1. The training data
2. The labels for the training data
3. The number of epochs, that is, how many times the model should train on the entire training dataset
4. The validation data and labels, used to evaluate the model after each epoch
Run the cell below to train the model. We will discuss its output after the training completes:
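A minimal sketch of this training cell, assuming the prepared variables from the earlier sketches and 5 epochs to match the output below, might be:

history = model.fit(
    x_train, y_train,                    # training images and their one-hot labels
    epochs=5,                            # train over the full training set 5 times
    verbose=1,                           # print per-epoch progress, including accuracy
    validation_data=(x_valid, y_valid)   # held-out data used to report val_accuracy
)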
Epoch 1/5
For each of the 5 epochs, notice the accuracy and val_accuracy scores. accuracy states how
well the model did for the epoch on all the training data. val_accuracy states how well the model did
on the validation data, which if you recall, was not used at all for training the model.
The model did quite well! The accuracy quickly reached close to 100%, as did the validation accuracy.
We now have a model that can be used to accurately classify images of handwritten digits.
The next step would be to use this model to classify new not-yet-seen handwritten images. This is
called inference (https://blogs.nvidia.com/blog/2016/08/22/difference-deep-learning-training-inference-
ai/).
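As a rough sketch only (this notebook does not perform inference itself), prediction with a trained Keras model might look like the following, assuming a flattened, normalized image such as one from the x_valid array used above:

import numpy as np

# Predict class probabilities for a single image (note the batch dimension)
probabilities = model.predict(x_valid[0:1])

# The predicted digit is the class with the highest probability
predicted_digit = np.argmax(probabilities)
print(predicted_digit)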
Clear the Memory
Before moving on, please execute the following cell to clear up the GPU memory. This is required to
move on to the next notebook.
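A minimal sketch of how this is commonly done, by shutting down the IPython kernel (an assumption about what the cell in question contains), might be:

import IPython

# Shutting down the kernel releases the GPU memory held by this notebook;
# passing True requests a restart so the kernel can be used again afterwards
app = IPython.Application.instance()
app.kernel.do_shutdown(True)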