Neural Networks are Function Approximation Algorithms
Given a dataset of inputs and outputs, we assume that there is an unknown underlying function that consistently maps inputs to outputs in the target domain and that produced the dataset. We then use supervised learning algorithms to approximate this function.
Neural networks are an example of a supervised machine learning algorithm that is perhaps
best understood in the context of function approximation. This can be demonstrated with
examples of neural networks approximating simple one-dimensional functions that aid in
developing the intuition for what is being learned by the model.
In this tutorial, you will discover the intuition behind neural networks as function
approximation algorithms.
Kick-start your project with my new book Deep Learning With Python, including step-by-
step tutorials and the Python source code files for all examples.
Tutorial Overview
In supervised learning, a dataset is comprised of inputs and outputs, and the supervised
learning algorithm learns how to best map examples of inputs to examples of outputs.
We can think of this mapping as being governed by a mathematical function, called the
mapping function, and it is this function that a supervised learning algorithm seeks to best
approximate.
The true function that maps inputs to outputs is unknown and is often referred to as the
target function. It is the target of the learning process, the function we are trying to
approximate using only the data that is available. If we knew the target function, we would
not need to approximate it, i.e. we would not need a supervised machine learning algorithm.
Therefore, function approximation is only a useful tool when the underlying target mapping
function is unknown.
All we have are observations from the domain that contain examples of inputs and outputs.
This implies things about the size and quality of the data; for example:
● The more examples we have, the more we might be able to figure out about the
mapping function.
● The less noise we have in the observations, the crisper the approximation we can make of the mapping function.
Neural networks are a popular choice for approximating an unknown mapping function because they are universal approximators: in theory, they can be used to approximate any function.
… the universal approximation theorem states that a feedforward network with a linear
output layer and at least one hidden layer with any “squashing” activation function (such as
the logistic sigmoid activation function) can approximate any […] function from one finite-
dimensional space to another with any desired non-zero amount of error, provided that the
network is given enough hidden units
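To make the theorem's statement a little more concrete, the sketch below (an illustrative aside that is not part of this tutorial's Keras code; the seed, weight scales, and number of hidden units are arbitrary choices) builds a network of exactly that form in plain NumPy: one hidden layer of logistic sigmoid ("squashing") units followed by a linear output layer. Only the output layer is fit, by least squares, yet the model already approximates y = x^2 closely on the interval [-1, 1].
# illustrative sketch of the functional form in the theorem (not part of the tutorial code)
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 200).reshape(-1, 1)
y = x ** 2

# one hidden layer of logistic sigmoid ("squashing") units with fixed random weights
n_hidden = 50
w = rng.normal(scale=5.0, size=(1, n_hidden))
b = rng.normal(scale=5.0, size=(1, n_hidden))
hidden = 1.0 / (1.0 + np.exp(-(x @ w + b)))

# linear output layer: fit its weights (plus a bias column) by least squares
features = np.hstack([hidden, np.ones((len(x), 1))])
v, *_ = np.linalg.lstsq(features, y, rcond=None)
yhat = features @ v
print('max abs error:', np.abs(y - yhat).max())
The theorem guarantees that such a network exists and that the error can be made arbitrarily small by adding hidden units; it says nothing about how hard that network is to find by training, which is what the rest of this tutorial explores.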
In the next section, let’s define a simple function that we can later approximate.
We can define a simple function with one numerical input variable and one numerical output
variable and use this as the basis for understanding neural networks for function
approximation.
We can define a domain of numbers as our input, such as floating-point values from -50 to
50.
We can then select a mathematical operation to apply to the inputs to get the output values.
The selected mathematical operation will be the mapping function, and because we are
choosing it, we will know what it is. In practice, this is not the case and is the reason why we
would use a supervised learning algorithm like a neural network to learn or discover the
mapping function.
In this case, we will use the square of the input as the mapping function, defined as:
● y = x^2
We can develop an intuition for this mapping function by enumerating the values in the
range of our input variable and calculating the output value for each input and plotting the
result.
from matplotlib import pyplot
# define the input domain as integers from -50 to 50
x = [i for i in range(-50, 51)]
# calculate the output for each input: y = x^2
y = [i**2.0 for i in x]
# plot the inputs versus the outputs
pyplot.scatter(x, y)
pyplot.show()
Running the example first creates a list of integer values across the entire input domain.
The output values are then calculated using the mapping function, then a plot is created
with the input values on the x-axis and the output values on the y-axis.
Scatter Plot of Input and Output Values for the Chosen Mapping Function
Next, we can then pretend to forget that we know what the mapping function is and use a
neural network to re-learn or re-discover the mapping function.
This is a very simple mapping function, so we would expect a small neural network could
learn it quickly.
We will define the network using the Keras deep learning library and use some data
preparation tools from the scikit-learn library.
...
Next, we can reshape the data so that the input and output variables are columns with one
observation per row, as is expected when using supervised learning models.
...
# reshape the arrays into one column with many rows
x = x.reshape((len(x), 1))
y = y.reshape((len(y), 1))
The inputs will have a range between -50 and 50, whereas the outputs will range from 0 (the square of 0) up to 2,500 (the square of -50 or 50). Large input and output values can make training neural networks unstable; therefore, it is a good idea to scale the data first.
We can use the MinMaxScaler to separately normalize the input values and the output
values to values in the range between 0 and 1.
...
# separately scale the input and output variables to the range [0, 1]
scale_x = MinMaxScaler()
x = scale_x.fit_transform(x)
scale_y = MinMaxScaler()
y = scale_y.fit_transform(y)
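As a quick aside (illustrative only, not part of the tutorial's pipeline), MinMaxScaler rescales each column linearly so that the column minimum maps to 0 and the column maximum maps to 1, i.e. scaled = (value - min) / (max - min):
from numpy import asarray
from sklearn.preprocessing import MinMaxScaler

# illustrative check: -50 maps to 0.0, 0 maps to 0.5, 50 maps to 1.0
demo = asarray([[-50.0], [0.0], [50.0]])
print(MinMaxScaler().fit_transform(demo))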
With some trial and error, I chose a model with two hidden layers and 10 nodes in each
layer. Perhaps experiment with other configurations to see if you can do better.
...
# design the neural network model with two hidden layers of 10 nodes each
model = Sequential()
model.add(Dense(10, input_dim=1, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(10, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1))
We will fit the model using a mean squared loss and use the efficient adam version of
stochastic gradient descent to optimize the model.
This means the model will seek to minimize the mean squared error between the
predictions made and the expected output values (y) while it tries to approximate the
mapping function.
...
# define the loss function and optimization algorithm
model.compile(loss='mse', optimizer='adam')
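For intuition, the mean squared error is simply the average of the squared differences between the expected and predicted values. A plain-NumPy version (illustrative only; Keras computes this internally when loss='mse') looks like this:
import numpy as np

def mse(y_true, y_pred):
    # average of the squared differences between expected and predicted values
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

print(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # 0.4166...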
We don’t have a lot of data (e.g. about 100 rows), so we will fit the model for 500 epochs
and use a small batch size of 10.
Again, these values were found after a little trial and error; try different values and see if you
can do better.
...
# fit the model on the training dataset
model.fit(x, y, epochs=500, batch_size=10, verbose=0)
We will make a prediction for each example in the dataset and calculate the error. A perfect
approximation would be 0.0. This is not possible in general because of noise in the
observations, incomplete data, and complexity of the unknown underlying mapping function.
In this case, it is possible because we have all observations, there is no noise in the data,
and the underlying function is not complex.
...
# make predictions for the input data
yhat = model.predict(x)
...
# inverse transforms
x_plot = scale_x.inverse_transform(x)
y_plot = scale_y.inverse_transform(y)
yhat_plot = scale_y.inverse_transform(yhat)
We can then calculate and report the prediction error in the original units of the target
variable.
...
print('MSE: %.3f' % mean_squared_error(y_plot, yhat_plot))
Finally, we can create a scatter plot of the real mapping of inputs to outputs and compare it
to the mapping of inputs to the predicted outputs and see what the approximation of the
mapping function looks like spatially.
This is helpful for developing the intuition behind what neural networks are learning.
...
# plot x vs y (actual)
pyplot.scatter(x_plot, y_plot, label='Actual')
# plot x vs yhat (predicted)
pyplot.scatter(x_plot, yhat_plot, label='Predicted')
pyplot.legend()
pyplot.show()
Tying this together, the complete example of fitting a neural network to approximate the mapping function is listed below.
# complete example of fitting a neural net to approximate the mapping function y = x^2
from numpy import asarray
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from matplotlib import pyplot
# define the dataset
x = asarray([i for i in range(-50, 51)])
y = asarray([i**2.0 for i in x])
print(x.min(), x.max(), y.min(), y.max())
# reshape arrays into rows and columns
x = x.reshape((len(x), 1))
y = y.reshape((len(y), 1))
# separately scale input and output variables to the range [0, 1]
scale_x = MinMaxScaler()
x = scale_x.fit_transform(x)
scale_y = MinMaxScaler()
y = scale_y.fit_transform(y)
print(x.min(), x.max(), y.min(), y.max())
# design the neural network model
model = Sequential()
model.add(Dense(10, input_dim=1, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(10, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1))
# define the loss function and optimization algorithm
model.compile(loss='mse', optimizer='adam')
# fit the model on the training dataset
model.fit(x, y, epochs=500, batch_size=10, verbose=0)
# make predictions for the input data
yhat = model.predict(x)
# inverse transforms
x_plot = scale_x.inverse_transform(x)
y_plot = scale_y.inverse_transform(y)
yhat_plot = scale_y.inverse_transform(yhat)
# report model error in the original units
print('MSE: %.3f' % mean_squared_error(y_plot, yhat_plot))
# plot x vs y
pyplot.scatter(x_plot, y_plot, label='Actual')
# plot x vs yhat
pyplot.scatter(x_plot, yhat_plot, label='Predicted')
pyplot.legend()
pyplot.show()
Running the example first reports the range of values for the input and output variables,
then the range of the same variables after scaling. This confirms that the scaling operation
was performed as we expected.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation
procedure, or differences in numerical precision. Consider running the example a few times
and compare the average outcome.
In this case, we can see that the mean squared error is about 1,300, in squared units. If we
calculate the square root, this gives us the root mean squared error (RMSE) in the original
units. We can see that the average error is about 36 units, which is fine, but not great.
MSE: 1300.776
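To see where the figure of about 36 comes from, take the square root of the reported MSE (substitute the value from your own run, which will differ):
from math import sqrt

mse = 1300.776  # the MSE reported above; your run will produce a different value
print('RMSE: %.3f' % sqrt(mse))  # prints about 36.066 units in the original output scale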
A scatter plot is then created comparing the inputs versus the real outputs, and the inputs
versus the predicted outputs.
The difference between these two data series is the error in the approximation of the mapping function. The approximation is reasonable in that it captures the general shape of the function, but there are clear errors, especially around input values close to 0.
This suggests that there is plenty of room for improvement, such as using a different
activation function or different network architecture to better approximate the mapping
function.
Scatter Plot of Input vs. Actual and Predicted Values for the Neural Net Approximation
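As one example of the kind of change you might experiment with (an assumed variation, not the configuration used in this tutorial, and with no guarantee of a better fit), you could try wider hidden layers or a different activation function:
# a possible variation to try: wider hidden layers with a tanh activation (assumed, untested here)
model = Sequential()
model.add(Dense(20, input_dim=1, activation='tanh'))
model.add(Dense(20, activation='tanh'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
The rest of the complete example (data preparation, fitting, evaluation, and plotting) would stay exactly the same.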
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Tutorials
● Your First Deep Learning Project in Python with Keras Step-By-Step
Summary
In this tutorial, you discovered the intuition behind neural networks as function
approximation algorithms.
Ask your questions in the comments below and I will do my best to answer.