Neural Networks are Function Approximation Algorithms
Given a dataset of inputs and outputs, we assume that there is an unknown underlying function that consistently maps inputs to outputs in the target domain and that produced the dataset. We then use supervised learning algorithms to approximate this function.
Neural networks are an example of a supervised machine learning algorithm that is perhaps
best understood in the context of function approximation. This can be demonstrated with
examples of neural networks approximating simple one-dimensional functions that aid in
developing the intuition for what is being learned by the model.
In this tutorial, you will discover the intuition behind neural networks as function
approximation algorithms.
Kick-start your project with my new book Deep Learning With Python, including step-by-
step tutorials and the Python source code files for all examples.
Tutorial Overview
In supervised learning, a dataset is comprised of inputs and outputs, and the supervised
learning algorithm learns how to best map examples of inputs to examples of outputs.
We can think of this mapping as being governed by a mathematical function, called the
mapping function, and it is this function that a supervised learning algorithm seeks to best
approximate.
The true function that maps inputs to outputs is unknown and is often referred to as the
target function. It is the target of the learning process, the function we are trying to
approximate using only the data that is available. If we knew the target function, we would
not need to approximate it, i.e. we would not need a supervised machine learning algorithm.
Therefore, function approximation is only a useful tool when the underlying target mapping
function is unknown.
All we have are observations from the domain that contain examples of inputs and outputs.
This implies things about the size and quality of the data; for example:
● The more examples we have, the more we might be able to figure out about the
mapping function.
● The less noise we have in the observations, the crisper the approximation we can make of the mapping function.
Neural networks are a popular choice for approximating an unknown mapping function because they are universal approximators: in theory, they can be used to approximate any function.
… the universal approximation theorem states that a feedforward network with a linear
output layer and at least one hidden layer with any “squashing” activation function (such as
the logistic sigmoid activation function) can approximate any […] function from one finite-
dimensional space to another with any desired non-zero amount of error, provided that the
network is given enough hidden units
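To make the theorem's statement a little more concrete, the sketch below (an illustrative aside that is not part of this tutorial's Keras code; the seed, weight scales, and number of hidden units are arbitrary choices) builds a network of exactly that form in plain NumPy: one hidden layer of logistic sigmoid ("squashing") units followed by a linear output layer. Only the output layer is fit, by least squares, yet the model already approximates y = x^2 closely on the interval [-1, 1].
# illustrative sketch of the functional form in the theorem (not part of the tutorial code)
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 200).reshape(-1, 1)
y = x ** 2

# one hidden layer of logistic sigmoid ("squashing") units with fixed random weights
n_hidden = 50
w = rng.normal(scale=5.0, size=(1, n_hidden))
b = rng.normal(scale=5.0, size=(1, n_hidden))
hidden = 1.0 / (1.0 + np.exp(-(x @ w + b)))

# linear output layer: fit its weights (plus a bias column) by least squares
features = np.hstack([hidden, np.ones((len(x), 1))])
v, *_ = np.linalg.lstsq(features, y, rcond=None)
yhat = features @ v
print('max abs error:', np.abs(y - yhat).max())
The theorem guarantees that such a network exists and that the error can be made arbitrarily small by adding hidden units; it says nothing about how hard that network is to find by training, which is what the rest of this tutorial explores.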
In the next section, let’s define a simple function that we can later approximate.
We can define a simple function with one numerical input variable and one numerical output
variable and use this as the basis for understanding neural networks for function
approximation.
We can define a domain of numbers as our input, such as floating-point values from -50 to
50.
We can then select a mathematical operation to apply to the inputs to get the output values.
The selected mathematical operation will be the mapping function, and because we are
choosing it, we will know what it is. In practice, this is not the case and is the reason why we
would use a supervised learning algorithm like a neural network to learn or discover the
mapping function.
In this case, we will use the square of the input as the mapping function, defined as:
● y = x^2
We can develop an intuition for this mapping function by enumerating the values in the
range of our input variable and calculating the output value for each input and plotting the
result.
from matplotlib import pyplot
# define the input domain as integers from -50 to 50
x = [i for i in range(-50, 51)]
# calculate the output for each input: y = x^2
y = [i**2.0 for i in x]
# plot the inputs versus the outputs
pyplot.scatter(x, y)
pyplot.show()
Running the example first creates a list of integer values across the entire input domain.
The output values are then calculated using the mapping function, then a plot is created
with the input values on the x-axis and the output values on the y-axis.
Scatter Plot of Input and Output Values for the Chosen Mapping Function
Next, we can then pretend to forget that we know what the mapping function is and use a
neural network to re-learn or re-discover the mapping function.
This is a very simple mapping function, so we would expect a small neural network could
learn it quickly.
We will define the network using the Keras deep learning library and use some data
preparation tools from the scikit-learn library.
...
Next, we can reshape the data so that the input and output variables are columns with one
observation per row, as is expected when using supervised learning models.
...
# reshape the arrays into one column with many rows
x = x.reshape((len(x), 1))
y = y.reshape((len(y), 1))
The inputs will have a range between -50 and 50, whereas the outputs will range from 0 (the square of 0) up to 2,500 (the square of -50 or 50). Large input and output values can make training neural networks unstable; therefore, it is a good idea to scale the data first.
We can use the MinMaxScaler to separately normalize the input values and the output
values to values in the range between 0 and 1.
...
# separately scale the input and output variables to the range [0, 1]
scale_x = MinMaxScaler()
x = scale_x.fit_transform(x)
scale_y = MinMaxScaler()
y = scale_y.fit_transform(y)
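As a quick aside (illustrative only, not part of the tutorial's pipeline), MinMaxScaler rescales each column linearly so that the column minimum maps to 0 and the column maximum maps to 1, i.e. scaled = (value - min) / (max - min):
from numpy import asarray
from sklearn.preprocessing import MinMaxScaler

# illustrative check: -50 maps to 0.0, 0 maps to 0.5, 50 maps to 1.0
demo = asarray([[-50.0], [0.0], [50.0]])
print(MinMaxScaler().fit_transform(demo))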
With some trial and error, I chose a model with two hidden layers and 10 nodes in each
layer. Perhaps experiment with other configurations to see if you can do better.
...
# design the neural network model with two hidden layers of 10 nodes each
model = Sequential()
model.add(Dense(10, input_dim=1, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(10, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1))
We will fit the model using a mean squared loss and use the efficient adam version of
stochastic gradient descent to optimize the model.
This means the model will seek to minimize the mean squared error between the
predictions made and the expected output values (y) while it tries to approximate the
mapping function.
...
# define the loss function and optimization algorithm
model.compile(loss='mse', optimizer='adam')
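For intuition, the mean squared error is simply the average of the squared differences between the expected and predicted values. A plain-NumPy version (illustrative only; Keras computes this internally when loss='mse') looks like this:
import numpy as np

def mse(y_true, y_pred):
    # average of the squared differences between expected and predicted values
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

print(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # 0.4166...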
We don’t have a lot of data (e.g. about 100 rows), so we will fit the model for 500 epochs
and use a small batch size of 10.
Again, these values were found after a little trial and error; try different values and see if you
can do better.
...
# fit the model on the training dataset
model.fit(x, y, epochs=500, batch_size=10, verbose=0)
We will make a prediction for each example in the dataset and calculate the error. A perfect
approximation would be 0.0. This is not possible in general because of noise in the
observations, incomplete data, and complexity of the unknown underlying mapping function.
In this case, it is possible because we have all observations, there is no noise in the data,
and the underlying function is not complex.
...
# make predictions for the input data
yhat = model.predict(x)
...
# inverse transforms
x_plot = scale_x.inverse_transform(x)
y_plot = scale_y.inverse_transform(y)
yhat_plot = scale_y.inverse_transform(yhat)
We can then calculate and report the prediction error in the original units of the target
variable.
...
print('MSE: %.3f' % mean_squared_error(y_plot, yhat_plot))
Finally, we can create a scatter plot of the real mapping of inputs to outputs and compare it
to the mapping of inputs to the predicted outputs and see what the approximation of the
mapping function looks like spatially.
This is helpful for developing the intuition behind what neural networks are learning.
...
# plot x vs y (actual)
pyplot.scatter(x_plot, y_plot, label='Actual')
# plot x vs yhat (predicted)
pyplot.scatter(x_plot, yhat_plot, label='Predicted')
pyplot.legend()
pyplot.show()
Tying this together, the complete example of fitting a neural network to approximate the mapping function is listed below.
# complete example of fitting a neural net to approximate the mapping function y = x^2
from numpy import asarray
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from matplotlib import pyplot
# define the dataset
x = asarray([i for i in range(-50, 51)])
y = asarray([i**2.0 for i in x])
print(x.min(), x.max(), y.min(), y.max())
# reshape arrays into rows and columns
x = x.reshape((len(x), 1))
y = y.reshape((len(y), 1))
# separately scale input and output variables to the range [0, 1]
scale_x = MinMaxScaler()
x = scale_x.fit_transform(x)
scale_y = MinMaxScaler()
y = scale_y.fit_transform(y)
print(x.min(), x.max(), y.min(), y.max())
# design the neural network model
model = Sequential()
model.add(Dense(10, input_dim=1, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(10, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1))
# define the loss function and optimization algorithm
model.compile(loss='mse', optimizer='adam')
# fit the model on the training dataset
model.fit(x, y, epochs=500, batch_size=10, verbose=0)
# make predictions for the input data
yhat = model.predict(x)
# inverse transforms
x_plot = scale_x.inverse_transform(x)
y_plot = scale_y.inverse_transform(y)
yhat_plot = scale_y.inverse_transform(yhat)
# report model error in the original units
print('MSE: %.3f' % mean_squared_error(y_plot, yhat_plot))
# plot x vs y
pyplot.scatter(x_plot, y_plot, label='Actual')
# plot x vs yhat
pyplot.scatter(x_plot, yhat_plot, label='Predicted')
pyplot.legend()
pyplot.show()
Running the example first reports the range of values for the input and output variables,
then the range of the same variables after scaling. This confirms that the scaling operation
was performed as we expected.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation
procedure, or differences in numerical precision. Consider running the example a few times
and compare the average outcome.
In this case, we can see that the mean squared error is about 1,300, in squared units. If we
calculate the square root, this gives us the root mean squared error (RMSE) in the original
units. We can see that the average error is about 36 units, which is fine, but not great.
MSE: 1300.776
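To see where the figure of about 36 comes from, take the square root of the reported MSE (substitute the value from your own run, which will differ):
from math import sqrt

mse = 1300.776  # the MSE reported above; your run will produce a different value
print('RMSE: %.3f' % sqrt(mse))  # prints about 36.066 units in the original output scale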
A scatter plot is then created comparing the inputs versus the real outputs, and the inputs
versus the predicted outputs.
The difference between these two data series is the error in the approximation of the mapping function. The approximation is reasonable in that it captures the general shape of the function, but there are clear errors, especially around input values close to 0.
This suggests that there is plenty of room for improvement, such as using a different
activation function or different network architecture to better approximate the mapping
function.
Scatter Plot of Input vs. Actual and Predicted Values for the Neural Net Approximation
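As one example of the kind of change you might experiment with (an assumed variation, not the configuration used in this tutorial, and with no guarantee of a better fit), you could try wider hidden layers or a different activation function:
# a possible variation to try: wider hidden layers with a tanh activation (assumed, untested here)
model = Sequential()
model.add(Dense(20, input_dim=1, activation='tanh'))
model.add(Dense(20, activation='tanh'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
The rest of the complete example (data preparation, fitting, evaluation, and plotting) would stay exactly the same.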
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Tutorials
● Your First Deep Learning Project in Python with Keras Step-By-Step
Summary
In this tutorial, you discovered the intuition behind neural networks as function
approximation algorithms.
Ask your questions in the comments below and I will do my best to answer.