
1-Introduction to Deep Learning with Keras


1.3 Improving Your Model Performance
1.3.1. Learning curve
Learning curves provide a lot of information about your model. Now that you know how
to use the history callback to plot them, you will learn how to read them to get the most
value out of them. So far we've seen two types of learning curves: loss curves and
accuracy curves.

Loss curve
Loss tends to decrease as epochs go by. This is expected, since our model is essentially
learning to minimize the loss function. Epochs are shown on the x-axis and loss on the
y-axis. As epochs go by, the loss value decreases. After a certain number of epochs,
the value converges, meaning it no longer gets much lower than that: we have arrived
at a minimum.

Accuracy curve
Accuracy curves are similar but opposite in tendency: if the y-axis shows accuracy, it
tends to increase as epochs go by. This shows that our model makes fewer mistakes
as it learns.

Overfitting
If we plot training versus validation data we can identify overfitting: the training and
validation curves start to diverge. Overfitting is when our model starts learning
particularities of our training data which don't generalize to unseen data.
The early stopping callback is useful to stop our model before it starts overfitting.

Unstable curves
But not all curves are smooth and pretty; many times we will find unstable curves.
Many factors can lead to unstable learning curves: the chosen optimizer, learning
rate, batch size, network architecture, weight initialization, etc. All these
parameters can be tuned to improve our model's learning curves as we aim for better
accuracy and generalization power. We will cover this in the following videos.

Can we benefit from more data?


Neural networks are well known for surpassing traditional machine learning techniques
as we increase the size of our datasets. We can check whether collecting more data
would increase a model's generalization and accuracy. We aim at producing a graph
where we fit our model with increasing amounts of training data and plot the training
and test accuracies of each run. If, after using all our data, the test curve still has a
tendency to improve, that is, it's not parallel to the training curve and is still increasing,
then it's worth gathering more data, if possible, to allow the model to keep learning.

Coding train size comparison


How would we go about coding a graph like the previous one? Imagine we want to
evaluate an already built and compiled model, and that we have partitioned our data
into X_train, y_train, X_test, and y_test. We first store the model's initial weights by
calling get_weights on our model, then initialize two lists to store train and test
accuracies.

We loop over a predefined list of train sizes and, for each training size, take the
corresponding fraction of the training data. Before any training, we make sure our
model starts from the same set of weights by setting them to the initial_weights with
the set_weights function. After that, we can fit our model on the training fraction. We
use an EarlyStopping callback that monitors loss (not validation loss, since we haven't
provided the fit method with validation data). After training is done, we get the
accuracy on the training fraction and the accuracy on the test set and append them to
our lists of accuracies. Observe that the same set of test observations is used to
evaluate each iteration.
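
That loop might look like the following minimal sketch. It assumes a compiled model,
the X_train/y_train/X_test/y_test splits, and that the model was compiled with
metrics=['accuracy'] (so evaluate() returns loss and accuracy); the training sizes are
illustrative.

    from tensorflow.keras.callbacks import EarlyStopping

    initial_weights = model.get_weights()
    train_accs, test_accs = [], []
    early_stop = EarlyStopping(monitor='loss', patience=3)

    for size in [100, 500, 1000, 1500]:
        # Take the first `size` samples as the training fraction
        X_frac, y_frac = X_train[:size], y_train[:size]
        # Reset to the same starting weights before every run
        model.set_weights(initial_weights)
        model.fit(X_frac, y_frac, epochs=100, callbacks=[early_stop], verbose=0)
        # Evaluate on the training fraction and on the full, fixed test set
        train_accs.append(model.evaluate(X_frac, y_frac, verbose=0)[1])
        test_accs.append(model.evaluate(X_test, y_test, verbose=0)[1])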

EXERCISE: Learning the digits


You're going to build a model on the digits dataset, a sample dataset that comes pre-
loaded with scikit-learn. The digits dataset consists of 8x8 pixel handwritten digits
from 0 to 9.

You want to distinguish between each of the 10 possible digits given an image, so we
are dealing with multi-class classification.
The dataset has already been partitioned into X_train, y_train, X_test, and y_test, using
30% of the data as testing data. The labels are already one-hot encoded vectors, so
you don't need to use Keras' to_categorical() function.
Let's build this new model!
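
A possible solution, sketched under the assumption that the 8x8 images arrive
flattened into 64 features; the hidden-layer size is illustrative.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    model = Sequential()
    # 64 input features: one per pixel of the flattened 8x8 image
    model.add(Dense(16, input_shape=(64,), activation='relu'))
    # 10 output neurons with softmax: one probability per digit class
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',  # labels are one-hot encoded
                  metrics=['accuracy'])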


Is the model overfitting?


Let's train the model you just built and plot its learning curve to check whether it's
overfitting! You can make use of the loaded function plot_loss() to plot training loss
against validation loss; you can get both from the history callback.
If you want to inspect the plot_loss() function code, paste this in the
console: show_code(plot_loss)
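
plot_loss() is a helper loaded in the exercise; a generic equivalent with matplotlib,
assuming the test split doubles as validation data, could be:

    import matplotlib.pyplot as plt

    # Train while tracking performance on held-out data
    h_callback = model.fit(X_train, y_train, epochs=60,
                           validation_data=(X_test, y_test), verbose=0)

    # Diverging curves here would signal overfitting
    plt.plot(h_callback.history['loss'], label='Train')
    plt.plot(h_callback.history['val_loss'], label='Validation')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.show()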

Do we need more data?


It's time to check whether the digits dataset model you built benefits from more training
examples!
In order to keep code to a minimum, various things are already initialized and ready to
use:
• The model you just built.
• X_train, y_train, X_test, and y_test.
• The initial_weights of your model, saved after using model.get_weights().
• A pre-defined list of training sizes: training_sizes.
• A pre-defined early stopping callback monitoring loss: early_stop.
• Two empty lists to store the evaluation results: train_accs and test_accs.
Train your model on the different training sizes and evaluate the results on X_test.
End by plotting the results with plot_results().
The full code for this exercise can be found on the slides!

1.3.2. Activation functions


So far we've been using several activation functions in our models, but we haven't yet
covered their role in neural networks other than when it comes to obtaining the output
we want in our output layer. Inside the neurons of any neural network the same process
takes place:

Each input reaching the neuron is multiplied by the weight of its connection, these
products are summed, and the bias weight is added. This operation results in a
number, a, which can be anything: it is not bounded.

We pass this number into an activation function that essentially takes it as an input and
decides how the neuron fires and which output it produces. Activation functions impact
learning time, making our model converge faster or slower and achieving lower or
higher accuracy. They also allow us to learn more complex functions.

Activation zoo
Four very well-known activation functions are: the sigmoid, which varies between 0 and
1 for all possible input values; the tanh or hyperbolic tangent, which is similar to the
sigmoid in shape but varies between -1 and 1; the ReLU (rectified linear unit), which
varies between 0 and infinity; and the leaky ReLU, a variant of ReLU that doesn't sit
flat at 0, allowing small negative outputs for negative inputs.
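
For reference, all four are simple to write down with NumPy; the leaky ReLU slope of
0.01 is a common default, though implementations vary.

    import numpy as np

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))      # output in (0, 1)

    def tanh(x):
        return np.tanh(x)                # output in (-1, 1)

    def relu(x):
        return np.maximum(0, x)          # 0 for negative inputs, identity otherwise

    def leaky_relu(x, alpha=0.01):
        return np.where(x > 0, x, alpha * x)  # small slope instead of a flat 0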

Effects of activation functions
Changing the activation function used in the hidden layer of the model we built for
binary classification results in different classification boundaries.

We can see that the previous model cannot completely separate red crosses from blue
circles if we use a sigmoid activation function in the hidden layer: some blue circles are
misclassified as red crosses along the diagonal. However, when we use tanh we
completely separate red crosses from blue circles, and the separation region between
the blue and red classifications is smooth.

Using a ReLU activation function we obtain sharper boundaries; the leaky ReLU shows
similar behavior for this dataset. It's important to note that these boundaries will be
different for every run of the same model because of the random initialization of weights
and other random variables that aren't fixed.

Which activation function to use?


All activation functions come with their pros and cons, and there's no easy way to
determine which is best to use. Based on their properties, the problem at hand, and
the layer we are looking at in our network, one activation function will perform better
than another at achieving our goal. A good way to go is to start with ReLU, since
ReLUs train fast and tend to generalize well to most problems, avoid sigmoids, and
tune with experimentation. It's easy to compare how models with different activation
functions perform if they are small enough to train fast. It's important to set a random
seed with numpy so that the model weights are initialized the same way for each
activation function. We then define a function that returns a fresh new model each
time, using an act_function parameter.

We can then use this function as we loop over several activation functions, training
different models and saving their history callback. We store all these callbacks in a
dictionary.


With this dictionary of histories, we can extract the metrics we want to plot, build a
pandas dataframe and plot it.
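
A sketch of that comparison loop, assuming a get_model(act) helper like the one
described above that returns a fresh compiled model with the given hidden-layer
activation:

    import numpy as np
    import pandas as pd

    np.random.seed(1)  # same weight initialization for every run
                       # (full TF determinism may also need tf.random.set_seed)

    activation_results = {}
    for act in ['relu', 'sigmoid', 'tanh']:
        model = get_model(act)
        h_callback = model.fit(X_train, y_train, epochs=20,
                               validation_data=(X_test, y_test), verbose=0)
        activation_results[act] = h_callback

    # Extract one metric per activation, build a dataframe and plot it
    val_loss_per_function = {act: h.history['val_loss']
                             for act, h in activation_results.items()}
    pd.DataFrame(val_loss_per_function).plot(title='Validation loss per activation')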

Comparing activation functions


Comparing activation functions involves a bit of coding, but nothing you can't do!
You will try out different activation functions on the multi-label model you built for your
farm irrigation machine in chapter 2. The function get_model('relu') returns a copy of this
model and applies the 'relu' activation function to its hidden layer.
You will loop through several activation functions, generate a new model for each and
train it. By storing the history callback in a dictionary you will be able to visualize which
activation function performed best in the next exercise!
X_train, y_train, X_test, y_test are ready for you to use when training your models.

Comparing activation functions II


What you coded in the previous exercise has been executed to obtain
the activation_results variable; this time 100 epochs were used instead of 20. This way
you will have more epochs to further compare how training evolves per activation
function.
For every h_callback of each activation function in activation_results:
• The h_callback.history['val_loss'] has been extracted.
• The h_callback.history['val_accuracy'] has been extracted.
Both are saved into two dictionaries: val_loss_per_function and val_acc_per_function.
Pandas is also loaded as pd for you to use. Let's plot some quick validation loss
and accuracy charts!


1.3.3. Batch size and batch normalization


It's time to learn the concepts of batch size and batch normalization. A mini-batch is a
subset of data samples. If we were training a neural network with images, each image in
our training set would be a sample, and we could take mini-batches of different sizes
from the training set.

Mini-batch
Remember that during an epoch we feed our network, calculate the errors and update
the network weights. It's not very practical to update our network weights only once per
epoch after looking at the error produced by all training samples. In practice, we take a
mini-batch of training samples. And that way, if our training set has 9 images and we
choose a batch_size of 3, we will perform 3 weight updates per epoch, one per mini-
batch.

Networks tend to train faster with mini-batches, since weights are updated more often.
Sometimes datasets are so huge that they would struggle to fit in RAM if we didn't use
mini-batches. Also, the noise produced by a small batch size can help escape local
minima. A couple of disadvantages are the need for more iterations and having to find
a good batch size.

Effects of batch sizes


Here you can see how different batch sizes converge towards a minimum as training
goes by. Training with all samples is shown in blue. Mini-batching is shown in green.
Stochastic gradient descent, in red, uses a batch_size of 1. We can see how the path
towards the best value for our weights is noisier the smaller the batch_size. They reach
the same value after a different number of iterations.


Batch size in Keras


You can set your own batch_size with the batch_size parameter of the model's fit
method. Keras uses a default batch size of 32. Powers of two tend to be used. As a
rule of thumb, the bigger your dataset, the bigger the batch size you can use.
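
For example (epoch counts and sizes illustrative; each call is shown in isolation, as
each would normally start from a freshly compiled model):

    # Stochastic gradient descent: one weight update per sample
    model.fit(X_train, y_train, epochs=5, batch_size=1)

    # Full batch: one weight update per epoch
    model.fit(X_train, y_train, epochs=5, batch_size=len(X_train))

    # A typical mini-batch size, a power of two
    model.fit(X_train, y_train, epochs=5, batch_size=64)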

Normalization in machine learning


Normalization is a common pre-processing step in machine learning algorithms,
especially when features have different scales. One way to normalize data is to subtract
its mean value and divide by the standard deviation. We always tend to normalize our
model inputs. This avoids problems with activation functions and gradients.

This leaves everything centered around 0 with a standard deviation of 1.
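
In code, standardizing features looks like this; computing the statistics on the training
set only and reusing them on the test set is the usual convention.

    import numpy as np

    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)

    X_train_norm = (X_train - mean) / std  # centered at 0, std of 1
    X_test_norm = (X_test - mean) / std    # reuse the training statistics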

Reasons for batch normalization


Normalizing a neural network's inputs improves our model. But deeper layers are
trained on the previous layers' outputs, and since weights get updated via gradient
descent, consecutive layers no longer benefit from that normalization: they need to
adapt to the previous layers' weight changes, making it harder to learn their own
weights. Batch normalization makes sure that, independently of these changes, the
inputs to the next layer are normalized. It does this in a smart way, with trainable
parameters that also learn how much of this normalization to keep, scaling or shifting it.

Batch normalization advantages


This improves gradient flow, allows for higher learning rates, reduces the dependence
on weight initialization, adds regularization to our network, and limits internal
covariate shift, which is a fancy name for a layer's dependence on the previous layer's
outputs when learning its weights. Batch normalization is widely used today in many
deep learning models.


Batch normalization in Keras


Batch normalization in Keras is applied as a layer, so we can place it between two
layers. We import BatchNormalization from tensorflow.keras.layers, instantiate a
Sequential model, add an input layer, and then add a batch normalization layer. We
finalize this binary classification model with an output layer.
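
A minimal sketch of such a model; the input shape, layer size, and optimizer are
illustrative.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, BatchNormalization

    model = Sequential()
    model.add(Dense(32, input_shape=(2,), activation='relu'))
    model.add(BatchNormalization())  # normalizes the hidden layer's outputs
    model.add(Dense(1, activation='sigmoid'))  # binary classification output
    model.compile(optimizer='sgd', loss='binary_crossentropy',
                  metrics=['accuracy'])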

Exercise: Changing batch sizes


You've seen that models are usually trained in batches of a fixed size. The smaller the
batch size, the more weight updates per epoch, but at the cost of a more unstable
gradient descent, especially if the batch size is too small to be representative of the
entire training set.
Let's see how different batch sizes affect the accuracy of a simple binary classification
model that separates red from blue dots.
You'll use a batch size of one, updating the weights once per sample in your training set
for each epoch. Then you will use the entire dataset, updating the weights only once per
epoch.

Great work! You can see that accuracy is lower when using a batch size equal to the
training set size. This is not because the network had more trouble learning: even
though the same number of epochs was used for both batch sizes, the number of
resulting weight updates was very different. With a batch size equal to the training set
size and 5 epochs, we get only 5 updates in total, each computing an averaged
gradient over all the training observations. To obtain similar results with this batch
size, we should increase the number of epochs so that more weight updates take
place.

Exercise: Batch normalizing a familiar model


Remember the digits dataset you trained on in the first exercise of this chapter?

A multi-class classification problem that you solved using softmax and 10 neurons in
your output layer.
You will now build a new, deeper model consisting of 3 hidden layers of 50 neurons
each, using batch normalization in between layers. The kernel_initializer parameter is
used so that both models initialize their weights in the same way.
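
One way the described model could be written; the optimizer choice is illustrative, and
'normal' draws the initial weights from a normal distribution so both models can start
alike.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, BatchNormalization

    batchnorm_model = Sequential()
    batchnorm_model.add(Dense(50, input_shape=(64,), activation='relu',
                              kernel_initializer='normal'))
    batchnorm_model.add(BatchNormalization())
    batchnorm_model.add(Dense(50, activation='relu', kernel_initializer='normal'))
    batchnorm_model.add(BatchNormalization())
    batchnorm_model.add(Dense(50, activation='relu', kernel_initializer='normal'))
    batchnorm_model.add(BatchNormalization())
    batchnorm_model.add(Dense(10, activation='softmax',
                              kernel_initializer='normal'))
    batchnorm_model.compile(optimizer='sgd', loss='categorical_crossentropy',
                            metrics=['accuracy'])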

Batch normalization effects


Batch normalization tends to increase the learning speed of our models and make their
learning curves more stable. Let's see how two identical models, with and without
batch normalization, compare.
The model you just built, batchnorm_model, is loaded for you to use. An exact copy of it
without batch normalization, standard_model, is available as well. You can check
their summary() in the console. X_train, y_train, X_test, and y_test are also loaded so that
you can train both models.
You will compare the accuracy learning curves for both models by plotting them
with compare_histories_acc().
You can inspect the function by pasting show_code(compare_histories_acc) in the console.

1.3.4 Hyperparameter tuning


You now know everything you need to perform hyperparameter tuning in neural
networks! Our aim is to identify those parameters that make our model generalize
better.

Sklearn recap
In sklearn we can perform hyperparameter search by using methods like
RandomizedSearchCV. We import RandomizedSearchCV from sklearn
model_selection. We instantiate a model, define a dictionary with a series of model
parameters to try and finally instantiate a RandomizedSearchCV object passing our
model, the parameters and a number of cross-validation folds. We fit it on our data and
print the best resulting combination of parameters. For this example, a
min_samples_leaf of 1, 3 max_features and a max_depth of 3 gave us the best results.
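
That pattern, sketched with a decision tree; the estimator and candidate values are
illustrative.

    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.tree import DecisionTreeClassifier

    tree = DecisionTreeClassifier()
    params = {'max_depth': [3, None],
              'max_features': [1, 2, 3],
              'min_samples_leaf': [1, 2, 3]}

    # Sample random parameter combinations with 5-fold cross-validation
    tree_cv = RandomizedSearchCV(tree, params, cv=5)
    tree_cv.fit(X, y)
    print(tree_cv.best_params_)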

Turn a Keras model into a Sklearn estimator


We can do the same with our Keras models! But we first have to turn them into
sklearn estimators. We do this by first defining a function that creates our model. Then
we import the KerasClassifier wrapper from tensorflow.keras.wrappers.scikit_learn. We
finish by simply instantiating a KerasClassifier object, passing create_model as the
building function; other parameters like epochs and batch_size are optional but should
be passed if we want to specify them.
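
A sketch of the wrapping step. Note that this wrapper shipped with older TensorFlow
releases; recent versions removed it in favor of the separate SciKeras package, so
treat the import path as version-dependent.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

    def create_model():
        # Architecture is illustrative; any compiled model works
        model = Sequential()
        model.add(Dense(16, input_shape=(64,), activation='relu'))
        model.add(Dense(10, activation='softmax'))
        model.compile(optimizer='adam', loss='categorical_crossentropy',
                      metrics=['accuracy'])
        return model

    model = KerasClassifier(build_fn=create_model, epochs=6, batch_size=16)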


This is very cool! Our model is now just like any other sklearn estimator, so we can, for
instance, perform cross-validation on it to see the stability of its predictions across folds.
Import cross_val_score and pass in our recently converted Keras model, predictors,
labels, and the number of folds. We can then check the mean accuracy per fold or the
standard deviation. Note that 6 epochs and a batch_size of 16 were used, since we
specified them before.
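
For instance, with the wrapped model from the previous sketch and X, y holding
features and labels:

    from sklearn.model_selection import cross_val_score

    # 3-fold cross-validation on the wrapped Keras model
    kfold_scores = cross_val_score(model, X, y, cv=3)
    print('Mean accuracy:', kfold_scores.mean())
    print('Standard deviation:', kfold_scores.std())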

Tips for neural networks hyperparameter tuning


It's much more probable that a good combination of parameters will be found by
random search than by an exhaustive grid search: grid search loops over all possible
combinations of parameters, whilst random search tries a given number of random
combinations. Normally, not many epochs are needed to check how well your model is
performing, and if you've got a huge dataset, using a smaller representative sample
makes things faster. It's easiest to play with things like optimizers, batch sizes,
activations, and learning rates.

Random search on Keras models


To perform randomized search on a Keras model we just need to define the parameters
to try. We can try different optimizers, activation functions for the hidden layers, and
batch sizes. The keys in the parameter dictionary must be named exactly like the
parameters of our create_model function. We then instantiate a RandomizedSearchCV
object, passing our model and parameters, with 3-fold cross-validation. We end by
fitting our random_search object to obtain the results, and we can print the best score
and the parameters that were used. We get an accuracy of 94% with the adam
optimizer, 3 epochs, a batch_size of 10, and relu activation.
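
A sketch of that search; it assumes create_model() accepts optimizer and activation
arguments, and the candidate values are illustrative. The wrapper treats epochs and
batch_size as fit parameters, so they can be searched over too.

    from sklearn.model_selection import RandomizedSearchCV

    params = {'optimizer': ['adam', 'sgd'],
              'activation': ['relu', 'tanh'],
              'batch_size': [5, 10, 20],
              'epochs': [3, 6]}

    random_search = RandomizedSearchCV(model, param_distributions=params, cv=3)
    random_search_results = random_search.fit(X, y)
    print('Best:', random_search_results.best_score_,
          'using', random_search_results.best_params_)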

Tuning other hyperparameters


Parameters like the number of neurons per layer and the number of layers can also be
tuned using the same method; we just need to make some smart changes in our
create_model function. The nl parameter determines the number of hidden layers and
nn the number of neurons in these layers: we can have a loop inside our function and
add as many layers to our Sequential model as provided in nl, each with the given
number of neurons.
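
A sketch of such a function for a binary problem, reusing the imports from the earlier
sketches; the input shape and default values are illustrative.

    def create_model(nl=1, nn=256):
        model = Sequential()
        model.add(Dense(16, input_shape=(2,), activation='relu'))
        # Add nl hidden layers of nn neurons each
        for _ in range(nl):
            model.add(Dense(nn, activation='relu'))
        model.add(Dense(1, activation='sigmoid'))
        model.compile(optimizer='adam', loss='binary_crossentropy',
                      metrics=['accuracy'])
        return model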

Then we just need to use the exact same names in the parameter dictionary as we
have in our function and repeat the process. The best result is 87% accuracy with 2
hidden layers of 128 neurons each.


Exercise: Preparing a model for tuning


Let's tune the hyperparameters of a binary classification model that does well at
classifying the breast cancer dataset.
You've seen that the first step to turning a model into a sklearn estimator is to build a
function that creates it. The definition of this function is important, since hyperparameter
tuning is carried out by varying the arguments your function receives.
Build a simple create_model() function that receives both a learning rate and an
activation function as arguments. The Adam optimizer has been imported as an object
from tensorflow.keras.optimizers so that you can also change its learning rate
parameter.
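
One possible create_model() under these requirements; the layer sizes are illustrative,
and the input shape of 30 matches the breast cancer dataset's feature count.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.optimizers import Adam

    def create_model(learning_rate=0.01, activation='relu'):
        # The optimizer object lets us tune the learning rate directly
        opt = Adam(learning_rate=learning_rate)
        model = Sequential()
        model.add(Dense(128, input_shape=(30,), activation=activation))
        model.add(Dense(256, activation=activation))
        model.add(Dense(1, activation='sigmoid'))  # binary output
        model.compile(optimizer=opt, loss='binary_crossentropy',
                      metrics=['accuracy'])
        return model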

Tuning the model parameters


It's time to try out different parameters on your model and see how well it performs!
The create_model() function you built in the previous exercise is ready for you to use.
Since fitting the RandomizedSearchCV object would take too long, the results you'd get
are printed in the show_results() function. You could try random_search.fit(X, y) in the
console yourself to check that it does work after you have built everything else, but you
will probably time out the exercise (so copy your code first if you try this, or you may
lose your progress!).
You don't need to use the optional epochs and batch_size parameters when building
your KerasClassifier object, since you are passing them as params to the random
search and that already works.

Training with cross-validation


Time to train your model with the best parameters found: a learning rate of 0.001, 50
epochs, a batch_size of 128, and relu activations.
The create_model() function from the previous exercise is ready for you to
use. X and y are loaded as features and labels.
Use the best values found when creating your KerasClassifier object so that they are
used when performing cross-validation.
End this chapter by training an awesome tuned model on the breast cancer dataset!
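
A sketch of that final run; it assumes the create_model() above and the older tf.keras
sklearn wrapper, which forwards extra keyword arguments matching create_model()'s
parameters to it.

    from sklearn.model_selection import cross_val_score
    from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

    model = KerasClassifier(build_fn=create_model, learning_rate=0.001,
                            activation='relu', epochs=50, batch_size=128,
                            verbose=0)

    kfold_scores = cross_val_score(model, X, y, cv=3)
    print('Mean accuracy:', kfold_scores.mean())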

