
Bias and Variance

Minati Rath
Example: Linear regression (housing prices)

[Figure: three fits of Price vs. Size: a linear function, a quadratic function, and a higher-order polynomial.]

Bias vs. variance in linear regression

[Figure: the same Price vs. Size data with the three fits labeled "High bias (underfitting)", "Just right", and "High variance (overfitting)".]

Overfitting
If we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples.
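As a quick illustration (not from the slides), the sketch below fits polynomials of degree 1, 3 and 15 to a small synthetic dataset; the low-degree fit underfits, while the high-degree fit matches the training points closely but generalizes poorly. The data and degrees are made up for illustration.

# Sketch: bias (underfitting) vs. variance (overfitting) with polynomial fits.
# Synthetic data; degrees chosen only to illustrate the effect.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)  # noisy training data

x_new = np.linspace(0, 1, 200)
y_new = np.sin(2 * np.pi * x_new)  # noise-free "new examples"

for degree in (1, 3, 15):
    coeffs = np.polyfit(x, y, deg=degree)                 # fit polynomial of given degree
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    new_err = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, new-data MSE {new_err:.3f}")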


Bias vs. variance in logistic regression

Example: Logistic regression


Sources of noise and error
While learning a target function using a training set
Two sources of noise
Some training points may not come exactly from the target
function: stochastic noise
The target function may be too complex to capture using the
chosen hypothesis set: deterministic noise
Generalization error: the model fits the noise in the training data, and this misfit carries over (is extrapolated) to the test set

Ways to handle noise


Validation
Check performance on data other than training data, and tune model
accordingly
Regularization
Constrain the model so that the noise cannot be learned too well
Validation
Divide given data into train set and test set
E.g., 80% train and 20% test
Better to select randomly
Learn parameters using training set
Check performance (validate the model) on test set, using
measures such as accuracy, misclassification rate, etc.
Trade-off: more data for training vs. validation
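
A minimal sketch of this kind of validation with scikit-learn; the dataset here is a synthetic stand-in for whatever feature matrix X and labels y you actually have:

# Sketch: 80/20 random train/test split, then validate with accuracy.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # stand-in data

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)            # random 80% train / 20% test split

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # learn parameters on train set

y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print("misclassification rate:", 1 - accuracy_score(y_test, y_pred))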
An example: model selection
• Which order polynomial will best fit a given dataset? Polynomials available: h1, h2, …, h10
• As if an extra parameter - degree of the polynomial - is to be
learned
• Approach 1
– Divide into train and test set
– Train each hypothesis on train set, measure error on test set
– Select the hypothesis with minimum test set error
• Problem with the previous approach
– The test set error we computed is not a true estimate of
generalization error
– Since our extra parameter (order of polynomial) is fit to the test
set
An example: model selection

Approach 2
– Divide data into train set (60%), validation set
(20%) and test set (20%)
– Select that hypothesis which gives lowest error on
validation set
– Use test set to estimate generalization error

Note: Test set not at all seen during training
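
A sketch of Approach 2 for the polynomial example, assuming a one-dimensional regression problem with synthetic data; the split sizes follow the 60/20/20 scheme above:

# Sketch: pick the polynomial degree on a validation set, report error on a held-out test set.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)  # synthetic data

# 60% train / 20% validation / 20% test split
idx = rng.permutation(len(x))
train, val, test = idx[:120], idx[120:160], idx[160:]

def mse(coeffs, xs, ys):
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

# Fit h1..h10 on the train set, score each on the validation set
fits = {d: np.polyfit(x[train], y[train], deg=d) for d in range(1, 11)}
best_degree = min(fits, key=lambda d: mse(fits[d], x[val], y[val]))

# The test set is touched only once, to estimate generalization error
print("selected degree:", best_degree)
print("test MSE:", mse(fits[best_degree], x[test], y[test]))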


Popular methods of evaluating a classifier
• Holdout method
– Split data into train and test set (usually 2/3 for train and 1/3 for
test). Learn model using train set and measure performance
over test set
– Usually used when there is sufficiently large data, since both the
train and test sets must each get a sizable part of it
• Repeated Holdout method
– Repeat the Holdout method multiple times with different
subsets used for train/test
– In each iteration, a certain portion of data is randomly selected
for training, rest for testing
– The error rates on the different iterations are averaged to yield
an overall error rate
– More reliable than simple Holdout
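
A sketch of repeated holdout with scikit-learn on the same kind of synthetic stand-in data; ten random 2/3 train, 1/3 test splits are averaged:

# Sketch: repeated holdout; repeat a random train/test split several times and average the error.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # stand-in data

error_rates = []
for seed in range(10):                               # 10 holdout iterations
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=1/3, random_state=seed)      # 2/3 train, 1/3 test, new split each time
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    error_rates.append(1 - model.score(X_te, y_te))  # misclassification rate on this split

print("overall error rate:", np.mean(error_rates))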
Popular methods of evaluating a classifier
• k-fold cross-validation
– First step: data is split into k subsets of equal size;
– Second step: each subset in turn is used for testing and the
remainder for training
– Performance measures averaged over all folds

Popular choice for k: 10 or 5


Advantage: every available data point is used both to train and to test the model (in different folds)
k-fold cross validation (shown for k=3): each row is one iteration over the data, split into three folds

train   train   test
train   test    train
test    train   train
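
A sketch of k-fold cross-validation using scikit-learn's cross_val_score (k = 5 here), again on synthetic stand-in data:

# Sketch: k-fold cross-validation; each fold is the test set once, the rest train; scores are averaged.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # stand-in data

model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)      # 5 folds: 5 train/test rounds

print("per-fold accuracy:", scores)
print("mean accuracy over folds:", np.mean(scores))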
Regularization
Addressing overfitting: Two ways
1. Reduce number of features
— Manually select which features to keep
— Problem: loss of some information (discarded features)
2. Regularization
— Keep all the features, but reduce magnitude/values of parameters
— Works well when we have a lot of features, each of which contributes a
bit to predicting the output

Intuition of regularization

[Figure: two fits of Price vs. Size of house.]

Suppose we penalize the parameters of the higher-order terms and make them really small.


Combatting Overfitting
➢ The problem of overfitting can be overcome by increasing the number
of input training data points
➢ The number of input data points should be at least 10
times the number of parameters or features
➢ But what if we have fewer data points?
➢ Put a bound on the regression coefficients by using regularization

Regularization for linear regression


In regularized linear regression, we choose to minimize
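
In the usual notation (m training examples, n features, hypothesis h_θ), the regularized least-squares objective referred to here typically takes the form (the exact scaling may differ from the slide):

J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]

Note that the penalty sum starts at j = 1, so θ0 is left unregularized, consistent with the convention below.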

By convention, regularization is
not applied on θ0 (makes little
difference to the solution)
λ: Regularization parameter

Smaller values of parameters lead to more generalizable models, less overfitting
L1, L2 and Elastic net Regularization
What we are discussing is called L2 regularization or “ridge”
regularization – it adds the squared magnitudes of the parameters as the
penalty term

Look up L1 or “Lasso” regularization


– adds the absolute values of the parameters as the penalty term

Elastic Net (Combination of L1 and L2 Regularization)


Effect: Combines the benefits of both Ridge and Lasso. It allows
for some coefficients to be set to zero (like Lasso) while shrinking
others (like Ridge). It is useful when there is multicollinearity, and
some feature selection is needed
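
A sketch comparing the three penalties with scikit-learn; the data is synthetic and the regularization strengths (alpha, l1_ratio) are placeholder values to experiment with:

# Sketch: L2 (Ridge), L1 (Lasso) and Elastic Net regularization for linear regression.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Stand-in data with many features, only a few of which are informative
X, y = make_regression(n_samples=100, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)

models = {
    "ridge (L2)": Ridge(alpha=1.0),                       # shrinks all coefficients
    "lasso (L1)": Lasso(alpha=1.0),                       # drives some coefficients to exactly zero
    "elastic net": ElasticNet(alpha=1.0, l1_ratio=0.5),   # mix of L1 and L2 penalties
}

for name, model in models.items():
    model.fit(X, y)
    n_zero = np.sum(model.coef_ == 0)
    print(f"{name:12s}: {n_zero} of {len(model.coef_)} coefficients are exactly zero")

Lasso and Elastic Net typically zero out some coefficients, which is the feature-selection effect mentioned above; Ridge only shrinks them.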
