Machine Learning Unit 4
UNIT 4
REGULARIZATION - SOLVING THE PROBLEM OF OVERFITTING:
REGULARIZATION:
There are several types of regularization techniques that can be used to solve the
problem of overfitting, including:
1. L1 regularization: adds a penalty proportional to the sum of the absolute values of the weights, which pushes many weights to exactly zero and produces sparse models.
2. L2 regularization: adds a penalty proportional to the sum of the squared weights, which shrinks the weights toward zero without eliminating them (see the sketch after this list).
3. Dropout: randomly deactivates a fraction of the neurons during each training step, which prevents the network from relying too heavily on any single neuron.
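As a minimal illustration (not from the notes), the sketch below shows how an L2 penalty changes an ordinary gradient-descent update for linear regression; the function name `ridge_gradient_step`, the penalty strength, and the toy data are made up for the example and assume NumPy is available.

```python
import numpy as np

def ridge_gradient_step(w, X, y, lr=0.01, lam=0.1):
    """One gradient-descent step for linear regression with an L2 penalty.

    Loss: mean squared error + lam * ||w||^2.
    The extra 2 * lam * w term shrinks the weights toward zero.
    """
    n = X.shape[0]
    grad_mse = (2.0 / n) * X.T @ (X @ w - y)   # gradient of the data-fit term
    grad_l2 = 2.0 * lam * w                     # gradient of the L2 penalty
    return w - lr * (grad_mse + grad_l2)

# Tiny usage example on random data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
w = np.zeros(3)
for _ in range(500):
    w = ridge_gradient_step(w, X, y)
print(w)  # weights are pulled slightly toward zero compared to plain least squares
```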
PERCEPTRON:
The perceptron works by taking a weighted sum of the input features, and if
the sum exceeds a certain threshold, it activates the output (for example, by
predicting a 1). If the sum does not exceed the threshold, it does not activate
the output (for example, by predicting a 0).
Perceptrons are used for tasks such as binary classification, where the goal is
to predict whether an input belongs to one of two classes (e.g. spam vs. not
spam). They are trained using an algorithm called the perceptron learning rule,
which adjusts the weights of the perceptron based on the error between the
predicted output and the true label.
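For concreteness, here is a minimal sketch of the perceptron learning rule described above, trained on a made-up toy problem (the logical AND function); the function name and hyperparameters are illustrative and NumPy is assumed.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Train a single perceptron with the perceptron learning rule.

    X: (n_samples, n_features) inputs, y: 0/1 labels.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Threshold activation: predict 1 if the weighted sum exceeds 0
            pred = 1 if xi @ w + b > 0 else 0
            error = yi - pred               # 0 if correct, +/-1 if wrong
            w += lr * error * xi            # move the boundary toward the mistake
            b += lr * error
    return w, b

# Usage on a linearly separable toy problem (logical AND)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([1 if x @ w + b > 0 else 0 for x in X])  # [0, 0, 0, 1]
```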
Perceptrons are simple and efficient, but they have some limitations. They can
only learn linear decision boundaries, which means they are not suitable for
tasks where the decision boundary is non-linear (the classic example is the XOR
function). Additionally, a single perceptron produces only one binary output, so
it can only separate two classes, which makes it less powerful than multi-layer
neural networks that can learn non-linear boundaries and classify multiple
classes at once.
The perceptron model is one of the simplest types of artificial neural networks. It is
a supervised learning algorithm for binary classifiers, and can be viewed as a single-layer
neural network with four main components: input values, weights and bias, net sum (the
weighted sum of the inputs), and an activation function.
How does it work?
Single Layer Perceptron Model: This is one of the simplest types of artificial neural
networks (ANNs). A single-layer perceptron model consists of a feed-forward network
and includes a threshold transfer function inside the model. The main objective of the
single-layer perceptron model is to classify linearly separable objects with binary
outcomes.
o Forward Stage: In the forward stage, activations flow from the input layer
through the network and terminate at the output layer.
o Backward Stage: In the backward stage, weight and bias values are modified
according to the error: the difference between the actual output and the
desired output is propagated backward, starting at the output layer and
ending at the input layer.
A perceptron model has the following limitations: it produces only a hard 0/1 threshold decision, and it can only classify sets of inputs that are linearly separable; non-linearly separable problems such as XOR cannot be solved by a single perceptron.
NEURAL NETWORK:
A neural network is a machine learning model inspired by the structure and function
of the human brain. It is composed of layers of interconnected "neurons," which
process and transmit information.
Feedforward neural networks are the most basic type of neural network. They
consist of an input layer, one or more hidden layers, and an output layer. The input
layer receives the input data, and the hidden layers process the data using weights
and biases. The output layer produces the final output based on the processed data.
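To make the layered structure concrete, here is a small sketch (not from the notes) of a forward pass through a feedforward network with one hidden layer; the layer sizes, weights, and names are made up and NumPy is assumed.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, params):
    """Forward pass of a tiny feedforward network:
    input layer -> one hidden layer (ReLU) -> output layer (sigmoid)."""
    W1, b1, W2, b2 = params
    h = relu(x @ W1 + b1)                        # hidden layer: weights, bias, activation
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # output layer as a probability
    return out

# Usage with randomly initialized weights: 4 inputs -> 8 hidden units -> 1 output
rng = np.random.default_rng(0)
params = (rng.normal(size=(4, 8)), np.zeros(8),
          rng.normal(size=(8, 1)), np.zeros(1))
x = rng.normal(size=(5, 4))             # a batch of 5 samples
print(forward(x, params).shape)         # (5, 1)
```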
Convolutional neural networks (CNNs) are used for tasks such as image
classification and object detection. They are designed to process data with a grid-like
topology, such as an image, and are particularly effective at identifying patterns and
features in images.
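The core operation in a CNN is sliding a small filter over a grid of pixels. The sketch below (illustrative, not from the notes) implements a plain 2D cross-correlation in NumPy and applies a simple vertical-edge filter to a made-up 6x6 image.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: slide the kernel over the image and
    take a weighted sum at each position (the core CNN operation)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector applied to a tiny 6x6 "image"
image = np.zeros((6, 6))
image[:, 3:] = 1.0
kernel = np.array([[1.0, -1.0]])
print(conv2d(image, kernel))   # non-zero only where the intensity changes (the edge)
```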
Recurrent neural networks (RNNs) are used for tasks such as language translation
and text generation. They are designed to process sequential data, such as a time
series or a sentence, and are able to maintain a state or "memory" across time.
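The "memory" of an RNN is simply a hidden state that is carried from one timestep to the next. The minimal sketch below (names and sizes are made up, NumPy assumed) shows one step of a simple recurrent cell and how the state is threaded through a sequence.

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    """One step of a simple recurrent cell: the new hidden state ("memory")
    combines the current input with the previous hidden state."""
    return np.tanh(x_t @ Wx + h_prev @ Wh + b)

# Run a random sequence of 5 timesteps through the cell
rng = np.random.default_rng(0)
Wx, Wh, b = rng.normal(size=(3, 4)), rng.normal(size=(4, 4)), np.zeros(4)
h = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):
    h = rnn_step(x_t, h, Wx, Wh, b)   # h carries information across timesteps
print(h)
```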
Neural networks are trained using a process called backpropagation, which involves
adjusting the weights and biases of the neurons in the network based on the error
between the predicted output and the true label. This process allows the network to
learn to make accurate predictions on new data.
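The sketch below illustrates one backpropagation step for a network with a single hidden layer, using the standard binary cross-entropy gradient at a sigmoid output; the layer sizes, learning rate, and the XOR usage example are illustrative assumptions, not something specified in the notes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, b1, W2, b2, lr=0.5):
    """One backpropagation step: forward pass, then propagate the output error
    back through the layers and update weights and biases by gradient descent."""
    # Forward pass
    h = sigmoid(x @ W1 + b1)              # hidden layer activations
    p = sigmoid(h @ W2 + b2)              # predicted probability
    # Backward pass (binary cross-entropy loss, sigmoid output)
    d_out = p - y                         # error at the output layer
    d_hid = (d_out @ W2.T) * h * (1 - h)  # error propagated to the hidden layer
    # Gradient-descent updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * x.T @ d_hid
    b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2

# Usage: learn XOR, which a single perceptron cannot represent
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
for _ in range(5000):
    W1, b1, W2, b2 = backprop_step(X, y, W1, b1, W2, b2)
print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))  # ~ [[0], [1], [1], [0]]
```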
MULTI-CLASS CLASSIFICATION:
Multi-class classification is a machine learning problem in which the goal is to predict the class of
an input sample from a set of predefined classes. For example, a multi-class classification model
might be trained to predict the type of animal in an image (e.g. dog, cat, bird) based on the
features in the image. Common approaches include:
1. One-vs-rest (OvR) approach: This approach involves training a separate binary classifier for each
class, where the class is treated as the positive class and all other classes are treated as the
negative class. During prediction, the input is passed through each classifier, and the class with
the highest predicted probability is chosen as the final prediction.
2. One-vs-one (OvO) approach: This approach involves training a separate binary classifier for
each pair of classes. For example, if there are three classes, three classifiers would be trained: one
to distinguish class 1 from class 2, one to distinguish class 1 from class 3, and one to distinguish
class 2 from class 3. During prediction, the input is passed through each classifier, and the class
that is predicted the most times is chosen as the final prediction.
3. Multinomial logistic regression: This approach involves training a single model that can predict
the probability of each class for an input sample. The class with the highest predicted probability
is chosen as the final prediction.
4. Support vector machines (SVMs): This approach involves training a classifier that finds the
hyperplane in a high-dimensional space that maximally separates the different classes. During
prediction, the input is mapped to the high-dimensional space and classified based on which side
of the hyperplane it falls on.
Choosing the right approach for a multi-class classification problem depends on the nature of
the data and the complexity of the task. It may be necessary to try multiple approaches and
compare their performance in order to find the best solution.
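For concreteness (this example is not part of the notes), the sketch below compares the one-vs-rest, one-vs-one, and multinomial strategies on scikit-learn's built-in Iris dataset, which has three classes; it assumes scikit-learn is installed.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Iris has three classes, so it is a simple multi-class problem
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "one-vs-rest": OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    "one-vs-one": OneVsOneClassifier(LogisticRegression(max_iter=1000)),
    "multinomial": LogisticRegression(max_iter=1000),  # single softmax model
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))  # accuracy of each strategy
```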
There are several common activation functions that are used in neural networks,
including:
1. Sigmoid function: The sigmoid function maps input values to a range between 0
and 1, which makes it useful for classification tasks where the output is a probability.
It has a smooth, s-shaped curve and is differentiable, which makes it well-suited for
training with gradient descent. However, it can saturate for large input values and
produce slow convergence, which limits its use in deep networks.
2. Tanh function: The tanh function maps input values to a range between -1 and 1,
which produces zero-centered outputs that are often convenient for hidden layers. It has a
smooth, s-shaped curve and is differentiable, which makes it well-suited for training
with gradient descent. It has a stronger gradient than the sigmoid function, which
helps it saturate less quickly and converge faster, although it can still saturate for
large inputs.
3. ReLU function: The ReLU (Rectified Linear Unit) function outputs the input directly
when it is positive and outputs 0 for negative input values. It is fast to compute and
does not saturate for positive inputs, which makes it well-suited for use in deep networks.
However, it can produce "dead neurons" if a neuron's inputs are consistently negative
(so its gradient is always zero), which can degrade the performance of the network.
4. Leaky ReLU function: The Leaky ReLU function is a variant of the ReLU function that
outputs a small, non-zero value for negative inputs (for example, 0.01 times the input)
rather than outputting 0. This helps to mitigate the problem of dead neurons and improve
the performance of the network.
Choosing the right activation function for a particular task depends on the nature of
the data and the complexity of the model. It may be necessary to try multiple
activation functions and compare their performance in order to find the best
solution.
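As a small illustration (not from the notes), the NumPy sketch below implements the four activation functions listed above so their output ranges can be compared side by side; the 0.01 slope for Leaky ReLU is a common default, not something specified in the notes.

```python
import numpy as np

# Minimal NumPy versions of the four activation functions described above
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # output in (0, 1)

def tanh(z):
    return np.tanh(z)                      # output in (-1, 1), zero-centered

def relu(z):
    return np.maximum(0.0, z)              # passes positives, zeroes out negatives

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)   # small slope for negative inputs

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
for fn in (sigmoid, tanh, relu, leaky_relu):
    print(fn.__name__, np.round(fn(z), 3))
```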
DROPOUT AS REGULARIZATION: