Unit II
A shallow neural network has only one (or just a few) hidden layers between the input and
output layers. The input layer receives the data, the hidden layer(s) process it, and the final
layer produces the output.
Shallow neural networks are simpler, more easily trained, and have greater computational
efficiency than deep neural networks, which may have thousands of hidden units in dozens
of layers. Shallow networks are typically used for simpler tasks such as linear regression,
binary classification, or low-dimensional feature extraction.
Logistic Regression
At the heart of logistic regression lies the logistic function, f(x) = 1 / (1 + e^(-x)), which
has a sigmoidal shape and returns a value between 0 and 1 for all inputs x.
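As a quick illustration, the logistic function can be written in a few lines of Python (a minimal sketch; the example weights w, bias b, and the convention of scoring an input with w·x + b are illustrative assumptions):

import math

def logistic(x):
    # f(x) = 1 / (1 + e^(-x)); the output always lies strictly between 0 and 1
    return 1.0 / (1.0 + math.exp(-x))

def predict_proba(x, w, b):
    # Logistic regression scores an example with w.x + b and squashes the
    # score through the logistic function to obtain a probability.
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return logistic(score)

print(logistic(0.0))                                     # 0.5
print(predict_proba([1.0, 2.0], w=[0.3, -0.1], b=0.05))  # about 0.54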
Random Forest
The Random Forest is a machine learning technique for classification and prediction
of data. The building block of the Random Forest is the Decision Tree.
Cluster Analysis
Unlike the supervised machine learning techniques above, Cluster Analysis
is unsupervised. Its goal is to subdivide large data sets into clusters: groups of objects whose
properties or features are more similar to each other than to those of objects in other groups.
1. Input Layer: It's the layer in which we give input to our model. The number of
neurons in this layer is equal to the total number of features in our data (number
of pixels in the case of an image).
2. Hidden Layer: The input from the Input layer is then fed into the hidden layer.
There can be many hidden layers, depending on our model and data size. Each
hidden layer can have a different number of neurons, which is generally greater
than the number of features. The output of each layer is computed by multiplying
the output of the previous layer with that layer's learnable weights (a matrix
multiplication), adding the learnable biases, and then applying an activation
function, which makes the network nonlinear (this computation is illustrated in
the sketch after this list).
3. Output Layer: The output from the hidden layer is then fed into a logistic
function such as sigmoid or softmax, which converts the raw output for each class
into that class's probability score.
(Figure: Simple CNN architecture)
4. Hyperparameter tuning is the process of selecting the optimal values for a machine
learning model’s hyperparameters. Hyperparameters are settings that control the
learning process of the model, such as the learning rate, the number of neurons in a
neural network, or the kernel size in a support vector machine. The goal of
hyperparameter tuning is to find the values that lead to the best performance on a
given task.
5. Batch Normalization (BN) is a popular technique used in deep learning to improve
the training of neural networks by normalizing the inputs of each layer, which
stabilizes the learning process and accelerates convergence (a sketch of the
computation follows after this list).
6. The XOR problem is a classic problem in artificial intelligence and machine
learning. XOR, which stands for exclusive OR, is a logical operation that takes
two binary inputs and returns true if exactly one of the inputs is true. The XOR
gate follows a specific truth table, where the output is true only when the inputs
differ. This problem is particularly interesting because a single-layer perceptron,
the simplest form of a neural network, cannot solve it.
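To make items 2 and 6 concrete, here is a minimal NumPy sketch (layer sizes, learning rate, and initialization are illustrative assumptions, not a prescribed architecture) of a network with one hidden layer learning XOR, something a single-layer perceptron cannot do. Each layer performs exactly the computation described in item 2: a matrix multiplication, a bias addition, and a nonlinear activation.

import numpy as np

rng = np.random.default_rng(0)

# XOR truth table: the output is 1 only when the two inputs differ.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer; 4 neurons train more reliably than the minimum of 2.
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
lr = 0.5  # assumed learning rate

for _ in range(10000):
    # Forward pass: matrix multiplication, bias addition, activation (item 2).
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    # Backward pass: gradients of the squared error with respect to each layer.
    dY = (Y - T) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dY
    b2 -= lr * dY.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ dH
    b1 -= lr * dH.sum(axis=0, keepdims=True)

print(Y.round(2))  # should be close to [[0], [1], [1], [0]] after training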
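And for the Batch Normalization mentioned in item 5, the core operation is simply: subtract the batch mean, divide by the batch standard deviation, then scale and shift. Below is a minimal sketch of the training-time computation (gamma, beta, and eps are the usual learnable scale, learnable shift, and numerical-stability constant; the running statistics used at inference time are omitted):

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x has shape (batch_size, features): the inputs entering a layer.
    mu = x.mean(axis=0)                      # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized to ~zero mean, unit variance
    return gamma * x_hat + beta              # learnable scale and shift

x = np.array([[1.0, 50.0], [2.0, 60.0], [3.0, 70.0]])
out = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
print(out.mean(axis=0), out.std(axis=0))     # approximately [0, 0] and [1, 1]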
7. Backpropagation
Backpropagation is an iterative, recursive, and efficient method for calculating the updated
weights that improve the network until it is able to perform the task for which it is being
trained. The derivatives of the activation function must be known at network design time for
Backpropagation to be used.
Now, how is the error function used in Backpropagation, and how does Backpropagation work?
Let us start with an example and work through it mathematically to understand exactly how
Backpropagation updates the weights.
Input values
X1=0.05
X2=0.10
Initial Weights
w1=0.15 w5=0.40
w2=0.20 w6=0.45
w3=0.25 w7=0.50
w4=0.30 w8=0.55
Bias Values
b1=0.35 b2=0.60
Target Values
T1=0.01
T2=0.99
Forward Pass
To find the value of H1, we multiply the input values by their weights and add
the bias b1:
H1=x1×w1+x2×w2+b1
H1=0.05×0.15+0.10×0.20+0.35
H1=0.3775
Similarly, H2=x1×w3+x2×w4+b1=0.05×0.25+0.10×0.30+0.35=0.3925
H1 and H2 are then passed through the sigmoid activation function:
out_H1 = 1 / (1 + e^(-0.3775)) = 0.593269992
out_H2 = 1 / (1 + e^(-0.3925)) = 0.596884378
To find the value of y1, we multiply these hidden-layer outputs by their weights
and add the bias b2:
y1=out_H1×w5+out_H2×w6+b2
y1=0.593269992×0.40+0.596884378×0.45+0.60
y1=1.10590597
Our target values are 0.01 and 0.99. The computed y1 and y2 do not match the
target values T1 and T2, so the error is propagated backwards and the weights are updated.
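The arithmetic above is easy to check with a few lines of Python (a minimal sketch that simply reproduces the forward-pass numbers from this example):

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60

# Hidden layer: weighted sum plus bias, then the sigmoid activation.
h1 = x1 * w1 + x2 * w2 + b1                 # 0.3775
h2 = x1 * w3 + x2 * w4 + b1                 # 0.3925
out_h1, out_h2 = sigmoid(h1), sigmoid(h2)   # 0.593269992, 0.596884378

# Output layer: weighted sum of the hidden outputs plus bias.
y1 = out_h1 * w5 + out_h2 * w6 + b2         # 1.10590597
y2 = out_h1 * w7 + out_h2 * w8 + b2
print(h1, h2, out_h1, out_h2, y1, y2)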
8. Activation Function
Activation functions add a nonlinear property to the neural network. This allows the
network to model more complex data. ReLU should generally be used as an activation
function in the hidden layers. In the output layer, the expected value range of the
predictions must always be considered.
No matter how complex the input data is, a network without activation functions can only
apply a linear transformation to it.
Common activation functions:
o Linear Function
o Sigmoid Function
Non-linear in nature. For x values between roughly -2 and 2, the curve is very steep, so small
changes in x cause significant shifts in the value of Y. The output spans from 0 to 1.
Tanh Function
The activation that consistently outperforms the sigmoid function is the tangent hyperbolic
(tanh) function. It is actually a mathematically shifted and rescaled version of the sigmoid;
the two are closely related and can be derived from one another.
Equation: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)) = 2 × sigmoid(2x) - 1, with outputs
ranging from -1 to 1.
o ReLU Function
Currently, ReLU (the Rectified Linear Unit, f(x) = max(0, x)) is the most widely used
activation function, since practically all convolutional neural networks and deep learning
systems employ it.
o Softmax Function
Although it is a generalization of the sigmoid function, the softmax function comes in handy
when dealing with multiclass classification problems.
It is used frequently when handling multiple classes, and is typically found in the output
nodes of image classification problems. The softmax function divides each output by the sum
of all the outputs, squeezing the output for each class into the range 0 to 1 so that the
values sum to 1.
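For reference, the functions discussed above can be sketched in a few lines of NumPy (the max-subtraction inside softmax is a common numerical-stability convention, not part of the definition):

import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # A shifted and rescaled sigmoid; outputs lie in (-1, 1).
    return np.tanh(x)

def relu(x):
    # max(0, x): passes positive values through and zeroes out negatives.
    return np.maximum(0.0, x)

def softmax(x):
    # Converts a vector of scores into probabilities that sum to 1.
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))          # roughly [0.66, 0.24, 0.10]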
Gradient Descent in Machine Learning
Gradient Descent is one of the most commonly used optimization algorithms for training
machine learning models by minimizing the error between actual and predicted results.
Gradient descent is also used to train Neural Networks.
Using gradient descent, the local minimum or local maximum of a function can be found as
follows:
o If we move in the direction of the negative gradient (away from the gradient) of the
function at the current point, we reach the local minimum of that function.
o If we move in the direction of the positive gradient (towards the gradient) of the
function at the current point, we reach the local maximum of that function.
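In practice, "moving against the gradient" means repeatedly applying the standard update rule
(assuming a cost function J(θ) with parameter θ and learning rate α):
θ_new = θ_old - α × ∂J/∂θ
The learning rate α controls the size of each step: too large a value can overshoot the
minimum, while too small a value makes convergence slow.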
Based on how many training examples are used to compute the error for each update, the
Gradient Descent learning algorithm can be divided into batch gradient descent, stochastic
gradient descent, and mini-batch gradient descent. Let's understand these different types of
gradient descent:
Batch gradient descent (BGD) computes the error for each point in the training set and
updates the model only after all training examples have been evaluated. One full pass over
the training set is known as a training epoch. In simple words, we sum the gradients over all
examples for each single update.
Stochastic gradient descent (SGD) is a type of gradient descent that processes one training
example per iteration, updating the model after each example.
Mini-batch gradient descent is a combination of batch gradient descent and stochastic
gradient descent. It divides the training dataset into small batches and then performs an
update on each of those batches separately.
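The difference between the three variants is simply how many examples contribute to each update. Below is a minimal NumPy sketch for a linear model trained with squared-error loss (the synthetic data, learning rate, epoch count, and batch size of 16 are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 examples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)    # noisy linear targets

def gradient(w, Xb, yb):
    # Gradient of the mean squared error for the linear model y_hat = Xb @ w.
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

def train(batch_size, lr=0.05, epochs=100):
    w = np.zeros(3)
    for _ in range(epochs):
        idx = rng.permutation(len(X))          # shuffle each epoch
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            w -= lr * gradient(w, X[b], y[b])  # one update per batch
    return w

print(train(batch_size=len(X)))  # batch GD: one update per epoch
print(train(batch_size=1))       # stochastic GD: one example per update
print(train(batch_size=16))      # mini-batch GD: small batches per update
# Each run should recover weights close to true_w.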