Multilayer Perceptron

A multilayer perceptron is a feedforward neural network with fully connected neurons and nonlinear activation functions, enabling it to handle non-linearly separable data. The network consists of hidden layers where neurons receive inputs, apply weights and biases, and utilize activation functions to learn complex patterns. Gradient descent is an optimization algorithm used to minimize the cost function by adjusting parameters iteratively, with variations including batch, stochastic, and mini-batch gradient descent.


Multilayer Perceptron

• A multilayer perceptron is a type of feedforward neural network consisting of fully connected neurons with nonlinear activation functions.
• It is widely used to classify data that is not linearly separable.
Hidden layers
• Each neuron in a hidden layer receives input from all neurons in the previous layer.
• The inputs are multiplied by corresponding weights, denoted as w.
• The weights determine how much influence the input from one neuron has on the output of another.
• In addition to weights, each neuron in the hidden layer has an associated bias, denoted as b.
• The bias provides an additional input to the neuron, allowing it to adjust its output threshold.
• Like weights, biases are learned during training (a sketch of this computation follows below).
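To make the weighted-sum-plus-bias idea concrete, here is a minimal NumPy sketch of one hidden layer's forward pass. The layer sizes, the random weight values, and the choice of a sigmoid activation are assumptions for illustration, not taken from the slides.

import numpy as np

# Hypothetical sizes: 3 inputs feeding a hidden layer of 4 neurons.
rng = np.random.default_rng(0)
x = rng.normal(size=3)           # activations from the previous layer
W = rng.normal(size=(4, 3))      # one weight per (hidden neuron, input) pair
b = np.zeros(4)                  # one bias per hidden neuron

# Each hidden neuron computes a weighted sum of all previous-layer outputs
# plus its bias, then passes the result through a nonlinear activation.
z = W @ x + b                    # pre-activations
a = 1.0 / (1.0 + np.exp(-z))     # sigmoid activation
print(a)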
Activation Function
Activation functions are the driving force behind a neural network's ability to handle real-world problems that are non-linear in nature.

They enable the model to learn complex patterns and help it make accurate predictions.
The Importance of Activation Functions
• Non-linearity
  They introduce non-linearity into the network so it can capture complex patterns.
• Gradient Propagation
  The derivative of the activation function determines how much each weight is updated during backpropagation.
• Decision Making
  They let the network assign different levels of importance to different inputs depending on the task at hand.
• Modeling Complex Relationships
  By stacking multiple layers of neurons, activation functions help the network learn hierarchical representations.
Worked example for a single neuron with inputs a = 2 and b = 3, weights w1 = 0.5 and w2 = 0.3, and bias = 2:
z = w1 * a + w2 * b + bias
z = 0.5 * 2 + 0.3 * 3 + 2
z = 1 + 0.9 + 2
z = 3.9

Activation Function
f(z) = 1 / (1 + e^(-z))
f(3.9) = 1 / (1 + e^(-3.9))
f(3.9) = 1 / 1.02024
f(3.9) ≈ 0.98
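To check the arithmetic above in code, here is a minimal Python sketch; the input, weight, and bias values are the example numbers from the slide.

import math

# Example values from the worked calculation above.
w1, w2 = 0.5, 0.3
a, b = 2, 3
bias = 2

z = w1 * a + w2 * b + bias       # 3.9
f_z = 1 / (1 + math.exp(-z))     # sigmoid activation
print(z, round(f_z, 2))          # 3.9 0.98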
Gradient Descent
• Gradient descent is an optimization algorithm.
• It is used to find the minimum value of a function iteratively.
• It finds the minimum of a convex function; a convex function looks like a valley with a single global minimum at the bottom.
• Gradient descent is also called the "steepest descent" algorithm.
• In machine learning, it is used to minimize a cost function.
Gradient Descent Optimization
• Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function.
• Its job is to get you from a starting point at the top of some slope (or a random location) down to the lowest valley.
• It does this by moving in the direction opposite to the gradient.
• The more the cost is minimized, the better the machine's predictions become.
How Gradient Descent Works
Initialize Parameters
Start by initializing your model parameters (e.g., weights and biases in a neural
network) to some random values or a specific distribution. For a simple example, let’s
denote our single parameter as x.
Compute the Cost Function
Calculate the cost (or loss) based on your current parameter values. A commonly used
cost function in simple regression problems is the Mean Squared Error (MSE). More
generally, you might have a function f(x) that outputs how “wrong” your model is.
Compute the Gradients
The gradient is the partial derivative of the cost function with respect to the parameters. Symbolically, if f(x) is our cost function, we compute df/dx. This tells us the direction in which f(x) increases the fastest.
Update the Parameters
Adjust the parameters in the direction opposite to the gradient:
x ← x − α · (df/dx)
Here, α (alpha) is called the learning rate; it controls how big a step you take on each update.
Iterate Until Convergence
Keep repeating the previous steps - recalculate the cost, find the gradients, update
parameters - until changes become negligible or you reach a preset iteration count.
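Putting the five steps together, here is a minimal sketch of gradient descent on a single parameter. The cost function f(x) = (x - 3)^2, the starting point, the learning rate, and the stopping rule are assumptions chosen for illustration.

# Minimal gradient descent on one parameter x.
# Assumed cost: f(x) = (x - 3) ** 2, so df/dx = 2 * (x - 3); the minimum is at x = 3.

def f(x):
    return (x - 3) ** 2

def dfdx(x):
    return 2 * (x - 3)

x = 0.0          # step 1: initialize the parameter
alpha = 0.1      # learning rate

for step in range(100):          # step 5: iterate up to a preset count
    cost = f(x)                  # step 2: compute the cost
    grad = dfdx(x)               # step 3: compute the gradient
    x = x - alpha * grad         # step 4: move opposite to the gradient
    if abs(grad) < 1e-6:         # stop once changes become negligible
        break

print(x, f(x))   # x ends up very close to 3, the minimum of the cost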
Types of Gradient Descent
• Batch Gradient Descent
• Stochastic Gradient Descent (SGD)
• Mini-Batch Gradient Descent

• Batch Gradient Descent = entire dataset per update.
• Mini-Batch Gradient Descent = small subset (e.g., 32 examples) per update.
• Pure Stochastic Gradient Descent = 1 example per update (the three are contrasted in the sketch below).
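As a rough illustration that the three variants differ only in how many examples feed each parameter update, here is a sketch on a toy one-parameter regression problem; the data, the model y ≈ w * x, the learning rate, and the epoch count are all assumptions for illustration.

import numpy as np

# Toy 1-D regression data (assumed): y is roughly 2 * x plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=200)
y = 2.0 * X + rng.normal(scale=0.1, size=200)

def grad(w, xb, yb):
    # Gradient of the mean squared error of y ≈ w * x over one batch.
    return np.mean(2 * (w * xb - yb) * xb)

def train(batch_size, alpha=0.05, epochs=50):
    w = 0.0
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)                # shuffle each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w -= alpha * grad(w, X[idx], y[idx])  # one parameter update per batch
    return w

print(train(batch_size=len(X)))  # batch gradient descent: the entire dataset per update
print(train(batch_size=32))      # mini-batch gradient descent: 32 examples per update
print(train(batch_size=1))       # stochastic gradient descent: 1 example per update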
