Multilayer Perceptron
This enables the models to learn complex patterns and helps them make accurate
predictions.
The Importance of Activation Functions
• Non-linearity
Activation functions introduce non-linearity into the network so it can capture
complex patterns
• Gradient Propagation
The derivative of the activation function defines the amount by which
each weight is updated during backpropagation
• Decision Making
assign different levels of importance to different inputs
depending on the task at hand
• Modeling Complex Relationships
By stacking multiple layers of neurons, the activation function
helps the neural network to learn hierarchical representations
z = w1 * a + w2 * b + bias
z = 0.5 * 2 + 0.3 * 3 + 2
z = 1 + 0.9 + 2
z = 3.9
Activation Function
f(z) = 1 / (1 + e^(-z))
f(z) = 1 / (1 + e^(-3.9))
f(z) = 1 / 1.02024
f(z) = 0.98
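The same calculation can be written as a short Python sketch (using the example
values above: w1 = 0.5, a = 2, w2 = 0.3, b = 3, bias = 2):

import math

def sigmoid(z):
    # Sigmoid activation: f(z) = 1 / (1 + e^(-z))
    return 1 / (1 + math.exp(-z))

# Weighted sum of the inputs plus the bias (values from the worked example)
w1, w2, bias = 0.5, 0.3, 2
a, b = 2, 3
z = w1 * a + w2 * b + bias

print(z)           # 3.9
print(sigmoid(z))  # ~0.98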
Gradient descent
• Gradient descent is an optimization algorithm
• It is used to iteratively find the minimum value of a function
• It is an algorithm to find the minimum of a convex function
• A convex function is a function shaped like a valley, with a single global
minimum at the bottom
• Gradient descent is also called the “steepest descent” algorithm
• It is used to minimize a cost function
Gradient Descent Optimization
• Gradient descent is a first-order iterative optimization algorithm for finding
a local minimum of a differentiable function
• The job of gradient descent is to get you from your starting point at the
top of some slope (or random location) down to the lowest valley
• It does this by moving in the direction opposite to the gradient (see the
worked example below)
• The more the cost is minimized, the better the model's predictions become
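As a small illustration (the function, starting point, and learning rate are chosen
only for this example), consider the convex function f(x) = x^2, starting at x = 4
with learning rate α = 0.1:
df/dx = 2x
At x = 4: gradient = 2 * 4 = 8
x ← x − α * gradient = 4 − 0.1 * 8 = 3.2
The update moves x in the direction opposite to the gradient, toward the minimum at x = 0.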
Learning Rate
How Gradient Descent Works
Initialize Parameters
Start by initializing your model parameters (e.g., weights and biases in a neural
network) to some random values or a specific distribution. For a simple example, let’s
denote our single parameter as x.
Compute the Cost Function
Calculate the cost (or loss) based on your current parameter values. A commonly used
cost function in simple regression problems is the Mean Squared Error (MSE). More
generally, you might have a function f(x) that outputs how “wrong” your model is.
Compute the Gradients
The gradient is the partial derivative of the cost function with respect to the
parameters. Symbolically, if f(x) is our cost function, we find df/dx. This tells us the
direction in which f(x) increases the fastest.
Update the Parameters
Adjust the parameters in the direction opposite to the gradient:
x ← x − α · df/dx
Here, α (alpha) is called the learning rate; it controls how big a step you take on each
update.
Iterate Until Convergence
Keep repeating the previous steps - recalculate the cost, find the gradients, update
parameters - until changes become negligible or you reach a preset iteration count.
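Putting these steps together, a minimal sketch in Python (the cost function
f(x) = (x − 3)^2, the starting point, and the learning rate are chosen only for
illustration):

def cost(x):
    # A simple convex cost function with its minimum at x = 3
    return (x - 3) ** 2

def gradient(x):
    # Derivative of the cost function: df/dx = 2 * (x - 3)
    return 2 * (x - 3)

x = 10.0          # initialize the parameter (random or chosen start)
alpha = 0.1       # learning rate: how big a step each update takes
tolerance = 1e-6  # stop when the change becomes negligible

for step in range(1000):      # preset iteration count
    grad = gradient(x)        # compute the gradient at the current x
    new_x = x - alpha * grad  # update in the direction opposite to the gradient
    if abs(new_x - x) < tolerance:
        break                 # convergence: the update is negligible
    x = new_x

print(x, cost(x))  # x approaches 3 and the cost approaches 0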
Types of Gradient Descent