Gradient Descent
Gradient Based Optimization
The goal of optimization in machine learning is to find the parameters of the model
that minimize (or maximize) a loss function. The loss function quantifies how well
the model's predictions match the actual data.
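For example, one common loss for regression is the mean squared error. A minimal sketch in NumPy (the function name mse_loss and the example arrays are placeholders for illustration, not part of any particular library):

import numpy as np

def mse_loss(predictions, targets):
    # Mean squared error: average squared difference between predictions and targets
    return np.mean((predictions - targets) ** 2)

# Predictions close to the targets give a small loss
print(mse_loss(np.array([0.9, 2.1]), np.array([1.0, 2.0])))  # 0.01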
Gradient descent is a widely used optimization algorithm for minimizing the loss
function. At each iteration:
● The model parameters are updated in the direction that reduces the loss, using
the gradient information:
θ ← θ − η∇L(θ)
Here, η (eta) is the learning rate, a hyperparameter that controls the size of the
step taken in the direction of the negative gradient.
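A minimal sketch of this update rule in NumPy (theta, eta, and grad_loss are illustrative names; grad_loss stands for whatever routine computes ∇L(θ)):

import numpy as np

def gradient_descent_step(theta, grad_loss, eta=0.1):
    # One update: theta <- theta - eta * gradient of the loss at theta
    return theta - eta * grad_loss(theta)

# Example: minimize L(theta) = theta^2, whose gradient is 2 * theta
theta = np.array([5.0])
for _ in range(50):
    theta = gradient_descent_step(theta, lambda t: 2 * t, eta=0.1)
print(theta)  # approaches 0, the minimizer of theta^2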
Gradient Descent
Gradient descent is a standard optimization algorithm, and it is frequently the first optimization
algorithm introduced for training machine learning models. A gradient quantifies the steepness of a
line or curve; mathematically, it gives the direction of steepest ascent, and its negative gives the
direction of steepest descent.
Descent means moving downwards, so the gradient descent algorithm repeatedly moves the
parameters downhill along the negative gradient. The following example applies gradient descent to
fit a sigmoid neuron with parameters w and b to two data points:
import numpy as np

X = [0.5, 2.5]  # toy training inputs
Y = [0.2, 0.9]  # toy training targets

def f(w, b, x):  # sigmoid neuron
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def error(w, b):  # squared-error loss summed over the data set
    err = 0.0
    for x, y in zip(X, Y):
        fx = f(w, b, x)
        err += 0.5 * (fx - y) ** 2
    return err

def grad_b(w, b, x, y):  # dL/db for one data point
    fx = f(w, b, x)
    return (fx - y) * fx * (1 - fx)

def grad_w(w, b, x, y):  # dL/dw for one data point
    fx = f(w, b, x)
    return (fx - y) * fx * (1 - fx) * x

def gradient_descent():
    w, b, eta, epochs = -0.1, -2.0, 1.0, 1000
    for i in range(epochs):
        # Accumulate the gradient over all data points (batch gradient descent)
        dw, db = 0, 0
        for x, y in zip(X, Y):
            dw += grad_w(w, b, x, y)
            db += grad_b(w, b, x, y)
        # Step in the direction of the negative gradient
        w = w - eta * dw
        b = b - eta * db
        print(error(w, b))
    print(w, b)

gradient_descent()
Types of Gradient Descent
The common variants differ in how much data is used per update: batch gradient descent uses the
full data set (as in the code above), stochastic gradient descent uses a single example, and
mini-batch gradient descent uses a small subset.
Challenges with gradient descent
While gradient descent is the most common approach for optimization problems, it
does come with its own set of challenges. Some of them include:
Slowing Down Convergence: Gradient descent can get stuck at a saddle point
because the gradient there is zero, so the algorithm has no clear direction in
which to make progress. This can slow down convergence and make the
optimization process more challenging, especially in high-dimensional spaces.
Saddle points
Difficulty arises from saddle points, i.e., points where one
dimension slopes up and another slopes down.
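As a small illustration (the function f(x, y) = x² − y² is just a textbook example of a saddle): at the origin the gradient is exactly zero, yet the point is neither a minimum nor a maximum, so a plain gradient descent update makes no progress there.

import numpy as np

def f(x, y):
    # Classic saddle surface: curves up along x, down along y
    return x**2 - y**2

def grad_f(x, y):
    # Analytic gradient: (df/dx, df/dy) = (2x, -2y)
    return np.array([2 * x, -2 * y])

print(grad_f(0.0, 0.0))          # [ 0. -0.]  -> zero gradient, the update stalls
print(f(0.1, 0.0), f(0.0, 0.1))  # 0.01 and -0.01: up in one direction, down in the other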
Adaptive Learning Rate Methods: Algorithms like Adam and RMSprop adjust the
learning rate based on past gradients to handle flat regions and local minima
better.
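A rough sketch of the idea behind RMSprop-style scaling (the hyperparameter values and the grad_loss placeholder are assumptions for illustration, not a reference implementation from any library):

import numpy as np

def rmsprop_step(theta, grad_loss, v, eta=0.01, beta=0.9, eps=1e-8):
    # Keep an exponentially decaying average of squared gradients, then scale
    # the step so directions with persistently large gradients take smaller
    # steps while flat directions keep making progress.
    g = grad_loss(theta)
    v = beta * v + (1 - beta) * g**2
    theta = theta - eta * g / (np.sqrt(v) + eps)
    return theta, v

# Example on L(theta) = theta^2 (gradient 2 * theta)
theta, v = np.array([5.0]), np.zeros(1)
for _ in range(200):
    theta, v = rmsprop_step(theta, lambda t: 2 * t, v)
print(theta)  # decreases steadily toward 0 even as the raw gradient shrinks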