ML - Stochastic Gradient Descent (SGD) - GeeksforGeeks
The update rule for the traditional gradient descent algorithm is:
θ = θ − η ∇θ J(θ)
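As a quick illustration (this snippet is not from the article's own code; the data and variable names are illustrative), a full-batch gradient descent loop for linear regression with a mean squared error cost can be written in NumPy as:

import numpy as np

# Toy data of the form y ≈ 4 + 3x (illustrative, not the article's dataset)
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))

X_b = np.c_[np.ones((100, 1)), X]     # prepend a column of 1s for the intercept term
theta = np.zeros((2, 1))              # parameters [intercept, slope]
eta = 0.1                             # learning rate

for _ in range(1000):
    # Gradient of the MSE cost J(theta) computed over the ENTIRE dataset
    gradients = 2 / len(X_b) * X_b.T @ (X_b @ theta - y)
    theta = theta - eta * gradients   # theta <- theta - eta * grad J(theta)

print(theta.ravel())                  # converges close to [4, 3]

Note that every single update requires a pass over all the training data, which is what SGD avoids.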
θ = θ − η ∇θ J(θ; xᵢ, yᵢ)
Where:
θ represents the model parameters,
η is the learning rate,
∇θ J(θ; xᵢ, yᵢ) is the gradient of the cost function evaluated on a single, randomly chosen training
example.
In practice, each update can also be computed over a small batch of samples (mini-batch SGD) rather than a single example or a small batch.
The key difference from traditional gradient descent is that, in SGD, the parameter updates are made based on a single data point, not the entire dataset. The random selection of data points introduces stochasticity, which can be both an advantage and a challenge.
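A bare-bones sketch of that per-sample update (the data generation and variable names here are illustrative; the article's own implementation follows below):

import numpy as np

# Illustrative toy data of the same form the article uses later
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))
X_b = np.c_[np.ones((100, 1)), X]            # rows of [1, x]

theta = np.zeros((2, 1))
eta = 0.01                                   # learning rate

for epoch in range(50):
    for i in rng.permutation(len(X_b)):      # visit the samples in a random order
        xi, yi = X_b[i:i + 1], y[i:i + 1]    # one (x_i, y_i) training example
        gradient = 2 * xi.T @ (xi @ theta - yi)   # gradient from this single example only
        theta -= eta * gradient              # theta <- theta - eta * grad J(theta; x_i, y_i)

print(theta.ravel())                         # noisy estimate, roughly [4, 3]

The article's implementation below builds this idea up step by step, starting by generating synthetic data for a simple linear regression problem.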
import numpy as np

# Generate synthetic data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
The model we fit is a simple linear regression:
y = θ0 + θ1 ⋅ x
Where:
θ0 is the intercept (the bias term),
θ1 is the slope (the coefficient on x).
Here we define the core function for Stochastic Gradient Descent (SGD).
The function takes the input data X and y. It initializes the model
parameters, performs stochastic updates for a specified number of
epochs, and records the cost at each step.
In each epoch, the data is shuffled, and for each mini-batch (or single
sample), the gradient is calculated, and the parameters are updated.
The cost is calculated as the mean squared error, and the history of the
cost is recorded to monitor convergence.
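The function body itself is not reproduced in this extract. A minimal sketch consistent with the description above (the name sgd, the parameters learning_rate, n_epochs and batch_size, and the returned theta and cost_history match the names used later; the exact implementation on the original page may differ, and this sketch records the cost once per epoch so the plot below can use epochs on the x-axis) is:

import numpy as np

def sgd(X, y, learning_rate=0.01, n_epochs=50, batch_size=1):
    # Mini-batch / stochastic gradient descent for simple linear regression.
    m = len(X)
    X_b = np.c_[np.ones((m, 1)), X]              # prepend a bias column of 1s
    theta = np.random.randn(2, 1)                # random initialization of [intercept, slope]
    cost_history = []

    for epoch in range(n_epochs):
        indices = np.random.permutation(m)       # shuffle the data each epoch
        X_shuffled, y_shuffled = X_b[indices], y[indices]

        for start in range(0, m, batch_size):
            xi = X_shuffled[start:start + batch_size]
            yi = y_shuffled[start:start + batch_size]
            gradients = 2 / len(xi) * xi.T @ (xi @ theta - yi)   # gradient on this (mini-)batch
            theta -= learning_rate * gradients

        cost = np.mean((X_b @ theta - y) ** 2)   # mean squared error on the full dataset
        cost_history.append(cost)

    return theta, cost_history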
In this step, we call the sgd() function to train the model. We specify
the learning rate, number of epochs, and batch size for SGD.
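A call along those lines, using the sketch above (the specific hyperparameter values here are illustrative):

theta, cost_history = sgd(X, y, learning_rate=0.01, n_epochs=50, batch_size=1)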
After training, we visualize how the cost function evolves over epochs.
This helps us understand if the algorithm is converging properly.
import matplotlib.pyplot as plt

# Plot the cost history
plt.plot(cost_history)
plt.xlabel('Epochs')
plt.ylabel('Cost (MSE)')
plt.title('Cost Function during Training')
plt.show()
Output: a plot titled 'Cost Function during Training', with Epochs on the x-axis and Cost (MSE) on the y-axis (figure not reproduced).
In this step, we visualize the data points and the fitted regression line
after training. We plot the data points as blue dots and the predicted
line (from the final theta) as a red line.
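A plotting snippet along those lines (assuming theta holds the fitted [intercept, slope] from the sketch above):

import numpy as np
import matplotlib.pyplot as plt

plt.scatter(X, y, color='blue', label='Data points')
X_line = np.array([[0.0], [2.0]])            # x range of the synthetic data
y_line = theta[0] + theta[1] * X_line        # predictions from the final theta
plt.plot(X_line, y_line, color='red', label='Fitted line')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression Fit via SGD')
plt.legend()
plt.show()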
Output: a scatter plot of the data points (blue dots) with the fitted regression line from the final theta drawn in red (figure not reproduced).
After training, we print the final parameters of the model, which include
the slope and intercept. These values are the result of optimizing the
model using SGD.
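For example (again assuming the theta produced by the sketch above):

print(f"Intercept (theta0): {theta[0][0]:.2f}")
print(f"Slope (theta1): {theta[1][0]:.2f}")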
Output:
θ0 = 4.3,
θ1 = 3.4
y = 4.3 + 3.4 ⋅ X
This means the model has learned an intercept of about 4.3 and a slope of about 3.4, close to the true values of 4 and 3 used to generate the synthetic data; the small gap comes from the noise in the data and the stochastic nature of the updates.