Supervised Learning
Unit-1
Introduction to Machine Learning
Machine Learning (ML) is a subfield of artificial intelligence (AI) that focuses on
developing algorithms that allow computers to learn from and make predictions
based on data. Unlike traditional programming, where explicit rules are defined for
tasks, ML enables systems to automatically improve their performance through
experience.
Key Concepts:
1. Data: The foundation of ML models. It consists of features (input variables)
and labels (output variables).
4. Inference: Using the trained model to make predictions on new, unseen data.
1. Supervised Learning
In supervised learning, models are trained on labeled datasets, meaning that each
training example is associated with an output label. The model learns to map
inputs to outputs based on this labeled data.
Input: Labeled data (features + corresponding labels).
Goal: Predict outcomes for new, unseen data based on learned relationships.
Common Algorithms:
Linear Regression
Logistic Regression
Decision Trees
Applications:
2. Unsupervised Learning
In unsupervised learning, the model is trained using unlabeled data. The goal is to
find hidden patterns or structures in the data without pre-defined labels.
Common Algorithms:
K-Means Clustering
Hierarchical Clustering
Applications:
Customer segmentation
3. Reinforcement Learning
In reinforcement learning, an agent interacts with an environment and learns to
make decisions by receiving feedback in the form of rewards or penalties. The
agent learns from the consequences of its actions to maximize cumulative reward
over time.
Common Algorithms:
Q-Learning
Applications:
Key Concepts:
1. Training Data: A subset of the dataset used to train the model.
Support Vector Machines (SVM): A classification algorithm that finds the
hyperplane that best separates classes in the feature space.
2. Data Preprocessing: Clean and prepare data for training (handling missing
values, normalization, etc.).
4. Model Evaluation: Assess the model's performance using test data and
evaluation metrics.
Regression
Regression is a type of predictive modeling technique that estimates the
relationships among variables. It is primarily used when the output variable is
continuous.
Key Concepts:
Continuous Output: The target variable can take any value within a range.
Objective: To find a function that best fits the data and can predict new
values.
2. Polynomial Regression: Extends linear regression by considering polynomial
relationships between variables.
4. Lasso Regression: Similar to Ridge but can shrink some coefficients to zero,
effectively performing variable selection.
Applications:
Predicting house prices based on features like square footage, number of
bedrooms, etc.
Classification
Classification is a predictive modeling technique used when the output variable is
categorical (i.e., it can take on a limited number of classes).
Key Concepts:
Categorical Output: The target variable can take one of a limited set of values.
Applications:
Email filtering (spam detection).
Linear Regression
Linear Regression is one of the simplest and most widely used regression
algorithms. It models the dependent variable \(y\) as a linear function of
one or more independent variables \(X\).
Model Representation:
The relationship can be represented mathematically as:
\[
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_n x_n + \epsilon
\]
Where:
\(\beta_1, \beta_2, ..., \beta_n\) are the coefficients for independent variables
\(x_1, x_2, ..., x_n\).
Assumptions:
1. Linearity: The relationship between the independent and dependent variables
is linear.
Model Training:
The parameters (\(\beta\)) are typically estimated using the Ordinary Least
Squares (OLS) method, which minimizes the sum of the squared differences
between observed and predicted values.
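For a single feature, the OLS estimates have a well-known closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below is a minimal illustration under that single-feature assumption. The function names and the toy data are hypothetical and are not taken from these notes.

```python
def fit_simple_ols(xs, ys):
    """Closed-form OLS for y = beta_0 + beta_1 * x with a single feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: sum of co-deviations of x and y divided by sum of squared x deviations.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    beta_1 = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    beta_0 = y_mean - beta_1 * x_mean
    return beta_0, beta_1


def predict(beta_0, beta_1, x):
    return beta_0 + beta_1 * x


# Hypothetical toy data generated from y = 1 + 2x with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
beta_0, beta_1 = fit_simple_ols(xs, ys)
```

With several features, the same least-squares criterion is usually solved with matrix methods rather than this scalar formula.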
Applications:
Financial forecasting.
Risk assessment.
Logistic Regression
Logistic Regression is used for binary classification problems. Unlike linear
regression, logistic regression predicts the probability that an instance belongs to
a certain class using a logistic function.
Model Representation:
The logistic regression model can be expressed as:
\[
P(y=1 | X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_n x_n)}}
\]
Where:
Assumptions:
1. Binary Outcome: The dependent variable is binary (0 or 1).
Model Training:
Logistic regression is trained using Maximum Likelihood Estimation (MLE), which
finds the parameters that maximize the likelihood of observing the given data.
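One simple way to carry out MLE for this model is gradient ascent on the log-likelihood. The sketch below assumes that approach; practical libraries typically use faster solvers. The learning rate, step count, function names, and toy data are all hypothetical choices made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(beta, x):
    """P(y=1 | x) for beta = [beta_0, beta_1, ..., beta_n]."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return sigmoid(z)


def log_likelihood(beta, X, y):
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(beta, xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


def fit_logistic(X, y, lr=0.05, steps=1000):
    """Gradient ascent on the log-likelihood (assumed optimizer for this sketch)."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(beta, xi)  # residual on the probability scale
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta


# Hypothetical toy data: larger x tends to mean class 1, with some overlap.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 1, 0, 1, 1]
beta = fit_logistic(X, y)
```

The classes overlap on purpose: with perfectly separable data the likelihood has no finite maximizer and the coefficients would keep growing.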
Applications:
Disease prediction based on risk factors.
Model Evaluation Metrics
Evaluating the performance of regression and classification models is crucial for
understanding their effectiveness. Different metrics are used based on the type of
task.
Regression Metrics:
1. Mean Absolute Error (MAE):
\[
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
\]
3. R-squared (\(R^2\)):
\[
R^2 = 1 - \frac{\text{SS}_{\text{res}}}{\text{SS}_{\text{tot}}}
\]
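The MAE and R-squared formulas can be computed directly from paired lists of observed and predicted values. The sketch below takes SS_res as the sum of squared residuals and SS_tot as the sum of squared deviations of the observed values from their mean. These are the usual definitions, which the notes do not spell out. The function names and sample values are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    n = len(y_true)
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / n


def r_squared(y_true, y_pred):
    y_mean = sum(y_true) / len(y_true)
    # SS_res: squared error of the model; SS_tot: squared error of predicting the mean.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0 + 0.5 + 0) / 4 = 0.25
r2 = r_squared(y_true, y_pred)             # 1 - 0.5 / 20 = 0.975
```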
Classification Metrics:
1. Accuracy:
\[
\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} +
\text{FP} + \text{FN}}
\]
Measures the proportion of correct predictions.
2. Precision:
\[
\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}
\]
3. Recall (Sensitivity):
\[
\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}
\]
4. F1 Score:
\[
F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} +
\text{Recall}}
\]
5. Receiver Operating Characteristic (ROC) Curve and Area Under the Curve
(AUC):
The ROC curve plots the true positive rate against the false positive rate
as the classification threshold varies.
The AUC quantifies the overall ability of the model to discriminate between
positive and negative classes.
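Accuracy, precision, recall, and the F1 score all derive from the four confusion-matrix counts. The sketch below assumes binary labels encoded as 0 and 1. Returning 0.0 when a denominator is empty is a convention chosen for this illustration. The function names and sample labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels encoded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (assumed convention: report 0.0).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 3 true positives, 1 false negative,
# 1 false positive, 5 true negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```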
Unit-2
Deep Learning (DL) is a subset of machine learning that focuses on algorithms
inspired by the structure and function of the brain, known as artificial neural
networks (ANNs). It excels at handling large datasets and complex tasks, such as
image and speech recognition, natural language processing, and game playing.
1. Introduction to Deep Learning
Definition
Deep Learning involves training artificial neural networks with many layers (hence
"deep") to learn representations of data. This approach enables models to
automatically learn features from raw data, reducing the need for manual feature
extraction.
Importance
Feature Learning: Automatically discovers patterns and features from raw
data.
Scalability: Performs well with large datasets and can leverage parallel
processing capabilities of GPUs.
Hidden Layers: One or more layers where computations occur. Each layer can
have multiple neurons.
Neuron Model
Each neuron performs the following operations:
1. Weighted Sum: Each input \( x_i \) is multiplied by a weight \( w_i \), and a bias
\( b \) is added:
\[
z = \sum_{i=1}^{n} w_i x_i + b
\]
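The weighted sum is a dot product between the inputs and the weights, plus the bias. A minimal sketch follows; the function name and the numbers are hypothetical.

```python
def weighted_sum(inputs, weights, bias):
    """Compute z = sum_i w_i * x_i + b for one neuron."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Hypothetical inputs, weights, and bias for a three-input neuron.
z = weighted_sum([1.0, 2.0, 3.0], [0.5, -1.0, 0.25], 0.1)
# 0.5 - 2.0 + 0.75 + 0.1 = -0.65 (up to floating-point rounding)
```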
3. Recurrent Neural Networks (RNNs): Designed for sequential data (e.g., time
series or text).
3. Activation Functions
Activation functions determine the output of a neuron given an input or set of
inputs. They introduce non-linearity into the model, allowing it to learn complex
patterns.
1. Sigmoid
Formula:
\[
f(x) = \frac{1}{1 + e^{-x}}
\]
Range: (0, 1)
2. Tanh
Formula:
\[
f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
\]
Range: (-1, 1)
3. ReLU
Formula:
\[
f(x) = \max(0, x)
\]
Range: [0, ∞)
Use: Widely used in hidden layers due to its simplicity and effectiveness.
4. Leaky ReLU
Formula:
\[
f(x) = \begin{cases}
x & \text{if } x > 0 \\
\alpha x & \text{if } x \leq 0
\end{cases}
\]
5. Softmax
Formula:
\[
f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
\]
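The activation formulas in this section (the logistic sigmoid, the hyperbolic tangent, ReLU, Leaky ReLU, and softmax) translate almost line for line into code. The sketch below is illustrative only. The default alpha of 0.01 is an assumed value, and subtracting the maximum inside softmax is a common numerical-stability step that does not change the result and is not part of the formula in these notes.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    # Written out from the formula; math.tanh is the safer choice for large |x|.
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def relu(x):
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    # alpha is the small slope used for non-positive inputs (assumed default).
    return x if x > 0 else alpha * x


def softmax(xs):
    # Shifting by the maximum avoids overflow and leaves the output unchanged.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])  # outputs are positive and sum to 1
```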
4. Loss Functions
Loss functions quantify the difference between the predicted output and the
actual output, guiding the optimization of the model.
\[
\text{Loss} = \sum_{i} \max(0, 1 - y_i \cdot \hat{y}_i)
\]
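The formula above has the form commonly called the hinge loss. The sketch below assumes the usual setup for it: labels encoded as -1 or +1 and raw (unthresholded) model scores. That encoding is an assumption, since the notes do not state it here. The function name and sample values are hypothetical.

```python
def hinge_loss(y_true, scores):
    """Sum of max(0, 1 - y * score) over all examples.

    Assumes y is -1 or +1 and scores are raw model outputs.
    """
    return sum(max(0.0, 1.0 - y * s) for y, s in zip(y_true, scores))


# Hypothetical labels and scores.
y_true = [1, -1, 1, -1]
scores = [2.0, -0.5, 0.3, 1.0]
loss = hinge_loss(y_true, scores)
# Per-example terms: 0 (confidently correct), 0.5, 0.7, 2.0 (wrong side)
```

Examples that are correct with a margin of at least 1 contribute nothing, which is why the first term is zero.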
5. Optimization Algorithms
Optimization algorithms adjust the weights of the network to minimize the loss
function during training. The goal is to find the set of weights that minimizes the
loss.
Formula:
\[
w := w - \eta \nabla L(w)
\]
Where \( \eta \) is the learning rate and \( \nabla L(w) \) is the gradient of
the loss function.
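The update rule can be tried on a one-dimensional loss whose minimum is known in advance. The sketch below uses the hypothetical loss L(w) = (w - 3)^2, whose gradient is 2(w - 3); the learning rate and step count are arbitrary choices for illustration.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly apply w := w - lr * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


# Hypothetical loss L(w) = (w - 3)^2 with gradient 2 * (w - 3); minimum at w = 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

With this learning rate, each step shrinks the distance to the minimum by a factor of 0.8. A learning rate above 1.0 would make the iterates diverge on this loss.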
Formula:
\[
v_t = \beta v_{t-1} + (1 - \beta) \nabla L(w)
\]
\[
w := w - \eta v_t
\]
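The two equations above keep a running average of gradients in v_t and step along it. A minimal sketch on the hypothetical loss L(w) = (w - 3)^2 follows; the values of lr, beta, and steps are assumed for illustration.

```python
def momentum_descent(grad, w0, lr=0.1, beta=0.9, steps=500):
    """Gradient descent with an exponentially averaged gradient (velocity)."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = beta * v + (1.0 - beta) * grad(w)  # v_t from the first equation
        w = w - lr * v                         # weight update from the second
    return w


# Hypothetical loss L(w) = (w - 3)^2 with gradient 2 * (w - 3).
w_final = momentum_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```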
4. Adagrad: Adapts the learning rate for each parameter based on its gradients.
Formula:
\[
w_t = w_{t-1} - \frac{\eta}{\sqrt{G_t + \epsilon}} \nabla L(w)
\]
Where \( G_t \) is the sum of the squares of the gradients.
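A small sketch of the Adagrad update for a single parameter, following the formula above with epsilon inside the square root. The loss L(w) = (w - 3)^2 and the values of lr, eps, and steps are hypothetical.

```python
import math


def adagrad(grad, w0, lr=0.5, eps=1e-8, steps=500):
    """Scale each step by the root of the accumulated squared gradients G_t."""
    w, g_sum = w0, 0.0
    for _ in range(steps):
        g = grad(w)
        g_sum += g * g  # G_t: running sum of squared gradients
        w = w - lr / math.sqrt(g_sum + eps) * g
    return w


# Hypothetical loss L(w) = (w - 3)^2 with gradient 2 * (w - 3).
w_final = adagrad(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

Because G_t never decreases, the effective learning rate can only shrink as training continues.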
5. RMSprop
Formula:
\[
w_t = w_{t-1} - \frac{\eta}{\sqrt{E[g^2]_t} + \epsilon} \nabla L(w)
\]
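The update above, with the running average E[g^2]_t in the denominator, matches the rule commonly known as RMSprop. The sketch below assumes the usual exponential moving average E[g^2]_t = rho * E[g^2]_{t-1} + (1 - rho) * g^2. The decay rate rho is not given in these notes, so its value here, like lr, eps, and the toy loss, is an assumption.

```python
import math


def rmsprop(grad, w0, lr=0.01, rho=0.9, eps=1e-8, steps=1000):
    """Scale each step by the root of a moving average of squared gradients."""
    w, avg_sq = w0, 0.0
    for _ in range(steps):
        g = grad(w)
        avg_sq = rho * avg_sq + (1.0 - rho) * g * g  # E[g^2]_t (assumed form)
        w = w - lr / (math.sqrt(avg_sq) + eps) * g
    return w


# Hypothetical loss L(w) = (w - 3)^2 with gradient 2 * (w - 3).
w_final = rmsprop(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

With a constant learning rate, the iterates settle near the minimum rather than converging to it exactly, so any check on the result should allow a small tolerance.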
6. Adam
Formula:
\[
m_t = \beta_1 m_{t-1} + (1 - \beta_1) \nabla L(w)
\]
\[
v_t = \beta_2 v_{t-1} + (1 - \beta_2) (\nabla L(w))^2
\]
\[
w_t = w_{t-1} - \frac{\eta m_t}{\sqrt{v_t} + \epsilon}
\]
Adam: A common default choice, since it combines per-parameter adaptive
learning rates with momentum.
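The three update equations for m_t, v_t, and w_t above can be sketched for a single parameter as follows. This follows the simplified form given in these notes; the standard Adam algorithm additionally applies bias-correction terms to m_t and v_t, which are omitted here to match the formulas. The hyperparameter values and the toy loss are assumptions for illustration.

```python
import math


def adam_simplified(grad, w0, lr=0.01, beta1=0.9, beta2=0.999,
                    eps=1e-8, steps=2000):
    """Adam-style update without bias correction, as written in the notes."""
    w, m, v = w0, 0.0, 0.0
    for _ in range(steps):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g        # first-moment estimate m_t
        v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate v_t
        w = w - lr * m / (math.sqrt(v) + eps)
    return w


# Hypothetical loss L(w) = (w - 3)^2 with gradient 2 * (w - 3).
w_final = adam_simplified(lambda w: 2.0 * (w - 3.0), w0=0.0)
```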
6. Backpropagation Algorithm
The backpropagation algorithm is a supervised learning method used for training
neural networks. It computes the gradient of the loss function with respect to each
weight by applying the chain rule, allowing for efficient weight updates.
1. Forward Pass:
Compute the output of the network for a given input by passing it through
the layers.
2. Backward Pass:
Compute the gradient of the loss with respect to the output layer.
3. Weight Update:
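As a rough illustration of the forward pass, backward pass, and weight update together, the sketch below uses a deliberately tiny network: one input, one sigmoid hidden unit with weight w1, and a linear output with weight w2, trained on a squared-error loss. The architecture, loss, learning rate, and data point are all hypothetical choices made to keep the chain rule visible; they are not taken from these notes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x):
    """Forward pass: x -> h = sigmoid(w1 * x) -> y_hat = w2 * h."""
    h = sigmoid(w1 * x)
    y_hat = w2 * h
    return h, y_hat


def loss(w1, w2, x, y):
    _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2


def backward(w1, w2, x, y):
    """Backward pass: apply the chain rule from the loss back to each weight."""
    h, y_hat = forward(w1, w2, x)
    dL_dyhat = y_hat - y               # derivative of 0.5 * (y_hat - y)^2
    dL_dw2 = dL_dyhat * h              # y_hat = w2 * h
    dL_dh = dL_dyhat * w2
    dL_dz1 = dL_dh * h * (1.0 - h)     # sigmoid'(z) = h * (1 - h)
    dL_dw1 = dL_dz1 * x                # z1 = w1 * x
    return dL_dw1, dL_dw2


def train_step(w1, w2, x, y, lr=0.1):
    """Weight update: move each weight against its gradient."""
    g1, g2 = backward(w1, w2, x, y)
    return w1 - lr * g1, w2 - lr * g2


# Hypothetical single training example and starting weights.
x, y = 1.0, 0.5
w1_init, w2_init = 0.3, -0.2
grad_w1, grad_w2 = backward(w1_init, w2_init, x, y)

w1, w2 = w1_init, w2_init
loss_before = loss(w1, w2, x, y)
for _ in range(200):
    w1, w2 = train_step(w1, w2, x, y)
loss_after = loss(w1, w2, x, y)
```

A finite-difference comparison against `loss` is a simple way to check that the hand-derived gradients in `backward` are correct.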
Challenges
Vanishing Gradients: Occurs when gradients are very small, making it difficult
to update weights effectively in deep networks.
7. Regularization Techniques
Regularization techniques are used to prevent overfitting in neural networks by
adding constraints to the model training process. They help the model generalize
better to unseen data.