
I. K. GUJRAL PUNJAB TECHNICAL UNIVERSITY, JALANDHAR

ASSIGNMENT - 01

Submitted By:
Name: Rohit Kumar
Class: B.Tech. CSE (D)
University Roll Number: 2224525
Semester: 6th
Course Code: BTCS 618-18
Course Title: Machine Learning

Submitted To:
Ms. Kavita Bains
Random Forest Algorithm in Machine Learning
A Random Forest is a collection of decision trees that work together to make predictions. In
this article, we'll explain how the Random Forest algorithm works and how to use it.
Understanding Intuition for Random Forest Algorithm
The Random Forest algorithm is a powerful tree-based learning technique in Machine Learning: it trains many decision trees and then combines their outputs by voting (or averaging) to make a prediction. Random Forests are widely used for both classification and regression tasks.
 It is a type of ensemble model that uses many decision trees to make predictions.
 It trains each tree on a different random part of the dataset and then combines the results by voting (for classification) or averaging (for regression). This approach helps improve the accuracy of predictions. Random Forest is based on ensemble learning.

Imagine asking a group of friends for advice on where to go for vacation. Each friend gives
their recommendation based on their unique perspective and preferences (decision trees
trained on different subsets of data). You then make your final decision by considering the
majority opinion or averaging their suggestions (ensemble prediction).
As shown in the figure, the process starts with a dataset whose rows are samples and whose columns include the class labels.
 Then - Multiple Decision Trees are created from the training data. Each tree is trained
on a random subset of the data (with replacement) and a random subset of features.
This process is known as bagging or bootstrap aggregating.
 Each Decision Tree in the ensemble learns to make predictions independently.
 When presented with a new, unseen instance, each Decision Tree in the ensemble
makes a prediction.
The final prediction is made by combining the predictions of all the Decision Trees. This is
typically done through a majority vote (for classification) or averaging (for regression).
Key Features of Random Forest
 Handles Missing Data: Automatically handles missing values during training,
eliminating the need for manual imputation.
 Feature Importance: The algorithm ranks features based on their importance in making predictions, offering valuable insights for feature selection and interpretability (a short sketch follows this list).
 Scales Well: It handles large and complex datasets without significant performance degradation.
 Versatile: It can be applied to both classification tasks (e.g., predicting categories) and regression tasks (e.g., predicting continuous values).
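As an illustration of the feature-importance ranking mentioned in the list above, here is a minimal sketch; the synthetic dataset and parameter values are assumptions chosen only for demonstration:

# Minimal sketch: ranking features by importance with a Random Forest.
# The synthetic dataset and parameters below are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=42)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)

# feature_importances_ gives one score per feature; higher means more useful
for idx, score in sorted(enumerate(rf.feature_importances_),
                         key=lambda t: t[1], reverse=True):
    print(f"Feature {idx}: importance = {score:.3f}")

In scikit-learn these importance scores are normalized to sum to 1, so they can be read as relative contributions of each feature.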

How Does the Random Forest Algorithm Work?

The Random Forest algorithm works in several steps:
 Random Forest builds multiple decision trees using random samples of the data. Each
tree is trained on a different subset of the data which makes each tree unique.
 When creating each tree, the algorithm randomly selects a subset of features to split the data rather than using all available features. This adds diversity to the trees.
 Each decision tree in the forest makes a prediction based on the data it was trained on. When making the final prediction, Random Forest combines the results from all the trees.
o For classification tasks, the final prediction is decided by a majority vote: the category predicted by most trees is the final prediction.
o For regression tasks, the final prediction is the average of the predictions from all the trees.
 The randomness in data samples and feature selection helps prevent the model from overfitting, making the predictions more accurate and reliable (a minimal sketch of this bagging-and-voting process follows this list).
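To make the bagging-and-voting process above concrete, here is a minimal hand-rolled sketch; it is not scikit-learn's internal implementation, and the dataset, tree count, and settings are assumptions for illustration:

# Minimal sketch of bagging + majority voting (illustrative, not sklearn's internals).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rng = np.random.default_rng(42)
trees = []
for _ in range(5):  # a small, assumed number of trees
    # Bootstrap sample: draw rows with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Majority vote across the trees' predictions
all_preds = np.array([t.predict(X_test) for t in trees])
majority = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, all_preds)
print("Ensemble accuracy:", (majority == y_test).mean())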
Assumptions of Random Forest
 Each tree makes its own decisions: Every tree in the forest makes its own
predictions without relying on others.
 Random parts of the data are used: Each tree is built using random samples and
features to reduce mistakes.
 Enough data is needed: Sufficient data ensures that the trees are diverse and learn different patterns.
 Different predictions improve accuracy: Combining the predictions from different trees leads to a more accurate final result.
Now that we've understood the concept behind the algorithm, let's try implementing it:

Implementing Random Forest for Classification Tasks


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import warnings
warnings.filterwarnings('ignore')
# Load the Titanic dataset from a public URL
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
titanic_data = pd.read_csv(url)
# Drop rows with missing 'Survived' values
titanic_data = titanic_data.dropna(subset=['Survived'])
# Features and target variable
X = titanic_data[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']].copy()
y = titanic_data['Survived']
# Encode the 'Sex' column as numeric (female = 0, male = 1)
X['Sex'] = X['Sex'].map({'female': 0, 'male': 1})
# Fill missing 'Age' values with the median (chained inplace fillna does not reliably modify X)
X['Age'] = X['Age'].fillna(X['Age'].median())
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize RandomForestClassifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
# Fit the classifier to the training data
rf_classifier.fit(X_train, y_train)
# Make predictions
y_pred = rf_classifier.predict(X_test)
# Calculate accuracy and classification report
accuracy = accuracy_score(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)
# Print the results
print(f"Accuracy: {accuracy:.2f}")
print("\nClassification Report:\n", classification_rep)
# Sample prediction
sample = X_test.iloc[0:1] # Keep as DataFrame to match model input format
prediction = rf_classifier.predict(sample)
# Retrieve and display the sample
sample_dict = sample.iloc[0].to_dict()
print(f"\nSample Passenger: {sample_dict}")
print(f"Predicted Survival: {'Survived' if prediction[0] == 1 else 'Did Not Survive'}")

Output:
Accuracy: 0.80
Classification Report:
              precision    recall  f1-score   support

           0       0.82      0.85      0.83       105
           1       0.77      0.73      0.75        74

    accuracy                           0.80       179
   macro avg       0.79      0.79      0.79       179
weighted avg       0.80      0.80      0.80       179

Sample Passenger: {'Pclass': 3, 'Sex': 1, 'Age': 28.0, 'SibSp': 1, 'Parch': 1, 'Fare': 15.2458}
Predicted Survival: Did Not Survive
Implementing Random Forest for Regression Tasks
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
# Load the California housing dataset
california_housing = fetch_california_housing()
california_data = pd.DataFrame(california_housing.data,
columns=california_housing.feature_names)
california_data['MEDV'] = california_housing.target
# Features and target variable
X = california_data.drop('MEDV', axis=1)
y = california_data['MEDV']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the RandomForestRegressor
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
# Train the regressor
rf_regressor.fit(X_train, y_train)
# Make predictions
y_pred = rf_regressor.predict(X_test)
# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
# Sample prediction (keep as a DataFrame row to preserve feature names)
single_data = X_test.iloc[[0]]
predicted_value = rf_regressor.predict(single_data)
print(f"Predicted Value: {predicted_value[0]:.2f}")
print(f"Actual Value: {y_test.iloc[0]:.2f}")

# Print results
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared Score: {r2:.2f}")
Output:
Predicted Value: 0.51
Actual Value: 0.48
Mean Squared Error: 0.26
R-squared Score: 0.80
Random Forest learns from the training data much like a real estate expert. After training, it predicts house prices on the test set. We evaluate the model's performance using Mean Squared Error and the R-squared score, which show how accurate the predictions are, and we use a single random sample to check an individual prediction.
Advantages of Random Forest
 Random Forest provides very accurate predictions even with large datasets.
 Random Forest can handle missing data well without compromising accuracy.
 It doesn’t require normalization or standardization of the dataset.
 Combining multiple decision trees reduces the risk of overfitting.
Limitations of Random Forest
 It can be computationally expensive especially with a large number of trees.
 It’s harder to interpret the model compared to simpler models like decision trees.
Naïve Bayes Classifier
This algorithm is called Naïve because it works on the naïve assumption that the features are independent. The Naïve Bayes Classifier works on the principle of Bayes' Theorem. Bayes' theorem is one of the most fundamental concepts in the field of analytics and has a wide range of applications; it often plays a crucial role in decision-making processes. Let's consider two events A and B. The associated conditional probabilities are given by:

P(A|B) = P(A ∩ B) / P(B)        P(B|A) = P(A ∩ B) / P(A)

The conditional probability of an event A given B, P(A|B), is the probability of A given that B has already occurred. It is defined as the ratio of the joint probability of A and B (the probability of A and B occurring together) to the marginal probability of B (the probability of event B).
Using the above two equations, we can show that

P(B|A) = P(A|B) · P(B) / P(A)

This equation is Bayes' Theorem. So for an event B we can update the associated probability when additional information is provided (here A is the additional information).
Key terms in Bayes’ Theorem
1. Prior probability (P(B), P(A)) — The probability value without any additional
information
2. Posterior probability (P(B|A))- The probability of event B given the additional
information A
3. P(A|B)- The likelihood of observing A if B is true
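As a tiny worked example (all numbers are made up purely for illustration), suppose the prior P(B) = 0.01, the likelihood P(A|B) = 0.9, and the evidence P(A) = 0.05; Bayes' theorem then updates the probability of B after observing A:

# Worked Bayes' theorem example with assumed, illustrative numbers.
p_b = 0.01         # prior probability of B
p_a_given_b = 0.9  # likelihood of observing A when B is true
p_a = 0.05         # marginal (evidence) probability of A

p_b_given_a = p_a_given_b * p_b / p_a  # posterior P(B|A)
print(f"P(B|A) = {p_b_given_a:.2f}")   # 0.18: observing A raises P(B) from 1% to 18%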
There is an interesting game related to Bayes theorem. Interested in games? Go ahead and
read about the famous Monty Hall Problem.
When it comes to a classification problem, Bayes' theorem can be reinterpreted as:

P(c | y1, y2, …, yn) = P(y1|c) · P(y2|c) · … · P(yn|c) · P(c) / ( P(y1) · P(y2) · … · P(yn) )

where c represents the class the data belongs to and y1, y2, …, yn represent the predictor features. Looking at the probability terms, there can be cases where a feature value never appears in the training data for a class, so its estimated probability is 0 and the whole product collapses to zero. To tackle this issue, the counts are increased by a small value (1) so that no probability turns zero. This adjustment is called the Laplace Correction.
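In scikit-learn's count-based Naïve Bayes variants, this correction is exposed through the alpha parameter (alpha=1.0 corresponds to add-one/Laplace smoothing); the tiny categorical dataset below is an assumption for illustration:

# Sketch: Laplace correction via the alpha parameter of a count-based Naive Bayes.
# The tiny categorical dataset is made up for illustration.
import numpy as np
from sklearn.naive_bayes import CategoricalNB

X = np.array([[0, 1], [1, 1], [0, 0], [1, 0]])  # two categorical features
y = np.array([0, 0, 1, 1])

clf = CategoricalNB(alpha=1.0)  # alpha=1.0 -> add-one (Laplace) smoothing
clf.fit(X, y)
print(clf.predict([[0, 1]]))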
Steps for Naïve Bayes Classification
1. Calculate the Prior probabilities of the classes involved
2. Calculate the likelihood of evidence with each feature for each class
3. Calculate the Posterior probability using Bayes rule
4. The class with the highest posterior probability is selected as the prediction
When the feature x is categorical in nature, it is easy to calculate the associated probabilities from frequency counts. When the feature x is continuous, we assume that the variable x is normally distributed within each class (Gaussian Naïve Bayes). The probability value is then given by

P(x | c) = (1 / sqrt(2π σc²)) · exp( −(x − μc)² / (2 σc²) )

where μc and σc² are the mean and variance of the feature for class c.
Pros
1. Easy to implement
2. Performs reasonably well with noisy data
Cons
1. Poor performance with continuous features
2. Assumption that features are independent is risky
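A minimal end-to-end Gaussian Naïve Bayes sketch, assuming the Iris dataset purely for demonstration:

# Sketch: Gaussian Naive Bayes on a continuous-feature dataset (Iris, assumed for demo).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

gnb = GaussianNB()  # assumes each feature is normally distributed within each class
gnb.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, gnb.predict(X_test)))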
KNN Classifier
The K-Nearest Neighbors algorithm can be used to solve both classification and regression problems. While algorithms such as the Naïve Bayes Classifier use probabilities learned from training samples to make predictions, KNN is a lazy learner that does not build any model in advance. It simply finds the closest training samples based on feature similarity.
Similarity Measures
The popular similarity measurement metrics are Distance measures. There are several
distance measures available.
1. Euclidean Distance
This is the most commonly used distance measure. For two points x = (x1, x2) and y = (y1, y2), the Euclidean distance is given by:

d(x, y) = sqrt( (x1 − y1)² + (x2 − y2)² )
2. Manhattan Distance
Also known as the city block or absolute distance, it is inspired by the grid-like street layout of Manhattan. For two points x = (x1, x2) and y = (y1, y2), the Manhattan distance is given by:

d(x, y) = |x1 − y1| + |x2 − y2|
3. Chebyshev Distance
Also known as the chessboard or maximum value distance. For two points x = (x1, x2) and y = (y1, y2), the Chebyshev distance is given by:

d(x, y) = max( |x1 − y1|, |x2 − y2| )
4. Minkowski Distance
This is a generalized distance measure. All the above-mentioned distances can be obtained from the generalized formula:

d(x, y) = ( |x1 − y1|^c + |x2 − y2|^c )^(1/c)

When c = 1, Minkowski = Manhattan
When c = 2, Minkowski = Euclidean
When c → ∞, Minkowski = Chebyshev
5. Mahalanobis Distance
To calculate the distance between two points in multivariate space, we use the Mahalanobis distance, which accounts for correlations between features. It is given by:

d(x, y) = sqrt( (x − y)^T C (x − y) )

Here x and y are vectors from the same distribution in multivariate space, and C is the inverse of the covariance matrix of that distribution (a sketch computing these distance measures follows this list).
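The following sketch computes these distance measures for two assumed 2-D points using SciPy's distance helpers and confirms the Minkowski relationships noted above:

# Sketch: common KNN distance measures on two assumed 2-D points.
from scipy.spatial import distance

x, y = (1.0, 2.0), (4.0, 6.0)

print("Euclidean :", distance.euclidean(x, y))           # 5.0
print("Manhattan :", distance.cityblock(x, y))           # 7.0
print("Chebyshev :", distance.chebyshev(x, y))           # 4.0
print("Minkowski c=1:", distance.minkowski(x, y, p=1))   # equals Manhattan
print("Minkowski c=2:", distance.minkowski(x, y, p=2))   # equals Euclidean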
Steps
1. Find the K neighbors that are closest to the chosen data point based on the similarity measure used.
2. Using majority voting among the K neighbors, identify which class the data point belongs to.

Choosing the optimal value of K


There can be two scenarios:
1. Small K: higher influence of noise on the results
2. Large K: computationally expensive
So, some articles suggest choosing K as sqrt(N)/2, where N is the number of data points.
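A minimal KNN classification sketch, assuming the Iris dataset and using the sqrt(N)/2 rule of thumb mentioned above to choose K:

# Sketch: KNN classification with K chosen via the sqrt(N)/2 rule of thumb.
# Dataset (Iris) and split parameters are assumptions for illustration.
import math
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

k = max(1, int(math.sqrt(len(X_train)) / 2))  # rule of thumb: K ≈ sqrt(N)/2
knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
knn.fit(X_train, y_train)
print(f"K = {k}, test accuracy = {knn.score(X_test, y_test):.2f}")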
Pros
1. Easy to implement
2. No assumptions involved
Cons
1. Optimal K is always a challenge
2. Lazy learner- computationally expensive

Support Vector Machine (SVM) Algorithm
Support Vector Machine (SVM) is a supervised machine learning algorithm used for
classification and regression tasks. While it can handle regression problems, SVM is
particularly well-suited for classification tasks.
SVM aims to find the optimal hyperplane in an N-dimensional space to separate data
points into different classes. The algorithm maximizes the margin between the closest
points of different classes.
Support Vector Machine (SVM) Terminology
 Hyperplane: A decision boundary separating different classes in feature space,
represented by the equation wx + b = 0 in linear classification.
 Support Vectors: The closest data points to the hyperplane, crucial for determining
the hyperplane and margin in SVM.
 Margin: The distance between the hyperplane and the support vectors. SVM aims to
maximize this margin for better classification performance.
 Kernel: A function that maps data to a higher-dimensional space, enabling SVM to
handle non-linearly separable data.
 Hard Margin: A maximum-margin hyperplane that perfectly separates the data
without misclassifications.
 Soft Margin: Allows some misclassifications by introducing slack variables,
balancing margin maximization and misclassification penalties when data is not
perfectly separable.
 C: A regularization term balancing margin maximization and misclassification
penalties. A higher C value enforces a stricter penalty for misclassifications.
 Hinge Loss: A loss function penalizing misclassified points or margin violations,
combined with regularization in SVM.
 Dual Problem: Involves solving for Lagrange multipliers associated with support
vectors, facilitating the kernel trick and efficient computation.
How does Support Vector Machine Algorithm Work?
The key idea behind the SVM algorithm is to find the hyperplane that best separates
two classes by maximizing the margin between them. This margin is the distance
from the hyperplane to the nearest data points (support vectors) on each side.
Multiple hyperplanes separate the data from two classes
The best hyperplane, also known as the “hard margin,” is the one that maximizes the
distance between the hyperplane and the nearest data points from both classes. This
ensures a clear separation between the classes. So, from the above figure, we choose
L2 as hard margin.
Let’s consider a scenario like shown below:

Selecting hyperplane for data with outlier


Here, we have one blue ball in the boundary of the red ball.
How does SVM classify the data?
It’s simple! The blue ball inside the boundary of the red balls is an outlier of the blue class. The SVM algorithm has the ability to ignore such outliers and find the hyperplane that maximizes the margin. SVM is robust to outliers.

Hyperplane which is the most optimized one


A soft margin allows for some misclassifications or violations of the margin to
improve generalization. The SVM optimizes the following equation to balance margin
maximization and penalty minimization:
Objective Function = (1 / margin) + λ Σ penalty
The penalty used for violations is often the hinge loss (sketched numerically after the list below), which has the following behavior:
 If a data point is correctly classified and within the margin, there is no penalty (loss =
0).
 If a point is incorrectly classified or violates the margin, the hinge loss increases
proportionally to the distance of the violation.
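Here is a small numeric sketch of that behaviour, computing the hinge loss max(0, 1 − t·f(x)) for a few assumed labels and decision values:

# Sketch: hinge loss for a few assumed (label, decision value) pairs.
import numpy as np

t = np.array([+1, +1, -1, -1])        # true labels (+1 / -1)
f = np.array([2.0, 0.4, -3.0, 0.2])   # assumed decision values w^T x + b

hinge = np.maximum(0, 1 - t * f)
for ti, fi, hi in zip(t, f, hinge):
    print(f"label={ti:+d}, f(x)={fi:+.1f} -> hinge loss={hi:.1f}")
# Correct and outside the margin -> 0; inside the margin or misclassified -> positive.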
Till now, we were talking about linearly separable data (the group of blue balls and red balls are separable by a straight line).

What to do if data are not linearly separable?


When data is not linearly separable (i.e., it can’t be divided by a straight line), SVM
uses a technique called kernels to map the data into a higher-dimensional space
where it becomes separable. This transformation helps SVM find a decision boundary
even for non-linear data.

Original 1D dataset for classification


A kernel is a function that maps data points into a higher-dimensional space without
explicitly computing the coordinates in that space. This allows SVM to work
efficiently with non-linear data by implicitly performing the mapping.
For example, consider data points that are not linearly separable. By applying a kernel
function, SVM transforms the data points into a higher-dimensional space where they
become linearly separable.
 Linear Kernel: For linear separability.
 Polynomial Kernel: Maps data into a polynomial space.
 Radial Basis Function (RBF) Kernel: Transforms data into a space based on
distances between data points.

Mapping 1D data to 2D to become able to separate the two classes


In this case, the new variable y is created as a function of distance from the origin.
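A small sketch of this idea: the 1-D points below (made up for illustration) cannot be split by a single threshold, but adding the feature y = x² makes them linearly separable, so a linear SVM fits them perfectly:

# Sketch: mapping 1-D data to 2-D with y = x^2 so a linear SVM can separate it.
# The data points and labels are assumptions for illustration.
import numpy as np
from sklearn.svm import SVC

x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
labels = np.array([1, 1, 0, 0, 0, 1, 1])    # class 1 far from the origin, class 0 near it

X_mapped = np.column_stack([x, x ** 2])     # new feature: squared distance from the origin
clf = SVC(kernel="linear").fit(X_mapped, labels)
print("Training accuracy after mapping:", clf.score(X_mapped, labels))  # 1.0 expected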

Mathematical Computation: SVM


Consider a binary classification problem with two classes, labeled as +1 and -1. We
have a training dataset consisting of input feature vectors X and their corresponding
class labels Y.
The equation for the linear hyperplane can be written as:
w^T x + b = 0

Where:
 w is the normal vector to the hyperplane (the direction perpendicular to it).
 b is the offset or bias term, representing the distance of the hyperplane from the origin along the normal vector w.
Distance from a Data Point to the Hyperplane
The distance between a data point x_i and the decision boundary can be calculated as:

d_i = (w^T x_i + b) / ||w||

where ||w|| represents the Euclidean norm (length) of the weight vector w.
Linear SVM Classifier
The prediction rule for a linear SVM classifier is:

ŷ = 1 if w^T x + b ≥ 0
ŷ = 0 if w^T x + b < 0

Where ŷ is the predicted label of a data point.
Optimization Problem for SVM
For a linearly separable dataset, the goal is to find the hyperplane that maximizes the
margin between the two classes while ensuring that all data points are correctly
classified. This leads to the following optimization problem:
minimize over w, b:   (1/2) ||w||²

Subject to the constraint:

y_i (w^T x_i + b) ≥ 1   for i = 1, 2, 3, …, m

Where:
 y_i is the class label (+1 or -1) for each training instance.
 x_i is the feature vector for the i-th training instance.
 m is the total number of training instances.
The condition y_i (w^T x_i + b) ≥ 1 ensures that each data point is correctly classified and lies outside the margin.
Soft Margin Linear SVM Classifier
In the presence of outliers or non-separable data, the SVM allows some
misclassification by introducing slack variables ζ_i. The optimization problem is
modified as:

minimize over w, b:   (1/2) ||w||² + C Σ_{i=1..m} ζ_i

Subject to the constraints:

y_i (w^T x_i + b) ≥ 1 − ζ_i   and   ζ_i ≥ 0   for i = 1, 2, …, m

Where:
 C is a regularization parameter that controls the trade-off between margin maximization and the penalty for misclassifications.
 ζ_i are slack variables that represent the degree of violation of the margin by each data point.
Dual Problem for SVM
The dual problem involves maximizing the Lagrange multipliers associated with the
support vectors. This transformation allows solving the SVM optimization using
kernel functions for non-linear classification.
The dual objective function is given by:
maximize over α:   Σ_{i=1..m} α_i − (1/2) Σ_{i=1..m} Σ_{j=1..m} α_i α_j t_i t_j K(x_i, x_j)

Where:
 α_i are the Lagrange multipliers associated with the i-th training sample.
 t_i is the class label for the i-th training sample (+1 or −1).
 K(x_i, x_j) is the kernel function that computes the similarity between data points x_i and x_j. The kernel allows SVM to handle non-linear classification problems by mapping data into a higher-dimensional space.
The dual formulation optimizes the Lagrange multipliers α_i, and the support vectors are those training samples where α_i > 0.
SVM Decision Boundary
Once the dual problem is solved, the decision function for a test point x is given by:

f(x) = Σ_{i=1..m} α_i t_i K(x_i, x) + b

Where x is the test data point, b is the bias term, and the sum effectively runs over the support vectors (the samples with α_i > 0).
Finally, the bias term b is determined from the support vectors, which satisfy:

t_i (w^T x_i − b) = 1   ⇒   b = w^T x_i − t_i

Where x_i is any support vector.
This completes the mathematical framework of the Support Vector Machine
algorithm, which allows for both linear and non-linear classification using the dual
problem and kernel trick.
Types of Support Vector Machine
Based on the nature of the decision boundary, Support Vector Machines (SVM) can
be divided into two main parts:
 Linear SVM: Linear SVMs use a linear decision boundary to separate the data points
of different classes. When the data can be precisely linearly separated, linear SVMs
are very suitable. This means that a single straight line (in 2D) or a hyperplane (in
higher dimensions) can entirely divide the data points into their respective classes. A
hyperplane that maximizes the margin between the classes is the decision boundary.
 Non-Linear SVM: Non-Linear SVM can be used to classify data when it cannot be
separated into two classes by a straight line (in the case of 2D). By using kernel
functions, nonlinear SVMs can handle nonlinearly separable data. The original input
data is transformed by these kernel functions into a higher-dimensional feature space,
where the data points can be linearly separated. A linear SVM is used to locate a
nonlinear decision boundary in this modified space.
Implementing SVM Algorithm in Python
Predict whether cancer is benign or malignant. Historical data about patients diagnosed with cancer, described by a set of independent attributes, enables doctors to differentiate malignant cases from benign ones.
 Load the breast cancer dataset from sklearn.datasets
 Separate input features and target variables.
 Build and train the SVM classifiers using RBF kernel.
 Plot the scatter plot of the input features.
# Load the important packages
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.svm import SVC
# Load the datasets
cancer = load_breast_cancer()
X = cancer.data[:, :2]
y = cancer.target
# Build the model
svm = SVC(kernel="rbf", gamma=0.5, C=1.0)
# Train the model
svm.fit(X, y)
# Plot the decision boundary
DecisionBoundaryDisplay.from_estimator(
    svm,
    X,
    response_method="predict",
    cmap=plt.cm.Spectral,
    alpha=0.8,
    xlabel=cancer.feature_names[0],
    ylabel=cancer.feature_names[1],
)
# Scatter plot of the two input features
plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolors="k")
plt.show()
Output:

Breast Cancer Classifications with SVM RBF kernel


Advantages of Support Vector Machine (SVM)
1. High-Dimensional Performance: SVM excels in high-dimensional spaces, making it
suitable for image classification and gene expression analysis.
2. Nonlinear Capability: Utilizing kernel functions like RBF and polynomial, SVM
effectively handles nonlinear relationships.
3. Outlier Resilience: The soft margin feature allows SVM to ignore outliers, enhancing
robustness in spam detection and anomaly detection.
4. Binary and Multiclass Support: SVM is effective for both binary
classification and multiclass classification, suitable for applications in text
classification.
5. Memory Efficiency: SVM focuses on support vectors, making it memory efficient
compared to other algorithms.

Disadvantages of Support Vector Machine (SVM)


1. Slow Training: SVM can be slow to train on large datasets, which affects its usefulness in data mining tasks.
2. Parameter Tuning Difficulty: Selecting the right kernel and adjusting parameters
like C requires careful tuning, impacting SVM algorithms.
3. Noise Sensitivity: SVM struggles with noisy datasets and overlapping classes,
limiting effectiveness in real-world scenarios.
4. Limited Interpretability: The complexity of the hyperplane in higher dimensions
makes SVM less interpretable than other models.
5. Feature Scaling Sensitivity: Proper feature scaling is essential; otherwise, SVM
models may perform poorly.
Evaluate Regression Model (RMSE, MAE)
The objective of Linear Regression is to find a line that minimizes the prediction error of all
the data points.

The essential step in any machine learning model is to evaluate the accuracy of the model.
The Mean Squared Error, Mean absolute error, Root Mean Squared Error, and R-Squared or
Coefficient of determination metrics are used to evaluate the performance of the model in
regression analysis.
 The Mean Absolute Error (MAE) represents the average of the absolute differences between the actual and predicted values in the dataset. It measures the average magnitude of the residuals.

   MAE = (1/n) Σ |y_i − ŷ_i|

 The Mean Squared Error (MSE) represents the average of the squared differences between the actual and predicted values in the dataset. It measures the variance of the residuals.

   MSE = (1/n) Σ (y_i − ŷ_i)²

 The Root Mean Squared Error (RMSE) is the square root of the Mean Squared Error. It measures the standard deviation of the residuals.

   RMSE = sqrt(MSE)

 The coefficient of determination, or R-squared, represents the proportion of the variance in the dependent variable that is explained by the linear regression model. It is a scale-free score, i.e., irrespective of the values being small or large, the value of R-squared will be less than one.

 Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) penalize large prediction errors more heavily than Mean Absolute Error (MAE). However, RMSE is more widely used than MSE to evaluate and compare regression models because it has the same units as the dependent variable (Y-axis).
 MSE is a differentiable function that makes it easy to perform mathematical
operations in comparison to a non-differentiable function like MAE. Therefore, in
many models, RMSE is used as a default metric for calculating Loss Function despite
being harder to interpret than MAE.
 Lower values of MAE, MSE, and RMSE imply higher accuracy of a regression model, whereas a higher value of R-squared is considered desirable (a short sketch computing these metrics follows this list).
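A short sketch computing all four metrics with scikit-learn on assumed actual and predicted values:

# Sketch: MAE, MSE, RMSE and R^2 for assumed actual vs. predicted values.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 6.5])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                 # RMSE is simply the square root of MSE
r2 = r2_score(y_true, y_pred)

print(f"MAE = {mae:.3f}, MSE = {mse:.3f}, RMSE = {rmse:.3f}, R^2 = {r2:.3f}")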
