Unit 2
Steps (all eight algorithms, applied to the Iris dataset)

Imports
    K-Nearest Neighbors (KNN):    from sklearn.neighbors import KNeighborsClassifier
    Linear Regression:            from sklearn.linear_model import LinearRegression
    Logistic Regression:          from sklearn.linear_model import LogisticRegression
    Naive Bayes (Gaussian):       from sklearn.naive_bayes import GaussianNB
    Decision Tree:                from sklearn.tree import DecisionTreeClassifier
    Random Forest:                from sklearn.ensemble import RandomForestClassifier
    Gradient Descent:             from sklearn.linear_model import SGDClassifier
    Support Vector Machine (SVM): from sklearn.svm import SVC

Loading Dataset (identical for all eight)
    from sklearn.datasets import load_iris
    data = load_iris()

X, y Division (identical for all eight)
    X = data.data
    y = data.target

Train-Test Split (identical for all eight)
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Initializing Algorithm
    knn = KNeighborsClassifier(n_neighbors=3)
    lr = LinearRegression()
    log_reg = LogisticRegression(max_iter=200)
    gnb = GaussianNB()
    dt = DecisionTreeClassifier(random_state=42)
    rf = RandomForestClassifier(n_estimators=100, random_state=42)
    sgd = SGDClassifier(max_iter=1000, tol=1e-3)
    svm = SVC(kernel='linear')

Fitting Model (same call for every model)
    knn.fit(X_train, y_train)
    # likewise for lr, log_reg, gnb, dt, rf, sgd, and svm

Predicting (same call for every model)
    y_pred = knn.predict(X_test)

Accuracy Calculation (same call for every model)
    accuracy = knn.score(X_test, y_test)
    print(f"KNN Accuracy: {accuracy}")
    # Note: for LinearRegression, .score() returns R^2 rather than classification
    # accuracy, hence the label: print(f"Linear Regression R^2: {accuracy}")
Complete Code

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression, SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Shared setup: load Iris and make one train/test split for all models
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# K-Nearest Neighbors
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
accuracy = knn.score(X_test, y_test)
print(f"KNN Accuracy: {accuracy}")

# Linear Regression (.score() reports R^2, not accuracy)
lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)
accuracy = lr.score(X_test, y_test)
print(f"Linear Regression R^2: {accuracy}")

# Logistic Regression
log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train, y_train)
y_pred = log_reg.predict(X_test)
accuracy = log_reg.score(X_test, y_test)
print(f"Logistic Regression Accuracy: {accuracy}")

# Gaussian Naive Bayes
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
accuracy = gnb.score(X_test, y_test)
print(f"Naive Bayes Accuracy: {accuracy}")

# Decision Tree
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
accuracy = dt.score(X_test, y_test)
print(f"Decision Tree Accuracy: {accuracy}")

# Random Forest
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
accuracy = rf.score(X_test, y_test)
print(f"Random Forest Accuracy: {accuracy}")

# Stochastic Gradient Descent classifier
sgd = SGDClassifier(max_iter=1000, tol=1e-3)
sgd.fit(X_train, y_train)
y_pred = sgd.predict(X_test)
accuracy = sgd.score(X_test, y_test)
print(f"SGD Classifier Accuracy: {accuracy}")

# Support Vector Machine
svm = SVC(kernel='linear')
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)
accuracy = svm.score(X_test, y_test)
print(f"SVM Accuracy: {accuracy}")
• n_neighbors: Specifies the number of neighbors to consider. Increasing n_neighbors leads to a smoother decision boundary but might reduce model flexibility.
• weights: Determines how the neighbors influence the classification. Options include:
  - 'uniform': all neighbors have equal weight.
  - 'distance': neighbors are weighted by their distance, with closer points having a greater influence.
• algorithm: Chooses the algorithm used to compute nearest neighbors. Options include 'auto', 'ball_tree', 'kd_tree', and 'brute'.
Theoretical Analysis: The K-Nearest Neighbors (KNN) algorithm is a non-parametric, instance-based learning technique. It classifies a new data point based on the majority class of its 'k' nearest neighbors in the feature space. The algorithm is intuitive and simple to implement but can be computationally expensive on large datasets. The choice of 'k' significantly impacts its performance, and higher-dimensional data can lead to the "curse of dimensionality".
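As a minimal, illustrative sketch (not part of the original lab code), the snippet below reuses the Iris split from this unit to show how n_neighbors and weights interact; the k values chosen and the resulting accuracies are for demonstration only.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# Compare weighting schemes across several neighborhood sizes
for weights in ("uniform", "distance"):
    for k in (1, 3, 7, 15):
        knn = KNeighborsClassifier(n_neighbors=k, weights=weights)
        knn.fit(X_train, y_train)
        print(f"k={k:2d}, weights={weights}: accuracy={knn.score(X_test, y_test):.3f}")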
• solver: Chooses the algorithm used for optimization. Examples include 'liblinear', 'saga', and 'newton-cg'.
• max_iter: Specifies the maximum number of iterations for the solver to converge.
Theoretical Analysis: Logistic Regression is used for binary classification tasks, predicting the probability of class membership using a sigmoid function. It is simple and effective for linearly separable datasets but can struggle with complex or non-linear decision boundaries.
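To make the sigmoid output concrete, here is a small sketch using the Breast Cancer data that appears later in this unit (the solver and max_iter values are illustrative choices, not tuned settings): predict_proba returns the modeled class probabilities, from which predict derives hard labels.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

log_reg = LogisticRegression(solver="liblinear", max_iter=1000)
log_reg.fit(X_train, y_train)

# Each row holds [P(class 0), P(class 1)]; predict() thresholds these at 0.5
print(log_reg.predict_proba(X_test[:3]))
print(log_reg.predict(X_test[:3]))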
• var_smoothing: Adds a small value to the variances for numerical stability and to prevent zero variances.
Theoretical Analysis: Naive Bayes classifiers are based on Bayes' theorem, assuming independence between features. Despite this assumption, they perform surprisingly well in many applications, especially text classification. The Gaussian variant assumes normally distributed data, making it suitable for continuous input features.
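A quick, illustrative sweep over var_smoothing on the Iris split (the values are chosen arbitrarily for demonstration; 1e-9 is scikit-learn's default):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# Larger smoothing widens every class variance, flattening the fitted Gaussians
for vs in (1e-9, 1e-3, 1.0):
    gnb = GaussianNB(var_smoothing=vs)
    gnb.fit(X_train, y_train)
    print(f"var_smoothing={vs}: accuracy={gnb.score(X_test, y_test):.3f}")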
• max_depth: Sets the maximum depth of the tree. Limiting the depth can prevent overfitting.
Theoretical Analysis: Decision Trees are versatile models capable of handling both regression and classification tasks. They recursively split the dataset into subsets based on feature values to minimize entropy or impurity. While easy to interpret, they are prone to overfitting and can create complex models that generalize poorly.
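The overfitting tendency is easy to see by comparing train and test scores at different depths. A hedged sketch on the Breast Cancer split (the depth values are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# max_depth=None lets the tree grow until leaves are pure (most prone to overfit)
for depth in (2, 4, None):
    dt = DecisionTreeClassifier(max_depth=depth, random_state=42)
    dt.fit(X_train, y_train)
    print(f"max_depth={depth}: train={dt.score(X_train, y_train):.3f}, "
          f"test={dt.score(X_test, y_test):.3f}")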
• bootstrap: If True, subsets of samples are drawn with replacement for training each tree.
Theoretical Analysis: Random Forest is an ensemble learning method that combines multiple decision trees to improve model performance and reduce overfitting. It offers robust results by averaging multiple tree predictions and using random feature subsets for training. However, it can be computationally intensive and less interpretable than
single trees.
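A small sketch contrasting bootstrap sampling with training each tree on the full dataset, again on the Breast Cancer split (parameter values are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# bootstrap=True (the default) resamples with replacement per tree;
# bootstrap=False trains every tree on all samples, so trees differ
# only through their random feature subsets
for bootstrap in (True, False):
    rf = RandomForestClassifier(n_estimators=100, bootstrap=bootstrap, random_state=42)
    rf.fit(X_train, y_train)
    print(f"bootstrap={bootstrap}: accuracy={rf.score(X_test, y_test):.3f}")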
• learning_rate: Controls the step-size schedule during gradient descent. Options include 'constant', 'optimal', 'invscaling', and 'adaptive'.
• max_iter: Sets the maximum number of passes over the training data.
Theoretical Analysis: Gradient Descent is an optimization algorithm used for training models like linear and logistic regression. It iteratively updates model parameters in the direction of the negative gradient of the loss function. While it is efficient for large datasets, it can get stuck in local minima and may require careful tuning of learning rates.
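A sketch comparing learning-rate schedules for SGDRegressor on California Housing (note: fetch_california_housing downloads the data on first use; eta0 and the schedule list are illustrative). Standardizing the features first matters because gradient descent is sensitive to feature scale:

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# Standardize features: unscaled inputs can make SGD diverge
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

for schedule in ("constant", "invscaling", "adaptive"):
    sgd = SGDRegressor(learning_rate=schedule, eta0=0.01,
                       max_iter=1000, tol=1e-3, random_state=42)
    sgd.fit(X_train, y_train)
    print(f"learning_rate={schedule}: R^2={sgd.score(X_test, y_test):.3f}")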
• C: Regularization parameter controlling the trade-off between fitting the training data closely and keeping the margin wide (a simpler model that tends to generalize better).
• gamma: Defines the influence of individual training examples on the decision boundary for kernels such as 'rbf'. A higher value results in a more flexible model.
Theoretical Analysis: Support Vector Machine (SVM) is a powerful classification algorithm that finds the hyperplane maximizing the margin between two classes. It is effective in high-dimensional spaces and can be used for non-linear classification with kernel tricks. However, it is sensitive to outliers and can be computationally expensive for
large datasets.
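How C and gamma trade off is easiest to see with an RBF kernel. A hedged sketch on the Iris split (the grids below are illustrative, not tuned recommendations):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# Larger C and larger gamma both push toward a more flexible (riskier) fit
for C in (0.1, 1, 10):
    for gamma in (0.01, 0.1, 1):
        svm = SVC(kernel="rbf", C=C, gamma=gamma)
        svm.fit(X_train, y_train)
        print(f"C={C}, gamma={gamma}: accuracy={svm.score(X_test, y_test):.3f}")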
Conclusion
Understanding the significance of each algorithm’s function and its arguments is crucial for tuning models to achieve the best performance. Each argument allows users to control specific aspects of the model, influencing its behavior and outcomes on different datasets. Choosing appropriate values for these parameters can significantly enhance model
performance and generalization capabilities.
Steps (each algorithm paired with a dataset suited to its task)

Algorithm and dataset:
    K-Nearest Neighbors: Iris
    Linear Regression: California Housing
    Logistic Regression: Breast Cancer
    Naive Bayes: Iris
    Decision Tree: Breast Cancer
    Random Forest: Breast Cancer
    Gradient Descent (SGDRegressor): California Housing
    Support Vector Machine: Iris

Imports
    The model imports match the first table, except that Gradient Descent now uses the regression variant:
    from sklearn.linear_model import SGDRegressor

Loading Dataset
    from sklearn.datasets import load_iris                  # KNN, Naive Bayes, SVM
    from sklearn.datasets import fetch_california_housing   # Linear Regression, SGD Regressor
    from sklearn.datasets import load_breast_cancer         # Logistic Regression, Decision Tree, Random Forest
    data = load_iris()   # or fetch_california_housing() / load_breast_cancer() as above

X, y Division (identical for all eight)
    X = data.data
    y = data.target

Train-Test Split (identical for all eight)
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Initializing Algorithm
    knn = KNeighborsClassifier(n_neighbors=3)
    lr = LinearRegression()
    log_reg = LogisticRegression(max_iter=200)
    gnb = GaussianNB()
    dt = DecisionTreeClassifier(random_state=42)
    rf = RandomForestClassifier(n_estimators=100, random_state=42)
    sgd = SGDRegressor(max_iter=1000, tol=1e-3)
    svm = SVC(kernel='linear')

Fitting Model (same call for every model)
    knn.fit(X_train, y_train)

Predicting (same call for every model)
    y_pred = knn.predict(X_test)

Accuracy Calculation
    accuracy = knn.score(X_test, y_test)
    print(f"KNN Accuracy: {accuracy}")
    # The regression models report R^2 from .score():
    # print(f"Linear Regression R^2: {accuracy}")
    # print(f"SGD Regressor R^2: {accuracy}")
Complete Code

from sklearn.datasets import load_iris, fetch_california_housing, load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression, SGDRegressor
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Iris dataset: KNN, Naive Bayes, SVM (classification)
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(f"KNN Accuracy: {knn.score(X_test, y_test)}")

gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
print(f"Naive Bayes Accuracy: {gnb.score(X_test, y_test)}")

svm = SVC(kernel='linear')
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)
print(f"SVM Accuracy: {svm.score(X_test, y_test)}")

# California Housing dataset: Linear Regression, SGD Regressor (regression)
data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)
print(f"Linear Regression R^2: {lr.score(X_test, y_test)}")

sgd = SGDRegressor(max_iter=1000, tol=1e-3)
sgd.fit(X_train, y_train)
y_pred = sgd.predict(X_test)
print(f"SGD Regressor R^2: {sgd.score(X_test, y_test)}")

# Breast Cancer dataset: Logistic Regression, Decision Tree, Random Forest (classification)
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train, y_train)
y_pred = log_reg.predict(X_test)
print(f"Logistic Regression Accuracy: {log_reg.score(X_test, y_test)}")

dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
print(f"Decision Tree Accuracy: {dt.score(X_test, y_test)}")

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
print(f"Random Forest Accuracy: {rf.score(X_test, y_test)}")
Steps with explicit evaluation metrics (same algorithm-to-dataset pairings as above)

This table repeats the previous pipeline unchanged (imports, dataset loading, X, y division, train-test split, initializing, fitting, predicting) and replaces the single .score() call with explicit metrics from sklearn.metrics.

Imports (in addition to the model imports above)
    Classification models (KNN, Naive Bayes, Decision Tree, Random Forest, SVM):
        from sklearn.metrics import accuracy_score, f1_score
    Logistic Regression:
        from sklearn.metrics import accuracy_score, f1_score, classification_report
    Regression models (Linear Regression, SGD Regressor):
        from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

Evaluation Metrics
    Classification models (shown for KNN; the other classifiers differ only in the printed label):
        accuracy = accuracy_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred, average='weighted')
        print(f"KNN Accuracy: {accuracy}")
        print(f"KNN F1 Score: {f1}")
    Logistic Regression additionally prints a per-class summary:
        report = classification_report(y_test, y_pred)
        print(report)
    Regression models (shown for Linear Regression; SGD Regressor is analogous):
        mse = mean_squared_error(y_test, y_pred)
        mae = mean_absolute_error(y_test, y_pred)
        r2 = r2_score(y_test, y_pred)
        print(f"Linear Regression MSE: {mse}")
        print(f"Linear Regression MAE: {mae}")
        print(f"Linear Regression R^2: {r2}")
Evaluation Metrics
Accuracy
Accuracy is the ratio of correctly predicted observations to the total observations. It is suitable for balanced datasets but can be misleading for imbalanced classes.
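In symbols, for a binary problem with true/false positives and negatives TP, TN, FP, FN:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)

For example, 90 correct predictions on 100 test samples give an accuracy of 0.90.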
R-Squared (R²)
R² measures the proportion of variance in the target variable explained by the model. It typically ranges from 0 to 1, with values closer to 1 indicating better performance; it can also be negative when a model fits worse than simply predicting the mean.
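Concretely, this is the definition that scikit-learn's r2_score implements:

    R² = 1 - SS_res / SS_tot,  where SS_res = sum_i (y_i - yhat_i)^2 and SS_tot = sum_i (y_i - ybar)^2

so a model whose squared error exceeds that of always predicting the mean ybar scores below zero.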
F1 Score
F1 Score is the harmonic mean of precision and recall. It is more informative than accuracy for imbalanced datasets, as it considers both false positives and false negatives.
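With precision P and recall R:

    F1 = 2PR / (P + R)

For example, P = 0.8 and R = 0.5 give F1 = 0.8 / 1.3 ≈ 0.62, noticeably below the arithmetic mean of 0.65: the harmonic mean penalizes imbalance between precision and recall.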