
Machine Learning - 10 Marks Theoretical Questions

Support Vector Machine (SVM)

Support Vector Machine (SVM) is a Supervised Learning Algorithm mainly used for classification tasks. It

works by finding the best boundary (hyperplane) that separates the data into classes. It tries to maximize the

margin between the classes; the data points closest to the hyperplane, which lie on the margin boundaries, are called Support Vectors.

Key Points:

- Hyperplane: The decision boundary separating classes.

- Margin: Distance between the hyperplane and the closest points.

- Support Vectors: Important data points that define the boundary.

- Works for both linear and non-linear data using kernels.

- Common kernels: Linear, Polynomial, RBF (Gaussian).

Advantages:

- Works well with high-dimensional data.

- Effective with complex relationships.

Disadvantages:

- Slow with large datasets.

- Choosing the right kernel can be hard.
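The idea of maximizing the margin can be sketched with a tiny linear SVM trained by sub-gradient descent on the hinge loss. This is a simplified illustration (the function name `linear_svm` and the toy data are mine); in practice a library such as scikit-learn's `SVC` would be used, which also supports the kernels listed above.

```python
import numpy as np

def linear_svm(X, y, lr=0.01, lam=0.01, epochs=200):
    """Train a linear SVM by sub-gradient descent on the hinge loss
    plus an L2 penalty. Labels must be -1 or +1."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in range(len(X)):
            if y[i] * (X[i] @ w + b) < 1:     # inside the margin: hinge active
                w -= lr * (lam * w - y[i] * X[i])
                b += lr * y[i]
            else:                             # outside the margin: only shrink w
                w -= lr * lam * w
    return w, b

# Two small linearly separable groups
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-2.5, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = linear_svm(X, y)
pred = np.sign(X @ w + b)
```

The L2 penalty keeps the weights small, which is exactly what "maximize the margin" means for a linear SVM.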

CART (Classification and Regression Tree)

CART stands for Classification And Regression Tree. It is a Supervised Learning algorithm that creates a

decision tree based on features. It works for both classification and regression tasks.

Key Points:

- Tree is built by splitting data using questions based on features.

- Root Node: First question.

- Decision Node: Internal question nodes.


- Leaf Node: Final output (class/value).

- Gini Index (classification) or variance reduction (regression) is used to choose the best split.

Advantages:

- Easy to understand and visualize.

- Handles both numerical and categorical data.

Disadvantages:

- Can overfit if not pruned.

- Sensitive to small data changes.
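The split-selection step can be illustrated with a small sketch of the Gini Index on a single numeric feature (names `gini` and `best_split` are mine; a full CART implementation would recurse on the resulting child nodes):

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try every threshold on one numeric feature; keep the one with the
    lowest weighted Gini impurity of the two child nodes."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(values)):
        left  = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

values = [1, 2, 3, 10, 11, 12]
labels = ["A", "A", "A", "B", "B", "B"]
t, score = best_split(values, labels)   # a pure split exists at v <= 3
```

A weighted impurity of 0 means both child nodes are pure, so this split would become a decision node and its children leaf nodes.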

Naive Bayes Classifier

Naive Bayes is a Supervised Learning algorithm based on Bayes' Theorem. It assumes that features are

independent (naive assumption) and uses probabilities to classify data.

Key Points:

- Based on Bayes' Theorem.

- Assumes independence between features.

- Common in text classification (e.g., spam detection).

Types:

- Gaussian: For continuous data.

- Multinomial: For count data.

- Bernoulli: For binary data.

Advantages:

- Fast and efficient.

- Works well with large datasets.


Disadvantages:

- Assumes feature independence (not always true).

- Doesn't perform well with highly correlated features.
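A minimal multinomial Naive Bayes for spam detection can be sketched as follows (function names and the toy corpus are mine; Laplace smoothing is added so unseen word counts do not zero out the probability):

```python
from collections import Counter
from math import log

def train_nb(docs, labels):
    """Multinomial Naive Bayes: class priors + per-class word counts."""
    classes = set(labels)
    priors = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc.split())
    vocab = {w for cnt in counts.values() for w in cnt}
    return priors, counts, vocab

def predict_nb(doc, priors, counts, vocab):
    """argmax_c of log P(c) + sum_w log P(w|c), with Laplace smoothing.
    Summing over words independently is the 'naive' assumption."""
    def score(c):
        total = sum(counts[c].values())
        return log(priors[c]) + sum(log((counts[c][w] + 1) / (total + len(vocab)))
                                    for w in doc.split() if w in vocab)
    return max(priors, key=score)

docs = ["buy cheap pills now", "cheap offer buy now",
        "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
```

Working in log-probabilities avoids numeric underflow when many word likelihoods are multiplied together.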

Principal Component Analysis (PCA)

PCA is an Unsupervised Learning technique for dimensionality reduction. It reduces the number of features

while keeping the most important information. It does this by creating new variables (principal components)

that capture the highest variance.

Key Points:

- Unsupervised: Does not use class labels.

- Finds directions with highest variance.

- Projects data into new space with fewer dimensions.

Advantages:

- Reduces overfitting.

- Faster computation.

Disadvantages:

- Hard to interpret new components.

- May lose some information.
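The three key points above translate almost directly into NumPy: centre the data, take the covariance matrix, and keep the eigenvectors with the largest eigenvalues. A minimal sketch (the function name `pca` and the synthetic data are mine):

```python
import numpy as np

def pca(X, k):
    """Project X onto the k directions (principal components)
    of highest variance."""
    Xc = X - X.mean(axis=0)                # centre the data
    cov = np.cov(Xc, rowvar=False)         # feature covariance matrix
    vals, vecs = np.linalg.eigh(cov)       # eigh: covariance is symmetric
    order = np.argsort(vals)[::-1]         # largest variance first
    return Xc @ vecs[:, order[:k]], vals[order]

rng = np.random.default_rng(0)
t = rng.normal(size=100)
# Two features that are almost perfectly correlated: effectively 1-D data
X = np.column_stack([t, 2 * t + 0.05 * rng.normal(size=100)])
Z, variances = pca(X, k=1)
```

Because the second feature is nearly a multiple of the first, a single principal component captures almost all of the variance, which is exactly when dropping dimensions loses little information.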

Linear Discriminant Analysis (LDA)

LDA is a Supervised Learning technique for dimensionality reduction that also tries to separate classes as

much as possible. It projects data to a new space where class separation is maximized.

Key Points:

- Supervised: Uses class labels.


- Maximizes class separation.

- Used mainly for classification tasks.

Advantages:

- Improves classification.

- Reduces complexity.

Disadvantages:

- Assumes features are roughly normally distributed with equal class covariances.

- Struggles when class boundaries are non-linear.
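For two classes, the projection that maximizes class separation has a closed form: Fisher's discriminant w = Sw^-1 (mu1 - mu0), where Sw is the within-class scatter. A small sketch (the function name `fisher_lda` and the synthetic blobs are mine):

```python
import numpy as np

def fisher_lda(X0, X1):
    """Fisher's two-class discriminant: w = Sw^-1 (mu1 - mu0),
    maximising between-class over within-class scatter."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)  # within-class scatter
    return np.linalg.solve(Sw, mu1 - mu0)

rng = np.random.default_rng(1)
X0 = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
X1 = rng.normal(loc=[4.0, 4.0], scale=0.3, size=(50, 2))
w = fisher_lda(X0, X1)
# Midpoint of the projected class means: classes should fall on either side
threshold = ((X0 @ w).mean() + (X1 @ w).mean()) / 2
```

Unlike PCA, the class labels enter directly (through the two class means), which is why LDA is supervised.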

Difference Between PCA and LDA

| Feature            | PCA                    | LDA                      |
|--------------------|------------------------|--------------------------|
| Type               | Unsupervised           | Supervised               |
| Goal               | Maximize data variance | Maximize class separation|
| Uses class labels? | No                     | Yes                      |
| Output             | Principal components   | Linear discriminants     |
| Best for           | Data compression       | Classification           |
| Application        | Any dataset            | Classification problems  |

Convolutional Neural Network (CNN)

CNN (Convolutional Neural Network) is a type of deep learning model mainly used for image-related tasks. It

automatically detects patterns and features like edges, textures, or shapes from images.

Key Points:

- Input Layer: Takes image data (pixels).

- Convolution Layer: Applies filters to find features.


- ReLU Layer: Adds non-linearity.

- Pooling Layer: Reduces image size while keeping features.

- Fully Connected Layer: Makes final decisions.

Use Cases:

- Face recognition, medical image analysis, self-driving cars, image captioning.

Advantages:

- Good at image processing.

- Automatically finds features.

Disadvantages:

- Needs a lot of data and training time.
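The core operation of the convolution layer can be sketched in a few lines of NumPy (the function name `conv2d` and the toy image are mine; real CNN libraries learn the filter weights rather than fixing them):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most deep-learning
    libraries): slide the filter over the image and sum the products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical edge: dark columns on the left, bright columns on the right
image = np.zeros((5, 6))
image[:, 3:] = 1.0
kernel = np.array([[-1.0, 1.0]])     # responds where neighbouring columns differ
response = conv2d(image, kernel)
relu = np.maximum(response, 0)       # the ReLU layer: keep positive activations
```

The filter fires only where the pixel values change, which is how a convolution layer detects an edge feature.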

Recurrent Neural Network (RNN)

RNN (Recurrent Neural Network) is used for sequential data where order matters, like time series or

sentences. It remembers past inputs using loops.

Key Points:

- Input Layer: Takes sequences (words, time steps).

- Hidden Layer: Processes and remembers previous steps.

- Output Layer: Gives result.

Use Cases:

- Text generation, speech recognition, language translation, sentiment analysis.

Advantages:

- Great for sequences.

- Remembers past data.

Disadvantages:

- Hard to train.

- May forget long-term info (solved by LSTM/GRU).
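The "loop" that remembers past inputs is just the recurrence h_t = tanh(Wx x_t + Wh h_{t-1} + b). A minimal forward pass with random (untrained) weights illustrates it (all names and sizes here are my own illustrative choices):

```python
import numpy as np

def rnn_forward(xs, Wx, Wh, b):
    """Vanilla RNN: h_t = tanh(Wx x_t + Wh h_{t-1} + b).
    The Wh @ h term is the recurrent 'loop' carrying past information."""
    h = np.zeros(Wh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h + b)
        states.append(h)
    return states

rng = np.random.default_rng(0)
Wx = rng.normal(scale=0.5, size=(4, 3))   # input -> hidden
Wh = rng.normal(scale=0.5, size=(4, 4))   # hidden -> hidden (recurrence)
b = np.zeros(4)
seq = [rng.normal(size=3) for _ in range(5)]
states = rnn_forward(seq, Wx, Wh, b)

# Changing an *early* input changes the *final* state: the network remembers
seq_b = [seq[0] + 1.0] + seq[1:]
states_b = rnn_forward(seq_b, Wx, Wh, b)
```

Repeated multiplication by Wh is also why gradients shrink over long sequences, the vanishing-gradient problem that LSTM/GRU cells address.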

K-Means Clustering

K-Means is an unsupervised algorithm that groups data into K clusters based on the mean (average). Each

cluster contains similar data points.

Steps:

1. Choose K clusters.

2. Pick initial centroids.

3. Assign points to closest centroid.

4. Update centroids by calculating mean.

5. Repeat until centroids stabilize.

Used For:

- Customer segmentation, image compression.

Pros:

- Simple, fast.

Cons:

- Only for numeric data, sensitive to outliers.
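The five steps above can be sketched directly in NumPy (the function name `kmeans` and the two synthetic blobs are mine; for a deterministic demo the initial centroids are passed in explicitly rather than chosen at random):

```python
import numpy as np

def kmeans(X, init, iters=100):
    """Lloyd's algorithm: assign points to the nearest centroid, move each
    centroid to the mean of its points, repeat until stable."""
    centroids = init.astype(float).copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)                      # step 3: assign
        new = np.array([X[labels == j].mean(axis=0)        # step 4: update
                        for j in range(len(centroids))])
        if np.allclose(new, centroids):                    # step 5: stable?
            break
        centroids = new
    return labels, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),   # blob around (0, 0)
               rng.normal(5.0, 0.3, size=(20, 2))])  # blob around (5, 5)
labels, centroids = kmeans(X, init=X[[0, 20]])       # one seed point per blob
```

Because the update step averages coordinates, a single extreme outlier can drag a centroid far from its cluster, which is the sensitivity noted above.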

K-Modes Clustering

K-Modes is like K-Means but for categorical data. It uses the mode (most common value) instead of mean.

Steps:

1. Choose K clusters.

2. Pick initial modes.


3. Assign points based on matching categories.

4. Update cluster modes.

Used For:

- Survey data, customer categories.

Pros:

- Works on categorical data.

Cons:

- Struggles with large or complex data.
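A compact sketch of the same loop for categorical records, using matching dissimilarity (count of differing attributes) and column-wise modes (the names `mismatches` and `kmodes` and the toy records are mine):

```python
from collections import Counter

def mismatches(a, b):
    """Matching dissimilarity: number of attributes that disagree."""
    return sum(x != y for x, y in zip(a, b))

def kmodes(records, modes, iters=10):
    """K-Modes: K-Means with modes instead of means and
    matching dissimilarity instead of Euclidean distance."""
    modes = [list(m) for m in modes]
    for _ in range(iters):
        clusters = [[] for _ in modes]
        for r in records:                          # assign to the closest mode
            j = min(range(len(modes)), key=lambda m: mismatches(r, modes[m]))
            clusters[j].append(r)
        for j, cl in enumerate(clusters):          # update each mode column-wise
            if cl:
                modes[j] = [Counter(col).most_common(1)[0][0] for col in zip(*cl)]
    return modes, clusters

records = [("red", "small", "cotton"), ("red", "small", "wool"),
           ("blue", "large", "silk"), ("blue", "large", "cotton")]
modes, clusters = kmodes(records, modes=[records[0], records[2]])
```

No averaging is ever needed, which is why the method works where K-Means cannot.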

K-Medoids Clustering

K-Medoids is similar to K-Means but uses actual data points (medoids) as centers. It is more robust to

outliers.

Steps:

1. Choose K medoids.

2. Assign points to closest medoid.

3. Find new medoids with smallest total distance.

Used For:

- Sensitive data, small or mixed datasets.

Pros:

- More robust to outliers than K-Means (an outlier cannot drag the center away).

Cons:

- Slower than K-Means.
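The three steps can be sketched in plain Python (the function name `kmedoids` and the 1-D toy data are mine; full PAM does a more thorough swap search, so treat this as a simplified illustration):

```python
def kmedoids(points, k, dist, iters=10):
    """Basic K-Medoids: centers are always real data points; each medoid
    becomes the cluster member with the smallest total distance to the rest."""
    medoids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                                   # assign to nearest medoid
            j = min(range(k), key=lambda m: dist(p, medoids[m]))
            clusters[j].append(p)
        medoids = [min(cl, key=lambda c: sum(dist(c, q) for q in cl))
                   for cl in clusters]
    return medoids, clusters

points = [1, 2, 3, 100, 8, 9, 10]        # 100 is an outlier
medoids, clusters = kmedoids(points, k=2, dist=lambda a, b: abs(a - b))
```

The outlier 100 lands in a cluster but can never become a center, because a central point like 9 always has a smaller total distance; contrast this with K-Means, where the mean of {8, 9, 10, 100} would be pulled out to about 32.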

K-Means vs K-Modes vs K-Medoids


| Feature                | K-Means      | K-Modes          | K-Medoids               |
|------------------------|--------------|------------------|-------------------------|
| Data type              | Numerical    | Categorical      | Mixed                   |
| Center type            | Mean         | Mode             | Medoid                  |
| Sensitive to outliers  | Yes          | Yes              | Less so                 |
| Speed                  | Fast         | Fast             | Slower                  |
| Uses real points?      | No           | No               | Yes                     |
| Best for               | Numeric data | Categorical data | Small or mixed datasets |
