UNIT II Material
Supervised learning can be used for those cases where we know the input as well as the corresponding outputs. Unsupervised learning can be used for those cases where we have only input data and no corresponding output data.
Note: Supervised and unsupervised learning are both machine learning methods; the choice between them depends on factors such as the structure and volume of your dataset and the use case of the problem.
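For a quick contrast between the two settings, here is a minimal sketch using scikit-learn (the tiny dataset is invented purely for illustration): a supervised model is fitted on inputs together with their known outputs, while an unsupervised model receives only the inputs.

# Minimal sketch contrasting supervised and unsupervised learning (scikit-learn).
# The tiny dataset below is invented purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0], [2.0], [3.0], [8.0], [9.0], [10.0]])  # input data
y = np.array([0, 0, 0, 1, 1, 1])                           # known outputs (labels)

# Supervised: the model sees inputs AND their corresponding outputs.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.5], [9.5]]))   # predicted labels for new inputs

# Unsupervised: the model sees only the inputs and finds structure on its own.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                    # discovered cluster assignments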
Regression analysis is a statistical method for modelling the relationship between a
dependent (target) variable and one or more independent (predictor) variables. More
specifically, regression analysis helps us understand how the value of the dependent
variable changes with respect to one independent variable while the other independent
variables are held fixed. It predicts continuous/real values such as temperature, age,
salary, price, etc.
Example: Suppose a marketing company A runs various advertisements every year and
earns sales from them. The list below shows the advertising spend made by the company
in the last 5 years and the corresponding sales:
Now, the company wants to spend $200 on advertising in 2019 and wants to predict the
sales for that year. To solve such prediction problems in machine learning, we need
regression analysis.
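As a minimal sketch of how such a prediction could be made with simple linear regression (the spend/sales figures below are hypothetical, since the original table is not reproduced here):

# Hypothetical advertisement-spend vs. sales data (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression

ad_spend = np.array([[90], [120], [150], [100], [130]])  # spend in the last 5 years
sales    = np.array([1000, 1300, 1800, 1200, 1380])      # corresponding sales

model = LinearRegression().fit(ad_spend, sales)
print(model.predict([[200]]))   # predicted sales for a planned spend of $200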
Some examples of regression are:
1. Prediction of rain using temperature and other factors
2. Determining Market trends
3. Prediction of road accidents due to rash driving.
As mentioned above, regression analysis helps in the prediction of a continuous
variable. There are various real-world scenarios where we need such future predictions,
for example weather conditions, sales, and marketing trends.
1. Regression estimates the relationship between the target and the
independent variables.
2. It is used to find trends in data.
3. It helps to predict real/continuous values.
4. By performing regression, we can determine the most important factor, the
least important factor, and how each factor affects the others.
Types of Regression
There are various types of regression used in data science and machine learning. Each
type has its own importance in different scenarios, but at the core, all regression
methods analyze the effect of the independent variables on the dependent variable. Here
we discuss some important types of regression, listed below (a short fitting sketch
follows the list):
1. Linear Regression
2. Logistic Regression
3. Polynomial Regression
4. Support Vector Regression
5. Decision Tree Regression
6. Random Forest Regression
7. Ridge Regression
8. Lasso Regression
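As a brief illustration of three of the types listed above (a sketch on synthetic data with scikit-learn), linear, ridge, and lasso regression differ mainly in the penalty added to the least-squares objective:

# Sketch: fitting linear, ridge, and lasso regression on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)  # third feature is irrelevant

for name, reg in [("Linear", LinearRegression()),
                  ("Ridge", Ridge(alpha=1.0)),    # L2 penalty shrinks coefficients
                  ("Lasso", Lasso(alpha=0.1))]:   # L1 penalty can zero some out
    reg.fit(X, y)
    print(name, np.round(reg.coef_, 3))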
Decision Tree Induction
Decision Tree Induction in Soft Computing involves integrating techniques from
soft computing (such as fuzzy logic, genetic algorithms, and neural networks) with
decision tree algorithms to handle imprecision, uncertainty, and complex problem
spaces more effectively.
1. Overview of Decision Tree Induction
Decision Trees are a supervised learning method used for classification and
regression tasks.
The tree structure:
o Nodes: Represent attributes.
o Edges: Represent attribute values.
o Leaves: Represent class labels (or predicted outcomes).
Advantages:
o Simple to understand and interpret.
o Handles both categorical and numerical data.
o Requires minimal data preprocessing.
5. Example Framework
A fuzzy decision tree induction algorithm might include:
1. Fuzzification: Convert crisp inputs into fuzzy sets.
2. Tree Construction: Split nodes using fuzzy information gain or entropy.
3. Pruning: Use genetic algorithms to simplify the tree.
4. Defuzzification: Map fuzzy outputs back to crisp decisions.
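A minimal sketch of step 1 (fuzzification) using triangular membership functions; the variable "temperature" and the fuzzy-set boundaries here are invented for illustration:

# Sketch of fuzzification: mapping a crisp input to membership degrees in
# fuzzy sets. The variable and set boundaries are purely illustrative.
def triangular(x, a, b, c):
    # Triangular membership function with corners a <= b <= c.
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify_temperature(t):
    return {
        "cool": triangular(t, 0, 10, 20),
        "warm": triangular(t, 15, 25, 35),
        "hot":  triangular(t, 30, 40, 50),
    }

print(fuzzify_temperature(18))  # partial membership in both "cool" and "warm"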
************************************
Decision Tree Algorithm
The Decision Tree Algorithm is a popular machine learning technique used for both
classification and regression tasks. It works by recursively splitting the dataset into
subsets based on specific feature values, ultimately forming a tree structure. The goal
is to partition the data such that the subsets are as homogeneous as possible with
respect to the target variable.
1. Root Node:
Represents the entire dataset.
Splits into child nodes based on the best feature and value.
2. Internal Nodes:
Represent decision points on features.
Each node corresponds to a condition (e.g., "Is feature x ≤ c?").
3. Leaf Nodes:
Represent the outcome or prediction (e.g., a class label or a regression value).
4. Splitting Criterion:
Determines how the dataset is divided at each node.
Common criteria (a small numeric sketch of these follows this list):
o Gini Impurity: Measures the probability of misclassifying a randomly chosen
sample if it were labelled according to the class distribution at the node.
o Entropy (Information Gain): Measures the reduction in randomness (impurity)
achieved by a split.
o Variance Reduction: Used in regression tasks to minimize the variance
within subsets.
5. Pruning:
Reduces the size of the tree to avoid overfitting.
Can be done before the tree is fully grown (pre-pruning) or afterwards by trimming
branches (post-pruning).
Advantages
Easy to Interpret: Intuitive structure that can be visualized.
Non-Parametric: Makes no assumptions about data distribution.
Feature Importance: Provides insight into which features are most important
for predictions.
Disadvantages
Overfitting: Fully grown trees may overfit the training data.
Bias to Dominant Features: May favor features with more levels/categories.
Instability: Small changes in data can lead to different trees.
Common Variants
1. CART (Classification and Regression Trees):
o Used for both classification and regression.
o Binary splits at each node.
2. ID3 (Iterative Dichotomiser 3):
o Uses entropy for splitting and creates multi-way splits.
3. C4.5:
o An improvement of ID3, handling both continuous and categorical
features.
4. Random Forest:
o An ensemble method that builds multiple decision trees and averages
their predictions.
5. Gradient Boosted Trees:
o Builds trees sequentially, optimizing errors of previous trees.
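A brief sketch of two of these variants with scikit-learn, whose DecisionTreeClassifier implements a CART-style tree; the Iris dataset is used only as a convenient stand-in:

# Sketch: a single CART-style tree vs. a random forest ensemble (scikit-learn).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("single tree accuracy:", tree.score(X_te, y_te))
print("random forest accuracy:", forest.score(X_te, y_te))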
*************
A rule has two parts:
o Antecedent (Condition): The "if" part (e.g., IF age > 30 AND income < 50k).
o Consequent (Result): The "then" part (e.g., THEN class = Low_Risk).
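The example rule above can be written directly as a small predicate (a sketch; the thresholds are the ones given in the rule):

# Sketch: the IF-THEN rule above expressed as a simple predicate.
def classify(age, income):
    if age > 30 and income < 50_000:  # antecedent (condition)
        return "Low_Risk"             # consequent (result)
    return "Unknown"                  # no other rules are defined in this sketch

print(classify(age=42, income=30_000))  # -> Low_Risk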
Examples of search strategies used in score-based structure learning:
Greedy Hill Climbing: Starts with an empty or random graph and iteratively
adds/removes/reverses edges to maximize the score.
Simulated Annealing: Searches the space of structures while avoiding local
maxima by allowing occasional downhill moves.
Exact Search (Dynamic Programming): Searches exhaustively for the best
structure (feasible only for small networks).
Common Scoring Functions:
Bayesian Information Criterion (BIC)
Akaike Information Criterion (AIC)
Bayesian Dirichlet Equivalent (BDe)
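For concreteness, one common sign convention for the BIC score trades the fitted log-likelihood against a complexity penalty; a small sketch of that computation (the numbers passed in are placeholders):

# Sketch: BIC score of a candidate structure, given its maximized
# log-likelihood, number of free parameters, and sample size.
from math import log

def bic_score(log_likelihood, num_params, num_samples):
    # Higher is better under this convention: BIC = LL - (k / 2) * log(N).
    return log_likelihood - 0.5 * num_params * log(num_samples)

print(bic_score(log_likelihood=-1250.0, num_params=18, num_samples=500))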
Advantages:
Flexibility in combining prior knowledge.
Can work with partial or noisy data.
Disadvantages:
Computationally expensive for large networks.
Risk of overfitting with insufficient data.
Hybrid Algorithms
Combine constraint-based and score-based approaches to leverage the strengths
of both.
Examples:
Max-Min Hill Climbing (MMHC): Uses constraint-based methods to identify
a candidate set of edges and then applies score-based methods to refine the
structure.
Advantages:
Balances computational efficiency and accuracy.
Reduces sensitivity to errors in dependency tests.
2. Parameter Learning Algorithms
Once the structure is learned or predefined, the next step is to estimate the
conditional probability distributions (CPDs) for each variable. Parameter learning
depends on whether the data is complete or incomplete.
For Complete Data:
Maximum Likelihood Estimation (MLE): Estimates parameters directly
from the data.
Bayesian Estimation: Incorporates prior distributions to estimate CPDs.
For Incomplete Data:
Expectation-Maximization (EM): Iteratively estimates missing data and
updates parameters to maximize the likelihood.
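A minimal sketch of parameter learning for one discrete variable from complete data: plain MLE just counts the outcomes, while a simple Bayesian estimate with a Dirichlet prior adds pseudo-counts. The tiny dataset and variable names are invented:

# Sketch: estimating the CPD P(Weather | Sky) from complete data.
from collections import Counter

data = [("cloudy", "rain"), ("cloudy", "rain"), ("cloudy", "dry"),
        ("clear", "dry"), ("clear", "dry"), ("clear", "rain")]

def cpd(data, parent_value, alpha=0.0):
    # MLE when alpha = 0; Bayesian estimate with Dirichlet pseudo-count alpha.
    outcomes = [child for parent, child in data if parent == parent_value]
    counts = Counter(outcomes)
    values = ["rain", "dry"]
    total = len(outcomes) + alpha * len(values)
    return {v: (counts[v] + alpha) / total for v in values}

print(cpd(data, "cloudy"))             # MLE:      rain 2/3, dry 1/3
print(cpd(data, "cloudy", alpha=1.0))  # Bayesian: rain 0.6, dry 0.4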
3. Genetic Programming (GP):
o Evolve computer programs or symbolic expressions to solve problems.
o Represent solutions as tree structures.
4. Differential Evolution (DE):
o Specializes in handling continuous optimization problems.
o Operates on vector-based representations and relies on mutation and
crossover (see the short sketch after this list).
5. Particle Swarm Optimization (PSO):
o Inspired by social behavior of animals like birds and fish.
o Solutions (particles) adjust their positions based on individual and
collective experiences.
6. Memetic Algorithms:
o Combine evolutionary algorithms with local search techniques to refine
solutions.
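Differential evolution (item 4 in the list above) is available off the shelf in SciPy; a minimal sketch minimizing a simple continuous test function:

# Sketch: continuous optimization with differential evolution (SciPy).
import numpy as np
from scipy.optimize import differential_evolution

def sphere(x):
    # Simple convex test function; the global minimum is at the origin.
    return float(np.sum(x ** 2))

bounds = [(-5.0, 5.0)] * 3                 # three decision variables
result = differential_evolution(sphere, bounds, seed=0)
print(result.x, result.fun)                # near-zero vector, near-zero value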
Applications of Evolutionary Algorithms in Soft Computing
1. Optimization Problems:
o Scheduling (e.g., job-shop scheduling, task allocation).
o Network design (e.g., routing, topology optimization).
o Machine learning (e.g., hyperparameter tuning, feature selection).
2. Engineering Design:
o Structural optimization.
o Control systems design.
3. Robotics:
o Path planning.
o Evolution of autonomous behaviors.
4. Data Mining and Machine Learning:
o Rule extraction.
o Model optimization.
5. Game Development:
o Strategy optimization.
o Procedural content generation.
6. Bioinformatics:
o Protein structure prediction.
o Gene sequencing.
Advantages of Evolutionary Algorithms
Flexibility: Can handle a wide variety of problem types.
Global Optimization: Capable of avoiding local minima.
Scalability: Applicable to high-dimensional and complex problems.
Robustness: Performs well under noisy or uncertain environments.
Challenges in Evolutionary Algorithms
Computational Cost: May require significant computational resources for
large-scale problems.
Parameter Tuning: Performance depends on setting appropriate parameters
like population size, mutation rate, etc.
Premature Convergence: Risk of converging to suboptimal solutions.
Representation Dependency: Effectiveness can depend heavily on the
representation of solutions.