
Unit-II: Data Analysis

Data analysis refers to the process of inspecting, cleaning, transforming, and modeling
data with the goal of discovering useful information, drawing conclusions, and
supporting decision-making. It is a critical component of many fields, including
business, finance, healthcare, engineering, and the social sciences. The data analysis
process typically involves the following steps:
1. Data collection: This step involves gathering data from various sources, such as
databases, surveys, sensors, and social media.
2. Data cleaning: This step involves removing errors, inconsistencies, and outliers
from the data. It may also involve imputing missing values, transforming
variables, and normalizing the data.
3. Data exploration: This step involves visualizing and summarizing the data to
gain insights and identify patterns. This may include statistical analyses, such as
descriptive statistics, correlation analysis, and hypothesis testing.
4. Data modeling: This step involves developing mathematical models to predict
or explain the behavior of the data. This may include regression analysis, time
series analysis, machine learning, and other techniques.
5. Data visualization: This step involves creating visual representations of the data
to communicate insights and findings to stakeholders. This may include charts,
graphs, tables, and other visualizations.
6. Decision-making: This step involves using the results of the data analysis to
make informed decisions, develop strategies, and take actions.
Data analysis is a complex and iterative process that requires expertise in statistics,
programming, and domain knowledge. It is often performed using specialized
software, such as R, Python, SAS, and Excel, as well as cloud-based platforms, such as
Amazon Web Services and Google Cloud Platform. Effective data analysis can lead to
better business outcomes, improved healthcare outcomes, and a deeper understanding
of complex phenomena.
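As a small illustration of the cleaning and exploration steps, the following is a minimal sketch in Python using pandas; the column names and values are hypothetical and only show the flow from collection to exploration.

```python
# A minimal, hypothetical sketch of the cleaning/exploration steps using pandas.
import pandas as pd

# 1. Data collection (here: a small in-memory dataset; in practice, read from a file or database)
df = pd.DataFrame({
    "age":    [23, 25, None, 31, 35, 40],
    "income": [30000, 32000, 35000, None, 52000, 60000],
})

# 2. Data cleaning: impute missing values with the column mean
df = df.fillna(df.mean())

# 3. Data exploration: descriptive statistics and correlation
print(df.describe())
print(df.corr())
```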
Regression Modeling
Regression modeling is a statistical technique used to examine the relationship
between a dependent variable (also called the outcome or response variable) and one
or more independent variables (also called predictors or explanatory variables). The
goal of regression modeling is to identify the nature and strength of the relationship
between the dependent variable and the independent variable(s) and to use this
information to make predictions about the dependent variable.
There are many different types of regression models, including linear regression,
logistic regression, polynomial regression, and multivariate regression. Linear
regression is one of the most commonly used types of regression modeling, and it
assumes that the relationship between the dependent variable and the independent
variable(s) is linear.
Regression modeling is used in a wide range of fields, including economics, finance,
psychology, and epidemiology, among others. It is often used to understand the
relationships between different factors and to make predictions about future outcomes.
Regression
Simple Linear Regression / Linear Regression
In statistics, linear regression is a linear approach to modeling the relationship between
a scalar response (or dependent variable) and one or more explanatory variables (or
independent variables). The case of one explanatory variable is called simple linear
regression.
 Linear regression is used to predict a continuous dependent variable using a
given set of independent variables.
 Linear regression is used for solving regression problems.
 In linear regression, the value of a continuous variable is predicted.
 Linear regression tries to find the best-fit line, through which the output can be
easily predicted.
 The least squares estimation method is used to estimate the regression coefficients.
 The output of linear regression must be a continuous value, such as price,
age, etc.
 In linear regression, the relationship between the dependent variable
and the independent variable(s) must be linear.
 In linear regression, there may be collinearity between the independent
variables.
Some regression examples:
 Regression analysis is used in statistics to find trends in data. For example, you might
guess that there is a connection between how much you eat and how much you weigh;
regression analysis can help you quantify that.
 Regression analysis will provide you with an equation for a graph so that you can
make predictions about your data. For example, if you have been putting on weight over
the last few years, it can predict how much you will weigh in ten years' time if you continue
to put on weight at the same rate.
 Regression with a single explanatory variable is also called simple linear regression. It
establishes the relationship between two variables using a straight line. If two or more
explanatory variables have a linear relationship with the dependent variable, the
regression is called multiple linear regression.
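A minimal sketch of simple linear regression in Python (using scikit-learn, with made-up weight-over-time data) illustrates fitting a best-fit line by least squares and making a prediction:

```python
# Simple linear regression sketch with illustrative (made-up) data: weight vs. year.
import numpy as np
from sklearn.linear_model import LinearRegression

years = np.array([[1], [2], [3], [4], [5]])      # independent variable (year)
weight = np.array([70, 71.5, 73, 74.2, 76])      # continuous dependent variable (kg)

model = LinearRegression().fit(years, weight)    # least squares fit of the best-fit line
print(model.coef_[0], model.intercept_)          # slope and intercept of the line

# Predict the weight several years ahead if the same trend continues
print(model.predict([[15]]))
```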

Logistic Regression
Logistic regression is used to resolve classification problems where, given an element, you
have to classify it into one of N categories. Typical examples are classifying a given
mail as spam or not, or finding which category a given vehicle belongs to
(car, truck, van, etc.). In other words, the output is a finite set of discrete values.
 Logistic regression is used to predict a categorical dependent variable using a
given set of independent variables.
 Logistic regression is used for solving classification problems.
 In logistic regression, we predict the values of categorical variables.
 In logistic regression, we find the S-curve (sigmoid) by which we can classify the
samples.
 The maximum likelihood estimation method is used to estimate the model parameters.
 The output of logistic regression must be a categorical value such as 0 or 1,
Yes or No, etc.
 In logistic regression, it is not required to have a linear relationship between
the dependent and independent variables.
 In logistic regression, there should not be collinearity between the independent
variables.
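A minimal scikit-learn sketch (with tiny made-up data) shows how logistic regression produces a categorical output and class probabilities via the S-curve:

```python
# Logistic regression sketch: classify points into category 0 or 1 (illustrative data).
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])   # independent variable
y = np.array([0, 0, 0, 1, 1, 1])                           # categorical dependent variable

clf = LogisticRegression().fit(X, y)       # parameters estimated by maximum likelihood
print(clf.predict([[2.5], [4.5]]))         # predicted categories, e.g. [0 1]
print(clf.predict_proba([[3.5]]))          # class probabilities from the sigmoid (S-curve)
```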

Multivariate Analysis

Multivariate analysis is a statistical technique used to examine the relationships


between multiple variables simultaneously. It is used when there are multiple
dependent variables and/or independent variables that are interrelated. Multivariate
analysis is used in a wide range of fields, including social sciences, marketing,
biology, and finance, among others. There are many different types of multivariate
analysis, including multivariate regression, principal component analysis, factor
analysis, cluster analysis, and discriminant analysis.
Multivariate regression is similar to linear regression, but it involves more than one independent
variable. It is used to predict the value of a dependent variable based on two or more
independent variables. Principal component analysis (PCA) is a technique used to
reduce the dimensionality of data by identifying patterns and relationships between
variables. Factor analysis is a technique used to identify underlying factors that explain
the correlations between multiple variables. Cluster analysis is a technique used to
group objects or individuals into clusters based on similarities or dissimilarities.
Discriminant analysis is a technique used to determine which variables discriminate
between two or more groups.
Overall, multivariate analysis is a powerful tool for examining complex relationships
between multiple variables, and it can help researchers and analysts gain a deeper
understanding of the data they are working with.
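As one small illustration of a multivariate technique, the following sketch applies cluster analysis (k-means) to made-up two-variable data; principal component analysis is covered with a worked example later in this unit.

```python
# Cluster analysis sketch: group observations described by two variables (illustrative data).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.2], [1.1, 0.9], [0.9, 1.0],     # one group of similar observations
              [5.0, 5.2], [5.1, 4.8], [4.9, 5.1]])    # a second, distant group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each observation
print(kmeans.cluster_centers_)  # the two cluster centres
```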
Bayesian Modeling
Bayesian modeling is a statistical modeling approach that uses Bayesian inference to
make predictions and estimate parameters. It is named after Thomas Bayes, an 18th-
century statistician who developed the Bayes theorem, which is a key component of
Bayesian modeling.
In Bayesian modeling, prior information about the parameters of interest is combined
with data to produce a posterior distribution. This posterior distribution represents the
updated probability distribution of the parameters given the data and the prior
information. The posterior distribution is used to make inferences and predictions
about the parameters.
Bayesian modeling is particularly useful when there is limited data or when the data is
noisy or uncertain. It allows for the incorporation of prior knowledge and beliefs into
the modeling process, which can improve the accuracy and precision of predictions.
Bayesian modeling is used in a wide range of fields, including finance, engineering,
ecology, and social sciences. Some examples of Bayesian modeling applications
include predicting stock prices, estimating the prevalence of a disease in a population,
and analyzing the effects of environmental factors on a species.
Bayes Theorem
Goal — To determine the most probable hypothesis, given the data D plus any initial
knowledge about the prior probabilities of the various hypotheses in H.
Prior probability of h, P (h) — it reflects any background knowledge we have about
the chance that h is a correct hypothesis (before having observed the data).
Prior probability of D, P (D) — it reflects the probability that training data D will be
observed given no knowledge about which hypothesis h holds.
Conditional Probability of observation D, P (D|h) — it denotes the probability of
observing data D given some world in which hypothesis h holds.
Posterior probability of h, P (h|D) — it represents the probability that h holds given the
observed training data D. It reflects our confidence that h holds after we have seen the
training data D and it is the quantity that Machine Learning researchers are interested
in.
Bayes' theorem allows us to compute P(h|D):
P(h|D) = P(D|h) P(h) / P(D)
Maximum A Posteriori (MAP) Hypothesis and Maximum Likelihood.
Goal — To find the most probable hypothesis h from a set of candidate hypotheses H
given the observed data D. MAP Hypothesis,

hMAP = argmax h∈H P(h|D)
     = argmax h∈H P(D|h) P(h) / P(D)
     = argmax h∈H P(D|h) P(h)

If every hypothesis in H is equally probable a priori, we only need to consider the
likelihood of the data D given h, P(D|h). Then, hMAP becomes the Maximum
Likelihood hypothesis,

hML = argmax h∈H P(D|h)

Overall, Bayesian modeling is a powerful tool for making predictions and estimating
parameters in situations where there is uncertainty and prior information is available.
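A small numerical sketch of Bayes' theorem and the MAP choice in Python, using assumed (purely illustrative) prior and likelihood values for two competing hypotheses:

```python
# Bayes' theorem sketch: P(h|D) = P(D|h) * P(h) / P(D), with assumed illustrative numbers.
priors = {"h1": 0.3, "h2": 0.7}          # P(h): assumed prior probabilities
likelihoods = {"h1": 0.8, "h2": 0.1}     # P(D|h): assumed likelihood of the observed data

# P(D) = sum over hypotheses of P(D|h) * P(h)
p_data = sum(likelihoods[h] * priors[h] for h in priors)

posteriors = {h: likelihoods[h] * priors[h] / p_data for h in priors}
print(posteriors)                         # P(h|D) for each hypothesis

# MAP hypothesis: the h maximizing P(D|h) * P(h)
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])
print(h_map)                              # here "h1", since 0.8*0.3 > 0.1*0.7
```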

Inference and Bayesian networks


Inference in Bayesian networks is the process of using probabilistic reasoning to make
predictions or draw conclusions about a system or phenomenon. Bayesian networks
are graphical models that represent the relationships between variables using a directed
acyclic graph, where nodes represent variables and edges represent probabilistic
dependencies between the variables.
Inference in Bayesian networks involves calculating the posterior probability
distribution of one or more variables given evidence about other variables in the
network. This can be done using Bayesian inference, which involves updating the prior
probability distribution of the variables using Bayes’ theorem and the observed
evidence.
The posterior distribution can be used to make predictions or draw conclusions about
the system or phenomenon being modeled. For example, in a medical diagnosis
system, the posterior probability of a particular disease given a set of symptoms can be
calculated using a Bayesian network. This can help clinicians make a more accurate
diagnosis and choose appropriate treatments.
Bayesian networks and inference are widely used in many fields, including artificial
intelligence, decision making, finance, and engineering. They are particularly useful in
situations where there is uncertainty and probabilistic relationships between variables
need to be modeled and analyzed.
BAYESIAN NETWORKS
Abbreviation: BBN (Bayesian Belief Network)
Synonyms: Bayes (ian) network, Bayes(ian) model, Belief network, Decision network,
or probabilistic directed acyclic graphical model.
A BBN is a probabilistic graphical model that represents a set of variables and their
conditional dependencies via a Directed Acyclic Graph (DAG).

 BBNs enable us to model and reason about uncertainty. BBNs accommodate
both subjective probabilities and probabilities based on objective data.
 The most important use of BBNs is in revising probabilities in the light of actual
observations of events.
 Nodes represent variables in the Bayesian sense: observable quantities, hidden
variables or hypotheses. Edges represent conditional dependencies.
 Each node is associated with a probability function that takes, as input, a
particular set of values for the node’s parent variables, and
outputs the probability of the values of the variable represented by the node.
 Prior Probabilities: e.g. P(RAIN)
 Conditional Probabilities: e.g. P(SPRINKLER | RAIN)
 Joint Probability Function:
P(GRASS WET, SPRINKLER, RAIN) = P(GRASS WET | RAIN,
SPRINKLER) * P(SPRINKLER | RAIN) * P(RAIN)

 Typically the probability functions are described in table form.

 A BN cannot be used to model undirected correlation relationships between random
variables; it encodes only directed conditional dependencies.
Overall, inference in Bayesian networks is a powerful tool for making predictions and
drawing conclusions in situations where there is uncertainty and complex probabilistic
relationships between variables.
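The grass-wet example above can be made concrete with a tiny enumeration-based inference sketch; the conditional probability values below are assumed purely for illustration and are not taken from the text.

```python
# Inference by enumeration in the RAIN -> SPRINKLER -> GRASS WET network.
# All probability values are illustrative assumptions.
from itertools import product

P_rain = {True: 0.2, False: 0.8}                              # P(RAIN)
P_sprinkler = {True: {True: 0.01, False: 0.99},               # P(SPRINKLER | RAIN)
               False: {True: 0.4, False: 0.6}}
P_wet = {(True, True): 0.99, (True, False): 0.9,              # P(GRASS WET | SPRINKLER, RAIN)
         (False, True): 0.8, (False, False): 0.0}

def joint(wet, sprinkler, rain):
    # P(GRASS WET, SPRINKLER, RAIN) = P(GRASS WET | SPRINKLER, RAIN) * P(SPRINKLER | RAIN) * P(RAIN)
    p_w = P_wet[(sprinkler, rain)] if wet else 1 - P_wet[(sprinkler, rain)]
    return p_w * P_sprinkler[rain][sprinkler] * P_rain[rain]

# Posterior P(RAIN = true | GRASS WET = true), summing the joint over SPRINKLER
evidence = sum(joint(True, s, r) for s, r in product([True, False], repeat=2))
posterior = sum(joint(True, s, True) for s in [True, False]) / evidence
print(round(posterior, 3))   # probability it rained, given that the grass is wet
```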
Support Vector and Kernel Methods

Support vector machines (SVMs) and kernel methods are commonly used in machine
learning and pattern recognition to solve classification and regression problems.

SVMs are a type of supervised learning algorithm that aims to find the optimal
hyperplane that separates the data into different classes. The optimal hyperplane is the
one that maximizes the margin, or the distance between the hyperplane and the closest
data points from each class. SVMs can also use kernel functions to transform the
original input data into a higher dimensional space, where it may be easier to find a
separating hyperplane.

Kernel methods are a class of algorithms that use kernel functions to compute the
similarity between pairs of data points. Kernel functions can transform the input data
into a higher dimensional feature space, where linear methods can be applied more
effectively. Some commonly used kernel functions include linear, polynomial, and
radial basis functions.

Kernel methods are used in a variety of applications, including image recognition,
speech recognition, and natural language processing. They are particularly useful in
situations where the data is non-linear and the relationship between variables is
complex.
History of SVM
 SVM is related to statistical learning theory.
 SVM was first introduced in 1992.
 SVM became popular because of its success in handwritten digit recognition: a
1.1% test error rate for SVM, the same as the error rate of a carefully
constructed neural network.
 SVM is now regarded as an important example of "kernel methods", one of the
key areas in machine learning.
Overall, SVMs and kernel methods are powerful tools for solving classification and
regression problems. They can handle complex data and provide accurate predictions,
making them valuable in many fields, including finance, healthcare, and engineering.
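A minimal scikit-learn sketch of an SVM classifier with an RBF kernel on made-up two-class data:

```python
# SVM classification sketch with an RBF kernel (illustrative data).
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],     # class 0
              [2.0, 2.1], [2.2, 1.9], [1.9, 2.2]])    # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)   # kernel implicitly maps data to a higher-dimensional space
print(clf.support_vectors_)                               # the points closest to the separating boundary
print(clf.predict([[0.5, 0.5], [1.8, 2.0]]))
```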
Analysis of Time Series: Linear System Analysis and Non-Linear Dynamics
Time Series analysis is a statistical technique used to analyze time-dependent data. It
involves studying the patterns and trends in the data over time and making predictions
about future values.
Linear systems analysis is a technique used in time series analysis to model the
behavior of a system using linear equations. Linear models assume that the
relationship between variables is linear and that the system is time-invariant, meaning
that the relationship between variables does not change over time. Linear systems
analysis involves techniques such as autoregressive (AR) and moving average (MA)
models, which use past values of a variable to predict future values.
Nonlinear dynamics is another approach to time series analysis that considers systems
that are not described by linear equations. Nonlinear systems are often more complex
and can exhibit chaotic behavior, making them more difficult to model and predict.
Nonlinear dynamics involves techniques such as chaos theory and fractal analysis,
which use mathematical concepts to describe the behavior of nonlinear systems.
Both linear systems analysis and nonlinear dynamics have applications in a wide range
of fields, including finance, economics, and engineering. Linear models are often used
in situations where the data is relatively simple and the relationship between variables
is well understood. Nonlinear dynamics is often used in situations where the data is
more complex and the relationship between variables is not well understood.
There are several components of time series analysis, including:
1. Trend Analysis: Trend analysis is used to identify the long-term patterns and trends
in the data. It can be a linear or non-linear trend and may show an upward, downward
or flat trend.
2. Seasonal Analysis: Seasonal analysis is used to identify the recurring patterns in the
data that occur within a fixed time period, such as a week, month, or year.
3. Cyclical Analysis: Cyclical analysis is used to identify the patterns that are not
necessarily regular or fixed in duration, but do show a tendency to repeat over time,
such as economic cycles or business cycles.
4. Irregular Analysis: Irregular analysis is used to identify any random fluctuations or
noise in the data that cannot be attributed to any of the above components.
5. Forecasting: Forecasting is the process of predicting future values of a time series
based on its past behavior. It can be done using various statistical techniques such as
moving averages, exponential smoothing, and regression analysis.
Overall, time series analysis is a powerful tool for studying time-dependent data and
making predictions about future values. Linear systems analysis and nonlinear
dynamics are two approaches to time series analysis that can be used in different
situations to model and predict complex systems.
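A minimal sketch of a linear autoregressive AR(1) model fitted by least squares with numpy; the series here is synthetic and only illustrates the idea of predicting a value from its previous value.

```python
# AR(1) sketch: fit x_t = a * x_(t-1) + b by least squares and forecast one step ahead.
import numpy as np

rng = np.random.default_rng(0)
x = np.zeros(100)
for t in range(1, 100):                      # synthetic series with a known autoregressive structure
    x[t] = 0.8 * x[t - 1] + rng.normal(scale=0.1)

X_prev, X_next = x[:-1], x[1:]               # pairs (x_(t-1), x_t)
a, b = np.polyfit(X_prev, X_next, deg=1)     # least squares line through the pairs
print(a, b)                                  # the slope should be close to 0.8

forecast = a * x[-1] + b                     # one-step-ahead forecast
print(forecast)
```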
Rule Induction
Rule induction is a machine learning technique used to identify patterns in data and
create a set of rules that can be used to make predictions or decisions about new data.
It is often used in decision tree algorithms and can be applied to both classification and
regression problems.
The rule induction process involves analyzing the data to identify common patterns
and relationships between the variables. These patterns are used to create a set of rules
that can be used to classify or predict new data. The rules are typically in the form of
"if-then" statements, where the "if" part specifies the conditions under which the rule
applies and the "then" part specifies the action or prediction to be taken.
Rule induction algorithms can be divided into two main types: top-down and bottom-up.
Top-down algorithms start with a general rule that applies to the entire dataset and
then refine the rule based on the data, while bottom-up algorithms start with individual
data points and then group them together based on common attributes. Rule induction
has many applications in fields such as finance, healthcare, and marketing. For
example, it can be used to identify patterns in financial data to predict stock prices or
to analyze medical data to identify risk factors for certain diseases.
Overall, rule induction is a powerful machine learning technique that can be used to
identify patterns in data and create rules that can be used to make predictions or
decisions. It is a useful tool for solving classification and regression problems and has
many applications in various fields.
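A minimal sketch of rule induction via a decision tree in scikit-learn; the printed tree can be read directly as a set of "if-then" rules (the data here is the built-in iris dataset, used only for illustration).

```python
# Rule induction sketch: learn a decision tree and print it as if-then style rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each path from the root to a leaf corresponds to one "if ... then class = ..." rule.
print(export_text(tree, feature_names=list(iris.feature_names)))
```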
Neural Networks: Learning and Generalization
Neural networks are a class of machine learning algorithms that are inspired by the
structure and function of the human brain. They are used to learn complex patterns and
relationships in data and can be used for a variety of tasks, including classification,
regression, and clustering.
Learning in neural networks refers to the process of adjusting the weights and biases of
the network to improve its performance on a particular task. This is typically done
through a process called backpropagation, which involves propagating the errors from
the output layer back through the network and adjusting the weights and biases
accordingly.
Generalization in neural networks refers to the ability of the network to perform well
on new, unseen data. A network that has good generalization performance is able to
accurately predict the outputs for new inputs that were not included in the training set.
Generalization performance is typically evaluated using a separate validation set or by
cross-validation.
Overfitting is a common problem in neural networks, where the network becomes too
complex and starts to fit the noise in the training data, rather than the underlying
patterns. This can result in poor generalization performance on new data. Techniques
such as regularization, early stopping, and dropout are often used to prevent overfitting
and improve generalization performance.
Overall, learning and generalization are two important concepts in neural networks.
Learning involves adjusting the weights and biases of the network to improve its
performance, while generalization refers to the ability of the network to perform well
on new, unseen data. Effective techniques for learning and generalization are critical
for building accurate and useful neural network models.
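A minimal scikit-learn sketch showing learning (weights adjusted by backpropagation) and a guard against overfitting (early stopping on a held-out validation split); the dataset is synthetic and the network size is an arbitrary illustrative choice.

```python
# Neural network sketch: backpropagation training with early stopping to aid generalization.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(16,),
                    early_stopping=True,          # stop when the validation score stops improving
                    validation_fraction=0.2,
                    max_iter=1000,
                    random_state=0).fit(X_train, y_train)

print(net.score(X_train, y_train))   # training accuracy
print(net.score(X_test, y_test))     # generalization: accuracy on unseen data
```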
Neural Network Concepts
Neurons are the fundamental constituents of the brain. The brain contains about
10^10 basic units called neurons, and each unit, in turn, is connected to about 10^4 other
neurons. A neuron is a small cell that receives electro-chemical signals from various
sources and, in turn, responds by transmitting electrical impulses to other neurons.
An average brain weighs about 1.5 kg, and an average neuron weighs about 1.5
× 10^-9 g. Some neurons perform input and output operations, referred to as afferent
and efferent cells respectively. The remaining neurons are part of interconnected
networks responsible for information storage and signal transmission.
Structure of Human Brain
 A neuron consists of:
 Soma - the cell body of the neuron, which contains the nucleus.
 Dendrites - the many filaments attached to the soma. These are the
input channels: inputs from other neurons arrive through the dendrites.
 Axon - another type of link attached to the soma. Axons are
electrically active and serve as output channels.
 Synapse - the junction point of an axon and a dendrite.

Working of Biological NN

If the cumulative inputs received by the soma raise the internal electrical potential of
the cell, known as the cell membrane potential, above a threshold, the neuron 'fires' by
propagating an action potential down the axon to excite other neurons. The axon
terminates in a specialized contact called a synapse. The synapse is a minute gap at the
end of the dendrite link and contains a neurotransmitter fluid. It is responsible for
accelerating or retarding the electrical charges to the soma. In general, a single neuron
can have many synaptic inputs and synaptic outputs. The size of a synapse is believed
to be related to learning: synapses with a larger area are excitatory, while those with a
smaller area are inhibitory.
Artificial Neuron and Its Model
An artificial neural network (ANN) is an efficient information processing system
that resembles a biological neural network in its characteristics. An ANN possesses a
large number of highly interconnected processing elements called nodes, units, or
neurons, which usually operate in parallel and are configured in regular
architectures.
Neurons are connected to each other by connection links. Each connection link is
associated with a weight. The link carries information about the input signal, and this
information is used by the neuron to solve a particular problem.
An ANN's collective behavior is characterized by its ability to learn, recall, and
generalize patterns or data in a manner similar to the human brain, and thereby its
capability to model networks of biological neurons as found in the brain.
To understand the basic operation of a neural net, consider a simple net with two input
neurons, X1 and X2, transmitting signals to an output neuron Y. The inputs are
connected to the output neuron Y over interconnection links (W1 and W2).

Here, x1, x2 are the inputs and w1, w2 are the weights attached to the input links.
Thus, the weights here are multiplicative factors on the inputs that account for the
strength of the synapse.
Net input: yin = x1·w1 + x2·w2
To generate the final output y, the net input is passed to an activation function (f), also
called a transfer function or squashing function, which releases the output. Hence
y = f(yin)
In the simplest (linear) case, the output y is obtained by multiplying the net input
directly by a constant slope.
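The two-input neuron described above can be written out directly; this is a minimal sketch with made-up input and weight values and a simple step (threshold) activation function.

```python
# A single artificial neuron: net input y_in = x1*w1 + x2*w2, output y = f(y_in).
def step(y_in, threshold=0.5):
    # simple threshold (squashing) activation function
    return 1 if y_in >= threshold else 0

x1, x2 = 0.6, 0.8         # inputs (illustrative values)
w1, w2 = 0.4, 0.3         # weights on the interconnection links

y_in = x1 * w1 + x2 * w2  # net input to neuron Y
y = step(y_in)            # final output after the activation function
print(y_in, y)            # 0.48 -> output 0 with this threshold
```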
ANN Architectures

Competitive Learning
Competitive learning is a type of machine learning technique in which a set of neurons
compete to be activated by input data. The neurons are organized into a layer, and each
neuron receives the same input data. However, only one neuron is activated, and the
competition is based on a set of rules that determine which neuron is activated.
The competition in competitive learning is typically based on a measure of similarity
between the input data and the weights of each neuron. The neuron with the highest
similarity to the input data is activated, and the weights of that neuron are updated to
become more similar to the input data. This process is repeated for multiple iterations,
and over time, the neurons learn to become specialized in recognizing different types
of input data.
Competitive learning is often used for unsupervised learning tasks, such as clustering
or feature extraction. In clustering, the neurons learn to group similar input data into
clusters, while in feature extraction, the neurons learn to recognize specific features in
the input data.
One of the advantages of competitive learning is that it can be used to discover hidden
structures and patterns in data without the need for labeled data. This makes it
particularly useful for applications such as image and speech recognition, where
labeled data can be difficult and expensive to obtain.
Overall, competitive learning is a powerful machine learning technique that can be
used for a variety of unsupervised learning tasks. It involves a set of neurons that
compete to be activated by input data, and over time, the neurons learn to become
specialized in recognizing different types of input data.
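A minimal numpy sketch of winner-take-all competitive learning: the most similar neuron wins and only its weights move toward the input (the data and learning rate are illustrative).

```python
# Competitive (winner-take-all) learning sketch on illustrative 2-D inputs.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((2, 2))                    # 2 competing neurons, each with a 2-D weight vector
inputs = np.array([[0.1, 0.2], [0.15, 0.1],     # inputs near one region
                   [0.9, 0.8], [0.85, 0.95]])   # inputs near another region
lr = 0.5                                        # learning rate

for _ in range(20):                             # repeat for several iterations
    for x in inputs:
        distances = np.linalg.norm(weights - x, axis=1)   # similarity measure (Euclidean distance)
        winner = np.argmin(distances)                     # the neuron most similar to the input wins
        weights[winner] += lr * (x - weights[winner])     # move the winner's weights toward the input

print(weights)   # each neuron has specialized on one group of inputs
```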
Principal Component Analysis and Neural Networks

Principal component analysis (PCA) and neural networks are both machine learning
techniques that can be used for a variety of tasks, including data compression, feature
extraction, and dimensionality reduction.
PCA is a linear technique that involves finding the principal components of a dataset,
which are the directions of greatest variance. The principal components can be used to
reduce the dimensionality of the data, while preserving as much of the original
variance as possible.
Neural networks, on the other hand, are nonlinear techniques that involve multiple
layers of interconnected neurons. Neural networks can be used for a variety of tasks,
including classification, regression, and clustering. They can also be used for feature
extraction, where the network learns to identify the most important features of the
input data.
PCA and neural networks can be used together for a variety of tasks. For example,
PCA can be used to reduce the dimensionality of the data before feeding it into a
neural network. This can help to improve the performance of the network by reducing
the amount of noise and irrelevant information in the input data. Neural networks can
also be used to improve the performance of PCA. In some cases, PCA can be limited
by its linear nature, and may not be able to capture complex nonlinear relationships in
the data. By combining PCA with a neural network, the network can learn to capture
these nonlinear relationships and improve the accuracy of the PCA results.
Overall, PCA and neural networks are both powerful machine learning techniques that
can be used for a variety of tasks. When used together, they can improve the
performance and accuracy of each technique and help to solve more complex
problems.
Dimension Reduction
In pattern recognition, dimension reduction is defined as follows:
 It is a process of converting a data set having vast dimensions into a data set
with lesser dimensions.
 It ensures that the converted data set conveys similar information concisely.

Example- Consider the following example-


 The following graph shows two dimensions x1 and x2.
 x1 represents the measurement of several objects in cm.
 x2 represents the measurement of several objects in inches.
In machine learning,
 Using both these dimensions convey similar information.
 Also, they introduce a lot of noise in the system.
 So, it is better to use just one dimension.
Using dimension reduction techniques-
 We convert the dimensions of data from 2 dimensions (x1 and x2) to 1
dimension (z1).
 It makes the data relatively easier to explain.

Benefits-
Dimension reduction offers several benefits such as
 It compresses the data and thus reduces the storage space requirements.
 It reduces the time required for computation since less dimensions require less
computation.
 It eliminates the redundant features.
 It improves the model performance.
Dimension Reduction Techniques-
The two popular and well-known dimension reduction techniques are-
1. Principal Component Analysis (PCA)
2. Fisher Linear Discriminant Analysis (LDA)
In this article, we will discuss about Principal Component Analysis.
Principal Component Analysis-

Principal Component Analysis is a well-known dimension reduction technique.

 It transforms the variables into a new set of variables called principal
components.
 These principal components are linear combinations of the original variables and are
orthogonal.
 The first principal component accounts for most of the possible variation in the
original data.
 The second principal component captures as much of the remaining variance as possible.
 There can be only two principal components for a two-dimensional data set.
PCA Algorithm-

The steps involved in PCA Algorithm are as follows-


 Step-01: Get the data.
 Step-02: Compute the mean vector (µ).
 Step-03: Subtract the mean from the given data.
 Step-04: Calculate the covariance matrix.
 Step-05: Calculate the eigenvectors and eigenvalues of the covariance matrix.
 Step-06: Choose components and form a feature vector.
 Step-07: Derive the new data set.

Problem: Given the data {2, 3, 4, 5, 6, 7; 1, 5, 3, 6, 7, 8}, compute the principal components
using the PCA algorithm.
Step-01: Get Data,
The given feature vectors are
 x1= (2, 1)
 x2= (3, 5)
 x3= (4, 3)
 x4= (5, 6)

 x5= (6, 7)
 x6= (7,8)
Step-02: Calculate the mean vector (µ)

Mean vector (µ) = ((2+3+4+5+6+7)/6, (1+5+3+6+7+8)/6) = (4.5, 5)
Step-03: Subtract the mean vector (µ) from each feature vector

 x1 - µ = (2-4.5, 1-5) = (-2.5, -4)
 x2 - µ = (3-4.5, 5-5) = (-1.5, 0)
 x3 - µ = (4-4.5, 3-5) = (-0.5, -2)
 x4 - µ = (5-4.5, 6-5) = (0.5, 1)
 x5 - µ = (6-4.5, 7-5) = (1.5, 2)
 x6 - µ = (7-4.5, 8-5) = (2.5, 3)

Step-04: Calculate the covariance matrix

Covariance matrix = (1/n) Σ (xi - µ)(xi - µ)^T

(Below, [a b; c d] denotes the 2×2 matrix with rows (a, b) and (c, d).)

m1 = (x1 - µ)(x1 - µ)^T = [-2.5; -4] [-2.5 -4] = [6.25 10; 10 16]
m2 = (x2 - µ)(x2 - µ)^T = [-1.5; 0] [-1.5 0] = [2.25 0; 0 0]
m3 = (x3 - µ)(x3 - µ)^T = [-0.5; -2] [-0.5 -2] = [0.25 1; 1 4]
m4 = (x4 - µ)(x4 - µ)^T = [0.5; 1] [0.5 1] = [0.25 0.5; 0.5 1]
m5 = (x5 - µ)(x5 - µ)^T = [1.5; 2] [1.5 2] = [2.25 3; 3 4]
m6 = (x6 - µ)(x6 - µ)^T = [2.5; 3] [2.5 3] = [6.25 7.5; 7.5 9]

On adding the above matrices and dividing by 6, we get

Covariance matrix = (m1 + m2 + m3 + m4 + m5 + m6) / 6
                  = (1/6) [17.5 22; 22 34]
                  = [2.92 3.67; 3.67 5.67]

Step-05: Calculate the Eigen values and Eigen vectors of the covariance matrix. λ is an
Eigen value for a matrix M if it is a solution of the characteristic equation |M – λI| = 0.
So, for M = [2.92 3.67; 3.67 5.67], we have |M - λI| = 0. From here,

(2.92 - λ)(5.67 - λ) - (3.67 × 3.67) = 0
16.56 - 2.92λ - 5.67λ + λ² - 13.47 = 0
λ² - 8.59λ + 3.09 = 0
Solving this quadratic equation, we get λ = 8.22, 0.38
Thus, two Eigen values are λ1 = 8.22 and λ2 = 0.38.
Clearly, the second Eigen value is very small compared to the first Eigen value.
So, the second Eigen vector can be left out.
Eigen vector corresponding to the greatest Eigen value is the principal component for
the given data set.
So we find the Eigen vector corresponding to Eigen value λ1.
We use the following equation to find the Eigen vector- MX = λX
Where-
M = Covariance Matrix
X = Eigen vector
λ = Eigen value
Substituting the values into MX = λX with λ = λ1 = 8.22, we get:

2.92X1 + 3.67X2 = 8.22X1
3.67X1 + 5.67X2 = 8.22X2

On simplification, we get:
5.30X1 = 3.67X2 ……… (1)
3.67X1 = 2.55X2 ……… (2)

From (1) and (2), X1 = 0.69X2.
Taking X2 = 1, the eigenvector corresponding to λ1 is approximately (0.69, 1) (up to scaling);
this direction is the principal component of the given data set.
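The hand computation above can be checked with a short numpy sketch, using the same data and the same 1/n covariance convention as in the worked example:

```python
# A minimal numpy check of the PCA worked example above.
import numpy as np

X = np.array([(2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8)], dtype=float)

mu = X.mean(axis=0)                      # mean vector (4.5, 5)
centered = X - mu                        # subtract the mean from each feature vector
cov = centered.T @ centered / len(X)     # covariance with 1/n, as in the example

eigvals, eigvecs = np.linalg.eig(cov)    # eigenvalues and eigenvectors
order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(cov)             # approx [[2.92, 3.67], [3.67, 5.67]]
print(eigvals)         # approx [8.22, 0.38]
print(eigvecs[:, 0])   # first principal component, proportional to (0.69, 1) up to sign/scaling

# Project the centered data onto the first principal component (2 dimensions -> 1)
z1 = centered @ eigvecs[:, 0]
print(z1)
```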
Fuzzy Logic: Extracting Fuzzy Models from Data

Fuzzy logic is a type of logic that allows for degrees of truth, rather than just true or
false values. It is often used in machine learning to extract fuzzy models from data.

A fuzzy model is a model that uses fuzzy logic to make predictions or decisions based
on uncertain or incomplete data. Fuzzy models are particularly useful in situations
where traditional models may not work well, such as when the data is noisy or when
there is a lot of uncertainty or ambiguity in the data.

To extract a fuzzy model from data, the first step is to define the input and output
variables of the model. The input variables are the features or attributes of the data,
while the output variable is the target variable that we want to predict or classify.
Next, we use fuzzy logic to define the membership functions for each input and output
variable. The membership functions describe the degree of membership of each data
point to each category or class. For example, a data point may have a high degree of
membership to the category "low", but a low degree of membership to the category
"high".
Once the membership functions have been defined, we can use fuzzy inference to
make predictions or decisions based on the input data. Fuzzy inference involves using
the membership functions to determine the degree of membership of each data point to
each category or class, and then combining these degrees of membership to make a
prediction or decision.
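A minimal sketch of fuzzy membership functions and a simple fuzzy classification of a value into "low" and "high"; the linear membership shapes and the 0-10 range are assumed purely for illustration.

```python
# Fuzzy membership sketch: degrees of membership of a value x to the categories "low" and "high".
def mu_low(x, lo=0.0, hi=10.0):
    # membership 1 at lo, falling linearly to 0 at hi (assumed shape)
    return max(0.0, min(1.0, (hi - x) / (hi - lo)))

def mu_high(x, lo=0.0, hi=10.0):
    # membership 0 at lo, rising linearly to 1 at hi (assumed shape)
    return max(0.0, min(1.0, (x - lo) / (hi - lo)))

x = 3.0
degrees = {"low": mu_low(x), "high": mu_high(x)}
print(degrees)                         # {'low': 0.7, 'high': 0.3}: partly "low", partly "high"

# A crisp decision, if needed, picks the category with the highest degree of membership.
print(max(degrees, key=degrees.get))   # "low"
```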

Overall, extracting fuzzy models from data involves using fuzzy logic to define the
membership functions for each input and output variable, and then using fuzzy
inference to make predictions or decisions based on the input data. Fuzzy models are
particularly useful in situations where traditional models may not work well, and can
help to improve the accuracy and robustness of machine learning models.
Fuzzy Decision Trees

Fuzzy decision trees are a type of decision tree that use fuzzy logic to make decisions
based on uncertain or imprecise data. Decision trees are a type of supervised learning
technique that involves recursively partitioning the input space into regions that
correspond to different classes or categories.

Fuzzy decision trees extend traditional decision trees by allowing for degrees of
membership to each category or class, rather than just a binary classification. This is
particularly useful in situations where the data is uncertain or imprecise, and where a
single, crisp classification may not be appropriate.

To build a fuzzy decision tree, we start with a set of training data that consists of input-
output pairs. We then use fuzzy logic to determine the degree of membership of each
data point to each category or class. This is done by defining the membership functions
for each input and output variable, and using these to compute the degree of
membership of each data point to each category or class.
Next, we use the fuzzy membership values to construct a fuzzy decision tree. The tree
consists of a set of nodes and edges, where each node represents a test on one of the
input variables, and each edge represents a decision based on the result of the test. The
degree of membership of each data point to each category or class is used to determine
the probability of reaching each leaf node of the tree.

Fuzzy decision trees can be used for a variety of tasks, including classification,
regression, and clustering. They are particularly useful in situations where the data is
uncertain or imprecise, and where traditional decision trees may not work well.

Overall, fuzzy decision trees are a powerful machine learning technique that can be
used to make decisions based on uncertain or imprecise data. They extend traditional
decision trees by allowing for degrees of membership to each category or class, and
can help to improve the accuracy and robustness of machine learning models.
Stochastic Search Methods
Stochastic search methods are a class of optimization algorithms that use probabilistic
techniques to search for the optimal solution in a large search space. These methods
are commonly used in machine learning to find the best set of parameters for a model,
such as the weights in a neural network or the parameters in a regression model.
Stochastic search methods are often used when the search space is too large to
exhaustively search all possible solutions, or when the objective function is highly
nonlinear and has many local optima. The basic idea behind these methods is to
explore the search space by randomly sampling solutions and using probabilistic
techniques to move towards better solutions.
One common stochastic search method is the stochastic gradient descent (SGD)
algorithm. In this method, the objective function is optimized by iteratively updating
the parameters in the direction of the negative gradient of the objective function,
estimated on randomly sampled data points. The update rule includes a learning rate,
which controls the step size of the update. SGD is widely used in training neural
networks and other deep learning models.
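A minimal numpy sketch of stochastic gradient descent on a least-squares objective, with a synthetic dataset; the learning rate and epoch count are arbitrary illustrative choices.

```python
# SGD sketch: minimize the squared error of y ≈ w*x + b by sampling one point at a time.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=200)   # synthetic data with known w=3, b=0.5

w, b = 0.0, 0.0
lr = 0.1                                # learning rate: controls the step size

for epoch in range(50):
    for i in rng.permutation(len(x)):   # visit the samples in random order
        err = (w * x[i] + b) - y[i]     # prediction error on one randomly chosen point
        w -= lr * err * x[i]            # step in the direction of the negative gradient
        b -= lr * err

print(w, b)   # should end up close to 3.0 and 0.5
```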
Another stochastic search method is called simulated annealing. This method is based
on the physical process of annealing, which involves heating and cooling a material to
improve its properties. In simulated annealing, the search process starts with a high
temperature and gradually cools down over time. At each iteration, the algorithm
randomly selects a new solution and computes its fitness. If the new solution is better
than the current solution, it is accepted. However, if the new solution is worse, it may
still be accepted with a certain probability that decreases as the temperature decreases.
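A minimal sketch of simulated annealing minimizing a simple one-dimensional function with several local minima; the cooling schedule and the proposal step are illustrative assumptions.

```python
# Simulated annealing sketch: minimize f(x) = x^2 + 10*sin(x), which has several local minima.
import math
import random

def f(x):
    return x * x + 10 * math.sin(x)

random.seed(0)
x = 10.0                   # start from an arbitrary solution
temp = 10.0                # initial temperature

while temp > 1e-3:
    candidate = x + random.uniform(-1, 1)      # randomly propose a nearby solution
    delta = f(candidate) - f(x)
    # accept if better; if worse, accept with a probability that shrinks as temp decreases
    if delta < 0 or random.random() < math.exp(-delta / temp):
        x = candidate
    temp *= 0.99                               # cool down gradually

print(x, f(x))   # typically lands near the global minimum around x ≈ -1.3
```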
Other stochastic search methods include evolutionary algorithms, such as genetic
algorithms and particle swarm optimization, which mimic the process of natural
selection and evolution to search for the optimal solution.
Overall, stochastic search methods are powerful optimization techniques that are
widely used in machine learning and other fields. These methods allow us to
efficiently search large search spaces and find optimal solutions in the presence of
noise, uncertainty, and nonlinearity.
