
School of Computing Science and Engineering

Course Code : …………….. Course Name: Data Analytics

UNIT 2
MULTIVARIATE ANALYSIS
AND
BAYESIAN MODELING

Name of the Faculty: Ms. Kirti Program Name: BTech


Multivariate Analysis
There are many different techniques for multivariate analysis, and they can be divided into two categories:
• Dependence techniques
• Interdependence techniques
Multivariate analysis techniques: Dependence vs. interdependence
When we use the terms “dependence” and “interdependence,” we’re referring to different types of
relationships within the data. To give a brief explanation:
Dependence methods
• Dependence methods are used when one or more of the variables are dependent on others. Dependence
looks at cause and effect: can the values of two or more independent variables be used to explain, describe,
or predict the value of another, dependent variable? To give a simple example, the dependent variable
"weight" might be predicted by independent variables such as "height" and "age."
• In machine learning, dependence techniques are used to build predictive models. The analyst enters
input data into the model, specifying which variables are independent and which ones are dependent:
in other words, which variables they want the model to predict, and which variables they want the
model to use to make those predictions.
Cont…..
Interdependence methods
• Interdependence methods are used to understand the structural makeup
and underlying patterns within a dataset. In this case, no variables are
dependent on others, so you’re not looking for causal relationships.
Rather, interdependence methods seek to give meaning to a set of
variables or to group them together in meaningful ways.
• So: One is about the effect of certain variables on others, while the
other is all about the structure of the dataset.
Cont……..
Some useful multivariate analysis techniques are:
• Multiple linear regression
• Multiple logistic regression
• Multivariate analysis of variance (MANOVA)
• Factor analysis
• Cluster analysis
Multiple linear regression

Multiple linear regression is a dependence method which looks at the relationship between
one dependent variable and two or more independent variables. A multiple regression
model will tell you the extent to which each independent variable has a linear relationship
with the dependent variable. This is useful as it helps you to understand which factors are
likely to influence a certain outcome, allowing you to estimate future outcomes.
Example of multiple regression:
As a data analyst, you could use multiple regression to predict crop growth. In this
example, crop growth is your dependent variable and you want to see how different factors
affect it. Your independent variables could be rainfall, temperature, amount of sunlight, and
amount of fertilizer added to the soil. A multiple regression model would show you the
proportion of variance in crop growth that each independent variable accounts for.
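The crop-growth example above can be sketched with ordinary least squares. All numbers below are invented for illustration; a real analysis would typically use a library such as statsmodels or scikit-learn, but fitting with `numpy.linalg.lstsq` shows the idea of estimating one slope per independent variable plus the proportion of variance explained (R²):

```python
import numpy as np

# Hypothetical crop data: each row is one field's
# [rainfall (mm), temperature (°C), sunlight (hours), fertilizer (kg)].
X = np.array([
    [120.0, 22.0, 8.0, 3.0],
    [100.0, 25.0, 9.0, 2.5],
    [140.0, 20.0, 7.5, 4.0],
    [ 90.0, 27.0, 9.5, 2.0],
    [130.0, 21.0, 8.5, 3.5],
    [110.0, 24.0, 8.0, 3.0],
])
y = np.array([30.0, 28.0, 33.0, 25.0, 32.0, 29.0])  # crop growth (cm)

# Add an intercept column and solve the least-squares problem
# y ≈ b0 + b1*rainfall + b2*temperature + b3*sunlight + b4*fertilizer.
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("intercept:", coef[0])
print("slopes:", coef[1:])

# R²: the proportion of variance in crop growth the model accounts for.
resid = y - X1 @ coef
r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
print("R^2:", round(r2, 3))
```

Each entry of `coef[1:]` estimates the linear effect of one independent variable on crop growth, holding the others fixed.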
Multiple logistic regression

Logistic regression analysis is used to calculate (and predict) the probability
of a binary event occurring. A binary outcome is one where there are only two
possible outcomes: either the event occurs (1) or it doesn't (0). So, based on a
set of independent variables, logistic regression can predict how likely it is
that a certain scenario will arise. It is also used for classification.
Example of logistic regression:
Let’s imagine you work as an analyst within the insurance sector and you
need to predict how likely it is that each potential customer will make a claim.
You might enter a range of independent variables into your model, such as
age, whether or not they have a serious health condition, their occupation, and
so on. Using these variables, a logistic regression analysis will calculate the
probability of the event (making a claim) occurring. Another cited example is
the filters used to classify email as “spam” or “not spam.”
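The insurance example can be sketched with a logistic model fitted by plain gradient descent. The data below is made up, and a production analysis would use a proper solver (e.g. scikit-learn's `LogisticRegression`), but the sketch shows the key step: squashing a linear combination of the independent variables through the sigmoid to get a claim probability between 0 and 1.

```python
import numpy as np

# Hypothetical customers: [age, has_serious_condition (0/1)];
# label = 1 if the customer made a claim.
X = np.array([[25, 0], [30, 0], [45, 1], [50, 1],
              [35, 0], [60, 1], [28, 0], [55, 1]], dtype=float)
y = np.array([0, 0, 1, 1, 0, 1, 0, 1], dtype=float)

# Standardise the features and add an intercept column.
X = (X - X.mean(axis=0)) / X.std(axis=0)
X = np.column_stack([np.ones(len(X)), X])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit the weights by gradient descent on the log-loss.
w = np.zeros(X.shape[1])
for _ in range(5000):
    p = sigmoid(X @ w)
    w -= 0.1 * X.T @ (p - y) / len(y)

probs = sigmoid(X @ w)
print("claim probabilities:", probs.round(2))
print("predicted classes:", (probs >= 0.5).astype(int))
```

Thresholding the probabilities at 0.5 turns the regression into a classifier, which is exactly how the spam/not-spam filters mentioned above work.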
Multivariate analysis of variance (MANOVA)

• Multivariate analysis of variance (MANOVA) is used to measure the effect
of multiple independent variables on two or more dependent variables.
With MANOVA, it’s important to note that the independent variables are
categorical, while the dependent variables are metric in nature. A
categorical variable is a variable that belongs to a distinct category—for
example, the variable “employment status” could be categorized into
certain units, such as “employed full-time,” “employed part-time,”
“unemployed,” and so on. A metric variable is measured quantitatively and
takes on a numerical value.
• In MANOVA analysis, you’re looking at various combinations of the
independent variables to compare how they differ in their effects on the
dependent variable.
Example of MANOVA:

Let's imagine you work for an engineering company that is on a mission to
build a super-fast, eco-friendly rocket. You could use MANOVA to measure the
effect that various design combinations have on both the speed of the rocket
and the amount of carbon dioxide it emits. In this scenario, your categorical
independent variables could be:
Engine type, categorized as E1, E2, or E3
Material used for the rocket exterior, categorized as M1, M2, or M3
Type of fuel used to power the rocket, categorized as F1, F2, or F3
Your metric dependent variables are speed in kilometers per hour, and carbon
dioxide measured in parts per million. Using MANOVA, you’d test different
combinations (e.g. E1, M1, and F1 vs. E1, M2, and F1, vs. E1, M3, and F1, and
so on) to calculate the effect of all the independent variables. This should help
you to find the optimal design solution for your rocket.
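A full MANOVA computes multivariate test statistics such as Wilks' lambda (e.g. with statsmodels), which is beyond a short sketch. The snippet below only illustrates, with synthetic numbers, how the rocket data might be laid out — every combination of the three categorical factors against the two metric outcomes — and summarises both dependent variables per engine type, which is the kind of group comparison MANOVA tests formally:

```python
import itertools
import random

random.seed(0)

# Hypothetical rocket trials: 3 × 3 × 3 categorical design combinations.
engines = ["E1", "E2", "E3"]
materials = ["M1", "M2", "M3"]
fuels = ["F1", "F2", "F3"]

trials = []
for e, m, f in itertools.product(engines, materials, fuels):
    # Fake metric outcomes: speed (km/h) and CO2 (ppm), with noise.
    speed = 20000 + 500 * engines.index(e) - 200 * materials.index(m) + random.gauss(0, 50)
    co2 = 400 - 30 * fuels.index(f) + random.gauss(0, 5)
    trials.append(((e, m, f), speed, co2))

# Mean of BOTH dependent variables per engine type.
for e in engines:
    rows = [(s, c) for (combo, s, c) in trials if combo[0] == e]
    mean_speed = sum(s for s, _ in rows) / len(rows)
    mean_co2 = sum(c for _, c in rows) / len(rows)
    print(e, "mean speed:", round(mean_speed), "mean CO2:", round(mean_co2, 1))
```

The 27 rows correspond to the combinations (E1, M1, F1), (E1, M2, F1), and so on, that the slide describes testing against each other.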
Factor analysis

• Factor analysis is an interdependence technique which seeks to reduce
the number of variables in a dataset. If you have too many variables, it
can be difficult to find patterns in your data. At the same time, models
created using datasets with too many variables are susceptible to
overfitting. Overfitting is a modeling error that occurs when a model
fits too closely and specifically to a certain dataset, making it less
generalizable to future datasets, and thus potentially less accurate in
the predictions it makes.
• Factor analysis works by detecting sets of variables which correlate
highly with each other. These variables may then be condensed into a
single variable. Data analysts will often carry out factor analysis to
prepare the data for subsequent analyses.
Example:

• Let's imagine you have a dataset containing data pertaining to a
person's income, education level, and occupation. You might find a
high degree of correlation among each of these variables, and thus
reduce them to the single factor “socioeconomic status.” You might
also have data on how happy they were with customer service, how
much they like a certain product, and how likely they are to
recommend the product to a friend. Each of these variables could be
grouped into the single factor “customer satisfaction” (as long as they
are found to correlate strongly with one another). Even though you’ve
reduced several data points to just one factor, you’re not really losing
any information—these factors adequately capture and represent the
individual variables concerned. With your “streamlined” dataset,
you’re now ready to carry out further analyses.
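The socioeconomic-status example can be sketched numerically. Proper factor analysis estimates loadings (e.g. with scikit-learn's `FactorAnalysis`); as a simplified stand-in, the snippet below generates synthetic income, education, and occupation scores driven by one latent factor, shows that they correlate highly, and condenses them into a single variable using the first principal component of the standardised data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: three observed variables all driven by one latent
# "socioeconomic status" (SES) factor, plus independent noise.
n = 200
ses = rng.normal(size=n)                      # latent factor (unobserved)
income     = 1.0 * ses + 0.3 * rng.normal(size=n)
education  = 0.9 * ses + 0.3 * rng.normal(size=n)
occupation = 0.8 * ses + 0.3 * rng.normal(size=n)
X = np.column_stack([income, education, occupation])

# The three observed variables correlate highly with each other...
print(np.corrcoef(X, rowvar=False).round(2))

# ...so condense them into one factor: the first principal component
# of the standardised data (a common factor-extraction shortcut).
Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
factor = Z @ Vt[0]

# The extracted factor tracks the latent SES variable closely,
# so little information is lost by the reduction.
print("corr(factor, ses):", round(abs(np.corrcoef(factor, ses)[0, 1]), 2))
```

This is the sense in which the slide says the condensed factor "adequately captures" the individual variables: one column now carries almost all the shared signal of three.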
Cluster analysis

• Another interdependence technique, cluster analysis is used to group
similar items within a dataset into clusters.
• When grouping data into clusters, the aim is for the variables in one
cluster to be more similar to each other than they are to variables in
other clusters. This is measured in terms of intracluster and intercluster
distance. Intracluster distance looks at the distance between data
points within one cluster. This should be small. Intercluster distance
looks at the distance between data points in different clusters. This
should ideally be large. Cluster analysis helps you to understand how
data in your sample is distributed, and to find patterns.
Example:
• A prime example of cluster analysis is audience segmentation. If you
were working in marketing, you might use cluster analysis to define
different customer groups which could benefit from more targeted
campaigns. As a healthcare analyst, you might use cluster analysis to
explore whether certain lifestyle factors or geographical locations are
associated with higher or lower cases of certain illnesses. Because it’s
an interdependence technique, cluster analysis is often carried out in
the early stages of data analysis.
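The audience-segmentation idea can be sketched with a plain k-means loop. The customer data below is synthetic, and real work would use a library implementation (e.g. scikit-learn's `KMeans`), but the sketch makes the intracluster/intercluster distance point concrete:

```python
import math
import random

random.seed(1)

# Two hypothetical customer groups in (age, monthly spend) space.
data = ([(random.gauss(25, 3), random.gauss(200, 20)) for _ in range(30)] +
        [(random.gauss(55, 3), random.gauss(600, 20)) for _ in range(30)])

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Plain k-means with k=2: assign each point to its nearest centroid,
# then move each centroid to the mean of its points, and repeat.
centroids = [data[0], data[-1]]
for _ in range(10):
    clusters = [[], []]
    for p in data:
        clusters[0 if dist(p, centroids[0]) <= dist(p, centroids[1]) else 1].append(p)
    centroids = [
        (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
        for c in clusters
    ]

print("cluster sizes:", [len(c) for c in clusters])
# Intracluster distance should be small, intercluster distance large.
intra = max(dist(p, centroids[i]) for i, c in enumerate(clusters) for p in c)
inter = dist(centroids[0], centroids[1])
print("max intracluster distance:", round(intra, 1))
print("intercluster distance:", round(inter, 1))
```

On well-separated groups like these, the intercluster distance dwarfs the intracluster distances, which is exactly the pattern a good segmentation aims for.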
Bayes Theorem in Machine Learning
Introduction:
Bayes' theorem is named after Thomas Bayes, an English statistician, philosopher, and
Presbyterian minister who formulated it in the 18th century.
Bayes contributed to decision theory, a field that relies extensively on
probability, one of the most important concepts in mathematics.
Bayes' theorem is also widely used in machine learning, where we need to
predict classes precisely and accurately.
The Bayesian method built on Bayes' theorem is used to
calculate conditional probability in machine learning applications,
including classification tasks.
Further, a simplified application of Bayes' theorem (naïve Bayes classification)
is also used to reduce computation time and cost.
Bayes' theorem is also extensively applied in health, medicine, and research.
Cont……….
• Bayes' theorem is one of the most popular machine learning concepts. It helps to calculate the probability of
one event occurring, under uncertain knowledge, given that another event has already occurred.
• Bayes' theorem can be derived using the product rule and the conditional probability of event X with known event Y:
• According to the product rule, we can express the probability of events X and Y occurring together as follows:
P(X ∩ Y) = P(X|Y) P(Y) {equation 1}
• Similarly, conditioning on event X:
P(X ∩ Y) = P(Y|X) P(X) {equation 2}
• Mathematically, Bayes' theorem follows by equating the right-hand sides of both equations and dividing by P(Y):

P(X|Y) = P(Y|X) P(X) / P(Y)

• Here the events X and Y need not be independent; in fact, the theorem is most useful when
they are dependent, since it lets us reverse the direction of conditioning.
• The above equation is called Bayes' Rule or Bayes' Theorem.
Cont…….
• The formula: P(X|Y) = P(Y|X) P(X) / P(Y)

• P(X|Y) is called the posterior, which we need to calculate. It is defined as
the updated probability after considering the evidence.
• P(Y|X) is called the likelihood. It is the probability of the evidence when the
hypothesis is true.
• P(X) is called the prior probability: the probability of the hypothesis before
considering the evidence.
• P(Y) is called the marginal probability. It is defined as the probability of the
evidence under any consideration.
Hence, Bayes' theorem can be written as:
posterior = likelihood × prior / evidence
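The four quantities can be checked with a small numeric example. The numbers below (a 1% prior, a 95% likelihood, a 5% false-positive rate) are invented for illustration of a disease-testing scenario:

```python
# X = "patient has the disease", Y = "test is positive".
p_x = 0.01              # prior P(X): 1% of people have the disease
p_y_given_x = 0.95      # likelihood P(Y|X): test sensitivity
p_y_given_not_x = 0.05  # false-positive rate P(Y|~X)

# Marginal probability of the evidence:
# P(Y) = P(Y|X) P(X) + P(Y|~X) P(~X)
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# Posterior: P(X|Y) = P(Y|X) P(X) / P(Y)
p_x_given_y = p_y_given_x * p_x / p_y
print(round(p_x_given_y, 3))  # → 0.161
```

Note how a small prior keeps the posterior low (about 16%) even with a fairly accurate test; this is the effect of weighting the likelihood by the prior.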
Prerequisites for Bayes Theorem

While studying Bayes' theorem, we need to understand a few important concepts. These are
as follows:
1. Experiment
• An experiment is defined as a planned operation carried out under controlled conditions, such
as tossing a coin, drawing a card, or rolling a die.
2. Sample Space
• Each result of an experiment is called an outcome, and the set of all
possible outcomes of an experiment is known as the sample space. For example, if we are rolling a die, the
sample space will be:
• S1 = {1, 2, 3, 4, 5, 6}
• Similarly, if our experiment consists of tossing a coin and recording its outcome, then the sample
space will be:
• S2 = {Head, Tail}
Cont……….
3. Event
• An event is defined as a subset of the sample space of an experiment; in other words, it is a set of
outcomes.
Assume in our experiment of rolling a die, there are two events A and B such that:
• A = Event that an even number is obtained = {2, 4, 6}
• B = Event that a number greater than 4 is obtained = {5, 6}
• Probability of event A: P(A) = Number of favourable outcomes / Total number of possible
outcomes
P(A) = 3/6 = 1/2 = 0.5
• Similarly, probability of event B: P(B) = Number of favourable outcomes / Total number of
possible outcomes
= 2/6
= 1/3
= 0.333
• Union of events A and B:
A∪B = {2, 4, 5, 6}
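Because events are just sets of outcomes, Python's set operations mirror these definitions directly. A small sketch of the die-rolling example:

```python
from fractions import Fraction

# The die-rolling experiment above, as Python sets.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event: even number
B = {5, 6}      # event: greater than 4

def p(event):
    # Favourable outcomes over total possible outcomes.
    return Fraction(len(event), len(sample_space))

print(p(A))    # → 1/2
print(p(B))    # → 1/3
print(A | B)   # union of A and B
print(A & B)   # intersection of A and B
```

Using `Fraction` keeps the probabilities exact (1/2, 1/3) instead of rounded decimals like 0.333.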
Cont……
Intersection of events A and B:
A∩B = {6}
4. Disjoint Event: If the intersection of events A and B is an empty set, then the events are known
as disjoint events, or mutually exclusive events.
5. Exhaustive Event: As the name suggests, a set of events of which at least one must occur each time the
experiment is run is called exhaustive for that experiment. Thus, two events A and B are exhaustive if either A or B
must occur; for example, when tossing a coin, the outcome must be either a
Head or a Tail.
6. Independent Event:
• Two events are said to be independent when the occurrence of one event does not affect the occurrence of
the other. In simple words, the probability of the outcome of one event does not depend on
the other.
Mathematically, two events A and B are said to be independent if:
P(A ∩ B) = P(AB) = P(A) * P(B)
7. Conditional Probability: Conditional probability is defined as the probability of an event A, given that
another event B has already occurred (i.e. A conditional on B). This is represented by P(A|B), and we can define
it as:
P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0
Cont……
8. Marginal Probability:
• Marginal probability is defined as the probability of an event A occurring,
irrespective of the outcome of another event B. It is considered the probability of the
evidence under any consideration. Here ~B represents the event that B does not
occur:
• P(A) = P(A|B)*P(B) + P(A|~B)*P(~B)
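The marginal formula (the law of total probability) is a one-line calculation. The numbers below are made up for a simple weather scenario:

```python
# A = "person carries an umbrella", B = "it rains".
p_b = 0.3               # P(B)
p_a_given_b = 0.9       # P(A|B)
p_a_given_not_b = 0.2   # P(A|~B)

# P(A) = P(A|B)*P(B) + P(A|~B)*P(~B)
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)
print(round(p_a, 2))  # → 0.41
```

This is the same computation that produces the denominator P(Y) in Bayes' theorem: the evidence averaged over both ways it can arise.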
How to apply Bayes Theorem or Bayes rule in Machine Learning?

• Bayes' theorem helps us to calculate the single term P(B|A) in terms of P(A|B),
P(B), and P(A). This rule is very helpful in scenarios where we have
good estimates of P(A|B), P(B), and P(A) and need to determine the fourth
term.
• The naïve Bayes classifier is one of the simplest applications of Bayes' theorem. It
is used in classification algorithms that must separate data into classes
accurately and quickly.
• Let's understand the use of Bayes' theorem in machine learning with the
example below.
• Suppose we have a feature vector A with i attributes. That is,
• A = A1, A2, A3, A4, ..., Ai
• Further, we have n classes represented as C1, C2, C3, C4…………Cn.
Cont….
• Given these two conditions, our machine learning classifier has to predict the class of A by
choosing the best possible class. With the help of Bayes' theorem, we can write this as:
P(Ci|A) = [ P(A|Ci) * P(Ci) ] / P(A)
Here:
• P(A) is a class-independent quantity.
• P(A) remains constant across the classes; it does not change its value when the
class changes. To maximize P(Ci|A), we therefore only have to maximize the term P(A|Ci) * P(Ci).
• With n classes on the probability list, let's assume that each class is equally likely to be the right answer.
Considering this factor, we can say that:
P(C1) = P(C2) = P(C3) = P(C4) = ... = P(Cn)
This assumption reduces the computation cost as well as time. This is how Bayes' theorem plays a significant
role in machine learning, and the naïve Bayes classifier further simplifies the conditional probability calculation,
without greatly affecting precision, by assuming the attributes are conditionally independent given the class:
P(A|Ci) = P(A1|Ci) * P(A2|Ci) * P(A3|Ci) * ... * P(Ai|Ci)
Hence, by using Bayes' theorem in machine learning, we can describe the probability of a complex event in terms of the probabilities of smaller, simpler events.
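The naïve factorisation above can be sketched as a tiny spam classifier. The toy training emails and word lists below are invented; real classifiers (e.g. scikit-learn's `MultinomialNB`) work the same way at scale, multiplying per-attribute probabilities P(word|class) in log space with add-one smoothing:

```python
import math
from collections import Counter, defaultdict

# Toy training data: word lists with class labels.
train = [
    (["win", "money", "now"], "spam"),
    (["free", "money", "offer"], "spam"),
    (["meeting", "tomorrow", "agenda"], "ham"),
    (["project", "meeting", "notes"], "ham"),
]

class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
vocab = set()
for words, label in train:
    word_counts[label].update(words)
    vocab.update(words)

def predict(words):
    best, best_score = None, float("-inf")
    for c in class_counts:
        # log P(C) + sum of log P(word|C), with add-one smoothing:
        # the naive factorisation P(A|C) ≈ P(A1|C) * P(A2|C) * ...
        total = sum(word_counts[c].values())
        score = math.log(class_counts[c] / len(train))
        for w in words:
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

print(predict(["free", "money"]))     # → spam
print(predict(["meeting", "notes"]))  # → ham
```

Working in log space avoids multiplying many tiny probabilities into floating-point underflow, and the smoothing keeps unseen words from zeroing out a class.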
