2 - Univariate and Multivariate
Univariate and Multivariate Analysis
Univariate analysis:
• Refers to the analysis of one variable.
• “Uni” means “one.”
• It is the simplest form of analysis, since the data involve only one quantity that varies.
• It does not deal with causes or relationships; its main purpose is to describe the data and find patterns within it.
Multivariate analysis:
• Refers to the analysis of more than one variable.
• “Multi” means “more than one.”
Main goals of multivariate analysis:
Identify Patterns:
Discover hidden patterns or relationships within complex datasets and see how variables are connected.
Reduce Complexity:
Simplify high-dimensional data to focus on the most important information while minimizing noise.
Visualize Relationships:
Create visual representations to make sense of complex interactions among multiple variables.
Predict Outcomes:
Build models that use several variables to make predictions or classify data into categories.
Segment Data:
Group similar observations or entities to gain insights and make tailored decisions or
recommendations.
Advantages and disadvantages of multivariate data analysis (MVDA)
Advantages:
1. Comprehensive Insights: provides a holistic view of data, uncovering complex relationships.
Disadvantages:
1. Complexity: can be challenging to understand and implement for non-experts.
Common ways to perform univariate analysis:
1. Summary Statistics
• We can calculate measures of central tendency, such as the mean or median, for one variable.
• We can also calculate measures of dispersion, such as the standard deviation, for one variable.
2. Charts: Create charts such as boxplots, histograms, and density curves to visualize the distribution of values for one variable.
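A minimal sketch of these univariate summaries in Python, using only the standard library and the Age values from the student example later in this section. It assumes the charts themselves would be drawn with a plotting library (for example matplotlib); here we only compute the numbers a boxplot (five-number summary) and a histogram (bin counts) would display. The 2-year bin width is an arbitrary illustrative choice, and quartiles depend on the method used (`statistics.quantiles` defaults to the "exclusive" method).

```python
import statistics

# Age column from the student example (SID 1-5).
ages = [18, 20, 19, 22, 21]

# Central tendency
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)

# Dispersion: sample standard deviation (divides by n - 1)
sd_age = statistics.stdev(ages)

# Numbers behind a boxplot: the five-number summary.
# quantiles(..., n=4) returns the three quartile cut points (default "exclusive" method).
q1, q2, q3 = statistics.quantiles(ages, n=4)
five_number = (min(ages), q1, q2, q3, max(ages))

# Numbers behind a histogram: counts per 2-year bin, starting at age 18.
bins = {}
for a in ages:
    start = 18 + 2 * ((a - 18) // 2)
    bins[start] = bins.get(start, 0) + 1

print(f"mean={mean_age}, median={median_age}, sd={sd_age:.2f}")
print("five-number summary:", five_number)
print("histogram counts:", dict(sorted(bins.items())))
```

A different quartile convention (for example `method="inclusive"`) would shift the boxplot hinges slightly, so it is worth stating which one a report uses.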
There are two common ways to perform multivariate analysis:
1. Scatterplot Matrix: Visualize the relationship between each pairwise combination of variables in a dataset.
2. Unsupervised Learning Algorithms: Use an unsupervised learning algorithm such as principal components analysis (PCA) to find structure and relationships among multiple variables in a dataset at once.
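Both approaches can be sketched with NumPy on the three-variable student data introduced later in this section. The pairwise correlations are the numbers a scatterplot matrix visualizes (the plot itself is not drawn here), and the PCA part is one common way to compute principal components, by eigendecomposing the covariance matrix of standardized data. This is an illustrative sketch, not the only PCA recipe; libraries such as scikit-learn offer ready-made implementations.

```python
import numpy as np

# Student data: columns are Age, Study Hours, Exam Score.
X = np.array([
    [18, 2.5, 85],
    [20, 3.0, 88],
    [19, 2.8, 90],
    [22, 3.5, 92],
    [21, 3.2, 91],
])
names = ["Age", "Study Hours", "Exam Score"]

# Pairwise view (what a scatterplot matrix shows): one Pearson r per pair of columns.
corr = np.corrcoef(X, rowvar=False)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: r = {corr[i, j]:.3f}")

# PCA sketch: standardize each column, then eigendecompose its covariance matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix; eigenvalues come back ascending
order = np.argsort(eigvals)[::-1]       # reorder components from largest to smallest variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()     # share of total variance per component
scores = Z @ eigvecs                    # each student's coordinates on the components

print("explained variance ratio:", np.round(explained, 3))
```

Standardizing first matters here because the columns are on very different scales (years, hours, points); without it, Exam Score would dominate the components.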
Multivariate analysis is similar to bivariate analysis, but it involves more than one dependent variable.
Common techniques include regression analysis, path analysis, factor analysis, and multivariate analysis of variance (MANOVA).
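As a hedged illustration of the regression idea (predicting an outcome from several variables), the sketch below fits an ordinary least-squares model of Exam Score on Age and Study Hours using the student data from this section. Note that this is multiple regression (several predictors, one outcome), not multivariate regression with several dependent variables, and that five observations are far too few for a meaningful model; it only shows the mechanics.

```python
import numpy as np

# Student data: predictors Age and Study Hours, outcome Exam Score.
age = np.array([18, 20, 19, 22, 21], dtype=float)
hours = np.array([2.5, 3.0, 2.8, 3.5, 3.2])
score = np.array([85, 88, 90, 92, 91], dtype=float)

# Design matrix with an intercept column of ones.
A = np.column_stack([np.ones_like(age), age, hours])

# Least-squares solution of A @ coef ≈ score.
coef, *_ = np.linalg.lstsq(A, score, rcond=None)
intercept, b_age, b_hours = coef

fitted = A @ coef
residuals = score - fitted
ss_res = residuals @ residuals
ss_tot = (score - score.mean()) @ (score - score.mean())
r_squared = 1 - ss_res / ss_tot

print(f"score ≈ {intercept:.2f} + {b_age:.2f}*age + {b_hours:.2f}*hours, R^2 = {r_squared:.3f}")
```

Because Age and Study Hours are themselves correlated in this sample, the individual coefficients are unstable; the fitted values are more trustworthy than the coefficients.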
Performing both univariate and multivariate analysis on a dataset
This involves exploring the characteristics of individual variables (univariate) and examining relationships between variables (multivariate).
Take a dataset about students, including their age, study hours, and exam scores.
Let's calculate univariate statistics for the "Age" variable, then perform a simple multivariate analysis by calculating the correlation between "Age" and "Exam Score", using the following dataset:
SID   Age   Study Hours   Exam Score
1     18    2.5           85
2     20    3.0           88
3     19    2.8           90
4     22    3.5           92
5     21    3.2           91

Univariate Analysis for "Age":
Mean Age:
Sum of ages: 18 + 20 + 19 + 22 + 21 = 100
Mean age: 100 / 5 = 20 years
Median Age:
Since there is an odd number of observations (5), the median is the middle value of the sorted ages (18, 19, 20, 21, 22), which is 20 years.
Standard Deviation of Age (Sample):
Calculate the squared difference between each age and the mean age, sum these squared differences, divide by n − 1 = 4, and take the square root.
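Carrying out those steps for the five ages, with x_i the age of student i and \bar{x} = 20 the mean age:

```latex
\begin{aligned}
\sum_{i=1}^{5}(x_i-\bar{x})^2 &= (18-20)^2 + (20-20)^2 + (19-20)^2 + (22-20)^2 + (21-20)^2 \\
&= 4 + 0 + 1 + 4 + 1 = 10 \\
s^2 &= \frac{10}{5-1} = 2.5 \\
s &= \sqrt{2.5} \approx 1.58 \text{ years}
\end{aligned}
```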
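Multivariate Analysis
Correlation between "Age" and "Exam Score" (Pearson correlation coefficient):
$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \, \sum_{i=1}^{n}(y_i-\bar{y})^2}} $$
Where: x_i is the age of student i, y_i is the exam score of student i, \bar{x} and \bar{y} are the mean age and mean exam score, and n = 5 is the number of students.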
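A minimal sketch that applies the correlation formula directly to the Age and Exam Score columns, written out by hand so each sum is visible. The helper name `pearson_r` is hypothetical, introduced only for this illustration.

```python
import math

# Age and Exam Score columns from the student table.
age = [18, 20, 19, 22, 21]
score = [85, 88, 90, 92, 91]

def pearson_r(x, y):
    """Hypothetical helper: Pearson r = sum of co-deviations / sqrt(product of sums of squares)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(age, score)
print(f"r = {r:.3f}")  # prints "r = 0.855"
```

By hand: the mean exam score is 446 / 5 = 89.2, the co-deviation sum is 15, the age sum of squares is 10, and the score sum of squares is 30.8, so r = 15 / √308 ≈ 0.855. This indicates a fairly strong positive linear association for these five students, although five observations are far too few to generalize from.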