Uploaded by

Neha Gupta

Exploratory Data Analysis

Exploratory Data Analysis is a critical step in the data science process. It is the
foundation for understanding and interpreting complex data sets. EDA helps data
scientists identify patterns, spot anomalies, test hypotheses, and check
assumptions through various statistical and graphical techniques.
It is the initial examination of data and should occur before any assumptions or
conclusions are made to avoid faulty analysis.
EDA is important for understanding and preparing data before using it for
machine learning or complex modeling. It can help identify issues like missing
data, outliers, and anomalies.
Exploratory Data Analysis (EDA) is a process of describing the data by means of
statistical and visualization techniques in order to bring important aspects of that
data into focus for further analysis.
The four types of EDA are univariate non-graphical, multivariate non-graphical,
univariate graphical, and multivariate graphical.
The main purpose of EDA is to help look at data before making any assumptions.
It can help identify obvious errors, better understand patterns within the data,
detect outliers or anomalous events, and find interesting relations among the
variables.
Python libraries for EDA include Pandas for data manipulation, Matplotlib and
Seaborn for visualisations, Plotly for interactive plots, and Dask for scalable
computing. These libraries enhance data analysis by offering powerful tools for
summarising, visualising, and managing large datasets effectively.
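A first look at a dataset with Pandas might go roughly as follows. This is a minimal sketch: the DataFrame, its column names and its values are made up for illustration, and in practice the data would be loaded from a file or database.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names and values are invented for this sketch.
df = pd.DataFrame({
    "age": [23, 35, 41, np.nan, 29, 52],
    "income": [32000, 54000, 61000, 47000, np.nan, 250000],
    "segment": ["A", "B", "B", "A", "C", "B"],
})

print(df.shape)    # number of rows and columns
print(df.dtypes)   # data type of each column

# Count missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Summary statistics for the numeric columns (count, mean, std, quartiles).
print(df.describe())

# Frequency of each category in a categorical column.
segment_counts = df["segment"].value_counts()
print(segment_counts)
```

The missing-value counts and the summary table are usually enough to spot obvious problems, such as the unusually large income value in this toy data.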
Applications: EDA can be applied to enhance customer segmentation, optimize
marketing strategies, perform market basket analysis, detect anomalies, and
predict trends. This analysis informs decisions across various departments, from
marketing to risk management.

Exploratory Data Analysis (EDA) uses both quantitative and graphical techniques:
• Quantitative techniques
These techniques include interval estimation and hypothesis testing:
   • Interval estimation: This technique constructs a range of values that is
     likely to contain an unknown population quantity, such as a mean. A
     confidence interval is an example of interval estimation.
   • Hypothesis testing: This technique assesses whether the sample data provide
     enough evidence to reject a stated proposition (the null hypothesis). It
     does not prove the proposition true or false.
• Graphical techniques
These techniques summarize data visually or diagrammatically. Some examples
of graphical techniques include:
   • Scatterplot: This technique plots one variable on the x-axis and
     another on the y-axis to show the points for each case in the
     dataset.
   • Run chart: This technique plots data as a line graph over time.
   • Heat map: This technique uses color to represent values in the
     data.
   • Multivariate chart: This technique graphically shows the
     relationships between factors and a response.
   • Bubble chart: This technique displays multiple circles (bubbles) in a
     two-dimensional plot, with the size of each bubble encoding a third
     variable.
   • Rootogram: This technique plots the square roots of the number of
     observations in different ranges of a quantitative variable.
EDA techniques are often graphical because graphics help analysts explore data
openly and discover new insights.
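The two quantitative techniques can be sketched with the standard library alone. The example below assumes seven illustrative heights in cm, a 95% confidence level, and a hypothetical null value of 170 cm; the critical value 2.447 is read from a standard Student's t table for 6 degrees of freedom rather than computed.

```python
import math
import statistics

# Illustrative sample: seven heights in cm.
heights = [164, 167.3, 170, 174.2, 178, 180, 186]

n = len(heights)
mean = statistics.mean(heights)
sd = statistics.stdev(heights)        # sample standard deviation (n - 1)
standard_error = sd / math.sqrt(n)

# Assumption: two-sided 95% critical value of Student's t with
# n - 1 = 6 degrees of freedom, taken from a standard t table.
T_CRIT_95_DF6 = 2.447

# Interval estimation: a 95% confidence interval for the population mean.
ci_low = mean - T_CRIT_95_DF6 * standard_error
ci_high = mean + T_CRIT_95_DF6 * standard_error
print(f"mean = {mean:.2f} cm, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")

# Hypothesis testing: two-sided one-sample t-test of
# H0: population mean = 170 cm, at the 5% level (hypothetical null value).
MU_0 = 170
t_stat = (mean - MU_0) / standard_error
reject_h0 = abs(t_stat) > T_CRIT_95_DF6
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The two results agree by construction: the null value is rejected at the 5% level exactly when it falls outside the 95% interval.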

Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis
that employs a variety of techniques (graphical and quantitative) to better
understand data. It is easy to get lost in the visualizations of EDA and to lose
track of its purpose. EDA aims to make the downstream analysis easier.
To put EDA in context, the Data Science steps are: obtain data; clean and load
data; exploratory data analysis; model building; model evaluation; and data
visualization and presentation.
The objectives of EDA are to discover underlying patterns, spot anomalies, frame
hypotheses, and check assumptions, with the aim of finding a well-fitting model
(if one exists). At a more granular level, EDA involves understanding the
relationships between variables: determining relationships among the
explanatory variables; assessing the relationships between explanatory and
outcome variables (direction and rough size estimates); detecting the presence
of outliers; ranking the important explanatory variables; and drawing
preliminary conclusions as to whether individual explanatory variables are
statistically significant.
In this post, we present a systematic approach to EDA (based on the sources
listed below) to present EDA techniques in a concise manner.
Categorising EDA techniques
EDA techniques are either graphical or quantitative (non-graphical). Each of
these is, in turn, either univariate or multivariate. Quantitative methods
normally involve the calculation of summary statistics. Graphical methods
summarize the data in a diagrammatic or visual way. Univariate methods look at
one variable (data column) at a time, while multivariate methods look at two or
more variables at a time to explore relationships. Usually, multivariate EDA will
be bivariate (looking at exactly two variables). Thus, the four types of EDA
techniques are univariate non-graphical, univariate graphical, multivariate
non-graphical, and multivariate graphical. Non-graphical and graphical methods
complement each other: graphical methods are more qualitative and involve some
subjective judgment, while quantitative methods are more objective.
If we are focusing on data from observation of a single variable on n subjects, i.e.
a sample of size n, we also need to look graphically at the distribution of the
sample. A large sample does not make the data themselves normally distributed.
What holds, under the central limit theorem, is that the distribution of the
sample mean becomes approximately normal as n grows, provided the observations
are independent and have finite variance. There are exceptions to this idea: for
example, the distribution could evolve over time or have very heavy tails.
Normality of the data should therefore be checked rather than assumed.
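A quick simulation illustrates what a larger sample size does and does not buy. The sketch below assumes an exponential population (chosen only because it is strongly skewed) and uses skewness as a rough indicator of departure from a normal shape; the seed, sample sizes and helper name `skewness` are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable


def skewness(values):
    """Sample skewness: mean of the cubed standardized deviations."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 3))


# A large sample from a strongly right-skewed (exponential) population.
raw_sample = rng.exponential(scale=1.0, size=5000)

# Means of 5000 independent samples, each of size n = 50,
# drawn from the same population.
sample_means = rng.exponential(scale=1.0, size=(5000, 50)).mean(axis=1)

skew_raw = skewness(raw_sample)
skew_means = skewness(sample_means)
print(f"skewness of raw data:     {skew_raw:.2f}")
print(f"skewness of sample means: {skew_means:.2f}")
```

The raw data stay clearly skewed however many observations are drawn, while the sample means are much closer to symmetric. It is the distribution of the mean, not of the data, that approaches a normal shape.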
Univariate non-graphical EDA
Univariate non-graphical EDA techniques are concerned with understanding the
underlying sample distribution and making observations about the population.
This also involves outlier detection. For univariate categorical data, we are
interested in the range of values (the categories that occur) and the frequency
of each. Univariate EDA for quantitative data involves making preliminary
assessments about the population distribution of the variable using the data
from the observed sample. The characteristics of the population distribution
inferred include center, spread, modality, shape and outliers. Measures of
central tendency include the mean, median and mode. The most common measure of
central tendency is the mean. For skewed distributions, or when there is concern
about outliers, the median may be preferred. Measures of spread include
variance, standard deviation, and interquartile range. Spread is an indicator of
how far away from the center we are still likely to find data values. Univariate
EDA also involves finding the skewness (a measure of asymmetry) and kurtosis (a
measure of peakedness and tail weight relative to a Gaussian shape).
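These summary statistics can be computed with Pandas as in the following sketch. The sample values are hypothetical, and the 1.5 × IQR fences used to flag outliers are a common convention rather than the only possible rule.

```python
import pandas as pd

# Hypothetical quantitative sample with one suspiciously large value.
values = pd.Series([12, 15, 15, 17, 18, 21, 22, 24, 95])

# Center
mean = values.mean()
median = values.median()
modes = values.mode().tolist()    # there can be more than one mode

# Spread
variance = values.var()           # sample variance (divides by n - 1)
std_dev = values.std()
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1

# Shape
skew = values.skew()              # > 0 indicates a longer right tail
excess_kurtosis = values.kurt()   # excess kurtosis: about 0 for a Gaussian

# Outliers flagged by the 1.5 * IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)].tolist()

print(f"mean={mean:.2f} median={median} modes={modes}")
print(f"variance={variance:.2f} std={std_dev:.2f} IQR={iqr}")
print(f"skewness={skew:.2f} excess kurtosis={excess_kurtosis:.2f}")
print(f"outliers={outliers}")
```

In this sample the single large value pulls the mean well above the median, which is the situation in which the median is the safer measure of center.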
Univariate graphical EDA
For graphical analysis of univariate quantitative data, histograms are typically
used (for categorical data, the counterpart is a bar chart of category counts).
The histogram represents the frequency (count) or proportion (count/total
count) of cases for each of a set of ranges (bins) of values. Typically, between
about 5 and 30 bins are chosen. Histograms are one of the best ways to quickly
learn a lot about your data, including central tendency, spread, modality, shape
and outliers. Stem-and-leaf plots could also be used for the same purpose.
Boxplots can also be used to present information about the central tendency,
symmetry and skew, as well as outliers. Quantile-normal plots (QQ plots) and
other techniques could also be used here.
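The numbers behind a histogram and a boxplot can be computed directly, as in the sketch below. It assumes a hypothetical sample of 200 normally distributed values and 8 bins, and prints a rough text histogram; a plotting library such as Matplotlib would draw the same counts as bars.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical sample: 200 draws from a normal distribution (mean 50, sd 10).
data = rng.normal(loc=50, scale=10, size=200)

# Histogram: count the cases falling into each of 8 equal-width bins.
counts, bin_edges = np.histogram(data, bins=8)
for count, left, right in zip(counts, bin_edges[:-1], bin_edges[1:]):
    print(f"{left:6.1f} to {right:6.1f} | {'#' * int(count)}")

# Five-number summary: the values on which a boxplot is based.
minimum, q1, median, q3, maximum = np.percentile(data, [0, 25, 50, 75, 100])
print(f"min={minimum:.1f} Q1={q1:.1f} median={median:.1f} "
      f"Q3={q3:.1f} max={maximum:.1f}")
```

Changing `bins` shows how the apparent shape depends on the bin count: too few bins hide modality, too many make the plot noisy.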
Multivariate non-graphical EDA
Multivariate non-graphical EDA techniques generally show the relationship
between two or more variables in the form of either cross-tabulation or statistics.
For each combination of one categorical variable (usually explanatory) and one
quantitative variable (usually outcome), we can compute a statistic for the
quantitative variable separately for each level of the categorical variable, and
then compare the statistics across levels of the categorical variable. Comparing
the means is an informal version of one-way ANOVA. Comparing medians is a robust
informal version of one-way ANOVA. For two quantitative variables, we can
calculate the covariance and/or correlation. When we have many quantitative
variables, we typically calculate the pairwise covariances and/or correlations
and assemble them into a matrix.
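The cross-tabulation, per-level statistics, and covariance and correlation matrices described above can be sketched with Pandas. All column names and values below are hypothetical and exist only to make the example self-contained.

```python
import pandas as pd

# Hypothetical data: two categorical variables and three quantitative ones.
df = pd.DataFrame({
    "group":  ["a", "a", "a", "b", "b", "b"],
    "passed": ["yes", "no", "yes", "yes", "yes", "no"],
    "hours":  [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "score":  [52.0, 58.0, 63.0, 70.0, 74.0, 81.0],
    "errors": [9.0, 8.0, 6.0, 5.0, 3.0, 2.0],
})

# Cross-tabulation of two categorical variables.
table = pd.crosstab(df["group"], df["passed"])
print(table)

# One statistic per level of the categorical variable, compared across levels.
group_means = df.groupby("group")["score"].mean()
group_medians = df.groupby("group")["score"].median()
print(group_means)
print(group_medians)

# Pairwise covariances and correlations assembled into matrices.
numeric = df[["hours", "score", "errors"]]
cov_matrix = numeric.cov()
corr_matrix = numeric.corr()   # Pearson correlation by default
print(cov_matrix)
print(corr_matrix)
```

Comparing `group_means` across levels is the informal counterpart of one-way ANOVA mentioned above; no significance test is performed here.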
Multivariate graphical EDA
For two categorical variables, the most commonly used graphical technique is the
grouped barplot, with each group representing one level of one of the variables
and each bar within a group representing a level of the other variable. For one
categorical and one quantitative variable, we could use side-by-side (parallel)
boxplots, one per category. For two quantitative variables, the basic graphical
EDA technique is the scatterplot, which has one variable on the x-axis, one on
the y-axis and a point for each case in the dataset. Typically, the explanatory
variable goes on the x-axis. Additional categorical variables can be
accommodated by the use of colour or symbols.
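A scatterplot with a categorical variable encoded by marker symbol might look like the following Matplotlib sketch. The data and axis labels are hypothetical, the non-interactive `Agg` backend is chosen so the code runs without a display, and the figure is written to an in-memory buffer where a file name would normally be used.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

# Hypothetical data: two quantitative variables and one categorical variable.
hours = [2, 3, 4, 5, 6, 7]
score = [52, 58, 63, 70, 74, 81]
group = ["a", "a", "a", "b", "b", "b"]

fig, ax = plt.subplots()

# One scatter call per category, so each level gets its own marker and colour.
for level, marker in [("a", "o"), ("b", "s")]:
    xs = [h for h, g in zip(hours, group) if g == level]
    ys = [s for s, g in zip(score, group) if g == level]
    ax.scatter(xs, ys, marker=marker, label=f"group {level}")

ax.set_xlabel("hours (explanatory variable)")
ax.set_ylabel("score (outcome variable)")
ax.legend()

# Save as PNG; pass a file name instead of the buffer to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```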
Univariate, Bivariate and Multivariate data and its analysis
Univariate data refers to a type of data in which each observation or data point
corresponds to a single variable. In other words, it involves the measurement or
observation of a single characteristic or attribute for each individual or item in
the dataset. Analyzing univariate data is the simplest form of analysis in
statistics.

Heights (in cm)   164   167.3   170   174.2   178   180   186

Suppose that the heights of seven students in a class are recorded (table above).
There is only one variable, height, and the analysis does not deal with any cause
or relationship.
Key points in Univariate analysis:
1. No Relationships: Univariate analysis focuses solely on describing and
summarizing the distribution of the single variable. It does not explore
relationships between variables or attempt to identify causes.
2. Descriptive Statistics: Descriptive statistics, such as measures of
central tendency (mean, median, mode) and measures of
dispersion (range, standard deviation), are commonly used in the analysis
of univariate data.
3. Visualization: Histograms, box plots, and other graphical representations
are often used to visually represent the distribution of the single variable.
Bivariate data
Bivariate data involves two different variables, and the analysis of this type of
data focuses on understanding the relationship or association between these two
variables. An example of bivariate data is temperature and ice cream sales during
the summer season.

Temperature   Ice Cream Sales
20            2000
25            2500
35            5000

Suppose temperature and ice cream sales are the two variables of a bivariate
dataset (table above). The relationship is visible from the table: temperature
and sales are positively related, because as the temperature increases, the
sales also increase. The increase is not strictly proportional, since sales rise
faster than temperature at the highest reading.
Key points in Bivariate analysis:
1. Relationship Analysis: The primary goal of analyzing bivariate data is to
understand the relationship between the two variables. This relationship
could be positive (both variables increase together), negative (one
variable increases while the other decreases), or show no clear pattern.
2. Scatterplots: A common visualization tool for bivariate data is a
scatterplot, where each data point represents a pair of values for the two
variables. Scatterplots help visualize patterns and trends in the data.
3. Correlation Coefficient: A quantitative measure called the correlation
coefficient is often used to quantify the strength and direction of the linear
relationship between two variables. The correlation coefficient ranges from
-1 to 1.
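For the temperature and ice cream sales table, the correlation coefficient can be computed with NumPy as sketched below. Three pairs are far too few to estimate anything reliably, so this only illustrates the calculation. The ratio of sales to temperature is also printed as a check on how close the relationship is to strict proportionality.

```python
import numpy as np

# Values from the temperature and ice cream sales table.
temperature = np.array([20, 25, 35])
sales = np.array([2000, 2500, 5000])

# Pearson correlation coefficient: np.corrcoef returns a 2 x 2 matrix,
# and the off-diagonal entry is the correlation between the two variables.
r = np.corrcoef(temperature, sales)[0, 1]
print(f"correlation coefficient r = {r:.3f}")

# Under strict proportionality this ratio would be constant.
ratio = sales / temperature
print(ratio)
```

A value of r close to 1 indicates a strong positive linear relationship, while the changing ratio shows that the sales are not an exact multiple of the temperature.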
Multivariate data
Multivariate data refers to datasets where each observation or sample point
consists of multiple variables or features. These variables can represent different
aspects, characteristics, or measurements related to the observed phenomenon.
When dealing with three or more variables, the data is specifically categorized as
multivariate.
As an example of this type of data, suppose an advertiser wants to compare the
popularity of four advertisements on a website.

Advertisement   Gender   Click rate
Ad1             Male     80
Ad3             Female   55
Ad2             Female   123
Ad1             Male     66
Ad3             Male     35

The click rates could be measured for both men and women, and relationships
between the variables can then be examined. Multivariate analysis is similar to
bivariate analysis but involves more than two variables.
Key points in Multivariate analysis:
1. Analysis Techniques: The ways to perform analysis on this data depend
on the goals to be achieved. Some of the techniques are regression
analysis, principal component analysis, path analysis, factor analysis
and multivariate analysis of variance (MANOVA).
2. Goals of Analysis: The choice of analysis technique depends on the
specific goals of the study. For example, researchers may be interested in
predicting one variable based on others, identifying underlying factors that
explain patterns, or comparing group means across multiple variables.
3. Interpretation: Multivariate analysis allows for a more nuanced
interpretation of complex relationships within the data. It helps uncover
patterns that may not be apparent when examining variables individually.
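Before any of these techniques is applied, the advertisement example can be explored with simple grouped summaries. The sketch below uses the five rows from the advertisement table; the column names `advertisement`, `gender` and `click_rate` are naming assumptions made for this example.

```python
import pandas as pd

# Rows from the advertisement table; column names are assumptions.
ads = pd.DataFrame({
    "advertisement": ["Ad1", "Ad3", "Ad2", "Ad1", "Ad3"],
    "gender": ["Male", "Female", "Female", "Male", "Male"],
    "click_rate": [80, 55, 123, 66, 35],
})

# Mean click rate per advertisement and per gender.
mean_by_ad = ads.groupby("advertisement")["click_rate"].mean()
mean_by_gender = ads.groupby("gender")["click_rate"].mean()
print(mean_by_ad)
print(mean_by_gender)

# Both categorical variables at once: advertisement by gender.
# Combinations with no observations appear as NaN.
pivot = ads.pivot_table(
    index="advertisement",
    columns="gender",
    values="click_rate",
    aggfunc="mean",
)
print(pivot)
```

The empty cells in the pivot table show that not every advertisement was observed for both genders, which limits what can be concluded from so few rows.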

Univariate                    Bivariate                     Multivariate

Summarizes a single           Summarizes two variables.     Summarizes more than two
variable at a time.                                         variables.

Does not deal with causes     Deals with causes and         Deals with relationships
or relationships.             relationships.                among several variables.

Contains no dependent         Contains one dependent        Similar to bivariate, but
variable.                     variable.                     with more than two
                                                            variables.

Main purpose: to describe.    Main purpose: to explain.     Main purpose: to study the
                                                            relationships among the
                                                            variables.

Example: height.              Example: temperature and      Example: an advertiser
                              ice cream sales in summer.    compares the popularity of
                                                            four advertisements on a
                                                            website; click rates are
                                                            measured for men and women
                                                            and the relationships
                                                            between the variables are
                                                            examined.
