DSILYTC Session 5 - Descriptive Statistics

This document provides an overview of descriptive statistics, focusing on numerical measures such as mean, median, mode, and measures of variability like range and standard deviation. It also discusses the importance of understanding the relationship between variables through covariance and correlation coefficients, as well as the use of data dashboards for effective data presentation. Key concepts include measures of location, variability, distribution shape, and methods for detecting outliers.

Uploaded by

starguardianlux123

DSILYTC:

Introduction to
Analytics
SESSION 5: DESCRIPTIVE STATISTICS
Objective of the Study

 Descriptive Analytics
 Case Analysis:
 Applications of Numerical Measures
DESCRIPTIVE STATISTICS: NUMERICAL MEASURES

 MEASURES OF LOCATION • MEASURES OF VARIABILITY

NUMERICAL MEASUREMENT

 If the measures are computed for data from a sample, they are called SAMPLE STATISTICS.
 If the measures are computed for data from a population, they are called POPULATION PARAMETERS.
 A sample statistic is referred to as the point estimator of the corresponding population parameter.
MEASURE OF LOCATION

 Mean
 Median
 Mode
 Weighted Mean
 Geometric Mean
 Percentile
 Quartiles
MEAN

 Perhaps the most important measure of location is the mean.
 The mean provides a measure of central location.
 The mean of a data set is the average of all the data values.
 The sample mean is the point estimator of the population mean.
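
As a small illustration of the definition above, the sketch below computes a sample mean in Python, once with the standard library and once by hand. The salary figures are made-up values used only for this example.

```python
from statistics import mean

# Hypothetical sample: five monthly starting salaries (made-up values)
salaries = [3450, 3550, 3650, 3480, 3355]

# The mean is the sum of the data values divided by the number of values
sample_mean = mean(salaries)
manual_mean = sum(salaries) / len(salaries)

print(sample_mean, manual_mean)
```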
MEDIAN

 The median of a data set is the value in the middle when the data items are arranged in ascending order.
 Whenever a data set has extreme values, the median is the preferred measure of central location.
 The median is the measure of location most often reported for annual income and property value data.
 A few extremely large income or property values can inflate the mean.
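
To see how one extreme value inflates the mean but barely moves the median, consider the following sketch. The income figures are hypothetical and chosen only to make the effect visible.

```python
from statistics import mean, median

# Hypothetical annual incomes, in thousands (made-up values)
incomes = [32, 35, 38, 41, 45]
incomes_with_extreme = incomes + [900]   # add one extremely large income

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(incomes_with_extreme), median(incomes_with_extreme)

print("without extreme value:", mean_before, median_before)
print("with extreme value:   ", mean_after, median_after)
```

With an even number of items, `median` returns the average of the two middle values.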
TRIMMED MEAN

 Another measure sometimes used when extreme values are present is the TRIMMED MEAN.
 It is obtained by deleting a percentage of the smallest and largest values from a data set and then computing the mean of the remaining values.
 For example, the 5% trimmed mean is obtained by removing the smallest 5% and the largest 5% of the data values and then computing the mean of the remaining values.
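
The trimming procedure described above can be sketched as follows. The helper `trimmed_mean` and the scores are hypothetical, and rounding the number of dropped values down is an assumption; other conventions exist when the percentage does not give a whole number of items.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the smallest and of the largest values."""
    data = sorted(values)
    k = int(len(data) * percent / 100)   # values to drop from each end (rounded down)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

# Made-up data: 20 scores with one unusually small and one unusually large value
scores = [1, 12, 13, 14, 15, 15, 16, 16, 17, 17,
          18, 18, 19, 19, 20, 20, 21, 22, 23, 95]

plain_mean = sum(scores) / len(scores)
five_percent_trimmed = trimmed_mean(scores, 5)   # drops 1 value from each end

print(plain_mean, five_percent_trimmed)
```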
MODE

 The mode of a data set is the value that occurs with greatest frequency.
 The greatest frequency can occur at two or more different values.
 If the data have exactly 2 modes, the data are BIMODAL.
 If the data have more than 2 modes, the data are MULTIMODAL.
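
A short sketch of finding modes with Python's standard library follows. The data sets are made up; `multimode` returns every value that ties for the greatest frequency, so it covers the bimodal and multimodal cases as well.

```python
from statistics import multimode

one_mode = [27, 27, 27, 28, 29, 32]      # 27 occurs most often
two_modes = [1, 1, 2, 3, 3]              # 1 and 3 tie: bimodal
three_modes = [4, 4, 5, 5, 6, 6, 7]      # 4, 5 and 6 tie: multimodal

modes_a = multimode(one_mode)
modes_b = multimode(two_modes)
modes_c = multimode(three_modes)

print(modes_a, modes_b, modes_c)
```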
WEIGHTED MEAN

 In some instances the mean is computed by giving each observation a weight that reflects its relative importance.
 The choice of weights depends on the application.
 The weight might be the number of credit hours earned for each grade, as in GPA.
 In other weighted mean computations, quantities such as pounds, dollars, or volume are frequently used.
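
The GPA case mentioned above can be sketched as a weighted mean in which credit hours act as the weights. The function name, grade points, and credit hours are hypothetical values for illustration.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Made-up transcript: grade points earned and the credit hours of each course
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 2]

gpa = weighted_mean(grade_points, credit_hours)
print(round(gpa, 2))
```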
GEOMETRIC MEAN

 The geometric mean is calculated by finding the nth root of the product of n values.
 It is often used in analyzing growth rates in financial data (where using the arithmetic mean will provide misleading results).
 It should be applied anytime you want to determine the mean rate of change over several successive periods (be it years, quarters, weeks, …).
 Other common applications include: change in population of species, crop yields, pollution levels, and birth and death rates.
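
The sketch below shows why the geometric mean suits growth rates. The three yearly growth factors are hypothetical. Compounding the geometric mean over the three periods reproduces the actual ending value, while compounding the arithmetic mean overstates it.

```python
from math import prod
from statistics import geometric_mean

# Hypothetical yearly growth factors: +10%, -5%, +20%
growth_factors = [1.10, 0.95, 1.20]
n = len(growth_factors)

manual_gm = prod(growth_factors) ** (1 / n)   # nth root of the product of n values
library_gm = geometric_mean(growth_factors)   # same quantity from the standard library
arithmetic = sum(growth_factors) / n

start_value = 100
actual_end = start_value * prod(growth_factors)

print("actual ending value:      ", actual_end)
print("compounded geometric mean:", start_value * manual_gm ** n)
print("compounded arithmetic mean:", start_value * arithmetic ** n)
```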
PERCENTILES

 A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
 Admission test scores for colleges and universities are frequently reported in terms of percentiles.
 The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 – p) percent of the items take on this value or more.
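
A minimal sketch of computing a percentile follows. It assumes the location rule position = (n + 1) × p / 100 with linear interpolation between neighbouring values; this is one of several conventions, and the course or a given software package may use another. The function name and data are made up. Quartiles are the 25th, 50th, and 75th percentiles.

```python
def percentile(values, p):
    """Return the p-th percentile using the (n + 1) * p / 100 location rule."""
    data = sorted(values)
    n = len(data)
    # 1-based position in the sorted data, clamped to the range [1, n]
    position = min(max((n + 1) * p / 100, 1), n)
    lower = int(position)             # whole part: rank of the lower neighbour
    fraction = position - lower       # how far to move toward the next value
    if lower == n:                    # already at the largest value
        return data[-1]
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])

# Made-up data set of 11 values
data = [12, 15, 17, 20, 22, 25, 28, 30, 35, 40, 45]

q1 = percentile(data, 25)       # first quartile
q2 = percentile(data, 50)       # second quartile (median)
q3 = percentile(data, 75)       # third quartile
p90 = percentile(data, 90)

print(q1, q2, q3, p90)
```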
QUARTILES
MEASURES OF VARIABILITY

 It is often desirable to consider measures of variability (dispersion), as well as measures of location.
 For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.
MEASURES OF VARIABILITY

 Range
 Interquartile Range
 Variance
 Standard deviation
 Coefficient of variation
RANGE

 The range of a data set is the difference between the largest and smallest data values.

RANGE = LARGEST value - SMALLEST value

 It is the simplest measure of variability.
 It is very sensitive to the smallest and largest data values.
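
The sensitivity noted above is easy to demonstrate. In this sketch, with made-up data, changing only the largest value changes the range drastically.

```python
def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

data = [12, 15, 17, 20, 22, 25, 28, 30, 35, 40, 45]   # made-up values
data_with_extreme = data[:-1] + [450]                  # replace 45 with 450

range_before = data_range(data)
range_after = data_range(data_with_extreme)
print(range_before, range_after)
```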
INTERQUARTILE RANGE

 The interquartile range of a data set is the difference between the 3rd quartile and the 1st quartile.
 It is the range for the middle 50% of the data.
 It overcomes the sensitivity to extreme data values.
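
A brief sketch of the interquartile range follows, using `statistics.quantiles` from Python's standard library. Its default "exclusive" method is assumed here; quartile conventions differ between textbooks and tools, so results on other data may not match a hand calculation exactly. The data are made up.

```python
from statistics import quantiles

def interquartile_range(values):
    """Third quartile minus first quartile."""
    q1, _, q3 = quantiles(values, n=4)   # three cut points: Q1, median, Q3
    return q3 - q1

data = [12, 15, 17, 20, 22, 25, 28, 30, 35, 40, 45]   # made-up values
data_with_extreme = data[:-1] + [450]                  # replace 45 with 450

iqr_before = interquartile_range(data)
iqr_after = interquartile_range(data_with_extreme)
print(iqr_before, iqr_after)
```

In this example the extreme value leaves the interquartile range unchanged, because only the middle 50% of the data enters the calculation.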
VARIANCE

 The variance is a measure of variability that utilizes all the data.
 It is based on the difference between the value of each observation and the mean.
 The variance is useful in comparing the variability of 2 or more variables.
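
As a worked example, the sketch below computes the sample variance by hand, dividing the sum of squared deviations from the mean by n - 1, and checks it against the standard library. The population version divides by n instead. The data values are made up.

```python
from statistics import pvariance, variance

data = [46, 54, 42, 46, 32]   # made-up values, for example class sizes

n = len(data)
x_bar = sum(data) / n
squared_deviations = [(x - x_bar) ** 2 for x in data]

sample_variance = sum(squared_deviations) / (n - 1)   # sample: divide by n - 1
population_variance = sum(squared_deviations) / n     # population: divide by n

print(sample_variance, variance(data))
print(population_variance, pvariance(data))
```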
STANDARD DEVIATION

 The standard deviation of a data set is the positive square root of the variance.
 It is measured in the same units as the data, making it more easily interpreted than the variance.
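
The sketch below compares two hypothetical suppliers whose delivery times (in days, made-up values) have the same mean but very different standard deviations. It illustrates why variability matters alongside location.

```python
from statistics import mean, stdev

# Made-up delivery times in days
supplier_a = [9, 10, 10, 10, 11]
supplier_b = [5, 8, 10, 12, 15]

mean_a, mean_b = mean(supplier_a), mean(supplier_b)
sd_a, sd_b = stdev(supplier_a), stdev(supplier_b)   # sample standard deviations, in days

print("Supplier A: mean", mean_a, "std dev", round(sd_a, 2))
print("Supplier B: mean", mean_b, "std dev", round(sd_b, 2))
```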
COEFFICIENT OF VARIATION

 The coefficient of variation indicates how large the standard deviation is in relation to the mean.
 The coefficient of variation is computed as follows:
(standard deviation / mean × 100) %
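
A minimal sketch of the computation, assuming the usual definition of the coefficient of variation as the standard deviation divided by the mean, expressed as a percentage. The sample standard deviation is used and the data are made up.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

data = [46, 54, 42, 46, 32]   # made-up values: mean 44, standard deviation 8

cv = coefficient_of_variation(data)
print(round(cv, 1), "%")
```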
Descriptive Statistics: Numerical Measures (part 2)
 Measures of Distribution Shape, Relative Location, and Detecting Outliers
 Five-number summaries and box plots
 Measures of association between 2 variables
 Data dashboards: adding numerical measures to improve effectiveness
Measures of Distribution Shape, Relative Location, and Detecting Outliers

 Distribution shape
 Z-scores
 Chebyshev’s Theorem
 Empirical Rule
 Detecting Outliers
Distribution shape

 An important measure of the shape of a distribution is called SKEWNESS.
 The formula for the skewness of sample data is
 The skewness can be easily computed using statistical software.
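
As one way to compute skewness in software, the sketch below assumes the adjusted sample formula n / ((n - 1)(n - 2)) × Σ((x_i - x̄) / s)^3, where s is the sample standard deviation. This is a common definition, but it is an assumption here and may differ from the formula used in the course. The function name and data are made up.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum of cubed standardized deviations."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]      # long tail to the right
left_skewed = [-x for x in right_skewed]      # mirror image: long tail to the left

print(sample_skewness(symmetric))      # close to zero
print(sample_skewness(right_skewed))   # positive
print(sample_skewness(left_skewed))    # negative
```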
Z-Scores

 The z-score is often called the standardized value.
 It denotes the number of standard deviations a data value is from the mean.
 Excel’s STANDARDIZE function can be used to compute the z-score.
 An observation’s z-score is a measure of the relative location of the observation in a data set.
 A data value less than the sample mean will have a z-score less than zero.
 A data value greater than the sample mean will have a z-score greater than zero.
 A data value equal to the sample mean will have a z-score of zero.
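
The z-score of an observation can be sketched as (x - mean) / standard deviation. The example below uses the sample mean and sample standard deviation of made-up data; the signs of the results follow the three cases listed above.

```python
from statistics import mean, stdev

def z_scores(values):
    """Number of standard deviations each value lies from the mean."""
    x_bar = mean(values)
    s = stdev(values)
    return [(x - x_bar) / s for x in values]

data = [46, 54, 42, 46, 32, 44]   # made-up values with mean 44

for x, z in zip(data, z_scores(data)):
    print(x, round(z, 2))
```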
Chebyshev’s Theorem

 At least 75% of the data values must be within z = 2 standard deviations of the mean.
 At least 89% of the data values must be within z = 3 standard deviations of the mean.
 At least 94% of the data values must be within z = 4 standard deviations of the mean.
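
The three percentages above follow from the general bound in Chebyshev's theorem: at least 1 - 1/z² of the data values lie within z standard deviations of the mean, for any z greater than 1. A short sketch (the function name is hypothetical):

```python
def chebyshev_min_share(z):
    """Minimum share of data within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("z must be greater than 1")
    return 1 - 1 / z ** 2

for z in (2, 3, 4):
    print(z, f"{chebyshev_min_share(z):.0%}")
```

The bound holds for any distribution shape, which is why it is looser than rules that assume a bell-shaped distribution.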
EMPIRICAL RULE

 When the data are believed to approximate a bell-shaped distribution, the empirical rule can be used to determine the percentage of data values that must be within a specified number of standard deviations of the mean.
 The empirical rule is based on the normal distribution, which is covered in chapter 6.

For data having a bell-shaped distribution:

 Approximately 68% of the data values will be within +/-1 standard deviation of the mean.
 Approximately 95% of the data values will be within +/-2 standard deviations of the mean.
 Almost all of the data values will be within +/-3 standard deviations of the mean.
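
Because the empirical rule is based on the normal distribution, its percentages can be checked against the standard normal curve. A small sketch using Python's standard library (the helper name is hypothetical):

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

def share_within(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, f"{share_within(k):.2%}")
```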
Detecting Outliers

 An outlier is an unusually small or unusually large value in a data set.
 A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
 It might be:
 An incorrectly recorded data value
 A data value that was incorrectly included in the data set
 A correctly recorded unusual data value that belongs in the data set
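
The z-score rule above can be sketched as a simple filter. The function name and data are made up. Note that in very small samples no value can reach a z-score beyond 3 in absolute value, so the example uses 15 observations.

```python
from statistics import mean, stdev

def z_score_outliers(values, threshold=3.0):
    """Values whose z-score is below -threshold or above +threshold."""
    x_bar = mean(values)
    s = stdev(values)
    return [x for x in values if abs((x - x_bar) / s) > threshold]

# Made-up data: 14 values near 50 and one value of 120
data = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49, 51, 52, 48, 50, 120]

flagged = z_score_outliers(data)
print(flagged)
```

A flagged value is only a candidate: as the list above says, it may be a recording error, a value that does not belong, or a legitimate unusual observation.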
FIVE NUMBER SUMMARIES & BOX PLOT

 Summary statistics and easy-to-draw graphs can be used to quickly summarize large quantities of data.
 Two tools that accomplish this are five-number summaries and box plots.
FIVE NUMBER SUMMARY

 Smallest value
 First quartile
 Median
 Third quartile
 Largest Value
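
The five values listed above can be collected in one small function. This sketch assumes the default "exclusive" quartile method of `statistics.quantiles`; other quartile conventions can give slightly different Q1 and Q3. The data are made up.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest)."""
    q1, med, q3 = quantiles(values, n=4)
    return min(values), q1, med, q3, max(values)

data = [12, 15, 17, 20, 22, 25, 28, 30, 35, 40, 45]   # made-up values

summary = five_number_summary(data)
print(summary)
```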
BOX PLOT

 A box plot is a graphical summary of data that is based on a five-number summary.
 A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.
 Box plots provide another way to identify outliers.
 Limits are located (not drawn) using the interquartile range (IQR).
 Data outside these limits are considered outliers.
 The location of each outlier is shown with a symbol.
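
A minimal sketch of locating the limits, assuming the conventional rule of 1.5 × IQR below Q1 and above Q3. The multiplier is an assumption, since the slides do not state it, and the quartiles come from the default method of `statistics.quantiles`. The data are made up.

```python
from statistics import quantiles

def iqr_limits(values, k=1.5):
    """Lower and upper limits: Q1 - k * IQR and Q3 + k * IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [12, 15, 17, 20, 22, 25, 28, 30, 35, 40, 90]   # made-up values

lower, upper = iqr_limits(data)
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)
```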
Measures of Association between 2 variables

 Thus far we have examined numerical methods used to summarize the data for one variable at a time.
 Often a manager or decision maker is interested in the relationship between 2 variables.
 Two descriptive measures of the relationship between 2 variables are COVARIANCE and CORRELATION COEFFICIENT.
COVARIANCE

 The covariance is a measure of the linear association between 2 variables.
 Positive values indicate a positive relationship.
 Negative values indicate a negative relationship.
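
As an illustration, the sample covariance can be sketched as the sum of the products of paired deviations from the means, divided by n - 1. The function name and the paired data below are made up.

```python
def sample_covariance(xs, ys):
    """Sum of (x - x_bar) * (y - y_bar) over all pairs, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 5, 4, 5]    # tends to increase with x
y_falling = [5, 4, 5, 4, 2]   # tends to decrease with x

cov_positive = sample_covariance(x, y_rising)
cov_negative = sample_covariance(x, y_falling)
print(cov_positive, cov_negative)
```

The size of the covariance depends on the units of both variables, so its sign is easier to interpret than its magnitude.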
CORRELATION COEFFICIENT

 Correlation is a measure of linear association and not necessarily causation.
 Just because 2 variables are highly correlated, it does not mean that one variable is the cause of the other.
 The coefficient can take on values between -1 and +1.
 Values near -1 indicate a strong negative linear relationship.
 Values near +1 indicate a strong positive linear relationship.
 The closer the correlation is to zero, the weaker the relationship.
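
A sketch of the sample correlation coefficient follows, assuming the usual definition: the sample covariance divided by the product of the two sample standard deviations. The function name and data are made up. The last case shows that a perfect but non-linear relationship can still have a correlation of zero, since the coefficient measures only linear association.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)

x = [1, 2, 3, 4, 5]
r_moderate = correlation(x, [2, 4, 5, 4, 5])            # positive, not perfect
r_perfect_neg = correlation(x, [-2 * v for v in x])     # exact decreasing line

x_sym = [-2, -1, 0, 1, 2]
r_curved = correlation(x_sym, [v ** 2 for v in x_sym])  # y = x squared

print(round(r_moderate, 3), round(r_perfect_neg, 3), r_curved)
```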
DATA DASHBOARDS: Adding numerical measures to improve effectiveness

 Data dashboards are not limited to graphical displays.
 The addition of numerical measures, such as the mean and standard deviation of KPIs, to a data dashboard is often critical.
 Dashboards are often interactive.
 Drilling down refers to functionality in interactive dashboards that allows the user to access information and analyses at an increasingly detailed level.