Probability and Statistics: Lums Undergraduate SS-4-6

This document discusses numerical descriptive statistics used to describe datasets. It covers measures of central tendency like mean, median and mode. It also discusses measures of variability such as range, variance, standard deviation and coefficient of variation. Finally, it discusses measures of shape like skewness and kurtosis. Standardizing data using z-scores is described to identify outliers. The empirical rule and Chebychev's inequality are also summarized.


Probability and Statistics

LUMS
Undergraduate
SS-4-6
Numerical Descriptive Statistics
• Numerical descriptive statistics take a different approach to
answer the same set of questions:
– Provide more precise information about a dataset’s distribution.
– The increased precision comes at the cost of stronger aggregation.
• Three basic types of numerical descriptive statistics:
– Measures of Central Location: Mean, Median, Mode
– Measures of Variability: Range, Variance, Standard Deviation,
Coefficient of Variation, Percentiles and Quartiles.
– Measures of Shape: Skewness and Kurtosis
• Ideally, employ visual and numerical descriptive statistics in
tandem to shed light on information embedded in datasets.
Measures of Central Location
• Average (i.e. arithmetic mean) is the most popular
measure of central location:
– computed by adding all the observations and dividing by the
total number of observations.
– appropriate for describing quantitative data only.
– Possesses nice theoretical properties:
• Sum of deviations from mean is zero.
• Linked to the measures of variation in a dataset.
• Changing value of a single observation changes the average.
• Central Limit Theorem
– Sensitive to outliers (extreme values) e.g. what happens to
average household income in a poor neighborhood when a
billionaire moves in?
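The billionaire example can be illustrated with a small sketch using Python's standard library (the income figures are hypothetical):

```python
from statistics import mean, median

# Hypothetical annual household incomes (in $1000s) in a poor neighborhood.
incomes = [28, 31, 35, 40, 42, 45, 50]

mean_before = mean(incomes)
median_before = median(incomes)

# A billionaire moves in: one extreme value (also in $1000s).
incomes_with_outlier = incomes + [1_000_000]

mean_after = mean(incomes_with_outlier)
median_after = median(incomes_with_outlier)

# The mean explodes, while the median barely moves.
print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before:.1f} -> {median_after:.1f}")
```

A single outlier drags the mean to over $125 million while the median shifts only from 40 to 41, which is why the median is preferred for income and property-value data.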
Measures of Central Location
• Median: Place the observations in ascending order; the observation (or the
average of the two observations) falling in the middle is the median.
– Median not sensitive to outliers
– Often used for income and property values datasets.
– Cannot be computed for nominal data.
• Mode: value/class that occurs most frequently in a dataset.
– Most suitable for nominal data, but also used for ordinal data.
– Datasets may have more than one modal class.
– Not a good measure of central location for quantitative data.
Measures of Variability
• Measures of central location fail to tell the complete story
about a dataset’s distribution e.g. how are observations
spread out around the mean (on average)?
Measures of Variability
• Range: simplest measure of variability, calculated by
subtracting smallest observation from largest observation.
– Fails to provide information on the dispersion of the observations
located between the two end points.
• Variance, and its related measure Standard Deviation, is a
measure of variability that incorporates all the data points.
– Variance is calculated by subtracting the mean from each observation,
squaring the differences, and dividing the sum of the squares by the
number of observations (for a sample, by the number of observations minus one).
– Standard deviation (square root of the variance) used to compare
the average degree of variability between two quantitative datasets.
• Commonly used as a measure of risk in finance.
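The three measures above can be computed directly; a minimal sketch on made-up data, cross-checked against Python's `statistics` module:

```python
from statistics import pvariance

data = [4, 8, 6, 5, 3, 7, 9, 6]

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# Population variance: average of the squared deviations from the mean.
m = sum(data) / len(data)
variance = sum((x - m) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance, in the data's own units.
std_dev = variance ** 0.5

# Sanity check against the library implementation.
assert abs(variance - pvariance(data)) < 1e-12
```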
Measures of Variability
• Coefficient of variation: Standard deviation of a variable
divided by its mean:
– A standardized measure of variation, when comparing the degree of
variability between variables with different means:
• Variation in salaries of managers and CEOs?
• Variation in the weights of watermelons and apples?
– Interpreted as the variation in a variable as a percentage of its mean.
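The watermelons-versus-apples comparison can be sketched as follows (the weights are hypothetical):

```python
from statistics import mean, stdev

# Hypothetical weights: watermelons in kilograms, apples in grams.
watermelons = [6.1, 7.3, 5.8, 6.9, 7.0]
apples = [152, 168, 140, 160, 175]

def coefficient_of_variation(data):
    """Standard deviation expressed as a fraction of the mean."""
    return stdev(data) / mean(data)

cv_melons = coefficient_of_variation(watermelons)
cv_apples = coefficient_of_variation(apples)

# Raw standard deviations are not comparable across different units and
# means, but the coefficients of variation are.
print(f"CV watermelons: {cv_melons:.1%}, CV apples: {cv_apples:.1%}")
```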
• All of the above-mentioned measures of variability are
sensitive to outliers.
– Measures of relative variability are not sensitive to outliers.
• Percentiles: provide information about the position of a particular
observation relative to the entire dataset; often used to define benchmarks
in business applications.
Measures of Variability
– For example, suppose your SAT score of 1340 is at the 80th percentile:
this implies that 80% of students scored below you, while 20% of students
scored above you.
– Caution: This doesn’t mean you scored 80% on the exam!
• Difference between Q1 (25th percentile) and Q3 (75th
percentile) is called the interquartile range:
– Median is known as Q2 (50th percentile)
– Measures the spread around the middle 50% of the observations.
– Large values are indicative of high variability and the presence of outliers.
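Quartiles and the interquartile range can be computed with the standard library's `statistics.quantiles` (a sketch on made-up exam scores):

```python
from statistics import quantiles, median

scores = [51, 55, 58, 60, 62, 64, 66, 68, 70, 72, 75, 78, 81, 85, 93]

# quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3].
q1, q2, q3 = quantiles(scores, n=4)

# The interquartile range measures the spread of the middle 50% of the data.
iqr = q3 - q1

# Q2 (the 50th percentile) is just the median.
assert q2 == median(scores)
```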
• Measures of variation don’t tell us much about symmetry of
distribution, outliers and concentration of data in tails relative
to center of distribution.
Measures of Shape
Normal Distribution: A special type of symmetric uni-modal
distribution that is bell shaped, frequently encountered in
statistical modelling:
Many statistical techniques require/assume that data follows a bell-shaped
distribution.

[Figure: frequency histogram of a variable showing a bell-shaped curve — a
normal distribution has a bell-shaped histogram]
Measures of Shape
Skewness: A skewed distribution is one with a long tail
extending either to the right or the left of the distribution.
– Positively Skewed (Right Skewed): long tail extending to the right;
implies mean > median, i.e. more outliers on the RHS.
– Negatively Skewed (Left Skewed): long tail extending to the left;
implies mean < median, i.e. more outliers on the LHS.
Measures of Shape
Kurtosis: A measure of the concentration of data in the tails relative to
the center of the distribution:
– Negative Excess Kurtosis: Relatively less concentration in the tails.
– Positive Excess Kurtosis: Relatively more concentration in the tails.
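Skewness and excess kurtosis can be computed from the standardized third and fourth moments; a minimal sketch on a made-up right-skewed dataset (the helper functions below are illustrative, not from the slides):

```python
from statistics import mean, median, pstdev

def skewness(data):
    """Third standardized moment: positive for a long right tail."""
    m, s, n = mean(data), pstdev(data), len(data)
    return sum(((x - m) / s) ** 3 for x in data) / n

def excess_kurtosis(data):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    m, s, n = mean(data), pstdev(data), len(data)
    return sum(((x - m) / s) ** 4 for x in data) / n - 3

# A right-skewed dataset: most values small, with a long tail to the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

assert skewness(right_skewed) > 0
assert mean(right_skewed) > median(right_skewed)  # mean pulled toward the tail
```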
Overview: Numerical Descriptive Statistics

Describing Data Numerically:
– Central Tendency: Mean, Median, Mode
– Variation: Range, Variance/Std. Deviation, Coefficient of Variation,
Interquartile Range
– Shape: Skewness and Kurtosis
Some Rules of the Expectation Operator
• If k is some constant, then we can mathematically prove the following results:
– Rule 1: If E(xᵢ) = x̄ then E(xᵢ + k) = x̄ + k
• Adding a constant to each observation changes the average by that constant.
– Rule 2: If Var(xᵢ) = σ² then Var(xᵢ + k) = σ²
• Adding a constant to each observation does not change the variance.
– Rule 3: If E(xᵢ) = x̄ then E(k·xᵢ) = k·E(xᵢ) = k·x̄
• Multiplying each observation by a constant changes the average by a factor
of that constant.
– Rule 4: If Var(xᵢ) = σ² then Var(k·xᵢ) = k²·Var(xᵢ) = k²·σ²
• Multiplying each observation by a constant changes the variance by the
squared factor of that constant.
• We apply these rules to standardize datasets to identify outliers.
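The four rules can be verified numerically on any dataset; a sketch with arbitrary values:

```python
from statistics import mean, pvariance

x = [2, 4, 6, 8, 10]
k = 5

# Rules 1 and 2: adding a constant shifts the mean by k
# but leaves the variance unchanged.
shifted = [xi + k for xi in x]
assert mean(shifted) == mean(x) + k
assert pvariance(shifted) == pvariance(x)

# Rules 3 and 4: multiplying by a constant scales the mean by k
# and the variance by k squared.
scaled = [k * xi for xi in x]
assert mean(scaled) == k * mean(x)
assert pvariance(scaled) == k ** 2 * pvariance(x)
```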
Standardizing Datasets
• Z-scores used to identify outliers in a dataset. To calculate Z-
score of each observation:
– Subtract the mean of the variable from each observation.
– Divide each resulting deviation by the standard deviation of the variable.
– The resulting distribution (of Z-scores) has a mean of 0 and a standard
deviation of 1.
– Each observation's Z-score is interpreted as the number of standard
deviations it lies above or below the mean.
• Converting each observation into its corresponding Z-score does not change
a non-normal distribution into a normal distribution.
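The standardization steps above can be sketched as follows (the data and the |z| > 2 cut-off are illustrative; 3 is also commonly used):

```python
from statistics import mean, pstdev

data = [12, 14, 15, 15, 16, 17, 18, 45]

m, s = mean(data), pstdev(data)

# Z-score: number of standard deviations each observation lies from the mean.
z_scores = [(x - m) / s for x in data]

# A common rule of thumb: flag |z| > 2 (or 3) as a potential outlier.
outliers = [x for x, z in zip(data, z_scores) if abs(z) > 2]

# The standardized data has mean 0 and standard deviation 1.
assert abs(mean(z_scores)) < 1e-9
assert abs(pstdev(z_scores) - 1) < 1e-9
```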
The Empirical Rule
• Approximately 68% of all observations fall within one standard deviation
of the mean.
• Approximately 95% of all observations fall within two standard deviations
of the mean.
• Approximately 99.7% of all observations fall within three standard
deviations of the mean.
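The empirical rule can be checked on simulated bell-shaped data using Python's standard library (the mean 100 and standard deviation 15 are arbitrary):

```python
import random
from statistics import mean, pstdev

random.seed(0)
sample = [random.gauss(100, 15) for _ in range(100_000)]

m, s = mean(sample), pstdev(sample)

def within(k):
    """Fraction of observations within k standard deviations of the mean."""
    return sum(abs(x - m) <= k * s for x in sample) / len(sample)

# Approximately 68%, 95% and 99.7% for k = 1, 2, 3.
print(f"k=1: {within(1):.3f}, k=2: {within(2):.3f}, k=3: {within(3):.3f}")
```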
Chebychev’s Inequality
• For any type of distribution and any number k > 1, at least
100 × (1 − 1/k²)% of the observations lie within k standard deviations of
either side of the mean.
• Two special cases of Chebychev’s inequality are applied
frequently, namely, when k = 2 and k = 3:
– At least 75% of the observations in any data set lie within 2 standard
deviations to either side of the mean.
– At least 89% of the observations in any data set lie within 3 standard
deviations to either side of the mean.
• Does the empirical rule violate Chebychev’s inequality?
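Chebychev's inequality gives a lower bound that must hold for any dataset, however skewed; a sketch that checks the k = 2 and k = 3 cases on made-up data with a large outlier:

```python
from statistics import mean, pstdev

# Any dataset, even a highly skewed one, must satisfy Chebychev's bound.
data = [1, 1, 1, 2, 2, 3, 3, 4, 5, 50]

m, s = mean(data), pstdev(data)

for k in (2, 3):
    bound = 1 - 1 / k ** 2  # 0.75 for k = 2, about 0.889 for k = 3
    fraction = sum(abs(x - m) <= k * s for x in data) / len(data)
    assert fraction >= bound  # the inequality always holds
```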
