
Intro To Descriptive Statistics: By: Mahmoud Galal

This document provides an introduction to descriptive statistics. It covers the history of statistics, data types and terminology, summary statistics, and exploratory data analysis. Key founders who developed early statistical methods and code-breaking algorithms are mentioned. Different data types, such as categorical and numerical variables, are defined, along with common terminology such as samples and outliers. Summary statistics are grouped into measures of location (mean, median, mode), spread (min, max, variance, standard deviation), and shape (skewness, kurtosis), and the measures of location are explained in detail. Common techniques for handling outliers are also summarized. The document concludes by noting that measures of spread and shape, visualizations, and a full exploratory data analysis will be covered next.

Uploaded by

Tarek Tarek El-safrani


Intro to Descriptive Statistics

By: Mahmoud Galal

AGENDA

01 History
Knowing the past … respecting the present.

02 Data Types and Terminologies
Knowing the type of the data is knowing what to do with it.

03 Summary Statistics
Representing a great sum of data with just one number!

04 Exploratory Data Analysis
Using the swords we mastered to slice down the beast, just to know him better (we are not savages!)

-- It's said that data is the biggest beast of our age --
01
History…
the past … the present.
How did statistics evolve?

Until the 18th century, data was collected and put into plain tables.

From the 19th century onwards, life got more complicated, and people started to think about how to collect, summarize and present data.

Nowadays, statistics contributes to every field of study in our lives, and it can even be just for fun!
Founders …
Abu Youssef Yaaqob Al Kindi developed the first code-breaking algorithm, based on frequency analysis. He wrote a book entitled "Manuscript on Deciphering Cryptographic Messages", containing detailed discussions on statistics and cryptanalysis.

Check out the rest of the founders, such as Bayes, Laplace and Gauss, here.
Who got it?
02
Data Types and Terminologies
Knowing the type of the data
is knowing what to do with it.
Data Types
Categorical: Cat or Dog, Gender
Numerical: 25.33, -3
Data Terminologies
• Variables are also called columns, features, dimensions, fields and attributes.
• Samples are also called observations, records, instances and rows.
• Variables and samples together make up a "data set" or "data frame".

Country   Age   Score
Egypt     30    4
Morocco   21    4
Germany   29    3
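To make the terminology concrete, here is a minimal Python sketch (the slides contain no code, so the language is our choice) that stores the example table as a list of rows: each dictionary is one sample and each key is one variable. The helper names `column` and `variable_type` are hypothetical, and treating any int/float column as numerical is a simplifying assumption.

```python
# Each dict is one sample (row / observation / record);
# each key is one variable (column / feature / attribute).
data_set = [
    {"Country": "Egypt", "Age": 30, "Score": 4},
    {"Country": "Morocco", "Age": 21, "Score": 4},
    {"Country": "Germany", "Age": 29, "Score": 3},
]


def column(rows, name):
    """Collect the values of one variable across all samples."""
    return [row[name] for row in rows]


def variable_type(values):
    """Rough classification: all numbers -> numerical, otherwise categorical."""
    if all(isinstance(v, (int, float)) for v in values):
        return "numerical"
    return "categorical"


variables = list(data_set[0].keys())
n_samples = len(data_set)

for name in variables:
    print(name, variable_type(column(data_set, name)))
```

A column such as Score may really be ordinal rather than truly numerical, so a type check like this is only a starting point; knowing what the numbers mean still matters.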
Outliers

1. An outlier is a data point that differs significantly from other observations.

2. We usually tend to remove outliers to make sure that our analysis is accurate.

3. An outlier can cause serious problems in statistical analyses.
Removing Outliers
There are different techniques to detect and handle outliers.

1. We usually remove outliers to make sure that our analysis is accurate, but at other times we keep them.

2. Instead of removing outliers, we can adjust them to make them look like the majority. This is preferable when removing them would have bad effects, as in the case of small datasets.

3. So we should always visualize and analyze our data properly before making any moves.
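Removing Outliers – Tukey's Method
• One of the most widely used methods to detect and remove outliers is Tukey's method.

• First, we calculate the interquartile range: IQR = Q3 - Q1.

• Then any data point < (Q1 - 1.5 × IQR) or > (Q3 + 1.5 × IQR) is considered an outlier.

• In other words, outliers are the points that fall outside the range [Q1 - 1.5 × IQR, Q3 + 1.5 × IQR], where Q1 and Q3 are the first and third quartiles (the 25th and 75th percentiles) of the data.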
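A minimal Python sketch of Tukey's fences, assuming the quartiles are computed with the standard library's `statistics.quantiles(..., method="inclusive")`. Different tools use different quartile conventions, so the fences (and occasionally which points are flagged) can differ slightly between them. The function names and sample values are made up for illustration.

```python
import statistics


def tukey_fences(data, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # n=4 returns the three quartile cut points [Q1, Q2, Q3].
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tukey_outliers(data, k=1.5):
    """Values strictly below the lower fence or strictly above the upper fence."""
    low, high = tukey_fences(data, k)
    return [x for x in data if x < low or x > high]


def remove_outliers(data, k=1.5):
    """Keep only the values inside the fences (inclusive)."""
    low, high = tukey_fences(data, k)
    return [x for x in data if low <= x <= high]


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(tukey_fences(values))    # (-3.5, 14.5)
print(tukey_outliers(values))  # [100]
```

Here Q1 = 3.25 and Q3 = 7.75, so IQR = 4.5 and the fences sit 6.75 beyond each quartile; only 100 falls outside them.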

Real-World Example
03
Summary Statistics
“ONE NUMBER TO RULE THEM ALL,
ONE NUMBER TO FIND THEM ALL,
ONE NUMBER TO BRING THEM ALL”

-Gandalf The Grey

Summary statistics types
• A sample summary is a statistic; a population summary is a parameter.

• We can summarize our data with different measures. Each of them adds something different to the analysis:

1. Measures of location (mean, median, mode)
2. Measures of spread (min, max, variance and standard deviation)
3. Measures of shape (skewness and kurtosis)
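The statistic-versus-parameter distinction shows up directly in the measures of spread: the sample variance divides by n - 1, while the population variance divides by n. A minimal sketch using Python's standard `statistics` module (the `spread_summary` helper is hypothetical); measures of shape such as skewness and kurtosis are not in that module and are left out here.

```python
import statistics


def spread_summary(values, is_population=False):
    """Min, max, variance and standard deviation of a list of numbers.

    Population formulas divide by n; sample formulas divide by n - 1.
    """
    if is_population:
        variance = statistics.pvariance(values)
        std_dev = statistics.pstdev(values)
    else:
        variance = statistics.variance(values)
        std_dev = statistics.stdev(values)
    return {
        "min": min(values),
        "max": max(values),
        "variance": variance,
        "std_dev": std_dev,
    }


ages = [30, 21, 29]
print(spread_summary(ages))                      # treated as a sample
print(spread_summary(ages, is_population=True))  # treated as the whole population
```

The same numbers give a larger variance when treated as a sample, because dividing by n - 1 compensates for estimating the mean from the same data.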
Measures of location (Center)

These are the measures that describe the central (typical) value of a data set. The most popular forms are:
1. Mean

2. Median

3. Mode
Mean
It's the sum of all values in the data set divided by the number of records:
mean = (x1 + x2 + … + xn) / n, where n is the number of data points.

1. It's a computed summary statistic: every value in the data set contributes to it.

2. Suitable for general-purpose analysis.

3. It can be manipulated algebraically, while the median and mode cannot.

4. The mean is more widely used than the median and mode.

5. Very sensitive to outliers.

6. Can't handle categorical (qualitative) features.
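A minimal sketch of the mean in plain Python (the function names are hypothetical). It illustrates two points from the list: a single outlier pulls the mean far away, and because the mean is algebraic, the mean of two merged groups can be computed from each group's mean and size alone, which is not possible for the median or mode.

```python
def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


def combined_mean(mean_a, n_a, mean_b, n_b):
    """Mean of two merged groups, computed only from their means and sizes.

    This works because mean * n recovers each group's total.
    """
    return (mean_a * n_a + mean_b * n_b) / (n_a + n_b)


ages = [30, 21, 29]
print(arithmetic_mean(ages))          # about 26.67
print(arithmetic_mean(ages + [300]))  # 95.0: one outlier drags the mean far away

group_a, group_b = [30, 21, 29], [40, 50]
merged = combined_mean(arithmetic_mean(group_a), len(group_a),
                       arithmetic_mean(group_b), len(group_b))
print(merged, arithmetic_mean(group_a + group_b))  # same value (up to rounding)
```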
Practice
5-Min Break
Please submit your attendance while making yourself something hot.
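Median
It's the middle value of the data set once the values are sorted.

1. The median value is fixed by its position and is not affected by the individual values (such as extreme values).

2. It can be used to determine an approximate average when there are outliers in the data.

3. It can't be computed algebraically.

4. Before applying the median rule, the data must be sorted in ascending (or descending) order.

If n is odd, the median is the value at position (n + 1)/2 of the sorted data; if n is even, it is the average of the values at positions n/2 and n/2 + 1.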
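A minimal Python sketch of the median rule, sorting first and then handling the odd and even cases (the function name is hypothetical; Python's standard library also offers `statistics.median`, which follows the same odd/even convention).

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)  # the data must be sorted first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2, i.e. 0-based index n // 2.
        return ordered[middle]
    # Even n: average of 1-based positions n/2 and n/2 + 1.
    return (ordered[middle - 1] + ordered[middle]) / 2


ages = [30, 21, 29]
print(median(ages))          # 29
print(median(ages + [300]))  # 29.5: the outlier barely moves the median
```

Adding the outlier 300 moves the median only from 29 to 29.5, whereas the mean of the same four values would jump to 95.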
Mode
It's the value that appears most often in the data set.

1. A data set can have more than one mode.

2. Unlike the mean, it has no algebraic (mathematical) properties.
3. Unlike the mean, the mode is not affected by extreme values.
4. It's the most suitable measure for categorical (nominal) data.
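A minimal sketch of the mode in Python, counting frequencies with `collections.Counter` (the `modes` helper is hypothetical). It works on categorical values such as "Cat or Dog", and it returns every tied value, since a data set can have more than one mode. Python 3.8+ also ships `statistics.multimode`, which behaves similarly.

```python
from collections import Counter


def modes(values):
    """Every value that ties for the highest frequency (possibly more than one)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


pets = ["cat", "dog", "cat", "bird", "dog", "cat"]
print(modes(pets))             # ['cat']
print(modes([1, 1, 2, 2, 3]))  # [1, 2]: a bimodal data set
```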
Practice
Pros and Cons

Comparison criteria: algebraic manipulation, qualitative expression, fluctuations of sampling, and sensitivity to outliers.
7,540,860,914
Just imagine a dataset with this number of rows...

CONCLUSION

1. It's strongly believed that Arabs were the pioneers of statistics.

2. Outliers are generally bad for our analysis, but sometimes they are the most important part of it.

3. Summary statistics are a must when working with data.

4. The mean is the most popular measure of location (center).

5. We can use other summaries, like the median and mode, in special cases, such as when outliers are present.

6. When data are on an interval scale, the suitable measure of central tendency is the mean. The median is suitable when data are on an ordinal scale. The mode is used when data are on a nominal scale.
What is coming?

Measures of Spread: dataset variability
Measures of Shape: skewness and kurtosis
Visuals: box plots, histograms, etc.
Full EDA: practising what we have learned.
“Statistics is the grammar of Science.”

-Karl Pearson (Who is he?)

THANKS
Any Questions? …
Head to Google Meet!
