Intro To Descriptive Statistics: By: Mahmoud Galal

This document provides an introduction to descriptive statistics. It covers the history of statistics, data types and terminology, summary statistics, and exploratory data analysis. [1] Key founders who developed early statistical methods and code breaking algorithms are mentioned. [2] Different data types like categorical and numerical variables are defined, as well as common terminology like samples and outliers. [3] Popular summary statistics are explained, including measures of center (mean, median, mode), spread (range, variance, standard deviation), and shape (skewness, kurtosis). Common techniques for handling outliers are also summarized. The document concludes by noting that further topics like measures of spread, visualizations, and full exploratory data analysis will be covered.

Intro to Descriptive Statistics

By: Mahmoud Galal


AGENDA

01 History
Knowing the past … Respecting the present.
02 Data Types and Terminologies
Knowing the type of the data is knowing what to do with it.
03 Summary Statistics
Representing a great sum of data with just one number!
04 Exploratory Data Analysis
Using the swords we mastered to slice down the beast, just to know him better (we are not savages!)
-- It’s said that data is the biggest beast of our age --
01
History…
Knowing the past … Respecting the present.
How did statistics evolve?

Until the 18th century, data was collected and put into dummy tables.

From the 19th century onwards, life got complicated, and people started to think about how to collect, summarize and present the data.

Nowadays, statistics has evolved to contribute to every field of study in our lives, and even just for fun!
Founders …
Abu Youssef Yaaqob Al Kindi developed the first code-breaking algorithm based on frequency analysis. He wrote a book entitled "Manuscript on Deciphering Cryptographic Messages", containing detailed discussions on statistics.

Check the rest of the founders, like Bayes, Laplace and Gauss, here.
Who got it?
02
Data Types and Terminologies
Knowing the type of the data
is knowing what to do with it.
Data Types

Categorical examples: Cat or Dog, Gender
Numerical examples: 25.33, -3
Data Terminologies
• Variables are also called Columns, Features, Dimensions, Fields and Attributes.
• Samples are also called Observations, Records, Instances and Rows.
• Variables and Samples make up the term “Data Set” or “Data Frame”.

Country   Age   Score
Egypt     30    4
Morocco   21    4
Germany   29    3
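
As a minimal sketch of the terminology above, the small table can be held in plain Python as a list of samples, each mapping variable names to values. The names `data_set`, `variables` and `ages` are illustrative choices, not part of the slides.

```python
# Each dict is one sample (row / observation / record).
# Each key is one variable (column / feature / attribute).
data_set = [
    {"Country": "Egypt", "Age": 30, "Score": 4},
    {"Country": "Morocco", "Age": 21, "Score": 4},
    {"Country": "Germany", "Age": 29, "Score": 3},
]

variables = list(data_set[0].keys())      # the column names
n_samples = len(data_set)                 # the number of rows
ages = [row["Age"] for row in data_set]   # one variable across all samples

print(variables)
print(n_samples)
print(ages)
```

Reading one key across all rows gives a variable; reading one dict gives a sample.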
Outliers

1. An outlier is a data point that differs significantly from other observations.

2. We usually remove outliers to keep our analysis accurate.

3. Outliers can cause serious problems in statistical analyses.
Removing Outliers
There are different techniques to capture outliers.

1. We usually remove outliers to keep our analysis accurate, but other times we keep them. Example:

2. We can […] our outliers to make them look like the majority rather than remove them, if removing them would have bad effects, as in the case of small datasets.

3. So we always have to visualize our data and analyze it properly before making any moves.
Removing Outliers – Tukey’s Method
• One of the most used methods to detect and remove outliers is Tukey’s method.

• First, we calculate the interquartile range: IQR = Q3 - Q1.

• Then any data point < (Q1 - 1.5 * IQR) or > (Q3 + 1.5 * IQR) is considered an outlier.

• Outliers are {x : x < Q1 - 1.5 * IQR or x > Q3 + 1.5 * IQR}, where Q1 and Q3 are the first and third quartiles.
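
The steps of Tukey’s method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tukey_outliers` and the sample ages are made up, and quartiles are computed with the standard library’s `statistics.quantiles` using the "inclusive" method. Other quartile conventions exist and can move the fences slightly.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using Tukey's fences."""
    # Q1 and Q3 are the first and third of the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # A point is an outlier if it falls below the lower OR above the upper fence.
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

ages = [21, 23, 24, 25, 26, 27, 29, 30, 95]  # made-up data with one extreme value
lower, upper, outliers = tukey_outliers(ages)
print(lower, upper, outliers)  # 16.5 36.5 [95]
```

Here Q1 = 24 and Q3 = 29, so IQR = 5 and only the value 95 lies outside the fences.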


Real-World
Example
03
Summary Statistics
“ONE NUMBER TO RULE THEM ALL ,
ONE NUMBER TO FIND THEM ALL ,
ONE NUMBER TO BRING THEM ALL ”

-Gandalf The Grey


Summary statistics types
• A sample summary is a statistic; a population summary is a parameter.

• We can summarize our data with different measures. Each of them adds a certain power to the analysis.

1. Measures of location (Mean, Median, Mode)
2. Measures of spread (Min, Max, Variance and Standard Deviation)
3. Measures of shape (Skewness and Kurtosis)
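
To make the spread and shape measures a little more concrete, here is a rough sketch that computes them from central moments. It assumes population-style formulas (dividing by n, not n - 1), and the function name `spread_and_shape` and the sample scores are invented for illustration. It will fail on a data set where all values are equal, because the variance is then zero.

```python
import statistics

def spread_and_shape(values):
    """Population-style spread and shape measures for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments: the average of (x - mean)^k over all records.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "min": min(values),
        "max": max(values),
        "variance": m2,
        "std_dev": m2 ** 0.5,
        "skewness": m3 / m2 ** 1.5,           # > 0 means a longer right tail
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
summary = spread_and_shape(scores)
print(summary)
```

For these scores the mean is 5, the variance is 4 and the standard deviation is 2; the positive skewness reflects the values 7 and 9 pulling the right tail out.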
Measures of location (Center)

These are the measures that describe the central value of a data set. The most popular forms are:

1. Mean

2. Median

3. Mode
Mean
It’s the sum of all values of the data set divided by the number of records.

1. It’s the […] computed summary statistic.

2. Suitable for general-purpose analysis.

3. The mean can be manipulated algebraically; median and mode cannot.

4. The mean is more widely used than median and mode.

5. Very sensitive to outliers and […].

6. Can’t handle categorical features.

7. […] of data points.
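
A quick sketch of the definition above, and of how a single outlier drags the mean, using made-up scores. `statistics.fmean` is the standard library’s floating-point mean; the variable names are illustrative.

```python
import statistics

scores = [3, 4, 4, 5, 4]  # made-up values

# The definition: sum of all values divided by the number of records.
mean_manual = sum(scores) / len(scores)
print(mean_manual)  # 4.0

# One extreme record is enough to pull the mean far from the bulk of the data.
with_outlier = scores + [100]
mean_outlier = statistics.fmean(with_outlier)
print(mean_outlier)  # 20.0
```

Five of the six records sit between 3 and 5, yet the mean lands at 20.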
Practice
5-Min Break
Please submit your attendance while
Making yourself something hot
Median
It’s the middle value of our data set.

1. The median is fixed by its position and does not reflect the individual values.

2. Can be used as an approximate average when there are outliers in the data.

3. Can’t be computed algebraically.

4. Before applying the median formula, the data must be sorted.

[Figure: median formula for the case where the number of records is odd]
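
The positional idea behind the median can be sketched as follows. The function below is a hand-written illustration (the standard library already provides `statistics.median`), and the ages are made up. It uses the usual convention of averaging the two middle values when the number of records is even.

```python
import statistics

def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)  # position only makes sense on sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

ages = [30, 21, 29]
print(median(ages))          # 29
print(median(ages + [95]))   # 29.5, barely moved by the extreme value
print(statistics.median(ages + [95]) == median(ages + [95]))  # True
```

Adding the extreme age 95 shifts the median only from 29 to 29.5.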
Mode
It’s the element that appears the most in our dataset.

1. We can have more than one mode in the dataset.
2. Unlike mean, it has no mathematical property.
3. Unlike mean, mode is not sensitive to outliers.
4. It’s the most suitable measure for categorical data.
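
As a small sketch of the mode, counting occurrences works for labels as well as numbers, and a data set may return several modes. The helper name `modes` and both sample lists are invented here; `statistics.multimode` is the standard library equivalent.

```python
from collections import Counter
import statistics

def modes(values):
    """Return every value that shares the highest frequency, in first-seen order."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

pets = ["Cat", "Dog", "Dog", "Cat", "Dog"]   # categorical data
scores = [4, 4, 3, 3, 5]                     # two values tie for most frequent

print(modes(pets))                   # ['Dog']
print(modes(scores))                 # [4, 3]
print(statistics.multimode(scores))  # [4, 3]
```

Neither the mean nor the median is defined for the `pets` list, but the mode is.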
Practice
Pros and Cons

Comparison criteria: Algebraic Manipulation, Qualitative Expression, Fluctuations of Sampling, Outliers Sensitive?
7,540,860,914
Just imagine a dataset with this number of rows...

1
CONCLUSION

1. It’s strongly believed that Arabs are the pioneers of statistics.

2. Outliers are generally bad for our analysis, but sometimes they are the most important data points.

3. Summary statistics are a must when working with data.

4. Mean is the most popular measure of location or center.

5. We can use the other summaries, like median and mode, in special cases such as the presence of outliers.

6. When data are on an interval scale, the suitable measure of central tendency is the mean. The median is suitable when data are on an ordinal scale. The mode is used when data are on a nominal scale.
What is coming?

Measures of Spread: Dataset Variability
Visuals: Box plot, histograms, etc.
Measures of Shape: Skewness and Kurtosis
Full EDA: Practising what we have learned.
“Statistics is the grammar of science.”

-Karl Pearson (Who is he?)


THANKS
Any questions? …
Head to Google Meet!
