2021 - Lecture 4 - Descriptive Statistics - Slides

This document provides an overview of a lecture on descriptive statistics. It discusses univariate descriptive statistics like measures of central tendency (mean, median, mode) and dispersion (range, interquartile range, variance, standard deviation). It also discusses bivariate descriptive statistics like correlation. Examples of calculating various univariate statistics and interpreting skewness, kurtosis, and frequency distributions are provided. Scatterplots and calculating correlation to analyze relationships between two variables are also introduced.

Uploaded by

Fanelo Felicity

2021/08/25

Lecture 4

GIS220:
Descriptive statistics

Prof Gregory Breetzke


greg.breetzke@up.ac.za
Room 1-19, Geography

Lecture overview

• What are descriptive statistics?


• Types of descriptive statistics
– Univariate
– Bivariate
• Examples


Descriptive statistics

• Provide an initial entry point

• Some research questions can be answered satisfactorily using descriptive statistics

Types of descriptive statistics

• Univariate and bivariate statistics


– U: mean, mode, range, standard deviation
– B: correlation coefficient


Types of descriptive statistics

UNIVARIATE

• Measures of central tendency


– Mean
– Mode
– Median
• Measures of dispersion
– Range
– Interquartile range
– Variance
– Standard deviation

The mean

• The mean is a measure of central value


– What most people mean by “average”
– Sum of a set of numbers divided by the number
of numbers in the set


The median
• Middlemost or most central item in the set of
ordered numbers; it separates the distribution
into two equal halves
• If the number of values (n) is odd, the median is the middle value of the ordered sequence
– if X = [1,2,4,6,9,10,12,14,17]
– then 9 is the median
• If n is even, the median is the average of the two middle values
– if X = [1,2,4,6,9,10,11,12,14,17]
– then 9.5 is the median; i.e., (9+10)/2
• Median is not affected by extreme values

The mode
• The mode is the most frequently occurring
number in a distribution
– if X = [1,2,4,7,7,7,8,10,12,14,17]
– then 7 is the mode
• Easy to see in a simple frequency distribution
• Possible to have no modes or more than one
mode
– bimodal and multimodal
• Modes do not need to have exactly equal frequencies
– major mode, minor mode
• Mode is not affected by extreme values
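The median and mode examples above can be checked with a short Python sketch. It uses the standard `statistics` module; the `with_outlier` list is a hypothetical variation (not from the slides) added only to show that one extreme value shifts the mean but leaves the median unchanged.

```python
from statistics import mean, median, multimode

# Ordered sets taken from the median and mode slides.
odd_set = [1, 2, 4, 6, 9, 10, 12, 14, 17]        # n = 9 (odd)
even_set = [1, 2, 4, 6, 9, 10, 11, 12, 14, 17]   # n = 10 (even)
mode_set = [1, 2, 4, 7, 7, 7, 8, 10, 12, 14, 17]

median_odd = median(odd_set)      # middle value of the ordered set
median_even = median(even_set)    # average of the two middle values
modes = multimode(mode_set)       # list of the most frequent value(s)

# Hypothetical illustration: replace the largest value with an extreme one.
with_outlier = odd_set[:-1] + [1000]

print("medians:", median_odd, median_even)
print("modes:", modes)
print("mean without / with outlier:", mean(odd_set), mean(with_outlier))
print("median without / with outlier:", median(odd_set), median(with_outlier))
```

`multimode` returns a list, so a bimodal or multimodal set simply yields more than one value.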


When to use what…?


• The mean is a great measure, but there are times when its usage is inappropriate or impossible
– Nominal data: Mode
– The distribution is bimodal: Mode
– You have ordinal data: Median or mode
– There are a few extreme scores: Median

Dispersion
• Dispersion
– How tightly clustered or how
variable the values are in a data
set
• Example
– Data set 1: [0,25,50,75,100]
– Data set 2: [48,49,50,51,52]
– Both have a mean of 50, but data
set 1 clearly has greater variability than data set 2


Range
• The difference between the maximum and
minimum values in a set
• Example
– Data set 1: [0,25,50,75,100]; R: 100-0 = 100
– Data set 2: [48,49,50,51,52]; R: 52-48 = 4
– The range ignores how data are distributed and
only takes the extreme scores into account

• RANGE = (Xlargest –Xsmallest)
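As a minimal sketch, the range formula can be applied to the two data sets from the Dispersion slide. The helper name `value_range` is an assumption introduced here for illustration.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# The two data sets from the Dispersion slide.
data_set_1 = [0, 25, 50, 75, 100]
data_set_2 = [48, 49, 50, 51, 52]

mean_1 = sum(data_set_1) / len(data_set_1)
mean_2 = sum(data_set_2) / len(data_set_2)
range_1 = value_range(data_set_1)
range_2 = value_range(data_set_2)

# Same mean, very different spread.
print("means:", mean_1, mean_2)
print("ranges:", range_1, range_2)
```

Because only the two extreme values enter the calculation, changing any of the middle values leaves the range untouched.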

Quartiles
• Split ordered data into four quarters

– Q1 = first quartile (25th percentile)
– Q2 = second quartile = Median (50th percentile)
– Q3 = third quartile (75th percentile)


Interquartile range (IQR)


• Difference between third and first quartiles
– Interquartile Range = Q3-Q1

• Spread in middle 50%

• Not affected by extreme values

• The IQR measures how spread out the middle 50% of the data points are around the median

• The higher the IQR, the more spread out the data points

• The smaller the IQR, the more bunched up the data points are around the median

• It is best used with other measurements such as the median and total range to build a complete picture of how tightly a data set clusters around its centre

Example

• Given the set of values: 27, 18, 19, 12, 15, 1, 2, 6, 5, 9, 7, find the…
– Mean
– Median
– Range
– Interquartile range
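One way to check the answers is a short Python sketch using the standard `statistics` module. Quartile conventions differ between textbooks and software, so the interquartile range below assumes Python's default "exclusive" method; the slides do not say which convention the lecture uses.

```python
from statistics import mean, median, quantiles

values = [27, 18, 19, 12, 15, 1, 2, 6, 5, 9, 7]

example_mean = mean(values)
example_median = median(values)              # sorts internally
example_range = max(values) - min(values)

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
# Assumption: default method="exclusive"; method="inclusive"
# can give different quartiles for the same data.
q1, q2, q3 = quantiles(values, n=4)
example_iqr = q3 - q1

print("mean:", example_mean)
print("median:", example_median)
print("range:", example_range)
print("Q1, Q3, IQR:", q1, q3, example_iqr)
```

With 11 values the exclusive method lands exactly on the 3rd and 9th ordered values, which matches the common hand method of taking the medians of the lower and upper halves.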


Standard deviation
• Let X = [3, 4, 5, 6, 7]
– $\bar{X} = 5$
– $(X - \bar{X}) = [-2, -1, 0, 1, 2]$
• Subtract $\bar{X}$ from each number in X
– $(X - \bar{X})^2 = [4, 1, 0, 1, 4]$
• Squared deviations from the mean
– $\sum (X - \bar{X})^2 = 10$
• Sum of squared deviations from the mean (SS)
– $\sum (X - \bar{X})^2 / (n-1) = 10/4 = 2.5$
• Average squared deviation from the mean (the variance)
– $\sqrt{\sum (X - \bar{X})^2 / (n-1)} = \sqrt{2.5} \approx 1.58$
• Square root of the average squared deviation (the standard deviation)
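The same steps can be traced in a few lines of Python. This is a sketch of the sample standard deviation, dividing the sum of squares by n − 1 = 4; the variable names are chosen here for illustration.

```python
import math
from statistics import stdev

X = [3, 4, 5, 6, 7]
n = len(X)

x_bar = sum(X) / n                        # mean of X
deviations = [x - x_bar for x in X]       # subtract the mean from each value
squared = [d ** 2 for d in deviations]    # squared deviations from the mean
ss = sum(squared)                         # sum of squared deviations (SS)
variance = ss / (n - 1)                   # sample variance: divide by n - 1
std_dev = math.sqrt(variance)             # sample standard deviation

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(std_dev, stdev(X))

print("mean:", x_bar)
print("deviations:", deviations)
print("SS:", ss)
print("variance:", variance)
print("standard deviation:", round(std_dev, 2))
```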

Standard deviation
• Most South African employers issue raises based on
percent of salary
• Why do supervisors think the fairest raise is a percentage raise?
• Answer:
1) Because higher paid persons get the most money.
2) The easiest thing to do is raise everyone’s salary by a fixed percent.
• If your budget went up by 5%, salaries can go up by 5%.
• The problem is that the flat percent raise gives
unequal increased rewards


Standard deviation
• Acme Toilet Cleaning Services
• Salary Pool: R200,000

Incomes:
• President: R100K; Manager: R50K; Secretary: R40K; and Toilet Cleaner: R10K
• Mean: R50K
• Range: R90K
• Variance: R1,050,000,000
• Standard Deviation: R32.4K
• These can be considered “measures of inequality”
• Now, let’s apply a 5% raise

Standard deviation
• After a 5% raise, the pool of money increases by R10K to
R210,000

• Incomes:
– President: R105K; Manager: R52.5K; Secretary: R42K; and Toilet Cleaner:
R10.5K
– Mean: R52.5K – went up by 5%
– Range: R94.5K – went up by 5%
– Variance: R1,157,625,000 – went up by 10.25% (1.05² = 1.1025)
– Standard Deviation: R34K – went up by 5%

• The flat percentage raise increased inequality. The top earner got 50% of the new money. The bottom earner got 5% of the new money. Measures of inequality went up by 5%.
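The Acme figures can be reproduced with a short Python sketch. The slide's variance and standard deviation match the population formulas (dividing by n, not n − 1), so `pvariance` and `pstdev` are used here; the helper name `summarise` is an assumption for illustration.

```python
from statistics import mean, pstdev, pvariance

incomes = [100_000, 50_000, 40_000, 10_000]
# 5% raise, kept in whole rand to avoid floating-point noise.
raised = [income * 105 // 100 for income in incomes]

def summarise(values):
    """Mean, range, population variance and population standard deviation."""
    return {
        "mean": mean(values),
        "range": max(values) - min(values),
        "variance": pvariance(values),
        "std_dev": pstdev(values),
    }

before = summarise(incomes)
after = summarise(raised)

# Ratio of each measure after the raise to its value before the raise.
ratios = {name: after[name] / before[name] for name in before}

# Share of the new money (R10,000) received by each person.
new_money = sum(raised) - sum(incomes)
shares = [(new - old) / new_money for old, new in zip(incomes, raised)]

print(before)
print(after)
print(ratios)
print(shares)
```

The mean, range and standard deviation all scale by 1.05, while the variance scales by 1.05² because it is measured in squared units.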


Skew
• Skewness is a measure of the asymmetry of the
probability distribution
• Roughly speaking, a distribution has positive skew
(right-skewed) if the right (higher value) tail is
longer and a negative skew (left-skewed) if the left
(lower value) tail is longer (confusing the two is a
common error)
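To make the sign convention concrete, here is a minimal sketch of one common skewness measure, the moment-based coefficient (third central moment divided by the 1.5 power of the second). The slides do not give a formula, so this choice and all three data sets are assumptions for illustration.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets, not from the lecture.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]     # long tail towards high values
left_skewed = [-x for x in right_skewed]     # mirror image: long low tail
symmetric = [1, 2, 3, 4, 5]

skew_right = skewness(right_skewed)
skew_left = skewness(left_skewed)
skew_sym = skewness(symmetric)

print(skew_right, skew_left, skew_sym)
```

A positive result corresponds to a longer right tail, a negative result to a longer left tail, and a value near zero to a roughly symmetrical distribution.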

Skew


Kurtosis

• A high kurtosis distribution has a sharper "peak" and fatter "tails", while a low kurtosis distribution has a more rounded peak with wider "shoulders".
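As a rough sketch, kurtosis can be computed as the fourth central moment divided by the squared second central moment; subtracting 3 gives "excess" kurtosis, which is zero for a normal distribution. The slides do not specify a formula, so this definition and both data sets are assumptions for illustration.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (population moments)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - m) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical data sets, not from the lecture.
peaked_fat_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 10]   # sharp peak, extreme tails
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]                  # evenly spread, no peak

kurt_peaked = excess_kurtosis(peaked_fat_tails)
kurt_flat = excess_kurtosis(flat)

print(kurt_peaked, kurt_flat)
```

The peaked, fat-tailed set gives a positive excess kurtosis and the flat set a negative one.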


Frequency distributions
• Symmetrical distribution
– Approximately equal numbers of observations above and
below the middle
• Skewed distribution
– One side is more spread out than the other, like a tail
– Direction of the skew
• Positive or negative (right or left)
• Side with the fewer scores
• Side that looks like a tail

Symmetrical vs. skewed distributions


Types of descriptive statistics

BIVARIATE

• Correlation
– linear pattern of relationship between one variable (x) and
another variable (y) –an association between two variables
• Relative position of one variable correlates with relative
distribution of another variable
• Warning:
– No proof of causality
– Cannot assume x causes y

Scatterplots and correlation


• A scatter plot (or scatter diagram) is used to show
the relationship between two variables
– Scatter diagram plots pairs of bivariate observations (x, y)
on the X-Y plane
– Y is called the dependent variable
– X is called an independent variable
• Correlation analysis is used to measure strength of
the association (linear relationship) between two
variables
– Only concerned with strength of the
relationship
– No causal effect is implied


Types of correlation
• Positive correlation
– High values of X tend to be associated with high values of Y.
– As X increases, Y increases
• Negative correlation
– High values of X tend to be associated with low values of Y.
– As X increases, Y decreases
• No correlation
– No consistent tendency for values of Y to increase or decrease as X increases
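The three cases can be illustrated with a small Python sketch. It assumes the correlation coefficient meant here is Pearson's r, which the slides do not name explicitly; the function name `pearson_r` and all data values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Sum of cross-products of deviations, and sums of squared deviations.
    sp = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical paired observations, not from the lecture.
x = [1, 2, 3, 4, 5]
y_positive = [2, 4, 5, 4, 6]   # tends to rise as x rises
y_negative = [9, 7, 6, 4, 1]   # tends to fall as x rises
y_none = [3, 1, 4, 1, 3]       # no consistent linear tendency

r_positive = pearson_r(x, y_positive)
r_negative = pearson_r(x, y_negative)
r_none = pearson_r(x, y_none)

print(round(r_positive, 3), round(r_negative, 3), round(r_none, 3))
```

r always lies between -1 and +1; its sign gives the direction of the linear relationship and its magnitude the strength. A value near zero rules out a linear pattern only, and no value of r says anything about causation.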


Applications

Individual vs Group (Neighbourhood)


What type of relationship?


[Figure: scatterplot “Video Games and Alcohol Consumption”; x-axis: Average Hours of Video Games Per Week (0–25); y-axis: Average Number of Alcoholic Drinks Per Week (0–20)]

What type of relationship?


[Figure: scatterplot “Video Games and Test Score”; x-axis: Average Hours of Video Games Per Week (0–20); y-axis: Exam Score (0–100)]


Each point represents something or some PLACE!!


Practical 1

Date: Thursday 26th August 1130-1430 (Posted on Thursday)

Location: Remotely or on-campus (Brown & Orange & Red IT labs)

Assistance: Thursdays 1130-1420 and Thursdays 14:00-16:00 by appointment via Doodle

Due: Thursday 9th September at 1130 (upload on ClickUp)

Task: Sampling exercise and gaining familiarity with GeoDa and ArcPro

Software: Excel, GeoDa and ArcPro
