Descriptive Statistics Alp2019

This document discusses descriptive statistics and its key concepts. It defines statistics as the study of the collection, analysis, interpretation, presentation, and organization of data. The purposes of data collection and analysis are described as explanation, prediction, and control. Data types and levels of measurement are outlined, including the categorical nominal and ordinal scales and the continuous interval and ratio scales. The stated objectives are learning to use statistical computing packages and understanding core statistical concepts such as modelling, distributions, and inference.
Copyright
© All Rights Reserved

DESCRIPTIVE STATISTICS

Agni Laili Perdani, MS


Department of Pediatric Nursing - STIKEP PPNI JAWA BARAT
Outline
•What is statistics?
•Statistics is the study of the collection, analysis,
interpretation, presentation, and organization of
data.
•Why is statistics needed?
•An investigative cycle in problem solving

10/16/2019 2
What is statistics?
• Collection of data
(1) Explanatory, predictors, covariates, independent variables;
(2) response, outcome, dependent variables; (3) intermediate variables.
• Management of data
Coding, editing, organization and storage.
• Analysis of data
Statistical modelling and applying statistical techniques.
• Interpretation of data
Statistical significance and/or clinical significance.
• Communication of data
Using numbers, tables and graphs to explain the data with reference to
context knowledge.
The purposes of data collection and analyses
• Explanation
To describe and explain the relationship between the explanatory
variables and the outcome. The relationship may be causal or non-causal.
• Prediction
To predict the outcome from the predictor variables using a
prediction model.
• Control
To manipulate input variables (such as treatment) and observe the
output variables (such as response).

Objectives
• To learn how to utilize statistical computing packages to analyze data
• The rationale is that statistical computing packages are now readily
available, and many commonly used statistical methods have become more
accessible and modularized.
• To understand the core statistical concepts behind every data analysis
• Namely, statistical modelling, sampling distributions of statistics, and
statistical inference (estimation and hypothesis testing).

Characteristics of a defined population
Who
Male/female, white/black, children/adolescent/adult,…
When
Calendar year, year of birth, …
Where
Country, city, school, hospital, …
What
Socioeconomic position, educational level,…
Disorders and diseases of clinical populations
Data type and level of measurement
• Data type
• Categorical data, such as gender, blood type, attitude, and opinion.
• Metric data, such as number of children, number of accidents, weight, height,
temperature, and IQ.
• Level of measurement
• 1. Categorical
• Nominal: naming or labelling, such as sex (categorical data)
• Ordinal: naming plus ordering, but the distance between categories cannot be
measured, such as severity, Likert scale, semantic differential scale, and rating scale
• 2. Continuous data
• Interval: a scale that has a unit but no absolute zero, such as temperature
and IQ. For example, you know that 50 degrees and 25 degrees differ by 25
degrees, but you cannot say that 50 degrees is twice as much as 25 degrees.
• Ratio: a scale that has a unit and an absolute zero, such as weight and height.
Level of Measurement (Categorical)
1. Nominal Scales: naming or labelling, such as sex, ethnicity,
religion, marital status, region, etc. The categories have no
meaningful order. (Example: Gender, coded Male = 0, Female = 1)
2. Ordinal Scales: naming plus ordering; the numerical order is
meaningful, but the distance between values cannot be measured
(Examples: disease severity, Likert scale, semantic differential
scale, rating scale; health status coded 1 = Excellent,
2 = Good, 3 = Fair, 4 = Poor, 5 = Very Poor)
Level of Measurement (Continuous)
1. Interval Scales: meaningful numerical order and meaningful
intervals between values, but the scale does not have an
absolute zero, such as temperature, IQ, the SAT exam, the GRE, etc.
You know that 50 degrees and 25 degrees differ by 25 degrees, but
you cannot say that 50 degrees is twice as much as 25 degrees.
2. Ratio Scales: meaningful numerical order, meaningful intervals
between values, and an absolute zero that is not arbitrary but
determined by nature (Examples: weight (measured from 0 kg, not
from -5 kg), height, blood pressure, pulse rate, age, income,
number of children)
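One way to see the interval/ratio distinction is to change units and check whether a ratio survives. The Python sketch below is illustrative only (the helper names and the 2.20462 lb/kg factor are assumptions, not from the slides): the Celsius "ratio" 50/25 = 2 does not carry over to Fahrenheit, while the kilogram ratio 50/25 = 2 is unchanged in pounds, because only the ratio scale has an absolute zero.

```python
# Hypothetical helper names, chosen for illustration only.
def celsius_to_fahrenheit(c: float) -> float:
    # Interval scale: the conversion shifts the zero point (+32), so ratios are not preserved.
    return c * 9 / 5 + 32


KG_TO_LB = 2.20462  # approximate pounds per kilogram


def kg_to_lb(kg: float) -> float:
    # Ratio scale: the conversion only rescales (0 kg = 0 lb), so ratios are preserved.
    return kg * KG_TO_LB


# Differences: 50 and 25 degrees Celsius are 25 C apart (45 F apart).
diff_c = 50 - 25
diff_f = celsius_to_fahrenheit(50) - celsius_to_fahrenheit(25)

# Ratios: "twice as much" only survives the unit change on the ratio scale.
ratio_celsius = 50 / 25                                                   # 2.0
ratio_fahrenheit = celsius_to_fahrenheit(50) / celsius_to_fahrenheit(25)  # 122 / 77, about 1.58
ratio_kg = 50 / 25                                                        # 2.0
ratio_lb = kg_to_lb(50) / kg_to_lb(25)                                    # 2.0

print(f"Differences: {diff_c} C = {diff_f:g} F")
print(f"Celsius ratio {ratio_celsius:.2f} vs Fahrenheit ratio {ratio_fahrenheit:.2f}")
print(f"kg ratio {ratio_kg:.2f} vs lb ratio {ratio_lb:.2f}")
```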
General Description
Title:
Predictors of Postpartum Depression among Rural Women in Minia, Egypt: An
Epidemiological Study.

Purpose:
To determine the prevalence of postpartum depression (PPD) in a certain
rural area in Upper Egypt, and to determine the risk factors of PPD.

Key Variable
• Dependent Variable
• Postpartum depression (PPD) is a form of clinical depression that can affect women
and, less frequently, men after childbirth. In this research, a woman with postpartum
depression is a married woman experiencing one of several mood disorders that follow
childbirth. Postpartum depression is assessed with the EPDS (Edinburgh Postnatal
Depression Scale), whose items are scored on a four-point scale (0-3).

• The scale range is not described in this research.
• The research does not state how the cut-off point for the postpartum depression
category was determined.
• It would be better to describe the scale range, for example 0 for disagree and 3 for
completely agree.
Key Variable
Independent Variable
• Demographic data: age, woman's and husband's education and occupation, total household income.
• Data related to delivery: type of delivery, assistance of delivery, personnel attending the delivery,
place of delivery, complications after delivery, method of contraception.
• Data related to child: rank of birth, age, sex, weight at birth, breastfeeding, sleeping habits.
• Data related to pregnancy: parity, pregnancy weight gain, previous diagnosis of depression, financial
problems after delivery, complications after delivery, support of family and friends after delivery,
support of husband after delivery, victim of domestic violence.
• Previous history of depression

Methods
• Cross-sectional study with a community-based approach.
• The study was conducted in El-Burgaia village, 5 km north of El-Minia
city, over a period of three months, between December 1st 2011 and
February 29th 2012.
• The sample was selected using a systematic random sampling technique
from women who had given birth within 14 months; the sample size was
200 women.
• Descriptive analysis was used to describe demographic data, data
related to delivery, data related to the child, data related to pregnancy,
and previous history of depression, presented as mean and
standard deviation.
Data Type

Categorical Data

Level of Measurement

Pick one!

Ordinal

Nominal

Graphical Displays of Data
Strengths
• The pie chart is constructed with no more than six sectors.
• It uses percentages rather than absolute frequencies.
Drawbacks
• It uses 3D key-shading. It would be better to use 2D shading patterns,
so that the shading does not detract from the meaning of the pie
chart itself (Wallgren et al., 1996).

Title and source:
Nurses’ Knowledge about Palliative Care.
Journal of Hospice & Palliative Nursing, 16(1), 23-30.

Purpose:
The aim of the study is to evaluate palliative care
knowledge among nurses in Jordan.

Key variable
Dependent variable
• Palliative care knowledge is a cognitive understanding
of palliative care, including the philosophy and
principles of palliative care; management of pain and other
symptoms; and psychosocial and spiritual care for
individuals and families. It is measured by the Palliative Care
Quiz for Nursing (PCQN) (Ross, McDonald, & McGuinness, 1996).

Key variable
Independent variables
• Demographic characteristics: gender, educational level
• Professional characteristics: working unit, years of clinical experience,
palliative care education

Research methods
A quantitative descriptive cross-sectional survey design with
convenience sampling.
• Sample: 190 nurses from 5 government hospitals.
• Instrument: the PCQN, 20 items; a correct answer scores 1 point, and a
wrong or "don't know" answer scores 0 points.
• Validity and reliability of the PCQN: internal consistency,
Kuder-Richardson (KR-20) = 0.78.
• A higher score indicates a better level of knowledge (Ross,
McDonald, & McGuinness, 1996).
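A minimal sketch of how a PCQN-style quiz could be scored (1 point per correct item, 0 for wrong or don't know) and how a KR-20 coefficient is computed from 0/1 item scores. The data are a hypothetical 4-respondent, 3-item toy set, not the study's data, and the function names are illustrative. The sketch assumes the common KR-20 formula, KR-20 = k/(k-1) × (1 - Σ p_j q_j / σ²), with the population variance (divisor n) of the total scores; some texts use the n - 1 divisor instead.

```python
from typing import Sequence


def total_scores(responses: Sequence[Sequence[int]]) -> list[int]:
    """Total score per respondent: 1 point per correct item, 0 for wrong or don't know."""
    return [sum(row) for row in responses]


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """Kuder-Richardson 20 for dichotomous (0/1) items.

    KR-20 = k/(k-1) * (1 - sum(p_j * q_j) / var_total), where p_j is the proportion
    answering item j correctly, q_j = 1 - p_j, and var_total is the variance of the
    total scores (population form, divisor n, assumed here).
    """
    n = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = total_scores(responses)
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion answering item j correctly
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_total)


# Hypothetical toy data: 4 respondents x 3 items (the real PCQN has 20 items).
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(total_scores(toy))  # [3, 2, 1, 0]
print(kr20(toy))          # 0.75
```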
Data type

Categorical data

Level of measurement

Nominal data / Ordinal data

Any comment?


Uniqueness

Gender
Some studies reported that few men currently decide to become
nurses because nursing is perceived as a feminine profession
(Al-Zein & Al-Khawaldeh, 2015; Ashkenazi, Livshiz-Riven, Romem,
& Grinstein-Cohen, 2016).

In Jordan, of the 18,874 nurses and midwives, 53% are male
(Jordan Nurses & Midwives Council, 2009).

Suggestion
• Regarding the original article on the PCQN, the authors explained that a
higher score indicates a better level of knowledge (Ross, McDonald, &
McGuinness, 1996).
• The author could better visualize the number of samples in each hospital
as a bar chart or pie chart with different colours, which is clear and
attractive.
• The author could better show the total mean score of the PCQN in a table
to determine palliative care knowledge among nurses.
• Charts are easier to read, more understandable, and more memorable
(Plichta & Kelvin, 2013).
General Description
1. Title
Cross-sectional study of patients with type 2
diabetes in OR Tambo district, South Africa

2. Purpose
To examine the sociodemographic and clinical
determinants of uncontrolled type 2 diabetes
mellitus (T2DM) in individuals attending primary
healthcare in OR Tambo district, South Africa
Key Variables
Independent Variables
• Sociodemographic characteristics: gender, type of residence, level of education
• Disease characteristic: type 2 diabetes mellitus duration
• Lifestyle habits: smoking, excessive alcohol consumption and physical activity
• Body Mass Index, blood pressure, lipid profile, creatinine level
Dependent Variable
• Glycosylated haemoglobin (HbA1c) level
Research Method
• This is a cross-sectional study.
• Fifteen community health centres (primary care centres) in OR Tambo
district, South Africa, to Mthatha General Hospital.
• A total of 360 participants meeting the inclusion criteria: age ≥30 years
at diagnosis of DM and on treatment for at least 1 year.
Research Method
• Data were recorded by personal interview and abstraction, including
sociodemographic characteristics, lifestyle habits, smoking history, and
participants' consumption of alcohol.
• Duration of diabetes and current medications were abstracted from
medical records.
• Body weight and height and blood pressure were measured, and a blood
sample was obtained for the lipid profile (total cholesterol, HDL-C, LDL-C
and triglycerides), creatinine level, and glycosylated haemoglobin
(HbA1c) level.
Data Type

Categorical data

Metric continuous data
Level of Measurement

Nominal Scale

Ordinal Scale

Ratio Scale → Ordinal Scale

• The calculated percentages add up to more than 100%.
• Inappropriate symbol.
Data Type

Metric Continuous Data
Level of Measurement

Ratio Scale
The mean is most appropriately used to describe ratio data (Plichta, Kelvin, &
Munro, 2013, p. 40).

How many participants were considered to have uncontrolled T2DM and controlled T2DM?


Practice! What is the level of measurement?
1. Gender
2. Temperature in Celsius
3. Weight in pounds
4. Weight in kilograms
5. Age in years
6. Age in categories (0-6 months, 7-12 months, 13+ months)
7. Blood type
8. Ethnic identity
Practice! What is the level of measurement?
1. Gender
2. Temperature in Celsius: interval
3. Weight in pounds
4. Weight in kilograms
5. Age in years
6. Age in categories (0-6 months, 7-12 months, 13+ months): ordinal
7. Blood type
8. Ethnic identity
Practice! What is the level of measurement?
1. Number of years spent in childhood
2. Highest educational degree obtained
3. Satisfaction with nursing care received (scale 0-10): ordinal
4. Religion
5. IQ score: interval
6. Smoking status
7. Birth order: ordinal
8. Marital status
9. Number of children
10. Score on satisfaction scale (Likert scale): ordinal
11. Annual income
Organizing Metric Data

Organizing numerical data
Numerical data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21
• Ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
• Stem-and-leaf display:
  2 | 144677
  3 | 028
  4 | 1
• Frequency distributions and cumulative distributions (tables)
• Histograms, polygons and ogive (graphs)
Ordered array: what and why
• The ordered array is a data series sorted from the smallest value to
the largest value.
• The ordered array shows the range (min to max).
• The array can give some signals about variability within the range
(from uniform to peaking within the range).
• The array may help identify outliers (unusual observations).
The stem-and-leaf plot: what and why
• Data in raw form (as collected):
24, 26, 24, 21, 27, 27, 30, 41, 32, 38
• Data in ordered array from smallest to largest:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
• Stem-and-leaf display (what is it?): separate the sorted data
series into leading digits (stems) and trailing digits (leaves):
  2 | 144677
  3 | 028
  4 | 1
The stem-and-leaf plot: how
• Choose the leading (10’s) digits as the ‘stem’ units.
• The remaining trailing digits are the leaves.
• Complete the stem-and-leaf plot

Stem | Leaves
2 | 144677
3 | 028
4 | 1
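The stem-and-leaf steps above can be sketched in Python as follows. The sketch assumes non-negative two-digit integers, so the tens digit is the stem and the units digit is the leaf; the function name is illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values: list[int]) -> list[str]:
    """Build stem-and-leaf lines: tens digit = stem, units digit = leaf."""
    groups = defaultdict(list)
    for v in sorted(values):           # sorting first gives the ordered array
        stem, leaf = divmod(v, 10)     # e.g. 27 -> stem 2, leaf 7
        groups[stem].append(leaf)
    lines = []
    for stem in range(min(groups), max(groups) + 1):   # keep empty stems visible
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines


raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
for line in stem_and_leaf(raw):
    print(line)
# 2 | 144677
# 3 | 028
# 4 | 1
```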
The frequency distribution: what and why
• What is a frequency distribution?
• A frequency distribution is a list or a table
containing the values of a variable and the corresponding
frequencies with which each value occurs.
• Why use the frequency distribution?
• It is a way to summarize data.
• The distribution condenses the raw data into a more useful
form and allows a quick visual interpretation of data.

The frequency distribution: how
• Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58.

• Find range: 58 - 12 = 46.


• Select number of classes: 5 (usually between 5 and 15).
• Compute class interval (width): 10 (46/5 then round up).
• Determine class boundaries (limits): 10, 20, 30, 40, 50, 60.
• Compute class midpoints: 15, 25, 35, 45, 55.

• Count observations & assign to classes.

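The steps above can be reproduced with a short Python sketch. The rule for the starting point (the largest multiple of the class width that is not above the minimum) is an assumption that happens to give the 10, 20, ..., 60 boundaries used here; other starting points are possible and can change the counts.

```python
import math

# Data from the slide (already in ordered array form).
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

n_classes = 5                                   # usually between 5 and 15
data_range = max(data) - min(data)              # 58 - 12 = 46
width = math.ceil(data_range / n_classes)       # 46 / 5 = 9.2, rounded up to 10

# Assumption: start the first class at the largest multiple of the width not above the minimum.
start = (min(data) // width) * width            # 10
boundaries = [start + i * width for i in range(n_classes + 1)]      # 10, 20, ..., 60
assert boundaries[-1] > max(data), "last class must cover the maximum; add a class"

classes = list(zip(boundaries, boundaries[1:]))
midpoints = [(lo + hi) / 2 for lo, hi in classes]                   # 15, 25, ..., 55

# Count observations in each half-open class [lo, hi), i.e. "lo but under hi".
counts = [sum(lo <= x < hi for x in data) for lo, hi in classes]

for (lo, hi), mid, freq in zip(classes, midpoints, counts):
    print(f"{lo} but under {hi}: midpoint {mid:g}, frequency {freq}")
```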
Frequency distributions, relative frequency distributions and
percentage distributions
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class | Frequency | Relative frequency | Percentage (%)
10 but under 20 | 3 | .15 | 15
20 but under 30 | 6 | .30 | 30
30 but under 40 | 5 | .25 | 25
40 but under 50 | 4 | .20 | 20
50 but under 60 | 2 | .10 | 10
Total | 20 | 1 | 100
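The relative frequency is each class frequency divided by n, and the percentage is that proportion times 100. A short illustrative Python sketch using the class frequencies from the table above:

```python
# Class frequencies from the frequency table.
frequencies = {
    "10 but under 20": 3,
    "20 but under 30": 6,
    "30 but under 40": 5,
    "40 but under 50": 4,
    "50 but under 60": 2,
}

n = sum(frequencies.values())                                      # 20
relative = {cls: f / n for cls, f in frequencies.items()}          # proportions, sum to 1
percentage = {cls: 100 * f / n for cls, f in frequencies.items()}  # percentages, sum to 100

for cls in frequencies:
    print(f"{cls}: {frequencies[cls]}  {relative[cls]:.2f}  {percentage[cls]:.0f}%")
```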
How many class intervals?
• There is more to be said about the widths of the class intervals,
sometimes called bin widths. Your choice of bin width determines the
number of class intervals. This decision, along with the choice of
starting point for the first interval, affects the shape of the histogram.

How many class intervals? Cont’d
• The best advice is to experiment with different choices of width, and
to choose the histogram that best communicates the shape of the
distribution.
How many class intervals? Cont’d
• Sturges' rule is to set the number of intervals as close as possible to
1 + log₂(N), where log₂(N) is the base-2 logarithm of the number of
observations. The formula can also be written as 1 + 3.3 log₁₀(N),
where log₁₀(N) is the base-10 logarithm of the number of observations.
According to Sturges' rule, 1000 observations would be graphed with
11 class intervals, since 10 is the closest integer to log₂(1000). We
prefer the Rice rule, which is to set the number of intervals to twice
the cube root of the number of observations.
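Both rules are easy to compute. The sketch below is illustrative: it rounds to the nearest integer, which reproduces the 11 intervals quoted for 1000 observations; some sources round up instead, so treat the rounding step as an assumption.

```python
import math


def sturges(n: int) -> int:
    """Sturges' rule: 1 + log2(n) intervals, rounded to the nearest integer."""
    return round(1 + math.log2(n))


def rice(n: int) -> int:
    """Rice rule: twice the cube root of n, rounded to the nearest integer."""
    return round(2 * n ** (1 / 3))


for n in (20, 100, 1000):
    print(f"n = {n}: Sturges {sturges(n)}, Rice {rice(n)}")
```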
General guideline
Number of data points | Number of classes
Under 50 | 5-7
50-100 | 6-10
100-250 | 7-12
Over 500 | 10-20

The histogram: what and why
• What is it?
• A histogram is a bar graph of raw data that creates a
picture of the data distribution.
• Why use the histogram?
• To visualize the central location, spread, and shape of the data.

The histogram: what
• The classes or intervals are shown on the horizontal axis.
• Frequency (or relative frequency) is measured on the vertical axis.
• Bars of appropriate heights can be used to represent the number of
observations within each class.
• Such a graph is called a histogram.

The histogram: create the graph
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: Histogram of the data above. Vertical axis: Frequency; horizontal axis: class midpoints 5, 15, 25, 35, 45, 55 and More. Bar heights 3, 6, 5, 4, 2 for the classes centred at 15 to 55; no gaps between bars.]
The frequency polygon: what
• It is a line graph that shows the distribution of numerical data.
• The horizontal axis is the variable.
• The vertical axis is the frequency.
• The line connects points of (midpoints of the class intervals, frequency).
• Tie the line down to zero frequency at the midpoints of the empty classes at each end.

The frequency polygon: how
• Determine the frequency distribution.
• Complete the frequency polygon.

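A frequency polygon is the list of (class midpoint, frequency) points plus one zero-frequency point at each end. A minimal Python sketch for the example data, assuming class width 10 and boundaries 10 to 60 as in the frequency table:

```python
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

width = 10
boundaries = list(range(10, 70, width))                    # 10, 20, ..., 60
classes = list(zip(boundaries, boundaries[1:]))
counts = [sum(lo <= x < hi for x in data) for lo, hi in classes]
midpoints = [(lo + hi) / 2 for lo, hi in classes]          # 15, 25, ..., 55

# Tie down: one empty class on each side, so the line starts and ends at zero frequency.
points = ([(midpoints[0] - width, 0)]
          + list(zip(midpoints, counts))
          + [(midpoints[-1] + width, 0)])

for x, y in points:
    print(f"midpoint {x:g}: frequency {y}")
```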
The frequency polygon: create the graph
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Frequenc y

3
Tie down to the
midpoints of
2
the classes with
1
zero frequency
0

5 15 25 36 45 55 M ore

2019/10/16 Class Midpoints 56


Cumulative frequency distribution: what and why
• It is a distribution that shows the cumulative frequency up to a given
data value.
• The cumulative frequency gives an idea of the cumulative proportion of
observations up to a given data value, such as the proportion of
observations below 30.

Cumulative frequency: table
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class | Cumulative frequency | Cumulative %
10 but under 20 | 3 | 15
20 but under 30 | 9 | 45
30 but under 40 | 14 | 70
40 but under 50 | 18 | 90
50 but under 60 | 20 | 100

Cumulative frequency: create the ogive
• The ogive is a line graph, where we plot the values of a variable on
the horizontal axis and the cumulative frequency on the vertical axis.
If we plot the cumulative relative frequency on the vertical axis, then
the line graph is called the relative frequency ogive.

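The cumulative frequencies, cumulative percentages and ogive points can be computed from the class frequencies with a running sum. A minimal Python sketch using the frequencies 3, 6, 5, 4, 2 from the example; the ogive points are placed at the upper class boundaries, starting from 0% at the lowest boundary.

```python
from itertools import accumulate

boundaries = [10, 20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]          # "10 but under 20", ..., "50 but under 60"

n = sum(frequencies)                              # 20
cum_freq = list(accumulate(frequencies))          # 3, 9, 14, 18, 20
cum_pct = [100 * c / n for c in cum_freq]         # 15, 45, 70, 90, 100

# Ogive points: (upper class boundary, cumulative %), plus 0% at the lowest boundary.
ogive = [(boundaries[0], 0.0)] + list(zip(boundaries[1:], cum_pct))

for (lo, hi), cf, cp in zip(zip(boundaries, boundaries[1:]), cum_freq, cum_pct):
    print(f"{lo} but under {hi}: cumulative frequency {cf}, cumulative {cp:g}%")
print(ogive)
```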
The Ogive (Cumulative % Polygon)
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

[Figure: Ogive. Vertical axis: cumulative percentage (0 to 100); horizontal axis: class boundaries 10, 20, 30, 40, 50, 60 (not midpoints).]

Central tendency
Mode, Mean, and Median

Central tendency
• What: It is a typical value that represents the distribution of data.
• Measures: Commonly used measures of central tendency are mode,
arithmetic mean and median.
• Mode is simply the commonest occurrence in the data.
• Arithmetic mean is simply the sum of the numbers divided by the number of
observations (n).
• Median is simply the middle of the dataset, defined as the point below which
half the data points lie, and above which half the data lie.

The mode
• This is simply the most common value in the data. Many real
datasets of continuous measurements have no mode, because all values are different.
• For this reason, the mode is often the least useful measure for describing
such data.
• It is an appropriate measure of central tendency for variables at all levels of
measurement: nominal, ordinal, interval and ratio.

What is the arithmetic mean?
• The arithmetic mean is the most common
measure of central tendency. It is simply the
sum of the numbers divided by the number of
observations.
• The ‘Mean’ is the name given by statisticians to
what everyone else calls the ‘average’.
• Easy to calculate: add up the numbers and
divide by n

Median
• This is the middle of the dataset, defined as the point below which
half the data points lie, and above which half the data lie.
• Arrange the data in increasing order:
• If the number of observations is odd, the median is the
observation exactly in the middle of the ordered list. The
position of the median is (n+1)/2.
• If the number of observations is even, the median is the mean
(or average) of the two middle values. The positions of these two
values are n/2 and n/2 + 1.

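The positional rule can be written directly in Python. This is an illustrative sketch (the function name is hypothetical); Python's own `statistics.median` applies the same rule of averaging the two middle values when n is even.

```python
def median(values):
    """Median by the positional rule on the ordered data (1-based positions in comments)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]       # position (n + 1)/2
    lower = ordered[n // 2 - 1]                # position n/2
    upper = ordered[n // 2]                    # position n/2 + 1
    return (lower + upper) / 2


print(median([3, 1, 2]))         # 2
print(median([4, 1, 3, 2]))      # 2.5
```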
Median, cont’d
• The median is an under-rated tool, often preferable to the more widely used
mean, because it gives a sensible answer whatever the shape of the data distribution.
• It is a special case of a more general descriptive technique known as centiles.
• The median is the 50th centile of a dataset, meaning that 50% of the data points
lie below it.

Mode, mean and median
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

• Mode: 24 and 27 (each occurs twice).
• Mean = (12 + 13 + … + 53 + 58)/20 = 648/20 = 32.4.
• Median = (30 + 32)/2 = 31.

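Python's standard `statistics` module can check these values. `statistics.multimode` (Python 3.8+) returns every most-frequent value, which matters here because the data have two modes.

```python
import statistics

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

print(statistics.multimode(data))  # [24, 27]  (both occur twice)
print(statistics.mean(data))       # 32.4
print(statistics.median(data))     # 31.0
```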
Mean Versus Median

Mean versus Median
• The mean gives equal weight to every data value when averaging, whereas the
median puts a weight of one on the middle value (or one-half on each of the two
middle values when n is even) and zero on all other data values.
• The mean is influenced by extreme values, whether very small or very large,
whereas the median is resistant to extreme values.
• The mean and median are equal when the distribution is symmetric.
• The mean and median are unequal when the distribution is asymmetric.
The mean is larger than the median when the distribution has a long tail to the
right, and smaller than the median when the distribution has a long tail to
the left.
• When the distribution is asymmetric, the mean may not be an
appropriate representation of central tendency.

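A small illustration of resistance to extreme values, using a hypothetical outlier: replacing the largest observation (58) with 580, as might happen with a data-entry error, shifts the mean substantially but leaves the median unchanged.

```python
import statistics

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

# Hypothetical extreme value: replace the maximum (58) with 580.
with_outlier = data[:-1] + [580]

print(statistics.mean(data), statistics.median(data))                  # 32.4 31.0
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 58.5 31.0
```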
[Figure: Two distributions plotted as number of observations against size of value. A symmetrical distribution: mean and median about the same. An asymmetrical distribution: the mean lies away from the median; note that the mean is misleading here.]
Organizing Categorical Data

Tabulating and graphing categorical data
Categorical Data
• Tabulating data: the summary table
• Graphing data: pie charts, bar charts, Pareto diagram

Dispersion

Range, Standard Deviation, Variance, Coefficient of Variation

The spread, scatter and variation

Mean absolute difference, variance, standard deviation, and coefficient of variation

Spread
• What: it is the variation of the data.
• Measures: commonly used measures of spread are the standard
deviation (SD), variance, range, interquartile range (IQR), and coefficient
of variation (CV).
• Variance is simply the sum of squared deviations from the mean divided
by n or n - 1.
• Standard deviation is simply the square root of the variance.
• Range is simply the maximum data value minus the minimum data value.
• Interquartile range is simply the third quartile (75th percentile) minus the
first quartile (25th percentile).
• Coefficient of variation is simply the standard deviation divided by the mean,
multiplied by 100%.

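Several of these measures can be computed for the example data with Python's `statistics` module. This sketch uses the sample (n - 1) versions of the variance and SD, and the CV uses that same SD; the variable names are illustrative.

```python
import statistics

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

data_range = max(data) - min(data)           # 58 - 12 = 46
variance = statistics.variance(data)         # sample variance: SS / (n - 1)
sd = statistics.stdev(data)                  # square root of the sample variance
cv_percent = sd / statistics.mean(data) * 100

print(f"range {data_range}, variance {variance:.2f}, SD {sd:.2f}, CV {cv_percent:.1f}%")
```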
Two data sets.
[Figure: Distribution A and Distribution B, each plotted as number of observations against size of value.]
In which one are you more likely to guess the next value correctly? B

Variance
• Having got the sum of squares (SS), the sum of squared deviations from the mean:
• Variance is the mean value of SS (what):
  Variance = SS/n
• An alternative formula is also used (how):
  Variance = SS/(n-1)
  This estimates the variance of the whole population, while SS/n gives
  the variance of just the sample taken.
• Geographers tend to prefer Variance = SS/n.
• Biologists tend to prefer Variance = SS/(n-1).

Standard deviation
• It is the square root of the variance (what).
• Because there are 2 ways to calculate the variance, there are 2 standard deviations:
• SD = √(SS/n). This is labelled σ on many calculators,
or
• SD = √(SS/(n-1)). This is labelled s on many calculators.

SD
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

• Variance
= [(12 - 32.4)² + (13 - 32.4)² + … + (53 - 32.4)² + (58 - 32.4)²]/(20 - 1)
= 3050.8/19
= 160.57.
• Standard deviation
= √160.57
= 12.67.

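The same calculation can be done step by step in Python, showing both divisors from the variance slide. This is an illustrative sketch: the SS/(n - 1) line reproduces the 160.57 and 12.67 above, while SS/n gives a slightly smaller variance and SD.

```python
import math

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

n = len(data)
mean = sum(data) / n                                   # 32.4
ss = sum((x - mean) ** 2 for x in data)                # sum of squares, about 3050.8
var_n = ss / n                                         # SS/n      (SD labelled sigma)
var_n1 = ss / (n - 1)                                  # SS/(n-1)  (SD labelled s)
sd_n, sd_n1 = math.sqrt(var_n), math.sqrt(var_n1)

print(f"SS = {ss:.1f}")
print(f"SS/n:     variance {var_n:.2f}, SD {sd_n:.2f}")
print(f"SS/(n-1): variance {var_n1:.2f}, SD {sd_n1:.2f}")
```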
What is the coefficient of variation (C.V.)?
• It is a measure of variation that does not depend on the units of
measurement and can be used to compare the variation of different
measures.
• It is a standardized measure of the spread of the distribution.
• Definition: C.V. = (standard deviation / mean) × 100%.

• The prices of 5 used cars are Rp 4 million, Rp 4.5 million, Rp 5 million,
Rp 4.75 million and Rp 4.25 million, and the prices of 5 chickens are
Rp 600, Rp 800, Rp 900, Rp 550 and Rp 1,000. • Calculate the
standard deviation (SD) of the car prices and the standard deviation (SD)
of the chicken prices. Which prices vary more (are more heterogeneous),
the car prices or the chicken prices?

1. Find the mean of the car prices and the mean of the chicken prices
2. Find the SD of the car prices and the SD of the chicken prices
• Mean (cars) = 1/5 (Rp 4,000,000 + 4,500,000 + … + 4,250,000) =
Rp 4,500,000
• Mean (chickens) = 1/5 (Rp 600 + 800 + … + 1,000) = Rp 770
Finding the SD of the cars and the chickens (divisor n)
• SD (cars) = Rp 353,550
• SD (chickens) = Rp 172.05
Finding the CV of the cars and the chickens
• CV (cars) = 353,550 / 4,500,000 × 100% = 7.86%
• CV (chickens) = 172.05 / 770 × 100% = 22.34%
• Conclusion: because CV (chickens) > CV (cars), chicken prices are more
variable (heterogeneous) than car prices.
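The worked example can be checked in Python. Dividing by n (the population SD, `statistics.pstdev`) reproduces the SDs quoted in the example; the variable and function names are illustrative.

```python
import statistics

car_prices = [4_000_000, 4_500_000, 5_000_000, 4_750_000, 4_250_000]  # Rp
chicken_prices = [600, 800, 900, 550, 1_000]                           # Rp


def cv_percent(values):
    """Coefficient of variation in %, using the population SD (divisor n)."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


cv_car = cv_percent(car_prices)
cv_chicken = cv_percent(chicken_prices)
print(f"CV cars {cv_car:.2f}%, CV chickens {cv_chicken:.2f}%")  # 7.86%, 22.34%
print("Chicken prices vary more" if cv_chicken > cv_car else "Car prices vary more")
```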
Centiles

Centile
• What: a centile is the location of a data value in an ordered array.
• Measures: commonly used centiles are quartiles and deciles.
• Quartiles are the 25th, 50th (i.e., the median) and 75th percentiles.
• Deciles are the 10th, 20th, …, and 90th percentiles.

What is the percentile?

[Figure: The (P × 100%)th percentile shown as the value X = x, on the density f(x) and on the cumulative distribution function F(x) (c.d.f.).]

What
• Percentiles
• Percentiles divide the data into 100 equal parts: 1/100, 2/100, …,
100/100.
• Quartiles and median
• Quartiles divide the data into 4 equal parts.
• The 1st quartile divides the bottom 25% from the top 75%.
• The 2nd quartile, which is also the median, divides the bottom 50% from the top 50%.
• The 3rd quartile divides the bottom 75% from the top 25%.

What are the deciles?
• Example: Income
[Figure: Income distribution with the 10th and 90th percentiles marked; an area of 0.1 lies below the 10th percentile and 0.1 above the 90th percentile.]

What are the quartiles?
• Example: Body Mass Index (BMI)
[Figure: BMI distribution divided by the 25th, 50th (median) and 75th percentiles into four areas of 0.25 each.]

How to locate the 1st quartile (25th percentile) and 3rd quartile (75th percentile)
• 1st quartile: find the median of the lower half of the ordered list.
• 3rd quartile: find the median of the upper half of the ordered list.

25th, 50th and 75th percentiles
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

• 25th percentile = 1st quartile (Q1) = (24 + 24)/2 = 24.
• 50th percentile = 2nd quartile (Q2) = median = (30 + 32)/2 = 31.
• 75th percentile = 3rd quartile (Q3) = (41 + 43)/2 = 42.

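The "median of each half" rule can be sketched in Python, together with the five-number summary and the IQR. For even n the halves are the bottom and top n/2 values; for odd n this sketch excludes the middle value from both halves, which is only one of several conventions. Statistical packages often use other quartile definitions (interpolation methods), so their Q1 and Q3 can differ slightly from these hand values.

```python
import statistics

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

ordered = sorted(data)
n = len(ordered)
lower_half = ordered[: n // 2]            # bottom half (middle value excluded if n is odd)
upper_half = ordered[(n + 1) // 2:]       # top half (middle value excluded if n is odd)

q1 = statistics.median(lower_half)        # 1st quartile: median of the lower half
q2 = statistics.median(ordered)           # 2nd quartile: the median
q3 = statistics.median(upper_half)        # 3rd quartile: median of the upper half
iqr = q3 - q1

five_numbers = (ordered[0], q1, q2, q3, ordered[-1])
print(five_numbers, iqr)   # (12, 24.0, 31.0, 42.0, 58) 18.0
```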
Five-number summary
• Min., 1st quartile, median, 3rd quartile, and max.
n = 20
Median = 31
1st quartile = 24, 3rd quartile = 42
Min. = 12, Max. = 58

What is the box-and-whisker plot
• A box-and-whisker plot is a graphical display of the five-number
summary of a distribution of values, consisting of the minimum
value, the lower quartile, the median, the upper quartile and the
maximum value.

Box-and-Whisker plot
• Five-number summary: minimum, first quartile (25th percentile), median
(second quartile; 50th percentile), third quartile (75th percentile), and
maximum.
[Figure: Box-and-whisker plot on a scale of 0 to 100, marking the lowest value, the 25th centile, the median, the 75th centile and the highest value.]

Interquartile range (IQR)
• IQR = 3rd quartile - 1st quartile. This is the range of the middle
50% of the data.
• IQR = 42 - 24 = 18.

Why use the box-and-whisker plot
• It can be used to visualize the quartiles, min, max, and outliers.
• It can be used to compute the IQR.
• It can be used to visualize whether the distribution is symmetric or
asymmetric.
• It can be used to visualize the differences in distribution between
groups.
• It can be used to identify outliers.

Schematic box-and-whisker plot

Thank you for your attention
