Descriptive Statistics
Variables: definition and operationalization
• Variable: Concept (a characteristic of an object or a
participant) which can vary between at least two
values.
• Conceptual and operational definition
• Types:
• Methodological criterion:
• Explanatory variables: independent and dependent; predictor and
criterion
• Confounding variables
• Statistical criterion:
• Qualitative and (continuous and discrete) quantitative variables
• Measurement types and measurement scales
• Ratio scale: equality and order of ratios are meaningful (X1 / X2 = X3 / X4; X1 / X2 < X3 / X4); existence of a real zero; admissible transformation (b·X) where b > 0; example: reaction time
Participants: sampling techniques
[Figure: individual, sample (statistic) and population (parameter)]
Characteristics of the sample:
• It is part of the universe
• Size proportional to that of the universe
• No distortion in the selection
• Representative of the universe
Sampling error: $EM = \theta - \hat{\theta}$
Characteristics of the sampling units:
• Defined so that their identification is unequivocal
• No overlapping among them
• Each unit is assigned a probability of being selected
• All units should belong to the population object of the study.
Participants: sampling techniques
Probabilistic:
Simple random sampling
Systematic random sampling
Stratified random sampling
Cluster sampling
Nonprobabilistic:
Accidental sampling
Intentional sampling
Empirical:
By quotas
DESCRIPTIVE STATISTICS
CATEGORICAL VARIABLES
What are qualitative data?
Country          ni    fi     Pi
United Kingdom   15    .375   37.5
Russia           14    .350   35.0
Latvia           11    .275   27.5
TOTAL            40    1.00   100.0
“Concerns” variable
ni : absolute frequency
Ni: accumulated frequency
fi = ni/N : relative frequency
Fi: accumulated relative frequency
pi = (ni/N): individual proportion
Pi = (ni/N) x 100 : individual percentage
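The frequency definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` is hypothetical, and the data are rebuilt from the counts in the country table (15, 14 and 11 cases). Accumulated frequencies depend on the order of the categories, so for a nominal variable they are mostly useful for Pareto-style displays.

```python
from collections import Counter

def frequency_table(values):
    """Return rows (category, n_i, N_i, f_i, F_i, P_i) in order of first appearance."""
    counts = Counter(values)          # absolute frequencies n_i
    total = sum(counts.values())      # N
    rows, accumulated = [], 0
    for category, n_i in counts.items():
        accumulated += n_i            # accumulated frequency N_i
        rows.append((category, n_i, accumulated,
                     n_i / total,             # relative frequency f_i
                     accumulated / total,     # accumulated relative frequency F_i
                     100 * n_i / total))      # percentage P_i
    return rows

# Data rebuilt from the counts shown in the table above.
data = ["United Kingdom"] * 15 + ["Russia"] * 14 + ["Latvia"] * 11
table = frequency_table(data)
for row in table:
    print(row)
```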
Graphical representations
• Based on the frequency tables.
[Figures: pie chart of Country (Latvia, United Kingdom, Russia); Pareto diagram of Country (absolute frequencies and cumulative percentage); bar plot of Country (frequency)]
Other indices
Types of quantiles (median, deciles, quartiles, etc.):
• Median: Md
• Quartiles: Q1 Q2 Q3
• Octiles: O1 O2 O3 O4 O5 O6 O7
• Deciles: D1 D2 D3 D4 D5 D6 D7 D8 D9
• Percentiles: e.g. P25, P75
Position or location measures
INTERPOLATION TECHNIQUE
First estimate the position of the quantile in the ordered data, then
determine the Xi value at that position: this value is the
percentile, decile or quartile of interest.
Exact calculation of percentiles (Optional)
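A minimal sketch of the interpolation technique, assuming one common convention: the position of the k-th percentile is taken as (n − 1)·k/100 on the ordered data, counting from zero, and the value is interpolated linearly between the two neighbouring observations. The slides do not state which position rule is intended, so the rule, the function name `percentile` and the example scores are assumptions.

```python
import math

def percentile(values, k):
    """k-th percentile (0 <= k <= 100) by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * k / 100      # step 1: estimated (0-based) position
    lower = math.floor(pos)
    upper = math.ceil(pos)
    fraction = pos - lower
    # step 2: value at that position, interpolating between neighbours
    return xs[lower] + fraction * (xs[upper] - xs[lower])

scores = [12, 15, 11, 18, 20, 14, 16]  # hypothetical data
q1, md, q3 = (percentile(scores, k) for k in (25, 50, 75))
print(q1, md, q3)
```

Other position rules (for example ones based on n + 1) give slightly different values in small samples.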
• Mid-range: $MidR = \frac{\min + \max}{2}$
• Average of quartiles (midhinge): $\bar{Q} = \frac{Q_1 + Q_3}{2}$
• Trimean: $TM = \frac{Q_1 + 2Q_2 + Q_3}{4} = \frac{\bar{Q} + Md}{2}$
Measures of scatter: Absolute
• Centile deviations: $QD = \frac{IQR}{2} = \frac{Q_3 - Q_1}{2}$
Quartile coefficient of variation (CVQ):
$CVQ = \frac{(P_{75} - P_{25})/2}{(P_{75} + P_{25})/2} = \frac{P_{75} - P_{25}}{P_{75} + P_{25}}$
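The quartile-based indices (mid-range, midhinge, trimean, QD and CVQ) can be computed together, as in the sketch below. It assumes the quartiles are obtained with `statistics.quantiles(..., method="inclusive")`; the slides do not say which quartile rule is used, and the function name `quartile_summary` and the data are hypothetical.

```python
from statistics import quantiles

def quartile_summary(values):
    """Location and scatter indices built from the quartiles."""
    xs = sorted(values)
    q1, q2, q3 = quantiles(xs, n=4, method="inclusive")
    return {
        "midrange": (xs[0] + xs[-1]) / 2,      # (min + max) / 2
        "midhinge": (q1 + q3) / 2,             # average of quartiles
        "trimean": (q1 + 2 * q2 + q3) / 4,
        "QD": (q3 - q1) / 2,                   # half the interquartile range
        "CVQ": (q3 - q1) / (q3 + q1),          # quartile coefficient of variation
    }

# Hypothetical data with one extreme value (30).
summary = quartile_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```

In this example the extreme value moves the mid-range far from the centre of the data, while the indices based on quartiles are unaffected.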
Measures of shape
• Skewness indicators
Yule's index: $H_1 = \frac{Q_3 + Q_1 - 2 \cdot Md}{2 \cdot Md}$
Kelly's index: $H_3 = \frac{P_{10} + P_{90} - 2 \cdot Md}{2 \cdot Md}$
Interpretation:
H < 0 Negative skew
H = 0 Symmetry
H > 0 Positive skew
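A small sketch of Yule's and Kelly's indices, using the 2·Md denominator given on the slide (so the median must not be zero). The quantile rule (`method="inclusive"`), the function names and the data are assumptions made for illustration.

```python
from statistics import median, quantiles

def yule_index(values):
    """H1 = (Q3 + Q1 - 2*Md) / (2*Md), as defined on the slide."""
    q1, md, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 + q1 - 2 * md) / (2 * md)

def kelly_index(values):
    """H3 = (P10 + P90 - 2*Md) / (2*Md), as defined on the slide."""
    deciles = quantiles(values, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]
    md = median(values)
    return (p10 + p90 - 2 * md) / (2 * md)

skewed = [1, 2, 2, 3, 3, 4, 5, 6, 8, 12, 20]   # hypothetical, long right tail
print(yule_index(skewed), kelly_index(skewed))  # both positive: positive skew
```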
Measures of shape
• Kurtosis indicators
• K2 index: compares the central 80% scatter with the
central 50% scatter.
Arithmetic mean
The center of gravity or the balance between
all values.
Parameter: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Statistic: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Central tendency measures
Generalized means
a) Quadratic mean: $Q = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}$
b) Harmonic mean: $\frac{1}{H} = \frac{\sum_{i=1}^{n} \frac{1}{x_i}}{n}$
Useful for distributions that have been transformed using the inverse to make them symmetric.
c) Geometric mean: $\log G = \frac{\sum_{i=1}^{n} \log x_i}{n}$
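The three generalized means can be checked numerically with the sketch below. The function names and the data are hypothetical, and all values are assumed to be strictly positive, which the harmonic and geometric means require.

```python
import math

def quadratic_mean(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(v ** 2 for v in values) / len(values))

def harmonic_mean(values):
    """Inverse of the mean of the inverses: 1/H = sum(1/x_i) / n."""
    return len(values) / sum(1 / v for v in values)

def geometric_mean(values):
    """Antilog of the mean of the logs: log G = sum(log x_i) / n."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

x = [1, 2, 4]   # hypothetical positive data
h, g, q = harmonic_mean(x), geometric_mean(x), quadratic_mean(x)
arithmetic = sum(x) / len(x)
print(h, g, arithmetic, q)
```

For positive data that are not all equal, the results are ordered as harmonic < geometric < arithmetic < quadratic.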
Resistant central tendency measures
Mode (Mo): most frequent value. Take care with continuous variables and multimodal
distributions!
Median (Md): value that divides the distribution into two halves of equal size.
Trimean: $Trimean = \frac{Q_1 + 2 \cdot Q_2 + Q_3}{4}$
Quartiles mean: $\bar{Q} = \frac{Q_1 + Q_3}{2}$
Interquartile mean: $Mid = \frac{x_{i(P_{25}+1)} + \dots + x_{i(P_{75})}}{n_i}$
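Absolute average deviation: $DM = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
Variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ ; $s_n^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$ ; $\hat{\sigma}^2 \equiv s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard deviation:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$ ; $s_n = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$
Coefficient of variation:
$CV = \frac{s}{\bar{X}}$ ; $CV = \frac{s}{\bar{X}} \cdot 100$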
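The scatter measures above map directly onto Python's `statistics` module: `pvariance` and `pstdev` divide by n, while `variance` and `stdev` divide by n − 1. The sketch below uses hypothetical data and computes the coefficient of variation with the n − 1 standard deviation, which is an assumption about which s the slide intends.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

x = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
x_bar = mean(x)

dm = sum(abs(v - x_bar) for v in x) / len(x)   # absolute average deviation
s_n2 = pvariance(x)     # divides by n
s2 = variance(x)        # divides by n - 1 (estimator of sigma squared)
s_n = pstdev(x)         # square root of s_n2
s = stdev(x)            # square root of s2
cv = s / x_bar          # coefficient of variation; multiply by 100 for a percentage

print(dm, s_n2, s2, s_n, s, cv)
```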
Resistance (?) & measures of scatter
$MAD = Md(|x_i - Md|)$
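To illustrate why the MAD is called resistant, the sketch below compares it with the standard deviation on a clean sample and on the same sample with one extreme value. The function name `mad` and both data sets are hypothetical.

```python
from statistics import median, stdev

def mad(values):
    """Median of the absolute deviations from the median."""
    md = median(values)
    return median(abs(v - md) for v in values)

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]   # last value replaced by an extreme one

print(mad(clean), mad(with_outlier))       # the MAD does not change
print(stdev(clean), stdev(with_outlier))   # the standard deviation grows sharply
```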
Measures of shape: skewness
$\beta_1 = \frac{\left[ \sum_{i=1}^{n} (x_i - \bar{X})^3 / n \right]^2}{\left[ \sum_{i=1}^{n} (x_i - \bar{X})^2 / n \right]^3}$
Measures of shape: skewness
Fisher’s coefficient:
$\gamma_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^3 / n}{\left[ \sum_{i=1}^{n} (x_i - \bar{X})^2 / n \right]^{3/2}}$
For all previous coefficients:
Index < 0 Negatively skewed
Index = 0 Symmetric
Index > 0 Positively skewed
Measures of shape: skewness
Symmetric: Mode = Median = Mean
Negatively skewed: Mode ≥ Median ≥ Mean
*Not always!!!!
Measures of shape: kurtosis
$\gamma_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^4 / n}{\left[ \sum_{i=1}^{n} (x_i - \bar{X})^2 / n \right]^{4/2}} - 3$
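Fisher's γ1 and γ2 are both ratios of central moments, so one helper function is enough to compute them. The sketch below uses hypothetical data and hypothetical function names; every moment divides by n, as in the formulas above.

```python
def central_moment(values, r):
    """r-th central moment: sum((x_i - mean)^r) / n."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((v - x_bar) ** r for v in values) / n

def fisher_skewness(values):
    """gamma_1 = m3 / m2^(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def fisher_kurtosis(values):
    """gamma_2 = m4 / m2^(4/2) - 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

x = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
print(fisher_skewness(x))      # positive: positively skewed
print(fisher_kurtosis(x))      # negative: flatter than the normal curve
```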
Measures of shape: kurtosis
Fisher’s γ2 : interpretation
Other measures of shape
[Figure: histogram; x-axis: Intervals, y-axis: Absolute frequencies]
BIVARIATE AND MULTIVARIATE
DESCRIPTIVE STATISTICS
TWO CATEGORICAL VARIABLES
Description of two qualitative variables
                    Var. A
             1      2      ...   J      Total
Var. B  1    fo11   fo12   ...   fo1J   fo1.
        2    fo21   fo22   ...   fo2J   fo2.
        ...  ...    ...    ...   ...    ...
        I    foI1   foI2   ...   foIJ   foI.
Total        fo.1   fo.2   ...   fo.J   fo..
Graphic representations:
[Figures: grouped bar chart of relative frequencies for Country (Latvia, United Kingdom, Russia) by Specialty (In training, Specialist); mosaic chart for Country and Specialty]
Some statistical indices
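Association index: Pearson's chi-square
$\chi_o^2 = \sum_{i=1}^{k} \sum_{j=1}^{l} \frac{(fo_{ij} - fe_{ij})^2}{fe_{ij}}$
Chuprov: $T = \sqrt{\frac{\chi^2}{n \sqrt{(k - 1) \cdot (l - 1)}}}$
Pearson's Phi: $\phi = \sqrt{\frac{\chi^2}{n}}$
Cramér's V:
$V = \sqrt{\frac{\chi^2}{n \cdot (q - 1)}}$ ; where q = min(I, J), the smaller of the number of rows and columns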
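The association indices can be computed from a table of observed frequencies as sketched below. Expected frequencies are taken as (row total × column total) / n, the usual independence model. The function name and the cell counts are hypothetical; only the row totals (15, 14, 11) come from the country table shown earlier.

```python
import math

def association_indices(table):
    """Chi-square, Phi, Chuprov's T and Cramér's V for a k x l table of counts."""
    k, l = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(k):
        for j in range(l):
            fe = row_totals[i] * col_totals[j] / n      # expected frequency
            chi2 += (table[i][j] - fe) ** 2 / fe
    q = min(k, l)
    return {
        "chi2": chi2,
        "phi": math.sqrt(chi2 / n),
        "T": math.sqrt(chi2 / (n * math.sqrt((k - 1) * (l - 1)))),
        "V": math.sqrt(chi2 / (n * (q - 1))),
    }

# Hypothetical Country x Specialty counts (rows: United Kingdom, Russia, Latvia).
counts = [[9, 6], [6, 8], [5, 6]]
print(association_indices(counts))
```

In a 2 × 2 table the three coefficients coincide, because (k − 1)(l − 1) = 1 and q − 1 = 1.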
ONE QUALITATIVE AND ONE
QUANTITATIVE VARIABLE
A qualitative dichotomous & a quantitative variable
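Standardized mean difference: $\frac{\bar{X}_{group1} - \bar{X}_{group2}}{S_{intra}}$ , where $S_{intra} = \sqrt{\frac{S_1^2 + S_2^2}{2}}$
$\Delta = \frac{\bar{X}_{control} - \bar{X}_{treatment}}{S_{control}}$
$r_{point\ biserial} = \frac{\bar{Y}_{control} - \bar{Y}_{treatment}}{S_{quantitative}} \sqrt{pq}$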
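A sketch of the three indices for a dichotomous grouping variable, with hypothetical scores. Two choices here are assumptions, since the slides do not specify them: group variances use the n − 1 denominator, and the overall standard deviation in the point-biserial coefficient uses the n denominator (with that choice the result matches Pearson's r between the scores and a 0/1 group code, up to sign).

```python
import math
from statistics import mean, pstdev, variance

control = [4, 5, 6, 7, 8]       # hypothetical scores
treatment = [5, 7, 8, 9, 11]    # hypothetical scores

diff = mean(control) - mean(treatment)

# Standardized difference using the average within-group variance.
s_intra = math.sqrt((variance(control) + variance(treatment)) / 2)
d = diff / s_intra

# Delta: the same difference scaled by the control group's standard deviation only.
delta = diff / math.sqrt(variance(control))

# Point-biserial correlation: p and q are the proportions of cases in each group.
both = control + treatment
p = len(control) / len(both)
q = 1 - p
r_pb = diff / pstdev(both) * math.sqrt(p * q)

print(d, delta, r_pb)
```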
A qualitative polytomous & a quantitative variable
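$\eta^2 = \frac{SC_{factor}}{SC_{total}} = \frac{SC_{factor}}{SC_{factor} + SC_{error}}$ , where (SC: sum of squares)
$SC_{total} = \sum_{j=1}^{a} \sum_{i=1}^{n} (Y_{ij} - \bar{Y})^2$
$SC_{factor} = \sum_{j=1}^{a} n (\bar{Y}_j - \bar{Y})^2$
$SC_{error} = \sum_{j=1}^{a} \sum_{i=1}^{n} (Y_{ij} - \bar{Y}_j)^2$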
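The decomposition behind η² can be verified numerically, as in the sketch below. The groups are hypothetical and the function name `eta_squared` is illustrative. The formulas above assume the same n in every group; the code uses each group's own size, which reduces to the same thing when the sizes are equal.

```python
def eta_squared(groups):
    """Return (SC_factor, SC_error, SC_total, eta squared) for a list of groups."""
    all_values = [y for group in groups for y in group]
    grand_mean = sum(all_values) / len(all_values)
    sc_total = sum((y - grand_mean) ** 2 for y in all_values)
    sc_factor = 0.0
    sc_error = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        sc_factor += len(group) * (group_mean - grand_mean) ** 2
        sc_error += sum((y - group_mean) ** 2 for y in group)
    return sc_factor, sc_error, sc_total, sc_factor / sc_total

# Hypothetical scores for a = 3 groups of n = 3 cases each.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
sc_factor, sc_error, sc_total, eta2 = eta_squared(groups)
print(sc_factor, sc_error, sc_total, eta2)
```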
[Figures: histograms of number of clients (Nº.Clientes) for Specialty = In training and for Specialty = Specialist; plot of number of clients by Specialty (In training, Specialist)]
TWO QUANTITATIVE VARIABLES
Covariance and Pearson’s linear correlation
coefficient
Aim: to establish whether there is a relationship
between two quantitative variables
measured on an interval or ratio scale.
Linear relationship between two quantitative variables
Indices:
Covariance: $S_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y})}{n - 1} = \frac{SP_{xy}}{n - 1}$
Pearson's linear correlation coefficient: $r_{xy} = \frac{S_{xy}}{S_x \cdot S_y}$
Coefficient of determination: $r^2$
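The covariance and Pearson's coefficient can be computed step by step as below, with hypothetical paired data. Both standard deviations use the n − 1 denominator so that they are consistent with the covariance formula.

```python
import math

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]   # hypothetical data
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sum of cross-products SP_xy and covariance S_xy.
sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xy = sp_xy / (n - 1)

# Standard deviations with the same n - 1 denominator.
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

r_xy = s_xy / (s_x * s_y)   # Pearson's linear correlation coefficient
r2 = r_xy ** 2              # coefficient of determination

print(s_xy, r_xy, r2)
```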
[Figure: scatter plot of number of clients (Nº.Clientes) against age (Edad)]
Other Correlation Coefficients
Spearman Correlation Coefficient
Used to study the relationship between two variables measured on ordinal
scales, or between variables measured on interval or ratio scales that do not
fit the normal distribution.
It is the most widely used rank correlation method and the most
appropriate when n is between 25 and 30.
$r_s = 1 - \frac{6 \sum d_i^2}{(n + 1) \, n \, (n - 1)}$
where d_i is the difference between the ranks of the two values of case i.
Its value ranges between −1 and +1.
Equivalent formulas are:
$r_s = 1 - \frac{6 \sum d_i^2}{n (n^2 - 1)}$ ; $r_s = 1 - \frac{6 \sum_{i=1}^{n} (x_i - y_i)^2}{n^3 - n}$
where x_i and y_i are the ranks of case i on each variable.
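A minimal sketch of Spearman's coefficient from rank differences. It assumes there are no tied values, because the simple ranking helper and the 6Σd² formula are only exact in that case. The function names and the data are hypothetical.

```python
def ranks(values):
    """Rank 1 for the smallest value; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

x = [10, 20, 30, 40, 50]   # hypothetical data
y = [1, 3, 2, 5, 4]        # hypothetical data
print(spearman(x, y))
```

With tied values, a common alternative is to assign average ranks and compute Pearson's r on the ranks.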
Point-Biserial Correlation Coefficient
$r_{bp} = \frac{\bar{x}_p - \bar{x}_q}{s_x} \sqrt{p \cdot q}$
where p and q are the proportions of subjects in each category of the
qualitative variable, and s_x is the standard deviation of the continuous
quantitative variable computed over all subjects, without distinguishing between the categories of the qualitative
variable.
Biserial Correlation Coefficient
$r_b = \frac{\bar{x}_p - \bar{x}_q}{s_x} \cdot \frac{pq}{y}$ ; $r_b = \frac{\bar{x}_p - \bar{x}}{s_x} \cdot \frac{p}{y}$
where “y” is the ordinate of the normal distribution at the point that separates the two
categories of the dichotomized variable.
Phi Coefficient
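$\phi = \frac{\alpha\delta - \beta\gamma}{\sqrt{p \, q \, p' \, q'}}$ ; $\phi = \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}$
where α, β, γ and δ are the proportions in the contingency table, a, b, c and d are the corresponding frequencies, and p, q, p’ and q’
are the row and column marginal proportions.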
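The two expressions for the Phi coefficient give the same value, which the sketch below checks on a hypothetical 2 × 2 table. The cell layout assumed here is a, b in the first row and c, d in the second; the function names are illustrative.

```python
import math

def phi_from_counts(a, b, c, d):
    """Phi from cell frequencies: (ad - bc) / sqrt of the product of the marginals."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def phi_from_proportions(a, b, c, d):
    """Phi from cell proportions (alpha..delta) and marginal proportions."""
    n = a + b + c + d
    alpha, beta, gamma, delta = (v / n for v in (a, b, c, d))
    p, q = alpha + beta, gamma + delta        # row marginals
    p2, q2 = alpha + gamma, beta + delta      # column marginals
    return (alpha * delta - beta * gamma) / math.sqrt(p * q * p2 * q2)

phi_counts = phi_from_counts(10, 20, 30, 40)          # hypothetical counts
phi_props = phi_from_proportions(10, 20, 30, 40)
print(phi_counts, phi_props)
```

Unlike the Phi obtained from chi-square, this version keeps the sign of the association.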