Descriptive Statistics

The document discusses initial concepts in descriptive statistics, focusing on the definition and operationalization of variables, their types, and measurement scales. It covers sampling techniques, both probabilistic and nonprobabilistic, as well as qualitative and quantitative data analysis methods. Additionally, it includes measures of central tendency, scatter, and graphical representations for both categorical and quantitative variables.

Uploaded by

dessire.abzolv

INITIAL CONCEPTS

DESCRIPTIVE STATISTICS
Variables: definition and operationalization
• Variable: a concept (a characteristic of an object or a participant) that can take at least two different values.
• Conceptual and operational definition
• Types:
  • Methodological criterion:
    • Explanatory variables: independent and dependent; predictor and criterion
    • Confounding variables
  • Statistical criterion:
    • Qualitative variables and quantitative variables (continuous and discrete)
Variables: definition and operationalization
• Measurement types and measurement scales

Classification according to Stevens

Scale | Basic empirical operations | Information | Transformations | Example
Nominal | Equality | X1 = X2; X1 ≠ X2 | Injective (one-to-one) mappings | Gender
Ordinal | Larger or smaller | X1 = X2; X1 < X2 | Increasing functions | Social class
Interval | Determine the equality of the difference between intervals | X1 − X2 = X3 − X4; X1 − X2 < X3 − X4 | a + (b·X) where b > 0 | Temperature measured in ºC
Ratio | Existence of a real zero | X1 / X2 = X3 / X4; X1 / X2 < X3 / X4 | (b·X) where b > 0 | Reaction time
Participants: sampling techniques
[Figure: an individual, the sample (described by a statistic) and the population (described by a parameter).]
Sampling error: $EM = \theta - \hat{\theta}$
Characteristics of the sample:
• It is part of the universe
• Size proportional to the universe
• No distortion in the selection
• Representative of the universe
Characteristics of the sampling units:
• Defined so that their identification is unequivocal
• No overlap among them
• Each unit is assigned a probability of being selected
• All units should belong to the population under study.
Participants: sampling techniques
Probabilistic:
• Simple random sampling
• Systematic random sampling
• Stratified random sampling
• Cluster sampling
Nonprobabilistic:
• Accidental sampling
• Intentional sampling
Empirical:
• By quotas
DESCRIPTIVE STATISTICS
CATEGORICAL VARIABLES
What are qualitative data?

• Qualitative or categorical variables: measured on a nominal scale.
• Operations allowed: classification of the participants according to the quality of interest (presence or absence of a specific feature).
• Categories should be exhaustive and mutually exclusive.
• EXAMPLE: DATABASE ART THERAPIST (1)

Pais | Sexo | Edad | Especialidad | Lugar de trabajo | Nº Clientes | SatisfacciónServicio | SatisfacciónEspecialidad
Reino Unido | Hombre | 62 | Especialista | Servicio salud | 15 | 4 | 7
Reino Unido | Hombre | 58 | Especialista | Servicio salud | 14 | 5 | 6
Reino Unido | Hombre | 26 | En formación | Servicio salud | 5 | 6 | 5
Reino Unido | Mujer | 55 | Especialista | Servicio salud | 16 | 4 | 7
Reino Unido | Mujer | 45 | Especialista | Privada | 12 | 7 | 8
Reino Unido | Mujer | 25 | En formación | Privada | 2 | 8 | 7
Reino Unido | Mujer | 27 | En formación | Servicio social | 4 | 8 | 9
Reino Unido | Mujer | 37 | Especialista | Ambito educativo | 10 | 6 | 7
Reino Unido | Mujer | 58 | Especialista | Servicio salud | 9 | 4 | 5
Reino Unido | Mujer | 63 | Especialista | Servicio salud | 8 | 3 | 7
Reino Unido | Mujer | 57 | Especialista | Servicio salud | 9 | 5 | 6
Reino Unido | Hombre | 59 | Especialista | Servicio salud | 15 | 6 | 6
Reino Unido | Mujer | 27 | En formación | Servicio social | 3 | 8 | 9
Reino Unido | Mujer | 39 | Especialista | Ambito educativo | 14 | 6 | 7
Reino Unido | Mujer | 52 | Especialista | Servicio salud | 15 | 3 | 7
Rusia | Hombre | 46 | Especialista | Privada | 14 | 8 | 9
Rusia | Mujer | 29 | En formación | Ambito educativo | 13 | 7 | 10
Rusia | Mujer | 58 | Especialista | Ambito educativo | 15 | 7 | 8
Rusia | Mujer | 27 | En formación | Ambito educativo | 11 | 6 | 8
Rusia | Mujer | 28 | En formación | Privada | 3 | 9 | 9
Rusia | Mujer | 57 | Especialista | Ambito educativo | 20 | 7 | 8
Rusia | Hombre | 61 | Especialista | Servicio salud | 14 | 4 | 6
Rusia | Hombre | 28 | En formación | Privada | 4 | 7 | 7
Rusia | Hombre | 26 | En formación | Privada | 5 | 9 | 10
Rusia | Mujer | 31 | En formación | Privada | 4 | 8 | 9
Rusia | Mujer | 45 | Especialista | Ambito educativo | 21 | 7 | 6
Rusia | Mujer | 56 | Especialista | Ambito educativo | 19 | 6 | 7
Rusia | Mujer | 31 | En formación | Ambito educativo | 12 | 8 | 9
Rusia | Mujer | 45 | Especialista | Privada | 15 | 8 | 8
Letonia | Mujer | 57 | Especialista | Ambito educativo | 18 | 7 | 8
Letonia | Mujer | 59 | Especialista | Ambito educativo | 17 | 7 | 8
Letonia | Mujer | 61 | Especialista | Privada | 12 | 8 | 8
Letonia | Mujer | 33 | En formación | Ambito educativo | 10 | 7 | 9
Letonia | Mujer | 48 | Especialista | Ambito educativo | 22 | 9 | 8
Letonia | Mujer | 26 | En formación | Privada | 4 | 9 | 9
Letonia | Mujer | 28 | En formación | Privada | 3 | 8 | 10
Letonia | Hombre | 45 | En formación | Ambito educativo | 12 | 7 | 9
Letonia | Hombre | 37 | En formación | Ambito educativo | 10 | 8 | 9
Letonia | Hombre | 45 | Especialista | Ambito educativo | 23 | 7 | 9
Letonia | Mujer | 25 | En formación | Privada | 3 | 8 | 9

Dictionary (variable names and category labels are kept as they appear in the data file and in the R output):
• Pais: country of residence (Reino Unido = United Kingdom, Rusia = Russia, Letonia = Latvia)
• Sexo: sex (Hombre = male, Mujer = female)
• Edad: age of the therapist in years
• Especialidad: specialty status (Especialista = specialist, En formación = in training)
• Lugar de trabajo: type of service in which the therapist works (Servicio salud = health service, Privada = private practice, Servicio social = social service, Ambito educativo = educational setting)
• Nº Clientes: number of clients the therapist currently has per week
• SatisfacciónServicio: degree of satisfaction with the service in which the therapist works (1 to 10; 1 = very dissatisfied, 10 = very satisfied)
• SatisfacciónEspecialidad: degree of satisfaction with the specialty in which the therapist works (1 to 10; 1 = very dissatisfied, 10 = very satisfied)

(1) Simulated database based on the article:
Karkou, V., Martinsone, K., Nazarova, N., & Vaverniece, I. (2011). Art therapy in the postmodern world: findings from a comparative study across the UK, Russia and Latvia. The Arts in Psychotherapy, 38, 86-95.
Frequency table: Country

País | ni | fi | Pi
Reino Unido | 15 | .375 | 37.5
Rusia | 14 | .350 | 35.0
Letonia | 11 | .275 | 27.5
TOTAL | 40 | 1.00 | 100.0

“Concerns” variable
ni : absolute frequency
Ni: accumulated frequency
fi = ni/N : relative frequency
Fi: accumulated relative frequency
pi = (ni/N): individual proportion
Pi = (ni/N) x 100 : individual percentage
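
As an illustration of the notation above, the following is a minimal Python sketch (standard library only) that rebuilds the frequency table for the country variable. The function name `frequency_table` and the way the data are entered (15, 14 and 11 repetitions of each label, matching the counts in the table) are assumptions made for this example, not part of the original slides.

```python
from collections import Counter

# Country labels for the 40 simulated therapists, entered from the counts
# shown in the frequency table (15 + 14 + 11 = 40).
countries = ["Reino Unido"] * 15 + ["Rusia"] * 14 + ["Letonia"] * 11


def frequency_table(values):
    """Return one row per category: (category, n_i, f_i, P_i, N_i, F_i).

    Categories are listed in order of first appearance.
    """
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    accumulated = 0
    for category, n_i in counts.items():
        accumulated += n_i
        rows.append((
            category,
            n_i,                    # absolute frequency
            n_i / total,            # relative frequency (proportion)
            100 * n_i / total,      # percentage
            accumulated,            # accumulated frequency
            accumulated / total,    # accumulated relative frequency
        ))
    return rows


table = frequency_table(countries)
for row in table:
    print("{:<12} n={:>2} f={:.3f} P={:5.1f} N={:>2} F={:.3f}".format(*row))
```

Accumulated frequencies are computed here only to show the notation; for a nominal variable such as country they depend on an arbitrary category order.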
Graphical representations
• Based on the frequency tables.
[Figure: pie chart, Pareto diagram (absolute frequencies and accumulated percentage) and bar plot of the frequencies of Pais (Reino Unido, Rusia, Letonia).]
Other indices

• Central tendency indices (typical or most representative value)
  • Mode (Mo)
• Scatter measures & variety measures (homogeneity vs heterogeneity):
  • Variation ratio (VR): $VR = 1 - \frac{n_{Mode}}{N}$
  • Diversity index (D) or Blau's index: $D = 1 - \sum_{i=1}^{k} p_i^2$ ; $0 \le D \le \frac{k-1}{k}$
  • Teachman's index: $H = -\sum_{i=1}^{k} p_i \cdot \ln p_i$ ; $0 \le H \le \ln k$
  • Index of qualitative variation (IQV): $IQV = \frac{D}{(k-1)/k}$
• Frequency measure:
  • Odds: $Odds = \frac{p}{1-p} = \frac{n/(n+m)}{1 - n/(n+m)} = \frac{n}{m}$
• Epidemiologic indices:
  • Prevalence: $P = \frac{\text{Number of individuals diagnosed}}{\text{Total number of individuals of the population examined}}$
  • Incidence: $AI = \frac{\text{Number of new cases during the period}}{N \text{ population at risk}}$
[Figure: R output]
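
The variety indices listed on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `variety_indices` and `odds` are hypothetical, and the input is the vector of country counts (15, 14, 11) taken from the frequency table.

```python
import math


def variety_indices(counts):
    """Variation ratio, Blau's D, Teachman's H and IQV for category counts."""
    n_total = sum(counts)
    k = len(counts)
    proportions = [c / n_total for c in counts]
    vr = 1 - max(counts) / n_total                  # VR = 1 - n_Mode / N
    d = 1 - sum(p ** 2 for p in proportions)        # D = 1 - sum(p_i^2)
    h = -sum(p * math.log(p) for p in proportions if p > 0)  # H = -sum(p_i ln p_i)
    iqv = d / ((k - 1) / k)                         # D rescaled by its maximum
    return {"VR": vr, "D": d, "H": h, "IQV": iqv}


def odds(n, m):
    """Odds of an event that occurred n times and did not occur m times."""
    return n / m


country_counts = [15, 14, 11]   # Reino Unido, Rusia, Letonia
indices = variety_indices(country_counts)
print(indices)
print("Upper bound of H:", math.log(len(country_counts)))
```

Because the three countries have similar frequencies, D and H come out close to their upper bounds, (k − 1)/k and ln k, which is what IQV near 1 expresses.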
DESCRIPTIVE STATISTICS
QUANTITATIVE VARIABLES
(ORDINAL SCALES)
Index

Organizing the data: distributions

Indicators based on position: quantiles.

Measures of central tendency.

Measures of scatter: absolute & relative.

Measures of shape: skewness & kurtosis

Graphical representations: bar plot, histogram, boxplot

Identifying outliers: graphically & numerically


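Measures based on position

Types of quantiles (median, deciles, quartiles, etc.):
• Percentiles: P10, P20, P30, P40, P50, P60, P70, P80, P90 (and P25, P75)
• Deciles: D1, D2, D3, D4, D5, D6, D7, D8, D9
• Octiles: O1, O2, O3, O4, O5, O6, O7
• Quartiles: Q1, Q2, Q3
• Median: Md (Md = Q2 = O4 = D5 = P50)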
Position or location measures

Approximate calculation of position measures:

INTERPOLATION TECHNIQUE
The value is determined by estimating the position and then identifying the Xi value that corresponds to the percentile, decile or quartile of interest.
Exact calculation of percentiles (Optional)

1st. Order the values.
2nd. Position of percentile k: $j = k \cdot (n+1)/100$
3rd. Calculation of percentile k: $P_k = X_i + (j - i)(X_{i+1} - X_i)$
where j is the position for percentile k
i is the position immediately before position j (j truncated)
$X_i$ is the value in position i
$X_{i+1}$ is the value in position i + 1
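
The three steps above can be sketched as a small Python function. This is a minimal sketch, not a reference implementation: the name `percentile` is hypothetical, and clamping positions that fall outside 1..n to the smallest or largest value is an assumption added here, since the slide does not say what to do in that case.

```python
import math


def percentile(values, k):
    """Percentile k (0 < k < 100) using the position j = k * (n + 1) / 100."""
    xs = sorted(values)                      # 1st: order the values
    n = len(xs)
    j = k * (n + 1) / 100                    # 2nd: position of percentile k
    # Assumption: positions outside 1..n are clamped to the extremes.
    j = min(max(j, 1), n)
    i = math.floor(j)                        # position just before j (j truncated)
    if i == n:
        return float(xs[-1])
    x_i, x_next = xs[i - 1], xs[i]           # values at 1-based positions i and i + 1
    return x_i + (j - i) * (x_next - x_i)    # 3rd: linear interpolation


data = [2, 4, 4, 5, 7, 9, 10, 12, 30]        # hypothetical values, n = 9
for k in (25, 50, 75):
    print(k, percentile(data, k))
```

With n = 9 the position of P25 is j = 2.5, so the result lies halfway between the 2nd and 3rd ordered values.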
Central tendency measures

• Median: both a central tendency and a position index.

• Mid-range: $MidR = \frac{min + max}{2}$

• Average of quartiles (midhinge): $\bar{Q} = \frac{Q_1 + Q_3}{2}$

• Trimean: $TM = \frac{Q_1 + 2Q_2 + Q_3}{4} = \frac{\bar{Q} + Md}{2}$
Measures of scatter: Absolute

Range: $X_{max} - X_{min}$

Median of the absolute deviations: $MAD = Md\,|X_i - Md|$
Quantile ranges
• Difference between quantiles equidistant from Md: $AC = P_{50+k} - P_{50-k}$

• Interquartile range: $IQR = Q_3 - Q_1$

• Quartile deviation (semi-interquartile range): $QD = \frac{IQR}{2} = \frac{Q_3 - Q_1}{2}$
Quartile coefficient of variation (CVQ):

$CVQ = \frac{(P_{75} - P_{25})/2}{(P_{75} + P_{25})/2} = \frac{P_{75} - P_{25}}{P_{75} + P_{25}}$
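
The position-based measures of central tendency and scatter defined so far can be computed together, as in the following hedged Python sketch. The function name `resistant_summary` and the data are hypothetical. The quartiles are obtained with `statistics.quantiles(..., method="exclusive")`, which, as far as this sketch assumes, interpolates at positions based on n + 1 in the same way as the exact percentile calculation shown earlier.

```python
import statistics


def resistant_summary(values):
    """Position-based location and scatter measures for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    md = statistics.median(values)
    iqr = q3 - q1
    return {
        "mid_range": (min(values) + max(values)) / 2,
        "midhinge": (q1 + q3) / 2,
        "trimean": (q1 + 2 * q2 + q3) / 4,
        "range": max(values) - min(values),
        "MAD": statistics.median(abs(x - md) for x in values),
        "IQR": iqr,
        "QD": iqr / 2,
        "CVQ": (q3 - q1) / (q3 + q1),
    }


data = [2, 4, 4, 5, 7, 9, 10, 12, 30]    # hypothetical values with one high value
summary = resistant_summary(data)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In this example the single large value (30) pulls the mid-range and the range upwards, while the midhinge, trimean, IQR and MAD are barely affected by it.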
Measures of shape

• Skewness indicators

Yule's index: $H_1 = \frac{Q_3 + Q_1 - 2 \cdot Md}{2 \cdot Md}$

Kelly's index: $H_3 = \frac{P_{10} + P_{90} - 2 \cdot Md}{2 \cdot Md}$

Interpretation (both indices):
H < 0  Negative skew
H = 0  Symmetry
H > 0  Positive skew
Measures of shape

• Kurtosis indicators
• K2 index: compares the central 80% scatter with the central 50% scatter.

$K_2 = \frac{P_{90} - P_{10}}{1.9 \cdot (P_{75} - P_{25})} = \frac{AC_{80}}{1.9 \cdot IQR}$

• Interpretation:
Leptokurtic  K > 1
Mesokurtic   K = 1
Platykurtic  K < 1
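
A short Python sketch of the quantile-based shape indices (Yule's H1, Kelly's H3 and K2), using the formulas exactly as given on these slides. The function name `quantile_shape` and the data are hypothetical, and the quantiles again rely on the assumption that `statistics.quantiles` with `method="exclusive"` matches the n + 1 position rule.

```python
import statistics


def quantile_shape(values):
    """Quantile-based skewness (H1, H3) and kurtosis (K2) indices."""
    q1, md, q3 = statistics.quantiles(values, n=4, method="exclusive")
    deciles = statistics.quantiles(values, n=10, method="exclusive")
    p10, p90 = deciles[0], deciles[-1]
    return {
        "yule_H1": (q3 + q1 - 2 * md) / (2 * md),
        "kelly_H3": (p10 + p90 - 2 * md) / (2 * md),
        "K2": (p90 - p10) / (1.9 * (q3 - q1)),
    }


data = [2, 4, 4, 5, 7, 9, 10, 12, 30]    # hypothetical, right-skewed values
shape = quantile_shape(data)
for name, value in shape.items():
    print(f"{name}: {value:.4f}")
```

Both skewness indices are positive for these values, consistent with a longer right tail, and Kelly's index reacts more strongly because it uses P10 and P90 instead of the quartiles.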
Graphical representations

• Bar plot: for representing frequencies of ordered categories; useful when there are few categories.
• Box plot: useful for:
  • Assessing central tendency: Md.
  • Identifying specific positions: fourths (FL and FU) or quartiles (Q1 and Q3).
  • Assessing scatter in the central 50% and the tails (25% at each extreme).
  • Assessing symmetry and type of skew in the central 50% and the tails (25% at each extreme).
  • Identifying different degrees of outliers.
DESCRIPTIVE STATISTICS
QUANTITATIVE VARIABLES (INTERVAL
AND RATIO SCALES)
Index

• Indices based on moments
• Measures of central tendency: the mean and beyond
• Measures of scatter: absolute & relative
• Measures of shape: skewness & kurtosis
• Graphical representations
Central tendency measures

Arithmetic mean
The center of gravity or the balance between all values.

Parameter: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

Statistic: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Central tendency measures

Generalized means
a) Quadratic mean: $Q = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}$
b) Harmonic mean: $\frac{1}{H} = \frac{\sum_{i=1}^{n} \frac{1}{x_i}}{n}$
   Useful for distributions which have been transformed using the inverse to make them symmetric.
c) Geometric mean: $\log G = \frac{\sum_{i=1}^{n} \log x_i}{n}$
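
The three generalized means can be sketched directly from their definitions. The function names and the data below are hypothetical, and the harmonic and geometric means assume strictly positive values.

```python
import math


def quadratic_mean(xs):
    """Q = sqrt(sum(x_i^2) / n)."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def harmonic_mean(xs):
    """1/H = sum(1/x_i) / n, so H = n / sum(1/x_i). Requires x_i > 0."""
    return len(xs) / sum(1 / x for x in xs)


def geometric_mean(xs):
    """log G = sum(log x_i) / n, so G = exp(mean of logs). Requires x_i > 0."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


values = [1, 2, 4]                              # hypothetical positive values
arithmetic = sum(values) / len(values)
print("H =", harmonic_mean(values))
print("G =", geometric_mean(values))
print("mean =", arithmetic)
print("Q =", quadratic_mean(values))
```

For positive values that are not all equal, the results follow the ordering H < G < arithmetic mean < Q, which can serve as a quick sanity check.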
Resistant central tendency measures

Mode (Mo): most frequent value. Take care with continuous variables and multimodal distributions!

Median (Md): value that divides the distribution into two halves of equal size.

Trimean: $Trimean = \frac{Q_1 + 2 \cdot Q_2 + Q_3}{4}$

Quartiles mean: $\bar{Q} = \frac{Q_1 + Q_3}{2}$

Interquartile mean: $Mid = \frac{x_{i(p_{25}+1)} + \dots + x_{i(p_{75})}}{n_i}$

Others: trimmed mean and winsorized mean.

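Measures of scatter

Average absolute deviation: $DM = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$

Variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ ; $s_n^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$ ; $\hat{\sigma}^2 \equiv s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Standard deviation:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$ ; $s_n = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$

Coefficient of variation:
$CV = \frac{s}{\bar{X}}$ ; $CV = \left(\frac{s}{\bar{X}}\right) \cdot 100$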
Resistance (?) & measures of scatter

Range: $X_{max} - X_{min}$

Interquartile range: $IQR = Q_3 - Q_1$

Robust coefficient of variation: $CVQ = \frac{Q_3 - Q_1}{Q_3 + Q_1}$

Absolute median deviation: $MAD = Md\,|x_i - Md|$
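
To make the question of resistance concrete, the sketch below computes the moment-based scatter measures (variance, standard deviation, CV) and the MAD for the Nº Clientes values of the 17 therapists in training from the example database, and then repeats the calculation after appending one extreme value. The function names and the added value of 60 clients are hypothetical, chosen only for illustration.

```python
import math
import statistics


def moment_scatter(xs):
    """Variance (n and n - 1 denominators), standard deviation and CV."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)      # sum of squared deviations
    s = math.sqrt(ss / (n - 1))
    return {
        "mean": mean,
        "DM": sum(abs(x - mean) for x in xs) / n,
        "var_n": ss / n,
        "var": ss / (n - 1),
        "sd_n": math.sqrt(ss / n),
        "sd": s,
        "CV": s / mean,
    }


def mad(xs):
    """Median of the absolute deviations from the median."""
    md = statistics.median(xs)
    return statistics.median(abs(x - md) for x in xs)


# Nº Clientes of the therapists with Especialidad = "En formación" (n = 17).
trainees = [5, 2, 4, 3, 13, 11, 3, 4, 5, 4, 12, 10, 4, 3, 12, 10, 3]
with_outlier = trainees + [60]                 # hypothetical extreme value

clean = moment_scatter(trainees)
dirty = moment_scatter(with_outlier)
print("sd:", round(clean["sd"], 3), "->", round(dirty["sd"], 3))
print("MAD:", mad(trainees), "->", mad(with_outlier))
```

The mean, sd and CV of the original 17 values agree with the R-Commander summary reported later in the slides for the same group. A single extreme value multiplies the standard deviation several times, while the MAD changes very little.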
Measures of shape: skewness

[Figure: three distributions showing negative asymmetry, symmetry and positive asymmetry.]

Measures of shape: skewness

• The deviation with respect to a theoretical symmetric model is assessed.
• The symmetry axis is defined by the mean.

Pearson's third coefficient:

$\beta_1 = \frac{\left[\sum_{i=1}^{n} (x_i - \bar{X})^3 / n\right]^2}{\left[\sum_{i=1}^{n} (x_i - \bar{X})^2 / n\right]^3}$
Measures of shape: skewness

• Fisher's coefficient:

$\gamma_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^3 / n}{\left[\sum_{i=1}^{n} (x_i - \bar{X})^2 / n\right]^{3/2}}$

For all previous coefficients:
Index < 0  Negatively skewed
Index = 0  Symmetric
Index > 0  Positively skewed
Measures of shape: skewness

• Other ways to assess skewness *

Mode ≤ Median ≤ Mean   Positively skewed
Mode = Median = Mean   Symmetric
Mode ≥ Median ≥ Mean   Negatively skewed

* Not always!
Measures of shape: kurtosis

[Figure: platykurtic, mesokurtic and leptokurtic distributions.]

Measures of shape: kurtosis

• Informs about the weight of the tails of the distribution.
• Appropriate for bell-shaped distributions, when mean, median and mode are close to each other, and when the distribution is (almost) symmetrical.
• Most frequent coefficient: Fisher's γ2: calculation

$\gamma_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^4 / n}{\left[\sum_{i=1}^{n} (x_i - \bar{X})^2 / n\right]^{4/2}} - 3$
Measures of shape: kurtosis

• Fisher's γ2: interpretation
γ2 > 0  Leptokurtic
γ2 = 0  Mesokurtic
γ2 < 0  Platykurtic
Other measures of shape

• Symmetry: Pearson's 1st and 2nd coefficients
$As_1 = \frac{\bar{X} - Mo}{s}$
$As_2 = \frac{3(\bar{X} - Md)}{s}$

• Interquartile symmetry
$As = \frac{(P_{75} - P_{50}) - (P_{50} - P_{25})}{P_{75} - P_{25}}$

• Kurtosis
$K_1 = \frac{P_{87.5} - P_{12.5}}{1.7 \cdot (Q_3 - Q_1)}$
$K_2 = \frac{P_{90} - P_{10}}{1.9 \cdot (Q_3 - Q_1)}$
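
The moment-based shape coefficients from the preceding slides (Pearson's β1, Fisher's γ1 and γ2) can be sketched with a single helper for central moments. The function names and both data sets are hypothetical. These are the n-denominator versions written on the slides, so they need not coincide with the sample-adjusted values that statistical packages may report.

```python
def central_moment(xs, order):
    """Central moment of the given order with denominator n."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** order for x in xs) / len(xs)


def pearson_beta1(xs):
    """Pearson's third coefficient: m3^2 / m2^3."""
    return central_moment(xs, 3) ** 2 / central_moment(xs, 2) ** 3


def fisher_skewness(xs):
    """Fisher's gamma_1: m3 / m2^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def fisher_kurtosis(xs):
    """Fisher's gamma_2: m4 / m2^2 - 3."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric values
right_skewed = [1, 1, 1, 1, 6]     # hypothetical values with a long right tail

print(fisher_skewness(symmetric), fisher_kurtosis(symmetric))
print(fisher_skewness(right_skewed), pearson_beta1(right_skewed))
```

Note that β1 is the square of γ1, so β1 alone cannot show the direction of the skew; the sign of γ1 (or of the third central moment) is needed for that.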
Example: R-Commander output
[Figure: box plot of Nº.Clientes and histogram of Nº.Clientes (x-axis: intervals; y-axis: absolute frequencies).]
BIVARIATE AND MULTIVARIATE
DESCRIPTIVE STATISTICS
TWO CATEGORICAL VARIABLES
Description of two qualitative variables

Summary index: contingency table

Var. B \ Var. A | 1 | 2 | ... | J | Total
1 | fo11 | fo12 | ... | fo1J | fo1.
2 | fo21 | fo22 | ... | fo2J | fo2.
... | ... | ... | ... | ... | ...
I | foI1 | foI2 | ... | foIJ | foI.
Total | fo.1 | fo.2 | ... | fo.J | fo..
Graphic representations:
[Figure: clustered bar chart (relative frequencies) and mosaic chart for Pais (Letonia, Reino Unido, Rusia) and Especialidad (En formación, Especialista).]
Some statistical indices
Association index: Pearson's chi-square

$\chi_o^2 = \sum_{i=1}^{k} \sum_{j=1}^{l} \frac{(fo_{ij} - fe_{ij})^2}{fe_{ij}}$

Effect size: intensity of relationship

Chuprov: $T = \sqrt{\frac{\chi^2 / n}{\sqrt{(k-1) \cdot (l-1)}}}$

Pearson's Phi: $\varphi = \sqrt{\frac{\chi^2}{n}}$

Cramér's V: $V = \sqrt{\frac{\chi^2}{n \cdot (q - 1)}}$ ; where q = min(I, J)
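
As a worked illustration, the sketch below computes Pearson's chi-square and the three effect sizes for the Pais by Especialidad table of the example database. The observed counts (4/11, 7/7, 6/5) were tallied from the data table, the function names are hypothetical, and Chuprov's T is coded as the square root of (χ²/n) divided by the square root of (k − 1)(l − 1), which is the usual definition assumed here.

```python
import math


def chi_square(table):
    """Pearson's chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n   # expected frequency
            chi2 += (fo - fe) ** 2 / fe
    return chi2, n


def effect_sizes(table):
    """Phi, Chuprov's T and Cramér's V derived from chi-square."""
    chi2, n = chi_square(table)
    k, l = len(table), len(table[0])
    q = min(k, l)
    return {
        "chi2": chi2,
        "phi": math.sqrt(chi2 / n),
        "T": math.sqrt((chi2 / n) / math.sqrt((k - 1) * (l - 1))),
        "V": math.sqrt(chi2 / (n * (q - 1))),
    }


# Rows: Reino Unido, Rusia, Letonia; columns: En formación, Especialista.
observed = [[4, 11], [7, 7], [6, 5]]
result = effect_sizes(observed)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Because one of the variables has only two categories, q − 1 = 1 and Cramér's V coincides with φ in this table.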
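ONE QUALITATIVE AND ONE QUANTITATIVE VARIABLE
A dichotomous qualitative variable & a quantitative variable

Cohen's d: $d = \frac{\bar{X}_{group1} - \bar{X}_{group2}}{S_{intra}}$, where $S_{intra} = \sqrt{\frac{S^2_{group1} + S^2_{group2}}{2}}$

Glass's Δ: $\Delta = \frac{\bar{X}_{control} - \bar{X}_{treatment}}{S_{control}}$

Point-biserial correlation: $r_{point\ biserial} = \frac{(\bar{Y}_{control} - \bar{Y}_{treatment}) \sqrt{pq}}{S_{quantitative}}$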
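
The two standardized mean differences can be illustrated with the Nº Clientes values of the example database, split by Especialidad. This is a sketch under stated assumptions: the function names are hypothetical, the standard deviations use the n − 1 denominator, and treating the trainees as the "control" group for Δ is an arbitrary choice made only so that the formula can be evaluated.

```python
import math
import statistics


def cohens_d(group1, group2):
    """d = (mean1 - mean2) / S_intra, with S_intra = sqrt((s1^2 + s2^2) / 2)."""
    s_intra = math.sqrt((statistics.variance(group1) + statistics.variance(group2)) / 2)
    return (statistics.mean(group1) - statistics.mean(group2)) / s_intra


def glass_delta(control, treatment):
    """Delta = (mean_control - mean_treatment) / s_control."""
    return (statistics.mean(control) - statistics.mean(treatment)) / statistics.stdev(control)


# Nº Clientes by Especialidad, copied from the example data table.
trainees = [5, 2, 4, 3, 13, 11, 3, 4, 5, 4, 12, 10, 4, 3, 12, 10, 3]
specialists = [15, 14, 16, 12, 10, 9, 8, 9, 15, 14, 15,
               14, 15, 20, 14, 21, 19, 15,
               18, 17, 12, 22, 23]

d = cohens_d(trainees, specialists)
delta = glass_delta(trainees, specialists)   # trainees as "control": an assumption
print(f"d = {d:.3f}, delta = {delta:.3f}")
```

The sign only reflects the order in which the groups are entered; here it is negative because trainees see fewer clients per week than specialists.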
A polytomous qualitative variable & a quantitative variable

$\eta^2 = \frac{SC_{factor}}{SC_{total}} = \frac{SC_{factor}}{SC_{factor} + SC_{error}}$, where (SC = sum of squares)

$SC_{total} = \sum_{j=1}^{a} \sum_{i=1}^{n} (Y_{ij} - \bar{Y})^2$

$SC_{factor} = \sum_{j=1}^{a} n (\bar{Y}_j - \bar{Y})^2$

$SC_{error} = \sum_{j=1}^{a} \sum_{i=1}^{n} (Y_{ij} - \bar{Y}_j)^2$

* Eta-squared (η²) = R-squared: proportion of variability in C (the quantitative variable) shared with or explained by P (the qualitative variable). Benchmarks for interpretation: 0.01 (small), 0.06 (medium), 0.14 (large).
* 1 − η²: variability in C due to factors other than P.
* Explained = differences between the mean C of extroverted and introverted participants.
  Not explained = differences within each P-type.
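
The decomposition of the sums of squares can be checked numerically. The sketch below is a minimal illustration with a hypothetical function name; it uses the Nº Clientes values grouped by Especialidad from the example database, allows groups of different size (each group mean is weighted by its own n), and verifies that the factor and error sums of squares add up to the total.

```python
def eta_squared(groups):
    """Return (eta2, sc_factor, sc_error, sc_total) for a list of groups."""
    all_values = [y for group in groups for y in group]
    grand_mean = sum(all_values) / len(all_values)
    sc_total = sum((y - grand_mean) ** 2 for y in all_values)
    sc_factor = 0.0
    sc_error = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        sc_factor += len(group) * (group_mean - grand_mean) ** 2   # between groups
        sc_error += sum((y - group_mean) ** 2 for y in group)      # within groups
    return sc_factor / sc_total, sc_factor, sc_error, sc_total


# Nº Clientes by Especialidad, copied from the example data table.
trainees = [5, 2, 4, 3, 13, 11, 3, 4, 5, 4, 12, 10, 4, 3, 12, 10, 3]
specialists = [15, 14, 16, 12, 10, 9, 8, 9, 15, 14, 15,
               14, 15, 20, 14, 21, 19, 15,
               18, 17, 12, 22, 23]

eta2, sc_factor, sc_error, sc_total = eta_squared([trainees, specialists])
print(f"eta squared = {eta2:.3f}")
print(f"SC factor + SC error = {sc_factor + sc_error:.3f}, SC total = {sc_total:.3f}")
```

With these data, a little over half of the variability in the number of clients is associated with the specialty status of the therapist.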
Description of a qualitative variable and a
quantitative variable: R-Commander output
> numSummary(DadesTransparències[,"Nº.Clientes"], groups=DadesTransparències$Especialidad,
+   statistics=c("mean", "sd", "IQR", "quantiles", "cv", "skewness", "kurtosis"),
+   quantiles=c(0,.25,.5,.75,1), type="2")
                  mean       sd IQR        cv  skewness kurtosis 0% 25% 50%  75% 100% data:n
En formación  6.352941 3.920159 7.0 0.6170621 0.6595036 -1.42047  2   3   4 10.0   13     17
Especialista 15.086957 4.111105 4.5 0.2724940 0.1726646 -0.43399  8  13  15 17.5   23     23

[Figure: histograms of Nº.Clientes (y-axis: frequency) for Especialidad = En formación and Especialidad = Especialista, and box plots of Nº.Clientes by Especialidad.]
TWO QUANTITATIVE VARIABLES
Covariance and Pearson’s linear correlation
coefficient

Main aim:
Establish whether there is a relationship between two quantitative variables measured on an interval or ratio scale.
Linear relationship between two quantitative variables

Dispersion graph or scatter plot:
[Figure: three scatter plots showing a positive relationship, a negative relationship and no relationship.]

Index
Covariance: $S_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y})}{n - 1} = \frac{SP_{xy}}{n - 1}$
Pearson's linear correlation coefficient: $r_{xy} = \frac{S_{xy}}{S_x \cdot S_y}$
Coefficient of determination: $r^2$
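
A minimal Python sketch of the covariance, Pearson's r and the coefficient of determination, following the formulas above with the n − 1 denominator. The function names and the five data pairs are hypothetical.

```python
import math


def covariance(x, y):
    """S_xy = SP_xy / (n - 1), where SP_xy is the sum of cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sp_xy / (n - 1)


def pearson_r(x, y):
    """r_xy = S_xy / (S_x * S_y)."""
    s_x = math.sqrt(covariance(x, x))   # the covariance of x with itself is its variance
    s_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (s_x * s_y)


x = [1, 2, 3, 4, 5]                     # hypothetical paired observations
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"S_xy = {covariance(x, y):.3f}, r = {r:.4f}, r^2 = {r ** 2:.3f}")
```

Unlike the covariance, r does not depend on the measurement units of x and y, which is why it is preferred for describing the strength of a linear relationship.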
Other correlation coefficients

Variable 1 | Variable 2 | Correlation coefficient
Interval or ratio | Interval or ratio | Pearson ρ
Interval or ratio | Ordinal | Spearman or Kendall's τ
Ordinal | Ordinal | Spearman or Kendall's τ
Dichotomous | Interval or ratio | Point-biserial ρbp
Dichotomized | Interval or ratio | Biserial ρb
Dichotomous | Dichotomous | Phi φ
Dichotomized | Dichotomized | Tetrachoric rt
Linear relationship between two quantitative variables
[Figure: scatter plot of Nº.Clientes (y-axis) against Edad (x-axis).]
Spearman Correlation Coefficient
Used to study the relationship between two variables measured on ordinal scales, or between variables measured on interval or ratio scales that do not fit the normal distribution.
It is the most widely used of the rank correlation methods and the most appropriate when n is between 25 and 30.

$r_s = 1 - \frac{6 \sum d_i^2}{(n + 1)\, n\, (n - 1)}$

where $d_i$ are the differences between the ranks of the paired values.
Its value ranges between −1 and +1.
Other equivalent formulas are:

$r_s = 1 - \frac{6 \sum d_i^2}{n (n^2 - 1)}$ ; $r_s = 1 - \frac{6 \sum_{i=1}^{n} (x_i - y_i)^2}{n^3 - n}$ (with $x_i$ and $y_i$ expressed as ranks)
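
The rank-difference formula can be sketched as follows. The function names and data are hypothetical. Tied values receive the average of their ranks, which is a common convention assumed here; with ties the formula is only an approximation.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for index in order[pos:end + 1]:
            result[index] = average_rank
        pos = end + 1
    return result


def spearman(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), with d_i the rank differences."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [10, 20, 30, 40, 50]     # hypothetical paired observations without ties
y = [1, 3, 2, 5, 4]
print(ranks(y), spearman(x, y))
```

Since (n + 1) n (n − 1) = n (n² − 1) = n³ − n, the three denominators on the slide are the same quantity written in different ways.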
Point-Biserial Correlation Coefficient

Used to study the relationship between a continuous quantitative variable and a dichotomous qualitative variable (commonly used in psychometrics).

$r_{bp} = \frac{\bar{x}_p - \bar{x}_q}{s_x} \sqrt{p \cdot q}$

where p and q are the proportions of subjects in each category of the qualitative variable, and $s_x$ is the standard deviation of the continuous quantitative variable computed over all subjects, without distinguishing between the categories of the qualitative variable.
Biserial Correlation Coefficient

Used to study the relationship between a continuous quantitative variable and a dichotomized qualitative variable (commonly used to obtain the discrimination index of an item in psychometrics).

$r_b = \frac{\bar{x}_p - \bar{x}_q}{s_x} \cdot \frac{pq}{y}$ ; $r_b = \frac{\bar{x}_p - \bar{x}}{s_x} \cdot \frac{p}{y}$

where "y" is the ordinate of the normal distribution at the point that separates the two categories of the dichotomized variable.
Phi Coefficient

Used to study the relationship between two dichotomous variables.

$\varphi = \frac{\alpha\delta - \beta\gamma}{\sqrt{p\, q\, p'\, q'}}$ ; $\varphi = \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}$

where α, β, γ and δ are the proportions in the contingency table, and p, q, p' and q' are the row and column marginal proportions.

• It is less reliable than Pearson's correlation coefficient.
• The sample size must be larger (n > 100). It should not be used when there are proportions below 0.01, and it is risky when there are proportions below 0.1.
• Related to χ² (χ² = n · φ²)
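
The frequency version of the formula and its relation to chi-square can be checked with a short sketch. The 2 x 2 counts below are hypothetical, as are the function names; chi-square is computed independently from the expected frequencies so that the identity χ² = n · φ² is actually verified rather than assumed.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi from the observed frequencies of a 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_2x2(a, b, c, d):
    """Pearson's chi-square computed from observed and expected frequencies."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


a, b, c, d = 10, 20, 30, 40          # hypothetical counts, n = 100
phi = phi_coefficient(a, b, c, d)
n = a + b + c + d
print(f"phi = {phi:.4f}, n * phi^2 = {n * phi ** 2:.4f}, chi2 = {chi_square_2x2(a, b, c, d):.4f}")
```

Unlike chi-square, φ keeps a sign, which indicates along which diagonal of the table the cases concentrate.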
Tetrachoric Correlation Coefficient

Used to study the relationship between two dichotomized continuous quantitative variables.

$r_t = \frac{(ad) - (bc)}{n^2\, y\, y'} \cong \cos\left(\frac{180^\circ}{1 + \sqrt{\frac{ad}{bc}}}\right)$

where y is the ordinate of the normal distribution at the point that separates the two categories of one of the variables, and y' is the same for the other variable. The values a, b, c and d are the observed frequencies of the 2 x 2 contingency table between these two variables.
