01 Econ115a Mod1 Lesson1 BasicStatisticalConcepts
Learning objectives:
- Define Statistics
- Differentiate the two branches of Statistics
- Define basic concepts in Statistics
Econ 115a
Outline
1.1 What is Statistics?
1.2 Branches of Statistics
1.3 Population vs Sample
1.4 Sampling
1.5 Data Collection
1.6 Sources of Bias
Definitions of Statistics
- the science of collecting, analyzing, presenting, and interpreting data (Anderson et
al., 2020).
- a discipline consisting of a body of methods for collecting and analyzing data (Agresti
& Finlay, 1997).
Statistics is the study of data: it involves describing properties of data and drawing
conclusions about a population based on information in a sample.
Everything that deals with the collection, processing, interpretation, and presentation
of data belongs to the domain of statistics, and so does the detailed planning that
precedes all these activities.
Statistical methods can be used to find answers to some basic questions like:
• What kind and how much data need to be collected?
• How can we analyze the data and draw conclusions from it?
• How can we assess the strength of the conclusions and evaluate their uncertainty?
Branches of Statistics
1. Descriptive Statistics
- the branch of statistics that involves organizing, displaying, and describing data.
- gives an overview of the attributes of the data; the focus is on describing the data
obtained from a sample or a population.
Example:
The average age of citizens who voted for the winning candidate in the last election,
the average length of all books about statistics, the variation in the weight of 100
boxes of cereal selected from a factory’s production line.
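As an illustration of the cereal-box example, the sketch below summarizes a small set of weights with a mean and a standard deviation. The eight weights are made up for illustration and are not from the lesson.

```python
import statistics

# Hypothetical weights (in grams) of 8 cereal boxes from a production line.
weights = [498, 502, 500, 497, 503, 501, 499, 500]

mean_weight = statistics.mean(weights)   # centre of the data
spread = statistics.stdev(weights)       # sample standard deviation (variation)

print(f"mean = {mean_weight} g, standard deviation = {spread:.2f} g")
```

Both numbers only describe the boxes that were weighed; no claim is made about the rest of the production line.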
2. Inferential Statistics
- the branch of statistics that involves drawing conclusions about a population based
on information contained in a sample taken from that population.
- makes inferences about the population and provides measures of how well the data
support a hypothesis.
Example:
A survey that sampled 2,001 full- or part-time workers ages 50 to 70, conducted by
the American Association of Retired Persons (AARP), discovered that 70% of those
polled planned to work past the traditional mid-60s retirement age.
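One way to turn this sample result into a statement about the population is a confidence interval for the proportion. The sketch below assumes the 2,001 workers can be treated as a simple random sample and uses the usual normal approximation; neither assumption is stated in the survey description, and the variable names are illustrative.

```python
import math

n = 2001        # workers surveyed (from the example)
p_hat = 0.70    # sample proportion planning to work past their mid-60s
z = 1.96        # Z-value for 95% confidence

# Standard error of a sample proportion, then the margin of error.
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error

low, high = p_hat - margin, p_hat + margin
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the margin is about 2 percentage points, so the interval runs from roughly 68% to 72% of all workers in that age group.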
2. Inferential Statistics
2.a Testing for mean differences (z-test, t-tests, ANOVA, Mann-Whitney U, Wilcoxon,
Kruskal-Wallis, etc.)
2.b Testing for relationship (Correlation, Chi-square, etc.)
2.c Regression Analyses (OLS, Logit, Probit, Tobit, etc.)
2.d Structural Equation Modelling (PLS, Covariance-based)
Parameter vs Statistic
Parameter
- a number that summarizes some aspect of the population as a whole.
- a numerical measure that describes a characteristic of a population.
Statistic
- a number computed from the sample data.
- a numerical measure that describes a characteristic of a sample.
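To make the distinction concrete, the sketch below treats a short list of made-up ages as the entire population, so its mean is a parameter, and then computes the same measure on a random sample, which gives a statistic. The data, the seed, and all names are hypothetical.

```python
import random
import statistics

# Hypothetical population: ages of all 10 members of a small club.
population = [18, 21, 22, 25, 29, 31, 34, 40, 45, 55]

# Parameter: computed from every unit in the population.
population_mean = statistics.mean(population)

# Statistic: computed from a sample drawn from that population.
rng = random.Random(115)               # fixed seed so the draw is repeatable
sample = rng.sample(population, k=4)
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The parameter is fixed at 32, while the statistic changes with the sample that happens to be drawn.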
1.4 Sampling
2. Non-probability Sampling
- the probability that a given unit of the population is selected
for the sample is unknown, because selection is not random.
Probability Sampling
1. Simple Random Sampling (SRS) – every member, and every set of members of the same
size, has an equal chance of being included in the sample.
2. Stratified Sampling – the population is first split into groups. The overall sample
consists of some members from every group. The members from each group are
chosen randomly.
3. Cluster Sampling – the population is first split into groups. The overall sample
consists of every member from some of the groups. The groups are selected
randomly.
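The three probability sampling methods above can be sketched side by side. This is a minimal illustration on a made-up population of 12 students in three sections; the names, group sizes, sample sizes, and seed are all assumptions.

```python
import random

rng = random.Random(0)  # fixed seed, only so the sketch is repeatable

# Hypothetical population: 12 students tagged with their section (the group).
population = [(f"student_{i:02d}", section)
              for i, section in enumerate("AAAABBBBCCCC")]

# Group the units by section; stratified and cluster sampling both need this.
groups = {}
for unit in population:
    groups.setdefault(unit[1], []).append(unit)

# 1. Simple random sampling: draw units directly from the whole population.
srs = rng.sample(population, k=6)

# 2. Stratified sampling: randomly draw some members from EVERY group.
stratified = [unit for members in groups.values()
              for unit in rng.sample(members, k=2)]

# 3. Cluster sampling: randomly pick SOME groups, then keep every member of them.
chosen_sections = rng.sample(sorted(groups), k=2)
cluster = [unit for section in chosen_sections for unit in groups[section]]

print(srs, stratified, cluster, sep="\n")
```

The stratified sample always contains all three sections, while the cluster sample contains only two sections but every student in them.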
Non-probability Sampling
1. Convenience Sampling – the researcher selects anyone he or she happens to come
across.
2. Purposive Sampling – only those elements of the population that best suit the
purpose of the study are selected.
4. Referral/Snowball Sampling – the researcher asks the first element selected from
the population to recommend other elements who fit the description of the sample
needed.
$$n = \frac{N}{1 + N e^{2}}$$
where:
n = sample size
N = population size
e = margin of error
NOTE: The higher the margin of error (e), the smaller the required sample size (n).
Example: Your school has 1,500 students and you wish to draw a sample for your study.
Compute the sample size n for e = 1%, e = 5%, and e = 10%.
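The three cases in this example can be worked out by applying Slovin's formula, n = N / (1 + N e^2), with N = 1,500. The sketch below is one way to do that; the function name is illustrative, and rounding up to the next whole student is an assumption rather than a rule stated in the lesson.

```python
import math

def slovin_sample_size(population_size: int, margin_of_error: float) -> float:
    """Slovin's formula: n = N / (1 + N * e^2), returned unrounded."""
    return population_size / (1 + population_size * margin_of_error ** 2)

N = 1500
for e in (0.01, 0.05, 0.10):
    n = slovin_sample_size(N, e)
    # Rounding up is an assumption; it keeps the sample at least as large as required.
    print(f"e = {e:.0%}: n = {n:.2f}, rounded up to {math.ceil(n)}")
```

The unrounded values are about 1,304.35, 315.79, and 93.75, which round up to 1,305, 316, and 94 students.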
NOTE: Slovin’s formula should only be used when estimating a population proportion
at a 95% confidence level.*
*Punzalan, R. B. and Tejada, J. J. (2012). On the Misuse of Slovin’s Formula. The Philippine Statistician, Vol. 61, No. 1.
2. Proportionate Sampling
- samples are determined according to weights/proportions.
$$n_0 = \frac{Z^{2}\, p\,(1 - p)}{e^{2}}$$
where:
Z = Z-value from the z-table corresponding to the chosen confidence level
e = desired level of precision (margin of error)
p = estimated proportion of the population which has the attribute in question
Example:
Suppose you are studying the inhabitants of a large community, say a housing
subdivision, and you want to find out how many households have cable television.
You do not have much information about them to begin with, so you assume that half
of the households have it, which gives p = 0.50. If you want 95% confidence with a
5% margin of error, you can compute the sample size:
$$n_0 = \frac{Z^{2}\, p\,(1 - p)}{e^{2}} = \frac{(1.96)^{2}(0.5)(1 - 0.5)}{(0.05)^{2}} = 384.16 \approx 384$$
$$n_0 = \frac{Z^{2}\, p\,(1 - p)}{e^{2}} = \frac{(2.58)^{2}(0.5)(1 - 0.5)}{(0.01)^{2}} = 16{,}641$$
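Both computations above use the same formula with different inputs, so they can be checked with a small helper. The sketch below is a minimal version; the function name is illustrative, and the reading of Z = 2.58 as the 99% confidence value is the standard z-table convention rather than something stated on the slide.

```python
def cochran_sample_size(z: float, p: float, e: float) -> float:
    """Sample size for a proportion: n0 = Z^2 * p * (1 - p) / e^2."""
    return z ** 2 * p * (1 - p) / e ** 2

# 95% confidence (Z = 1.96), p = 0.5, 5% margin of error
n_95 = cochran_sample_size(z=1.96, p=0.5, e=0.05)

# 99% confidence (Z = 2.58), p = 0.5, 1% margin of error
n_99 = cochran_sample_size(z=2.58, p=0.5, e=0.01)

print(round(n_95), round(n_99))
```

The first call gives about 384.16 and the second about 16,641, matching the two results above.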
4. Other methods
4.1 Neuman (2006)
For N = 1,000, sample about 30% of the population
For N = 10,000, sample about 10% of the population
For N = 150,000, sample about 1% of the population
Source: Krejcie, R.V., & Morgan, D.W., (1970). Determining Sample Size for Research
Activities. Educational and Psychological Measurement.
Surveymonkey: https://www.surveymonkey.com/mp/sample-size-calculator/
Qualtrics: https://www.qualtrics.com/au/experiencemanagement/research/determine-sample-size/
Raosoft: http://www.raosoft.com/samplesize.html
Note:
- the smaller the margin of error, the larger the required sample size for the same
population.
- the higher the confidence level, the larger the required sample size.
- the larger the sample size, the more precise the estimates, so it is less likely
that the results are due to chance alone.
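These relationships can be checked numerically with the proportion formula n0 = Z^2 p(1 - p) / e^2 and its rearrangement for e. The sketch below assumes p = 0.5 and the usual z-table values (1.645, 1.96, 2.58); the function names are illustrative.

```python
import math

def margin_of_error(z: float, p: float, n: int) -> float:
    """Rearranged from n0 = Z^2 p(1-p) / e^2, so e = Z * sqrt(p(1-p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)

def required_sample_size(z: float, p: float, e: float) -> float:
    """n0 = Z^2 * p * (1 - p) / e^2, returned unrounded."""
    return z ** 2 * p * (1 - p) / e ** 2

# Larger samples give smaller margins of error (95% confidence, p = 0.5).
for n in (100, 400, 1600):
    print(f"n = {n}: e = {margin_of_error(1.96, 0.5, n):.3f}")

# Higher confidence levels need larger samples (p = 0.5, e = 5%).
for level, z in (("90%", 1.645), ("95%", 1.96), ("99%", 2.58)):
    print(f"{level} confidence: n0 = {required_sample_size(z, 0.5, 0.05):.0f}")
```

Quadrupling the sample size halves the margin of error, which is why small gains in precision become expensive quickly.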
Data Collection
- is the process of gathering and measuring information on variables of interest,
in an established systematic fashion that enables one to answer stated research
questions, test hypotheses, and evaluate outcomes (Responsible Conduct of Research
(RCR) - Northern Illinois University).
3. Observation
- gathering firsthand information.
Sources of Bias
Bias is a source of systematic error.
If you select your subjects in a way that is biased — that is, favoring certain individuals
or groups of individuals — then your results will also be biased.
Types of Biases:
1. Sample Selection Bias
2. Information Bias
The sample is selected in a way that systematically excludes part of the population.
1.b Volunteer bias: the fact that people who volunteer to take part in studies are
usually not representative of the population as a whole.
1.c Nonresponse bias: the other side of volunteer bias. Just as people who volunteer
to take part in a study are likely to differ systematically from those who do not, so
people who decline to participate in a study when invited to do so very likely differ
from those who consent to participate.
1.d Informative censoring: can create bias in any longitudinal study (a study in which
subjects are followed over a period of time).
Losing subjects (participants) during a long-term study is common, but the real
problem comes when subjects do not drop out at random, but for reasons related to
the study's purpose.
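The effect of systematic exclusion can be seen in a small numerical sketch. It uses made-up household incomes and assumes, purely for illustration, that every household earning 40 or more declines to respond, which is a simple form of nonresponse bias.

```python
import statistics

# Hypothetical population: monthly incomes (in thousands) of 10 households.
incomes = [12, 15, 18, 20, 22, 25, 30, 40, 60, 90]
true_mean = statistics.mean(incomes)

# Assumed response pattern: households earning 40 or more decline the survey.
respondents = [x for x in incomes if x < 40]
respondent_mean = statistics.mean(respondents)

print("population mean:", true_mean)
print("mean among respondents:", round(respondent_mean, 2))
```

The respondents' mean (about 20.3) falls well below the population mean (33.2), and collecting more responses from the same group would not close the gap, because the error is systematic rather than random.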
2. Information Bias
2.a Interviewer bias: when bias is introduced into the data collected because of the
attitudes or behavior of the interviewer.
2.b Recall bias: the fact that people with a life experience such as suffering from a
serious disease or injury are more likely to remember events that they believe are
related to that experience.
2.d Detection bias: the fact that certain characteristics may be more likely to be
detected or reported in some people than in others.
Detection bias can occur in trials when groups differ in the way outcome information
is collected or the way outcomes are verified.
Example:
Larger men have bigger prostates, which makes diagnosing prostate cancer via biopsy
more difficult (it is harder to hit the target).
Therefore, men with larger prostates are less likely to be accurately diagnosed with
prostate cancer. Thus, a real association between obesity and prostate cancer risk may
be underestimated.
There are many other biases that may be present in any study.
To learn more about other forms and sources of bias (especially for health/life-related
studies), you may visit a database maintained by the University of Oxford
through the link:
https://catalogofbias.org/
References:
Beginning Statistics (2012). https://2012books.lardbucket.org/books/beginning-statistics/
Cochran, W.G. (1977). Sampling Techniques. 3rd ed. New York: John Wiley & Sons.
Best, J. W., Kahn, J. V. (2006). Research in Education 10th Edition. Pearson Education
Fraenkel, J. R., Wallen, N. E., Hyun, H. H. (2012). How to Design and Evaluate Research in Education 8th Edition.
McGraw-Hill
Isotalo, Jarkko (n.d.) Basic Statistics.
Wahl, M. (2013). Crash Course on Basic Statistics. University of New York at Stony Brook
Yamane, Y. (1967). Mathematical Formulae for Sample Size Determination.
https://www.statisticshowto.com/probability-and-statistics/how-to-use-slovins-formula/
https://www.statisticshowto.com/probability-and-statistics/find-sample-size/#Cochran