Statistics-In-Psychology - Compress Notes
Parametric Statistics
Statistical methods that estimate population parameters, such as the standard deviation, on the basis of sample
data are called “parametric statistics”. Parametric analyses should only be used if the DV is (approximately)
normally distributed in the population.
Non-Parametric Statistics
Non-parametric statistics do NOT make assumptions about the shape of population distributions. They are often
used when data fail to meet the assumptions for parametric analyses, for example when (i) the DV is not normally
distributed, (ii) samples are small, or (iii) sample sizes are unequal, and in the study of proportions and ranks.
Sometimes referred to as “distribution-free techniques”, they are very valuable in the analysis of ordinal and rank
data. However, when the parametric assumptions are met, they generally have less power to detect significant
differences between groups.
Descriptive Statistics
Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data (i.e.
the sample) in a meaningful way such that, for example, patterns might emerge from the data. Descriptive
statistics do not, however, allow us to draw conclusions beyond the data we have analyzed or to reach conclusions
regarding any hypotheses we might have made. They are simply a way to describe our data.
Inferential Statistics
Inferential statistics allow the researcher to generalize their findings from the sample data to the larger
population. They help assess the strength of the relationship between the independent (causal) variables and the
dependent (effect) variables. With inferential statistics, we are trying to reach conclusions that extend beyond
the immediate data alone (i.e. our sample). In short, inferential statistics are used to estimate the characteristics
of the larger group (i.e., population).
T-test
A t-test is used when we have 1 IV with 2 levels. It tests whether the population means under the 2 levels of the
IV are different.
Independent t-test: between participants/independent groups.
Paired t-test: within participants/repeated measures.
One-way ANOVA
A one-way ANOVA is used when we have 1 IV with more than 2 levels. It tests whether the population means under
the different levels of the IV are different.
If the same participants are used for each level of the IV, a one-way repeated measures (i.e. within-subjects)
ANOVA should be used.
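The test statistics named above can be computed directly from the scores. The sketch below uses only the Python standard library to compute Student's independent-samples t (pooled variance), the paired t (a one-sample t on the difference scores), and the one-way between-participants ANOVA F ratio. The function names and scores are hypothetical, and the sketch stops at the statistic: turning it into a p-value needs the t or F distribution (for example from SciPy), which is not shown.

```python
import math
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Student's independent-samples t statistic using a pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def paired_t(before, after):
    """Paired t statistic: a one-sample t test on the difference scores."""
    diffs = [b - a for a, b in zip(before, after)]
    return mean(diffs) / math.sqrt(variance(diffs) / len(diffs))


def one_way_f(*groups):
    """One-way between-participants ANOVA F ratio (MS_between / MS_within)."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n_total = len(groups), len(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


group_a = [5, 7, 6, 9, 8]  # hypothetical scores, level 1 of the IV
group_b = [3, 4, 6, 5, 2]  # hypothetical scores, level 2 of the IV
print(independent_t(group_a, group_b))                 # 3.0
print(one_way_f(group_a, group_b))                     # 9.0
print(paired_t([10, 12, 9, 11], [12, 14, 10, 13]))     # 7.0
```

With exactly two independent groups the one-way F equals the square of the pooled-variance t (here 3.0² = 9.0), which is why the one-way ANOVA is reserved for IVs with more than two levels.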
Factorial ANOVA
Factorial ANOVAs are used to test for differences when we have more than one independent variable (IV).
Including more than one IV lets us explore the effects of interactions between IVs.
The terms ‘IV’ and ‘factor’ are interchangeable. ANOVAs with more than one IV are called factorial ANOVAs.
There are three broad factorial ANOVA designs:
1. all IVs are between-participants - participants take part in only one condition (i.e. independent measures).
2. all IVs are within-participants - participants take part in all conditions (repeated measures).
3. a mixture of between-participant and within-participant IVs - participants take part in more than one, but
not all, conditions.
Correlation
Correlation means association - more precisely, it is a measure of the extent to which two variables are related.
When working with continuous variables, the correlation coefficient to use is Pearson’s r. This is a numerical
score showing the strength and direction of a linear relationship.
o r = -1 (perfect negative relationship)
o r = +1 (perfect positive relationship)
o r = 0 (no linear relationship)
Once we’ve determined the relationship (Pearson's r) in our sample, inferential analyses allow us to determine the
probability of measuring a relationship of that magnitude (or larger) if the null hypothesis is true.
Bivariate linear correlation involves measuring the linear relationship between two sample variables.
Partial correlation allows us to examine the relationship between two variables while removing the influence of
a third variable.
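Pearson's r and the first-order partial correlation can both be written in a few lines. The sketch below assumes three equal-length lists of scores (the data and function names are hypothetical) and uses the standard formula for a partial correlation that controls for one variable z. The example data are built so that x and y correlate only because both depend on z.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson's r: co-variation of x and y divided by the product of their spreads."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, removing the influence of z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


x = [2, 1, 2, 5]  # hypothetical scores
y = [0, 5, 0, 5]  # hypothetical scores
z = [1, 2, 3, 4]  # hypothetical third variable
print(round(pearson_r(x, y), 3))  # 0.333
print(partial_r(x, y, z))
```

Here the raw correlation is 1/3, but the partial correlation is zero (up to floating-point rounding): once z is held constant, nothing is left of the x–y relationship. Note also that a perfect U- or V-shaped pattern such as x = [1, 2, 3], y = [1, 0, 1] gives r = 0, which is why r = 0 means no *linear* relationship.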
Regression Analysis
Spearman’s rho
2x2 Chi-Square (Test for Independence)
A 2x2 chi-square test for independence tests for an association between two categorical variables, each with two
levels.
Conduct post-hoc tests (Wilcoxon T), corrected for multiple comparisons.
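The 2x2 chi-square test compares the observed counts with the counts expected if the two variables were independent (row total × column total / N). The sketch below is a minimal version with hypothetical counts and a hypothetical function name; it applies no continuity correction, whereas some software applies Yates's correction to 2x2 tables by default (for example SciPy's `chi2_contingency`).

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


observed = [[20, 10], [10, 20]]  # hypothetical counts: rows = variable 1, columns = variable 2
print(round(chi_square_2x2(observed), 3))  # 6.667
```

A 2x2 table has (2 - 1)(2 - 1) = 1 degree of freedom, and the conventional critical value at α = .05 is about 3.84, so these hypothetical counts would indicate a significant association.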
The mean and the median are summary measures used to describe the most "typical" value in a set of
values.
Statisticians refer to the mean and median as measures of central tendency.
The Mean and the Median
The difference between the mean and median can be illustrated with an example. Suppose we draw a
sample of five women and measure their weights. They weigh 100 pounds, 100 pounds, 130 pounds,
140 pounds, and 150 pounds.
To find the median, we arrange the observations in order from smallest to largest value. If there
is an odd number of observations, the median is the middle value. If there is an even number of
observations, the median is the average of the two middle values. Thus, in the sample of five
women, the median value would be 130 pounds, since 130 pounds is the middle weight.
The mean of a sample or a population is computed by adding all of the observations and
dividing by the number of observations. Returning to the example of the five women, the mean
weight would equal (100 + 100 + 130 + 140 + 150)/5 = 620/5 = 124 pounds. In the general case,
the mean can be calculated, using one of the following equations:
Population mean: μ = ΣX / N
Sample mean: x̄ = Σx / n
where ΣX is the sum of all the population observations, N is the number of population observations, Σx
is the sum of all the sample observations, and n is the number of sample observations.
When statisticians talk about the mean of a population, they use the Greek letter μ to refer to the mean
score. When they talk about the mean of a sample, statisticians use the symbol x̄ (read "x-bar") to refer to the
mean score.
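The procedures for the median (sort, then take the middle value or average the two middle values) and the mean (Σx / n) translate directly into code. This is a minimal sketch with hypothetical function names, applied to the five weights from the example above; Python's `statistics.mean` and `statistics.median` provide the same calculations.

```python
def median(values):
    """Median: middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd number of observations
    return (ordered[mid - 1] + ordered[mid]) / 2  # even number of observations


def mean(values):
    """Sample mean: sum of the observations divided by their number (Σx / n)."""
    return sum(values) / len(values)


weights = [100, 100, 130, 140, 150]  # the five women's weights, in pounds
print(mean(weights))    # 124.0
print(median(weights))  # 130
print(median([100, 100, 130, 140]))  # 115.0 (even n: average of 100 and 130)
```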
The Mean vs. the Median
As measures of central tendency, the mean and the median each have advantages and disadvantages.
Some pros and cons of each measure are summarized below.
The median may be a better indicator of the most typical value if a set of scores has an outlier.
An outlier is an extreme value that differs greatly from other values.
However, when the sample size is large and does not include outliers, the mean score usually
provides a better measure of central tendency.
To illustrate these points, consider the following example. Suppose we examine a sample of 10
households to estimate the typical family income. Nine of the households have incomes between
$20,000 and $100,000, but the tenth household has an annual income of $1,000,000,000. That tenth
household is an outlier. If we choose a measure to estimate the income of a typical household, the mean
will greatly overestimate the income of a typical family (because of the outlier), while the median will
not.
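The effect of the outlier can be checked numerically. The nine incomes between $20,000 and $100,000 below are hypothetical values chosen only for illustration; the tenth is the $1,000,000,000 outlier from the example.

```python
from statistics import median

# Hypothetical incomes: nine "typical" households plus the outlier.
incomes = [20_000, 30_000, 40_000, 50_000, 60_000,
           70_000, 80_000, 90_000, 100_000, 1_000_000_000]

mean_income = sum(incomes) / len(incomes)
print(mean_income)      # 100054000.0 (about $100 million: pulled up by one household)
print(median(incomes))  # 65000.0 (average of the two middle values, 60,000 and 70,000)
```

Removing the outlier moves the mean from about $100 million to $60,000, while the median only shifts from $65,000 to $60,000.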
Effect of Changing Units
Sometimes, researchers change units (minutes to hours, feet to meters, etc.). Here is how measures of
central tendency are affected when we change units.
If you add a constant to every value, the mean and median increase by the same constant. For
example, suppose you have a set of scores with a mean equal to 5 and a median equal to 6. If you
add 10 to every score, the new mean will be 5 + 10 = 15; and the new median will be 6 + 10 =
16.
Suppose you multiply every value by a constant. Then, the mean and the median will also be
multiplied by that constant. For example, assume that a set of scores has a mean of 5 and a
median of 6. If you multiply each of these scores by 10, the new mean will be 5 * 10 = 50; and
the new median will be 6 * 10 = 60.
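Both rules can be verified with a small hypothetical data set that has a mean of 5 and a median of 6, matching the numbers in the examples above.

```python
from statistics import mean, median

scores = [2, 3, 6, 7, 7]            # hypothetical scores: mean 5, median 6
shifted = [s + 10 for s in scores]  # add a constant to every value
scaled = [s * 10 for s in scores]   # multiply every value by a constant

print(mean(scores), median(scores))
print(mean(shifted), median(shifted))  # both increase by 10
print(mean(scaled), median(scaled))    # both are multiplied by 10
```

The three lines print means of 5, 15 and 50 and medians of 6, 16 and 60.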
Mode
The mode is the most frequent score in our data set. It corresponds to the highest bar in a bar chart or
histogram. You can, therefore, sometimes consider the mode as being the most popular option.
An example of a mode is presented below:
Normally, the mode is used for categorical data where we wish to know which is the most common
category, as illustrated below:
We can see above that the most common form of transport, in this particular data set, is the bus.
However, one of the problems with the mode is that it is not necessarily unique, which leaves us with problems when
two or more values share the highest frequency, such as below:
We are now stuck as to which mode best describes the central tendency of the data. This is particularly
problematic when we have continuous data, because we are unlikely to have any one value that is more
frequent than the others. For example, consider measuring the weights of 30 people (to the nearest 0.1
kg). How likely is it that we will find two or more people with exactly the same weight (e.g., 67.4 kg)?
The answer is: very unlikely. Many people might be close, but with such a small sample (30 people) and a
large range of possible weights, you are unlikely to find two people with exactly the same weight, that is,
to the nearest 0.1 kg. This is why the mode is very rarely used with continuous data.
Another problem with the mode is that it will not provide us with a very good measure of central
tendency when the most common mark is far away from the rest of the data in the data set, as depicted in
the diagram below:
In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is not
representative of the data, which is mostly concentrated around the 20 to 30 value range. To use the
mode to describe the central tendency of this data set would be misleading.
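Because the mode need not be unique, a mode-finding routine should return every value that shares the highest frequency. The sketch below does this with `collections.Counter`; the function name and the data are hypothetical (Python 3.8+ also offers `statistics.multimode` for the same job).

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency (one or more modes)."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


transport = ["bus", "car", "bus", "walk", "bus", "train", "car"]  # hypothetical categories
print(modes(transport))                 # ['bus']
print(modes([1, 2, 2, 3, 3, 4]))        # [2, 3] (two modes)
print(modes([67.4, 71.2, 58.9, 80.3]))  # [67.4, 71.2, 58.9, 80.3] (continuous data: every value ties)
```

The last line shows the problem with continuous data described above: when no weight repeats, every value is "the mode", so the mode says nothing useful about central tendency.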
Please use the following summary table to identify the best measure of central tendency for each type of
variable.
Type of variable: Best measure of central tendency
Ordinal: Median
The study of statistics revolves around the study of data sets. This lesson describes two important types of data
sets - populations and samples. Along the way, we introduce simple random sampling, the main method used in
this tutorial to select samples.
Population vs Sample
The main difference between a population and sample has to do with how observations are assigned to the data
set.
Depending on the sampling method, a sample can have fewer observations than the population, the same
number of observations, or more observations. More than one sample can be derived from the same population.
Other differences have to do with nomenclature, notation, and computations. For example,
We will see in future lessons that the mean of a population is denoted by the symbol μ, but the mean of a
sample is denoted by the symbol x̄.
We will also learn in future lessons that the formula for the standard deviation of a population is different
from the formula for the standard deviation of a sample.
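The difference between the two standard deviation formulas is the divisor: the population formula divides the sum of squared deviations by N, while the sample formula divides by n - 1 (Bessel's correction). A minimal sketch with hypothetical helper names, using the five weights from the earlier example; the standard library's `statistics.pstdev` and `statistics.stdev` implement the same two formulas.

```python
import math
import statistics


def sum_of_squared_deviations(values):
    """Σ(x - mean)²: the numerator shared by both formulas."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


def population_sd(values):
    # Population formula: divide by N.
    return math.sqrt(sum_of_squared_deviations(values) / len(values))


def sample_sd(values):
    # Sample formula: divide by n - 1.
    return math.sqrt(sum_of_squared_deviations(values) / (len(values) - 1))


weights = [100, 100, 130, 140, 150]
print(round(population_sd(weights), 2))  # 20.59
print(round(sample_sd(weights), 2))      # 23.02
print(statistics.pstdev(weights), statistics.stdev(weights))
```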
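A sampling method is a procedure for selecting sample elements from a population. Simple random
sampling refers to a sampling method that has the following properties: the population consists of N objects,
the sample consists of n objects, and all possible samples of n objects are equally likely to occur.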
An important benefit of simple random sampling is that it allows researchers to use statistical methods to analyze
sample results. For example, given a simple random sample, researchers can use statistical methods to define
a confidence interval around a sample mean. Statistical analysis is not appropriate when non-random sampling
methods are used.
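As an illustration of defining a confidence interval around a sample mean, here is a minimal sketch that assumes a simple random sample and uses the large-sample normal (z) approximation from `statistics.NormalDist`. The function name is hypothetical. With a sample as small as the five weights used here, a t-based interval (which is wider) would be the usual choice, but the t distribution is not in the Python standard library.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_ci_normal(sample, confidence=0.95):
    """Approximate confidence interval for a population mean (normal/z approximation)."""
    n = len(sample)
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 when confidence = 0.95
    standard_error = stdev(sample) / math.sqrt(n)   # sample SD / square root of n
    m = mean(sample)
    return m - z * standard_error, m + z * standard_error


low, high = mean_ci_normal([100, 100, 130, 140, 150])
print((round(low, 1), round(high, 1)))  # (103.8, 144.2)
```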
There are many ways to obtain a simple random sample. One way would be the lottery method. Each of
the N population members is assigned a unique number. The numbers are placed in a bowl and thoroughly
mixed. Then, a blindfolded researcher selects n numbers. Population members having the selected numbers are
included in the sample.
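The lottery method can be simulated in a few lines: number the N population members, then draw n distinct numbers at random. This sketch uses Python's `random.sample`, which draws without replacement; the function name, population size, and seed are hypothetical.

```python
import random


def lottery_sample(population_size, sample_size, seed=None):
    """Simple random sample of member numbers 1..N, drawn without replacement."""
    rng = random.Random(seed)                # seeded generator so a draw can be reproduced
    numbers = range(1, population_size + 1)  # each population member's unique number
    return sorted(rng.sample(numbers, sample_size))


print(lottery_sample(population_size=500, sample_size=10, seed=2024))
```

Every member number appears at most once in the result, just as each slip leaves the bowl once it is drawn.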
In practice, the lottery method described above can be cumbersome, particularly with large sample sizes. As an
alternative, use Stat Trek's Random Number Generator. With the Random Number Generator, you can select up to
1000 random numbers quickly and easily.
Suppose we use the lottery method described above to select a simple random sample. After we pick a number
from the bowl, we can put the number aside or we can put it back into the bowl. If we put the number back in the
bowl, it may be selected more than once; if we put it aside, it can be selected only one time.
When a population element can be selected more than one time, we are sampling with replacement. When a
population element can be selected only one time, we are sampling without replacement.
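The two schemes map onto two functions in Python's `random` module: `sample` draws without replacement and `choices` draws with replacement. The population of 10 numbered members and the seed below are hypothetical. The last draw also shows why, with replacement, a sample can contain more observations than the population.

```python
import random

rng = random.Random(7)        # seeded so the draws are reproducible
members = list(range(1, 11))  # a hypothetical population of N = 10 numbered members

# Without replacement: a drawn number is set aside, so no member can repeat.
without_replacement = rng.sample(members, 5)

# With replacement: every draw is from the full bowl, so members may repeat...
with_replacement = rng.choices(members, k=5)

# ...and the sample can even be larger than the population.
larger_than_population = rng.choices(members, k=15)

print(without_replacement, with_replacement, len(larger_than_population))
```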