Chapter 7
Data Analysis
A parameter is a numerical or graphic way to summarize data obtained from the population. A statistic is a numerical or graphic way to summarize data obtained from a sample.
Quantitative data: obtained by determining placement on a scale that indicates amount or degree. Categorical data: obtained by determining the frequency of occurrences in each of several categories.
Frequency Distributions
Histograms/Stem and Leaf Plots
Distribution curves
Averages/Spread
Variability/Correlations
Frequency Polygons
A frequency distribution places data in order by listing the scores from high to low, together with how often each score occurs. When the scores are grouped into intervals, the result is a grouped frequency distribution. Since a table of frequencies is not very visual, a graphical display called a frequency polygon can help.
Frequency polygons can be negatively or positively skewed. They can be useful in comparing two or more groups.
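As a minimal sketch of how a frequency distribution is built, the Python snippet below counts how often each score occurs and lists the scores from high to low. The scores are hypothetical values invented for illustration; they are not the data behind the slides.

```python
from collections import Counter

# Hypothetical raw scores, invented for illustration only.
scores = [85, 90, 85, 70, 90, 85, 100, 70, 85, 90]

# Count how often each score occurs.
counts = Counter(scores)

# List the scores from high to low, as a frequency distribution does.
frequency_distribution = sorted(counts.items(), reverse=True)

print("Score  Frequency")
for score, frequency in frequency_distribution:
    print(f"{score:>5}  {frequency}")

# The frequencies always add up to the number of scores, n.
print("n =", sum(counts.values()))
```

Plotting each score against its frequency and joining the points with straight lines would give the corresponding frequency polygon.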
[Table: frequency distribution. Frequencies: 5, 4, 3, 0, 0, 7, 10, 11, 4, 3, 0, 2, 1; n = 50]
Histograms
A histogram is a bar graph used to display quantitative data at the interval or ratio level of measurement.
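The idea can be sketched in a few lines of Python: scores are grouped into equal-width intervals and one bar is drawn per interval, with the bar length equal to the frequency. The scores and the interval width of 10 are assumptions chosen for illustration, and the bars are drawn as text rather than as a real chart.

```python
from collections import Counter

# Hypothetical test scores, invented for illustration only.
scores = [52, 57, 61, 64, 68, 70, 73, 75, 78, 81, 84, 92]
interval_width = 10  # assumed width of each score interval

# Assign each score to the lower bound of its interval, e.g. 64 -> 60.
interval_counts = Counter(
    (score // interval_width) * interval_width for score in scores
)

# Print one bar per interval; the bar length is the frequency.
for lower in sorted(interval_counts):
    upper = lower + interval_width - 1
    print(f"{lower}-{upper}: {'#' * interval_counts[lower]}")
```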
A distribution curve shows a generalized, smoothed distribution of scores, in contrast to the straight line segments of a frequency polygon. Data often tend to follow a specific shape called a normal distribution. This distribution is bell shaped and allows the following averages to be plotted:
Mean ($\bar{X}$): the arithmetic average of all the scores in a distribution.
Median: the midpoint, or the point below and above which 50% of the scores in a distribution fall.
Mode: the most frequent score in a distribution.
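The three averages can be computed with Python's standard `statistics` module. This is a small sketch on hypothetical scores chosen so that the results are easy to check by hand.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 3, 3, 5, 7]

mean = statistics.mean(scores)      # sum of scores divided by their count: 20 / 5
median = statistics.median(scores)  # middle score once the scores are ordered
mode = statistics.mode(scores)      # most frequent score

print("mean =", mean, "median =", median, "mode =", mode)
```

In a perfectly normal distribution all three values coincide; in a skewed distribution, such as this small example, they differ.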
Variability
Two distributions may have identical means and medians and still differ.
Distribution A: 19, 20, 25, 32, 39
Distribution B: 2, 3, 25, 30, 75
The mean in both distributions is 27 and the median is 25, yet the two distributions differ. In distribution A the scores are closer together; in distribution B they are much more spread out. The two distributions differ in what statisticians call variability.
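The claim about distributions A and B can be checked directly. The sketch below uses the two distributions given above and Python's standard `statistics` module.

```python
import statistics

distribution_a = [19, 20, 25, 32, 39]
distribution_b = [2, 3, 25, 30, 75]

for name, scores in [("A", distribution_a), ("B", distribution_b)]:
    # Both distributions sum to 135, so the mean is 135 / 5 in each case.
    print(
        name,
        "mean =", statistics.mean(scores),
        "median =", statistics.median(scores),
    )
```

The averages agree, so a measure of spread is needed to tell the two distributions apart.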
Variability refers to the extent to which the scores on a quantitative variable in a distribution are spread out. The range is the difference between the highest and lowest scores in a distribution. A five-number summary reports the lowest score, the first quartile (25th percentile), the median (50th percentile), the third quartile (75th percentile), and the highest score. Five-number summaries are often portrayed graphically with box plots.
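A minimal sketch of the range and the five-number summary, applied to distributions A and B from the variability example. The helper name `five_number_summary` is made up for this illustration, and the quartiles use the "inclusive" method of `statistics.quantiles`; other quartile conventions can give slightly different first and third quartiles.

```python
import statistics

def five_number_summary(scores):
    """Return (lowest, Q1, median, Q3, highest) for a list of scores."""
    ordered = sorted(scores)
    # Assumption: the "inclusive" quartile convention is used here.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

distribution_a = [19, 20, 25, 32, 39]
distribution_b = [2, 3, 25, 30, 75]

for name, scores in [("A", distribution_a), ("B", distribution_b)]:
    summary = five_number_summary(scores)
    score_range = max(scores) - min(scores)  # highest minus lowest score
    print(name, "five-number summary =", summary, "range =", score_range)
```

The range of B (73) is far larger than the range of A (20), which matches the observation that B is much more spread out.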
Box plots
[Figure: box plot with labels for the lowest score, the 1st quartile, and the highest score]
Standard Deviation
The standard deviation (SD) is considered the most useful index of variability. It is a single number that represents the spread of a distribution. If a distribution is normal, then the mean plus or minus 3 SD will encompass about 99.7% of all scores in the distribution.
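[Table: calculation of the standard deviation for a distribution of 10 scores with a mean of 54]

Variance:
$$\frac{\sum (X - \bar{X})^2}{n} = \frac{3640}{10} = 364$$

Standard deviation:
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{364} \approx 19.08$$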
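An SD computed by dividing by n gives a biased estimate of the population SD when the sample size is small or moderate. Thus, the sample SD, which divides by n - 1 instead, is used:

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$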
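The two formulas can be compared side by side. The sketch below implements both by hand and applies them to distributions A and B from the variability example; the function names `population_sd` and `sample_sd` are made up for this illustration. The standard library functions `statistics.pstdev` and `statistics.stdev` compute the same two quantities.

```python
import math
import statistics

def population_sd(scores):
    """SD with n in the denominator."""
    mean = sum(scores) / len(scores)
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return math.sqrt(sum(squared_deviations) / len(scores))

def sample_sd(scores):
    """SD with n - 1 in the denominator."""
    mean = sum(scores) / len(scores)
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return math.sqrt(sum(squared_deviations) / (len(scores) - 1))

distribution_a = [19, 20, 25, 32, 39]
distribution_b = [2, 3, 25, 30, 75]

for name, scores in [("A", distribution_a), ("B", distribution_b)]:
    print(
        name,
        "SD (n) =", round(population_sd(scores), 2),
        "SD (n - 1) =", round(sample_sd(scores), 2),
    )
```

Both versions give a much larger SD for B than for A, and the n - 1 version is always slightly larger than the n version for the same scores.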
In a normal distribution, 50% of all the observations fall on each side of the mean (Figure 10.11). About 68% of scores fall within 1 SD of the mean. About 27% of the observations fall between 1 and 2 SD from the mean, so about 95% fall within 2 SD. About 99.7% of all scores fall within 3 SD of the mean. This is often referred to as the 68-95-99.7 rule.
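These percentages can be reproduced with `statistics.NormalDist` from Python's standard library. The sketch assumes a standard normal distribution (mean 0, SD 1); the helper name `share_within` is made up for this illustration.

```python
from statistics import NormalDist

# Assumption: a standard normal distribution with mean 0 and SD 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of scores lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")

# Proportion lying between 1 and 2 SD from the mean (both sides combined).
print(f"between 1 and 2 SD: {share_within(2) - share_within(1):.1%}")
```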
Fifty Percent of All Scores in a Normal Curve Fall on Each Side of the Mean
Standard Scores
Standard scores use a common scale to indicate how an individual compares to other individuals in a group. The simplest form of a standard score is a z score. A z score expresses how far a raw score is from the mean in standard deviation units:

$$z = \frac{\text{raw score} - \text{mean}}{SD}$$

Standard scores provide a better basis for comparing performance on different measures than do raw scores. T scores are z scores expressed in a different form (T = z × 10 + 50). A probability is a percent stated in decimal form and refers to the likelihood of an event occurring.
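A short sketch of both conversions follows. The function names and the example numbers (a raw score of 80 on a test with mean 70 and SD 5, and a raw score of 60 on a test with mean 50 and SD 10) are hypothetical and chosen only for illustration.

```python
def z_score(raw_score, mean, sd):
    """Distance of a raw score from the mean, in SD units."""
    return (raw_score - mean) / sd

def t_score(z):
    """Re-express a z score on a scale with mean 50 and SD 10."""
    return z * 10 + 50

# Hypothetical results for one student on two different tests.
z_test_1 = z_score(raw_score=80, mean=70, sd=5)
z_test_2 = z_score(raw_score=60, mean=50, sd=10)

print("test 1: z =", z_test_1, "T =", t_score(z_test_1))
print("test 2: z =", z_test_2, "T =", t_score(z_test_2))
```

Both raw scores are 10 points above their test's mean, but the z scores show that the first result is further above average relative to the spread of its group.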
Correlation
Researchers seek to determine whether a relationship exists between two or more quantitative variables.
A scatterplot is a pictorial representation of the relationship between two quantitative variables.
Outliers are scores that deviate or fall considerably outside most of the other scores in a distribution or pattern. They indicate an unusual exception to a general pattern.
Correlation coefficients express the degree of relationship between two sets of scores.
The Pearson product-moment correlation coefficient (known as Pearson r) ranges between -1 and +1.
Formula:

$$r = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}}$$
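The Pearson r formula translates almost term by term into code. The sketch below is a direct implementation under the assumption that both lists have the same length and that neither variable is constant (otherwise the denominator is zero); the function name `pearson_r` and the example data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation, using the raw-score formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Hypothetical paired scores, invented for illustration only.
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]   # rises in step with x
y_falling = [10, 8, 6, 4, 2]  # falls in step with x

print("perfect positive:", pearson_r(x, y_rising))
print("perfect negative:", pearson_r(x, y_falling))
```

Real data rarely fall on a perfect line, so r usually lies strictly between -1 and +1; a scatterplot of the same pairs helps reveal outliers that can distort it.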
Scatterplot of Data
Relationship Between Family Cohesiveness and School Achievement in a Hypothetical Group of Students
Examples of Scatterplots
The Frequency Table
Bar Graphs and Pie Charts
The Crossbreak Table