
Class 3 Disper&normalcurveh

Here are the steps to solve these problems: 1) You scored 1500 on the SAT. The mean is 1060 and standard deviation is 210. To find your z-score: z = (Your Score - Mean) / Standard Deviation = (1500 - 1060) / 210 = 2.38 So you are 2.38 standard deviations above the mean. 2) To find the area above Z = 1.96, look up 1.96 in the Z-table. The area is 0.025, or 2.5% of the distribution is above Z = 1.96. 3) To find the area between the mean and Z = 2.58, subtract the area below

Uploaded by

elisia50824
Copyright
© © All Rights Reserved

SOCI 1028 Measures of Dispersion

• Standard Deviation
Compare the variability of the following groups of quiz scores by comparing the standard deviations:

Group 1: 0, 4, 4, 5, 7, 10

Group 2: 0, 0, 1, 9, 10, 10

1. Group 1:

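| Scores ($X_i$) | Deviations ($X_i - \bar{X}$) | Deviations Squared ($(X_i - \bar{X})^2$) |
|---|---|---|
| 0 | | |
| 4 | | |
| 4 | | |
| 5 | | |
| 7 | | |
| 10 | | |
| $\sum X_i =$ | $\sum (X_i - \bar{X}) =$ | $\sum (X_i - \bar{X})^2 =$ |

$$\bar{X} = \frac{\sum X_i}{N} = $$

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}} = $$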

2. Group 2:

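| Scores ($X_i$) | Deviations ($X_i - \bar{X}$) | Deviations Squared ($(X_i - \bar{X})^2$) |
|---|---|---|
| 0 | | |
| 0 | | |
| 1 | | |
| 9 | | |
| 10 | | |
| 10 | | |
| $\sum X_i =$ | $\sum (X_i - \bar{X}) =$ | $\sum (X_i - \bar{X})^2 =$ |

$$\bar{X} = \frac{\sum X_i}{N} = $$

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}} = $$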

3. The variation of Group 1 is >, =, < the variation of Group 2.
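The hand calculation can be cross-checked in R. The sketch below is not part of the original handout: `pop_sd()` is a hypothetical helper name that implements the worksheet formula with N in the denominator. R's built-in sd() divides by N - 1 instead, so it returns a slightly larger value for the same scores.

```r
# Quiz scores from the worksheet
group1 <- c(0, 4, 4, 5, 7, 10)
group2 <- c(0, 0, 1, 9, 10, 10)

# Hypothetical helper: standard deviation with N in the denominator,
# matching the worksheet formula s = sqrt(sum((X_i - mean)^2) / N)
pop_sd <- function(x) {
  deviations <- x - mean(x)            # X_i minus the mean
  sqrt(sum(deviations^2) / length(x))  # square, sum, divide by N, take the root
}

pop_sd(group1)
pop_sd(group2)

# Built-in sd() divides by N - 1 instead of N
sd(group1)
sd(group2)
```

Comparing the two `pop_sd()` results answers question 3; the sd() results lead to the same ordering because both groups have the same N.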

• Opinion Polarization
Data: World Values Survey (Wave 6) (subset: only Japan, Sweden, and US are included)

| Variables | Description |
|---|---|
| V198 to V210 | Opinions on controversial issues. Answers ranged from 1 (never justifiable) to 10 (always justifiable). |
| V2 | Country code: 392 = Japan; 752 = Sweden; 840 = United States |

V2 is a variable representing the country of origin of each sample case. Country codes are the categories within the variable.

[Figure: data view with callouts marking a case from Japan and a case from Sweden.]

1. Load the wvs6sn.csv data into RStudio. Be sure to click “Yes” for heading.
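For readers who prefer a typed command to the import dialog, a minimal sketch follows. It assumes the file can be read with read.csv(); the tiny stand-in file written here exists only so the example runs on its own, and its name (`wvs6sn_demo.csv`) and values are invented.

```r
# Write a tiny stand-in CSV so the example is self-contained (invented values)
csv_path <- file.path(tempdir(), "wvs6sn_demo.csv")
writeLines(c("V2,V204", "392,3", "752,9", "840,5"), csv_path)

# header = TRUE plays the role of clicking "Yes" for heading:
# the first row is used as the column names
wvs_demo <- read.csv(csv_path, header = TRUE)

names(wvs_demo)   # column names taken from the first row
nrow(wvs_demo)    # number of cases
```

With the real file in the working directory, the equivalent call would be `wvs6sn <- read.csv("wvs6sn.csv", header = TRUE)`.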

2. Measures of Dispersion:
Variable: V204 (opinions on abortion)
Commands you need:
range()
IQR()
sd()
boxplot()

Hint
You can use the summary() command to find out the measures of central tendency first.

summary(wvs6sn$V204)
Range
Codes:
range(wvs6sn$V204)

*Again, if there are missing values, use the option na.rm = TRUE to remove the NAs when doing the
range(), IQR(), and sd() commands. NA is the default label for missing values in R.

range(wvs6sn$V204, na.rm = TRUE)

Interquartile Range
Codes:
IQR(wvs6sn$V204)

*Same as the note above: use the na.rm = TRUE option to remove the NAs.

IQR(wvs6sn$V204, na.rm = TRUE)

Standard Deviation
Codes:
sd(wvs6sn$V204)

*Same as the note above: use the na.rm = TRUE option to remove the NAs.

sd(wvs6sn$V204, na.rm = TRUE)

Boxplot (when not using RMarkdown, click “export” to save/copy your plot)
Codes:
boxplot(wvs6sn$V204)
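The effect of na.rm = TRUE is easiest to see on a small vector. The sketch below uses invented answers on a 1 to 10 scale (the `scores` vector is hypothetical, not WVS data) and runs the same commands as above.

```r
# Invented answers on a 1-10 scale, with two missing values
scores <- c(1, 3, 5, 5, 8, 10, NA, 2, NA, 7)

# Without na.rm = TRUE, sd() returns NA when any value is missing
sd(scores)

# With na.rm = TRUE the missing values are dropped first
range(scores, na.rm = TRUE)          # minimum and maximum
diff(range(scores, na.rm = TRUE))    # range as a single number (max - min)
IQR(scores, na.rm = TRUE)            # third quartile minus first quartile
sd(scores, na.rm = TRUE)             # standard deviation of the non-missing values
```

Note that range() reports the minimum and maximum as two numbers; wrapping it in diff() gives the distance between them.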

3. Optional: Which countries are more polarized in their views on controversial issues?
As in the previous class, the code below computes a measure of dispersion for a
continuous variable (e.g., V204) within each category of a categorical variable (e.g., V2). By comparing
the standard deviations of attitudes on abortion across the three countries, you can see which
countries are more polarized: a larger standard deviation means opinions are more spread out.

Command you need:

sd(dataset$variable[conditions], na.rm = TRUE)

OR

tapply(continuous variable, categorical variable, statistics, na.rm = TRUE)

*Method 1
*Codes: (392 = Japan; 752 = Sweden; 840 = USA)
sd(wvs6sn$V204[wvs6sn$V2 == 392], na.rm = TRUE)
sd(wvs6sn$V204[wvs6sn$V2 == 752], na.rm = TRUE)
sd(wvs6sn$V204[wvs6sn$V2 == 840], na.rm = TRUE)

Codes Breakdown
a. The command line:
sd(wvs6sn$V204[wvs6sn$V2 == 392], na.rm = TRUE)

In plain words: calculate the standard deviation of the V204 variable for the cases where the V2 variable
equals 392, with missing values removed.

Hint
The conditions in the brackets here are somewhat different from the ones we encountered before.
Note that we don’t use quotes around the country code here (i.e., [wvs6sn$V2 == 392]). That’s
because V2 is a numeric variable, whereas the other variables we saw before are
factors (i.e., categorical variables). You can find out the object class (i.e., whether a variable is
numeric or not) by using the class() command below:

class(wvs6sn$V2)

The object class affects how the code operates, so pay attention to it. For a numeric variable such as V2,
compare against the bare number 392. If you put quotes around 392, R does not report an error; it
silently converts V2 to text before comparing, which happens to work for whole-number codes but is
not reliable in general.
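The point about object classes can be tried on a toy data frame. The sketch below uses invented values and a hypothetical `demo` data frame, not the WVS file. It also shows what a quoted comparison does on a numeric column: R converts the numbers to text before comparing rather than stopping with an error.

```r
# Invented mini data set: V2 holds numeric country codes
demo <- data.frame(V2 = c(392, 752, 840, 392),
                   V204 = c(2, 9, 5, 4))

class(demo$V2)            # a numeric class, so no quotes are needed

# Comparing with a number
demo$V2 == 392

# Comparing with a quoted value: R converts V2 to text first,
# so this runs without an error for whole-number codes
demo$V2 == "392"

# A factor version of the same variable is compared by its labels
country <- factor(demo$V2, levels = c(392, 752, 840),
                  labels = c("Japan", "Sweden", "United States"))
country == "Japan"
```

Matching the comparison value to the variable's class keeps the condition easy to read and avoids relying on the automatic conversion.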

*Method 2
*Codes:

tapply(wvs6sn$V204, wvs6sn$V2, sd, na.rm = TRUE)

Hints
a. The tapply() command calculates the standard deviation for each category of the V2 variable at the
same time.
b. The first argument should be a continuous variable and the second argument must be a
categorical variable. The third argument, statistics, can be mean, median, sd, or other statistics.
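To see that the two methods agree, both can be run on a small example. The sketch below uses an invented `demo` data frame (not the WVS data) with the same column names as the handout.

```r
# Invented mini data set (not the WVS data)
demo <- data.frame(
  V2   = c(392, 392, 392, 752, 752, 752, 840, 840, 840),
  V204 = c(2, 3, NA, 8, 9, 10, 1, 5, 10)
)

# Method 1: one country at a time
sd(demo$V204[demo$V2 == 392], na.rm = TRUE)

# Method 2: all countries at once; the result is named by country code
by_country <- tapply(demo$V204, demo$V2, sd, na.rm = TRUE)
by_country

# Swap the third argument for another statistic
tapply(demo$V204, demo$V2, IQR, na.rm = TRUE)
```

The value reported under 392 by tapply() matches the Method 1 result for Japan, because tapply() applies the same sd() call to each country's subset.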

4. Optional: Boxplots by Country.

*Codes:
boxplot(continuous variable ~ categorical variable)

#For example (V204)


boxplot(wvs6sn$V204 ~ wvs6sn$V2)

MJ’s interpretation: People in the United States are more polarized in their views on abortion. From the
box plots above, we can see that the IQR for the United States is the greatest, while the IQRs for Japan
and Sweden are smaller and almost the same. In addition to the IQRs, we can also see that the medians
differ across countries. Sweden has the highest median, meaning people there generally hold
liberal opinions on abortion. The Japanese are quite conservative, and their opinions do not vary much. In
summary, Swedes tend to be more liberal, the Japanese tend to be more conservative, and Americans show
no strong tendency either way (although they lean slightly toward the conservative side). Americans hold
diverse viewpoints on the issue of abortion.
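The numbers behind a grouped boxplot can be inspected directly. The sketch below is an illustration on invented data (the `demo` data frame and its values are hypothetical); it also converts the numeric codes into a labelled factor so the plot axis shows country names instead of codes.

```r
# Invented mini data set (not the WVS data)
demo <- data.frame(
  V2   = c(392, 392, 392, 392, 752, 752, 752, 752, 840, 840, 840, 840),
  V204 = c(1, 2, 3, 4, 7, 8, 9, 10, 1, 4, 7, 10)
)

# Turn the numeric codes into a labelled factor for readable axis labels
demo$country <- factor(demo$V2, levels = c(392, 752, 840),
                       labels = c("Japan", "Sweden", "United States"))

# Side-by-side boxplots, one per country
boxplot(V204 ~ country, data = demo, ylab = "V204 (1-10)")

# The numbers behind the boxes: median and IQR per country
medians <- tapply(demo$V204, demo$country, median)
iqrs    <- tapply(demo$V204, demo$country, IQR)
medians
iqrs
```

The height of each box corresponds to the IQR and the line inside it to the median, so the two tapply() results are a numeric summary of what the plot shows.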

Exercises
1. Compute the range, interquartile range, and standard deviation for the variable V203
(homosexuality). Draw a boxplot for V203 and interpret the boxplot.

2. Optional: People in which countries are more polarized in their views on homosexuality
(V203)? Report the range, IQR, and standard deviation of the three countries at the same
time. Please also plot the three boxplots in the same chart. (Hint: use the tapply()
command and change the statistics argument.)

3. Optional: Report the standard deviation of opinions on suicide (V207) for Japan (V2 = 392).

SOCI 1028 The Normal Curve 1

• How does your height compare to others?


1. Below is the typical range of human height. Put a mark at your height.

[Figure: number line of height (cm)]

2. The mean height of the population is 170 cm. Are you above, below, or at the mean?

[Figure: number line of height (cm)]

3. How would you describe your position? In cm? (Step 1: find the distance
between your height and the mean.)

4. Someone told you that human height is normally distributed. The person also said the
mean height is 170 cm and the standard deviation is 5 cm. How can the information
help you find your position in the population?

The empirical rule:

[Figure: Standard Normal Distribution]

The height distribution:

[Figure: Normal Distribution]

When you know the distance in units of standard deviation, you know your
position.

Step 2: convert the distance into units of the standard deviation of the
distribution.

$$\frac{\text{your height} - \text{mean height}}{\text{standard deviation}} = \frac{\text{your height} - 170}{5}$$

$$\text{your height in the standard normal distribution} = z\ \text{score} = \frac{\text{your height} - \text{mean height}}{\text{standard deviation}} = \frac{X_i - \bar{X}}{s}$$

5. Step 3: Find the percentage of the population above or below you using the
empirical rule and/or the Z table.
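The three steps can be sketched in R. This is an illustration under stated assumptions: the height of 178 cm is hypothetical, the distribution is the handout's N(170, 5), and pnorm() stands in for the Z table lookup.

```r
# Assumed values: the handout's N(170, 5) height distribution
# and a hypothetical height of 178 cm
my_height   <- 178
mean_height <- 170
sd_height   <- 5

# Step 1: distance from the mean in cm
distance <- my_height - mean_height

# Step 2: the same distance in standard-deviation units (z score)
z <- distance / sd_height

# Step 3: proportion of the population below and above that height
below <- pnorm(z)                      # area below z
above <- pnorm(z, lower.tail = FALSE)  # area beyond z
c(z = z, below = below, above = above)
```

The two areas always sum to 1, which is a quick check that the table lookup was done on the correct side of z.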

• Exercises
1. Assume the distribution of the SAT score is a normal distribution. You take the SAT and
score 1500. The mean score for the SAT is 1060 and the standard deviation is 210. You
are _______ standard deviations above the mean.

2. Find the area above Z = 1.96.

3. Find the area between Mean and Z = 2.58.

4. Find the Z score when “Area Beyond Z = 0.0495.”

5. Compare an Orange to an Apple

Orange: 150 grams
Apple: 150 grams

The weights of oranges and the weights of apples are normally distributed. The mean weight of
oranges is 140 grams and the standard deviation is 25 grams. The mean weight of apples is 100 grams
and the standard deviation is 15 grams. Which fruit is relatively heavier (compared to its own
population)?

| | Orange | Apple |
|---|---|---|
| Z score | | |
| Z table (area beyond Z) | | |

(Notes: 1. You only need the Z scores to answer the “which is relatively heavier”
question, but you can practice looking them up in the table. 2. You now know the areas beyond Z.
What are the areas below Z?)

SOCI 1028 The Normal Curve 2

• How does your height compare to others?


1. Step 1: find the distance between your height and the mean.
2. Step 2: convert the distance into units of the standard deviation of the distribution.

$$\frac{\text{your height} - \text{mean height}}{\text{standard deviation}} = \frac{\text{your height} - 170}{5}$$

$$\text{your height in the standard normal distribution} = z\ \text{score} = \frac{\text{your height} - \text{mean height}}{\text{standard deviation}} = \frac{X_i - \bar{X}}{s}$$

3. Step 3: Find the percentage of the population above or below you using the empirical rule
and/or the Z table.

• Question last time: What percentage of the population is taller/shorter than you?

• Try to answer these after today’s class: What’s the probability of selecting an individual who
• Is taller/shorter than you in the population?
• Has a height between Michael Jordan’s (198 cm) and yours?
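These probabilities are areas under the normal curve, so they can be sketched with pnorm(). The example assumes heights follow the handout's N(170, 5) and uses a hypothetical height of 175 cm for "you"; only the 198 cm figure comes from the text.

```r
# Assumed values: heights follow N(170, 5) as in the handout;
# 175 cm is a hypothetical "your height"
my_height <- 175
mj_height <- 198

# Area below each height
p_below_me <- pnorm(my_height, mean = 170, sd = 5)
p_below_mj <- pnorm(mj_height, mean = 170, sd = 5)

# Probability of selecting someone taller or shorter than you
p_taller  <- 1 - p_below_me
p_shorter <- p_below_me

# Probability of a height between yours and Michael Jordan's:
# area below the larger height minus area below the smaller one
p_between <- p_below_mj - p_below_me
c(taller = p_taller, shorter = p_shorter, between = p_between)
```

Subtracting two lower-tail areas is the general recipe for any "between" question, whichever side of the mean the two values fall on.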

• Exercises
1. Human height is normally distributed. The mean height is 170 cm and the standard
deviation is 5 cm. Jayson’s height is 179 cm. How many standard deviations is his height
away from the mean?

What is the Z score of his height?

Find the area beyond the Z score using the Z table (available on the COOL lesson page):

Find the area below the Z score using the Z table (available on the COOL lesson page):

2. If a distribution of test scores is normal, with a mean of 78 and a standard deviation of
11, what percentage of the area lies above 89?

3. Human height is normally distributed. The mean height for women is 160 cm, and the
standard deviation is 5 cm. Only 20.9% of women are taller than Emily. How tall is Emily
(in cm)?

$$z\ \text{score} = \frac{X_i - \bar{X}}{s}$$

[Figures: Normal Distribution; Standard Normal Distribution]

4. If a distribution of test scores is normal, with a mean of 78 and a standard deviation of
11, what percentage of the area lies below 60?

5. If a distribution of test scores is normal, with a mean of 78 and a standard deviation of
11, what percentage of the area lies between 65 and 93?

6. If a normal distribution has a mean of 13 and a standard deviation of 4, what is the
probability of randomly selecting a score of 19 or more?

• Look up the Z table in R

Commands you need:

pnorm(score, mean = , sd = , lower.tail = )
qnorm(area_in_proportion, mean = , sd = )

For example: score = -1, mean = 0, sd = 1
(i.e., Z score = -1 in the standard normal distribution)

a. Find area below Z

pnorm(-1, mean = 0, sd = 1, lower.tail = TRUE)

*Since the default of the pnorm() command is mean = 0, sd = 1, and lower.tail = TRUE,
the above line can also be written as

pnorm(-1)

b. Find area beyond Z

pnorm(-1, mean = 0, sd = 1, lower.tail = FALSE)

*As in (a), the defaults for mean and sd are 0 and 1 respectively, so you only need to set
lower.tail = FALSE.

pnorm(-1, lower.tail = FALSE)

c. Find area between Mean and Z

Negative Z:
0.5 - pnorm(-1, mean = 0, sd = 1, lower.tail = TRUE)

Positive Z (e.g., Z = 1):

pnorm(1, mean = 0, sd = 1, lower.tail = TRUE) - 0.5

d. Find Z from area (area from lower tail)

qnorm(0.1587, mean = 0, sd = 1)
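A few quick checks tie these commands together. The sketch below uses only pnorm() and qnorm() with their default mean = 0 and sd = 1; the variable names `area` and `within` are illustrative.

```r
# qnorm() undoes pnorm(): area below z = -1, then back to z
area <- pnorm(-1)
qnorm(area)

# Symmetry: the area beyond z = 1 equals the area below z = -1
pnorm(1, lower.tail = FALSE)

# Empirical rule check: area within 1, 2, and 3 standard deviations of the mean
within <- pnorm(1:3) - pnorm(-(1:3))
round(within, 4)
```

The last line gives roughly 68%, 95%, and 99.7%, the familiar empirical rule percentages.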

IMPORTANT Hints
a. Notice the mean and sd options? You can change them to match the normal
distribution at hand. For example, the distribution in the Emily height example is N(160, 5),
meaning a normal distribution with a mean of 160 and a standard deviation of 5. The
code for finding the area below 167 cm is:

pnorm(167, mean = 160, sd = 5, lower.tail = TRUE)

I believe you know how to find the area above 167 now. Simply change the option lower.tail
= TRUE to FALSE.

b. Similarly, you can answer the Emily height question with the qnorm() command:

qnorm(1-0.209, mean = 160, sd = 5, lower.tail = TRUE)

This will directly give you Emily’s height in centimeters, the unit used for the mean and standard deviation.
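The same answer can be reached in more than one way, which is a useful check. The sketch below reuses the numbers from the Emily example, N(160, 5) with 20.9% of women taller; the variable name `emily` is illustrative.

```r
# Emily example from the handout: N(160, 5), 20.9% of women are taller
emily <- qnorm(1 - 0.209, mean = 160, sd = 5)
emily

# Equivalent: ask for the upper-tail area directly
qnorm(0.209, mean = 160, sd = 5, lower.tail = FALSE)

# Equivalent: find z in the standard normal, then convert back to cm
160 + qnorm(1 - 0.209) * 5

# Check: the area above Emily's height should be 0.209
pnorm(emily, mean = 160, sd = 5, lower.tail = FALSE)
```

The third form mirrors the Z table method: look up the z score for the area, then rearrange the z score formula to recover the height.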

Exercises
1. If a normal distribution has a mean of 100 and a standard deviation of 20, what is the
probability of randomly selecting a score of 105 or more (i.e., the area beyond 105)?

2. Age is normally distributed in a community with a mean of 50 years and a standard
deviation of 15 years. A community member is older than 27% of the others in the
community. How old is this member?

3. A scale measuring subjective well-being has been administered to a large sample of
respondents. The distribution of scores is approximately normal, with a mean of 25 and a
standard deviation of 3. What is the probability of selecting a respondent who has a score
between 23 and 29?

Answers

> #Exercise
> #1)
> pnorm(105, mean = 100, sd = 20, lower.tail = FALSE)
[1] 0.4012937
>
> #2)
> qnorm(0.27, mean = 50, sd = 15)
[1] 40.80781
>
> #3)
> pnorm(29, mean = 25, sd = 3, lower.tail = TRUE) - pnorm(23, mean = 25, sd = 3, lower.tail = TRUE)
[1] 0.6562962
