Act 5 Probability
During the course of the semester, we will be using probability and statistics in our study of
genetics. This first laboratory exercise will review the principles that will be used throughout the
course. For some students, this will serve as a refresher, while for others, it may be completely
new material. In the examples given below, assume that a coin has a head and a tail side, each of
which is equally likely to be obtained when the coin is tossed. Also, a deck of cards is a standard
deck with 52 cards (no jokers), 13 cards in each of four suits. The cards within a suit are ace, 2,
3, 4, 5, 6, 7, 8, 9, 10, jack, queen and king, and the suits are clubs, diamonds, hearts and spades.
The probability (p) of an event occurring is calculated by the frequency of the event (e) divided
by the total of all possible occurrences (n):
p = e/n
For instance, the probability of selecting the ace of spades is 1/52. The probability of selecting
ANY ace is 4/52 or 1/13; and the probability of selecting ANY spade is 13/52 or 1/4.
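As a quick check of p = e/n, the deck described above can be enumerated and the favorable cards counted. The following is only a minimal Python sketch; the names deck and probability are illustrative and not part of the lab.

```python
from fractions import Fraction
from itertools import product

RANKS = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]

# Standard 52-card deck: every (rank, suit) pair, no jokers.
deck = list(product(RANKS, SUITS))

def probability(event):
    """Return p = e/n: e counts the cards satisfying `event`, n is the deck size."""
    favorable = sum(1 for card in deck if event(card))
    return Fraction(favorable, len(deck))

p_ace_of_spades = probability(lambda card: card == ("ace", "spades"))
p_any_ace = probability(lambda card: card[0] == "ace")
p_any_spade = probability(lambda card: card[1] == "spades")

print(p_ace_of_spades, p_any_ace, p_any_spade)  # 1/52 1/13 1/4
```

Fraction keeps the results as exact reduced fractions, so 4/52 is reported as 1/13.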
Sum Rule
The sum rule is used when considering the probability of either of two mutually exclusive
events. If the verbal expression is 'A or B,' the 'or' clues you in that the sum rule is applied. In
this case, the individual probabilities are added.
p(A or B) = p(A) + p(B)
For example, the probability of selecting the three of clubs or any ace from the deck is the sum of
the individual probabilities:
1/52 + 4/52 = 5/52
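The sum rule for this example can be checked by counting cards directly. This is a sketch in Python under the deck assumptions stated at the start of the handout; the helper names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

RANKS = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(RANKS, SUITS))

def is_three_of_clubs(card):
    return card == ("3", "clubs")

def is_ace(card):
    return card[0] == "ace"

# The sum rule needs mutually exclusive events: no card is both.
overlap = [c for c in deck if is_three_of_clubs(c) and is_ace(c)]

p_a = Fraction(sum(1 for c in deck if is_three_of_clubs(c)), len(deck))
p_b = Fraction(sum(1 for c in deck if is_ace(c)), len(deck))
p_sum_rule = p_a + p_b

# Direct count of cards that satisfy either event.
p_counted = Fraction(
    sum(1 for c in deck if is_three_of_clubs(c) or is_ace(c)), len(deck)
)

print(len(overlap), p_sum_rule, p_counted)  # 0 5/52 5/52
```

If the two events could overlap (for example, "any ace or any spade"), simply adding the probabilities would count the shared card twice, which is why the rule is restricted to mutually exclusive events.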
Product Rule
The product rule is used when two events occur simultaneously (or consecutively). The general
verbal formula is 'A and B.' In this case, the total probability of both events occurring is the
product of the two individual probabilities. There are a few tricky applications which will be
considered following the examples.
What is the probability of tossing a coin twice and obtaining heads both times?
The probability of obtaining heads on each coin flip is ½; therefore the total probability is: ½ x ½
= ¼.
Assuming there is an equal chance of having a boy or a girl, what is the probability in a family
with 7 children, that all 7 will be girls?
(½)⁷ = 1/128
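Both answers can be confirmed by listing every equally likely sequence and counting the one that matches. A minimal Python sketch, assuming (as the text does) that each toss or birth has two equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

# Product rule: multiply the probability of each individual event.
p_two_heads = HALF * HALF
p_seven_girls = HALF ** 7

# Check by enumeration: every sequence is equally likely.
coin_sequences = list(product("HT", repeat=2))   # 4 sequences
birth_orders = list(product("BG", repeat=7))     # 128 sequences

p_two_heads_counted = Fraction(
    coin_sequences.count(("H", "H")), len(coin_sequences)
)
p_seven_girls_counted = Fraction(
    birth_orders.count(tuple("GGGGGGG")), len(birth_orders)
)

print(p_two_heads, p_seven_girls)  # 1/4 1/128
```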
What is the probability that you will draw an ace of spades and a king of hearts when you draw 2
cards from the deck? The term 'and' indicates that the product rule should be used. As mentioned
above, the probability of drawing the ace of spades is 1/52. But the first card could be either the
ace of spades OR the king of hearts. Thus, the probability of drawing one of the two cards first
is 1/52 + 1/52 = 1/26
Assuming you are holding the first card when you draw the second card, the probability of
drawing the second card specified is 1/51. (Remember that one of the cards has been removed
from the deck!) Thus, the total probability is: 1/26 x 1/51 = 1/1326
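Because the second draw depends on the first, it can be reassuring to verify this result by brute force. The sketch below (Python, illustrative names only) lists every ordered two-card draw without replacement and counts the draws that contain exactly the two named cards.

```python
from fractions import Fraction
from itertools import permutations, product

RANKS = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(RANKS, SUITS))

targets = {("ace", "spades"), ("king", "hearts")}

# Calculation from the text: either target first, then the remaining target.
p_first = Fraction(1, 52) + Fraction(1, 52)   # sum rule
p_second = Fraction(1, 51)                    # one card already removed
p_rule = p_first * p_second                   # product rule

# Brute-force check over all 52 x 51 ordered draws.
draws = list(permutations(deck, 2))
hits = sum(1 for draw in draws if set(draw) == targets)
p_counted = Fraction(hits, len(draws))

print(hits, len(draws), p_rule, p_counted)  # 2 2652 1/1326 1/1326
```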
What is the likelihood, in a family with 7 children, that there will be one boy and six girls? You
could use the same formula as above, but you must keep in mind that there are 7 ways to have
one boy and six girls. (The boy could be first, second, third, fourth, fifth, sixth or last in birth
order.) Because there are 7 mutually exclusive possibilities, the sum rule comes into play (note
the 'or' in the listing above). Thus the probability of six girls and one boy in a family with seven
children is 7/128.
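The number of ways to obtain each combination is given by the coefficients of the binomial
expansion, where a and b are the probabilities of the two outcomes:
(a + b)⁰ = 1
(a + b)¹ = a + b
(a + b)² = a² + 2ab + b²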
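The counting argument for one boy and six girls generalizes to any family size through the binomial coefficient. The following Python sketch uses math.comb (available in Python 3.8 and later); the function name binomial_probability is an illustrative choice, and the default of ½ per birth is the same assumption made in the text.

```python
from fractions import Fraction
from math import comb

def binomial_probability(n, k, p=Fraction(1, 2)):
    """Probability of exactly k boys among n births, each a boy with probability p."""
    # comb(n, k) counts the birth orders; the rest is the product rule.
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_one_boy_six_girls = binomial_probability(7, 1)
print(p_one_boy_six_girls)  # 7/128

# The probabilities of 0 through 7 boys cover every possibility.
distribution = [binomial_probability(7, k) for k in range(8)]
print(sum(distribution))    # 1
```

The coefficients comb(7, 0) through comb(7, 7) are the same numbers that appear when (a + b)⁷ is expanded.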
If you were to flip a coin 100 times, how many times would you expect to obtain heads? __
Would you be surprised if there were some slight variations from this value? ____
If a magician were flipping the coin and obtained heads 50% of the time, would you be
suspicious that something unusual was occurring? ___
If the magician ALWAYS got heads, 100 times in a row, would you be suspicious? ___
We expect that there will be chance deviations from the expected values, but sometimes the
variations are large, and are due to something beyond chance. Sample size has an impact; it is
common in a family with two children to have all boys. It is not as common in a family of twenty
children to have all boys! This is why a larger sample yields results with greater validity.
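To put numbers on the effect of sample size, the product rule gives the chance of all boys for each family size. A short Python sketch, again assuming a ½ chance of a boy at each birth:

```python
from fractions import Fraction

def p_all_boys(children):
    """Probability that every child is a boy, assuming 1/2 per birth."""
    return Fraction(1, 2) ** children

for family_size in (2, 20):
    p = p_all_boys(family_size)
    print(family_size, "children:", p, "=", float(p))
```

For two children the result is 1/4; for twenty children it is 1/1048576, roughly one family in a million.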
The chi square test is used to determine if the deviations are within a range considered to be
normal, or if they are so different from what we expected that we must consider that something
other than chance is involved.
One method of determining whether variations from the expected are reasonable is the chi square
test (see table 1). The observed results are what is measured. The expected results are what was
predicted. These should be in the same units. For instance, if you expect ¼ of families with two
children to have all boys, and you poll 100 families with two children, you would expect ¼ x 100
or 25 families to have all boys. One way of double checking if the observed and expected values
are in the same units is to see if the sum of each of the columns is the same. The deviation is the
difference between the observed and expected values (O - E). Note that the sum of all the
deviations is always zero. This value isn't very useful, since the negative deviations cancel the
positive ones. Therefore, the squared deviation [(O - E)²] is used. This eliminates all negative
values. This is also not very useful, as it doesn't take into consideration sample size. (A deviation
of two is a LOT in a sample size of five, but not very much in a sample size of a thousand.)
Therefore, the squared deviation is divided by the expected value to obtain a measure of the
relative size of these variations from the predicted values [(O - E)²/E]. These are summed to
obtain the chi square (χ²) value:
χ² = Σ (O - E)²/E
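The column-by-column calculation can be sketched in Python as follows. The observed counts (32, 46 and 22 families out of 100) and the expected fractions (¼, ½, ¼) are the two-children figures quoted elsewhere in this handout; treating them as the contents of table 1 is an assumption, and the function name is illustrative.

```python
from fractions import Fraction

def chi_square_rows(observed, expected):
    """Return one row per class: (O, E, O - E, (O - E)^2, (O - E)^2 / E)."""
    rows = []
    for o, e in zip(observed, expected):
        deviation = o - e
        rows.append((o, e, deviation, deviation**2, deviation**2 / e))
    return rows

# Two boys, a boy and a girl, two girls (assumed to match table 1).
observed = [32, 46, 22]
expected_fractions = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

total = sum(observed)
expected = [f * total for f in expected_fractions]   # 25, 50, 25

rows = chi_square_rows(observed, expected)
sum_of_deviations = sum(row[2] for row in rows)
chi_square = sum(row[4] for row in rows)

print(sum(expected) == total)   # True: same units, same column totals
print(sum_of_deviations)        # 0
print(float(chi_square))        # 2.64
```

Exact fractions are used so that the deviations sum to exactly zero and no rounding error creeps into the χ² value.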
The chi square value is then found on a chart to determine a range of p (probability of variation
due to chance alone) values (table 2). The degrees of freedom are the minimum number of values
in a data set that must be known in order to determine the values of the remaining classes. In the
example of families with 2 children, if you were told to collect the data from 100 families, and
you knew that 32 families had two boys and 46 families had a boy and a girl (in any order), you
could determine that 22 families had two girls. In fact, if you know any two classes, you can
determine the third class. The value of the third class depends on the value of the first two, which
are independent variables. The degrees of freedom is a measure of the number of independent
variables, which in this example is two. For the most part, the degrees of freedom is one less than
the number of classes, though when we discuss population genetics, you will discover this
general rule does not apply.
A p value of 0.99 means that, if chance alone were at work, deviations at least as large as those
observed would be expected 99% of the time. A p value of 0.05 means that chance alone would
produce deviations this large only 5% of the time. Typically, a p value greater than 0.05 is
accepted as variation due to chance alone. Any time the p value is less than 0.05, it is assumed
that something other than chance is involved. Usually the chi square
value falls between two p values. Therefore the p value is listed as a range. In the example given
in table 1, the χ² value is 2.64 and there are 2 degrees of freedom. Thus, the p value lies between
0.20 and 0.50. It is written as:
0.20 < p < 0.50
This means that, if chance alone is at work, variation at least this large would be expected
20-50% of the time. Because p is greater than 0.05, the hypothesis (that in a family with two
children, ¼ will have two boys, ½ will have a boy and a girl, and ¼ will have two girls) is accepted.
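A printed chart gives only a range for p. For the special case of exactly 2 degrees of freedom, the chi square distribution has a simple closed form, p = e^(-χ²/2), so the range above can be checked directly. The Python sketch below relies on that shortcut and does not apply to other degrees of freedom, which need a chart or a statistics library.

```python
import math

def p_value_two_df(chi_square):
    """Upper-tail probability of the chi square distribution with exactly 2 df."""
    return math.exp(-chi_square / 2)

p = p_value_two_df(2.64)
print(round(p, 3))        # 0.267
print(0.20 < p < 0.50)    # True
print(p > 0.05)           # True: variation is accepted as due to chance
```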
Name: _____________________________
Using H for heads and T for tails, list all different possible results of flipping three coins (the
order of the coins matters):
___________________________________________________________________________
Now, how many ways are there of getting three heads? ___ two heads and a tail? ___ two tails
and a head? ___ three tails? ___ How many different possibilities are there? ___
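trial     1    2    3    4    5    6    7    8    9    10
coin 1    __   __   __   __   __   __   __   __   __   __
coin 2    __   __   __   __   __   __   __   __   __   __
coin 3    __   __   __   __   __   __   __   __   __   __

            your data
3 H         ____
2 H, 1 T    ____
1 H, 2 T    ____
3 T         ____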
In one school system, there are 472 families that have five children. Do a chi square analysis to
determine if the variation in the distribution is due to chance alone.
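category          observed (O)   expected (E)   (O - E)   (O - E)²   (O - E)²/E
5 girls           19             ____           ____      ____       ____
4 girls, 1 boy    81             ____           ____      ____       ____
3 girls, 2 boys   162            ____           ____      ____       ____
2 girls, 3 boys   135            ____           ____      ____       ____
1 girl, 4 boys    69             ____           ____      ____       ____
5 boys            6              ____           ____      ____       ____
total             472            ____           -----                χ² = ____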
One of the schools in the district is a girls' boarding school. How has the data from this school
affected the results?
1. What is the probability of picking a red 2 or a black 3 from a deck of cards? Show your work.
2. What is the probability of picking a six and an eight from the deck of cards? Show your work.
3. You have three dice: one red (R), one green (G) and one blue (B). When all three dice are
rolled at the same time, calculate the probability of the following outcomes:
If you blindly select one marble from each jar, calculate the probability of obtaining:
a. a red, a blue and a green
b. three whites
c. a red, a green and a white
d. a red and two whites
e. at least one white
5. If a man and a woman are heterozygous for a gene, and if they have three children, what is the
chance that all three will also be heterozygous?
6. If four babies are born on a given day: (a) What is the chance that two will be boys and two
girls? (b) What is the chance that all four will be girls? (c) What combination of boys and girls
among four babies is most likely? (d) What is the chance that at least one baby will be a girl?