Math 101 Statistics

The document provides an introduction to statistics, defining it both as a set of numerical data and as a branch of science focused on data collection, analysis, and interpretation. It discusses the general uses of statistics in various fields such as business, medicine, and social sciences, and differentiates between descriptive and inferential statistics. Additionally, it covers the concepts of population and sample, methods of data collection, and various sampling techniques.


CHAPTER 1

Introduction

Definition of Statistics

• In its plural sense, statistics is a set of numerical data (e.g., vital statistics in a
beauty contest, monthly sales of a company, daily peso-dollar exchange rate).

• In its singular sense, Statistics is that branch of science which deals with the
collection, presentation, analysis, and interpretation of data.

1.1 NATURE OF STATISTICS

General Uses of Statistics

a. Statistics aids in decision making

• provides comparison
• explains action that has taken place
• justifies a claim or assertion
• predicts future outcome
• estimates unknown quantities

b. Statistics summarizes data for public use

Examples on the Role of Statistics

• In the biological and medical sciences, it can help researchers discover
relationships worthy of further attention.

Example: A doctor can use Statistics to determine to what extent an increase
in blood pressure depends on age.

• In the social sciences, it can guide and help researchers support theories and
models that cannot stand on rationale alone.

Example: Empirical studies use Statistics to obtain a socio-economic profile of
the middle class in order to form new socio-political theories on classes,
as the existing theories apparently are no longer valid.

• In business, a company can use statistics to forecast sales, design products,
and produce goods more efficiently.

Example: A pharmaceutical company can apply statistical procedures to find out
if a new formula is indeed more effective than the one being used.
Results can help the company decide whether to market the new formula
or not.

• In engineering, it can be used to test properties of various materials.

Example: A quality controller can use Statistics to estimate the average
lifetime of the products produced by their current equipment.

Fields of Statistics

a. Statistical Methods of Applied Statistics - refer to procedures and techniques used
in the collection, presentation, analysis, and interpretation of data.

• Descriptive Statistics - methods concerned with the collection, description,
and analysis of a set of data without drawing
conclusions or inferences about a larger set
- the main concern is simply to describe the set of data
such that otherwise obscure information is brought
out clearly
- conclusions apply only to the data on hand

• Inferential Statistics - methods concerned with making predictions or
inferences about a larger set of data using only the
information gathered from a subset of this larger set
- the main concern is not merely to describe but
actually to predict and make inferences based on the
information gathered
- conclusions are applicable to a larger set of data
of which the data on hand is only a subset

b. Statistical Theory of Mathematical Statistics - deals with the development and
exposition of theories that serve as bases of statistical methods.
ELEMENTARY STATISTICS 3

Descriptive Statistics vs. Inferential Statistics

Descriptive

• A bowler wants to find his bowling average for the past 12 games.
• A housewife wants to determine the average weekly amount she spent on
groceries in the past 3 months.
• A politician wants to know the exact number of votes he received in the
last election.

Inferential

• A bowler wants to estimate his chance of winning a game based on his current
season averages and the averages of his opponents.
• A housewife would like to predict, based on last year’s grocery bills, the
average weekly amount she will spend on groceries for this year.
• A politician would like to estimate, based on an opinion poll, his chance
of winning in the upcoming election.

1.2 POPULATION AND SAMPLE

Definition. A population is a collection of all the elements under consideration in a
statistical study.

A sample is a part or subset of the population from which the information is
collected.

Example: A manufacturer of kerosene heaters wants to determine if customers are
satisfied with the performance of their heaters. Toward this goal, 5,000 of
his 200,000 customers are contacted and each is asked, “Are you satisfied
with the performance of the kerosene heater you purchased?” Identify the
population and the sample for this situation.

Definition. A parameter is a numerical characteristic of the population.

A statistic is a numerical characteristic of the sample.

Example: In order to estimate the true proportion of students at a certain college who
smoke cigarettes, the administration polled a sample of 200 students and
determined that the proportion of students from the sample who smoke
cigarettes is 0.12. Identify the parameter and the statistic.
CHAPTER 2

Collection and Presentation of Data

2.1 PRELIMINARIES

Steps in a Statistical Inquiry

1. Define the problem.


2. Formulate the research design.
3. Collect the data.
4. Code and analyze the collected data.
5. Interpret the results.

Variables and Measurement

Definition. A variable is a characteristic or attribute of persons or objects which can
assume different values or labels for different persons or objects under
consideration.

Definition. Measurement is the process of determining the value or label of a
particular variable for a particular experimental unit.

Definition. An experimental unit is the individual or object on which a variable is
measured.

Classification of Variables

1. Discrete vs Continuous

Discrete variable - a variable which can assume a finite or, at most, a
countably infinite number of values; usually measured
by counting or enumeration
Continuous variable - a variable which can assume infinitely many values
corresponding to a line interval

2. Qualitative vs Quantitative

Qualitative variable - a variable that yields categorical responses (e.g.,
political affiliation, occupation, marital status)
Quantitative variable - a variable that takes on numerical values representing
an amount or quantity (e.g., weight, height, no. of cars)

Levels of Measurement

1. Nominal Level (or Classificatory Scale)

The nominal level is the weakest level of measurement where numbers or
symbols are used simply for categorizing subjects into different groups.

Examples:

Sex M-Male F-Female
Marital status 1-Single 2-Married 3-Widowed 4-Separated

2. Ordinal Level (or Ranking Scale)

The ordinal level of measurement contains the properties of the nominal level,
and in addition, the numbers assigned to categories of any variable may be
ranked or ordered in some low-to-high manner.

Examples:

Teaching ratings 1-poor 2-fair 3-good 4-excellent
Year level 1-1st yr 2-2nd yr 3-3rd yr 4-4th yr

3. Interval Level

The interval level is that which has the properties of the nominal and ordinal
levels, and in addition, the distances between any two numbers on the scale
are of known sizes. An interval scale must have a common and constant unit
of measurement. Furthermore, the unit of measurement is arbitrary and there
is no “true zero” point.

Examples:

IQ
Temperature (in Celsius)

4. Ratio Level

The ratio level of measurement contains all the properties of the interval level,
and in addition, it has a “true zero” point.
Examples:

Age (in years)


Number of correct answers in an exam

Classification of Data

1. Primary vs. Secondary

a. Primary source - data measured by the researcher/agency that published it

b. Secondary source - any republication of data by another agency

Example: The publications of the National Statistics Office are primary sources and
all subsequent publications of other agencies are secondary sources.

2. External vs. Internal

a. Internal data - information that relates to the operations and functions of the
organization collecting the data

b. External data - information that relates to some activity outside the
organization collecting the data

Example: The sales data of SM is internal data for SM but external data for any other
organization such as Robinson’s.

2.2 DATA COLLECTION METHODS

Data Collection Methods

1. Survey method - questions are asked to obtain information, either
through self-administered questionnaire or personal
interview

Self-administered questionnaire

• Obtained information is limited to subjects’ written answers to pre-arranged
questions
• Lower response rate
• It can be administered to a large number of people simultaneously
• Respondents may feel freer to express views and are less pressured to answer
immediately
• It is more appropriate for obtaining objective information

Personal interview

• Missing information and vague responses are minimized with the proper
probing of the interviewer
• Higher response rate through call-backs
• It is administered to a person or group one at a time
• Respondent may feel more cautious, particularly in answering sensitive
questions, for fear of disapproval
• It is more appropriate for obtaining information about complex
emotionally-laden topics or probing sentiments underlying an expressed opinion

2. Observation method - makes possible the recording of behavior but only
at the time of occurrence (e.g., observing reactions
to a particular stimulus, traffic count)

Advantages over Survey Method:

• does not rely on the respondent’s willingness to provide information
• certain types of data can be collected only by observation (e.g.,
behavior patterns of which the subject is not aware or is ashamed
to admit)
• the potential bias caused by the interviewing process is reduced or
eliminated

Disadvantages compared with Survey Method:

• things such as awareness, beliefs, feelings and preferences cannot be
observed
• the observed behavior patterns can be rare or too unpredictable, thus
increasing the data collection costs and time requirements

3. Experimental method - a method designed for collecting data under
controlled conditions. An experiment is an
operation where there is actual human interference
with the conditions that can affect the variable
under study. This is an excellent method of
collecting data for causation studies. If properly
designed and executed, experiments will reveal,
with a good deal of accuracy, the effect of a change
in one variable on another variable.

4. Use of existing studies - e.g., census, health statistics, and weather bureau
reports

Two types:

• documentary sources - published or written reports, periodicals,
unpublished documents, etc.

• field sources – researchers who have done studies on the area of
interest are asked personally or directly for information needed

5. Registration method - e.g., car registration, student registration, and
hospital admission

General Classification of Collecting Data

Definition. Census or complete enumeration is the process of gathering information
from every unit in the population.

• not always possible to get timely, accurate and economical data
• costly, especially if the number of units in the population is too large

Definition. Survey sampling is the process of obtaining information from the units in
the selected sample.

Advantages of Survey Sampling:

• reduced cost
• greater speed
• greater scope
• greater accuracy

2.3 PROBABILITY AND NON-PROBABILITY SAMPLING

Definition. A sampling procedure that gives every element of the population a (known)
nonzero chance of being selected in the sample is called probability
sampling. Otherwise, the sampling procedure is called non-probability
sampling.

• Whenever possible, probability sampling is used because there is no
objective way of assessing the reliability of inferences under
non-probability sampling.

Definition. The target population is the population from which information is desired.

Definition. The sampled population is the collection of elements from which the sample
is actually taken.

Definition. The population frame is a listing of all the individual units in the
population.

Methods of Non-probability Sampling

1. purposive sampling - sets out to make a sample agree with the profile of
the population based on some pre-selected
characteristic

2. quota sampling - selects a specified number (quota) of sampling
units possessing certain characteristics

3. convenience sampling - selects sampling units that come to hand or are
convenient to get information from

4. judgment sampling - selects sample in accordance with an expert’s
judgment

Methods of Probability Sampling

1. Simple random sampling

2. Stratified random sampling

3. Systematic sampling

4. Cluster sampling

5. Multistage sampling

6. Sequential sampling - units are drawn one by one in a sequence without
prior fixing of the total number of observations and
the results of the drawing at any stage are used to
decide whether to terminate sampling or not

• Simple Random Sampling

Description of the Design

Simple random sampling (SRS) is a method of selecting n units out of the N units in
the population in such a way that every distinct sample of size n has an equal chance
of being drawn. The process of selecting the sample must give an equal chance of
selection to any one of the remaining elements in the population at any one of the n
draws.

Random sampling may be with replacement (SRSWR) or without replacement
(SRSWOR). In SRSWR, a chosen element is always replaced before the next
selection is made, so that an element may be chosen more than once.

Sample Selection Procedure

Step 1: Make a list of the sampling units and number them from 1 to N.

Step 2: Select n numbers from 1 to N using some random process, for example, the
table of random numbers. The n numbers must be distinct for SRSWOR but need
not be distinct for SRSWR.

Step 3: The sample consists of the units corresponding to the selected random
numbers.
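
The selection procedure above can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: it assumes the standard `random` module stands in for the table of random numbers, and the function name `simple_random_sample`, the frame size N = 500, and the sample size n = 10 are all made up for the example.

```python
import random

def simple_random_sample(N, n, with_replacement=False, rng=random):
    """Select n unit numbers from a frame numbered 1 to N (hypothetical helper)."""
    frame = range(1, N + 1)                # Step 1: units numbered 1..N
    if with_replacement:
        # SRSWR: each draw is made from all N units, so a unit may repeat
        return rng.choices(frame, k=n)
    # SRSWOR: the n selected numbers are distinct
    return rng.sample(frame, n)

rng = random.Random(2024)                  # fixed seed only so the run is repeatable
srswor = simple_random_sample(N=500, n=10, rng=rng)
srswr = simple_random_sample(N=500, n=10, with_replacement=True, rng=rng)
print("SRSWOR:", sorted(srswor))           # Step 3: the sample is the listed units
print("SRSWR: ", sorted(srswr))
```

The only difference between the two designs in this sketch is whether a selected number is returned to the frame before the next draw.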

Advantages

• The theory involved is much easier to understand than the theory behind other
sampling designs.

• Inferential methods are simple and easy.

Disadvantages

• The sample chosen may be widely spread, thus entailing high transportation costs.

• A population frame is needed.

• SRS results in less precise estimates if the population is heterogeneous with
respect to the characteristic under study.

• Stratified Random Sampling

Description of the Design

In stratified random sampling, the population of N units is first divided into
subpopulations called strata. Then a simple random sample is drawn from each
stratum, the selection being made independently in different strata.

Sample Selection Procedure

Step 1: Divide the population into strata. Ideally, each stratum must consist of more
or less homogeneous units.

Step 2: After the population has been stratified, a simple random sample is selected
from each stratum.
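
As a rough sketch of these two steps, the Python below draws an independent simple random sample (without replacement) from each stratum. The population, the stratum names, and the per-stratum sample sizes are hypothetical; the notes do not say how the total sample should be allocated among strata, so the sizes here are simply assumed.

```python
import random

def stratified_sample(strata, sizes, rng=random):
    """Draw an independent SRSWOR from every stratum.

    strata: dict mapping stratum name -> list of units in that stratum
    sizes:  dict mapping stratum name -> sample size for that stratum
    """
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Step 1: hypothetical population of 1,000 student numbers, stratified by year level
strata = {
    "1st yr": list(range(1, 401)),
    "2nd yr": list(range(401, 701)),
    "3rd yr": list(range(701, 901)),
    "4th yr": list(range(901, 1001)),
}
# Assumed sample sizes (not prescribed by the notes)
sizes = {"1st yr": 20, "2nd yr": 15, "3rd yr": 10, "4th yr": 5}

# Step 2: simple random sample within each stratum
sample = stratified_sample(strata, sizes, rng=random.Random(7))
for name, units in sample.items():
    print(name, sorted(units))
```

Because each stratum is sampled separately, results can be reported per stratum as well as for the whole population.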

Advantages

• Stratification may produce a gain in precision in the estimates of characteristics of
the population.

• It allows for more comprehensive data analysis since information is provided for
each stratum.

• It is administratively convenient.

Disadvantages

• A listing of the population for each stratum is needed.

• The stratification of the population may require additional prior information about
the population and its strata.

• (1-in-k) Systematic Sampling

Description of the Design

Systematic sampling with a “random start” is a method of selecting a sample by
taking every kth unit from an ordered population, the first unit being selected at
random. Here k is called the sampling interval; the reciprocal 1/k is the sampling
fraction.

Sample Selection Procedure

Method A

Step 1: Number the units of the population consecutively from 1 to N.
Step 2: Determine k, the sampling interval, using the formula k = N/n.
Step 3: Select the random start r, where 1 ≤ r ≤ k. The unit corresponding to r is the
first unit of the sample.
Step 4: The other units of the sample correspond to r + k, r + 2k, r + 3k, and so on.

Method B

Step 1: Number the units of the population consecutively from 1 to N.
Step 2: Let k be the nearest integer less than N/n.
Step 3: Select the random start r, where 1 ≤ r ≤ N. The unit corresponding to r is the
first unit of the sample.
Step 4: Consider the list of units of the population as a circular list, i.e., the last unit
in the list is followed by the first. The other units in the sample are the units
corresponding to r + k, r + 2k, r + 3k,..., r+ (n-1)k.
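
Both methods can be sketched in Python as follows. This is an illustration under stated assumptions: Method A assumes N is a multiple of n so that k = N/n is an integer, the random start is taken between 1 and k inclusive, and "the nearest integer less than N/n" in Method B is interpreted here as the integer part of N/n. The function names and the values of N and n are hypothetical.

```python
import random

def systematic_a(N, n, rng=random):
    """Method A: linear list, random start between 1 and k inclusive."""
    k = N // n                         # sampling interval (assumes N is a multiple of n)
    r = rng.randint(1, k)              # random start
    return list(range(r, N + 1, k))    # r, r + k, r + 2k, ...

def systematic_b(N, n, rng=random):
    """Method B: circular list, random start anywhere from 1 to N."""
    k = N // n                         # integer part of N/n (an interpretation)
    r = rng.randint(1, N)              # random start
    # Wrap around the end of the list: unit N is followed by unit 1
    return [(r - 1 + j * k) % N + 1 for j in range(n)]

rng = random.Random(11)                # fixed seed only so the run is repeatable
sample_a = systematic_a(N=100, n=10, rng=rng)
sample_b = systematic_b(N=103, n=10, rng=rng)
print("Method A:", sample_a)
print("Method B:", sample_b)
```

Method B always yields exactly n units even when N is not a multiple of n, which is the usual reason for treating the list as circular.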

Advantages

• It is easier to draw the sample and often easier to execute without mistakes than
simple random sampling.
• It is possible to select a sample in the field without a sampling frame.
• The systematic sample is spread evenly over the population.

Disadvantages

• If periodic regularities are found in the list, a systematic sample may consist only
of similar types. (Example: Store sales over seven days of the week – estimating
total sales based on a systematic sample every Tuesday would be unwise.)
• Knowledge of the structure of the population is necessary for its most effective
use.

• Cluster Sampling

Description of the Design

Cluster sampling is a method of sampling where a sample of distinct groups, or
clusters, of elements is selected and then a census of every element in the selected
clusters is taken. Similar to strata in stratified sampling, clusters are non-overlapping
sub-populations which together comprise the entire population. For example, a
household is a cluster of individuals living together or a city block might also be
considered as a cluster. Unlike strata, however, clusters are preferably formed with
heterogeneous, rather than homogeneous elements so that each cluster will be typical
of the population.

Clusters may be of equal or unequal size. When all of the clusters are of the same
size, the number of elements in a cluster will be denoted by M while the number of
clusters in the population will be denoted by N.

Sample-Selection Procedure

Step 1: Number the clusters from 1 to N.

Step 2: Select n numbers from 1 to N at random. The clusters corresponding to the
selected numbers form the sample of clusters.

Step 3: Observe all the elements in the sample of clusters.
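
A minimal Python sketch of the three steps is given below. The population of 20 city blocks with 5 households each is invented for illustration, and the clusters are assumed to be selected by simple random sampling without replacement.

```python
import random

def cluster_sample(clusters, n, rng=random):
    """Select n clusters at random, then take every element in them.

    clusters: dict mapping cluster number (1..N) -> list of elements
    """
    selected = rng.sample(sorted(clusters), n)                  # Step 2
    elements = [e for c in selected for e in clusters[c]]       # Step 3: census
    return selected, elements

# Step 1: hypothetical clusters numbered 1 to N = 20, each with M = 5 households
clusters = {i: [f"block{i}-house{j}" for j in range(1, 6)] for i in range(1, 21)}

selected, elements = cluster_sample(clusters, n=4, rng=random.Random(3))
print("Selected clusters:", sorted(selected))
print("Number of elements observed:", len(elements))
```

Only the list of clusters is needed to draw the sample; the elements are listed only inside the clusters that were selected.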

Advantages

• A population list of elements is not needed; only a population list of clusters is
required. Thus, listing cost is reduced.

• Transportation cost is also reduced.

Disadvantages

• The costs and problems of statistical analysis are greater.

• Estimation procedures are more difficult.



• Multistage Sampling

Description of the Design

In multistage sampling, the population is divided into a hierarchy of sampling units
corresponding to the different sampling stages. In the first stage of sampling, the
population is divided into primary stage units (PSU) then a sample of PSUs is drawn.
In the second stage of sampling, each selected PSU is subdivided into second-stage
units (SSU) then a sample of SSUs is drawn. The process of subsampling can be
carried to a third stage, fourth stage and so on, by sampling the subunits instead of
enumerating them completely at each stage.
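
A two-stage version of this design might be sketched as follows. The notes do not specify how units are drawn at each stage, so simple random sampling without replacement is assumed at both stages; the villages (PSUs), households (SSUs), and sample sizes are hypothetical.

```python
import random

def two_stage_sample(psus, n_psu, n_ssu, rng=random):
    """Stage 1: sample PSUs. Stage 2: sample SSUs within each selected PSU.

    psus: dict mapping PSU name -> list of its second-stage units
    """
    chosen = rng.sample(sorted(psus), n_psu)                    # first stage
    # Second stage: subsample instead of enumerating each chosen PSU completely
    return {p: rng.sample(psus[p], min(n_ssu, len(psus[p]))) for p in chosen}

# Hypothetical frame: 12 villages (PSUs) with unequal numbers of households (SSUs)
psus = {
    f"village{i}": [f"v{i}-household{j}" for j in range(1, 10 + 5 * i)]
    for i in range(1, 13)
}

sample = two_stage_sample(psus, n_psu=3, n_ssu=8, rng=random.Random(5))
for psu, households in sample.items():
    print(psu, len(households))
```

Compared with cluster sampling, the selected first-stage units are subsampled rather than observed in full, which is what keeps listing and travel costs down.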

Advantages

• Listing cost is reduced.

• Transportation cost is also reduced.

Disadvantages

• Estimation procedure is difficult, especially when the primary stage units are not
of the same size.

• Estimation procedure gets more complicated as the number of sampling stages
increases.

• The sampling procedure entails much planning before selection is done.



2.4 TABULAR AND GRAPHICAL PRESENTATION OF DATA

Textual Presentation

• data incorporated into a paragraph of text

Example

At last count, 38 airlines were operating Boeing 707’s, 720’s, and 727’s over the
world’s airlines. The far-flung Boeing fleet has now logged an estimated
1,803,704,000 miles (22,855,948,000 kms.) and has massed approximately
4,096,000 revenue flight hours. Passenger totals stand at upwards of 71.6 million.

Advantages

• This presentation gives emphasis to significant figures and comparisons.
• It is the simplest and most appropriate approach when there are only a few numbers
to be presented.

Disadvantages

• When a large mass of quantitative data are included in a text or paragraph, the
presentation becomes almost incomprehensible
• Paragraphs can be tiresome to read especially if the same words are repeated so
many times

Tabular Presentation

• the systematic organization of data in rows and columns

Advantages

• more concise than textual presentation
• easier to understand
• facilitates comparisons and analysis of relationship among different categories
• presents data in greater detail than a graph

Parts of a Formal Statistical Table

1. Heading - consists of a table number, title, and headnote. The title is a
brief statement of the nature, classification and time reference
of the information presented and the area to which the statistics
refer. The headnote is a statement enclosed in brackets
between the table title and the top rule of the table that provides
additional title information.

2. Box Head - the portion of the table that contains the column heads which
describe the data in each column, together with the needed
classifying and qualifying spanner heads.

3. Stub - the portion of the table usually comprising the first column on
the left, in which the stubhead and row captions, together with
the needed classifying and qualifying center head and subheads
are located. The stubhead describes the stub listing as a whole
in terms of the classification presented. The row caption is a
descriptive title of the data on the given line.

4. Field - main part of the table; contains the substance or the figures of
one’s data

5. Source note - an exact citation of the source of data presented in the table
(should always be placed when the figures are not original)

6. Footnote - any statement or note inserted at the bottom of the table

Table 4.4 – CRIME VOLUME AND RATE BY TYPE: 1991 – 1993
(Rate per 100,000 population)

                          1991                 1992                 1993
Type                Volume  Crime Rate   Volume  Crime Rate   Volume  Crime Rate
Total              121,326         195  104,719         164   96,686         148
Index Crimes        77,261         124   67,354         106   58,684          90
  Murder             8,707          14    8,293          13    7,758          12
  Homicide           8,069          13    7,912          12    7,123          11
  Physical Injury   21,862          35   20,462          32   18,722          29
  Robbery           13,817          22   11,164          18    9,856          15
  Theft             22,780          37   17,374          27   12,940          20
  Rape               2,026           3    2,149           3    2,285           4
Nonindex Crimes     44,065          71   37,365          59   38,002          58

Source: Philippine National Police

Parts labeled in the original figure: heading (table number, title, and headnote),
boxhead (the year and Volume/Crime Rate column heads), stub (the Type column), and
field (the figures).



Guidelines

• The title should be concise and written in telegraphic style, not as a complete sentence.

• Column labels should be precise. Stress differences rather than similarities between
adjacent columns. As much as possible, two or more adjacent columns should not
begin or end with the same phrase. This is frequently a signal that a spanner head is
needed.

• The arrangement of lines in the stub depends on the nature of classification, purpose
of presentation or limitations of space.

• Categories should not overlap.

• The units of measure must be clearly stated.

• Show any relevant total, subtotals, percentages, etc.

• Indicate if the data were taken from another publication by including a source note.

• Tables should be self-explanatory, although they may be accompanied by a paragraph
that will provide an interpretation or direct attention to important figures.

Graphical Presentation

• a graph or chart is a device for showing numerical values or relationships in
pictorial form

Advantages

• main features and implications of a body of data can be grasped at a glance
• can attract attention and hold the reader’s interest
• simplifies concepts that would otherwise have been expressed in so many
words
• can readily clarify data and frequently brings out hidden facts and relationships

Qualities of a Good Graph

1. Accuracy - A good chart should not be deceptive, distorted, misleading, or
in any way susceptible to wrong interpretations as a result of
inaccurate or careless construction. Also, care should be taken
so as not to create any optical illusion.

2. Clarity - An effective chart can be easily read and understood. The
graph should focus on the message it is trying to
communicate. There should be an unambiguous
representation of the facts. The graph must be able to aid the
reader in the interpretation of facts.

3. Simplicity - The basic design of a statistical chart should be simple and
straightforward, not loaded with irrelevant, superfluous, or
trivial symbols and ornamentation. There should be no
distracting elements in a chart that inhibit effective visual
communication.

4. Appearance - A good chart is one that is designed and constructed to attract
and hold attention by having a neat, dignified, and
professional appearance. It must be artistic in that it embodies
harmonious composition, proportion, and balance.

Common Types of Graph

1. Line Chart - graphical presentation of data especially useful for showing
trends over a period of time.

[Figure: line chart titled “Market Shares of Leading Softdrinks in Metro Manila:
1989-1995”; horizontal axis: Year (1989 to 1995); vertical axis: % Shares
(0 to 50); one line each for Coca-cola and Pepsi]

2. Pie Chart - a circular graph that is useful in showing how a total quantity is
distributed among a group of categories. The “pieces of the pie” represent the
proportions of the total that fall into each category.

[Figure: pie chart titled “Market Shares of Softdrinks in Metro Manila”;
Coca-Cola 40%, Pepsi 30%, Others 12%, 7-up 8%, Sarsi 5%, Sprite 5%]

3. Bar Chart - consists of a series of rectangular bars where the length of the bar
represents the quantity or frequency for each category if the bars are arranged
horizontally. If the bars are arranged vertically, the height of the bar
represents the quantity.

[Figure: horizontal bar chart titled “Market Shares of Softdrinks in Metro Manila”;
bars for Others, Sprite, Sarsi, 7-up, Pepsi, and Coca-Cola; horizontal axis:
Market Shares (in %), 0 to 40]

4. Pictorial unit chart – a pictorial chart in which each symbol represents a
definite and uniform value

[Figure: pictorial unit chart titled “Growth Pattern of Philippine Population:
1960 - 2000”]



2.5 THE FREQUENCY DISTRIBUTION TABLE

Definition. The raw data is the set of data in its original form.

Example: Final grades of Stat 101 Students

82 82 83 79 72 71 84 59 77 50 87
83 82 63 75 50 85 76 79 68 69 62
79 69 74 53 73 71 50 76 57 81 62
72 88 84 80 68 50 74 84 71 73 68
71 80 72 60 81 89 94 80 84 81 50
84 76 75 82 76 53 91 69 60 89 79
59 62 79 82 72 81 60 84 68 66 94
77 78 87 75 86 82 74 73 72 84 51
50 69 75 70 77 87 86 77 75 96 66
87 73 84 68 85 62 87 92 69 52 65

Definition. An array is an arrangement of observations according to their magnitude,
either in increasing or decreasing order.

Example: Final grades of Stat 101 Students arranged in an array

50 57 63 69 72 74 77 80 82 84 87
50 59 65 69 72 75 77 80 82 84 87
50 59 66 69 72 75 77 80 82 85 88
50 60 66 69 72 75 77 81 83 85 89
50 60 68 70 73 75 78 81 83 86 89
50 60 68 71 73 75 79 81 84 86 91
51 62 68 71 73 76 79 81 84 87 92
52 62 68 71 73 76 79 82 84 87 94
53 62 68 71 74 76 79 82 84 87 94
53 62 69 72 74 76 79 82 84 87 96

Advantages:

• easier to detect the smallest and largest value
• easier to find the measures of position

In the construction of a frequency distribution, the various items of a series are classified
into groups. The frequency distribution table shows the number of items falling into
each group.

Definition of terms

1. Class frequency - the number of observations falling in the class
2. Class interval - the numbers defining the class
3. Class limits - the end numbers of the class
4. Class boundaries - the true class limits; the lower class boundary (LCB) is
usually defined as halfway between the lower class limit of the
class and the upper class limit of the preceding class while the
upper class boundary (UCB) is usually defined as halfway
between the upper class limit of the class and the lower class
limit of the next class
5. Class size - the difference between the upper class boundaries of the class
and the preceding class; can also be computed as the difference
between the lower class boundaries of the current class and the
next class; can also be computed by using the respective class
limits instead of the class boundaries
6. Class mark (CM) - midpoint of a class interval
7. Open-end class - a class that has no lower limit or upper limit

Examples:

Class Freq. LCB UCB CM
50 – 55 10 49.5 55.5 52.5
56 – 61 6 55.5 61.5 58.5
62 – 67 8 61.5 67.5 64.5
68 – 73 24 67.5 73.5 70.5
74 – 79 22 73.5 79.5 76.5
80 – 85 24 79.5 85.5 82.5
86 – 91 12 85.5 91.5 88.5
92 – 97 4 91.5 97.5 94.5
OR
Class Freq. LCB UCB CM
50 – 54 10 49.5 54.5 52
55 – 59 3 54.5 59.5 57
60 – 64 8 59.5 64.5 62
65 – 69 13 64.5 69.5 67
70 – 74 17 69.5 74.5 72
75 – 79 19 74.5 79.5 77
80 – 84 22 79.5 84.5 82
85 – 89 13 84.5 89.5 87
90 – 94 4 89.5 94.5 92
95 – 99 1 94.5 99.5 97
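
The LCB, UCB, and CM columns in the tables above follow directly from the class limits. The short Python sketch below shows the arithmetic, assuming the data are recorded to the nearest whole number so that adjacent class limits are one unit apart; the function name `class_summary` and its `unit` parameter are illustrative only.

```python
def class_summary(limits, unit=1):
    """Return (LCB, UCB, CM) for each (lower limit, upper limit) pair.

    unit is the gap between the upper limit of one class and the lower
    limit of the next (1 for whole-number grades); this is an assumption.
    """
    rows = []
    for lower, upper in limits:
        lcb = lower - unit / 2        # halfway to the previous class's upper limit
        ucb = upper + unit / 2        # halfway to the next class's lower limit
        cm = (lower + upper) / 2      # class mark: midpoint of the class interval
        rows.append((lcb, ucb, cm))
    return rows

# Class limits of the first table: 50-55, 56-61, ..., 92-97 (class size 6)
limits = [(50 + 6 * i, 55 + 6 * i) for i in range(8)]
for (lower, upper), (lcb, ucb, cm) in zip(limits, class_summary(limits)):
    print(f"{lower} - {upper}: LCB = {lcb}, UCB = {ucb}, CM = {cm}")
```

The class size can then be read off as UCB minus LCB for any class, for example 55.5 - 49.5 = 6.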

Steps in Constructing a Frequency Distribution Table

1. Determine the number of classes. There must be an adequate number of classes to
show the essential characteristics of the data; at the same time, there should not be
so many classes that it becomes difficult to grasp the picture of the distribution
as a whole. There are no precise rules concerning the optimal number of classes
but Sturges’ formula can be used as a first approximation.

Sturges’ formula: K = 1 + 3.322 log n

where K = approximate number of classes
n = number of observations
(log is the common, base-10 logarithm)

2. Determine the approximate class size. Whenever possible, all classes should be
of the same size. The following steps can be used to determine the class size.

• Solve for the range, R = max – min.
• Compute C’ = R ÷ K.
• Round off C’ to a convenient number to work with, say C, and use C as the
class size.

3. Determine the lowest class limit. The first class must include the smallest value
in the data set.

4. Determine all class limits by adding the class size, C, to the limit of the previous
class.

5. Tally the frequencies for each class. Sum the frequencies and check against the
total number of observations.
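
The five steps can be traced in Python on the Stat 101 final grades. The sketch below is only one way to carry them out: the value counts are transcribed from the array of grades shown earlier, the class size is rounded to the nearest whole number, and the smallest observation is used as the lowest class limit. These choices are assumptions for illustration, not rules from the notes.

```python
import math

# Grades as value: count pairs, transcribed from the array of final grades
counts = {
    50: 6, 51: 1, 52: 1, 53: 2, 57: 1, 59: 2, 60: 3, 62: 4, 63: 1, 65: 1,
    66: 2, 68: 5, 69: 5, 70: 1, 71: 4, 72: 5, 73: 4, 74: 3, 75: 5, 76: 4,
    77: 4, 78: 1, 79: 5, 80: 3, 81: 4, 82: 6, 83: 2, 84: 7, 85: 2, 86: 2,
    87: 6, 88: 1, 89: 2, 91: 1, 92: 1, 94: 2, 96: 1,
}
grades = [g for g, c in counts.items() for _ in range(c)]

n = len(grades)
K = 1 + 3.322 * math.log10(n)      # Step 1: Sturges' formula (base-10 log)
R = max(grades) - min(grades)      # Step 2: range
C = round(R / K)                   # Step 2: class size, rounded for convenience
lowest = min(grades)               # Step 3: lowest class limit (assumed choice)

# Step 4: build the class limits by repeatedly adding the class size
classes = []
lower = lowest
while lower <= max(grades):
    classes.append((lower, lower + C - 1))
    lower += C

# Step 5: tally the frequencies and check the total
freqs = [sum(lo <= g <= hi for g in grades) for lo, hi in classes]
assert sum(freqs) == n

print(f"n = {n}, K = {K:.2f}, R = {R}, C = {C}")
for (lo, hi), f in zip(classes, freqs):
    print(f"{lo} - {hi}: {f}")
```

With these choices the sketch arrives at eight classes of size 6, starting at 50 - 55, which is the layout of the first example table above.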

Variations of the Frequency Distribution

1. Relative Frequency (RF) Distribution and Relative Frequency Percentage (RFP)

RF = class frequency ÷ no. of observations
RFP = RF × 100%

2. Cumulative Frequency Distribution (CFD)

- shows the accumulated frequencies of successive classes, beginning at either
end of the distribution

Greater than CFD (>CF) – shows the no. of observations greater than the LCB
Less than CFD (<CF) – shows the no. of observations less than the UCB

Example:

Class Freq. LCB UCB RF RFP <CF >CF

50 – 54 10 49.5 54.5 .09 9 10 110
55 – 59 3 54.5 59.5 .03 3 13 100
60 – 64 8 59.5 64.5 .07 7 21 97
65 – 69 13 64.5 69.5 .12 12 34 89
70 – 74 17 69.5 74.5 .15 15 51 76
75 – 79 19 74.5 79.5 .17 17 70 59
80 – 84 22 79.5 84.5 .20 20 92 40
85 – 89 13 84.5 89.5 .12 12 105 18
90 – 94 4 89.5 94.5 .04 4 109 5
95 – 99 1 94.5 99.5 .01 1 110 1
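
The RF, <CF and >CF columns of the table can be reproduced from the class frequencies alone. The following is a minimal sketch; the variable names are illustrative.

```python
from itertools import accumulate

freqs = [10, 3, 8, 13, 17, 19, 22, 13, 4, 1]  # class frequencies from the table
n = sum(freqs)                                 # 110 observations

rf = [round(f / n, 2) for f in freqs]          # relative frequencies
rfp = [round(100 * f / n) for f in freqs]      # relative frequency percentages
less_cf = list(accumulate(freqs))              # accumulate from the lowest class
# >CF counts the observations in the current class and all higher classes.
greater_cf = [n - c + f for c, f in zip(less_cf, freqs)]

print(rf)
print(less_cf)
print(greater_cf)
```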

Graphical Presentation of the Frequency Distribution Table

1. Frequency Histogram - a bar graph that displays the classes on the horizontal axis and
the frequencies of the classes on the vertical axis; the vertical lines of the bars are
erected at the class boundaries and the height of each bar corresponds to the class
frequency

[Figure: frequency histogram of the grades. Vertical axis: No. of Students (0 to 25); horizontal axis: Grades, marked at the class boundaries 49.5, 54.5, ..., 99.5]

2. Relative Frequency Histogram - a graph that displays the classes on the horizontal
axis and the relative frequencies on the vertical axis

Note: The relative frequency histogram has the same shape as the frequency
histogram but has a different vertical axis.

[Figure: relative frequency histogram of the grades. Vertical axis: Relative Freq. (0 to 0.25); horizontal axis: Grades, marked at the class boundaries 49.5, 54.5, ..., 99.5]

3. Frequency Polygon – a line chart that is constructed by plotting the frequencies at
the class marks and connecting the plotted points by means of straight lines; the
polygon is closed by considering an additional class at each end and the ends of the
lines are brought down to the horizontal axis at the midpoints of the additional
classes.

[Figure: frequency polygon of the grades. Vertical axis: No. of students (0 to 25); horizontal axis: Grades, marked at the class marks 47, 52, ..., 102]

4. Ogives - graphs of the cumulative frequency distribution

a. < ogive - the <CF is plotted against the UCB
b. > ogive - the >CF is plotted against the LCB

[Figure: < ogive and > ogive of the grades. Vertical axis: Cumulative Frequency (0 to 120); horizontal axis: Grades (44.5 to 94.5)]

2.6 THE STEM-AND-LEAF DISPLAY

The stem-and-leaf display is an alternative method for describing a set of data. It


presents a histogram-like picture of the data, while allowing the experimenter to
retain the actual observed values of each data point. Hence, the stem-and-leaf display
is partly tabular and partly graphical in nature.

In creating a stem-and-leaf display, we divide each observation into two parts, the
stem and the leaf. For example, we could divide the observation 244 as follows:

Stem Leaf
2 | 44

Alternatively, we could choose the point of division between the units and tens,
whereby

Stem Leaf
24 | 4

The choice of the stem and leaf coding depends on the nature of the data set.

Steps in Constructing the Stem-and-Leaf Display

1. List the stem values, in order, in a vertical column.

2. Draw a vertical line to the right of the stem value.

3. For each observation, record the leaf portion of that observation in the row
corresponding to the appropriate stem.

4. Reorder the leaves from lowest to highest within each stem row. Maintain
uniform spacing for the leaves so that the stem with the largest number of
observations has the longest line.

5. If the number of leaves appearing in each row is too large, divide the stem into
two groups, the first corresponding to leaves beginning with digits 0 through 4
and the second corresponding to leaves beginning with digits 5 through 9. This
subdivision can be increased to five groups if necessary.

6. Provide a key to your stem-and-leaf coding so that the reader can recreate the
actual measurements from your display.

Example: Typing speeds (net words per minute) for 20 secretarial applicants
68 72 91 47
52 75 63 55
65 35 84 45
58 61 69 22
46 55 66 71

Stem | Leaf (unit = 1)
2 | 2
3 | 5
4 | 5 6 7
5 | 2 5 5 8
6 | 1 3 5 6 8 9
7 | 1 2 5
8 | 4
9 | 1

Note: The stem-and-leaf display should include a reminder indicating the units
of the data value.

Example:
Unit = 0.1 1 | 2 represents 1.2
Unit = 1 1 | 2 represents 12
Unit = 10 1 | 2 represents 120
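
The typing-speed display can be rebuilt with a few lines of Python. This is a sketch under the assumption that the stem is the tens digit and the leaf is the units digit (unit = 1); the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits), sorted ascending."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    return dict(rows)

# Typing speeds (net words per minute) for the 20 secretarial applicants.
speeds = [68, 72, 91, 47, 52, 75, 63, 55, 65, 35,
          84, 45, 58, 61, 69, 22, 46, 55, 66, 71]

display = stem_and_leaf(speeds)
for stem in sorted(display):
    print(stem, "|", " ".join(str(leaf) for leaf in display[stem]))
```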
CHAPTER 3

Measures of Central Tendency
and
Measures of Location

Definition. A measure of central tendency is any single value that is used to identify
the “center” or the typical value of a data set. It is often referred to as the
average.

Characteristics of a Good Average

1. easily understood
- not a distant mathematical abstraction

2. objective and rigidly defined


- should encounter no question as to what the value is

3. stable
- not affected materially by minor variations in the groups of items

4. easily amenable to further statistical computation

3.1 NOTATIONS AND SYMBOLS

Suppose that a variable X is the variable of interest, and that n measurements are
taken. The notation X1, X2, . . . ,Xn will be used to represent the n observations.

Let the Greek letter Σ indicate the “summation of”; thus, we can write the sum of
n observations as

$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$.

The numbers 1 and n are called the lower and the upper limits of summation,
respectively.

Some Results on Summation

1. The summation of the sum of variables is the sum of their summations.

$\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$

$\sum_{i=1}^{n} (a_i + b_i + \cdots + z_i) = \sum_{i=1}^{n} a_i + \sum_{i=1}^{n} b_i + \cdots + \sum_{i=1}^{n} z_i$

2. If c is a constant, then

$\sum_{i=1}^{n} c X_i = c \sum_{i=1}^{n} X_i$

3. If c is a constant, then

$\sum_{i=1}^{n} c = nc$

Examples

Given:
i 1 2 3 4
Xi 2 4 6 8
Yi 1 2 1 2

Show:

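1. $\sum_{i=2}^{3} X_i = 10$

2. $\sum_{i=2}^{3} (X_i + Y_i) = 13$

3. $\sum_{i=1}^{4} X_i Y_i = 32$

4. $\sum_{i=1}^{4} X_i \sum_{i=1}^{4} Y_i = 120$

5. $\sum_{i=1}^{4} \left( \frac{X_i}{Y_i} \right) = 14$

6. $\frac{\sum_{i=1}^{4} X_i}{\sum_{i=1}^{4} Y_i} = \frac{20}{6} = 3\frac{1}{3}$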
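
The six results to be shown can be checked numerically. The sketch below uses exact fractions so that the ratio in item 6 is not rounded; note that Python lists are indexed from 0, so the slice `[1:3]` covers i = 2 and i = 3.

```python
from fractions import Fraction

X = [2, 4, 6, 8]
Y = [1, 2, 1, 2]

s1 = sum(X[1:3])                                  # sum of X_i for i = 2, 3
s2 = sum(x + y for x, y in zip(X[1:3], Y[1:3]))   # sum of (X_i + Y_i) for i = 2, 3
s3 = sum(x * y for x, y in zip(X, Y))             # sum of the products X_i Y_i
s4 = sum(X) * sum(Y)                              # product of the two sums
s5 = sum(Fraction(x, y) for x, y in zip(X, Y))    # sum of the ratios X_i / Y_i
s6 = Fraction(sum(X), sum(Y))                     # ratio of the two sums

print(s1, s2, s3, s4, s5, s6)
```

Items 3 and 4, and items 5 and 6, show that the sum of products (or ratios) is generally not the product (or ratio) of the sums.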

3.2 THE ARITHMETIC MEAN

- the most common average


- the sum of all values of the observations divided by the number of
observations
- simply referred to as the mean

The population mean for a finite population with N elements, denoted by the Greek
letter µ (mu), is computed as

$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$.

The sample mean $\bar{X}$ (read as “X bar”) of n observations is computed as

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$.

The sample mean (a statistic) is an estimate of the unknown population mean (a
parameter).

Examples:

1. The number of employees at 5 different drug stores are 10, 12, 6, 8, and 4.
Treating the data as a population, find the mean number of employees for the 5
stores.

2. Scores in the Statistics 101 first exam for a sample of 10 students are as follows:
60, 55, 30, 90, 88, 79, 45, 66, 93, and 80. Find the mean.

3. Refer to the example on the final grades of 110 Statistics 101 students. The
sample mean is given by $\bar{X} = \frac{\sum_{i=1}^{110} X_i}{110} = 74.1$.
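
Examples 1 and 2 can be worked out with the standard library. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import mean

# Example 1: number of employees at 5 drug stores, treated as a population.
employees = [10, 12, 6, 8, 4]
mu = mean(employees)          # 40 / 5

# Example 2: first-exam scores for a sample of 10 students.
scores = [60, 55, 30, 90, 88, 79, 45, 66, 93, 80]
x_bar = mean(scores)          # 686 / 10

print(mu, x_bar)
```

The computation is the same in both cases; only the interpretation differs (a parameter µ for the population, a statistic for the sample).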

Definition. The weighted mean is a modification of the usual mean that assigns
weights (or measures of relative importance) to the observations to be
averaged. If each observation Xi is assigned a weight Wi, i = 1, 2,…, n,
the weighted mean is given by

$\bar{X} = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$.

Examples:

1. Suppose a teacher assigns the following weights to the various course


requirements:

Assignment 15%
Project 25%
Midterm Exam 20%
Final Exam 40%

The maximum score a student may obtain for each component is 100. Jeffry
obtains marks of 83 for assignments, 72 for the project, 41 for the midterm exam,
and 47 for the final exam. Find his mean mark for the course.

2. Alex’s grades for the second semester AY 1996-1997 are as follows:

History 1.0
Humanities 1.0
Math 19 3.0
Math 53 3.0
Philosophy 1.0

Math 53 is a 5-unit course and all others are 3-unit courses. Find Alex’s GWA for
the semester.
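
Both examples apply the weighted mean formula directly. The following sketch assumes the percentage weights in Example 1 and the course units in Example 2 are the weights Wi; the helper name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of W_i * X_i divided by the sum of W_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example 1: Jeffry's marks, weighted by the course requirement percentages.
jeffry = weighted_mean([83, 72, 41, 47], [15, 25, 20, 40])

# Example 2: Alex's grades, weighted by units (Math 53 has 5 units, the rest 3).
alex_gwa = weighted_mean([1.0, 1.0, 3.0, 3.0, 1.0], [3, 3, 3, 5, 3])

print(round(jeffry, 2), round(alex_gwa, 2))
```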

Characteristics of the Mean

1. It is the most familiar measure used, and it employs all available information.

2. It is affected by the value of every observation. In particular it is strongly


influenced by extreme values.

3. Since the mean is a calculated number, it may not be an actual number in the data
set.

4. It possesses two mathematical properties that will prove to be important in


subsequent analyses.
i) The sum of the deviations of the values from the mean is zero.
ii) The sum of the squared deviations is minimum when the deviations are
taken from the mean.

5. a. If a constant c is added (subtracted) to all observations, the mean of the new


observations will increase (decrease) by the same amount c.
b. If all observations are multiplied or divided by a constant, the new
observations will have a mean that is the same constant multiple of the
original mean.

Example:

Given 5 temperature readings measured in Fahrenheit: 98, 100, 107, 90, 92. The
mean temperature is $\bar{X}_F = 97.4$.

The mean temperature in centigrade is $\bar{X}_C = \frac{5}{9}(97.4 - 32) = 36.3$.
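
The temperature example combines properties 5a and 5b, and property 4(i) can be checked on the same data. The sketch below converts every reading first and then averages, to confirm that the result matches converting the mean directly.

```python
temps_f = [98, 100, 107, 90, 92]
mean_f = sum(temps_f) / len(temps_f)

# Convert each reading to centigrade, then take the mean of the new observations.
temps_c = [5 / 9 * (t - 32) for t in temps_f]
mean_c = sum(temps_c) / len(temps_c)

# Property 4(i): the deviations from the mean sum to zero (up to rounding error).
dev_sum = sum(t - mean_f for t in temps_f)

print(round(mean_f, 1), round(mean_c, 1))
```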

Approximating the Mean from a Frequency Distribution

- possible only when the class mark can be assumed to be representative of all
the values in that class. If the assumption holds, the following equation may
be used to approximate the mean from a frequency distribution.

$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{n}$

where fi = the frequency of the ith class
Xi = the class mark of the ith class
k = total number of classes
n = total number of observations = $\sum_{i=1}^{k} f_i$

Example: Final grade of 110 Statistics 101 students

Class Freq. (fi) CM (Xi) fiXi
50 – 54 10 52 520
55 – 59 3 57 171
60 – 64 8 62 496
65 – 69 13 67 871
70 – 74 17 72 1224
75 – 79 19 77 1463
80 – 84 22 82 1804
85 – 89 13 87 1131
90 – 94 4 92 368
95 – 99 1 97 97
Total 110 8145

$\bar{X} = \frac{\sum_{i=1}^{10} f_i X_i}{\sum_{i=1}^{10} f_i} = \frac{8145}{110} = 74.0$

Remarks:

1. The formula for approximating the mean cannot be used if a frequency


distribution has open-ended intervals, unless there are reasonably accurate
estimates of the class marks for the open intervals.

2. The mean of a frequency distribution is simply a weighted mean of the class


marks, where the fi’s are the weights.
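
As Remark 2 notes, the grouped mean is a weighted mean of the class marks. A short sketch that reproduces the total of the fiXi column and the approximate mean:

```python
class_marks = [52, 57, 62, 67, 72, 77, 82, 87, 92, 97]
freqs = [10, 3, 8, 13, 17, 19, 22, 13, 4, 1]

n = sum(freqs)                                          # 110
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))  # total of the fiXi column
grouped_mean = sum_fx / n

print(sum_fx, round(grouped_mean, 1))
```

The result differs slightly from the ungrouped mean of 74.1 because every observation in a class is replaced by its class mark.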

3.3 THE MEDIAN

- the positional middle of the arrayed data


- in an array, one-half of the values precede the median and one-half follow it

The first step in calculating the median, denoted as Md, is to arrange the data in an
array.

Let X(i) be the ith observation in the array, i = 1, 2, . . . , n.

If n is odd, the median position equals (n+1)/2, and the value of the (n+1)/2 th
observation in the array is taken as the median, i.e.,

$Md = X_{((n+1)/2)}$

If n is even, the mean of the two middle values in the array is the median, i.e.,

$Md = \frac{X_{(n/2)} + X_{((n/2)+1)}}{2}$

Examples:

1. Given the following heights (in inches): 71, 72, 75, 75, and 67. Find the median
height.

2. Given the following scores: 1, 7, 3, 3, 6, 5, 4, 3, find the median of the
scores.

3. Refer to the example on the grades of 110 Statistics 101 students. The median is
given by $Md = \frac{X_{(55)} + X_{(56)}}{2} = \frac{75 + 75}{2} = 75$.
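
The odd and even cases translate directly into code. In this sketch the function name `median` is illustrative, and list positions are shifted by one because Python indexes from 0 while X(i) is counted from 1.

```python
def median(values):
    """Median of the arrayed data, following the odd/even rule above."""
    arr = sorted(values)                 # first step: arrange the data in an array
    n = len(arr)
    if n % 2 == 1:
        return arr[(n + 1) // 2 - 1]     # X((n+1)/2)
    return (arr[n // 2 - 1] + arr[n // 2]) / 2   # mean of X(n/2) and X((n/2)+1)

heights = [71, 72, 75, 75, 67]           # Example 1, n = 5 (odd)
scores = [1, 7, 3, 3, 6, 5, 4, 3]        # Example 2, n = 8 (even)

print(median(heights), median(scores))
```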

Characteristics of the Median:

1. The median is a positional measure.

2. The median is affected by the position of each item in the series but not by the
value of each item. This means that extreme values affect the median less than
the arithmetic mean.

Approximating the Median from the Frequency Distribution

- possible only when the values of the observations falling in the median class can
be assumed to be evenly spaced throughout the class. (The median class is the
class containing the median.)

Step 1. Construct the less than cumulative frequency distribution.
Step 2. Starting from the top, locate the class with less than cumulative frequency
greater than or equal to n/2 for the first time. This class is the median
class.
Step 3. Approximate the median using the following formula:

$Md = LCB_{md} + c \left( \frac{n/2 - {<}CF_{md-1}}{f_{md}} \right)$

where LCBmd = the lower class boundary of the median class
c = class size of the median class
n = the total number of observations in the distribution
<CFmd-1 = less than cumulative freq. of the class preceding the
median class
fmd = frequency of the median class

Example:

Refer to the example on the final grades of 110 Statistics 101 students.

Class Freq. <CF

50 – 54 10 10
55 – 59 3 13
60 – 64 8 21
65 – 69 13 34
70 – 74 17 51
75 – 79 19 70 (median class: < cum. freq. greater than n/2 = 55 for the first time)
80 – 84 22 92
85 – 89 13 105
90 – 94 4 109
95 – 99 1 110

$Md = 74.5 + 5 \left( \frac{(110/2) - 51}{19} \right) = 75.6$
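
The three steps for approximating the median can be sketched as a single function. The name `grouped_median` is illustrative, and the sketch assumes a non-empty distribution with equal class sizes, listed from the lowest class to the highest.

```python
def grouped_median(lcbs, freqs, class_size):
    """Approximate the median from lower class boundaries and class frequencies."""
    n = sum(freqs)
    cum = 0  # <CF of the class preceding the current one
    for lcb, f in zip(lcbs, freqs):
        if cum + f >= n / 2:  # first class whose <CF reaches n/2: the median class
            return lcb + class_size * (n / 2 - cum) / f
        cum += f
    raise ValueError("frequency distribution is empty")

lcbs = [49.5 + 5 * j for j in range(10)]       # 49.5, 54.5, ..., 94.5
freqs = [10, 3, 8, 13, 17, 19, 22, 13, 4, 1]

md = grouped_median(lcbs, freqs, class_size=5)
print(round(md, 1))
```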

3.4 THE MODE

- the observed value that occurs most frequently


- locates the point where the observation values occur with the greatest density
- generally a less popular measure than the mean or the median

The mode is determined by counting the frequency of each value and finding the
value with the highest frequency of occurrence.

Examples:

1. 2, 5, 2, 3, 5, 2, 1, 4, 2, 2, 2, 1, 2, 2, 2, 3, 2, 2, 2, 2

2. 2, 5, 5, 2, 2, 5, 1, 3, 5, 4, 2, 5, 5, 2, 2, 5, 5, 2, 2, 1

3. 1, 2, 3, 3, 2, 1, 2, 3, 1, 4, 4, 5, 5, 1, 2, 3, 4, 5, 4, 5

4. Refer to the example on the final grades of 110 Statistics 101 students. The mode
is Mo=84.
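
Examples 1 to 3 can be checked by counting. In the sketch below the function name `modes` is illustrative, and it adopts the convention (an assumption here) that a data set in which every distinct value occurs equally often has no mode.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency; [] if no mode exists."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # every value equally frequent: treated as having no mode
    return sorted(v for v, c in counts.items() if c == top)

ex1 = [2, 5, 2, 3, 5, 2, 1, 4, 2, 2, 2, 1, 2, 2, 2, 3, 2, 2, 2, 2]
ex2 = [2, 5, 5, 2, 2, 5, 1, 3, 5, 4, 2, 5, 5, 2, 2, 5, 5, 2, 2, 1]
ex3 = [1, 2, 3, 3, 2, 1, 2, 3, 1, 4, 4, 5, 5, 1, 2, 3, 4, 5, 4, 5]

print(modes(ex1), modes(ex2), modes(ex3))
```

The three data sets are, respectively, unimodal, bimodal, and without a mode.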

Characteristics of the Mode:

1. It does not always exist; and if it does, it may not be unique. A data set is said to
be unimodal if there is only one mode, bimodal if there are two modes, trimodal if
there are three modes, and so on.

2. It is not affected by extreme values.

3. The mode can be used for qualitative as well as quantitative data.



Approximating the Mode from the Frequency Distribution

Step 1: Locate the modal class. The modal class is the class with the highest
frequency.
Step 2: Approximate the mode using the following formula:

$Mo = LCB_{mo} + c \left( \frac{f_{mo} - f_1}{2f_{mo} - f_1 - f_2} \right)$

where LCBmo = lower class boundary of the modal class
c = class size of the modal class
fmo = frequency of the modal class
f1 = frequency of the class preceding the modal class
f2 = frequency of the class following the modal class

Example :

Refer to the example on the final grades of 110 Statistics 101 students.

Class Freq.

50 – 54 10
55 – 59 3
60 – 64 8
65 – 69 13
70 – 74 17
75 – 79 19
80 – 84 22 (modal class)
85 – 89 13
90 – 94 4
95 – 99 1

$Mo = 79.5 + 5 \left( \frac{22 - 19}{2(22) - 19 - 13} \right) = 80.8$
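
The two steps for approximating the mode can be sketched as follows. The function name `grouped_mode` is illustrative. When the modal class is the first or last class, the sketch treats the missing neighbouring frequency as 0, which is an assumption not stated in the text.

```python
def grouped_mode(lcbs, freqs, class_size):
    """Approximate the mode from lower class boundaries and class frequencies."""
    m = max(range(len(freqs)), key=lambda j: freqs[j])  # first class with highest freq.
    f_mo = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0                   # class preceding the modal class
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0      # class following the modal class
    return lcbs[m] + class_size * (f_mo - f1) / (2 * f_mo - f1 - f2)

lcbs = [49.5 + 5 * j for j in range(10)]
freqs = [10, 3, 8, 13, 17, 19, 22, 13, 4, 1]

mo = grouped_mode(lcbs, freqs, class_size=5)
print(mo)
```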

3.5 MEASURES OF LOCATION

Definition. Measures of location (or fractiles/quantiles) are values below which a


specified fraction or percentage of the observations in a given set must fall.

Definition. Percentiles are values that divide a set of observations in an array into 100
equal parts. Thus,

P1, read as first percentile, is the value below which 1% of the values fall.

P2, read as second percentile, is the value below which 2% of the values fall.

...

P99, read as ninety-ninth percentile, is the value below which 99% of the
values fall.

To compute for the ith percentile:

$P_i$ = the value of the $\left( \frac{i(n+1)}{100} \right)$th observation in the array

Approximating the ith Percentile from the Frequency Distribution

$P_i = LCB_{P_i} + c \left( \frac{(in/100) - {<}CF_{P_i - 1}}{f_{P_i}} \right)$

where LCBPi = the lower class boundary of the Pi th class
c = the class size of the Pi th class
n = the total number of observations in the distribution
<CFPi-1 = the less than cumulative frequency of the class preceding
the Pi th class
fPi = frequency of the Pi th class

The Pi th class is the class where the less than cumulative frequency is equal
to, or exceeds for the first time, in/100.

Other Forms of Fractiles:

1. Deciles

- values that divide the array into 10 equal parts. Thus,

D1, read as first decile, is the value below which 10% of the values fall.
D2, read as second decile, is the value below which 20% of the values fall.

...

D9, read as ninth decile, is the value below which 90% of the values fall.

2. Quartiles

- values that divide the array into 4 equal parts. Thus,

Q1, read as first quartile, is the value below which 25% of the values fall.
Q2, read as second quartile, is the value below which 50% of the values fall.
Q3, read as third quartile, is the value below which 75% of the values fall.

Examples: Use the data on Stat 101 final grades

a) Ungrouped data

1. P90 = X(90*[110+1]/100)
= X(99.9)
= X(99) + 0.9[X(100) - X(99)]
= 87 + 0.9(87 - 87) = 87

2. D3 = 69

3. Q2 = 75

b) Grouped data

1. P90 = 84.5 + 5(99 - 92)/13
= 87.2

2. D3 = 69.1

3. Q2 = 74.5 + 5(55 - 51)/19 = 75.6
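
The grouped-data results can be reproduced with one function, since deciles and quartiles are special percentiles (D3 = P30, Q2 = P50). This is a sketch; the name `grouped_percentile` is illustrative and equal class sizes are assumed.

```python
def grouped_percentile(i, lcbs, freqs, class_size):
    """Approximate the ith percentile from a frequency distribution."""
    n = sum(freqs)
    target = i * n / 100
    cum = 0  # <CF of the class preceding the current one
    for lcb, f in zip(lcbs, freqs):
        # The Pi th class: <CF equals or exceeds in/100 for the first time.
        if f > 0 and cum + f >= target:
            return lcb + class_size * (target - cum) / f
        cum += f
    raise ValueError("percentile class not found")

lcbs = [49.5 + 5 * j for j in range(10)]
freqs = [10, 3, 8, 13, 17, 19, 22, 13, 4, 1]

p90 = grouped_percentile(90, lcbs, freqs, 5)
d3 = grouped_percentile(30, lcbs, freqs, 5)
q2 = grouped_percentile(50, lcbs, freqs, 5)
print(round(p90, 1), round(d3, 1), round(q2, 1))
```

With i = 50 the formula reduces to the grouped median formula of Section 3.3.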
CHAPTER 4

Measures of Dispersion
and
Measures of Skewness

Definition. Measures of dispersion indicate the extent to which individual items in a


series are scattered about an average.

Some Uses for Measuring Dispersion

• to determine the extent of the scatter so that steps may be taken to control the
existing variation

• used as a measure of reliability of the average value

General Classifications of Measures of Dispersion

1. Measures of Absolute Dispersion


2. Measures of Relative Dispersion

4.1 MEASURES OF ABSOLUTE DISPERSION

Measures of absolute dispersion are expressed in the units of the original
observations. They cannot be used to compare variations of two data sets when
the averages of these data sets differ a lot in value or when the observations differ
in units of measurement.

• The Range

Definition. The range of a set of measurements is the difference between the largest
and the smallest values.

Range = maximum - minimum

The range is approximated from a frequency distribution by getting the difference


between the upper class limit of the highest class interval and the lower class limit of
the lowest class interval.

Examples:

1. The IQ’s of 5 members of a certain family are 108, 112, 127, 116, and 113. Find
the range.

2. Refer to the example on the final grade of 110 Statistics 101 students. The range
is Range = 96 – 50 = 46.

Approximating the range from the frequency distribution table, we get


Range = 99 – 50 = 49.

Characteristics of the Range

1. It uses only the extreme values. It fails to communicate any information about the
clustering or the lack of clustering of the values between the extremes.

2. A weakness of the range is that an outlier can greatly alter its value.

3. It can not be approximated from open-ended frequency distributions.

4. It is unreliable when computed from a frequency distribution table with gaps or


zero frequencies.

• The Standard Deviation and the Variance

Definition. For a finite population of size N, the population variance is

$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

and the population standard deviation is

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$

Definition. For a sample of size n, the sample variance is

$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

and the sample standard deviation is

$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

Remarks:

1. The standard deviation is the most frequently used measure of dispersion.

2. The variance is not a measure of absolute dispersion. It is not expressed in the


same units as the original observations.

Examples:

1. The following scores were given by 6 judges for a gymnast’s performance in the
vault: 7, 5, 9, 7, 8, and 6. Find the standard deviation.

$\mu = 7$, $\sigma = \sqrt{\frac{10}{6}} = 1.3$

2. A sample of 5 households showed the following number of household members:
3, 8, 5, 4, and 4. Find the standard deviation.

$\bar{X} = 4.8$, $s = \sqrt{\frac{14.8}{4}} = 1.9$

3. Refer to the example on the final grades of 110 Statistics 101 students. The
sample standard deviation is given by

$s = \sqrt{\frac{\sum_{i=1}^{110} (X_i - 74.11)^2}{109}} = \sqrt{\frac{13798.69}{109}} = 11.25$
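
Examples 1 and 2 differ only in the divisor: N for a population and n − 1 for a sample. The standard library provides both versions, as the following sketch shows.

```python
from statistics import pstdev, stdev

# Example 1: the 6 judges' scores, treated as a population (divide by N).
judges = [7, 5, 9, 7, 8, 6]
sigma = pstdev(judges)        # sqrt(10 / 6)

# Example 2: the 5 sampled households (divide by n - 1).
households = [3, 8, 5, 4, 4]
s = stdev(households)         # sqrt(14.8 / 4)

print(round(sigma, 1), round(s, 1))
```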

Computational formula:

$s^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n-1)}$

Example: For the final grade of 110 Statistics 101 students,

$s = \sqrt{\frac{110(617936) - (8152)^2}{110(109)}} = \sqrt{\frac{1517856}{11990}} = 11.25$

Approximating the Variance from the Frequency Distribution

$s^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n - 1}$

or, using the computational formula,

$s^2 = \frac{n \sum_{i=1}^{k} f_i X_i^2 - \left( \sum_{i=1}^{k} f_i X_i \right)^2}{n(n-1)}$

where fi = frequency of the ith class
Xi = classmark of the ith class
$\bar{X}$ = mean of the frequency distribution
n = total number of observations

Example:

Class Freq. (fi) CM (Xi) fiXi fiXi^2

50 – 54 10 52 520 27040
55 – 59 3 57 171 9747
60 – 64 8 62 496 30752
65 – 69 13 67 871 58357
70 – 74 17 72 1224 88128
75 – 79 19 77 1463 112651
80 – 84 22 82 1804 147928
85 – 89 13 87 1131 98397
90 – 94 4 92 368 33856
95 – 99 1 97 97 9409
Total 110 8145 616265

$s = \sqrt{\frac{110(616265) - (8145)^2}{110(109)}} = \sqrt{\frac{1448125}{11990}} = 10.99$
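
The column totals and the approximate standard deviation in this example can be reproduced with the computational formula. A minimal sketch:

```python
import math

class_marks = [52, 57, 62, 67, 72, 77, 82, 87, 92, 97]
freqs = [10, 3, 8, 13, 17, 19, 22, 13, 4, 1]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))        # total of fiXi
sum_fx2 = sum(f * x * x for f, x in zip(freqs, class_marks))   # total of fiXi^2

variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
s = math.sqrt(variance)

print(sum_fx, sum_fx2, round(s, 2))
```

The grouped value is close to, but not equal to, the ungrouped standard deviation of 11.25, since the class marks stand in for the actual observations.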

Characteristics of the Standard Deviation

1. It is affected by the value of every observation. It may be distorted by a few
extreme values.

2. It can not be computed from an open-ended distribution.

3. If each observation of a set of data is transformed to a new set by the addition (or
subtraction) of a constant c, the standard deviation of the new set of data is the
same as the standard deviation of the original data set.

4. If a set of data is transformed to a new set by multiplying (or dividing) each


observation by a constant c, the standard deviation of the new data set is equal to
the standard deviation of the original data set multiplied (or divided) by c.

4.2 MEASURES OF RELATIVE DISPERSION

Measures of relative dispersion are unitless and are used when one
wishes to compare the scatter of one distribution with another distribution.

• The Coefficient of Variation

Definition. The coefficient of variation, CV, is the ratio of the standard deviation to
the mean and is usually expressed in percentage. It is computed as

$CV = \frac{\sigma}{\mu} \times 100\%$

and its sample counterpart is

$CV = \frac{s}{\bar{X}} \times 100\%$
Examples.

1. The foreign exchange rate is an indicator of the stability of the peso and is also an
indicator of the economic performance. In 1992 Bangko Sentral ng Pilipinas
(BSP) put the peso on a floating rate basis. Market forces and not government
policy have determined the level of the peso since. Government intervenes
through the BSP, only when there are speculative elements in the market. Given
below are the means and standard deviations of the quarterly P-$ exchange rate
for the periods 1989 to 1991 and 1992 to 1994. Which of the two periods is more
stable?

Period Mean s.d.

1989-1991 22.4 1.84
1992-1994 26.4 1.15

$CV_{89-91} = \frac{1.84}{22.4} \times 100\% = 8.21\%$

$CV_{92-94} = \frac{1.15}{26.4} \times 100\% = 4.36\%$

2. Two of the quality criteria in processing butter cookies are the weight and color
development in the final stages of oven browning. Individual pieces of cookies
are scanned by a spectrophotometer calibrated to reflect yellow-brown light. The
readout is expressed in per cent of a standard yellow-brown reference plate and a
value of 41 is considered optimal (golden-yellow). The cookies were also
weighed in grams at this stage. The means and standard deviations of 30 sample
cookies are presented below.

Criterion Mean s.d.

Color 41.1 10
Weight 17.7 3.2

Which of the two quality criteria is more varied?

$CV_{color} = \frac{10}{41.1} \times 100\% = 24.33\%$

$CV_{weight} = \frac{3.2}{17.7} \times 100\% = 18.08\%$
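
Both examples reduce to the same one-line computation. The sketch below (the helper name `cv` is illustrative) reproduces the four coefficients; the series with the larger CV is the more variable one relative to its mean.

```python
def cv(sd, mean):
    """Coefficient of variation, in percent."""
    return sd / mean * 100

cv_89_91 = cv(1.84, 22.4)    # exchange rate, 1989-1991
cv_92_94 = cv(1.15, 26.4)    # exchange rate, 1992-1994
cv_color = cv(10, 41.1)      # cookie color
cv_weight = cv(3.2, 17.7)    # cookie weight

print(round(cv_89_91, 2), round(cv_92_94, 2))
print(round(cv_color, 2), round(cv_weight, 2))
```

On these figures, the 1992 to 1994 exchange rate has the smaller CV, and color varies more than weight.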

• The Standard Score

Definition. The standard score measures how many standard deviations an
observation is above or below the mean. It is computed as

$Z = \frac{X - \mu}{\sigma}$

and the sample counterpart is

$Z = \frac{X - \bar{X}}{s}$

Remarks:

1. The standard score is not a measure of relative dispersion per se but is somewhat
related.

2. It is useful for comparing two values from different series, especially when the
two series differ with respect to the mean or standard deviation, or both, or are
expressed in different units.

Examples:

1. Robert got a grade of 75% in Stat 101 and a grade of 90% in Econ 11. The mean
grade in Stat 101 is 70% and the standard deviation is 10%, whereas in Econ 11,
the mean grade is 80% and the standard deviation is 20%. Relative to the other
students, where did he perform better?

$Z_{Stat101} = \frac{75 - 70}{10} = 0.5$

$Z_{Econ11} = \frac{90 - 80}{20} = 0.5$

2. In problem (1), if the mean grade in Stat 101 is 65%, in which subject did Robert
perform better?

$Z_{Stat101} = \frac{75 - 65}{10} = 1.0$

3. Different typing skills are required for secretaries depending on whether one is
working in a law office, an accounting firm, or for a mathematical research group
at a major university. In order to evaluate candidates for these positions, an
agency administers 3 distinct standardized typing samples. A time penalty has
been incorporated into the scoring of each sample based on the number of typing
errors. The mean and standard deviation for each test, together with the scores
achieved by Nancy, an applicant, are given in the following table.

Sample Nancy’s Score Mean std. dev.

Law 141 sec 180 sec 30 sec
Accounting 7 min 10 min 2 min
Scientific 33 min 26 min 5 min

Where do you think Nancy should be placed?

$Z_L = \frac{141 - 180}{30} = -1.3$

$Z_A = \frac{7 - 10}{2} = -1.5$

$Z_S = \frac{33 - 26}{5} = 1.4$
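
The standard scores in the three examples can be verified with a small helper (the name `z_score` is illustrative). Because each score is divided by its own standard deviation, the units cancel and the results are comparable across tests.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Examples 1 and 2: Robert's grades.
z_stat = z_score(75, 70, 10)
z_econ = z_score(90, 80, 20)
z_stat_alt = z_score(75, 65, 10)   # Stat 101 mean changed to 65%

# Example 3: Nancy's typing samples (seconds for Law, minutes for the others).
z_law = z_score(141, 180, 30)
z_acct = z_score(7, 10, 2)
z_sci = z_score(33, 26, 5)

print(z_stat, z_econ, z_stat_alt)
print(round(z_law, 1), round(z_acct, 1), round(z_sci, 1))
```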

4.3 MEASURES OF SKEWNESS

Definition. A measure of skewness shows the degree of asymmetry, or departure from


symmetry of a distribution. It indicates not only the amount of skewness but
also the direction.

Two Types of Skewness

1. Positively Skewed or Skewed to the Right

• distribution tapers more to the right than to the left


• longer tail to the right
• more concentration of values below than above the mean
• most skewed curves encountered in the social sciences are skewed to the right

Example: frequency distribution of income

2. Negatively Skewed or Skewed to the Left

• distribution tapers more to the left than to the right


• longer tail to the left
• more concentration of values above than below the mean
• only rarely do we find curves skewed to the left, and even more rarely do we
find data characteristically skewed to the left

Example: the distribution of ages at death of American inventors may be
characteristically skewed to the left, since younger men do not often
have enough inventions to their credit to be classified as “inventor”

Pearson’s First and Second Coefficients of Skewness

1. $Sk = \frac{\bar{X} - Mo}{s}$

where $\bar{X}$ = mean
Mo = mode
s = standard deviation

2. $Sk = \frac{3(\bar{X} - Md)}{s}$

where $\bar{X}$ = mean
Md = median
s = standard deviation

Remarks:

1. Since the mode is frequently only an approximation, formula 2 is preferred.

2. Interpretation of the measure of skewness:

Sk > 0: positively skewed since $\bar{X}$ > Md > Mo
Sk < 0: negatively skewed since $\bar{X}$ < Md < Mo
Sk = 0: symmetric since $\bar{X}$ = Md = Mo

Example: Refer to the final grade of 110 Statistics 101 students

$\bar{X}$ = 74.1 Md = 75 Mo = 84 s = 11.25

Using the first formula,

$Sk = \frac{74.1 - 84}{11.25} = -0.88$

Using the second formula,

$Sk = \frac{3(74.1 - 75)}{11.25} = -0.24$
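
Both coefficients for the final grades can be checked in a few lines. The function names below are illustrative.

```python
def pearson_first(mean, mode, sd):
    """Pearson's first coefficient of skewness: (mean - mode) / s."""
    return (mean - mode) / sd

def pearson_second(mean, median, sd):
    """Pearson's second coefficient of skewness: 3(mean - median) / s."""
    return 3 * (mean - median) / sd

sk1 = pearson_first(74.1, 84, 11.25)
sk2 = pearson_second(74.1, 75, 11.25)
print(round(sk1, 2), round(sk2, 2))
```

Both values are negative, so either formula indicates that the grades are negatively skewed, although the two disagree on the amount.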

4.4 THE BOXPLOT

Definition. The boxplot is a graph that is very useful for displaying the following
features of the data:

• location
• spread
• symmetry
• extremes
• outliers

Steps in Constructing a Boxplot

1. Construct a rectangle with one end at the first quartile and the other end at the third
quartile.

2. Put a vertical line across the interior of the rectangle at the median.

3. Compute for the interquartile range (IQR), lower fence (FL) and upper fence (FU)
given by:

IQR = Q3 - Q1
FL = Q1 - 1.5 IQR
FU = Q3 + 1.5 IQR

4. Locate the smallest value contained in the interval [FL , Q1]. Draw a line from this
value to Q1.

5. Locate the largest value contained in the interval [Q3,FU]. Draw a line from this value
to Q3.

6. Values falling outside the fences are considered outliers and are usually denoted by
“x”.

Remarks:

1. The height of the rectangle is usually arbitrary and has no specific meaning. If several
boxplots appear together, however, the height is sometimes made proportional to the
different sample sizes.

2. If an outlying observation is less than Q1 - 3 IQR or greater than Q3 + 3 IQR, it is
identified with a circle at its actual location. Such an observation is called a far
outlier.

Examples:

1. Set A: 1 10 14 15 18 20 21 22 22 22 23 24 24 25 28

Q1 = 15 IQR = 9
Q3 = 24 FL = 1.5
Md = 22 FU = 37.5

Set B: 3 8 9 10 10 10 11 12 12 12 16 16 19 19 30

Q1 = 10 IQR = 6
Q3 = 16 FL = 1
Md = 12 FU = 25

[Figure: boxplots of Set A and Set B on a common axis from 0 to 35; each set has one outlier marked “x”]

2. Boxplot of the final grade of 110 Statistics 101 students.

[Figure: boxplot of the final grades on an axis from 50 to 100]
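
The quartiles, fences and outliers for Set A and Set B can be computed as sketched below. The quartiles are located with the percentile position rule i(n+1)/100 from Section 3.5, with linear interpolation between neighbouring observations; the function names are illustrative, and other software may use a different quartile rule and give slightly different values.

```python
def percentile(sorted_values, i):
    """Value at position i(n+1)/100 in the array, interpolating if needed."""
    n = len(sorted_values)
    pos = i * (n + 1) / 100
    lower = int(pos)
    frac = pos - lower
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + frac * (above - below)

def box_summary(values):
    """Return Q1, Md, Q3, lower fence, upper fence and the list of outliers."""
    data = sorted(values)
    q1, md, q3 = (percentile(data, i) for i in (25, 50, 75))
    iqr = q3 - q1
    fl, fu = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < fl or x > fu]
    return q1, md, q3, fl, fu, outliers

set_a = [1, 15, 21, 22, 24, 10, 18, 22, 23, 25, 14, 20, 22, 24, 28]
set_b = [3, 10, 11, 12, 19, 8, 10, 12, 16, 19, 9, 10, 12, 16, 30]

print(box_summary(set_a))
print(box_summary(set_b))
```

The whiskers then run from Q1 down to the smallest value inside the lower fence and from Q3 up to the largest value inside the upper fence.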
CHAPTER 5

Probability

5.1 RANDOM EXPERIMENTS, SAMPLE SPACES AND EVENTS

Definition of Terms

1. Random experiment - any process of generating a set of data or observations
that can be repeated under basically the same conditions, which lead to
well-defined outcomes

2. Sample space - set of all possible outcomes of an experiment, usually
denoted by S

3. Sample point - an element of the sample space, an outcome

4. Event - any subset of the sample space, usually denoted by capital letters

5. Null space/Empty space - a subset of the sample space that contains no elements
and denoted by the symbol φ

6. Simple event - an event which contains only one element of the sample space

7. Compound event - an event that can be expressed as the union of simple
events, thus containing more than one sample point

8. Mutually exclusive events - two events A and B are mutually exclusive if A∩B=φ;
that is, A and B have no elements in common

Remarks:

• An event is said to have occurred if the outcome of the experiment is one of the
sample points in the event.

• The empty space can be viewed as an event that will never happen. It is called the
impossible event.

• The sample space S, as an event, always occurs, and is referred to as the certain or
sure event.

Event Composition and Event Relations


1. A ∩ B - the intersection of events A and B is the event that both A and B occur

2. A ∪ B - the union of events A and B is the event that A or B or both occur

3. A' or Ac - the complement of an event A with respect to S contains all elements of S
that are not in A and is the event that A does not occur

Some relationships between events can be illustrated by means of a Venn Diagram.

5.2 THE PROBABILITY CONCEPT AND SOME PROPERTIES

Probability analysis is based on the following simple postulates.

Postulate 1. 0 ≤ P(Oi) ≤ 1 for any simple event Oi
Postulate 2. The probability of any event E is the sum of the probabilities of the simple
events that constitute E.
Postulate 3. P(S) = 1, where S is the sample space, and P(φ) = 0, where φ is the null
space.

Approaches to Assigning Probabilities

1. A Priori or Classical Probability – probability is determined even before the
experiment is performed using the following rule:

If an experiment can result in any one of N different equally likely
outcomes, and if exactly n of these outcomes correspond to event A,
then the probability of event A is

$P(A) = \frac{\text{no. of sample points in A}}{\text{no. of sample points in S}} = \frac{n}{N}$

2. A Posteriori or Relative Frequency or Empirical Probability - probability is
determined by repeating the experiment a large number of times using the following
rule:

$P(A) = \frac{\text{no. of times event A occurred}}{\text{no. of times experiment was repeated}}$

3. Subjective Probability – probability is determined by the use of intuition, personal
beliefs, and other indirect information.

Examples:

1. Find the errors in each of the following statements:

a. The probability that it will rain tomorrow is 0.40 and the probability that it will
not rain tomorrow is 0.52.

b. The probabilities that a printer will make 0, 1, 2, 3, or 4 or more mistakes in
printing a document are, respectively, 0.19, 0.34, -0.25, 0.43, and 0.29.

c. The probabilities that an automobile salesperson will sell 0, 1, 2, or 3 cars on any
given day in February are, respectively, 0.19, 0.38, 0.29, and 0.15.

d. On a single draw from a deck of playing cards the probability of selecting a heart
is 1/4, the probability of selecting a black card is 1/2, and the probability of
selecting both a heart and a black card is 1/8.

2. a. In tossing a fair coin, what is the probability of getting a head? Of either a head or
tail? Of neither a head nor tail?
b. In tossing a fair die, what is the probability of getting a 3? Of getting an even
number? Of getting a number greater than 6?

3. A coin is biased so that a head is twice as likely to occur as a tail. If the coin is tossed
once, what is the probability of getting a head?

Rules of Counting (Optional)

Theorem. If an operation can be performed in n1 ways, and for each of these a second
operation can be performed in n2 ways, then the two operations can be
performed in n1n2 ways.

Example: How many sample points are there in the sample space when a pair of
balanced dice is thrown once?

Theorem. (Multiplication Rule) If an operation can be performed in n1 ways, if for
each of these a second operation can be performed in n2 ways, if for each of
the first two a third operation can be performed in n3 ways, and so on, then
the sequence of k operations can be performed in n1 × n2 × ... × nk ways.

Examples:

1. How many even three-digit numbers can be formed from the digits 1, 2, 5, 6, and
9 if each digit can be used only once?

2. How many ways can a 10-question true-false examination be answered?



Definition. A permutation is an arrangement or ordering of all or part of a set of
objects.

Theorem. The number of permutations of n distinct objects is

n (n-1)(n-2) . . . (2)(1) = n!

(n! is read “n factorial”)

Note. 0! = 1.

Example: How many different orders or sequences can we arrange the letters A, B,
C, and D?

Theorem. The number of permutations of n distinct objects taken r at a time is

$$_nP_r = \frac{n!}{(n-r)!}$$

Examples:

1. Two lottery tickets are drawn from 20 for the first and second prize. Find the
number of sample points in the space S.

2. In how many ways can the 5 starting positions on a basketball team be filled with
8 men who can play any position?
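
The permutation formulas can be cross-checked by brute-force enumeration. This is a minimal sketch: the helper name `n_p_r` is hypothetical, and the figures are those of the examples above.

```python
import math
from itertools import permutations

def n_p_r(n, r):
    """Number of permutations of n distinct objects taken r at a time."""
    return math.factorial(n) // math.factorial(n - r)

letters = "ABCD"
orderings_of_all = len(list(permutations(letters)))     # every ordering of 4 letters
ordered_pairs = len(list(permutations(letters, 2)))     # ordered selections of 2 letters

lottery_outcomes = n_p_r(20, 2)   # first and second prize drawn from 20 tickets
lineups = n_p_r(8, 5)             # 5 starting positions filled from 8 players
print(orderings_of_all, ordered_pairs, lottery_outcomes, lineups)
```

Enumeration is practical only for small n; the formula gives the same count without listing the arrangements.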

Theorem. The number of distinct permutations of n things of which n1 are of one kind,
n2 are of a second kind, . . . , nk of a kth kind is

$$\frac{n!}{n_1!\, n_2! \cdots n_k!}, \quad \text{where } \sum_{i=1}^{k} n_i = n$$

Examples:

1. Consider our favorite word, STATISTICS, which contains a total of 10 letters.
There are 3 classes of indistinguishable objects, consisting of 3 S’s, 3 T’s, and 2
I’s; the letters A and C appear once each. Find the total number of distinct
permutations of these 10 letters.

2. In how many different ways can 3 red, 4 yellow, and 2 blue bulbs be arranged in a
string of Christmas tree lights with 9 sockets?
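
The theorem on permutations with repeated kinds translates directly into a short function. In this sketch the function name and the one-letter color codes are assumptions made for illustration.

```python
from collections import Counter
from math import factorial

def distinct_permutations(items):
    """n! / (n1! n2! ... nk!) for a collection that contains repeated kinds."""
    total = factorial(len(items))
    for count in Counter(items).values():
        total //= factorial(count)   # divide out the orderings within each kind
    return total

word_arrangements = distinct_permutations("STATISTICS")
# 3 red, 4 yellow, and 2 blue bulbs, coded as R, Y, B
bulb_arrangements = distinct_permutations("R" * 3 + "Y" * 4 + "B" * 2)
print(word_arrangements, bulb_arrangements)
```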

Definition. A combination is a selection of r objects from n without regard to order.

Theorem. The number of combinations of n distinct objects taken r at a time is

$$_nC_r = \frac{n!}{r!\,(n-r)!}$$

Examples:

1. In a Stat 101 exam, a student has a choice of 8 questions out of 10. In how many
ways can he choose a set of 8 questions if he chooses arbitrarily?

2. Find the number of ways of selecting the 6 winning numbers in the original
version of the game of lotto.
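
A short sketch of the combination formula follows. The helper name `n_c_r` is hypothetical, and the lotto pool size of 42 is an assumption (the notes do not state how many numbers the original game uses).

```python
import math
from itertools import combinations

def n_c_r(n, r):
    """Number of combinations of n distinct objects taken r at a time."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

exam_choices = n_c_r(10, 8)                                  # choose 8 of 10 questions
enumerated = sum(1 for _ in combinations(range(10), 8))      # brute-force count

# ASSUMPTION: a 6-out-of-42 lotto; the pool size is not given in the text.
lotto_pool = 42
lotto_selections = n_c_r(lotto_pool, 6)
print(exam_choices, enumerated, lotto_selections)
```

Because order is ignored, each combination corresponds to r! permutations, so nPr = nCr × r!.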

Theorems on Probabilities of Events

Theorem. (Additive Rule) If A and B are any two events, then

P(A∪B) = P(A) + P(B) - P(A∩B)

Corollary. If A and B are mutually exclusive, then


P(A∪B) = P(A) + P(B)

Corollary. If A1, A2, . . . , An are mutually exclusive, then


P(A1 ∪ A2 ∪ . . . ∪ An) = P(A1) + P(A2) + . . . +P(An)

Theorem. If A and Aᶜ are complementary events, then

P(A) + P(Aᶜ) = 1.

Examples:

1. The probability that a student passes Stat 101 is 0.60, and the probability that he
passes Comm II is 0.85. If the probability that he passes at least one of the two
courses is 0.95, what is the probability that he will pass both courses? fail both Stat
101 and Comm II?

2. An oil-prospecting firm plans to drill two exploratory wells. Past evidence shows that
the probability that neither well produces oil is 0.8; the probability that exactly one
well produces oil is 0.18; and, the probability that both wells produce oil is 0.02.
What is the probability that at most one well produces oil? At least one?

3. In the toss of a fair coin 4 times, what is the probability of no head in the toss? At
least one head?
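
The additive rule and the complement rule can be rearranged to answer questions like those above. The following is a sketch using exact fractions; the function names are hypothetical.

```python
from fractions import Fraction

def p_intersection(p_a, p_b, p_union):
    """Additive rule rearranged: P(A∩B) = P(A) + P(B) - P(A∪B)."""
    return p_a + p_b - p_union

def p_complement(p_a):
    """Complement rule: P(Aᶜ) = 1 - P(A)."""
    return 1 - p_a

# Figures from the first example: Stat 101, Comm II, and at least one of the two.
p_stat, p_comm, p_either = Fraction(60, 100), Fraction(85, 100), Fraction(95, 100)
p_both = p_intersection(p_stat, p_comm, p_either)
p_fail_both = p_complement(p_either)

# Four tosses of a fair coin: "at least one head" is the complement of "no head".
p_no_head = Fraction(1, 2) ** 4
p_at_least_one_head = p_complement(p_no_head)
print(p_both, p_fail_both, p_no_head, p_at_least_one_head)
```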

Definition. The probability of an event B occurring when it is known that some event A
has occurred is called a conditional probability. It is defined by the
equation

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad \text{if } P(A) > 0$$

P(B|A) is read as “probability of B given A”.

Examples:

1. A random sample of 100 insurance claims is classified below according to the type
of policy and whether the claim is fraudulent or not.

a. Find the probability of a fraudulent claim given that such a claim is for a fire
policy.

b. Find the probability that a claim for a fire policy is selected given that such a
claim is fraudulent.

Categorization of Insurance Claims

                         Type of Policy
Category          Fire    Auto    Others    Total
Fraudulent           6       1         3       10
Nonfraudulent       14      29        47       90
Total               20      30        50      100

2. The probability that a student passes Stat 101 is 0.60, the probability that he passes
Comm II is 0.85, the probability that he passes both subjects is 0.5. If the student
passes Stat 101, what is the probability that the student will pass Comm II?
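
With count data, a conditional probability reduces to a ratio of counts inside the conditioning row or column. The sketch below applies this to the insurance table; the dictionary layout and function names are assumptions for illustration.

```python
from fractions import Fraction

# Cell counts from the table "Categorization of Insurance Claims".
claims = {
    ("Fraudulent", "Fire"): 6, ("Fraudulent", "Auto"): 1, ("Fraudulent", "Others"): 3,
    ("Nonfraudulent", "Fire"): 14, ("Nonfraudulent", "Auto"): 29, ("Nonfraudulent", "Others"): 47,
}

def conditional(counts, event, given):
    """P(event | given) = count(event and given) / count(given)."""
    given_total = sum(c for key, c in counts.items() if given(key))
    both = sum(c for key, c in counts.items() if given(key) and event(key))
    return Fraction(both, given_total)

def is_fraudulent(key):
    return key[0] == "Fraudulent"

def is_fire(key):
    return key[1] == "Fire"

p_fraud_given_fire = conditional(claims, is_fraudulent, is_fire)
p_fire_given_fraud = conditional(claims, is_fire, is_fraudulent)
print(p_fraud_given_fire, p_fire_given_fraud)
```

The two answers differ because the conditioning event changes the denominator: 20 fire claims in one case, 10 fraudulent claims in the other.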

Definition. Two events A and B are said to be independent if any one of the following
conditions is satisfied:

(a) P(A|B) = P(A) if P(B) > 0
(b) P(B|A) = P(B) if P(A) > 0
(c) P(A∩B) = P(A) P(B)

Otherwise, the events are said to be dependent.

Examples:

1. Consider the following events in the toss of a single die:

A: Observe an odd number
B: Observe an even number

Are A and B independent events?

2. The probability that Robert will correctly answer the toughest question in an exam is
1/4. The probability that Ana will correctly answer the same question is 4/5. Find the
probability that both will answer the question correctly, assuming that they do not
copy from each other.
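
Condition (c) gives a mechanical test for independence. This sketch checks it for the die events above and applies the product rule to the second example; the helper names are hypothetical.

```python
from fractions import Fraction

die = range(1, 7)

def prob(event):
    """Classical probability of an event on one toss of a fair die."""
    return Fraction(sum(1 for outcome in die if event(outcome)), len(die))

def is_odd(outcome):
    return outcome % 2 == 1

def is_even(outcome):
    return outcome % 2 == 0

p_a, p_b = prob(is_odd), prob(is_even)
p_a_and_b = prob(lambda outcome: is_odd(outcome) and is_even(outcome))
independent = p_a_and_b == p_a * p_b   # condition (c)

# Robert and Ana answer without copying, so their results are treated as independent.
p_both_correct = Fraction(1, 4) * Fraction(4, 5)
print(independent, p_both_correct)
```

Note that A and B here are mutually exclusive, which is not the same thing as independent: knowing that A occurred tells us B did not.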
CHAPTER 6

Probability Distributions

6.1 CONCEPT OF A RANDOM VARIABLE

Definition. A function whose value is a real number determined by each element in the
sample space is called a random variable.

Remark. We shall use an uppercase letter, say X, to denote a random variable and its
corresponding lowercase letter, x in this case, for one of its values.

Examples:

1. (Experiment No. 1) An experiment consists of tossing a coin 3 times and observing
the result. The possible outcomes and the values of the random variables X and Y,
where X is the number of heads and Y is the number of heads minus the number of
tails, are

Sample Points x y

HHH 3 3
HHT 2 1
HTH 2 1
HTT 1 -1
THH 2 1
THT 1 -1
TTH 1 -1
TTT 0 -3

2. (Experiment No. 2) A hatcheck girl returns 3 hats at random to 3 customers who had
previously checked them. If Jason, Charlie, and Ohmar, in that order, each receive one of
the hats, list the sample points for the possible orders of returning the hats and find
the values m of the random variable M that represents the number of correct matches.

6.2 DISCRETE & CONTINUOUS PROBABILITY DISTRIBUTIONS

Definition. If a sample space contains a finite number of possibilities or an unending
sequence with as many elements as there are whole numbers, it is called a
discrete sample space.

Definition. A random variable defined over a discrete sample space is called a discrete
random variable.

Definition. If a sample space contains an infinite number of possibilities equal to the
number of points on a line segment, it is called a continuous sample space.

Definition. A random variable defined over a continuous sample space is called a
continuous random variable.

Discrete Probability Distributions

Definition. A table or formula listing all possible values that a discrete random variable
can take on, along with the associated probabilities, is called a discrete
probability distribution.

Remark. The probabilities associated with all possible values of a discrete random
variable must sum to 1.

Examples:

1. For Experiment No. 1, the discrete probability distributions of the random variables X
and Y are

x 0 1 2 3
P(X=x) 1/8 3/8 3/8 1/8

y -3 -1 1 3
P(Y=y) 1/8 3/8 3/8 1/8

2. Construct the discrete probability distribution for the random variable M defined in
Experiment No. 2.
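
A discrete probability distribution can be built by listing the sample points, applying the random variable to each, and tallying. The sketch below does this for both experiments, assuming every sample point is equally likely; the function name and the numbering of hats and customers as 0, 1, 2 are assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

def distribution(values):
    """Map each value of a random variable to its probability, assuming
    every sample point is equally likely."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: Fraction(count, total) for value, count in sorted(counts.items())}

# Experiment No. 1: X = number of heads, Y = heads minus tails, in 3 tosses.
tosses = list(product("HT", repeat=3))
dist_x = distribution(t.count("H") for t in tosses)
dist_y = distribution(t.count("H") - t.count("T") for t in tosses)

# Experiment No. 2: hat i belongs to customer i; M counts hats returned correctly.
returns = list(permutations(range(3)))
dist_m = distribution(
    sum(hat == owner for owner, hat in enumerate(order)) for order in returns
)
print(dist_x, dist_y, dist_m, sep="\n")
```

Each table sums to 1, as the remark above requires.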

Continuous Probability Distributions

Definition. The function with values f(x) is called a probability density function for
the continuous random variable X, if

• the total area under its curve and above the horizontal axis is equal to 1; and
• the area under the curve between any two ordinates x = a and x = b gives the
probability that X lies between a and b.

Remarks:

1. A continuous random variable has a probability of zero of assuming exactly any of its
values, that is, if X is a continuous random variable, then P(X=x) = 0 for all real
numbers x.

2. The probability density function cannot be represented in tabular form.

Example:

A continuous random variable X that can assume values between 0 and 2 has a
density function given by

$$f(x) = \begin{cases} 0.5 & \text{for } 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$

Find the following probabilities:

a. P(1 < X < 2).

[Figure: graph of f(x), a horizontal line at height 1/2 over the interval from 0 to 2, with a shaded region]

area of shaded region = (length)(width)

P(1 < X < 2) = (1/2)(1) = 1/2

b. P(X > 1.5)
c. P(X < 0.75)
d. P(X = 0.75)
e. P(X ≤ 0.75).
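
For a density that is flat over an interval, every probability is the area of a rectangle. The following sketch computes such areas for the density in this example; the function name and its default arguments are assumptions.

```python
def uniform_probability(lower, upper, a=0.0, b=2.0):
    """P(lower < X < upper) for a density equal to 1/(b - a) on (a, b) and 0
    elsewhere: height times the length of the overlap with (a, b)."""
    height = 1.0 / (b - a)
    overlap = max(0.0, min(upper, b) - max(lower, a))
    return height * overlap

inf = float("inf")
p_a = uniform_probability(1, 2)          # P(1 < X < 2)
p_b = uniform_probability(1.5, inf)      # P(X > 1.5)
p_c = uniform_probability(-inf, 0.75)    # P(X < 0.75)
p_d = uniform_probability(0.75, 0.75)    # P(X = 0.75): a rectangle of zero width
p_e = p_c + p_d                          # P(X ≤ 0.75)
print(p_a, p_b, p_c, p_d, p_e)
```

Parts (c) and (e) agree because a single point contributes no area, which is the first remark above in action.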

6.3 EXPECTED VALUES

Definition. Let X be a discrete random variable with probability distribution

x x1 x2 ... xn
P(X=x) f(x1) f(x2) ... f(xn)

The mean or expected value of X is

$$\mu = E(X) = \sum_{i=1}^{n} x_i f(x_i)$$

Examples:

1. Find the mean of the random variables X and Y of Experiment No. 1.

x 0 1 2 3
P(X=x) 1/8 3/8 3/8 1/8

E(X) = (0)(1/8) + (1)(3/8) + (2)(3/8) + (3)(1/8) = 12/8 or 1.5

y -3 -1 1 3
P(Y=y) 1/8 3/8 3/8 1/8

E(Y) = (-3)(1/8) + (-1)(3/8) + (1)(3/8) + (3)(1/8) = 0

2. Find the expected number of correct matches in Experiment No. 2.

3. In a gambling game a man is paid P50 if he gets all heads or all tails when 3 coins are
tossed, and he pays out P30 if either 1 or 2 heads show. What is his expected gain?
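
The expected value is a probability-weighted sum, so it is easy to compute from a distribution table. This sketch stores each table as a dictionary (an assumed representation) and reuses the distribution of X for the gambling game.

```python
from fractions import Fraction

def expected_value(dist):
    """E(X) = sum of x * f(x) over a {value: probability} table."""
    return sum(x * p for x, p in dist.items())

eighth = Fraction(1, 8)
dist_x = {0: 1 * eighth, 1: 3 * eighth, 2: 3 * eighth, 3: 1 * eighth}
dist_y = {-3: 1 * eighth, -1: 3 * eighth, 1: 3 * eighth, 3: 1 * eighth}

# Gambling game: gain of 50 with all tails or all heads (X = 0 or 3),
# loss of 30 when 1 or 2 heads show.
dist_gain = {50: dist_x[0] + dist_x[3], -30: dist_x[1] + dist_x[2]}

print(expected_value(dist_x), expected_value(dist_y), expected_value(dist_gain))
```

A negative expected gain means the player loses money on average over many plays.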

Theorem. Let X be a discrete random variable with probability distribution

x x1 x2 ... xn
P(X=x) f(x1) f(x2) ... f(xn)

The mean or expected value of the random variable g(X) is

$$E(g(X)) = \sum_{i=1}^{n} g(x_i) f(x_i)$$

Example: A used car dealer finds that in any day, the probability of selling no car is
0.4, one car is 0.2, two cars is 0.15, 3 cars is 0.10, 4 cars is 0.08, five cars is
0.06 and six cars is 0.01. Let g(X) = 500 + 1500X represent the salesman’s
daily earnings, where X is the number of cars sold. Find the salesman’s
expected daily earnings.

Definition. Let X be a random variable with mean µ. Then the variance of X is

$$\sigma^2 = \mathrm{Var}(X) = E\left[(X-\mu)^2\right]$$

Definition. Let X be a discrete random variable with probability distribution

x x1 x2 ... xn
P(X=x) f(x1) f(x2) ... f(xn)

The variance of X is

$$\sigma^2 = \mathrm{Var}(X) = E\left[(X-\mu)^2\right] = \sum_{i=1}^{n} (x_i-\mu)^2 f(x_i)$$

Theorem. Computational Formula for σ²

$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$$

Example :

In Experiment No. 1, find the variance of X.

Using the definition of Var(X),

E(X) = 1.5

$$\mathrm{Var}(X) = \sum_{i=1}^{4} (x_i - 1.5)^2 f(x_i)$$

= (0 − 1.5)²(1/8) + (1 − 1.5)²(3/8) + (2 − 1.5)²(3/8) + (3 − 1.5)²(1/8) = 0.75

Using the computational formula for Var(X),

E(X²) = (0)²(1/8) + (1)²(3/8) + (2)²(3/8) + (3)²(1/8) = 24/8 = 3

Var(X) = E(X²) − [E(X)]²

= 3 − (1.5)² = 0.75
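
Both routes to the variance can be coded and compared. The sketch below uses exact fractions so the two results match exactly; the function names are hypothetical.

```python
from fractions import Fraction

def mean(dist):
    """E(X) from a {value: probability} table."""
    return sum(x * p for x, p in dist.items())

def variance_by_definition(dist):
    """Var(X) = sum of (x - µ)² f(x)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def variance_by_shortcut(dist):
    """Var(X) = E(X²) - [E(X)]²."""
    second_moment = sum(x ** 2 * p for x, p in dist.items())
    return second_moment - mean(dist) ** 2

dist_x = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
print(variance_by_definition(dist_x), variance_by_shortcut(dist_x))
```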

Properties of the Mean and Variance

Let X and Y be random variables (discrete or continuous) and let a and b be constants.

1. E(aX + b) = a E(X) + b

Special Cases:
a. if b = 0, then E(aX) = a E(X).
b. if a = 0, then E(b) = b.

2. E(X + Y) = E(X) + E(Y)
E(X − Y) = E(X) − E(Y)

3. E(XY) = E(X)E(Y) if X and Y are independent.

4. E[ X - E(X) ] = 0.

5. Var(aX + b) = a²Var(X).

Special Cases:
a. if b = 0, then Var(aX) = a²Var(X).
b. if a = 0, then Var(b) = 0.

6. If X and Y are independent, then

Var(X + Y) = Var(X) + Var(Y)
Var(X − Y) = Var(X) + Var(Y)

Example :

If X and Y are independent random variables with E(X) = 3, E(Y) = 2, Var(X) = 2,
and Var(Y) = 1, find

a. E(3X + 5)
b. Var(3X +5)
c. E(XY)
d. Var(3X - 2Y)

6.4 THE NORMAL DISTRIBUTION

Definition. A continuous random variable X is said to be normally distributed if its
density function is given by:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

for −∞ < x < ∞ and for constants µ and σ, where −∞ < µ < ∞, σ > 0,
e ≈ 2.71828, and π ≈ 3.14159.

Notation: If X follows the above distribution, we write X ~ N(µ, σ²).

Note: If X ~ N(µ, σ²), then

E(X) = µ and Var(X) = σ².

The graph of the normal distribution is called the normal curve.

Properties:

1. The curve is bell-shaped and symmetric about a vertical axis through the mean µ.

2. The normal curve approaches the horizontal axis asymptotically as we proceed in
either direction away from the mean.

3. The total area under the curve and above the horizontal axis is equal to 1.

Definition. The distribution of a normal random variable with mean zero and standard
deviation equal to 1 is called a standard normal distribution.

If X ~ N(µ, σ²), then X can be transformed into a standard normal random
variable through the following transformation:

$$Z = \frac{X-\mu}{\sigma}$$

Hence, whenever X is between the values x1 and x2, the random variable Z will
fall between the corresponding values

$$z_1 = \frac{x_1-\mu}{\sigma} \quad \text{and} \quad z_2 = \frac{x_2-\mu}{\sigma}$$

Thus, P(x1 < X < x2) = P(z1 < Z < z2).

Examples :

1. Given a normal distribution with µ = 40 and σ = 8, find the probability that X
assumes a value
a. less than 45
b. between 35 and 45
c. more than 45

2. Given the normally distributed random variable X with mean 18 and standard
deviation 2.5, find

a. the value k such that P(X < k) = 0.2578
b. the value k such that P(X > k) = 0.1539.

3. The achievement scores for a college entrance examination are normally distributed
with mean 75 and standard deviation equal to 10. What fraction of the scores would
one expect to lie between 70 and 90?

4. A soft drink machine is regulated so that it dispenses an average of 200 ml per cup. If
the amount of drink dispensed is normally distributed with a standard deviation equal
to 15 ml,

a. what fraction of the cups will contain more than 224 ml?
b. what is the probability that a cup contains between 191 ml and 209 ml?
c. how many cups will likely overflow if 230 ml cups are used for the next 1000
drinks?
d. below what value do we get the smallest 25% of the drinks?
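
Normal probabilities are usually read from a z-table, but they can also be computed directly. This sketch uses `statistics.NormalDist` from the Python standard library in place of the table; the variable names are assumptions, and the results may differ from table answers in the last decimal place because tables round z to two places.

```python
from statistics import NormalDist

def prob_between(dist, low, high):
    """P(low < X < high) as a difference of cumulative probabilities."""
    return dist.cdf(high) - dist.cdf(low)

# Example 1: µ = 40, σ = 8.
x = NormalDist(mu=40, sigma=8)
p_less_45 = x.cdf(45)
p_35_to_45 = prob_between(x, 35, 45)
p_more_45 = 1 - x.cdf(45)

# The same probability after standardizing with Z = (X - µ) / σ.
p_less_45_via_z = NormalDist().cdf((45 - 40) / 8)

# Example 4: µ = 200 ml, σ = 15 ml.
cups = NormalDist(mu=200, sigma=15)
p_over_224 = 1 - cups.cdf(224)
smallest_quarter_cutoff = cups.inv_cdf(0.25)   # value below which 25% of drinks fall
print(round(p_less_45, 4), round(p_over_224, 4), round(smallest_quarter_cutoff, 2))
```

Standardizing first and using the standard normal distribution gives the same answer as working with N(µ, σ²) directly, which is why a single z-table is enough.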

6.5 OTHER COMMON DISTRIBUTIONS

• Binomial Distribution

Definition. A binomial experiment is one that possesses the following properties:

• the experiment consists of n identical trials
• each trial results in one of two outcomes, a “success” or a “failure”
• the probability of success on a single trial is equal to p and remains the
same from trial to trial. The probability of a failure is equal to q = 1 − p.
• the trials are independent

The random variable of interest X, the number of successes observed in n
trials, is called a binomial random variable.

Definition. The discrete probability distribution of the binomial random variable is
given by

$$P(X = x) = f(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n \text{ and } 0 < p < 1$$

Notation : If X follows the above distribution, we will write X~Bi(n, p).

Note : If X~Bi(n, p) then E(X) = np and Var(X) = npq, where q = 1-p.

Examples:

1. A multiple-choice quiz has 15 questions, each with 4 possible answers of which only
1 is the correct answer. What is the probability that sheer guesswork yields

a. exactly 10 correct answers
b. at least 1 correct answer
c. 8 to 12 correct answers.

2. Suppose that airplane engines operate independently in flight and fail with probability
1/5. Assuming that a plane makes a safe flight if at least one-half of its engines run,
which between a 4-engine plane and a 2-engine plane has the higher probability for a
successful flight?
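
The binomial formula is straightforward to evaluate in code. The sketch below defines it once and applies it to the quiz and engine examples; the function names are hypothetical, and in the engine example a "success" is taken to be an engine that keeps running.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bi(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_at_least(k, n, p):
    """P(X ≥ k) for X ~ Bi(n, p)."""
    return sum(binomial_pmf(x, n, p) for x in range(k, n + 1))

# Quiz: 15 questions, probability 1/4 of guessing each one correctly.
p_at_least_one = binomial_at_least(1, 15, 0.25)

# Engines: each fails with probability 1/5, so each runs with probability 4/5.
p_run = 1 - 1 / 5
safe_four_engine = binomial_at_least(2, 4, p_run)   # at least half of 4 engines run
safe_two_engine = binomial_at_least(1, 2, p_run)    # at least half of 2 engines run
print(round(p_at_least_one, 4), round(safe_four_engine, 4), round(safe_two_engine, 4))
```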

• Hypergeometric Distribution (Optional)

Definition. A hypergeometric experiment is one that possesses the following
properties:

• a sample of size n is taken without replacement from a population of
size N
• k of the N are classified as “success” and (N − k) classified as “failure”.

The random variable of interest X, the number of successes in the sample, is
called a hypergeometric random variable.

Definition. The discrete probability distribution of the hypergeometric random variable
is given by

$$P(X = x) = f(x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, \ldots, \min(n, k)$$

Notation: If X follows the above distribution, we write X ~ H(N, n, k).

Note: If X ~ H(N, n, k), then

$$E(X) = \frac{nk}{N} \quad \text{and} \quad \mathrm{Var}(X) = \frac{(N-n)}{(N-1)} \cdot \frac{nk}{N}\left(1 - \frac{k}{N}\right)$$

Remark: If n is small relative to N the probability of “success” for each draw will
change only slightly. Hence, the hypergeometric distribution can be
approximated by the binomial distribution with p = k/N.

Examples:

1. What is the probability that a person’s 6 number bet wins the second prize in a game
of lotto?

2. A lot of 20 personal computers was delivered to the Statistical Center. Ten computers
were selected at random without replacement and tested for defects. If at least 2 of
these 10 are defective, the entire lot of 20 computers will be returned. What is the
probability that the lot will be returned if 5 of the 20 computers are indeed defective?

3. A production lot of 2000 units contains 50 units that do not meet the specifications.
What is the probability that a random sample of 20 units without replacement will
contain no nonconforming item?
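
The hypergeometric formula can be evaluated exactly with binomial coefficients. This sketch applies it to the computer-lot example (N = 20, n = 10, k = 5); the function and variable names are assumptions.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(x, N, n, k):
    """P(X = x) for X ~ H(N, n, k): x successes in a sample of n drawn
    without replacement from N items, k of which are successes."""
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))

N, n, k = 20, 10, 5   # lot size, sample size, defective computers in the lot
pmf = {x: hypergeometric_pmf(x, N, n, k) for x in range(min(n, k) + 1)}

p_lot_returned = 1 - pmf[0] - pmf[1]          # at least 2 defectives in the sample
mean_defectives = sum(x * p for x, p in pmf.items())
print(float(p_lot_returned), mean_defectives)
```

The computed mean can be compared with the formula E(X) = nk/N as a check on the table of probabilities.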

• Poisson Distribution (Optional)

Definition. A Poisson experiment is one that possesses the following properties:

• the number of outcomes occurring in one time interval or specified
region is independent of the number that occur in any other disjoint
time interval or region of space
• the probability that a single outcome will occur during a very short
time interval or in a small region is proportional to the length of the
time interval or the size of the region
• the probability that more than one outcome will occur in such a short
time interval or fall in such a small region is negligible

The random variable of interest X, the number of outcomes in a specified
length of time interval or region, is called a Poisson random variable.

Definition. The probability distribution of the Poisson random variable is given by

$$P(X = x) = f(x) = \frac{e^{-\mu}\mu^x}{x!}, \quad x = 0, 1, 2, \ldots$$

Notation: If X follows the above distribution, we write X ~ Poi(µ).

Note: If X ~ Poi(µ), then E(X) = µ and Var(X) = µ.

Remark: If X ~ Bi(n, p) and n is large and p is close to 0, the Poisson distribution is
used to approximate the Binomial distribution with µ = np.

Examples:

1. On the average a certain intersection results in 3 traffic accidents per month. Suppose
that the number of accidents per month follows a Poisson distribution. What is the
probability that in any given month at this intersection,

a. exactly 5 accidents will occur?
b. less than 3 accidents will occur?
c. at least 2 accidents will occur?

2. The probability that a person dies from a certain respiratory infection is 0.002. Find
the probability that fewer than 5 in a random sample of 2000 so infected will die.
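
The Poisson formula and its use as an approximation to the binomial can both be checked numerically. The sketch below covers the two examples above; the function names are hypothetical.

```python
from math import comb, exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for X ~ Poi(µ)."""
    return exp(-mu) * mu**x / factorial(x)

def poisson_cdf(x, mu):
    """P(X ≤ x) for X ~ Poi(µ)."""
    return sum(poisson_pmf(i, mu) for i in range(x + 1))

# Example 1: µ = 3 accidents per month.
mu = 3
p_exactly_5 = poisson_pmf(5, mu)
p_less_than_3 = poisson_cdf(2, mu)
p_at_least_2 = 1 - poisson_cdf(1, mu)

# Example 2: X ~ Bi(2000, 0.002), approximated by Poi(µ = np = 4).
n, p = 2000, 0.002
approx_fewer_than_5 = poisson_cdf(4, n * p)
exact_fewer_than_5 = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(5))
print(round(p_exactly_5, 4), round(approx_fewer_than_5, 4), round(exact_fewer_than_5, 4))
```

With n large and p close to 0, the approximate and exact values should agree to about three decimal places.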

• Normal Approximation (Optional)

Normal Approximation to the Binomial

Theorem. If X ~ Bi(n, p) with mean np and variance npq, then the distribution of

$$Z = \frac{X - np}{\sqrt{npq}}$$

as n approaches ∞ will approximate the standard normal distribution.

Remarks:

1. The normal distribution gives a very good approximation of the Binomial distribution
when n is large and p is close to 1/2.

2. Since a continuous distribution (in this case, the Normal) is used to approximate a
discrete distribution, we must adjust for continuity. For example:

Let X ~ Bi(n, p).

$$P(X = a) \approx P\left(\frac{(a-0.5)-np}{\sqrt{npq}} < Z < \frac{(a+0.5)-np}{\sqrt{npq}}\right)$$

Example:

A certain pharmaceutical company knows that, on the average, 45% of a certain type of
pill has an ingredient that is below the minimum strength and thus unacceptable. What is
the probability that fewer than 10 in a sample of 200 pills will be unacceptable?
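
The continuity correction can be seen at work by comparing an exact binomial probability with its normal approximation. In this sketch the values n = 100, p = 0.5, and a = 50 are illustrative assumptions (not taken from the example above), chosen because p is close to 1/2.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_pmf(x, n, p):
    """Exact P(X = x) for X ~ Bi(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def normal_approx_point(a, n, p):
    """Continuity-corrected normal approximation to P(X = a)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    z = NormalDist()
    return z.cdf((a + 0.5 - mu) / sigma) - z.cdf((a - 0.5 - mu) / sigma)

# ASSUMPTION: illustrative values only.
n, p, a = 100, 0.5, 50
exact = binomial_pmf(a, n, p)
approx = normal_approx_point(a, n, p)
print(round(exact, 4), round(approx, 4))
```

Without the ±0.5 adjustment the normal curve would assign probability zero to the single point X = a.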
CHAPTER 7

Sampling Distributions

Definition. The probability distribution function of a statistic is called its sampling
distribution.

• A statistic (e.g. sample mean, sample standard deviation) is a random variable
whose value depends only on the observed sample and may vary from sample to
sample.

• The sampling distribution of a statistic will depend on the size of the population,
the size of the sample, and the method of choosing the sample.

• The standard deviation of the sampling distribution is called the standard error
of the statistic. It tells us the extent to which we expect the values of the statistic
to vary from different possible samples.

• The probability distribution of the sample mean X̄ is called the sampling
distribution of the mean.

Sampling Distribution of the Mean

Consider 4 observations making up the population values of a random variable X
having the probability distribution

f(x) = 1/4, x = 0, 1, 2, 3

Note that µ = E(X) = 3/2 and σ² = Var(X) = 5/4.

Suppose we list all possible samples of size 2, with replacement, and for each sample
compute the value of the sample mean, X̄:

No.   Sample   X̄       No.   Sample   X̄
1     0, 0     0.0     9     2, 0     1.0
2     0, 1     0.5     10    2, 1     1.5
3     0, 2     1.0     11    2, 2     2.0
4     0, 3     1.5     12    2, 3     2.5
5     1, 0     0.5     13    3, 0     1.5
6     1, 1     1.0     14    3, 1     2.0
7     1, 2     1.5     15    3, 2     2.5
8     1, 3     2.0     16    3, 3     3.0

Sampling Distribution of the Mean

X̄       f(X̄)

0       1/16
0.5     2/16
1.0     3/16
1.5     4/16
2.0     3/16
2.5     2/16
3.0     1/16

Note: E(X̄) = 3/2 and Var(X̄) = 5/8.
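
The table above can be reproduced by enumerating every ordered sample. This is a minimal sketch using exact fractions; the variable names are assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [0, 1, 2, 3]
samples = list(product(population, repeat=2))   # all ordered samples, with replacement
means = [Fraction(sum(s), len(s)) for s in samples]

# Sampling distribution of the mean: each sample is equally likely.
sampling_dist = {
    m: Fraction(count, len(samples)) for m, count in sorted(Counter(means).items())
}
mean_of_means = sum(m * p for m, p in sampling_dist.items())
var_of_means = sum((m - mean_of_means) ** 2 * p for m, p in sampling_dist.items())
print(mean_of_means, var_of_means)
```

The variance of the sample mean is half the population variance here, in line with σ²/n for n = 2.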

Theorems:

1. If all possible random samples of size n are drawn with replacement from a finite
population of size N with mean µ and standard deviation σ, then the sample mean
will have mean and variance given by:

E(X̄) = µ and Var(X̄) = σ²/n.

2. If all possible random samples of size n are drawn without replacement from a finite
population of size N with mean µ and standard deviation σ, then the sample mean
will have mean and variance given by:

$$E(\bar{X}) = \mu \quad \text{and} \quad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)$$

• The factor (N − n)/(N − 1) in the formula for the variance of X̄ is called the finite
population correction factor. For large N relative to the sample size n, this
factor will be close to 1 and the variance of X̄ is approximately equal to σ²/n.
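
The finite population correction can be verified on the same small population by sampling without replacement. The sketch below reuses the population 0, 1, 2, 3 with n = 2; this reuse and the variable names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations

population = [0, 1, 2, 3]
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# All ordered samples of size n drawn without replacement, each equally likely.
samples = list(permutations(population, n))
means = [Fraction(sum(s), n) for s in samples]
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

# Variance predicted by the theorem, including the correction factor.
var_from_formula = sigma2 / n * Fraction(N - n, N - 1)
print(var_of_means, var_from_formula)
```

Because N is small relative to n here, the correction factor (2/3) is far from 1 and the variance is noticeably smaller than σ²/n.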

3. Central Limit Theorem

If X̄ is the mean of a random sample of size n taken from a (large or infinite)
population with mean µ and variance σ², then the sampling distribution of X̄ is
approximately normal with mean E(X̄) = µ and variance Var(X̄) = σ²/n
when n is sufficiently large. Hence, the limiting form of the distribution of

$$Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$$

as n approaches infinity is the standard normal distribution.

• The normal approximation in the theorem will be good if n ≥ 30 regardless of the
shape of the population.

• If n < 30, the approximation is good only if the population is not too different
from the normal.

• If the distribution of the population is normal then the sampling distribution will
also be exactly normal, no matter how small the size of the sample.

Example:

An electrical firm manufactures electric light bulbs that have a length of life
which is normally distributed with mean and standard deviation equal to 500 and
50 hours, respectively. Find the probability that a random sample of 15 bulbs
will have an average life of less than 475 hours.
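
Since the bulb lifetimes are normal, the sample mean is exactly normal with standard error σ/√n, and the probability follows from standardizing. This sketch uses `statistics.NormalDist` instead of a z-table; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_below(x_bar, mu, sigma, n):
    """P(X̄ < x_bar) when X̄ is normal with mean µ and standard error σ/√n."""
    standard_error = sigma / sqrt(n)
    z = (x_bar - mu) / standard_error
    return NormalDist().cdf(z)

p_mean_below_475 = prob_sample_mean_below(475, mu=500, sigma=50, n=15)
print(round(p_mean_below_475, 4))
```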

4. The t-distribution.

If X̄ and S² are the mean and variance, respectively, of a random sample of size n
taken from a population which is normally distributed with mean µ and variance σ²,
then

$$T = \frac{\bar{X}-\mu}{S/\sqrt{n}}$$

is a random variable having the t-distribution with v = n − 1 degrees of freedom.

• Notation: T ~ t(v), with v = n − 1

• Comparison between the t-distribution and the standard normal distribution

1. Both are symmetric about zero

2. Both are bell-shaped, but the t-distribution is more variable

(i) t-values depend on the fluctuation of 2 quantities: X̄ and S²
(ii) z-values depend only on the changes in X̄ from sample to sample

3. When the sample size is large, i.e. n ≥ 30, the t-distribution can be well
approximated by the standard normal distribution.

• Area under the curve

Just like any continuous probability distribution, the probability that a random
sample produces a t-value falling between any two specified values is equal to the
area under the curve of the t-distribution between the two ordinates corresponding
to the specified values.

• Notation: t_α is the t-value leaving an area of α in the right tail of the
t-distribution. That is, if T ~ t(v), then t_α is such that P(T > t_α) = α.

• Since the t-distribution is symmetric about zero, t_{1−α} = −t_α.

Examples:

1. Find the following values on the t-table:
(a) t_{0.025} when v = 14.
(b) t_{0.99} when v = 10.

2. Find k such that P(k < T < 2.807) = 0.945 when T ~ t(23).

3. A manufacturing firm claims that the batteries used in their electronic games will last
an average of 30 hours. To maintain this average, 16 batteries are tested each month.
If the computed t-value falls between −t_{0.025} and t_{0.025}, the firm is satisfied with its
claim. What conclusion should the firm draw from a sample that has mean X̄ = 27.5
hours and standard deviation S = 5 hours? Assume the distribution of battery lives to
be approximately normal.
CHAPTER 8

Estimation

Definition. Statistical inference refers to methods by which one uses sample
information to make inferences or generalizations about a population.

Two Areas of Statistical Inference

1. Estimation
- point estimation
- interval estimation

2. Hypothesis Testing

8.1 BASIC CONCEPTS IN ESTIMATION

Point Estimation

Definition. An estimator is any statistic whose value is used to estimate an unknown
parameter. A realized value of an estimator is called an estimate.

For example, the sample mean X̄ is an estimator of the population mean µ.

Remarks:

1. An estimator is said to be unbiased if the average of the estimates it produces under
repeated sampling is equal to the true value of the parameter being estimated.

Examples: Under random sampling, the sample mean is an unbiased estimator of the
population mean, that is, E(X̄) = µ.

Under random sampling with replacement, S² is an unbiased estimator of
σ², but S on the other hand is a biased estimator of σ, with the bias
becoming insignificant for large samples.

2. A parameter can have more than one unbiased estimator. We would naturally choose
the unbiased estimator with the smallest variance.

Interval Estimation

Definition. An interval estimator of a population parameter is a rule that tells us how
to calculate two numbers based on sample data, forming an interval within
which the parameter is expected to lie. This pair of numbers, (a, b), is called
an interval estimate or confidence interval.

Example. The running times (in minutes) of a sample of films produced by Star-Regal
Theater are as follows: 103 94 110 87 98.

A 95% confidence interval for the mean running time of films produced by
Star-Regal Theater is (87.6, 109.2).

• The number 0.95 in the example is called the confidence coefficient or the
degree of confidence.

• The endpoints 87.6 and 109.2 are called the lower and upper confidence
limits.
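
The interval quoted in the example can be reproduced from the five running times. The notes do not show the computation, so the sketch below assumes the t-based interval that Section 8.2 gives for unknown σ, with the critical value t_{0.025} = 2.776 for v = 4 taken from a standard t-table.

```python
from math import sqrt
from statistics import mean, stdev

running_times = [103, 94, 110, 87, 98]
n = len(running_times)
x_bar = mean(running_times)
s = stdev(running_times)   # sample standard deviation (divisor n - 1)

# ASSUMPTION: t-interval with v = n - 1 = 4 degrees of freedom; 2.776 is the
# t-table value leaving an area of 0.025 in the right tail.
t_crit = 2.776
margin = t_crit * s / sqrt(n)
interval = (x_bar - margin, x_bar + margin)
print(round(interval[0], 1), round(interval[1], 1))
```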

Remarks:

1. In general, we construct a (1-α)100% confidence interval. The fraction (1-α) is called
the confidence coefficient, and the endpoints a and b are called the lower and upper
confidence limits, respectively.

2. Interpretation of (1-α)100% confidence interval:

If we take repeated samples of size n and if for each one of these samples we compute
the (1-α)100% confidence interval then (1-α)100% of the resulting confidence
intervals will contain the unknown value of the parameter.

3. The confidence coefficient is not “the probability that the true value of the parameter
falls in the interval estimate” since once a sample is drawn and a confidence interval
constructed, the resulting interval estimate either encloses the true value of the
parameter or it doesn’t. Rather, the confidence coefficient is “the probability that the
interval estimator encloses the true value of the parameter.”

4. A good confidence interval is one that is as narrow as possible and has a large
confidence coefficient, near 1. The narrower the interval, the more exactly we have
located the parameter; whereas, the larger the confidence coefficient, the more
confidence we have that a particular interval encloses the true value of the parameter.
However, for a fixed sample size, as the confidence coefficient increases, the length
of the interval also increases.

8.2 ESTIMATING THE MEAN

A point estimator of the population mean µ is the sample mean, X̄.

(1-α)100% Confidence Interval for µ

a. when σ is known

$$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

where z_{α/2} is the z-value leaving an area of α/2 to the right.

b. when σ is unknown

$$\left(\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}},\ \ \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}\right)$$

where t_{α/2} is the t-value with v = n − 1 degrees of freedom leaving an area of α/2 to
the right.

Remarks:

1. The above formulas hold strictly for random samples from a normal distribution.
However, they provide good approximate (1-α)100% confidence intervals when the
distribution is not normal provided the sample size is large, i.e. n > 30.

2. If σ² is unknown and n > 30, use

$$\left(\bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}},\ \ \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}\right)$$

where z_{α/2} is the z-value leaving an area of α/2 to the right.



Examples:

1. An electrical firm manufactures light bulbs that have a length of life that is normally
distributed, with a standard deviation of 40 hours. If a random sample of 25 bulbs has
a mean life of 780 hours, find a 95% confidence interval for the population mean of
all bulbs produced by this firm.

2. Regular consumption of presweetened cereals contributes to tooth decay, heart disease,
and other degenerative diseases, according to a study by Dr. M. Albreight of the
National Institute of Health and Dr. D. Solomon, Professor of Nutrition and Dietetics
at the University of London. In a random sample of 20 similar servings of Alpha-Bits,
the mean sugar content was 11.3 grams with a standard deviation of 2.45 grams.
Assuming that the sugar content is normally distributed, construct a 95% confidence
interval for the mean sugar content for single servings of Alpha-Bits.

3. A random sample of 100 automobile owners shows that an automobile is driven on
the average 23,500 kilometers per year, in the state of Virginia, with a standard
deviation of 3900 kilometers. Construct a 99% confidence interval for the average
number of kilometers an automobile is driven annually in Virginia.
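
The z-based interval can be computed without a table by asking the standard normal distribution for z_{α/2}. This sketch covers Examples 1 and 3 (Example 2 needs a t-value, which the Python standard library does not provide); the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """(1-α)100% confidence interval for µ using z_{α/2}."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # leaves an area of α/2 to the right
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 1: σ = 40 known, n = 25, sample mean 780.
bulbs = z_interval(780, 40, 25, 0.95)

# Example 3: σ unknown but n = 100 > 30, so S = 3900 stands in for σ.
kilometers = z_interval(23_500, 3_900, 100, 0.99)
print([round(v, 2) for v in bulbs], [round(v, 1) for v in kilometers])
```

Raising the confidence coefficient from 95% to 99% increases z_{α/2} and therefore widens the interval, as noted in the remarks of Section 8.1.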

8.3 ESTIMATING THE DIFFERENCE BETWEEN TWO POPULATION MEANS

If we have two populations with means µ1 and µ2 and standard deviations σ1 and
σ2, respectively, a point estimator of the difference between µ1 and µ2 is the
statistic X̄1 − X̄2.

Types of Sampling:

• selecting two independent samples
• paired sampling

Paired sampling is used to overcome the difficulty imposed by extraneous
differences between two groups when testing the difference between 2 means.
This is achieved by “matching” or studying 2 related samples. Matching may be
achieved by:

• using the same subject in the 2 samples
• pairing of subjects with respect to any extraneous variable which might
affect or influence the outcome.
ELEMENTARY STATISTICS 79

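(1-α)100% Confidence Interval for µ1 - µ2

• Based on Two Independent Samples

a. σ1² and σ2² known

\[
\left( (\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\;\;
(\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right)
\]

b. σ1² = σ2² but unknown

\[
\left( (\bar{X}_1 - \bar{X}_2) - t_{\alpha/2}(v)\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\;\;
(\bar{X}_1 - \bar{X}_2) + t_{\alpha/2}(v)\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right)
\]

where

\[
S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}
\qquad \text{and} \qquad v = n_1 + n_2 - 2
\]

c. σ1² ≠ σ2² but unknown

\[
\left( (\bar{X}_1 - \bar{X}_2) - t_{\alpha/2}(v)\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}},\;\;
(\bar{X}_1 - \bar{X}_2) + t_{\alpha/2}(v)\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \right)
\]

where

\[
v = \frac{\left( \dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2} \right)^2}
{\dfrac{\left( S_1^2 / n_1 \right)^2}{n_1 - 1} + \dfrac{\left( S_2^2 / n_2 \right)^2}{n_2 - 1}}
\]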

Remarks:

1. These formulas hold strictly for independent samples selected from Normal
populations. However, they provide good approximate (1-α)100% confidence
intervals when the distributions are not Normal provided both n1 and n2 are
greater than 30.

2. If σ1² and σ2² are unknown but n1 and n2 are greater than 30, use

\[
\left( (\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}},\;\;
(\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \right)
\]

3. Even if the population variances are considerably different, formula (b) will
still provide a good estimate provided that n1=n2 and both populations are
normal. Therefore, in a planned experiment, one should make every effort to
equalize the size of the samples.

Examples:

1. A statistics test was given to a random sample of 50 girls and another random
sample of 75 boys. The mean score of the girls is 80 with a standard deviation
of 4 and the mean score of the boys is 86 with a standard deviation of 6. Find
a 95% confidence interval for the difference µB - µG.

2. Students may choose between a 3-unit course in Physics without lab and a 4-
unit course with lab. The final written examination is the same for each
section. The mean score of a random sample of 12 students in the section
with lab is 84 with a standard deviation of 4, and the mean score of another
random sample of 18 students in the section without lab is 77 with a standard
deviation of 6. Find a 99% confidence interval for the difference between the
mean grades for the two courses. Assume the populations to be approximately
normally distributed with equal variances.

3. The following data represent the running time of a random sample of films
produced by two motion picture companies:

Time (minutes)

Company 1 103 94 110 87 98


Company 2 97 82 123 92 175 88 118

Compute a 90% confidence interval for the difference between the mean
running time of films produced by the two companies. Assume that the
running times for each of the companies are approximately normally
distributed with unequal variances.
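
As a rough sketch of how the formulas above are applied, the Python below works two pieces of these examples. For Example 1 it uses the large-sample interval of Remark 2 (both samples exceed 30). For Example 3 it computes only the degrees of freedom v of formula (c), since the t-value would still have to be read from a table. Variable names are illustrative, not part of the original notes.

```python
from math import sqrt
from statistics import NormalDist, variance

# Example 1: large samples, so S1 and S2 replace sigma1 and sigma2 (Remark 2)
z = NormalDist().inv_cdf(0.975)            # z_{alpha/2} for 95% confidence
diff = 86 - 80                             # boys' mean minus girls' mean
se = sqrt(6**2 / 75 + 4**2 / 50)           # sqrt(S_B^2/n_B + S_G^2/n_G)
ci_example1 = (diff - z * se, diff + z * se)

# Example 3: degrees of freedom for the unequal-variance interval, formula (c)
company1 = [103, 94, 110, 87, 98]
company2 = [97, 82, 123, 92, 175, 88, 118]
a = variance(company1) / len(company1)     # S1^2 / n1 (sample variance, n-1 divisor)
b = variance(company2) / len(company2)     # S2^2 / n2
welch_df = (a + b) ** 2 / (a**2 / (len(company1) - 1) + b**2 / (len(company2) - 1))

print(f"Example 1: ({ci_example1[0]:.2f}, {ci_example1[1]:.2f})")
print(f"Example 3: v = {welch_df:.2f}")
```

The value of v is generally not an integer; a common convention, assumed here rather than stated in these notes, is to round it down before consulting the t table.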

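• Based on Two Related/Paired Samples

\[
\left( \bar{d} - t_{\alpha/2}(v)\frac{S_d}{\sqrt{n}},\;\; \bar{d} + t_{\alpha/2}(v)\frac{S_d}{\sqrt{n}} \right)
\]

where d_i = x_i - y_i,

\[
\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n},
\qquad
S_d = \sqrt{\frac{n\sum_{i=1}^{n} d_i^2 - \left( \sum_{i=1}^{n} d_i \right)^2}{n(n-1)}}
\]

v = n - 1 and n = number of pairs.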



Examples:

1. It is claimed that a new diet will reduce a person’s weight by 4.5 kilograms on
the average in a period of 2 weeks. The weights of a random sample of 7
women who followed this diet were recorded before and after a 2-week
period:

Woman

1 2 3 4 5 6 7

Weight Before 58.5 60.3 61.7 69.0 64.0 62.6 56.7


Weight After 60.0 54.9 58.1 62.1 58.5 59.9 54.4

Compute a 95% confidence interval for the mean difference in the weight.
Assume the distribution of weights to be approximately normal.

2. Twenty college freshmen were divided into 10 pairs, each member of the pair
having approximately the same IQ. One of each pair was selected at random
and assigned to a mathematics section using programmed materials only. The
other member of each pair was assigned to a section in which the professor
lectured. At the end of the semester each group was given the same
examination and the following results were recorded.

Pair 1 2 3 4 5 6 7 8 9 10

Programmed 76 60 85 58 91 75 82 64 79 88
Materials

Lectures 81 52 87 70 86 77 90 63 85 83

Find a 98% confidence interval for the mean difference in scores of the two
learning procedures. Assume normality.
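
The paired-sample interval can be sketched for Example 1 (the diet data) as follows. The differences are taken as before minus after, which is one possible choice of direction; the critical value 2.447 is the tabulated t for α/2 = 0.025 with v = 6 and is entered by hand because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

before = [58.5, 60.3, 61.7, 69.0, 64.0, 62.6, 56.7]
after = [60.0, 54.9, 58.1, 62.1, 58.5, 59.9, 54.4]

d = [b - a for b, a in zip(before, after)]   # d_i = x_i - y_i
n = len(d)                                   # number of pairs
d_bar = mean(d)
s_d = stdev(d)                               # sample standard deviation of the d_i

t_crit = 2.447                               # t_{0.025}(v = 6), from a t table
margin = t_crit * s_d / sqrt(n)
ci = (d_bar - margin, d_bar + margin)

print(f"d-bar = {d_bar:.3f}, S_d = {s_d:.3f}")
print(f"95% CI for the mean weight loss: ({ci[0]:.2f}, {ci[1]:.2f}) kg")
```

`stdev` uses the n - 1 divisor, so it agrees with the computing formula for S_d given above.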

8.4. ESTIMATING PROPORTIONS

In a binomial experiment a point estimator of the proportion p is \(\hat{p} = X/n\), where X
represents the number of successes in n trials.

If the unknown proportion is not expected to be too close to 0 or 1 and n is large,
an approximate (1-α)100% confidence interval for p is given by

\[
\left( \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\;\; \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \right)
\]

where \(\hat{q} = 1 - \hat{p}\).

Example:

In a random sample of 200 students who enrolled in Math 17, 138 passed on their
first take. Construct a 95% confidence interval for the population proportion of
students who passed Math 17 on their first take.
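
A minimal Python sketch of this interval for the Math 17 example follows; the variable names are illustrative and the z-value comes from the standard library's `NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

x, n = 138, 200                      # successes and sample size
p_hat = x / n                        # point estimate of p
q_hat = 1 - p_hat

z = NormalDist().inv_cdf(0.975)      # z_{alpha/2} for 95% confidence
margin = z * sqrt(p_hat * q_hat / n)
ci = (p_hat - margin, p_hat + margin)

print(f"p-hat = {p_hat:.2f}, 95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```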

8.5 ESTIMATING THE DIFFERENCE OF TWO PROPORTIONS

Given 2 independent random samples of size n1 and n2, a point estimator of the
difference between the two proportions p1 and p2 is given by

\[
\hat{p}_1 - \hat{p}_2 = \frac{X}{n_1} - \frac{Y}{n_2},
\]

where X is the number of successes in n1 trials (first sample) and Y is the number
of successes in n2 trials (second sample).

An approximate (1-α)100% confidence interval for p1 - p2 when n1 and n2 are
large is

\[
\left( (\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}},\;\;
(\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \right)
\]

Example:

In a random sample of 200 students, 78 of the 120 females and 60 of the 80 males
passed Math 17 on their first take. Construct a 95% confidence interval for p1- p2,
where p1 and p2 are the true proportions of females and males, respectively, who
passed Math 17 on their first take.
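
The same calculation for two samples can be sketched as below, using the female and male counts from the example. Names such as `p1_hat` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

x, n1 = 78, 120                      # females who passed, females sampled
y, n2 = 60, 80                       # males who passed, males sampled
p1_hat, p2_hat = x / n1, y / n2

z = NormalDist().inv_cdf(0.975)      # z_{alpha/2} for 95% confidence
se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
diff = p1_hat - p2_hat
ci = (diff - z * se, diff + z * se)

print(f"p1-hat - p2-hat = {diff:.2f}, 95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

The resulting interval contains 0, so at this confidence level the data do not rule out equal pass rates for the two groups.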

8.6 SAMPLE SIZE DETERMINATION

Sample Size for Estimating µ

In random sampling, if \(\bar{X}\) will be used to estimate µ, we can be (1-α)100%
confident that the error will not exceed a specified amount, e, when the
sample size is

\[
n = \left( \frac{z_{\alpha/2}\,\sigma}{e} \right)^2
\]

Example:

An electrical firm manufactures light bulbs that have a length of life that is
approximately normally distributed, with a standard deviation of 40 hours. How
large a sample is needed if we wish to be 95% confident that the sample mean will
be within 10 hours of the true mean?
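
A short sketch of this calculation in Python is given below. Rounding the result up to the next whole number is the usual convention and is assumed here; the helper name `sample_size_mean` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, e, conf):
    """Smallest whole n with (z_{alpha/2} * sigma / e)^2 <= n."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sigma / e) ** 2)

n_bulbs = sample_size_mean(sigma=40, e=10, conf=0.95)
print(n_bulbs)
```

Here (1.96 × 40 / 10)² is about 61.5, so a sample of 62 bulbs is needed.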

Sample Size for Estimating p

If \(\hat{p}\) will be used to estimate p, then we can be (1-α)100% confident that the error
will not exceed a specified amount, e, when the sample size is

\[
n = \frac{z_{\alpha/2}^2\, pq}{e^2}
\]

When the value of p is unknown or cannot be approximated, then using p=0.5
produces the maximum value of pq=0.25. Hence a conservative formula for the
sample size is

\[
n = \frac{z_{\alpha/2}^2}{4e^2}
\]

Example:

Use the conservative formula to determine the sample size needed if we want to
be 95% confident that our estimate of p is within 0.05 of the true value.
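
The conservative formula can be sketched as follows, again assuming the usual convention of rounding up to a whole number. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def conservative_sample_size(e, conf):
    """Sample size for estimating p using the worst case pq = 0.25."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z**2 / (4 * e**2))

n_needed = conservative_sample_size(e=0.05, conf=0.95)
print(n_needed)
```

With z of about 1.96 the formula gives 1.96² / (4 × 0.05²), or about 384.2, which rounds up to 385.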
CHAPTER 9

Tests of Hypothesis

9.1 BASIC CONCEPTS OF STATISTICAL HYPOTHESIS TESTING

Definition of Terms

1. A statistical hypothesis is an assertion or conjecture concerning one or more
populations.

2. The null hypothesis (Ho) is the hypothesis that is being tested; it represents what the
experimenter doubts to be true.

3. The alternative hypothesis (Ha) is the operational statement of the theory that the
experimenter believes to be true and wishes to prove. It is the contradiction of the
null hypothesis.

4. A one-tailed test of hypothesis is a test where the alternative hypothesis specifies a
one-directional difference for the parameter of interest.

Examples:

a. Ho: µ = 14 vs. Ha: µ > 14


b. Ho: µ = 14 vs. Ha: µ < 14
c. Ho: µ1 - µ2 = 0 vs. Ha: µ1 - µ2 > 0
d. Ho: µ1 - µ2 = 0 vs. Ha: µ1 - µ2 < 0

A two-tailed test of hypothesis is a test where the alternative hypothesis does not
specify a directional difference for the parameter of interest.

Examples:

a. Ho: µ = 14 vs. Ha: µ ≠ 14


b. Ho: µ1 - µ2 = 0 vs. Ha: µ1 - µ2 ≠ 0

5. A test statistic is a statistic whose value is calculated from sample measurements and
on which the statistical decision will be based.

6. The critical region or rejection region is the set of values of the test statistic for
which the null hypothesis will be rejected. The acceptance region is the set of values
of the test statistic for which the null hypothesis will not be rejected. The acceptance
and rejection regions are separated by a critical value of the test statistic.

7. The Type I error is the error made by rejecting the null hypothesis when it is true.
The probability of a Type I error is denoted by α.

The Type II error is the error made by accepting (not rejecting) the null hypothesis
when it is false. The probability of a Type II error is denoted by β.

                         Null Hypothesis
Decision          True                 False

Reject Ho         Type I error         Correct decision

Accept Ho         Correct decision     Type II error

8. The level of significance, α, is the maximum probability of Type I error the
researcher is willing to commit.

Steps in Hypothesis Testing

1. State the null hypothesis (Ho) and the alternative hypothesis (Ha).
2. Choose the level of significance α.
3. Select the appropriate test statistic and establish the critical region.
4. Collect the data and compute the value of the test statistic from the sample data.
5. Make the decision. Reject Ho if the value of the test statistic belongs in the critical
region. Otherwise, do not reject Ho.

9.2 TESTING A HYPOTHESIS ON THE POPULATION MEAN

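a. σ known

   Ho: µ = µo

   Test Statistic:
   \[ Z = \frac{\bar{X} - \mu_o}{\sigma / \sqrt{n}} \]

   Ha            Critical Region
   µ < µo        z < - zα
   µ > µo        z > zα
   µ ≠ µo        | z | > zα/2

b. σ unknown

   Ho: µ = µo

   Test Statistic:
   \[ t = \frac{\bar{X} - \mu_o}{S / \sqrt{n}}, \qquad \upsilon = n - 1 \]

   Ha            Critical Region
   µ < µo        t < - tα
   µ > µo        t > tα
   µ ≠ µo        | t | > tα/2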

Remarks:

1. The above tests are exact α-level tests for samples from a normal distribution.
However, they provide good approximate α-level test when the distribution is not
normal provided that the sample size is large, i.e. n > 30.

2. If σ is unknown and n > 30, use the test in (a) replacing the test statistic by

\[ Z = \frac{\bar{X} - \mu_o}{S / \sqrt{n}} \]

Examples:

1. Test Ho: µ=50 vs. Ha: µ≠50 if a random sample of 16 subjects had mean 48 and
standard deviation of 5.8 at 0.05 level of significance. Assume that the sample was
taken from a Normal population with standard deviation of 6.

2. It is claimed that an automobile is driven on the average less than 25,000 kilometers
per year. To test this claim, a random sample of 100 automobile owners are asked to
keep a record of the kilometers they travel. Would you agree with this claim if the
random sample showed an average of 23,500 kilometers and a standard deviation of
3,900 kilometers? Use a 0.01 level of significance.

3. According to Dietary Goals for the United States (1977), high sodium intake may be
related to ulcers, stomach cancer, and migraine headaches. The human requirement
for salt is only 230 milligrams per day, which is surpassed in most single servings of
ready-to-eat cereals. A random sample of 20 similar servings of Special K had mean
sodium content of 244 milligrams of sodium and a standard deviation of 24.5
milligrams. Is there sufficient evidence to believe that the average sodium content for
single servings of Special K exceeds the human requirement for salt at α=0.025? at α
= 0.05? at α = 0.10? Assume normality.
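
The five steps of hypothesis testing can be traced in code for Example 2 (Ho: µ = 25,000 vs. Ha: µ < 25,000). This is a sketch only: since σ is unknown and n = 100 > 30, it uses the large-sample Z statistic of Remark 2, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Steps 1-2: Ho: mu = 25000 vs Ha: mu < 25000, alpha = 0.01
mu_0, alpha = 25000, 0.01

# Step 3: Z statistic with S in place of sigma; critical region is z < -z_alpha
z_crit = NormalDist().inv_cdf(alpha)          # equals -z_alpha, about -2.326

# Step 4: compute the test statistic from the sample
x_bar, s, n = 23500, 3900, 100
z_stat = (x_bar - mu_0) / (s / sqrt(n))

# Step 5: decision
reject_h0 = z_stat < z_crit
print(f"z = {z_stat:.3f}, critical value = {z_crit:.3f}, reject Ho: {reject_h0}")
```

The statistic is about -3.85, which lies in the critical region, so Ho is rejected in favor of the claim that the average is less than 25,000 kilometers.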

The following remarks hold for any test:

1. For the same data set, as α increases the size of the critical region also increases.
Consequently, if Ho is rejected at α-level of significance then Ho will also be rejected
at a higher level of significance using the same data. For example, if Ho is rejected at
α = 0.05 then testing at α = 0.1 will also lead to the rejection of Ho. However, Ho
will not necessarily be rejected at α = 0.01.

2. The Type I error and Type II error are related. For a fixed sample size n, a decrease in
the probability of one will result in an increase in the probability of the other.
However, increasing the sample size will result in the reduction of both probabilities.

3. An alternative way to report the results of the test is to report the p-value. The p-
value is the smallest value of α for which Ho will be rejected based on sample
information. Reporting the p-value will allow the reader of the published research to
evaluate the extent to which the data disagree with Ho. In particular, it enables each
reader to choose their personal value of α.

If p-value < α then Ho is rejected. Otherwise, Ho is not rejected.
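
To illustrate the p-value rule, the sketch below revisits the automobile example of Section 9.2 (sample mean 23,500, standard deviation 3,900, n = 100, Ha: µ < 25,000). For a left-tailed Z test the p-value is the area to the left of the observed statistic; the list of α values is chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

z_stat = (23500 - 25000) / (3900 / sqrt(100))
p_value = NormalDist().cdf(z_stat)            # left-tail area for Ha: mu < mu_0

# Compare the same p-value against several significance levels
decisions = {alpha: p_value < alpha for alpha in (0.01, 0.05, 0.10)}
print(f"p-value = {p_value:.6f}")
print(decisions)
```

Because the p-value is far below 0.01, Ho is rejected at every level listed, in line with Remark 1: rejection at a given α implies rejection at any larger α.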

9.3. TESTING THE DIFFERENCE BETWEEN TWO POPULATION MEANS

• Based on 2 independent samples

In each case Ho: µ1 - µ2 = do.

a. σ1² and σ2² known

   Test Statistic:
   \[ Z = \frac{(\bar{X}_1 - \bar{X}_2) - d_o}{\sqrt{(\sigma_1^2 / n_1) + (\sigma_2^2 / n_2)}} \]

   Ha                  Critical region
   µ1 - µ2 < do        z < - zα
   µ1 - µ2 > do        z > zα
   µ1 - µ2 ≠ do        | z | > zα/2

b. σ1² = σ2² but unknown

   Test Statistic:
   \[ t = \frac{(\bar{X}_1 - \bar{X}_2) - d_o}{S_p\sqrt{(1 / n_1) + (1 / n_2)}}, \qquad \upsilon = n_1 + n_2 - 2 \]

   where
   \[ S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \]

   Ha                  Critical region
   µ1 - µ2 < do        t < - tα
   µ1 - µ2 > do        t > tα
   µ1 - µ2 ≠ do        | t | > tα/2

c. σ1² ≠ σ2² and unknown

   Test Statistic:
   \[ t = \frac{(\bar{X}_1 - \bar{X}_2) - d_o}{\sqrt{(S_1^2 / n_1) + (S_2^2 / n_2)}} \]

   with
   \[ \upsilon = \frac{\left( S_1^2 / n_1 + S_2^2 / n_2 \right)^2}{\dfrac{(S_1^2 / n_1)^2}{n_1 - 1} + \dfrac{(S_2^2 / n_2)^2}{n_2 - 1}} \]

   Ha                  Critical region
   µ1 - µ2 < do        t < - tα
   µ1 - µ2 > do        t > tα
   µ1 - µ2 ≠ do        | t | > tα/2

• Based on 2 related samples

   Ho: µD = do

   Test Statistic:
   \[ t = \frac{\bar{d} - d_o}{S_d / \sqrt{n}}, \qquad \upsilon = n - 1 \]

   Ha            Critical region
   µD < do       t < - tα
   µD > do       t > tα
   µD ≠ do       | t | > tα/2

Remark: The remarks made in Chapter 8.3 relative to the use of a given statistic apply
to the tests described here.

Examples:

1. A statistics test was given to 50 girls and 75 boys. The girls made an average of 80
with a standard deviation of 4 and the boys had an average of 86 with a standard
deviation of 6. Is there sufficient evidence at 0.05 level of significance that the
average grades of girls and boys differ?

2. A study was made to determine if the subject matter in a physics course is better
understood when a lab constitutes part of the course. Students were allowed to
choose between a 3-unit course without lab and a 4-unit course with lab. In the
section with lab, a sample of 11 students had an average grade of 85 with a standard
deviation of 4.7, and in the section without lab, a sample of 17 students had an
average grade of 79 with a standard deviation of 6.1. Would you say that the
laboratory course increases the average grade by more than 5 points? Use a 0.01 level
of significance and assume the populations to be approximately normally distributed
with equal variances.

3. The following data represent the running time of films produced by two motion
picture companies:
Time (minutes)
Company 1 103 94 110 87 98
Company 2 97 82 123 92 175 88 118

Test the hypothesis that the average running time of films produced by company 2
exceeds the average running time of films produced by company 1 by 10 minutes
against the one-sided alternative that the difference is more than 10 minutes. Use a
0.1 level of significance and assume the distributions of times to be approximately
normal with unequal variances.

4. A taxi company is trying to decide whether the use of radial tires instead of regular
belted tires improves fuel economy. Twelve cars were driven twice over a prescribed
test course, each time using a different type of tires (radial and belted) in random
order. The mileage, in kilometers per liter, were recorded as follows:

Kilometers per liter

Cars Radial Tires Belted Tires

1 4.2 4.1
2 4.7 4.9
3 6.6 6.2
4 7.0 6.9
5 6.7 6.8
6 4.5 4.4
7 5.7 5.7
8 6.0 5.8
9 7.4 6.9
10 4.9 4.7
11 6.1 6.0
12 5.2 4.9

At the 0.025 level of significance, can we conclude that cars equipped with radial tires
give better fuel economy than those equipped with belted tires? Assume the
populations to be normally distributed.
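
Example 4 is a paired design, so the related-samples t statistic applies with do = 0 and Ha: µD > 0, where D = radial minus belted. The sketch below is one way to carry out the arithmetic; the critical value 2.201 is the tabulated t for α = 0.025 with υ = 11 and is typed in by hand.

```python
from math import sqrt
from statistics import mean, stdev

radial = [4.2, 4.7, 6.6, 7.0, 6.7, 4.5, 5.7, 6.0, 7.4, 4.9, 6.1, 5.2]
belted = [4.1, 4.9, 6.2, 6.9, 6.8, 4.4, 5.7, 5.8, 6.9, 4.7, 6.0, 4.9]

d = [r - b for r, b in zip(radial, belted)]   # one difference per car
n = len(d)
t_stat = (mean(d) - 0) / (stdev(d) / sqrt(n)) # do = 0

t_crit = 2.201                                # t_{0.025}, v = 11, from a t table
reject_h0 = t_stat > t_crit
print(f"d-bar = {mean(d):.4f}, S_d = {stdev(d):.4f}, t = {t_stat:.3f}")
print("Reject Ho" if reject_h0 else "Do not reject Ho")
```

The computed t of about 2.48 exceeds 2.201, so at the 0.025 level the data support better fuel economy with radial tires.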

9.4 TESTING A HYPOTHESIS ON PROPORTIONS

Consider the problem of testing the hypothesis that the proportion of successes in a
binomial experiment equals some specified value.

If the unknown proportion is not expected to be too close to 0 or 1 and n is large, a
large sample approximation is given by:

   Ho: p = po

   Test Statistic:
   \[ Z = \frac{x - np_o}{\sqrt{np_o q_o}} \]

   Ha          Critical region
   p < po      z < - zα
   p > po      z > zα
   p ≠ po      | z | > zα/2

Example:

A commonly prescribed drug on the market for relieving nervous tension is
believed to be only 60% effective. Experimental results with a new drug
administered to a random sample of 100 adults who were suffering from nervous
tension showed that 70 received relief. Is this sufficient evidence to conclude that the
new drug is superior to the one commonly prescribed? Use a 0.05 level of
significance.
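
A brief sketch of this test in Python, with Ho: p = 0.6 against Ha: p > 0.6, is shown below. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n = 70, 100                  # adults relieved, adults sampled
p_0 = 0.6                       # proportion under Ho
q_0 = 1 - p_0
alpha = 0.05

z_stat = (x - n * p_0) / sqrt(n * p_0 * q_0)
z_crit = NormalDist().inv_cdf(1 - alpha)      # z_alpha for a right-tailed test

reject_h0 = z_stat > z_crit
print(f"z = {z_stat:.3f}, z_alpha = {z_crit:.3f}, reject Ho: {reject_h0}")
```

Since z is about 2.04 and z_alpha is about 1.645, Ho is rejected at the 0.05 level.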

9.5 TESTING THE DIFFERENCE BETWEEN TWO PROPORTIONS

Consider a situation in which a researcher wishes to compare the proportions of an
attribute between two populations. For example, he is interested in assessing whether the
proportion of female household heads is greater in urban areas than in rural localities; or a
marketing manager would consider packaging a product towards working mothers if,
based on a planned research, the proportion of potential purchasers is higher in this group
compared to the group of non-working mothers. Thus, the researcher is, in general,
interested in testing the null hypothesis Ho: p1 = p2, where p1 and p2 are the two
population proportions of interest.

The testing procedure involves selection of independent samples of size n1 and n2
from two binomial populations. The sample proportions \(\hat{p}_1\) and \(\hat{p}_2\) are computed and the
common (population) proportion p is given as the pooled estimate

\[ \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} \]

where x1 and x2 are the observed number of units possessing the attribute of interest in the two
samples. The test is as follows:

   Ho: p1 = p2

   Test Statistic:
   \[ Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \]

   Ha           Critical region
   p1 < p2      z < - zα
   p1 > p2      z > zα
   p1 ≠ p2      | z | > zα/2

Example:

In a survey of 200 students, 78 of the 120 females in the sample passed Math 17
on their first take while this figure is 60 among the 80 males. Will you agree that the
proportion of males who passed Math 17 on their first take is higher than the
proportion of females who passed the same course on their first take? Test at α=0.05.
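
The pooled two-proportion test can be sketched for this example as follows. Females are taken as population 1 and males as population 2, so the claim that males pass at a higher rate corresponds to Ha: p1 < p2; this labeling is a choice made here for illustration.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 78, 120                # females who passed, females sampled
x2, n2 = 60, 80                 # males who passed, males sampled
alpha = 0.05

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
q_pool = 1 - p_pool

z_stat = (p1_hat - p2_hat) / sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))
z_crit = NormalDist().inv_cdf(alpha)          # -z_alpha for a left-tailed test

reject_h0 = z_stat < z_crit
print(f"z = {z_stat:.3f}, -z_alpha = {z_crit:.3f}, reject Ho: {reject_h0}")
```

The statistic is about -1.50, which does not fall below -1.645, so Ho is not rejected at α = 0.05.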

9.6. TEST FOR INDEPENDENCE

The test for independence is used to determine whether two variables are related or
not. For example, we might test whether a person’s music preference is related to his
intelligence as measured by IQ. We then take a random sample and for each subject
determine his music preference and classify his IQ into different categories (high,
medium, low). The observed frequencies are presented in what is known as a
contingency table shown below:

Music IQ
Preference High Medium Low Total
Classical 40 26 17 83
Pop 47 59 25 131
Rock 83 104 79 266
Total 170 189 121 480

A contingency table containing r rows and c columns is referred to as an r x c table.
The row and column totals are called marginal frequencies. Note that in a test for
independence, these marginal frequencies are not fixed in advance but depend instead on
the way the sample distributed itself across the various cells in the table.

Procedure:

1. State the null and alternative hypothesis.

Ho: The two variables are independent


Ha: The two variables are not independent.

2. Choose the level of significance.

3. Compute the test statistic, given by

\[ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

where Oij = observed number of cases in the ith row of the jth column
      Eij = expected number of cases under Ho

\[ E_{ij} = \frac{(\text{column total}) \times (\text{row total})}{\text{grand total}} \]

4. Decision Rule: Reject Ho if \(\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)}\).



Remarks:

1. The test is valid if at least 80% of the cells have expected frequencies of at least 5 and
no cell has an expected frequency ≤ 1.

2. If many expected frequencies are very small, researchers commonly combine
categories of variables to obtain a table having larger cell frequencies. Generally, one
should not pool categories unless there is a natural way to combine them.

3. For a 2x2 contingency table, a correction called Yates’ correction for continuity is
applied. The formula then becomes

\[ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left( \left| O_{ij} - E_{ij} \right| - 0.5 \right)^2}{E_{ij}} \]

Example:

Using the table above:

Ho: Music preference and intelligence are independent


Ha: Music preference and intelligence are not independent

Music IQ
Preference High Medium Low Total

Classical 40 (29.4) 26 (32.7) 17 (20.9) 83


Pop 47 (46.4) 59 (51.6) 25 (33.0) 131
Rock 83 (94.2) 104 (104.7) 79 (67.1) 266

Total 170 189 121 480

\[ \chi^2 = \sum_{i=1}^{3} \sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 12.38 \]

At α = 0.05 with (3-1)(3-1) = 4 degrees of freedom, \(\chi^2_{0.05,\,4} = 9.488\).

Decision: Since 12.38 > 9.488, reject Ho. There is sufficient evidence at 0.05
level of significance that music preference and intelligence are not
independent.
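
The worked example can be reproduced with a short Python sketch. Expected frequencies are computed from the marginal totals without rounding, so the statistic comes out near 12.4 rather than exactly the 12.38 obtained above from expected values rounded to one decimal place. The critical value 9.488 is taken from the text.

```python
observed = [
    [40, 26, 17],     # Classical: High, Medium, Low IQ
    [47, 59, 25],     # Pop
    [83, 104, 79],    # Rock
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o_ij in enumerate(row):
        e_ij = row_totals[i] * col_totals[j] / grand_total   # expected count under Ho
        chi_sq += (o_ij - e_ij) ** 2 / e_ij

df = (len(observed) - 1) * (len(observed[0]) - 1)            # (r-1)(c-1)
critical_value = 9.488                                       # chi-square, alpha = 0.05, 4 df

print(f"chi-square = {chi_sq:.2f}, df = {df}")
print("Reject Ho" if chi_sq > critical_value else "Do not reject Ho")
```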
CHAPTER 10

Regression and Correlation

10.1 Correlation Coefficient

Definition. The linear correlation coefficient, denoted by ρ (rho), is a measure of the
strength of the linear relationship existing between two variables, X and Y,
that is independent of their respective scales of measurement.

Remarks:
• -1 ≤ ρ ≤ 1
• A positive ρ means that the line slopes upward to the right; a negative ρ means
that it slopes downward to the right.
• When ρ is 1 or –1, there is perfect linear relationship between X and Y and all
the points (x,y) fall on a straight line. A ρ close to 1 or –1 indicates a strong
linear relationship but it does not necessarily imply that X causes Y or Y
causes X. It is possible that a third variable may have caused the change in
both x and y, producing the observed relationship.
• If ρ = 0 then there is no linear correlation between X and Y. A value of ρ = 0,
however, does not mean a lack of association. For example, if a strong quadratic
relationship exists between X and Y, the correlation may still be zero even
though the two variables are clearly related in a nonlinear way.

Definition. The Pearson product moment coefficient of correlation, denoted by r, is

\[
r = \frac{n\sum_{i=1}^{n} X_i Y_i - \left( \sum_{i=1}^{n} X_i \right)\left( \sum_{i=1}^{n} Y_i \right)}
{\sqrt{\left[ n\sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2 \right]\left[ n\sum_{i=1}^{n} Y_i^2 - \left( \sum_{i=1}^{n} Y_i \right)^2 \right]}}
\]

Remarks:
• r is used to estimate ρ based on a random sample of n pairs of measurements
(Xi, Yi), i=1,…,n.
• -1 ≤ r ≤ 1
• Just like ρ, when r = 1 or –1, all the points (xi,yi), i=1,…,n, fall on a straight
line; when r=0, they are scattered and give no evidence of a linear relationship.
Any other value of r suggests the degree to which the points tend to be linearly
related.

Some Typical Scatterplots with Approximate Values of r:

(a) Strong positive linear correlation; r is near 1
(b) Strong negative linear correlation; r is near -1
(c) No apparent linear correlation; r is near 0
(d) Quadratic relation; r is near 0

[Figure: four scatterplots of y versus x, one for each case above.]

Example: Consider the data given below. Let X represent the lot size and Y
represent the man hours required.

Observation No.   Lot Size (X)   Man Hours (Y)
1 30 73
2 20 50
3 60 128
4 80 170
5 40 87
6 50 108
7 60 135
8 30 69
9 70 148
10 60 132

Construct the scatterplot and compute r.

[Figure: Scatter Plot of Lot Size versus Man Hours. Horizontal axis: lot size (0 to 90); vertical axis: man hours (0 to 180).]

ΣX = 500
ΣY = 1100
ΣXY = 61800
Σ X2 = 28400
Σ Y2 = 134660
r = 0.99780
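
The sums and the value of r above can be checked with a short Python sketch of the computing formula. Variable names are illustrative.

```python
from math import sqrt

lot_size = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]           # X
man_hours = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]    # Y

n = len(lot_size)
sum_x = sum(lot_size)
sum_y = sum(man_hours)
sum_xy = sum(x * y for x, y in zip(lot_size, man_hours))
sum_x2 = sum(x * x for x in lot_size)
sum_y2 = sum(y * y for y in man_hours)

# Pearson product moment coefficient of correlation
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2)
)
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(f"r = {r:.5f}")
```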

10.2 Testing the Correlation Coefficient

   Ho: ρ = 0

   Test Statistic:
   \[ t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}, \qquad v = n - 2 \]

   Ha          Critical Region
   ρ < 0       t < -tα
   ρ > 0       t > tα
   ρ ≠ 0       | t | > tα/2
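
As an illustration, the sketch below applies this test to the lot size and man hours data of Section 10.1 with the two-sided alternative ρ ≠ 0. The level α = 0.05 is an assumption made for this illustration (the notes do not specify one for these data), and 2.306 is the tabulated t for α/2 = 0.025 with v = 8.

```python
from math import sqrt

lot_size = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
man_hours = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]
n = len(lot_size)

sum_x, sum_y = sum(lot_size), sum(man_hours)
sum_xy = sum(x * y for x, y in zip(lot_size, man_hours))
sum_x2 = sum(x * x for x in lot_size)
sum_y2 = sum(y * y for y in man_hours)
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2)
)

t_stat = r * sqrt(n - 2) / sqrt(1 - r**2)
t_crit = 2.306                    # t_{0.025}, v = n - 2 = 8, from a t table (assumed alpha = 0.05)
reject_h0 = abs(t_stat) > t_crit
print(f"r = {r:.4f}, t = {t_stat:.2f}, reject Ho: {reject_h0}")
```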

10.3 Simple Linear Regression

Equation of a Straight Line

y = βo + β1x

where βo = y-intercept; the value of y when x = 0
      β1 = slope of the line; change in y for a 1-unit increase in x

Deterministic Model vs. Probabilistic Model

The linear model y = βo + β1x is said to be a deterministic mathematical model
because, when a value of x is substituted into the equation, the value of y is
determined and no allowance is made for error.

In contrast, the linear model y = βo + β1x + ε (where ε is a random error, the
difference between an observed value of y and the mean value of y for a given value
of x) is said to be a probabilistic mathematical model because this model
assumes that for any given value of x the observed value of y varies in a random
manner and possesses a probability distribution with mean E(Y|X=x) = βo + β1x.

Definition: The simple linear regression model is given by:

Y = βo + β1X + ε

where Y = response variable
      X = explanatory or predictor variable
      ε = random error
      βo = y-intercept
      β1 = slope of the line

Linear regression models that involve two or more explanatory variables are
called multiple regression models.

Assumptions of the Model

For any given value x, the response variable Y possesses a normal distribution,
with a mean value given by the equation E(Y|X=x) = βo + β1x and with a variance
of σ2. Furthermore, any one value of Y is independent of every other value.

Estimating βo and β1

The formulas for bo (estimate of βo) and b1 (estimate of β1) are derived using the
method of least squares where the “best-fitting” line is selected as the one that
minimizes the sum of squares of the deviations of the observed value of Y from
those predicted by the model. The formulas are

\[
b_1 = \frac{n\sum_{i=1}^{n} X_i Y_i - \left( \sum_{i=1}^{n} X_i \right)\left( \sum_{i=1}^{n} Y_i \right)}
{n\sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}
\]

\[
b_o = \bar{y} - b_1\bar{x}
\]

Predicting the Value of Y Given X=x

The predicted value of Y, denoted by ŷ, is computed by substituting x in the
prediction equation

\[ \hat{y} = b_o + b_1 x \]

Remarks:

• The calculated prediction equation is appropriate only for the relevant range of
X that includes all values of X used in developing the regression model.
Hence, when predicting Y for a given value of X, one may interpolate only
within this relevant range of the X values. Extrapolation in predicting Y for
values of X outside the relevant range would result in a serious prediction
error.

• If X = 0 is not included in the range of the sample data, then bo will not have a
meaningful interpretation.
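
The least squares formulas can be sketched in Python using the lot size and man hours data of Section 10.1. The prediction at x = 55 is an illustrative value chosen inside the observed range of lot sizes (20 to 80), in keeping with the remark on extrapolation.

```python
lot_size = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]           # X
man_hours = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]    # Y
n = len(lot_size)

sum_x, sum_y = sum(lot_size), sum(man_hours)
sum_xy = sum(x * y for x, y in zip(lot_size, man_hours))
sum_x2 = sum(x * x for x in lot_size)

# Least squares estimates of the slope and the y-intercept
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
b0 = sum_y / n - b1 * sum_x / n

def predict(x):
    """Predicted man hours for a lot size x within the sampled range."""
    return b0 + b1 * x

print(f"y-hat = {b0} + {b1} x")
print(predict(55))
```

For these data the fitted line is ŷ = 10 + 2x, so each additional unit of lot size is associated with an estimated 2 additional man hours.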

Coefficient of Determination

The coefficient of determination is defined as the proportion of the variability in the
observed values of Y that can be explained by X. Denoted by R², this coefficient is
nothing but the square of the correlation coefficient between X and Y.

Inferences Concerning the Slope of the Line, β1

An estimator for σ² is

\[
S^2 = \frac{SSE}{n - 2} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}
\]

where SSE stands for sum of squares of errors.

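A (1-α)100% Confidence Interval for β1 is

\[
\left( b_1 - t_{\alpha/2}(v = n - 2)\, s_{b_1},\;\; b_1 + t_{\alpha/2}(v = n - 2)\, s_{b_1} \right)
\]

where

\[
s_{b_1} = \sqrt{\frac{s^2}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left( \sum_{i=1}^{n} X_i \right)^2}{n}}}
\]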

Test of Hypothesis Concerning β1

   Ho: β1 = 0

   Test Statistic:
   \[ t = \frac{b_1}{s_{b_1}}, \qquad v = n - 2 \]

   Ha          Critical Region
   β1 < 0      t < -tα
   β1 > 0      t > tα
   β1 ≠ 0      | t | > tα/2
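
The inference formulas for the slope can be sketched with the lot size and man hours data of Section 10.1. The level α = 0.05 is assumed for illustration, and 2.306 is the tabulated t for α/2 = 0.025 with v = 8.

```python
from math import sqrt

lot_size = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
man_hours = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]
n = len(lot_size)

sum_x, sum_y = sum(lot_size), sum(man_hours)
sum_xy = sum(x * y for x, y in zip(lot_size, man_hours))
sum_x2 = sum(x * x for x in lot_size)

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
b0 = sum_y / n - b1 * sum_x / n

# S^2 = SSE / (n - 2), the estimator of sigma^2
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(lot_size, man_hours))
s2 = sse / (n - 2)

# Standard error of b1, test statistic, and confidence interval
s_b1 = sqrt(s2 / (sum_x2 - sum_x**2 / n))
t_stat = b1 / s_b1
t_crit = 2.306                    # t_{0.025}, v = 8, from a t table (assumed alpha = 0.05)
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)

print(f"SSE = {sse:.1f}, S^2 = {s2:.2f}, s_b1 = {s_b1:.4f}")
print(f"t = {t_stat:.2f}, 95% CI for beta1: ({ci[0]:.3f}, {ci[1]:.3f})")
```

In simple linear regression this t statistic has the same value as the t statistic for testing ρ = 0 computed from r for the same data.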

Example:

Suppose a researcher wishes to investigate the relationship between the achieved
grade-point index (GPI) and the starting salary of recent graduates majoring in
business. A random sample of 30 recent graduates majoring in business is drawn,
and the data pertaining to the GPI and starting salary (in thousands of dollars) are
recorded for each individual in the following table:

Individual No.   GPI (X)   Starting Salary (Y)
1 2.7 17.0
2 3.1 17.7
3 3.0 18.6
4 3.3 20.5
5 3.1 19.1
6 2.4 16.4
7 2.9 19.3
8 2.1 14.5
9 2.6 15.7
10 3.2 18.6
11 3.0 19.5
12 2.2 15.0
13 2.8 18.0
14 3.2 20.0
15 2.9 19.0
16 3.0 17.4
17 2.6 17.3
18 3.3 18.1
19 2.9 18.0
20 2.4 16.2
21 2.8 17.5
22 3.7 21.3
23 3.1 17.2
24 2.8 17.0
25 3.5 19.6
26 2.7 16.6
27 2.6 15.0
28 3.2 18.4
29 2.9 17.3
30 3.0 18.5

a. Find the equation of the regression line.


b. Find an estimate for the starting salary if the GPI is 2.5.
c. Test for the significance of β1 at α = 0.05.
d. Compute and interpret the correlation coefficient and the coefficient of
determination.
e. Test for the significance of ρ at the 0.01 level of significance.

[Figure: Scatter Diagram of Grade-Point Index versus Starting Salary. Horizontal axis: grade-point index (0.0 to 4.0); vertical axis: starting salary (0.0 to 25.0).]

b0 = 6.418245
b1 = 3.928191

r = 0.865088
R2 = 0.748377

ΣX = 87.0
ΣY = 534.3
ΣXY = 1564.24
Σ X2 = 256.06
Σ Y2 = 9593.41
