
Statistics A. Introduction

This document provides an introduction to statistics concepts and processes. It defines key terms like statistics, descriptive statistics, inferential statistics, population, sample, statistic, and parameter. It then discusses measures of central tendency including mean, median, and mode for both ungrouped and grouped data. Examples are provided for calculating the mean, median, and mode of various data sets. The document concludes with exercises for students to practice calculating common statistical measures.

MODULE 5

Statistics

A. Introduction
In this age of information technology, it is important to know how raw data are processed
and translated into useful information. Because of society's ever-increasing need for
information technology, education should develop in students an understanding of the
concepts and processes of statistics. This should include the collection, organization,
analysis, and interpretation of data for decision making and prediction. Statistical tools
derived from mathematics are useful in processing and managing numerical data in
order to describe a phenomenon and predict values.

B. Objectives:
At the end of the module, students are expected to:
1. Apply a variety of statistical tools to process and
manage numerical data;
2. Solve problems involving measures of position;
3. Use appropriate measures of position and other
statistical methods in analyzing and interpreting
research data;
4. Advocate the use of statistical data in making
important decisions; and
5. Recognize the importance of statistical analyses in
making decisions.

C. Pre-Test Evaluation:
Solve the following problems.
1. Jose has been working on programming and updating a Web site for his company
for the past 24 months. The following numbers represent the number of hours Jose
has worked on this Web site for each of the past 7 months: 24, 25, 31, 45, 50, 66, and
78. What is the mean (average) number of hours that Jose worked on this Web
site each month?
2. The numbers of incorrect answers on a true-false competency test for a random
sample of 15 students were recorded as follows: 2, 4, 3, 0, 1, 3, 8, 0, 3, 3, 5, 2, 1, 4,
and 2. Find the
a. mean
b. median
c. mode
3. There are 1,500 BSED students in CTE. Solve for the sample size and for the sample in
each year level.
Year level     Population
First year     400
Second year    350
Third year     500
Fourth year    250
4. Calculate the mean deviation of the following scores: 4, 8, and 12.
5. Find the standard deviation of 10.2, 13.7, 18.5, and 20.8.
6. John got 76 marks in his Statistics test. If the marks of the whole class had a
mean of 52 and a standard deviation of 8, what was John’s standard score?
7. Given the mean of 55 and standard deviation of 8, what score corresponds to
two standard deviations above the mean?

D. Learning Contents:

I. Definition of Terms

Statistics – is concerned with the extraction of information from numerical data and its use in
making inferences about the population from which the data are obtained. Statistics is also
defined as an area of science concerned with the design of experiments or sampling procedures,
the analysis of data, and the making of inferences about a population of measurements from
information contained in the sample.

Descriptive Statistics – is concerned with the gathering, classification, and presentation of data
and the collection of summarizing values to describe group characteristics of the data. The
summarizing values most commonly used in descriptive statistics are the measures of central
tendency, of variability, and of skewness and kurtosis. It comprises those methods concerned
with collecting and describing a set of data so as to yield meaningful information.

Inferential Statistics – demands a higher order of critical judgment and mathematical method. It
aims to give information about large groups of data without dealing with each and every
element of these groups. It uses only a small portion of the total set of data in order to draw
conclusions or judgments regarding the entire set. Among the topics included in the study of
inferential statistics are hypothesis testing using the z-test and the t-test, simple linear
correlation, analysis of variance, the chi-square test, regression analysis, and time series
analysis. It comprises those methods concerned with the analysis of a subset of data leading to
predictions or inferences about the entire data.
Population – consists of the totality of the observations with which we are concerned.
Sample – is a subset of a population.
Data – is a set of observations, values, elements, or objects under consideration. A
complete set of all possible observations or elements is known as the population or universe, while a
representative part of a population is called a sample. Each element is called a data point.
Statistic – any numerical value describing a characteristic of a sample.
Parameter – is any statistical information or attribute taken from a population. It is a true
value or actual statistic, since its source is the population itself.

II. Measures of Central Tendency

a. Mean of Ungrouped Data


a. Population mean – if a set of data $x_1, x_2, \ldots, x_N$, not necessarily all distinct,
represents a finite population of size N, then the population mean is

$$\mu = \frac{\sum x}{N}$$
Example: The numbers of employees at 5 different drugstores are 3, 5, 6, 4, and 6.
Treating the data as a population, find the mean number of employees for the 5 stores.
Solution: Since the data are considered to be a finite population,

$$\mu = \frac{3 + 5 + 6 + 4 + 6}{5} = \frac{24}{5} = 4.8$$

b. Sample mean – if the set of data $x_1, x_2, \ldots, x_n$, not necessarily all distinct,
represents a finite sample of size n, then the sample mean is

$$\bar{x} = \frac{\sum x}{n}$$

Example: A food inspector examined a random sample of 7 cans of a certain brand of tuna to
determine the percent of foreign impurities. The following data were recorded: 1.8,
2.1, 1.7, 1.6, 0.9, 2.7, and 1.8. Compute the sample mean.

Solution:

$$\bar{x} = \frac{1.8 + 2.1 + 1.7 + 1.6 + 0.9 + 2.7 + 1.8}{7} = \frac{12.6}{7} = 1.8$$

x̄ = 1.8 --- answer
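The population and sample means use the same arithmetic; only the notation (N versus n) differs. The minimal sketch below, which assumes nothing beyond Python's standard-library `statistics.mean`, reproduces both examples above:

```python
from statistics import mean

# Drugstore employees, treated as a finite population (mu = sum(x) / N)
employees = [3, 5, 6, 4, 6]
population_mean = mean(employees)

# Percent impurities in a random sample of 7 cans (x-bar = sum(x) / n)
impurities = [1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8]
sample_mean = mean(impurities)

print(round(population_mean, 2))  # 4.8
print(round(sample_mean, 2))      # 1.8
```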

b. Median from Ungrouped Data

Median – the middle value of a set of observations arranged in increasing or decreasing order of
magnitude when the number of observations is odd, or the arithmetic mean of the two middle
values when the number of observations is even.

Example: 1. On 5 term tests in Math, a student has made grades of 82, 93, 86, 92, and 79. Find the
median of this population of grades.

Solution: Arranging the grades in an increasing order of magnitude, we get

79 82 86 92 93

Hence the Md = 86 --- answer

2. The nicotine contents for a random sample of 6 cigarettes of a certain brand are found
to be 2.3, 2.7, 2.5, 2.9, 3.1, and 1.9 milligrams. Find the median.
Solution: If we arrange the nicotine contents in an increasing order of magnitude, we get

1.9 2.3 2.5 2.7 2.9 3.1

and the median is then the mean of 2.5 and 2.7. Therefore,

$$Md = \frac{2.5 + 2.7}{2} = 2.6 \text{ milligrams}$$
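As a quick check of both median examples, Python's standard-library `statistics.median` sorts the data itself and applies the odd/even rule defined above. This is a sketch for verification only:

```python
from statistics import median

# Odd number of observations: the middle value after sorting
grades = [82, 93, 86, 92, 79]
# Even number of observations: the mean of the two middle values
nicotine_mg = [2.3, 2.7, 2.5, 2.9, 3.1, 1.9]

print(sorted(grades), median(grades))  # [79, 82, 86, 92, 93] 86
print(round(median(nicotine_mg), 2))   # 2.6
```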

c. Mode of Ungrouped Data

Mode – is defined as the value that appears most frequently. The mode does
not always exist; this is certainly true when all observations occur with the same frequency. For
some sets of data, there may be several values occurring with the greatest frequency, in which
case we have more than one mode.

If we examine the five terms below, we will notice that the mode is 15, since
the term 15 appears most frequently (twice), while each of the other terms appears only once.
Thus, the mode is 15.

Mo = 15--- answer
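The definition above allows for one mode, several modes, or no mode at all. The sketch below uses the standard-library `statistics.multimode` and adds a hypothetical helper, `modes`, that returns an empty list when every value occurs equally often, following the text's rule that the mode then does not exist. The data sets are hypothetical illustrations, not the five terms referred to above.

```python
from collections import Counter
from statistics import multimode


def modes(data):
    """Return the list of modes; an empty list means no mode exists."""
    counts = Counter(data)
    # If every distinct value occurs equally often, treat the mode as nonexistent
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    return multimode(data)


# Hypothetical data sets for illustration only
print(modes([12, 15, 18, 15, 20]))  # [15]    unimodal
print(modes([2, 3, 3, 5, 5, 7]))    # [3, 5]  bimodal
print(modes([4, 8, 12]))            # []      no mode
```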

Exercises:

Solve the following problems.


1. The numbers of incorrect answers on a true-false competency test for a random sample
of 15 students were recorded as follows: 2, 4, 3, 0, 1, 3, 8, 0, 3, 3, 5, 2, 1, 4, and 2. Find
the
a. mean
b. median
c. mode
2. The numbers of building permits issued last month to 10 construction firms in a small
Midwestern city were 6, 7, 0, 8, 11, 4, 1, 15, 3, and 7. Treating the data as a population,
find the
a. mean
b. median
c. mode
3. The reaction times for a random sample of 9 subjects to a stimulant were recorded as
2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1 and 3.4. Calculate the
a. mean
b. median
4. The employees of a local manufacturing plant pledged the following donations, in
dollars, to the United Fund: 20, 30, 15, 5, 20, 10, 25, 10, 50, 30, 10, 5, 15, 25, 50, 10, 30,
5, 25, 45, and 15. Treating the data as a population, calculate the
a. mean
b. mode
5. The average IQ of 10 students in a Mathematics subject is 114. If 9 of the students have
IQs of 101, 125, 118, 128, 106, 115, 99, 112, and 109, what must be the IQ of the tenth student?

d. Computation of Mean from Grouped Data


Data which are arranged in a frequency distribution are called grouped data.
When the number of items is large, it is best to compute the measures of central
tendency and variability using the frequency distribution.

Formula:

$$\bar{x} = \frac{\sum fx}{N}$$

Where:
$\bar{x}$ – mean
$f$ – frequency
$x$ – class mark
$N$ – total frequency
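Example:
Compute for the mean of the data shown below.

Class Interval   f (frequency)   x (class mark)   fx
65-69            2               67               134
60-64            4               62               248
55-59            8               57               456
50-54            11              52               572
45-49            6               47               282
40-44            4               42               168
35-39            3               37               111
30-34            2               32               64
                 N = 40                           ∑fx = 2035

Solution:

$$\bar{x} = \frac{\sum fx}{N} = \frac{2035}{40} = 50.875$$

x̄ = 50.88 --- answer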
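The grouped-mean formula can be checked with a few lines of Python. This sketch assumes, as in the table, that the class mark is the midpoint of the class limits; the list layout `(lower, upper, frequency)` is an illustrative choice.

```python
# Frequency table from the example: (lower limit, upper limit, frequency)
table = [
    (65, 69, 2), (60, 64, 4), (55, 59, 8), (50, 54, 11),
    (45, 49, 6), (40, 44, 4), (35, 39, 3), (30, 34, 2),
]


def grouped_mean(rows):
    """Mean of grouped data: sum(f * x) / N, where x is the class mark."""
    total_f = sum(f for _, _, f in rows)
    # Class mark = midpoint of the class limits
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in rows)
    return total_fx / total_f


print(grouped_mean(table))  # 50.875
```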

e. Computation of Median from Grouped Data


The median is the value of the middle item in an ordered arrangement of data. In an ordered
distribution, half of the terms are located above the median and half are below the median.

Formula:

$$Md = L_m + \left(\frac{\frac{N}{2} - \sum f_{m-1}}{f_m}\right) i$$

Where:

$L_m$ = lower class boundary of the median class
$\sum f_{m-1}$ = sum of all frequencies before the median class, or the cumulative frequency before
the median class
$f_m$ = frequency of the median class
$i$ = class size

Example: Compute for the median of the data shown below.


Class Interval f(frequency)
65-69 2
60-64 4
55-59 8
50-54 11
45-49 6
40-44 4
35-39 3
30-34 2
N 40

Solution:
1. The first step is to solve for the cumulative frequency (<cf) by adding the frequencies,
starting from the lowest class interval.
2. Starting from the interval 30-34, the cumulative frequency is 2; for the interval 35-39,
add its frequency: 2 + 3 = 5; for the interval 40-44, the cumulative frequency is 5 + 4 = 9.
Continue the process until the last interval.

Cumulative frequency table for the data:

Class Interval   f (frequency)   <cf (cumulative frequency)
65-69            2               40
60-64            4               38
55-59            8               34
50-54            11              26    (median class)
45-49            6               15
40-44            4               9
35-39            3               5
30-34            2               2
                 N = 40
3. Locate the median class by computing N/2 = 40/2 = 20. Look for the first class whose <cf
is equal to or greater than 20 (it may equal 20 but may not be less than 20). The median class
is therefore the 50-54 interval.
4. Computing for the median:
N/2 = 40/2 = 20
$L_m$ = 50 - 0.5 = 49.5
$\sum f_{m-1}$ = 15
$f_m$ = 11
$i$ = (54 - 50) + 1 = 5
5. Substitute the values in the formula:

$$Md = L_m + \left(\frac{\frac{N}{2} - \sum f_{m-1}}{f_m}\right) i = 49.5 + \left(\frac{20 - 15}{11}\right) 5 = 49.5 + 2.27 = 51.77$$

Md = 51.77 --- answer
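The five steps above can be sketched as one function. It assumes integer class limits, so the lower class boundary is the lower limit minus 0.5 and the class size is (upper - lower) + 1, exactly as in the worked example; the function name `grouped_median` is illustrative.

```python
def grouped_median(rows):
    """Median of grouped data: Md = Lm + ((N/2 - cf_before) / fm) * i.

    rows: (lower limit, upper limit, frequency), integer class limits assumed.
    """
    rows = sorted(rows)                    # lowest class first, to build the <cf
    half = sum(f for _, _, f in rows) / 2  # N / 2
    cf_before = 0
    for lo, hi, f in rows:
        if cf_before + f >= half:          # first class whose <cf reaches N/2
            lower_boundary = lo - 0.5
            width = hi - lo + 1
            return lower_boundary + (half - cf_before) / f * width
        cf_before += f
    raise ValueError("empty frequency table")


table = [
    (65, 69, 2), (60, 64, 4), (55, 59, 8), (50, 54, 11),
    (45, 49, 6), (40, 44, 4), (35, 39, 3), (30, 34, 2),
]
print(round(grouped_median(table), 2))  # 51.77
```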

f. Computation of Mode from Grouped Data


The mode is the simplest measure of central tendency. A distribution with only one
mode is said to be unimodal, while a distribution with two or more modes is described as
multimodal. A distribution with two modes is labeled bimodal, one with three modes
trimodal, and so on.
The mode in a frequency distribution lies within the class interval with the highest frequency.
This class interval is known as the modal class. A crude mode may be
determined by taking the class mark of the modal class.

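Formula in Solving for Mode for Grouped Data:

$$Mo = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$$

Where:
$L_{mo}$ – lower class boundary of the modal class (the class interval with the highest frequency)
$\Delta_1$ – the difference between the frequency of the modal class and the frequency of the class
immediately below it in value (the adjacent class with the smaller limits)
$\Delta_2$ – the difference between the frequency of the modal class and the frequency of the class
immediately above it in value (the adjacent class with the larger limits)
$i$ – class size or class width
In a table listed from the highest class down, as in the example below, the class just below the
modal class in value is the row immediately under it.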

Example: Compute for the mode of the data shown below.


Class Interval f(frequency)
65-69 2
60-64 4
55-59 8
50-54 11
45-49 6
40-44 4
35-39 3
30-34 2
N 40

Solution:
a. First, locate the highest frequency. Based on the data above, the interval with the
highest frequency is 50-54.
b. Identify the values of:
$L_{mo}$ = 50 - 0.5 = 49.5
$\Delta_1$ = 11 - 6 = 5 (Note: 6 is the frequency of 45-49, the class just below the modal class in value)
$\Delta_2$ = 11 - 8 = 3 (Note: 8 is the frequency of 55-59, the class just above the modal class in value)
$i$ = 5
c. Substitute the values in the formula:

$$Mo = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i = 49.5 + \left(\frac{5}{5 + 3}\right) 5 = 49.5 + 3.125 = 52.625$$

Mo = 52.625 ---- answer
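A sketch of the grouped-mode formula, under the same integer-class-limit assumption as the worked example. Sorting the rows puts the lowest class first, so the class "below in value" is simply the previous row. Two further assumptions are not stated in the text: a missing neighbour (modal class at either end) is treated as frequency 0, and the table is assumed to have a single clear modal class.

```python
def grouped_mode(rows):
    """Mode of grouped data: Mo = Lmo + (d1 / (d1 + d2)) * i.

    rows: (lower limit, upper limit, frequency), integer class limits assumed.
    d1 compares with the class just below the modal class in value, d2 with
    the class just above it; a missing neighbour counts as frequency 0.
    """
    rows = sorted(rows)                          # lowest class first
    freqs = [f for _, _, f in rows]
    k = freqs.index(max(freqs))                  # first class with the highest f
    lo, hi, fm = rows[k]
    f_below = freqs[k - 1] if k > 0 else 0
    f_above = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1, d2 = fm - f_below, fm - f_above
    return (lo - 0.5) + d1 / (d1 + d2) * (hi - lo + 1)


table = [
    (65, 69, 2), (60, 64, 4), (55, 59, 8), (50, 54, 11),
    (45, 49, 6), (40, 44, 4), (35, 39, 3), (30, 34, 2),
]
print(grouped_mode(table))  # 52.625
```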
Exercises:
Solve the following problems. Round off your final answer to two decimal places.
1. Solve for a.) the mean, b.) median and c.) mode of the data shown below.
CI f
118 - 126 3
127 - 135 5
136 - 144 9
145 - 153 12
154 - 162 5
163 - 171 4
172 - 180 2
N = 40
2. An achievement test in Economics contained 30 questions. The distribution below
summarizes the results of the test.

Number of Answers   Frequency
1–3                 1
4–6                 4
7–9                 8
10–12               13
13–15               19
16–18               21
19–21               18
22–24               12
25–27               3
28–30               1
                    N = 100
Find the a.) mean, b.) median, and c.) mode of the above distribution.
3. Given the following frequency distribution, estimate the mean, median, and mode.
C.I.    f
71-75   4
66-70   8
61-65   14
56-60   22
51-55   27
46-50   19
41-45   17
36-40   11
31-35   3

III. Sampling Procedures


Sampling – is the process that involves taking a part of a population, making
observations on this representative group, and then generalizing the findings to the bigger
group.

Sample Size of the Population

To determine the sample size for a population, Slovin's formula is given as
follows:

$$n = \frac{N}{1 + Ne^2}$$

where: n = sample size
N = population size
e = desired margin of error
Example: N = 9,000; e = 5%; n = ?
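Slovin's formula is simple arithmetic, but floating-point rounding can blur the final whole number, so this sketch keeps the computation exact with `fractions.Fraction`. The margin of error is passed as a percentage. Rounding the result up to the next whole number is an assumed convention; the text does not state its rounding rule, although for the values here it agrees with ordinary rounding.

```python
import math
from fractions import Fraction


def slovin(population_size, margin_percent):
    """Slovin's formula n = N / (1 + N * e^2), rounded up to a whole number."""
    e = Fraction(margin_percent, 100)          # e.g. 5 -> 1/20, kept exact
    n = Fraction(population_size) / (1 + population_size * e ** 2)
    return math.ceil(n)


print(slovin(9000, 5))    # 383  (9000 / 23.5 = 382.98...)
print(slovin(10000, 5))   # 385  (10000 / 26 = 384.6...)
```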

Sampling Strategies

1. Random Sampling – is a method of selecting a sample of a given size from a universe such
that each member of the population has an equal chance of being included in the
sample and all possible combinations of that size have an equal chance of being
selected as the sample (Weirsma, 1975).

The prerequisites for random sampling include the following:

1. Define your population.
2. List all members of your population.
3. Select your sample by employing an adequate procedure in which every member has
an equal chance of being selected as a sample of the investigation.

Types of Random Sampling

a. Table of random numbers


Under this technique, the selection of each member of the population is
left adequately to chance, and every member of the population has an
equal chance of being chosen.

b. Lottery Sampling – called the fishbowl technique by Fox (1969), this
procedure can be applied by first assigning numbers to the participants of
your population and assembling them in a sampling frame.
Fox (1969) presents two arrangements in using the lottery or fishbowl
technique:
a. sampling without replacement, in which drawn pieces of paper, each bearing
a number, are no longer returned to the box.
b. sampling with replacement, which returns every drawn piece of paper to the
box. This holds the probability constant. For instance, with a population of
50 members, if each slip pulled out of the box is returned, the probability of
choosing any one member remains 1 in 50 throughout the process.
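The two fishbowl arrangements map directly onto Python's `random` module: `sample` draws without replacement and `choices` draws with replacement. In this sketch the 50 numbered slips, the draw of 10, and the seed are all hypothetical choices made only so the example is repeatable.

```python
import random

population = list(range(1, 51))   # hypothetical slips numbered 1 to 50
rng = random.Random(1969)         # fixed seed so the draw can be repeated

# Without replacement: a drawn slip is not returned, so no number repeats
without_replacement = rng.sample(population, k=10)

# With replacement: every slip goes back, so each draw has probability 1/50
with_replacement = rng.choices(population, k=10)

print(without_replacement)
print(with_replacement)
```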

2. Systematic Sampling
Another form of sampling is systematic sampling. Vockell (1983) defines it as a
strategy for selecting the members of a sample that allows only chance and a “system” to
determine membership in the sample. According to him, a “system” is a planned strategy
for selecting members after a starting point is selected at random, such as every fifth
subject, every tenth subject, etc.
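The "system" in systematic sampling is usually a fixed interval after a random start. This sketch assumes the interval is k = N // n (the text does not prescribe how k is chosen) and that n does not exceed N; the student list is hypothetical.

```python
import random


def systematic_sample(population, n, rng=random):
    """Every k-th member after a random start, with k = N // n (assumed rule)."""
    k = len(population) // n          # sampling interval
    start = rng.randrange(k)          # random starting point within the first interval
    return population[start::k][:n]


students = [f"student-{i:03d}" for i in range(1, 101)]   # hypothetical list of 100
chosen = systematic_sample(students, 20, random.Random(7))
print(chosen[:3])
```

Because k = N // n guarantees at least n members from any starting point, slicing with `[:n]` always returns exactly n names.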
3. Stratified Sampling
Stratified sampling is defined as a strategy for selecting samples in such a way that
specific subgroups (strata) will have a sufficient number of representatives within the
sample to provide sample members for sub-analysis of the members of these subgroups
(Vockell, 1983). Stratified random sampling is a process in which certain subgroups, or
strata, are selected for the sample in the same proportion as they exist in the population.
Example: Suppose Sixto and a group of his classmates want to study the
consumption patterns of a municipality that has a population of 10,000 families. They
would like to draw a representative sample from this population. The clerk from the
municipal hall provided them with the following information: 500 families belong to
the high-income group, 2,500 families to the middle-income group, and 7,000 families to
the low-income group.
The steps in the sampling process would be as follows.

Identify the population and its different strata:

N = 10,000, e = 5%
Table 1 (Distribution of the Population)
Strata                 Number of Families
High-income group      500
Middle-income group    2,500
Low-income group       7,000
Total                  10,000

Table 2 (Percent Share of Each Stratum)

Strata          Population   % Share   Sample
High income     500          5%        19
Middle income   2,500        25%       96
Low income      7,000        70%       270
Total           10,000       100%      385

Solve for the sample size using the formula:

$$n = \frac{N}{1 + Ne^2}$$

Substitute the values in the formula:

$$n = \frac{10{,}000}{1 + 10{,}000(0.05)^2} = \frac{10{,}000}{26} \approx 384.6 \approx 385$$
Solve for the sample size of each stratum:
High-income group: n = 385 x 0.05 = 19.25 ≈ 19
Middle-income group: n = 385 x 0.25 = 96.25 ≈ 96
Low-income group: n = 385 x 0.70 = 269.5 ≈ 270
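Proportional allocation combines Slovin's formula with each stratum's share of the population. The sketch below is a hypothetical helper, not a standard library routine. It rounds the Slovin result up (an assumed convention) and rounds each stratum half-up, which is what the worked example does with 269.5 -> 270. Python's built-in `round` would send 269.5 to 270 here but uses round-half-to-even in general, so half-up rounding is written out explicitly.

```python
import math
from fractions import Fraction


def slovin(population_size, margin_percent):
    """Slovin's formula n = N / (1 + N * e^2), rounded up (assumed convention)."""
    e = Fraction(margin_percent, 100)
    return math.ceil(Fraction(population_size) / (1 + population_size * e ** 2))


def proportional_allocation(strata, margin_percent=5):
    """Split the Slovin sample across strata in proportion to their sizes."""
    total = sum(strata.values())
    n = slovin(total, margin_percent)
    # Half-up rounding of each stratum's exact share n * size / total
    return n, {name: math.floor(Fraction(n * size, total) + Fraction(1, 2))
               for name, size in strata.items()}


strata = {"high": 500, "middle": 2500, "low": 7000}
n, allocation = proportional_allocation(strata)
print(n, allocation)  # 385 {'high': 19, 'middle': 96, 'low': 270}
```

Rounded stratum sizes do not always add back up to n; here they do (19 + 96 + 270 = 385).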
Exercises:
1. Find the sample size of the following population:
a. 568 b. 2590 c. 8765
2. Filamer has a population of 5,000. Find a.) the sample size and b.) the sample size in each
department.
BSN 1500
HRMT 800
CTE 650
CBA 850
ARTS 500
ECE 700

IV. Mean Absolute Deviation

We have learned that the range is based only on the highest and the lowest values of
the distribution, while the quartile deviation identifies only half the distance
between the first and the third quartiles. To arrive at a more reliable indicator of the variability
or spread in a distribution, we should consider the value of each individual score and determine
the amount by which each varies from the mean of the distribution. One way of doing so is to
use the measure called the mean absolute deviation.
In computing for the mean absolute deviation, we consider the extent to which each
individual score in a distribution deviates from the mean of that distribution. In other words, we
subtract the mean from each score to determine the deviation, or distance, of each score
from the mean. If X is a score and X̄ is the mean, then X - X̄ is the distance of the score from the
mean. We use x (read "little x") to denote a score's deviation from the mean. Hence

$$x = X - \bar{X}$$

where:
$x$ = each score's deviation from the mean
$X$ = the particular score
$\bar{X}$ = the mean

Scores above the mean will have positive x values, and scores below the mean will have
negative x values.
To get the mean absolute deviation, we get the sum of the absolute values of the deviations from
the mean, then divide it by the total number of cases in the distribution. The formula is:

$$MAD = \frac{\sum |X - \bar{X}|}{N}$$

Where: MAD – the mean absolute deviation
$X$ – the individual score
$\bar{X}$ – the mean
$\sum |X - \bar{X}|$ – the sum of the absolute deviations from the mean
$N$ – the total number of cases
Example: Using the data below, solve for the mean absolute deviation.

Values (X)   Algebraic deviation from the mean (X - X̄)   Absolute deviation from the mean |X - X̄|
15           15 - 17 = -2                                  2
15           -2                                            2
17           0                                             0
18           1                                             1
20           3                                             3
                                              Total        8

Solution:
First, solve for the mean:

$$\bar{X} = \frac{15 + 15 + 17 + 18 + 20}{5} = \frac{85}{5} = 17$$

Then subtract the mean from each score, get the sum of the absolute values, and
substitute the value in the formula for the mean absolute deviation:

$$MAD = \frac{\sum |X - \bar{X}|}{N} = \frac{8}{5} = 1.6$$

The mean absolute deviation for these five raw scores is 8/5, or 1.6. This
means that, on average, the values deviated from the mean value of 17 by 1.6.
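The mean absolute deviation is straightforward to compute directly. This is a minimal sketch that follows the formula term by term (mean first, then the average of the absolute deviations); the function name is illustrative.

```python
def mean_absolute_deviation(scores):
    """MAD = sum(|X - mean|) / N."""
    m = sum(scores) / len(scores)
    return sum(abs(x - m) for x in scores) / len(scores)


print(mean_absolute_deviation([15, 15, 17, 18, 20]))  # 1.6
```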

Exercises:
1. The ages of the presidents of 6 universities are: 50, 39, 40, 65, and 47. Compute
for the mean absolute deviation.
2. Find the mean absolute deviations of the numbers.
a. 5, 7, 3, 9, 8, 2, 10, 11, 9, 6
b. 12.3, 9.7, 11.6, 13.3, 14, 10.7

V. The Standard Deviation

The standard deviation is a special form of average deviation from the mean. It is
therefore also affected by all the individual values of the items in the distribution. The
standard deviation, denoted by s, is the positive square root of the arithmetic mean of the
squared deviations from the mean of the distribution. It is important as a measure of
heterogeneity or unevenness within the set of observations. For example, if the standard
deviation of the IQ scores of a class of fifty students is numerically large, then we can say
that there is heterogeneity in their intelligence. If the standard deviation is small, we can
say that there is homogeneity in their intelligence.

Computation of Standard Deviation from Ungrouped Data:

Formula:

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$

Where:
$s$ = standard deviation
$X$ = the value of each item
$\bar{X}$ = computed mean
$n$ = total number of cases
Procedure to compute for standard deviation from ungrouped data:
1. Compute the mean.
2. Get the deviation from the mean using (X - X̄).
3. Square the deviations from the mean.
4. Get the sum of the squared deviations.
5. Divide the sum by the total number of cases, n.
6. Extract the square root of the quotient in step 5.

Example: Using the data below, solve for the standard deviation.

X    (X - X̄)        (X - X̄)²
15   15 - 17 = -2   4
15   -2             4
17   0              0
18   1              1
20   3              9
     Total          18

Solve for the mean:

$$\bar{X} = \frac{15 + 15 + 17 + 18 + 20}{5} = 17$$

Solve for the standard deviation:

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{18}{5}} = \sqrt{3.6} \approx 1.9$$

s = 1.9 ----- answer
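The six-step procedure can be followed literally in Python. Because this module's formula divides by n, the result matches the standard library's `statistics.pstdev`; `statistics.stdev` divides by n - 1 instead, the sample standard deviation used in many other texts, so the two functions give different answers on the same data.

```python
import math
from statistics import pstdev, stdev

scores = [15, 15, 17, 18, 20]

# Steps 1-6 of the procedure (dividing by n)
m = sum(scores) / len(scores)                 # step 1: mean = 17.0
sum_sq = sum((x - m) ** 2 for x in scores)    # steps 2-4: sum of squared deviations = 18.0
s = math.sqrt(sum_sq / len(scores))           # steps 5-6

print(round(s, 1))                       # 1.9
print(math.isclose(s, pstdev(scores)))   # True
print(round(stdev(scores), 2))           # 2.12 (divides by n - 1 instead)
```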

Exercises:

1. The weights in kilos of ten students are: 50, 55, 48, 60, 54, 48, 57, 45, 52, and 63.
Find the standard deviation.
2. After 10 weeks of pre-service training sessions, the test results of David (X) and Robert
(Y) were as follows:
X: 58, 59, 60, 54, 65, 66, 52, 75, 69, 52
Y: 56, 87, 89, 78, 71, 73, 84, 65, 66, 48
Which would you consider more consistent?

VII. Range and Coefficient of Variation

R = Max – Min or Highest – Lowest

Example: Find the range of the following scores: 10, 13, 11, 9, 13, 12, 11, 14 and 15
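The range formula R = highest - lowest translates to one line of Python. This short sketch applies it to the example scores:

```python
scores = [10, 13, 11, 9, 13, 12, 11, 14, 15]
data_range = max(scores) - min(scores)   # R = highest - lowest = 15 - 9
print(data_range)  # 6
```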

Exercises:

Solve for the range:


1. 100, 80, 60, 95, 88 and 98
2. 9, 20, 6, 15, 40, 56, 13, 18

VIII. The Normal Distribution

The normal distribution is central in the study of statistics, as it is the basis for solving various
types of statistical problems. In most cases, the distribution of variables such as grades of
students, weights and heights of persons, incomes of families, or IQs of children may be said to
approximate a normal distribution. If we take the heights of adults as an example, the normal
distribution means roughly that:

-there are relatively few short adults


-there are relatively few tall adults
-the height of most adults will tend towards the middle value ( the average value)
between the shortest and the tallest.
If we describe the situation by way of a histogram, we have the following.

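[Figure: Histogram of adult heights, with frequency (F) on the vertical axis and height (short, average, tall) on the horizontal axis; the bars are low ("few") at the short and tall ends and highest ("many") at the average height.]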
In the histogram, the height of each rectangle represents frequency or, in our
example, the number of adults. Our histogram, then, tells the story at a glance: few short adults,
few tall adults, and many adults with average heights.

Skewed Distribution

While the distributions of many variables tend to approximate a normal distribution,
there are some variables whose distributions are non-normal. We call such types skewed
distributions.

Skewed distributions are of two types:

1. Skewed to the right


2. Skewed to the left

A distribution that is skewed to the right has a tail that is longer on the right end, while
one that is skewed to the left has a tail that is longer on the left end.

A few examples of skewed distribution are:

- age at marriage
- mortality age for certain diseases
- certain industrial measurements
- various biological measurements

Properties of a normal curve

A normal curve, which is a bell-shaped figure, has the following six properties:

1. It is symmetrical about the mean, X̄.
2. The mean is equal to the median, which is also equal to the mode.
3. The tails or ends are asymptotic relative to the horizontal line.
4. The total area under the normal curve is equal to 1, or 100%.
5. The area under the normal curve may be subdivided into at least three standard scores each
to the left and to the right of the vertical axis.
6. Along the horizontal line, the distance from one integral standard score to the next
integral standard score is measured by the standard deviation.

Areas under the normal curve

The first step in finding areas under the normal curve is to convert the normal curve of
any given variable into a standardized normal curve using the formula:

$$Z = \frac{X - \bar{X}}{S}$$

Example:

X̄ = 50 – mean
S = 5 – standard deviation
X = 60

$$Z = \frac{60 - 50}{5} = 2$$

Z = 2 – This means that there are two standard deviations (S = 5) between 60 and 50.
Similarly, for X = 30,

$$Z = \frac{30 - 50}{5} = -4$$

Z = -4 – This means that there are four standard deviations between 30 and 50. The negative
sign indicates that the value of X (which is 30) is less than the mean, 50.
Statisticians, by using the formula for standard scores, have formulated a statistical table
showing the areas for various portions under the standardized normal curve.
The Table of Areas Under the Standard Normal Curve gives the area for only the right half of
the normal curve. It is not necessary to give the area for the left half because of the
symmetry of the normal curve. Since the right half is equal to the left half, one need only find
the area on the right half in order to find the corresponding area on the left half.

For instance:

At Z = 1.52, Area = 0.4357

0.4357 is the area from Z = 0 to Z = 1.52


To find area under the normal curve

Example:

1. Find the area under the normal curve from Z = 0 to Z = 1.2


Solution:
Find the probability that Z is from 0 to 1.2.
From the table, at Z = 1.2 the area = 0.3849 --- answer
2. Find the area under the normal curve from Z = -0.68 to Z = 0.
Solution:
Our table does not give areas on the left half of the normal curve, but by making use
of the property of symmetry, we can find the required area in this problem by finding
the area from Z = 0 to Z = +0.68; thus,
at Z = 0.68 the area is 0.2517, or 25.17% --- answer
3. Find the area under the normal curve from Z = 0.81 to Z = 1.94.
Solution:
From the table, at Z = 0.81 the area A = 0.2910; this is the area from Z = 0 to Z = 0.81.
At Z = 1.94 the area A = 0.4738; this is the area from Z = 0 to Z = 1.94.
Therefore, the required area is 0.4738 - 0.2910 = 0.1828, or 18.28% --- answer
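When a printed table is not at hand, the standard library's `statistics.NormalDist` (Python 3.8 or later) computes the cumulative area under the normal curve directly. This sketch takes the area between two z values as the difference of cumulative areas, which handles left-half and mixed-sign cases without appealing to symmetry; it reproduces the table values used above to four decimals.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1


def area_between(z1, z2):
    """Area under the standard normal curve between z1 and z2."""
    return abs(z.cdf(z2) - z.cdf(z1))


print(round(area_between(0, 1.52), 4))     # 0.4357
print(round(area_between(0, 1.2), 4))      # 0.3849
print(round(area_between(-0.68, 0), 4))    # 0.2517
print(round(area_between(0.81, 1.94), 4))  # 0.1828
```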
Exercises:
1. Find the area from Z = -0.46 to Z = 2.21.
2. Find the area to the left of Z = -0.6.

A. Z-score

$$Z = \frac{X - \bar{X}}{S}$$

Where:
$Z$ = standard score
$\bar{X}$ = mean
$S$ = standard deviation
$X$ = a given value of a particular variable

Example:
1. In a Statistics examination, the students' mean grade is 78 and the standard deviation is
10. Find the corresponding z-score if the grade is 93.

X̄ = 78 – mean
S = 10 – standard deviation
X = 93

$$Z = \frac{X - \bar{X}}{S} = \frac{93 - 78}{10} = 1.5$$

Z = 1.5 – This means that there are 1.5 standard deviations (S = 10) between 93 and 78.
2. The average weekly income of 2,000 workers is P151, with a standard deviation of P15.
Assuming that the weekly incomes are normally distributed, find the number of workers who
earn:
a. from P119.50 to P155.50 per week
b. less than or equal to P127.50 per week
c. greater than or equal to P185.50 per week
Solution:
a. First, let us give the bare outline of the solution. The weekly salaries are normally
distributed; therefore, the problem may be solved using areas under the normal
curve.
Given: X = P119.50, X̄ = P151.00, S = P15
Substitute the values in the formula:

$$Z = \frac{X - \bar{X}}{S} = \frac{119.50 - 151}{15} = -2.1$$

For P155.50, the z-score is:

$$Z = \frac{X - \bar{X}}{S} = \frac{155.50 - 151}{15} = 0.3$$

From the table:
Area from Z = 0 to Z = 0.3: A = 0.1179
Area from Z = 0 to Z = -2.1: by symmetry, the same as the area from Z = 0 to Z = +2.1, A = 0.4821
The total shaded area: 0.4821 + 0.1179 = 0.6000, or 60%
This means that 60% of the 2,000 workers have weekly incomes from P119.50 to P155.50.
Therefore, the number of workers whose incomes fall from P119.50 to P155.50
= 60% of 2,000 = 0.6 x 2,000 = 1,200 workers ---- answer
b. The area to the left of P127.50 is shaded, since we want to find the number of workers
earning less than or equal to this amount.
The z-score of P127.50 is

$$Z = \frac{X - \bar{X}}{S} = \frac{127.50 - 151}{15} = \frac{-23.5}{15} \approx -1.57$$

On the standard normal curve, the area from Z = 0 to Z = -1.57 is, by symmetry, the same
as the area from Z = 0 to Z = +1.57.
From the table, at Z = 1.57 the area is 0.4418.
Solving for the shaded area, we have:
0.5000 - 0.4418 = 0.0582, or 5.82%. This means that only 5.82% of the 2,000 workers
earn weekly incomes less than or equal to P127.50.
Therefore, the number of workers earning less than or equal to P127.50 per week
= 5.82% of 2,000
= 0.0582 x 2,000 = 116.4, or 116 workers ---- answer
c. The area to the right of P185.50 is shaded, since we want to find the number of workers
who earn greater than or equal to this amount per week.
The z-score of P185.50 is:

$$Z = \frac{X - \bar{X}}{S} = \frac{185.50 - 151}{15} = 2.3$$

On the standardized normal curve, the area from Z = 0 to Z = 2.3 is 0.4893. Therefore, the shaded area is:
0.5000 - 0.4893 = 0.0107, or 1.07%. This means that only 1.07% of the 2,000 workers earn
weekly incomes greater than or equal to P185.50.
Therefore, the number of workers earning greater than or equal to P185.50 per week
= 1.07% of 2,000
= 0.0107 x 2,000
= 21.4, or 21 workers ---- answer
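The three parts of the weekly-income problem can be checked with `statistics.NormalDist`. To mimic reading a printed table, this sketch rounds each z-score to two decimals before looking up the area; that rounding step is a modelling choice for the illustration. Without it, part (b) gives a slightly different count, because -23.5/15 is about -1.567 rather than exactly -1.57.

```python
from statistics import NormalDist

MEAN, SD, WORKERS = 151, 15, 2000
std = NormalDist()


def table_z(x):
    """z-score rounded to two decimals, as when reading a printed z table."""
    return round((x - MEAN) / SD, 2)


# a. between P119.50 and P155.50
share_a = std.cdf(table_z(155.50)) - std.cdf(table_z(119.50))
# b. at most P127.50 (left tail)
share_b = std.cdf(table_z(127.50))
# c. at least P185.50 (right tail)
share_c = 1 - std.cdf(table_z(185.50))

for label, share in (("a", share_a), ("b", share_b), ("c", share_c)):
    print(label, round(share, 4), round(share * WORKERS))
# worker counts come out as 1200, 116, and 21
```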

Hypotheses

Hypothesis – is a statement or tentative theory that aims to explain facts about the
real world. Most hypotheses have their origin in a question about some practical problem. In
the search for an answer, "educated guesses" and pertinent evidence are brought out, which are
later turned into propositions or hypotheses. These hypotheses are then subjected to testing. If
they are found to be statistically true, they are accepted; if they are found to be false, they are
rejected.

Level of Significance

The significance level of a test is the maximum probability of rejecting the null
hypothesis when it is actually true. A 5% significance level means that we accept at most a
5% risk of rejecting a true null hypothesis.

Steps in Hypothesis Testing

1. Formulate the null hypothesis (H0) that there is no significant difference between the
items being compared. State the alternative hypothesis (Ha), which is used in case H0
is rejected.
2. Set the level of significance.
3. Determine the test to be used. Use the t-test if the standard deviation given is computed
from the sample.
4. Determine the tabular value for the test. For a t-test, one must first compute the
degrees of freedom, then look for the tabular value in the table of the t-distribution. For
a single sample, df = number of items - 1 = n - 1. For two samples, df = $n_1 + n_2 - 2$, where
$n_1$ refers to the number of items in the first sample, and $n_2$ refers to the number of items in
the second sample.
Formulas: t-test

Sample mean compared with population mean:

$$t = \frac{(\bar{x} - \mu)\sqrt{n - 1}}{s}$$

5. Compare the computed value with its corresponding tabular value, then state your
conclusion based on the following guidelines:
a. Reject H0 if the absolute computed value is equal to or greater than the absolute
tabular value.
b. Accept H0 if the absolute computed value is less than the absolute tabular value.
Examples:
1. A researcher knows that the average height of Filipino women is 1.525 meters. A random sample of 26 women was taken and was found to have a mean height of 1.56 meters, with a standard deviation of 0.10 meters. Is there reason to believe that the 26 women in the sample are significantly taller than the others at the 0.05 significance level?
Solution:
Step 1. H0: The sample is not significantly taller than the other Filipino women.
Ha: The sample is significantly taller than the other Filipino women.
2. Set the alpha level: α = 0.05.
3. The standard deviation given is based on the sample. Therefore, the t-test must be used.
4. First, determine the degrees of freedom. Since only one sample is given, df = n - 1:
df = 26 - 1 = 25
The tabular value for a one-tailed test at 25 df and the 0.05 significance level is 1.708.
5. The formula to be used is:

$$t = \frac{(\bar{x} - \mu)\sqrt{n-1}}{s} = \frac{(1.56 - 1.525)\sqrt{26-1}}{0.10} = 1.75$$

6. The absolute computed value (1.75) is greater than the absolute tabular value (1.708). Therefore, H0 is rejected: the sample is significantly taller than the other Filipino women.
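The steps and the worked example above can be reproduced with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and the tabular value still has to be read from a t-distribution table (here 1.708, as in the example) because the Python standard library has no inverse t-distribution.

```python
import math
from typing import Optional

def t_statistic(x_bar: float, mu: float, s: float, n: int) -> float:
    """t = (x_bar - mu) * sqrt(n - 1) / s, the form of the formula used in this module."""
    return (x_bar - mu) * math.sqrt(n - 1) / s

def degrees_of_freedom(n1: int, n2: Optional[int] = None) -> int:
    """df = n - 1 for one sample; df = n1 + n2 - 2 for two samples."""
    return n1 - 1 if n2 is None else n1 + n2 - 2

def decide(t_computed: float, t_tabular: float) -> str:
    """Reject H0 when |computed| >= |tabular|; otherwise accept H0."""
    return "reject H0" if abs(t_computed) >= abs(t_tabular) else "accept H0"

# Worked example: heights of Filipino women.
n = 26
df = degrees_of_freedom(n)              # 25
t = t_statistic(1.56, 1.525, 0.10, n)   # about 1.75
T_TABULAR = 1.708                       # one-tailed, alpha = 0.05, df = 25 (from the t-table)
print(df, round(t, 2), decide(t, T_TABULAR))
```

Keeping the decision rule in its own function mirrors step 5 of the procedure, so the same comparison can be reused with any tabular value.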

Exercises:
Solve the following problems.
1. Suppose we want to determine whether or not the average weekly budget for a family of four within a certain income bracket in Makati is P2,000. It was gathered that the variability of such a budget is given by a standard deviation of P455. Our decision is to be based on the mean of a random sample of size 100, which is P2,100. Use a 5% level of significance.
2. The Beta Company manufactures steel wire with an average tensile strength of 50 kilos. The laboratory tests 16 pieces and finds that the mean is 47 kilos and the standard deviation is 15 kilos. Are the results in accordance with the hypothesis that the population mean is 50 kilos?

Analysis of Enumeration Data

Enumeration data are expressed in the form of frequencies, which represent the number of items within specified qualitative descriptions or categories. This type of data answers the question "How many items satisfy a particular description?" or "How many items belong to a category?"
Enumeration data may be classified, according to the number of variables described, as either one-way or two-way classification. Each variable is further subdivided into more specific categories.
A one-way classification has only one variable, described by at least two categories. Take, for instance, the variable civil status. Civil status may be subdivided into more specific categories: single, married, widowed, and married but legally separated. Each of these categories is numerically described in terms of its corresponding frequency.
A variable on intelligence quotient may be described as high, average, or low. A variable on production quality may be subdivided into defective and non-defective categories.

Analysis of enumeration data is done through the chi-square test (the chi-square symbol is $\chi^2$). The chi-square test is a versatile statistical test named after the chi-square distribution, which is derived under the assumption that the population is normal.

Among the uses of the chi-square test are the following:

1. To test the goodness of fit to a normal curve; that is, to find out whether or not a sample distribution conforms with the hypothetical normal distribution.
2. To find out whether or not an observed proportion is equal to some given ideal or expected proportion.
3. To test the independence of one variable from another variable.

Steps in the analysis of enumeration data:

1. State the null hypothesis. The null hypothesis may be stated in any of these ways:
a. H0: The sample distribution conforms with the hypothetical or theoretical distribution.
b. H0: The actual observed proportion is not significantly different from the ideal or expected proportion.
c. H0: One variable does not depend on the other variable; that is, the two variables are independent of each other.
The first two types of null hypothesis are applicable to data with one-way classification. The third type is applicable to data with two-way classification.

2. Set the level of significance.


3. Determine the degrees of freedom using the formula:
df = c - 1 for one-way classification
df = (r - 1)(k - 1) for two-way classification
where: c is the number of categories of the single variable
r is the number of rows describing one variable
k is the number of columns describing the other variable
4. Locate the tabular value of χ² in the chi-square distribution table by finding the value where the desired level of significance and the computed degrees of freedom intersect.

5. Calculate the chi-square value using the formula:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where f_o is the actual observed frequency and f_e is the expected or ideal frequency.

In a one-way classification of enumeration data, the expected frequency (f_e) is computed by multiplying the total frequency (n) by the known proportion (p) of the category:

$$f_e = np$$

In a two-way classification, the expected frequency is computed by multiplying the subtotals of the intersecting categories, then dividing the product by the total frequency, represented by the grand total of the contingency table:

$$f_e = \frac{(\text{subtotal A})(\text{subtotal B})}{\text{grand total}}$$

6. State the conclusion arrived at by the acceptance or rejection of the null hypothesis. If the computed value of chi-square is less than the tabular value, the null hypothesis is accepted; if the computed chi-square is equal to or greater than the tabular value, the null hypothesis is rejected.

Example problems for one-way classification

1. Based on the data shown below, is the actual observed proportion significantly different from the expected proportion, if the ideal or expected proportion is 30% married, 50% single, 10% widowed, and 10% legally separated?

One-way Classification of the Civil Status of 50 Employees

Status            Frequency
Single            18
Married           24
Widowed           5
Leg. Separated    3
Total             50
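Steps 3 and 5 for this one-way example can be sketched in Python as follows. The expected frequencies come from f_e = np using the proportions stated in the problem, and the helper name `chi_square` is our own.

```python
def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

categories = ["Single", "Married", "Widowed", "Leg. Separated"]
observed = [18, 24, 5, 3]
proportions = [0.50, 0.30, 0.10, 0.10]    # ideal or expected proportions from the problem

n = sum(observed)                         # 50
expected = [n * p for p in proportions]   # fe = np -> 25, 15, 5, 5
x2 = chi_square(observed, expected)       # 1.96 + 5.40 + 0 + 0.80 = 8.16
df = len(categories) - 1                  # c - 1 = 3

for name, fo, fe in zip(categories, observed, expected):
    print(f"{name:15} fo={fo:3}  fe={fe:5.1f}")
print(f"chi-square = {x2:.2f}, df = {df}")
```

The example does not state a level of significance. If 5% is assumed, the tabular value for df = 3 is 7.815; the computed value 8.16 exceeds it, so under that assumption H0 would be rejected.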

Let us analyze enumeration data with two-way classification using the data below. Data with two-way classification may be tested to find out whether or not one variable is independent of the other variable.

Examples of two-way classification:

1. Does attitude toward household chores depend on sex for the 50 children considered in the table shown below? Use a 5% level of significance.

Contingency Table on Sex and Attitude Toward Household Chores of 50 College Students

Attitude      Boys   Girls   Total
Positive      9      21      30
Negative      9      11      20
Total         18     32      50

2. Using the data below, test the hypothesis that academic performance does not depend on IQ at the 1% significance level.

Academic performance \ IQ    High   Average   Low   Total
Passed                       31     45        4     80
Failed                       1      4         15    20
Total                        32     49        19    100
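For the two contingency tables above, the expected frequencies and the chi-square value can be sketched in Python. The code follows the module's formulas directly (no continuity correction is applied), and the function names are our own.

```python
def expected_frequencies(table):
    """fe = (row subtotal)(column subtotal) / grand total, for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_two_way(table):
    """Return (chi-square, df) for a contingency table of observed frequencies."""
    expected = expected_frequencies(table)
    x2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)   # (r - 1)(k - 1)
    return x2, df

# Example 1: attitude (rows: positive, negative) by sex (columns: boys, girls)
attitude = [[9, 21], [9, 11]]
# Example 2: academic performance (rows: passed, failed) by IQ (columns: high, average, low)
performance = [[31, 45, 4], [1, 4, 15]]

for name, table in [("attitude vs sex", attitude), ("performance vs IQ", performance)]:
    x2, df = chi_square_two_way(table)
    print(f"{name}: chi-square = {x2:.3f}, df = {df}")
```

Compared with the tabular values (3.841 for df = 1 at the 5% level, and 9.210 for df = 2 at the 1% level), the first computed value (about 1.172) leads to accepting H0, while the second (about 51.249) leads to rejecting it.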

Exercises:

1. In an experiment to study the dependence of hypertension on smoking habits, the following data were taken on 180 individuals:

                   Nonsmokers   Moderate Smokers   Heavy Smokers
Hypertension       21           36                 30
No Hypertension    48           26                 19

Test the hypothesis that the presence or absence of hypertension is independent of smoking habits. Use a 0.05 level of significance.

E. Formative Test:

Solve the following problems.

1. The following are the annual incomes of seven companies (in million pesos): 1.3, 6.6, 10.5, 12.6, 20.7, 7.3. Calculate the median income of the seven companies.
2. Consider the following set of data: 22, 31, 17, 35, 58, 89. What is (a) the mean, (b) the median, and (c) the mode of the given set of data?

3. These are the years of service of 8 employees: 9, 11, 16, 12, 20, 17, 18 and 19. Calculate the
median years of service of these 8 employees.

4. The following are the daily wages of workers in a certain company.

Wages (P)    Number of workers
10-11        5
12-13        7
14-15        9
16-17        11
18-19        15
20-21        13
22-23        8
24-25        3

Find the mean, median, and modal wage.

5. Consider the following frequency distribution of ages of employees at a certain school.

Age Frequency
21-30 9
31-40 11
41-50 18
51-60 8
61-70 4
a. What is the class size?
b. What is the median class?
c. What is the modal class?
d. What is the lower class boundary of the median class?
e. What is the class mark of the median class?
f. What is the less-than cumulative frequency of the class preceding the median class?
g. What is the mean of the distribution?
h. What is the median of the distribution?
i. What is the mode of the distribution?
6. Find the standard deviation of the numbers 5, 7, 3, 9, 8, 2, 10, 11, 9, 6.
7. Solve for the mean absolute deviation of 12.3, 9.7, 11.6, 13.3, 14, 10.7.
8. Consider the scores of 10 qualifiers for a scholarship: 92, 93, 95, 90, 89, 92, 94, 95, 91, and 96. Find the range of the scores.
9. If the range of a set of scores is 24 and the lowest score is 7, what is the highest score?
10. Determine the mean deviation of the following scores: 8, 10, 9, 9, 7.
11. The average grade of 20 students in a certain test is 90. The average grade of the rest of the class is 80. If there are 50 students in the class, what is the average grade of the whole class?
12. Find the value of m if the mean of 30, 70, 110, m, and 200 is 110.
13. In the Statistics examination, the mean grade is 78 and the standard deviation is 10.
a. Find the corresponding z-scores of two students whose grades are 93 and 62, respectively.
b. Find the grades of two students whose z-scores are -0.6 and 1.2, respectively.
14. Assuming that the heights (x) of male college students are normally distributed with a mean of 69 inches and a standard deviation of 3 inches, calculate the probability that
a. x ≤ 65 inches
b. 65 inches ≤ x ≤ 70 inches
15. Find the area under the normal curve from z = 0 to z = 1.23.
16. Find the area under the normal curve from z = 1.62 to z = 0.43.

