Statistical Analysis With Software Application PDF
STATISTICAL CONCEPTS

Objectives:
After successful completion of this module, you should be able to:
• Define statistics.
• Enumerate the importance and limitations of statistics.
• Explain the process of statistics.
• Know the difference between descriptive and inferential statistics.
• Distinguish between qualitative and quantitative variables.
• Distinguish between discrete and continuous variables.
• Determine the level of measurement of a variable.

… general manager decide which player might be the best fit for a team. It is used in politics to help candidates understand how the public feels about various policies. And statistics is used in medicine to help determine the effectiveness of new drugs.

Used appropriately, statistics can enhance our understanding of the world around us. Used inappropriately, it can lend support to inaccurate beliefs. Understanding statistical methods will provide you with the ability to analyze and critique studies and the opportunity to become an informed consumer of information. Understanding statistical methods will also enable you to distinguish solid analysis from bogus “facts.”

Many people say that statistics is numbers. After all, we are bombarded by numbers that supposedly represent how we feel and who we are. Certainly, statistics has a lot to do with numbers, but this definition is only partially correct. Statistics is also about where the numbers come from (that is, how they were obtained) and how closely the numbers reflect reality.

Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.

Let’s break this definition into four parts. The first part states that statistics involves the collection of information. The second refers to the organization and summarization of information. The third states that the information is analyzed to draw conclusions or answer specific questions. The fourth part states that results should be reported using some measure that represents how convinced we are that our conclusions reflect reality.
• Statistics is important because it enables people to make decisions based on empirical evidence.
• Statistics provides us with tools needed to convert massive data into pertinent information that can be used in decision making.
• Statistics can provide us information that we can use to make sensible decisions.

What information is referred to in the definition?

The information referred to in the definition is the data. According to the Merriam-Webster dictionary, data are “factual information used as a basis for reasoning, discussion, or calculation”.

Data can be numerical, as in height, or nonnumerical, as in gender. In either case, data describe characteristics of an individual.

4. Statistical tables may be misused.
5. Statistics is only one of the methods of studying a problem.

Definitions:
• Universe is the set of all entities under study.
• A population is the total or entire group of individuals or observations from which information is desired by a researcher. Apart from persons, a population may consist of mosquitoes, villages, institutions, etc.
• An individual is a person or object that is a member of the population being studied.
• A statistic is a numerical summary of a sample.
• A sample is a subset of the population.
Example:
A.
4. Men are better in math than women.
5. Forty percent of the employees of an organization were recorded tardy for at least 15 working days.

7. Employee number
8. Civil status
9. Equity accounts
10. Brands of soft drinks
11. Socioeconomic status
MODULE 2: DATA COLLECTION AND BASIC CONCEPTS IN SAMPLING DESIGN

Objectives:
After successful completion of this module, you should be able to:
• Determine the sources of data (primary and secondary data).
• Distinguish the different methods of data collection under primary and secondary data.
• Determine the appropriate sample size.
• Differentiate various sampling techniques.
• Know the sources of errors in sampling.

… day life. It is common practice for people to receive large quantities of information every day through conversations, television, computers, radio, newspapers, posters, notices, and instructions. Precisely because there is so much information available, people need to be able to absorb, select, and reject it. In everyday life, in business, and in industry, certain statistical information is necessary, and it is important to know where to find it and how to collect it.

Analysis of data can lead to powerful results. Data can be used to offset anecdotal claims, such as the suggestion that cellular telephones cause brain cancer. Anecdotal means that the information being conveyed is based on casual observation, not scientific research. Because data are powerful, they can be dangerous when misused. The misuse of data usually occurs when data are incorrectly obtained or analyzed. For example, radio or television talk shows regularly ask poll questions for which respondents must call in or use the Internet to supply their vote. Most likely, the individuals who are going to call in are those who have a strong opinion about the topic. This group is not likely to be representative of people in general, so the results of the poll are not meaningful. Whenever we look at data, we should be mindful of where the data come from.
Secondary data can be collected from the following five sources:
1. Published reports in newspapers and periodicals.
2. Financial data reported in annual reports.
3. Records maintained by the institution.
4. Internal reports of government departments.
5. Information from official publications.

Take Note!
• Always investigate the validity and reliability of the data by examining the collection method employed by your source.

The sample size is typically denoted by n, and it is always a positive integer. No single exact sample size can be prescribed; it varies across research settings. However, all else being equal, a larger sample leads to increased precision in estimates of various properties of the population.

Take Note!
- Representativeness, not size, is the more important consideration.
- Use no fewer than 30 subjects if possible.
- If you use complex statistics, you may need a minimum of 100 or more in your sample (varies with method).
SAMPLE SIZE
The sample size depends on the following:
1. Level of Precision
Also called sampling error, the level of precision is the range in which the true value of the population is estimated to be.
2. Confidence Level
3. Degree of Variability

• Estimating Proportion (Infinite Population)
$$n \ge \left(\frac{Z}{e}\right)^{2} p(1-p)$$
where:
Z is the z-score corresponding to the level of confidence,
e is the level of precision (for example, 0.05), and
p is the population proportion.
When p is unknown, the conservative formula uses p = 0.5, the value that makes p(1 − p) largest:
$$n \ge \left(\frac{Z}{2e}\right)^{2}$$

• Finite Population
$$n \ge \frac{N}{1 + Ne^{2}}$$
where:
N is the population size and e is the level of precision.
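A minimal Python sketch of the two sample-size formulas, assuming the proportion formula n ≥ (Z/e)²·p(1 − p) (with p = 0.5 as the conservative choice) and the finite-population formula n ≥ N/(1 + Ne²). The function names are hypothetical, Z is obtained from the standard library's `statistics.NormalDist`, and results are rounded up because a sample must contain whole subjects.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Two-sided z-score for a confidence level, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_proportion_infinite(confidence: float, e: float, p: float = 0.5) -> int:
    """n >= (Z/e)^2 * p(1 - p); the default p = 0.5 gives the conservative size."""
    z = z_for_confidence(confidence)
    return math.ceil((z / e) ** 2 * p * (1 - p))


def n_finite(population_size: int, e: float) -> int:
    """n >= N / (1 + N e^2), rounded up to a whole number of subjects."""
    return math.ceil(population_size / (1 + population_size * e ** 2))


print(n_proportion_infinite(0.95, 0.05))  # 385 (conservative, p = 0.5)
print(n_finite(1000, 0.05))               # 286 (hypothetical N = 1000)
```

Rounding up rather than to the nearest integer keeps n on the safe side of the inequality.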
Sampling Procedure
- Identify the population.
- Determine if the population is accessible.
- Select a sampling method.
- Choose a sample that is representative of the population.
- Ask the question: can I generalize to the general population from the accessible population?

Simple Random Sampling

$$k = \frac{N}{n} = \frac{\text{Population Size}}{\text{Sample Size}}$$
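The interval k = N/n is the sampling interval used in systematic sampling: pick a random start among the first k units, then take every k-th unit. A minimal sketch, assuming k is rounded down when N/n is not a whole number; `systematic_sample` and the population of 100 numbered units are hypothetical.

```python
import random


def systematic_sample(population, n, seed=None):
    """Select n units by taking every k-th unit after a random start."""
    N = len(population)
    k = N // n                    # sampling interval k = N / n, rounded down
    rng = random.Random(seed)
    start = rng.randrange(k)      # random start among the first k units (0-based)
    return [population[start + j * k] for j in range(n)]


units = list(range(1, 101))       # hypothetical population numbered 1..100
sample = systematic_sample(units, 10, seed=1)
print(sample)
```

Because the last index is at most start + (n − 1)k ≤ nk − 1 ≤ N − 1, the selection never runs past the end of the list.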
Example:
Solution:
Given: N1 = 200, N2 = 300, N = 500, n = 50

$$n_1 = \left(\frac{n}{N}\right)N_1 = \left(\frac{50}{500}\right)200 = 20$$
$$n_2 = \left(\frac{n}{N}\right)N_2 = \left(\frac{50}{500}\right)300 = 30$$

The sample sizes are 20 from A and 30 from B. Then the units from each institution are to be selected by simple random sampling.
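The proportional allocation in the solution, n_h = (n/N)·N_h, can be checked with a short Python sketch. `proportional_allocation` is a hypothetical helper; in other problems the results may need rounding to whole units.

```python
def proportional_allocation(strata_sizes, n):
    """n_h = (n / N) * N_h for each stratum h (proportional allocation)."""
    N = sum(strata_sizes.values())
    return {name: n * size / N for name, size in strata_sizes.items()}


print(proportional_allocation({"A": 200, "B": 300}, 50))  # {'A': 20.0, 'B': 30.0}
```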
Example:

Disadvantage: In actual field applications, adjacent households tend to have more similar characteristics than households distantly apart.

1. Organize the sampling process into stages where the unit of analysis is systematically grouped.
2. Select a sampling technique for each stage.
3. Systematically apply the sampling technique to each stage until the unit of analysis has been selected.

Example:

- Any errors that cannot be attributed to sample-to-sample variability.
1. Non-responses

7. Audio and video recordings
9. Newspaper
REFERENCES:
http://www.natco1.org/research/files/SamplingStrategies.pdf
https://data36.com/statistical-bias-types-explained/
MODULE 3: DESCRIPTIVE STATISTICS
OBJECTIVES:
After successful completion of this module, you should be
able to:
◆ Distinguish the three main forms of data presentation.
◆ Know the different parts of the table.
◆ Choose appropriate diagrams/graphs to present a given set of
data.
◆ Organize qualitative and quantitative data in tables.
◆ Compute measures of central tendency, measures of variation and
measures of relative position of grouped and ungrouped data.
◆ Describe the shape of a distribution.
◆ Identify regions under the normal curve corresponding to
different standard normal values.
◆ Compute probabilities using the standard normal table and Excel.
Polytechnic University of the Philippines
College of Science
Department of Mathematics and Statistics
Data Presentation
Data are usually collected in a raw format and thus
the inherent information is difficult to understand.
Therefore, raw data need to be summarized,
processed, and analyzed to usefully derive
information from them. However, no matter how well
manipulated, the information derived from the raw
data should be presented in an effective format,
otherwise, it would be a great loss for both authors
and readers. Planning how the data will be presented
is essential before appropriately processing raw data.
Presentation of Data
Presentation of data refers to an exhibition
or putting up data in an attractive and useful
manner such that it can be easily
interpreted.
The three main forms of presentation of
data are:
Textual Presentation
Tabular Presentation
Graphical Presentation
Example:
A researcher is asked to present the performance of a section in
the statistics test. The following are the test scores:
34 42 20 50 17 9 34 43
50 18 35 43 50 23 23 35
37 38 38 39 39 38 38 39
24 29 25 26 28 27 44 44
49 48 46 45 45 46 45 46
The data presented in textual form would be like this:
In the statistics class of 40 students, 3 obtained the perfect score of 50. Sixteen students got a score of 40 and above, while only 3 got 19 and below. Generally, the students performed well in the test, with 23, or 57.5%, getting a passing score of 38 and above.
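The counts quoted in the textual presentation can be checked directly from the 40 scores. This sketch assumes, as the text does, that 38 is the passing score.

```python
scores = [
    34, 42, 20, 50, 17, 9, 34, 43,
    50, 18, 35, 43, 50, 23, 23, 35,
    37, 38, 38, 39, 39, 38, 38, 39,
    24, 29, 25, 26, 28, 27, 44, 44,
    49, 48, 46, 45, 45, 46, 45, 46,
]

perfect = sum(s == 50 for s in scores)        # perfect score of 50
forty_up = sum(s >= 40 for s in scores)       # 40 and above
low = sum(s <= 19 for s in scores)            # 19 and below
passing = sum(s >= 38 for s in scores)        # assumed passing score: 38
pct_passing = 100 * passing / len(scores)

print(len(scores), perfect, forty_up, low, passing, pct_passing)
# 40 3 16 3 23 57.5
```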
Remember!
◆ Keep your paragraphs simple and short.
◆ Always make sure that the readers are provided with additional explanations about the relevance of the figures and their implications.
Tabular Presentation:
• It is a systematic and logical arrangement of data in the form of rows and columns with respect to the characteristics of data.
• A table is best suited for representing individual information and can represent both quantitative and qualitative information.
Preparing Tables
The making of a compact table itself is an art. This should
contain all the information needed within the smallest possible
space. What the purpose of tabulation is and how the tabulated
information is to be used are the main points to be kept in mind
while preparing for a statistical table. An ideal table should
consist of the following main parts:
A. Title: The title must tell as simply as possible what is in the
table. It should answer the questions:
◆ Who? White females with breast cancer, black males with
lung cancer.
◆ What are the data? Counts, percentage distributions, rates.
https://byjus.com/commerce/tabular-presentation-of-data/
One exception to the requirement of equal class widths occurs in open-ended tables. A table is open ended if the first class has no lower class limit or the last class has no upper class limit.

Scores          Frequency
10 - 19         25
20 - 29         36
30 - 39         40
40 and over     12
Guidelines for Determining the Lower Class Limit of the First Class and Class Width
Choosing the Lower Class Limit of the First Class:
Choose the smallest observation in the data set or a
convenient number slightly lower than the smallest
observation in the data set.
For example, if the smallest observation is 10.2, a convenient lower class limit of the first class is 10.
Solution:
To answer this question we need to construct a frequency
distribution to determine how many female and male
respondents participated in the study.
◆ If the data is in the form of quantitative data
Steps
4. Highlight your data for the “INPUT RANGE”.
5. Highlight your bins (the upper class limits) for the “BIN RANGE”.
6. Click the box of “LABELS IN FIRST ROW”, then click “OK”.
7. The result will appear on a new worksheet of the Excel file. Get the percentage and total.
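Outside Excel, the same frequency counts can be produced with a few lines of Python. This sketch assumes the bin convention of Excel's Histogram tool (a value is counted in the first bin whose upper limit is greater than or equal to it, and anything above the last bin goes to “More”); the bin limits 19, 29, 39, 49, 59 are hypothetical choices applied to the test scores from the textual-presentation example.

```python
from bisect import bisect_left


def histogram_counts(data, bins):
    """Frequency per bin; bins must be sorted ascending upper limits.
    A value goes to the first bin >= the value; larger values go to "More"."""
    counts = [0] * (len(bins) + 1)
    for value in data:
        counts[bisect_left(bins, value)] += 1
    labels = [str(b) for b in bins] + ["More"]
    return dict(zip(labels, counts))


scores = [34, 42, 20, 50, 17, 9, 34, 43, 50, 18, 35, 43, 50, 23, 23, 35,
          37, 38, 38, 39, 39, 38, 38, 39, 24, 29, 25, 26, 28, 27, 44, 44,
          49, 48, 46, 45, 45, 46, 45, 46]
bins = [19, 29, 39, 49, 59]              # hypothetical upper class limits
freq = histogram_counts(scores, bins)
total = sum(freq.values())
for label, count in freq.items():
    print(label, count, f"{100 * count / total:.1f}%")
```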
Answer:
◆ Useless Information – Don’t show decimals if they are not needed.
◆ Poor Alignment – Make sure alignment makes sense.
• Don’t center numbers; always right-justify them and try to align decimal points.
• Consider the appropriate placement of row titles.
◆ Difficult to Read – Use commas as thousands separators when a number exceeds a thousand.
Graphical Presentation
◆ A graph is a very effective visual tool as it displays data at
a glance, facilitates comparison, and can reveal trends and
relationships within the data such as changes over time,
and correlation or relative share of a whole.
◆ It is considered an important medium of communication
because we are able to create a pictorial representation of
the numerical figures.
◆ Suited when we need to show the results of the study to nonprofessionals and/or people who dislike numbers and overly lengthy texts.
Remember!
• Bar graphs may also be drawn with horizontal bars. Horizontal bars are preferable when category names are lengthy.
• In bar graphs, the order of the categories does not usually matter. However, bar graphs that have categories arranged in decreasing order of frequency help prioritize categories for decision-making purposes in areas such as quality control, human resources, and marketing.
Histogram
◆ It is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same and the rectangles touch each other.
◆ It is a graph used to present quantitative data and is similar to the bar graph.
◆ It is used to organize continuous data.
Line Graph
◆ A graph that shows information that is connected in some way (such as change over time).
◆ Data values are plotted as points, and line segments are then drawn connecting the points. It is used to organize continuous data.
◆ Very useful in identifying trends in the data over time.
Guidelines for Constructing Good Graphics
◆ Title and label the graphic axes clearly, providing explanations if needed. Include units of measurement and a data source when appropriate.
◆ Avoid distortion.
◆ Minimize the amount of white space in the graph. Use the available space to let the data stand out. If you truncate the scales, clearly indicate this to the reader.
◆ Avoid clutter, such as excessive gridlines and unnecessary backgrounds or pictures.
◆ Don’t distract the reader.
◆ Avoid three dimensions.
◆ Do not use more than one design in the same graphic. Let the data speak for themselves.
• It is the sum of the data values divided by the number of data values.
• It is also called the average.
• It is appropriate only for data measured on an interval or ratio scale.
Advantages of the Mean
◆ Simple to understand and easy to calculate.
◆ It is rigidly defined.
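Sample Mean
Ungrouped data:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where: x_i = data values; n = no. of sample observations
Grouped data:
$$\bar{x} = \frac{\sum_{i=1}^{r} f x_i}{n}$$
where: x_i = data values (class midpoints); f = frequency; r = no. of classes; n = no. of sample observations

Population Mean
Ungrouped data:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
where: x_i = data values; N = no. of observations
Grouped data:
$$\mu = \frac{\sum_{i=1}^{r} f x_i}{N}$$
where: x_i = data values (class midpoints); f = frequency; r = no. of classes; N = no. of observations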
Remember!
• Whenever you hear the word average, be aware that
the word may not always be referring to the mean.
One average could be used to support one position,
while another average could be used to support a
different position.
• Unlike the mean and median, the mode is not always present in a data set.
Notice how the mean of the second data set has been
influenced by the presence of an unusual case/outlier in the
data set. If we were to say the mean is equal to 132.5 for the
second data set and it represents a typical case, this will not
make much sense because the majority of data values are
less than 120. Therefore, the mean should not be used when
unusual, or outlying, data values are present in the data set,
as the mean tends to be extremely sensitive to the unusual
values. Rather, the median should be reported in this case.
Class Interval   Frequency
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Solution:
To compute the mean of grouped data, first you need to fill out this table.

Class Interval   Frequency (f)   x     fx
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Total            n =                   Σfx =

x is the midpoint of every class interval. To compute it, average the lower class limit (LC) and the upper class limit (UC):
$$x = \frac{LC + UC}{2}$$
Ex:
$$x = \frac{55 + 59}{2} = 57, \qquad x = \frac{50 + 54}{2} = 52$$
Solution:
Class Interval   f    x    fx
55 - 59          3    57   171
50 - 54          6    52   312
45 - 49          7    47   329
40 - 44          9    42   378
35 - 39          6    37   222
30 - 34          4    32   128
25 - 29          5    27   135
Total            n = 40    Σfx = 1,675

$$\bar{x} = \frac{\sum_{i=1}^{7} f x_i}{n} = \frac{1{,}675}{40} \approx 41.88$$
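The grouped-data mean can be reproduced in Python, with each class represented by its midpoint x = (lower limit + upper limit)/2. A minimal sketch; the `(lower, upper, frequency)` tuple layout and the `grouped_mean` name are assumptions of this example.

```python
# (lower class limit, upper class limit, frequency) for the age example
classes = [
    (55, 59, 3), (50, 54, 6), (45, 49, 7), (40, 44, 9),
    (35, 39, 6), (30, 34, 4), (25, 29, 5),
]


def grouped_mean(classes):
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # x = class midpoint
    return sum_fx / n


print(grouped_mean(classes))  # 41.875, reported as 41.88
```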
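Solution:
Class Interval   f    LB     <cf
55 - 59          3    54.5   40
50 - 54          6    49.5   37
45 - 49          7    44.5   31
40 - 44          9    39.5   24
35 - 39          6    34.5   15
30 - 34          4    29.5   9
25 - 29          5    24.5   5
Total            n = 40

First, compute n/2; it will help us determine the median class and the <cf.
$$\frac{n}{2} = \frac{40}{2} = 20$$
The median class is the class containing the 20th item. Hence, the median class is 40 - 44.
$$\tilde{x} = LB + \left(\frac{\frac{n}{2} - \text{<cf}}{f}\right) i$$
where LB is the lower boundary of the median class, <cf is the less-than cumulative frequency of the class just below the median class, f is the frequency of the median class, and i is the class width.
$$\tilde{x} = 39.5 + \frac{(20 - 15)5}{9} = 42.28$$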
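A sketch of the grouped median formula, assuming integer class limits so that the lower boundary is the lower limit minus 0.5 and the class width i is upper − lower + 1. `grouped_median` is a hypothetical helper; it walks up from the lowest class, accumulating the less-than cumulative frequency until it reaches the n/2-th item.

```python
# (lower class limit, upper class limit, frequency) for the age example
ages = [(55, 59, 3), (50, 54, 6), (45, 49, 7), (40, 44, 9),
        (35, 39, 6), (30, 34, 4), (25, 29, 5)]


def grouped_median(classes):
    """Median = LB + ((n/2 - <cf) / f) * i for grouped data."""
    ordered = sorted(classes)               # lowest class first
    n = sum(f for _, _, f in ordered)
    half = n / 2
    cf_below = 0                            # <cf of the classes below
    for lo, hi, f in ordered:
        if cf_below + f >= half:            # first class reaching the n/2-th item
            lb = lo - 0.5                   # lower boundary
            i = hi - lo + 1                 # class width
            return lb + (half - cf_below) / f * i
        cf_below += f


print(round(grouped_median(ages), 2))       # 42.28
```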
Solution:
Class Interval   f    LB     <cf
55 - 59          3    54.5   40
50 - 54          6    49.5   37
45 - 49          7    44.5   31
40 - 44          9    39.5   24
35 - 39          6    34.5   15
30 - 34          4    29.5   9
25 - 29          5    24.5   5

The modal class is the class interval with the highest frequency. The modal class is 40 - 44.
If two class intervals contain the highest frequency, always choose the higher class interval.
$$\hat{x} = LB + \left(\frac{d_1}{d_1 + d_2}\right) i$$
d1 = 9 − 6 = 3 (frequency of the modal class minus that of the class below it)
d2 = 9 − 7 = 2 (frequency of the modal class minus that of the class above it)
$$\hat{x} = 39.5 + \left(\frac{3}{3 + 2}\right)5 = 42.5$$
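The mode formula in Python, under the same assumptions as before (integer class limits, `(lower, upper, frequency)` tuples; `grouped_mode` is a hypothetical name). Following the text, a tie for the highest frequency goes to the higher class; the sketch also assumes d1 + d2 > 0, since the formula is undefined otherwise.

```python
ages = [(55, 59, 3), (50, 54, 6), (45, 49, 7), (40, 44, 9),
        (35, 39, 6), (30, 34, 4), (25, 29, 5)]


def grouped_mode(classes):
    """Mode = LB + (d1 / (d1 + d2)) * i for grouped data."""
    ordered = sorted(classes)                       # lowest class first
    freqs = [f for _, _, f in ordered]
    top = max(freqs)
    # modal class; ties go to the higher class, as the text advises
    m = max(idx for idx, f in enumerate(freqs) if f == top)
    lo, hi, f = ordered[m]
    f_below = freqs[m - 1] if m > 0 else 0
    f_above = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1 = f - f_below
    d2 = f - f_above
    return (lo - 0.5) + d1 / (d1 + d2) * (hi - lo + 1)


print(grouped_mode(ages))  # 42.5
```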
Quartiles - split the ordered data into four quarters.
Percentiles - split the ordered data into 100 equal parts.

1. Locate the position of Q3:
$$Q_{\text{class}} = \frac{nk}{4} + 0.5 = \frac{(12)(3)}{4} + 0.5 = 9.5$$
2. Use interpolation since the computed Q_class is not an integer.

Position:  1   2   3   4   5   6   7   8   9   10  11  12
Value:     20  23  24  27  30  32  37  37  40  42  48  55

$$Q_3 = 40 + 0.5(42 - 40) = 41$$

$$D_4 = 30 + 0.3(32 - 30) = 30.6$$

1. Locate the position of P55:
$$P_{\text{class}} = \frac{nk}{100} + 0.5 = \frac{(12)(55)}{100} + 0.5 = 7.1$$
2. Use interpolation since the computed P_class is not an integer.

Position:  1   2   3   4   5   6   7   8   9   10  11  12
Value:     20  23  24  27  30  32  37  37  40  42  48  55
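The position rule used above (position = nk/4 + 0.5 for quartiles, nk/10 + 0.5 for deciles, nk/100 + 0.5 for percentiles, then linear interpolation) can be sketched in Python. `position_quantile` is a hypothetical name, and keeping the position within 1..n is an assumption added for extreme k. Other software, including Excel's QUARTILE and PERCENTILE functions, uses different position rules and may give slightly different answers.

```python
import math


def position_quantile(data, k, parts):
    """k-th quartile (parts=4), decile (parts=10) or percentile (parts=100)
    using position = n*k/parts + 0.5 and linear interpolation."""
    x = sorted(data)
    n = len(x)
    pos = n * k / parts + 0.5
    pos = min(max(pos, 1), n)        # assumption: keep the position inside 1..n
    whole = math.floor(pos)
    frac = pos - whole
    if whole == n:
        return x[-1]
    # x[whole - 1] is the value at 1-based position `whole`
    return x[whole - 1] + frac * (x[whole] - x[whole - 1])


data = [20, 23, 24, 27, 30, 32, 37, 37, 40, 42, 48, 55]
q3 = position_quantile(data, 3, 4)
d4 = position_quantile(data, 4, 10)
p55 = position_quantile(data, 55, 100)
print(round(q3, 2), round(d4, 2), round(p55, 2))  # 41.0 30.6 37.0
```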
Example 2:
The data given below are the ages of the residents in Barangay 634, Sta. Mesa, Manila. Compute Q1, D7, and P10.
Class Interval   Frequency
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Solution:
To compute Q1, D7, and P10 of grouped data, first you need to fill out this table.
Class Interval   f    LB    <cf
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Total            n =

To compute the lower boundary, always subtract 0.5 from the lower class limit (LC).
Ex: 55 − 0.5 = 54.5; 50 − 0.5 = 49.5; 45 − 0.5 = 44.5
Solution:
Class Interval   f    LB     <cf
55 - 59          3    54.5   40
50 - 54          6    49.5   37
45 - 49          7    44.5   31
40 - 44          9    39.5   24
35 - 39          6    34.5   15
30 - 34          4    29.5   9
25 - 29          5    24.5   5
Total            n = 40

First, compute nk/4; it will help us determine the quartile class and the <cf.
$$\frac{nk}{4} = \frac{(40)(1)}{4} = 10$$
The quartile class is the class containing the 10th item. Hence, the quartile class is 35 - 39.
$$Q_k = LB + \left(\frac{\frac{nk}{4} - \text{<cf}}{f}\right) i$$
$$Q_1 = 34.5 + \frac{(10 - 9)5}{6} = 35.33$$
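Solution:
Class Interval   f    LB     <cf
55 - 59          3    54.5   40
50 - 54          6    49.5   37
45 - 49          7    44.5   31
40 - 44          9    39.5   24
35 - 39          6    34.5   15
30 - 34          4    29.5   9
25 - 29          5    24.5   5
Total            n = 40

First, compute nk/10; it will help us determine the decile class and the <cf.
$$\frac{nk}{10} = \frac{(40)(7)}{10} = 28$$
The decile class is the class containing the 28th item. Hence, the decile class is 45 - 49.
$$D_k = LB + \left(\frac{\frac{nk}{10} - \text{<cf}}{f}\right) i$$
$$D_7 = 44.5 + \frac{(28 - 24)5}{7} = 47.36$$

For P10, compute nk/100 = (40)(10)/100 = 4. The percentile class is the class containing the 4th item, which is 25 - 29; no class lies below it, so <cf = 0.
$$P_k = LB + \left(\frac{\frac{nk}{100} - \text{<cf}}{f}\right) i$$
$$P_{10} = 24.5 + \frac{(4 - 0)5}{5} = 28.5$$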
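All three grouped-data locators share one formula, LB + ((nk/parts − <cf)/f)·i, with parts = 4, 10, or 100. A minimal sketch with a hypothetical helper `grouped_quantile`, assuming integer class limits (LB = lower limit − 0.5, i = upper − lower + 1). It confirms that nk/100 = 4 for P10, which gives 28.5.

```python
def grouped_quantile(classes, k, parts):
    """Q_k (parts=4), D_k (parts=10) or P_k (parts=100) for grouped data:
    LB + ((n*k/parts - <cf) / f) * i, where <cf counts the classes below."""
    ordered = sorted(classes)               # lowest class first
    n = sum(f for _, _, f in ordered)
    target = n * k / parts
    cf_below = 0
    for lo, hi, f in ordered:
        if cf_below + f >= target:
            return (lo - 0.5) + (target - cf_below) / f * (hi - lo + 1)
        cf_below += f
    raise ValueError("target position exceeds n")


ages = [(55, 59, 3), (50, 54, 6), (45, 49, 7), (40, 44, 9),
        (35, 39, 6), (30, 34, 4), (25, 29, 5)]
q1 = grouped_quantile(ages, 1, 4)
d7 = grouped_quantile(ages, 7, 10)
p10 = grouped_quantile(ages, 10, 100)
print(round(q1, 2), round(d7, 2), round(p10, 2))  # 35.33 47.36 28.5
```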
Example 2:
The ages of the townspeople in a certain community are as follows:
Class Interval   Frequency
18 - 24          28
25 - 31          54
32 - 38          38
39 - 45          20
46 - 52          17
53 - 59          3

Solution:
To compute Q2, D5, and P50 of grouped data, first you need to fill out this table.
Class Interval   f     LB    <cf
18 - 24          28
25 - 31          54
32 - 38          38
39 - 45          20
46 - 52          17
53 - 59          3
Total            n =

To compute the lower boundary, always subtract 0.5 from the lower class limit (LC).
Ex: 18 − 0.5 = 17.5; 25 − 0.5 = 24.5; 32 − 0.5 = 31.5
Solution:
Class Interval   f     LB     <cf
18 - 24          28    17.5   28
25 - 31          54    24.5
32 - 38          38    31.5
39 - 45          20    38.5
46 - 52          17    45.5
53 - 59          3     52.5
Total            n = 160

If the class intervals are arranged in ascending order, always start at the upper part: copy the frequency of the lowest class interval, then keep adding the frequency of the next class.
28 + 54 = 82; 82 + 38 = 120; 120 + 20 = 140; 140 + 17 = 157; 157 + 3 = 160
Solution:
Class Interval   f     LB     <cf
18 - 24          28    17.5   28
25 - 31          54    24.5   82
32 - 38          38    31.5   120
39 - 45          20    38.5   140
46 - 52          17    45.5   157
53 - 59          3     52.5   160
Total            n = 160

First, compute nk/4; it will help us determine the quartile class and the <cf.
$$\frac{nk}{4} = \frac{(160)(2)}{4} = 80$$
The quartile class is the class containing the 80th item. Hence, the quartile class is 25 - 31.
$$Q_k = LB + \left(\frac{\frac{nk}{4} - \text{<cf}}{f}\right) i$$
$$Q_2 = 24.5 + \frac{(80 - 28)7}{54} = 31.24$$
Solution:
Class Interval   f     LB     <cf
18 - 24          28    17.5   28
25 - 31          54    24.5   82
32 - 38          38    31.5   120
39 - 45          20    38.5   140
46 - 52          17    45.5   157
53 - 59          3     52.5   160
Total            n = 160

First, compute nk/10; it will help us determine the decile class and the <cf.
$$\frac{nk}{10} = \frac{(160)(5)}{10} = 80$$
The decile class is the class containing the 80th item. Hence, the decile class is 25 - 31.
$$D_k = LB + \left(\frac{\frac{nk}{10} - \text{<cf}}{f}\right) i$$
$$D_5 = 24.5 + \frac{(80 - 28)7}{54} = 31.24$$
Solution:
Class Interval   f     LB     <cf
18 - 24          28    17.5   28
25 - 31          54    24.5   82
32 - 38          38    31.5   120
39 - 45          20    38.5   140
46 - 52          17    45.5   157
53 - 59          3     52.5   160
Total            n = 160

First, compute nk/100; it will help us determine the percentile class and the <cf.
$$\frac{nk}{100} = \frac{(160)(50)}{100} = 80$$
The percentile class is the class containing the 80th item. Hence, the percentile class is 25 - 31.
$$P_k = LB + \left(\frac{\frac{nk}{100} - \text{<cf}}{f}\right) i$$
$$P_{50} = 24.5 + \frac{(80 - 28)7}{54} = 31.24$$
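For this table, the less-than cumulative frequencies can be built as running totals from the lowest class, and Q2, D5, and P50 all locate the same 80th item, so they must agree. A sketch with the same hypothetical grouped-quantile helper (integer class limits assumed):

```python
from itertools import accumulate

# (lower class limit, upper class limit, frequency), ascending as in the table
town = [(18, 24, 28), (25, 31, 54), (32, 38, 38),
        (39, 45, 20), (46, 52, 17), (53, 59, 3)]

less_than_cf = list(accumulate(f for _, _, f in town))  # running totals
print(less_than_cf)  # [28, 82, 120, 140, 157, 160]


def grouped_quantile(classes, k, parts):
    """LB + ((n*k/parts - <cf) / f) * i for grouped data."""
    ordered = sorted(classes)
    n = sum(f for _, _, f in ordered)
    target = n * k / parts
    cf_below = 0
    for lo, hi, f in ordered:
        if cf_below + f >= target:
            return (lo - 0.5) + (target - cf_below) / f * (hi - lo + 1)
        cf_below += f
    raise ValueError("target position exceeds n")


q2 = grouped_quantile(town, 2, 4)
d5 = grouped_quantile(town, 5, 10)
p50 = grouped_quantile(town, 50, 100)
print(round(q2, 2), round(d5, 2), round(p50, 2))  # 31.24 31.24 31.24
```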
Sample Interpretation:
1. Jennifer just received the results of her SAT exam. Her
SAT Mathematics score of 600 is in the 74th percentile. What
does this mean?
A percentile rank of 74% means that 74% of SAT
Mathematics scores are less than or equal to 600 and 26%
of the scores are greater. So 26% of the students who took
the exam scored better than Jennifer.
Measures of Dispersion/Variability
Based on the figure below, determine which of the two scatter diagrams illustrates greater variability.
[Figure 1 and Figure 2: scatter diagrams]

• It is a measure of how far away items in a data set are from the mean.
• The larger the standard deviation, the more variation there is in the data set.
• The standard deviation can never be a negative number, due to the way it is calculated and the fact that it measures a distance (distances are never negative numbers).
• The smallest possible value for the standard deviation is 0, and that happens only in contrived situations where every single number in the data set is exactly the same (no deviation).
Sample Standard Deviation
Ungrouped data:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
where: x_i = data values; x̄ = mean; n = no. of sample observations
Grouped data:
$$s = \sqrt{\frac{\sum_{i=1}^{r} f (x_i - \bar{x})^2}{n - 1}}$$
where: x_i = data values (class midpoints); x̄ = mean; f = frequency; n = no. of sample observations

Population Standard Deviation
Ungrouped data:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
where: x_i = data values; μ = mean; N = no. of observations
Grouped data:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{r} f (x_i - \mu)^2}{N}}$$
where: x_i = data values (class midpoints); μ = mean; f = frequency; N = no. of observations
Measures of Dispersion/Variability: VARIANCE

Sample Variance
Ungrouped data:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where: x_i = data values; x̄ = mean; n = no. of sample observations
Grouped data:
$$s^2 = \frac{\sum_{i=1}^{r} f (x_i - \bar{x})^2}{n - 1}$$
where: x_i = data values (class midpoints); x̄ = mean; f = frequency; n = no. of sample observations

Population Variance
Ungrouped data:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where: x_i = data values; μ = mean; N = no. of observations
Grouped data:
$$\sigma^2 = \frac{\sum_{i=1}^{r} f (x_i - \mu)^2}{N}$$
where: x_i = data values (class midpoints); μ = mean; f = frequency; N = no. of observations
Example 1:
The data given below are the ages of the residents in Barangay 634, Sta. Mesa, Manila. Compute the sample standard deviation and sample variance.
Class Interval   Frequency
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Solution:
Class Interval   f   x    fx    (xi − x̄)²   f(xi − x̄)²
55 - 59          3   57   171   228.61      685.83
50 - 54          6   52   312   102.41      614.46
45 - 49          7   47   329   26.21       183.47
40 - 44          9   42   378   0.01        0.09
35 - 39          6   37   222   23.81       142.86
30 - 34          4   32   128   97.61       390.44
25 - 29          5   27   135   221.41      1,107.05
Total            n = 40     Σfx = 1,675            Σf(xi − x̄)² = 3,124.20

With x̄ = 41.88:
f(x1 − x̄)² = 3(228.61) = 685.83
f(x2 − x̄)² = 6(102.41) = 614.46
f(x3 − x̄)² = 7(26.21) = 183.47
and so on for the remaining classes.

$$s^2 = \frac{\sum_{i=1}^{7} f (x_i - \bar{x})^2}{n - 1} = \frac{3{,}124.20}{40 - 1} = 80.11$$
$$s = \sqrt{80.11} = 8.95$$
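A sketch of the grouped sample variance and standard deviation. It carries the unrounded mean 41.875, so the sum of squares comes out as 3,124.375 rather than the 3,124.20 obtained from the rounded table entries; both give s² ≈ 80.11 and s ≈ 8.95. The tuple layout is an assumption of this example.

```python
import math

# (lower class limit, upper class limit, frequency) for the age example
ages = [(55, 59, 3), (50, 54, 6), (45, 49, 7), (40, 44, 9),
        (35, 39, 6), (30, 34, 4), (25, 29, 5)]

n = sum(f for _, _, f in ages)
mid = [((lo + hi) / 2, f) for lo, hi, f in ages]   # (class midpoint x, f)
mean = sum(f * x for x, f in mid) / n              # 41.875, unrounded
ss = sum(f * (x - mean) ** 2 for x, f in mid)      # sum of f(x - mean)^2
s2 = ss / (n - 1)                                  # sample variance
s = math.sqrt(s2)                                  # sample standard deviation

print(ss, round(s2, 2), round(s, 2))               # 3124.375 80.11 8.95
```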
Shape of Distribution
These two statistics give you insights into the shape of
the distribution.
◆ Skewness is the degree of distortion from the symmetrical bell curve or the normal distribution. It measures the lack of symmetry in data distribution.
◆ Kurtosis is a measure of the combined sizes of the two tails. It tells you how tall and sharp the central peak is, relative to a standard bell curve.
Kurtosis
It is actually the measure of outliers present in the
distribution. The outliers in a sample, therefore, have
even more effect on the kurtosis than they do on the
skewness.
Higher kurtosis means more of the variance is the
result of infrequent extreme deviations, as opposed to
frequent modestly sized deviations. In other words, it’s
the tails that mostly account for kurtosis, not the
central peak.
The kurtosis decreases as the tails become lighter. It
increases as the tails become heavier.
• Platykurtic (Kurtosis < 3):
Compared to a normal
distribution, its tails are shorter
and thinner, and often its central
peak is lower and broader.
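For ungrouped data, one common pair of shape measures is the moment coefficient of skewness, m3/m2^(3/2), and the moment coefficient of kurtosis, m4/m2², where m_r is the r-th moment about the mean. On this scale a normal curve has kurtosis 3, matching the platykurtic rule above. A minimal sketch with a hypothetical `moment_shape` helper; Excel's SKEW and KURT apply sample-size adjustments, and KURT reports excess kurtosis (normal = 0), so their values will differ.

```python
def moment_shape(data):
    """Return (skewness, kurtosis) as moment coefficients using
    divide-by-n moments: m3 / m2**1.5 and m4 / m2**2 (normal -> 3)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


skew, kurt = moment_shape([1, 2, 3, 4, 5])
print(skew, kurt)  # 0.0 1.7 -> symmetric, and kurtosis < 3 (platykurtic)
```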
Percentile Coefficient of Kurtosis
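A commonly used textbook definition of the percentile coefficient of kurtosis is K = Q.D./(P90 − P10), where Q.D. = (Q3 − Q1)/2 is the quartile deviation. For a normal curve K is about 0.263; under this definition, larger values suggest a platykurtic shape and smaller values a leptokurtic one. The sketch below assumes this definition (`percentile_kurtosis` is a hypothetical name) and checks the 0.263 reference value using quantiles of the standard normal curve from `statistics.NormalDist`.

```python
from statistics import NormalDist


def percentile_kurtosis(q1, q3, p10, p90):
    """K = Q.D. / (P90 - P10), with quartile deviation Q.D. = (Q3 - Q1) / 2."""
    return (q3 - q1) / 2 / (p90 - p10)


z = NormalDist()                           # standard normal curve
k_normal = percentile_kurtosis(z.inv_cdf(0.25), z.inv_cdf(0.75),
                               z.inv_cdf(0.10), z.inv_cdf(0.90))
print(round(k_normal, 3))                  # 0.263
```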
◆ It is also known as the Gaussian distribution, named after the German mathematician Carl Friedrich Gauss.
◆ It is a probability function that describes how the values of a variable are distributed.
Normal Curve
[Figure: normal curve centered at μ, with μ − σ and μ + σ marked]
Mean:
◆ Changing the mean shifts the entire curve left or right on the X-axis.
[Figure: two normal curves with μ1 < μ2, σ1 = σ2]
Standard Deviation:
◆ Changing the standard deviation either tightens or spreads out the width of the distribution along the X-axis. Larger standard deviations produce distributions that are more spread out.
Remember!
Positive values of z-score indicate how far above
the mean a score falls and negative values
indicate how far below the mean a score falls.
Patterns for Finding Areas under a Standard Normal Curve

Using Table 1
A. Area to the right of a negative z value or to the left of a positive z value: use Table 1 directly.
B. Area between z values on either side of 0: take the Table 1 area of the positive z value and subtract (1 − Table 1 area of the negative z value's absolute value).
C. Area between z values on the same side of 0: subtract the smaller Table 1 area from the larger one.
D. Area to the right of a positive z value or to the left of a negative z value: 1 − Area.

Using Table 2
A. Area to the right of a positive z value or to the left of a negative z value: use Table 2 directly.
B. Area between z values on the same side of 0: subtract the smaller Table 2 area from the larger one.
C. Area between z values on either side of 0: (0.50 − Area of z1) + (0.50 − Area of z2).
D. Area to the right of a negative z value or to the left of a positive z value: 0.50 + (0.50 − Area) = 1 − Area.
Example 2:
A pediatrician obtains the heights of her three-year-old female patients. The heights are approximately normally distributed, with mean 38.72 inches and standard deviation 3.17 inches. Determine the proportion of the three-year-old females that have a height less than 35 inches.

Solution:
Given: μ = 38.72, σ = 3.17, and x = 35
Step 1: Draw a normal curve and shade the area to the left of x = 35.

Using Table 2 By-hand Approach!
Step 2: Convert the value of x to a z-score.
$$P(X < 35) = P\left(Z < \frac{35 - 38.72}{3.17}\right) = P(Z < -1.17) = 0.1210$$
The proportion of three-year-old females with a height less than 35 inches is 0.1210.

Use “TRUE” for cumulative since we want the area under the normal curve.
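The same probability can be computed in Python with `statistics.NormalDist`, whose `cdf` plays the role of Excel's NORM.DIST(x, mean, standard_dev, TRUE). Rounding z to two decimals reproduces the table value; the unrounded calculation differs slightly.

```python
from statistics import NormalDist

mu, sigma = 38.72, 3.17
z = round((35 - mu) / sigma, 2)            # -1.17, rounded as for the table
p_table = NormalDist().cdf(z)              # area to the left of z
p_exact = NormalDist(mu, sigma).cdf(35)    # same area without rounding z
print(z, round(p_table, 4), round(p_exact, 4))
```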
Example 3:
A pediatrician obtains the heights of her three-year-old female patients. The heights are approximately normally distributed, with mean 38.72 inches and standard deviation 3.17 inches. Determine the probability that a randomly selected three-year-old girl is between 35 and 40 inches tall, inclusive.

Solution:
Given: μ = 38.72, σ = 3.17, and 35 ≤ X ≤ 40

Using Table 1 By-hand Approach!
Step 2: Convert the values of x to z-scores.
$$\begin{aligned}
P(35 \le X \le 40) &= P\left(\frac{35 - 38.72}{3.17} \le Z \le \frac{40 - 38.72}{3.17}\right) \\
&= P(-1.17 \le Z \le 0.40) \\
&= P(Z \le 0.40) - [1 - P(Z \ge -1.17)] \\
&= 0.6554 - [1 - 0.8790] \\
&= 0.6554 - 0.1210 \\
&= 0.5344
\end{aligned}$$
The probability that a randomly selected three-year-old female is between 35 and 40 inches tall is 0.5344.
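A matching sketch for the interval probability, again rounding each z to two decimals to mirror the table lookup and taking the difference of the two cumulative areas:

```python
from statistics import NormalDist

mu, sigma = 38.72, 3.17
std = NormalDist()
z1 = round((35 - mu) / sigma, 2)       # -1.17
z2 = round((40 - mu) / sigma, 2)       # 0.4
p = std.cdf(z2) - std.cdf(z1)          # area between z1 and z2
print(z1, z2, round(p, 4))             # -1.17 0.4 0.5344
```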
ACTIVITIES/ASSESSMENTS:
2. What features of the ‘Good Presentation’ make it better than
A.
B.
3. Review the table and consider questions such as the following.
Origin / Rating   Poor   Needs Improvement   Satisfactory   Very Good   Excellent   Total
5. The length of life of an instrument produced by a machine has a normal
distribution with a mean of 12 months and standard deviation of 2 months.
Find the probability that an instrument produced by this machine will last
A. less than 7 months.
B. between 7 and 12 months.
Be sure to draw a normal curve with the area corresponding to the
probability shaded.
6. The lengths of human pregnancies are approximately normally distributed,
with mean μ days and standard deviation σ days.
A. What proportion of pregnancies lasts more than 270 days?
B. What proportion of pregnancies lasts less than 250 days?
C. What proportion of pregnancies lasts between 240 and 280 days?
D. What is the probability that a randomly selected pregnancy lasts more than 280 days?
Be sure to draw a normal curve with the area corresponding to the
probability shaded.
7. Construct a frequency distribution table based on the
scores of 75 randomly selected students.
37 46 37 26 30 41 28 49 29 34 46 50 38 35 42
35 46 45 27 41 26 45 39 43 46 36 32 46 36 48
49 47 30 43 31 34 38 41 39 45 28 43 37 39 26
38 30 29 38 26 31 42 44 48 43 37 46 38 27 50
42 33 42 42 43 39 39 31 46 46 48 48 50 45 31
Scores Frequency Percentage (%)
26 to 30
31 to 35
36 to 40
41 to 45
46 to 50
Total
A. Based on the frequency distribution, compute measures of
central tendency, measures of variation, Q1, D9, P, skewness,
and kurtosis.
B. Based on the raw data, compute measures of central
tendency, measures of variation, skewness, and kurtosis using
Excel.
C. Compute the skewness and kurtosis of the grouped and ungrouped
data. Make sure to describe the shape of the distribution.
D. Do you think that the computed values for the grouped and
ungrouped data are the same?
8. Begin with the following set of data, call it Data Set I.
5, −2, 6, 14, −3, 0, 1, 4, 3, 2, 5
A. Compute the sample standard deviation and sample mean of
Data Set I.
B. Form a new data set, Data Set II, by adding 3 to each
number in Data Set I. Calculate the sample standard deviation
and sample mean of Data Set II.
C. Form a new data set, Data Set III, by subtracting 6 from
each number in Data Set I. Calculate the sample standard
deviation and sample mean of Data Set III.
D. Comparing the answers to parts (A), (B), and (C), can you
guess the pattern? State the general principle that you expect
to be true.
References
https://prezi.com/rirrca9ckuiz/textual-presentation-of-data/
https://www.toppr.com/guides/economics/presentation-of-data/textual-and-tabular-presentation-of-data/
Sullivan, M., III. Statistics: Informed Decisions Using Data (5th ed.).
What is HYPOTHESIS?
• A statement or claim regarding a characteristic of one or more populations.
• A preconceived idea, assumed to be true, that has to be tested for its truth or falsity.
Reminders:
If you are conducting a research study and you want
to use a hypothesis test to support your claim, the
claim must be stated in such a way that it becomes
the alternative hypothesis, so it cannot contain
the condition of equality.
Example:
H0: The defendant is innocent.
Ha: The defendant is not innocent.
Answer:
A type I error is like putting an innocent person in
jail.
A type II error is like letting a guilty person go free.
Reminders:
It is important to set the level of significance (α) before we
start our study, because the Type I error is the more ‘grievous’
error to make.
The smaller α is, the smaller the region of rejection.
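To see why a smaller α shrinks the rejection region, the sketch below assumes a right-tailed z test (an assumption; the reminder does not name a specific test) and computes the critical value that cuts off an area of α in the right tail of the standard normal curve.

```python
from statistics import NormalDist

# Assumed setting: right-tailed z test, so H0 is rejected when z > z_(1 - alpha).
# The rejection region always has area alpha under the standard normal curve.
critical_values = {
    alpha: NormalDist().inv_cdf(1 - alpha) for alpha in (0.10, 0.05, 0.01)
}

for alpha, z_crit in critical_values.items():
    print(f"alpha = {alpha:.2f}: reject H0 if z > {z_crit:.3f}")
```

As α drops from 0.10 to 0.05 to 0.01, the cutoff moves from about 1.28 to 1.64 to 2.33, so fewer values of the test statistic fall in the rejection region.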
◆ Using confidence interval
◆ Using p-value approach
◆ Using traditional method
Decision Rule:
◆ Using Confidence Interval
Rejection region or critical region is the set of all values of the
test statistic that lead to the rejection of H0.
Acceptance region is the set of all values of the test statistic
that lead the researcher to retain H0.
[Figure: two-tailed test, Ha: μ1 ≠ μ2, with a rejection region in each tail of the standard normal curve]
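The three decision approaches listed above lead to the same decision for the same data and α. The sketch below illustrates this for a hypothetical two-tailed z test for one mean with known σ; all numbers (μ0 = 50, σ = 8, n = 100, x̄ = 52, α = 0.05) are made up for illustration and do not come from this module.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical two-tailed z test: H0: mu = 50 vs Ha: mu != 50, sigma known
MU_0, SIGMA, N, X_BAR, ALPHA = 50, 8, 100, 52, 0.05

std_normal = NormalDist()
se = SIGMA / sqrt(N)                        # standard error of the mean
z = (X_BAR - MU_0) / se                     # test statistic
z_crit = std_normal.inv_cdf(1 - ALPHA / 2)  # alpha split between the two tails

# Traditional method: reject if z falls in either rejection region
reject_traditional = abs(z) > z_crit

# p-value approach: reject if the two-tailed p-value is below alpha
p_value = 2 * (1 - std_normal.cdf(abs(z)))
reject_p_value = p_value < ALPHA

# Confidence-interval approach: reject if mu_0 lies outside the (1 - alpha) interval
lower, upper = X_BAR - z_crit * se, X_BAR + z_crit * se
reject_ci = not (lower <= MU_0 <= upper)

print(f"z = {z:.2f}, critical values = ±{z_crit:.3f}, p = {p_value:.4f}")
print(f"{100 * (1 - ALPHA):.0f}% CI = ({lower:.3f}, {upper:.3f})")
print(reject_traditional, reject_p_value, reject_ci)
```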
STEP 1: Rearrange the data in ascending order.
STEP 2: Calculate SS, the sum of squared deviations of the data from their mean. Use the “=DEVSQ()” function in Excel.
STEP 3: Calculate b as follows:
$$
b = \sum_{i=1}^{m} a_i \left( x_{n+1-i} - x_i \right)
$$
where n is the number of observations and
$$
m = \begin{cases} \dfrac{n}{2} & \text{if } n \text{ is even} \\ \dfrac{n-1}{2} & \text{if } n \text{ is odd} \end{cases}
$$
Since n is even in this example, m = 8. That’s why we used a1 to a8.
Note that if n is odd, the median data value is not used in the calculation of b.
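STEP 1 to STEP 3 above (judging from the real-statistics.com reference, part of the Shapiro-Wilk W test for normality) can be sketched in Python. Two parts are assumptions beyond the text: the final statistic W = b²/SS, which is the standard definition of the Shapiro-Wilk statistic but is not written out above, and the coefficients a_i, which must be looked up in a Shapiro-Wilk coefficient table for your n. The coefficient values in the usage line are placeholders, and the function name `shapiro_wilk_w` is hypothetical.

```python
from math import fsum

def shapiro_wilk_w(data, a):
    """Return (SS, b, W) following STEP 1 to STEP 3.

    `a` holds the coefficients a_1..a_m for this n, taken from a
    Shapiro-Wilk coefficient table; they are not computed here.
    """
    x = sorted(data)                          # STEP 1: ascending order
    n = len(x)
    m = n // 2                                # n/2 if n is even, (n - 1)/2 if n is odd
    if len(a) != m:
        raise ValueError(f"expected {m} coefficients for n = {n}, got {len(a)}")
    mean = fsum(x) / n
    ss = fsum((xi - mean) ** 2 for xi in x)   # STEP 2: same quantity as Excel's DEVSQ
    # STEP 3: x_(n+1-i) in 1-based notation is x[n - i] with 0-based indexing.
    # For odd n the middle (median) value is never used.
    b = fsum(a[i - 1] * (x[n - i] - x[i - 1]) for i in range(1, m + 1))
    w = b ** 2 / ss                           # assumed final step: W = b^2 / SS
    return ss, b, w

# Placeholder coefficients for illustration only; use your table's a_i for n = 4
ss, b, w = shapiro_wilk_w([3, 1, 4, 2], a=[0.6872, 0.1677])
print(f"SS = {ss}, b = {b:.4f}, W = {w:.4f}")
```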
Result
Inferential Statistics
1. Parametric Tests
◆ Assume underlying statistical distributions in the data. Therefore, several conditions of validity must be met so that the result of a parametric test is reliable.
◆ Apply to data on a ratio scale, and some apply to data on an interval scale.
2. Non-parametric Tests
◆ Refer to statistical methods in which the data are not required to fit a normal distribution.
◆ Most non-parametric tests apply to data on an ordinal scale, and some apply to data on a nominal scale.
Inference About Two Means
To perform inference on the difference of two
population means, we must first determine whether the
data come from an independent or dependent
sample.
Example:
Determine whether the sample is independent or dependent.
1. An urban economist believes that commute times to
work in the South are less than commute times to work
in the Midwest. He randomly selects 40 employed
individuals in the South and 45 employed individuals in
the Midwest and determines their commute times.
Answer: Independent
2. In an experiment conducted in biology class, Prof.
Rhea measured the time required for 12 students to
catch a falling meter stick using their dominant hand
and nondominant hand. The goal of the study was to
determine whether the reaction time in an individual’s
dominant hand is different from the reaction time in
the nondominant hand. Answer: Dependent
Example:
Determine whether the sample is independent or
dependent.
3. A researcher wants to know if the mean
length of stay in for-profit hospitals is different
from the mean length of stay in not-for-profit
hospitals. He randomly selected 20 individuals in
the for-profit hospital and matched them with 20
individuals in the not-for-profit hospital by diagnosis.
Answer:
Dependent
Dependent Sample t - Test
The dependent sample t-test (also
called the paired t-test or paired-
samples t-test) compares the means of two
related groups to determine whether there
is a statistically
significant difference between these
means.
H0 : μ1 ≥ μ2 and Ha : μ1 < μ2
H0 : μ1 ≤ μ2 and Ha : μ1 > μ2
H0 : μ1 = μ2 and Ha : μ1 ≠ μ2
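The paired t-test works on the difference within each pair. A minimal sketch of the test statistic t = d̄ / (s_d / √n), with n − 1 degrees of freedom, is below. The function name `paired_t` and the scores are hypothetical, and the p-value is left to a t table or software, since Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic for the mean of the differences d = after - before."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    d = [a - b for b, a in zip(before, after)]
    n = len(d)
    # stdev uses the sample (n - 1) divisor, as the paired t-test requires
    t = mean(d) / (stdev(d) / sqrt(n))
    return t, n - 1

# Hypothetical before/after scores for four subjects (not data from this module)
t, df = paired_t([10, 12, 9, 11], [12, 13, 11, 12])
print(f"t = {t:.4f} with df = {df}")
```

Compare t with the critical value from a t table with n − 1 degrees of freedom, choosing the tail that matches Ha.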
Assumptions
1. Your dependent variable should be measured
at the interval or ratio level (i.e., they are
continuous).
2. Your independent variable should consist of two
categorical, “related groups” or “matched pairs”.
3. There should be no significant outliers in the
differences between the two related groups.
4. The distribution of the differences in the
dependent variable between the two related
groups should be approximately normally
distributed.
Example:
A teacher is interested in knowing whether the new learning program
will help increase the number of correctly remembered words.
Ten subjects learn a list of 50 words. Learning performance is
measured using a recall test. After the first test, all subjects
are instructed how to use the learning program and then learn a
second list of 50 words. Learning performance is again measured
with the recall test. In the following table, the numbers of
correctly remembered words are listed for both tests.
1. State the Null and Alternative Hypothesis
Null hypothesis: H0: μ1 ≥ μ2
Alternative hypothesis: Ha: μ1 < μ2
Dependent Variable:
Number of correct remembered words
Independent Variable:
Treatment (Before and After)
Reject Ho
6. Draw Conclusion
There is sufficient evidence to support that the
new learning program helps increase the number
of correctly remembered words.
Result
Example:
Researchers wanted to know whether there was a difference in
comprehension among students learning a computer program
based on the style of the text. They randomly divided 18
students into two groups of 9 each. The researchers verified
that the 18 students were similar in terms of educational level,
age, and so on. Group 1 individuals learned the software using
a visual manual (multimodal instruction), while Group 2
individuals learned the software using a textual manual
(unimodal instruction). The following data represent the
scores the students received on an exam given to them after
they studied from the manuals.
Dependent Variable:
Scores
Independent Variable:
Style of the Text (Visual and Textual)
Failed to Reject Ho
Since we failed to reject Ho, we will proceed to the t-Test: Two-Sample Assuming Equal Variances.
Failed to Reject Ho
6. Draw Conclusion
There is not enough evidence to support that
there is a difference in comprehension among
students learning a computer program based
on the style of the text.
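When the two variances can be treated as equal, the t-Test: Two-Sample Assuming Equal Variances pools the two sample variances. The sketch below computes that pooled t statistic and its n1 + n2 − 2 degrees of freedom from first principles, as a cross-check on the software output. The function name `pooled_t` and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group1, group2):
    """Two-sample t statistic assuming equal population variances."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: each sample variance weighted by its degrees of freedom
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / se
    return t, n1 + n2 - 2

# Hypothetical exam scores for two small groups (not the data from this example)
t, df = pooled_t([1, 2, 3], [4, 5, 6])
print(f"t = {t:.4f} with df = {df}")
```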
Proper Presentation of Results
Assumptions
1. Your dependent variable should be measured at the
interval or ratio level (i.e., they are continuous).
2. Your independent variable should consist of two or more
categorical, independent groups.
3. You should have independence of observations,
which means that there is no relationship between the
observations in each group or between the groups
themselves.
4. There should be no significant outliers.
5. Your dependent variable should be approximately
normally distributed for each category of the independent
variable.
6. There needs to be homogeneity of variances.
Example:
Researchers wanted to compare math test scores of
students at the end of secondary school from various
cities. Eight randomly selected students from Makati,
Manila, and Quezon City each were administered the
same exam; the results are presented in the following
table. Can the researchers conclude
that the distribution of
exam scores is different
for each city at the
level of significance?
Dependent Variable:
Mathematics Scores
Independent Variable:
Cities (Makati, Manila, Quezon City)
Since we are comparing the means of the dependent
variable across two or more categorical groups of
one independent variable, we will use
the one-way ANOVA.
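As a cross-check on software output, the one-way ANOVA F statistic can be computed by hand: F is the between-groups mean square divided by the within-groups mean square, with k − 1 and N − k degrees of freedom. This is a minimal sketch; the function name `one_way_anova_f` and the three groups are hypothetical.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-groups sum of squares: spread of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three cities (not the data from this example)
f, df_b, df_w = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"F = {f:.2f} with df = ({df_b}, {df_w})")
```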
Click “Data”, then click “Data Analysis”
Failed to Reject Ho: Equal Variances Assumed
Failed to Reject Ho: Equal Variances Assumed
Failed to Reject Ho: Equal Variances Assumed
Reject Ho
6. Draw Conclusion
There is enough evidence to support that the
distribution of exam scores of students in
mathematics is different for each city.
Result
Features of r
• Unit free
• Ranges between −1 and 1
• The closer to −1, the stronger the negative linear relationship.
• The closer to 1, the stronger the positive linear relationship.
• The closer to 0, the weaker the linear relationship.
[Figure: scatterplots of Y against X for r = −1, r = −0.6, r = 0, r = 0.6 and r = 1]
Reminders:
• Correlation does not imply causation.
• Watch out for hidden (lurking) variables.
Lurking Variable
• A variable that is not included as an explanatory
or response variable in the analysis but can
affect the interpretation of relationships between
variables.
• Can falsely identify a strong relationship
between variables or it can hide the true
relationship.
Assumptions
1. Your two variables should be measured at the
interval or ratio level (i.e., they are
continuous).
2. There is a linear relationship between your
two variables.
3. There should be no significant outliers.
4. Your variables should be approximately
normally distributed.
Test Statistic:
$$
t = r\sqrt{\frac{df}{1 - r^{2}}}
$$
where:
df = degrees of freedom
r = correlation coefficient
Note:
df = n − 2
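The test statistic can be computed together with r itself. The sketch below assumes r is the Pearson sample correlation coefficient and uses made-up data; `pearson_r_and_t` is a hypothetical name. The formula divides by zero when r = ±1, so that case is rejected explicitly.

```python
from math import sqrt

def pearson_r_and_t(x, y):
    """Pearson r and the test statistic t = r * sqrt(df / (1 - r^2)), df = n - 2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    if abs(r) == 1:
        raise ValueError("t is undefined for a perfect correlation (r = ±1)")
    df = n - 2
    t = r * sqrt(df / (1 - r ** 2))
    return r, t, df

# Hypothetical paired measurements (not the calcium data from this module)
r, t, df = pearson_r_and_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.4f}, t = {t:.4f}, df = {df}")
```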
Example:
A dietetics student wanted to look at the
relationship between calcium intake and
knowledge about calcium in sports
science students. Table shows the data
she collected. Is there a relationship
between calcium intake and knowledge
about calcium in sports science
students?
Dependent Variable:
Calcium Intake
Independent Variable:
Knowledge about Calcium
df = n − 2
Result
Reject Ho
6. Draw Conclusion
There is sufficient evidence to conclude that there
is a significant relationship between calcium
intake and knowledge about calcium in sports
science students.
Proper Presentation of Results
Exercises:
Apply the procedure in testing the hypothesis.
Result
◆ Used to discover whether there is an association between two categorical variables.
◆ Used when you want to decide whether two variables are independent or dependent.
◆ A contingency table will be constructed.
Assumptions
1. There are 2 variables, and both are measured as
categories, usually at the nominal level.
2. The two variables should consist of two or more
categorical, independent groups.
3. The data in the cells should be frequencies, or counts
of cases rather than percentages or some other
transformation of the data.
4. For a 2 by 2 table, all expected frequencies > 5.
5. For a larger table, all expected frequencies > 1
and no more than 20% of all cells may have expected
frequencies < 5.
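The expected frequencies named in assumptions 4 and 5 come from the contingency table's totals: each cell's expected count is (row total × column total) / grand total. The sketch below computes them together with the usual Pearson chi-square statistic Σ(O − E)²/E and its (rows − 1)(columns − 1) degrees of freedom. The function name and the 2 × 2 counts are hypothetical.

```python
def chi_square_independence(table):
    """Expected counts, chi-square statistic and df for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count for each cell = (row total) * (column total) / (grand total)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, chi2, df

# Hypothetical observed counts (not the gender/learning-medium data)
expected, chi2, df = chi_square_independence([[10, 20], [30, 40]])
print(expected)
print(f"chi-square = {chi2:.4f} with df = {df}")
```

For these counts every expected frequency is at least 12, so the expected-frequency assumption for a 2 by 2 table is met.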
Reminders:
The word contingency refers to
dependence, but this is only a
statistical dependence and cannot be
used to establish a direct cause-and-
effect link between the two variables
in question.
Example:
Educators are always looking for novel ways in
which to teach statistics to undergraduates as part
of a non-statistics degree course (e.g., psychology).
With current technology, it is possible to present
how-to guides for statistical programs online
instead of in a book. However, different people
learn in different ways. An educator would like to
know whether gender (male/female) is associated
with the preferred type of learning medium (online
vs. books). Use the “Data_Example and Exercises” file.
[Figure: contingency table of gender by preferred learning medium, with row totals, column totals and the grand total]
6. Draw Conclusion
There is sufficient evidence to conclude that
gender is associated with the preferred type of
learning medium.
Proper Presentation of Results
Result
ACTIVITIES/ASSESSMENTS:
Determine whether the sampling is dependent or independent.
1. A researcher wishes to compare academic
aptitudes of married mathematicians and their spouses. She
obtains a random sample of 287 such couples who take an
academic aptitude test and determines each spouse’s academic
aptitude.
2. A political scientist wants to know how a random
sample of 18- to 25-year-olds feel about Democrats and
Republicans in Congress. She obtains a random sample of
1030 registered voters 18 to 25 years of age and asks, “Do you
have a favorable/unfavorable opinion of the Democratic/Republican
party?” Each individual was asked to disclose his
or her opinion about each party.
Solve the following problems. Make sure to follow the 6-step
procedure.
1. A study is designed to test whether there is a difference in mean daily
calcium intake in adults with normal bone density, adults with
osteopenia (a low bone density which may lead to osteoporosis) and
adults with osteoporosis. Adults 60 years of age with normal bone
density, osteopenia and osteoporosis are selected at random from
hospital records and invited to participate in the study. Each
participant's daily calcium intake is measured based on reported food
intake and supplements. The data are shown below.
Normal Bone Density   Osteopenia   Osteoporosis
1200                  1000         890
1000                  1100         650
980                   700          1100
900                   800          900
Is there a statistically significant difference in mean calcium intake in patients with normal bone density as compared to patients with osteopenia and osteoporosis?
4. A pediatrician wants to determine the relation that may exist between a child’s height and head circumference. She randomly selects eleven 3-year-old children from her practice, measures their heights and head circumference, and obtains the data shown in the table below.
Height (inches)   Head Circumference (inches)
27.75             17.5
24.5              17.1
25.5              17.1
26                17.3
25                16.9
27.75             17.6
26.5              17.3
27                17.5
26.75             17.3
26.75             17.5
27.5              17.5
5. The following data represent the smoking status from a
random sample of 1054 U.S. residents 18 years or older by
level of education.
No. of Years of Education   Smoking Status: Current   Former   Never
Less than 12                178                       88       208
12                          137                       69       143
13 - 15                     44                        25       44
16 or more                  34                        33       51
References
https://wolfweb.unr.edu/homepage/ania/stat352f12lectures/352lecture21f12.pdf
Sullivan, M., III. Statistics: Informed Decisions Using Data (5th ed.).
http://www.real-statistics.com/tests-normality-and-symmetry/statistical-tests-normality-symmetry/shapiro-wilk-test/