Definition of Terms
1. Statistics
2. Descriptive Statistics
It utilizes numerical and graphical methods to look for patterns in a data set.
3. Inferential Statistics
4. Mean
It is the sum of all items in a set of data divided by the number of items. It
is also known as the arithmetic average.
5. Median
The median, represented by Md, is the value of the middle term when the
data are arranged in either ascending or descending order. Half of the terms are
located above the median, while the other half are below it. It is affected
by the number of items and not by the size of extreme values.
6. Mode
8. Nominal data
9. Ordinal data
It is a categorical, statistical data type where the variables have natural,
ordered categories and the distances between the categories are not known. It
has a ranking.
This is what you are measuring in the experiment and what is affected
during the experiment. The dependent variable responds to the independent
variable. It is called dependent because it "depends" on the independent
variable.
18. Sample
Formula: $\bar{x} = \frac{\sum x}{n}$
where: ∑x = sum of all items
n = number of items
Problem
Find the average sale of bananacue in a school canteen if the daily sales are
as follows:
Solution
$\bar{x} = \frac{\sum x}{n}$
$= \frac{353.25 + 220.75 + 347.00 + 210.50 + 193.50}{5}$
$= \frac{1325}{5}$
$\bar{x} = 265$
The average daily sale is Php 265.
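The mean formula can be checked with a short Python sketch. The function name `arithmetic_mean` is only an illustrative choice; the sales figures are the five daily amounts from the problem above.

```python
def arithmetic_mean(values):
    """Sum of all items divided by the number of items."""
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)

# Daily bananacue sales (in Php) from the problem above.
daily_sales = [353.25, 220.75, 347.00, 210.50, 193.50]
print(arithmetic_mean(daily_sales))  # 265.0
```

Python's standard `statistics.mean` gives the same result and can serve as a cross-check.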
Problem
n = 50, ∑fx = 4135
Solution:
$\bar{x} = \frac{\sum fx}{n} = \frac{4135}{50} = 82.7$
2. Take note of the items in the middle position. If there is an odd number of items,
the middle item is the median. If there is an even number of items, the median is
the arithmetic mean of the two values falling in the middle.
Problems
A. The numbers of books borrowed from the library during each day of the week
were 36, 31, 24, 45, and 50. What is the median?
Solution:
Arrange the numbers as 24, 31, 36, 45, and 50. Since there are 5
items, the middle item is 36. Thus, the median is 36.
B. The numbers of books borrowed from the library during another week from
Monday to Saturday were 36, 31, 24, 25, 50, and 47. What is the median?
Solution:
Arrange the numbers as 24, 25, 31, 36, 47, and 50. In this case, there
are two middle numbers: 31 and 36. The median is the average of the middle
numbers, that is,
$Md = \frac{31 + 36}{2} = 33.5$
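The two median rules (odd versus even number of items) can be written as a small Python function. This is a sketch; the name `median` is an illustrative choice, and the two data sets are the book counts from Problems A and B.

```python
def median(values):
    """Median of ungrouped data."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)            # arrange the items in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # odd number of items: the middle item
        return ordered[mid]
    # even number of items: mean of the two middle items
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([36, 31, 24, 45, 50]))      # 36
print(median([36, 31, 24, 25, 50, 47]))  # 33.5
```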
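$Md = L + \left[\frac{\frac{n}{2} - F}{f}\right] i$
where: L = lower class boundary of the class interval containing the median
F = "less than or equal to" cumulative frequency preceding the class interval
containing the median
f = frequency of the class interval containing the median
i = size of the class interval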
Problem
i = 5, n = 100
Solution: n = 100; n/2 = 50; L = 79.5; F = 42; f = 25; i = 5
$Md = 79.5 + \left[\frac{50 - 42}{25}\right] 5$
$= 79.5 + 1.6$
$Md = 81.1$
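Because the frequency table for this problem is not reproduced, the sketch below simply takes L, n, F, f and i as inputs, using the same symbols as the grouped-median formula. The function name `grouped_median` is hypothetical.

```python
def grouped_median(L, n, F, f, i):
    """Md = L + [(n/2 - F) / f] * i."""
    if f <= 0:
        raise ValueError("median class frequency must be positive")
    return L + ((n / 2 - F) / f) * i

md = grouped_median(L=79.5, n=100, F=42, f=25, i=5)
print(round(md, 2))  # 81.1
```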
Example:
In a grouped distribution, the class interval with the highest frequency is the
modal class. The midpoint of that class interval is taken as the mode.
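Formula:
$Mo = L_{mo} + \left[\frac{d_1}{d_1 + d_2}\right] i$
where: Lmo = lower class boundary of the modal class
d1 = frequency of the modal class minus the frequency of the class just below it
d2 = frequency of the modal class minus the frequency of the class just above it
i = size of the class interval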
Example:
Consider the distribution of the weekly wages of the factory workers in KRNRD
Garments Factory. Where is the highest frequency in the distribution located? What
is the modal class in the distribution?
Weekly Wages (in Php)    No. of Workers
1,380 – 1,399            4
1,360 – 1,379            6
1,340 – 1,359            12
1,320 – 1,339            31   (modal class)
1,300 – 1,319            24
1,280 – 1,299            15
1,260 – 1,279            11
1,240 – 1,259            8
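$Mo = L_{mo} + \left[\frac{d_1}{d_1 + d_2}\right] i$
$= 1{,}319.5 + \left[\frac{31 - 24}{(31 - 24) + (31 - 12)}\right] 20$
$= 1{,}319.5 + \left[\frac{7}{7 + 19}\right] 20$
$= 1{,}319.5 + \left[\frac{7}{26}\right] 20$
$= 1{,}319.5 + \frac{140}{26}$
$= 1{,}319.5 + 5.38$
$Mo = 1{,}324.88$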
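The modal-class procedure above can be sketched in Python. The sketch assumes whole-number class limits (so each class boundary is the lower limit minus 0.5 and the class size is upper minus lower plus 1), and it picks the first class with the highest frequency if there is a tie. The function name `grouped_mode` is hypothetical.

```python
def grouped_mode(classes):
    """Mo = Lmo + [d1 / (d1 + d2)] * i.

    classes: list of (lower_limit, upper_limit, frequency), ascending order.
    Assumes whole-number class limits.
    """
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                  # modal class: highest frequency
    lower, upper, f_mo = classes[k]
    f_below = freqs[k - 1] if k > 0 else 0       # class just below the modal class
    f_above = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1 = f_mo - f_below
    d2 = f_mo - f_above
    L_mo = lower - 0.5                           # lower class boundary
    i = upper - lower + 1                        # size of the class interval
    return L_mo + d1 / (d1 + d2) * i

# KRNRD Garments Factory weekly wages, listed from lowest to highest class.
wages = [
    (1240, 1259, 8), (1260, 1279, 11), (1280, 1299, 15), (1300, 1319, 24),
    (1320, 1339, 31), (1340, 1359, 12), (1360, 1379, 6), (1380, 1399, 4),
]
print(round(grouped_mode(wages), 2))  # 1324.88
```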
$s = \sqrt{\frac{n\sum x^2 - (\sum x)^2}{n(n-1)}}$
where: s = standard deviation
∑x = summation of x
∑x² = summation of the squares of x
n = number of items
Example:
Calculate the standard deviation of the given scores in an Algebra quiz: 18,
20, 22, 15, 16, 12, 17, 21, 10, 19.
X     X²
18    324
20    400
22    484
15    225
16    256
12    144
17    289
21    441
10    100
19    361
∑X = 170    ∑X² = 3024
$s = \sqrt{\frac{n\sum x^2 - (\sum x)^2}{n(n-1)}}$
$= \sqrt{\frac{10(3024) - (170)^2}{10(10-1)}}$
$= \sqrt{\frac{30240 - 28900}{90}}$
$= \sqrt{\frac{1340}{90}}$
$= \sqrt{14.89}$
$s = 3.86$
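The computing formula for s translates directly into Python. This is a sketch; `sample_std` is an illustrative name, and the scores are the ten Algebra quiz results above.

```python
import math

def sample_std(values):
    """s = sqrt((n*sum(x^2) - (sum x)^2) / (n*(n - 1)))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

scores = [18, 20, 22, 15, 16, 12, 17, 21, 10, 19]
print(round(sample_std(scores), 2))  # 3.86
```

The n(n − 1) denominator makes this the sample standard deviation, the same quantity Python's `statistics.stdev` returns.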
$s = \sqrt{\frac{\sum f d^2}{n}}$
Example
Using the data of The Arts and Craft Shop shown below, calculate the
standard deviation.
Amount       f        X            fx      d             d²     fd²
(in Pesos)   (days)   (midpoint)           (deviation)
172-180      3        176          528     25            625    1875
163-171      5        167          835     16            256    1280
154-162      9        158          1422    7             49     441
145-153      12       149          1788    -2            4      48
136-144      5        140          700     -11           121    605
127-135      4        131          524     -20           400    1600
118-126      2        122          244     -29           841    1682
             n = 40                ∑fx = 6041                   ∑fd² = 7531
Step 1: Prepare the frequency distribution with appropriate class intervals and write
the corresponding frequency (f).
Step 3: Multiply the frequency (f) by the midpoint (x) of each interval to get fx.
$\bar{x} = \frac{\sum fx}{n} = \frac{6041}{40} = 151.03 \text{ or } 151$
Step 6: Calculate the deviation (d) by subtracting the mean (x̄) from each midpoint
(x). Thus, d= x - x̄.
Step 8: Multiply the frequency (f) by d². Add the products to get ∑fd².
$s = \sqrt{\frac{\sum f d^2}{n}} = \sqrt{\frac{7531}{40}} = \sqrt{188.275}$
$s = 13.72$
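The grouped mean and standard deviation steps can be sketched in Python using the Arts and Craft Shop intervals. One assumption differs from the hand computation: the handout rounds the mean to 151 before taking deviations (giving ∑fd² = 7531), while this sketch keeps the unrounded mean 151.025, so its ∑fd² is about 7530.975; s still rounds to 13.72. The function name `grouped_mean_and_sd` is hypothetical.

```python
import math

def grouped_mean_and_sd(classes):
    """x̄ = Σfx / n and s = sqrt(Σf·d² / n), where d = x - x̄.

    classes: list of (lower_limit, upper_limit, frequency).
    """
    n = sum(f for _, _, f in classes)
    mids = [(lo + hi) / 2 for lo, hi, _ in classes]       # class midpoints x
    freqs = [f for _, _, f in classes]
    mean = sum(f * x for f, x in zip(freqs, mids)) / n     # Σfx / n
    sum_fd2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids))
    return mean, math.sqrt(sum_fd2 / n)

shop = [
    (172, 180, 3), (163, 171, 5), (154, 162, 9), (145, 153, 12),
    (136, 144, 5), (127, 135, 4), (118, 126, 2),
]
mean, s = grouped_mean_and_sd(shop)
print(round(mean, 3), round(s, 2))  # 151.025 13.72
```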
Problem: The data are collected to determine the difference between the
supplies of laundry detergent and fabric conditioner in a laundry shop during
the past 6 months.
Laundry X₁    X₁²    Fabric X₂    X₂²
Formula: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
$SD = \sqrt{\frac{n\sum x^2 - (\sum x)^2}{n(n-1)}}$
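$t = \frac{34.2 - 28.3}{\sqrt{\frac{(5)194.17 + (5)146.67}{6 + 6 - 2}\left(\frac{1}{6} + \frac{1}{6}\right)}}$
$= \frac{5.9}{\sqrt{\frac{970.85 + 733.35}{10}(0.333333333)}}$
$= \frac{5.9}{\sqrt{\frac{1704.2}{10}(0.333333333)}} = \frac{5.9}{\sqrt{56.8066666}} = \frac{5.9}{7.5370197}$
$t = 0.7828$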
V. Interpretation
Since the computed t-value of 0.7828 is less than the tabular t-value of
2.228 at the 5% level of significance with df = 10, the null hypothesis is
accepted; thus, there is no significant difference between the supplies of laundry
detergent and fabric conditioner in the laundry shop for the past 6 months.
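The pooled-variance t formula above can be sketched from summary statistics. The sketch reads 194.17 and 146.67 as the variances S₁² and S₂², since the formula squares S; the function name `pooled_t` is hypothetical, and the tabular value (2.228) still has to be looked up separately.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Independent-samples t with pooled variance (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df

t, df = pooled_t(34.2, 194.17, 6, 28.3, 146.67, 6)
print(round(t, 4), df)  # 0.7828 10
```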
IV. Computation
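Formula: $\bar{d} = \frac{\sum d}{n}$
$S_d = \sqrt{\frac{n\sum d^2 - (\sum d)^2}{n(n-1)}}$
$t = \frac{\bar{d}}{S_d / \sqrt{n}}$
Solution: $\bar{d} = \frac{6}{6} = 1$
$t = \frac{1}{5.5678 / \sqrt{6}} = \frac{1}{2.2730} = 0.4399$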
V. Interpretation
Since the computed t-value of 0.4399 is less than the tabular t-value of 2.571
at the 5% level of significance with df = 5, the null hypothesis is accepted; thus, there
is no significant difference between the number of computers produced by Cybernetic
Company in 2019 and 2020.
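The paired t computation can be sketched as follows. The raw differences for the Cybernetic Company problem are not shown, so the example call uses only the summary values from the solution (d̄ = 1, Sd = 5.5678, n = 6); `paired_t` works from a list of differences if one is available. All function names are hypothetical.

```python
import math

def std_dev(values):
    """Sd = sqrt((n*Σd² - (Σd)²) / (n*(n - 1)))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two differences")
    sum_v = sum(values)
    sum_v2 = sum(v * v for v in values)
    return math.sqrt((n * sum_v2 - sum_v ** 2) / (n * (n - 1)))

def paired_t(differences):
    """t = d̄ / (Sd / sqrt(n)), with df = n - 1."""
    n = len(differences)
    d_bar = sum(differences) / n
    return d_bar / (std_dev(differences) / math.sqrt(n)), n - 1

def paired_t_from_summary(d_bar, s_d, n):
    """Same statistic when only d̄, Sd and n are known."""
    return d_bar / (s_d / math.sqrt(n))

print(round(paired_t_from_summary(1, 5.5678, 6), 4))  # 0.4399
```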
F-test ANOVA
Problem: The data below are the 5-month sales of three branches of Oriental
Milk Tea in Oriental Mindoro.
IV. Computation
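Formula: $SST = \sum\sum x^2 - \frac{(\sum\sum x)^2}{N}$
$SS_{tr} = \sum \frac{T_{.j}^2}{n_{.j}} - \frac{(\sum\sum x)^2}{N}$
$SS_{tr} = \frac{(200)^2 + (225)^2 + (150)^2}{5} - \frac{330625}{15} = 22625 - 22041.7 = 583.3$
Source of Variation   SS       df             MS                    F
Treatment             583.3    (k-1) = 2      583.3 / 2 = 291.65    2.1211
Error                 1650     (N-k) = 12     1650 / 12 = 137.5
Total                 2233.3   (N-1) = 14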
V. Interpretation
Since the computed F-value of 2.1211 is less than the tabular F-value of
3.88 at the 5% level of significance with df = 2, 12, the null hypothesis is accepted;
thus, there is no significant difference in the sales of the three branches of Oriental
Milk Tea in Oriental Mindoro for the first 5 months of operation.
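The one-way ANOVA arithmetic can be sketched in Python. The monthly sales themselves are not reproduced, so the example call uses the branch totals (200, 225, 150, five months each) and the error sum of squares 1650 from the computation above; `one_way_anova` shows the full calculation for raw data. All names are hypothetical. Because the sketch does not round SStr to 583.3, it gives F ≈ 2.1212 rather than 2.1211.

```python
def ss_treatment(group_totals, group_sizes):
    """SStr = Σ(T.j² / n.j) - (ΣΣx)² / N."""
    N = sum(group_sizes)
    correction = sum(group_totals) ** 2 / N
    return sum(t * t / n for t, n in zip(group_totals, group_sizes)) - correction

def f_ratio(ss_tr, ss_error, k, N):
    ms_tr = ss_tr / (k - 1)          # mean square for treatments, df = k - 1
    ms_error = ss_error / (N - k)    # mean square for error, df = N - k
    return ms_tr / ms_error

def one_way_anova(groups):
    """Full computation from raw data: returns SST, SStr, SSE and F."""
    all_x = [x for g in groups for x in g]
    N, k = len(all_x), len(groups)
    sst = sum(x * x for x in all_x) - sum(all_x) ** 2 / N
    sstr = ss_treatment([sum(g) for g in groups], [len(g) for g in groups])
    sse = sst - sstr
    return sst, sstr, sse, f_ratio(sstr, sse, k, N)

ss_tr = ss_treatment([200, 225, 150], [5, 5, 5])
F = f_ratio(ss_tr, 1650, k=3, N=15)
print(round(ss_tr, 1), round(F, 4))  # 583.3 2.1212
```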
CHI-SQUARE
Problem: A public opinion poll surveyed a simple random sample of 1000 voters.
Respondents were classified by gender (male or female) and by voting preference
(Republican, Democrat, or Independent). Results are shown in the table below.
IV. Computation
Formula: $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where: O = observed frequency
E = expected frequency
∑ = "the sum of"
Observed Frequency
Republican Democrat Independent Row Total
Female 200 150 50 400
Male 250 300 50 600
Column Total 450 450 100 1000
Expected Frequency
Republican Democrat Independent Row Total
Female 180 180 40 400
Male 270 270 60 600
Column Total 450 450 100 1000
CHI-SQUARE
               Republican   Democrat   Independent   Row Total
Female         2.2222       5.0000     2.5000        9.7222
Male           1.4815       3.3333     1.6667        6.4815
Column Total   3.7037       8.3333     4.1667        16.2037
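The expected-frequency and chi-square steps can be sketched in Python with the observed counts from the voter poll. The function name `chi_square` is hypothetical; the degrees of freedom, (rows − 1)(columns − 1), are included because they are needed to look up the tabular value.

```python
def chi_square(observed):
    """Return (chi-square, expected table, df) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    expected = []
    for r, row in enumerate(observed):
        exp_row = []
        for c, o in enumerate(row):
            e = row_totals[r] * col_totals[c] / grand   # row total x column total / grand total
            exp_row.append(e)
            chi2 += (o - e) ** 2 / e                    # (O - E)^2 / E
        expected.append(exp_row)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, expected, df

observed = [
    [200, 150, 50],   # Female: Republican, Democrat, Independent
    [250, 300, 50],   # Male
]
chi2, expected, df = chi_square(observed)
print(expected)             # [[180.0, 180.0, 40.0], [270.0, 270.0, 60.0]]
print(round(chi2, 4), df)   # 16.2037 2
```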
V. Interpretation
3 90
1 80
2 85
4 93
1.5 83
2.5 87
4 95
2 85
5 97
3 90
IV. Computation
Formula:
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
Solution:
X     Y     X²     Y²     XY
3 90 9 8100 270
1 80 1 6400 80
2 85 4 7225 170
4 95 16 9025 380
1.5 83 2.25 6889 124.5
2.5 87 6.25 7569 217.5
4 95 16 9025 380
2 85 4 7225 170
5 97 25 9409 485
3 90 9 8100 270
$r = \frac{25470 - 24836}{\sqrt{[925 - 784][789670 - 786769]}}$
$= \frac{634}{\sqrt{(141)(2901)}}$
$= \frac{634}{\sqrt{409041}}$
$= \frac{634}{639.5631}$
$r = 0.9913$
V. Interpretation
1. Since the computed r-value of 0.9913 is greater than the tabular
r-value of .632 at the 5% level of significance with df = 8, the null
hypothesis is rejected; thus, there is a significant relationship
between the number of hours spent studying and the score that a
student makes on an exam.
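The Pearson formula can be sketched in Python. The X and Y values below are taken from the solution table (which lists 95 for the fourth score); `pearson_r` is an illustrative name. Since r always lies between −1 and 1, running the sums through code is a quick way to catch arithmetic slips such as a transposed ∑XY.

```python
import math

def pearson_r(xs, ys):
    """r = (nΣxy - ΣxΣy) / sqrt([nΣx² - (Σx)²][nΣy² - (Σy)²])."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

hours = [3, 1, 2, 4, 1.5, 2.5, 4, 2, 5, 3]
scores = [90, 80, 85, 95, 83, 87, 95, 85, 97, 90]
print(round(pearson_r(hours, scores), 4))  # 0.9913
```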
Spearman’s Rho
Problem: Red Marasigan, an AB History student, wants to examine the
relationship between the amount of time spent studying for an exam (X), in
hours, and the score that a student makes on the exam (Y). The data are shown
below.
3 87
1 81
2 85
4 93
1.5 84
2.5 88
4 92
2 86
5 98
3 89
Formula:
$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
Solution:
X      Rank   Y     Rank   d      d²
3      4.5    87    6      -1.5   2.25
1      10     81    10     0      0
2      7.5    85    8      -0.5   0.25
4      2.5    93    2      0.5    0.25
1.5    9      84    9      0      0
2.5    6      88    5      1      1
4      2.5    92    3      -0.5   0.25
2      7.5    86    7      0.5    0.25
5      1      98    1      0      0
3      4.5    89    4      0.5    0.25
                                  ∑d² = 4.5
Substitute:
$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
$= 1 - \frac{6(4.5)}{10(10^2 - 1)}$
$= 1 - \frac{27}{990} = 1 - 0.0273 = 0.9727$
V. Interpretations
1. Since the computed rho value of 0.9727 is greater than the
tabular value of .564 at the 5% level of significance, df = 10, the null
hypothesis is rejected; thus, there is a relationship
between the number of hours spent studying and the score that a
student makes on an exam.
2. Since the computed rho value is 0.9727, we can say that there
is a very strong relationship between the number of hours spent
studying and the score that a student makes on an exam.
3. Since the computed rho value is 0.9727, we can say that there
is a positive relationship between the number of hours spent
studying and the score that a student makes on an exam.
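The ranking and rho steps can be sketched in Python with the study-time data above. Rank 1 goes to the largest value and tied values share the average of their positions, as in the table. Note that the 6∑d² shortcut formula is exact only when there are no ties; with tied ranks (as in the hours column) computing Pearson's r on the ranks gives a slightly different value. Function names are hypothetical.

```python
def average_ranks(values):
    """Rank 1 = largest value; tied values share the average of their positions."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1      # first 1-based position of v
        ties = ordered.count(v)           # number of items sharing this value
        ranks.append(first + (ties - 1) / 2)
    return ranks

def spearman_rho(xs, ys):
    """rho = 1 - 6Σd² / (n(n² - 1)); also returns Σd²."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(xs)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2

hours = [3, 1, 2, 4, 1.5, 2.5, 4, 2, 5, 3]
scores = [87, 81, 85, 93, 84, 88, 92, 86, 98, 89]
rho, sum_d2 = spearman_rho(hours, scores)
print(sum_d2, round(rho, 4))  # 4.5 0.9727
```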
Regression Analysis
X1    X2    Y     X1²   X2²   Y²    X1X2   X1Y   X2Y
3 2 11 9 4 121 6 33 22
1 1 8 1 1 64 1 8 8
4 3 9 16 9 81 12 36 27
2 2 10 4 4 100 4 20 20
5 2 12 25 4 144 10 60 24
8 32 64
78 81
70 30
14 56
45 74
∑ X 1 ∑ X 2 ∑ Y = ∑ X 21 ∑❑
= = =
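The regression computation is cut off here, so the following is only a sketch under stated assumptions: it fits Y = b0 + b1·X1 + b2·X2 by least squares through the normal equations, using just the five complete rows of the table (the lines after them could not be recovered), and it solves the 3 × 3 system with Cramer's rule, which is a swapped-in choice rather than the handout's own method. From those rows, ∑X1 = 15, ∑X2 = 10, ∑Y = 50, ∑X1² = 55, ∑X2² = 22, ∑X1X2 = 33, ∑X1Y = 157 and ∑X2Y = 101. All function names are hypothetical.

```python
def solve_3x3(a, b):
    """Solve a 3x3 linear system a·x = b by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    if d == 0:
        raise ValueError("normal equations are singular")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]              # replace one column with the right-hand side
        solution.append(det(m) / d)
    return solution

def fit_two_predictors(x1, x2, y):
    """Least-squares b0, b1, b2 for Y = b0 + b1*X1 + b2*X2 (normal equations)."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    a = [[n, s1, s2],
         [s1, s11, s12],
         [s2, s12, s22]]
    return solve_3x3(a, [sy, s1y, s2y])

x1 = [3, 1, 4, 2, 5]
x2 = [2, 1, 3, 2, 2]
y = [11, 8, 9, 10, 12]
print(fit_two_predictors(x1, x2, y))  # [9.0, 1.0, -1.0]
```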