Definition of Terms 1. Statistics

1. The document defines various statistical terms including descriptive statistics, inferential statistics, measures of central tendency (mean, median, mode), standard deviation, types of data (nominal, ordinal, interval, ratio), and sampling methods (simple random sampling, systematic random sampling, cluster sampling, stratified random sampling). 2. Examples are provided to demonstrate calculating the mean, median, and mode for both ungrouped and grouped data. 3. Formulas and step-by-step solutions are shown for problems involving descriptive statistics such as finding the mean, median, and mode.

Uploaded by

Jane Bermoy
Copyright © All Rights Reserved

Definition of Terms

1. Statistics

It is the branch of mathematics concerned with the techniques by which information is collected, organized, analysed, and interpreted.

2. Descriptive Statistics

It utilizes numerical and graphical methods to look for patterns in the data set.

3. Inferential Statistics

It is one of the two categories of statistics and is concerned with the treatment of data leading to predictions or inferences about a larger group of data. It draws conclusions, such as decisions, predictions, or generalizations, about that larger group from the data set.

4. Mean

It is the sum of all items in a set of data divided by the number of items. It is also known as the arithmetic average.

5. Median

The median, represented by Md, is the value of the middle term when the data are arranged in either ascending or descending order. Half of the terms are located above the median, while the other half are below it. It is affected by the number of items and not by the size of extreme values.

6. Mode

It is the most frequently occurring value in a given set of data. In a distribution, the element or measure that is repeated the most number of times is the mode. When the highest frequency corresponds to two elements or two measures, the distribution is said to be bimodal. When the distribution has more than two modes, it is said to be multimodal. It is also possible that a mode may not exist at all.

7. Standard Deviation

It is the measure of variation of a set of data in terms of the amounts by which the individual values differ from their mean. It is considered the most stable measure of spread, and is usually preferred in experimental and research studies where in-depth statistical analysis of data is involved. It is affected by the value of every item in the data.

8. Nominal data

It is a type of data that is used to label variables without providing any quantitative value.

9. Ordinal data

It is a categorical, statistical data type where the variables have natural, ordered categories and the distances between the categories are not known. It has a ranking.

10. Interval data

It is a type of data measured along a scale in which each point is placed at an equal distance (interval) from the next.

11. Ratio data

It is defined as a variable measurement scale that not only produces the order of variables but also makes the difference between variables known, along with information on the value of true zero.

12. Simple random sampling

It is a procedure where a sample is selected in such a way that every element is as likely to be selected as any other element from the population.

13. Systematic random sampling

It is a sampling procedure with a random start: the first element is chosen at random, and every kth element after it is then selected.

14. Cluster sampling

It is a probability sampling technique where researchers divide the population into multiple groups (clusters) for research.

15. Stratified random sampling

It is specifically used when the population can naturally be classified into groups or strata.

16. Independent Variable

It is a variable that is manipulated in an experiment in order to observe the effect on a dependent variable.

17. Dependent Variable

This is what you are measuring in the experiment and what is affected during the experiment. The dependent variable responds to the independent variable. It is called dependent because it "depends" on the independent variable.

18. Sample

It is any subset of elements drawn by some appropriate method from a defined population.

19. Null hypothesis

It is the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error. It is represented by H0.

20. Alternative Hypothesis

It is the hypothesis that there is a statistically significant relationship between two variables. It is denoted by the symbol Ha or H1.

SAMPLE PROBLEMS FOR DESCRIPTIVE STATISTICS

• MEAN OF UNGROUPED DATA

Formula: $\bar{x} = \dfrac{\sum x}{n}$

where: $\sum x$ = sum of the item values

n = number of items

Problem

Find the average sale of bananacue in a school canteen if the daily sales are
as follows:

Monday - Php 353.25
Tuesday - Php 220.75
Wednesday - Php 347.00
Thursday - Php 210.50
Friday - Php 193.50

Solution

$\bar{x} = \dfrac{\sum x}{n}$

$= \dfrac{353.25 + 220.75 + 347.00 + 210.50 + 193.50}{5}$

$= \dfrac{1325}{5}$

$\bar{x} = 265$, so the average daily sale is Php 265.00.
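
As a quick check of the arithmetic, the mean of ungrouped data can be sketched in a few lines of Python. The names `daily_sales` and `mean` are illustrative assumptions and not part of the original solution.

```python
# Daily bananacue sales in pesos, Monday to Friday (values from the problem).
daily_sales = [353.25, 220.75, 347.00, 210.50, 193.50]


def mean(values):
    """Arithmetic mean: the sum of the items divided by the number of items."""
    return sum(values) / len(values)


average_sale = mean(daily_sales)
print(f"Average daily sale: Php {average_sale:.2f}")  # Average daily sale: Php 265.00
```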

• MEAN OF GROUPED DATA using the long method


Formula: $\bar{x} = \dfrac{\sum fx}{n}$

where: f – frequency of the class interval

x – midpoint of the class interval (presumed to be the mean of the values grouped under this interval)

n – total number of items (the sum of the frequencies)

Problem

Calculate the mean grade of 50 students in Statistics.

Class Interval    f        x (midpoint)    fx
90-94             7        92              644
85-89             13       87              1131
80-84             16       82              1312
75-79             8        77              616
70-74             6        72              432

                  n = 50                   ∑fx = 4135

Solution:

$\bar{x} = \dfrac{\sum fx}{n} = \dfrac{4135}{50}$

$\bar{x} = 82.7$ or 83, the mean grade
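
The long method can also be sketched in Python: take the midpoint of each class, weight it by the class frequency, and divide by the total frequency. The tuple layout and the function name `grouped_mean` are assumptions made for this illustration.

```python
# (lower limit, upper limit, frequency) for each class interval in the table.
grade_classes = [(90, 94, 7), (85, 89, 13), (80, 84, 16), (75, 79, 8), (70, 74, 6)]


def grouped_mean(classes):
    """Mean of grouped data: sum of f * midpoint divided by the total frequency."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (low + high) / 2 for low, high, f in classes)
    return sum_fx / n


print(round(grouped_mean(grade_classes), 1))  # 82.7
```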

• MEDIAN FOR UNGROUPED DATA

In computing the median, it is important to remember the following:

1. Arrange the data in an array in descending or ascending order.

2. Take note of the item or items in the middle position. If there is an odd number of items, the middle item is the median. If there is an even number of items, the median is taken as the arithmetic mean of the two values falling in the middle.

Problems

A. The numbers of books borrowed from the library during each day of the week
were 36, 31, 24, 45, and 50. What is the median?
Solution:
Arrange the numbers as 24, 31, 36, 45, and 50. Since there are 5
items, the middle item is 36. Thus, the median is 36.

B. The numbers of books borrowed from the library during another week from
Monday to Saturday were 36, 31, 24, 25, 50, and 47. What is the median?
Solution:
Arrange the numbers as 24, 25, 31, 36, 47, and 50. In this case, there are two middle numbers: 31 and 36. The median is the average of the middle numbers, that is,

$Md = \dfrac{31 + 36}{2} = 33.5$
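
The two rules for odd and even numbers of items can be written as a short Python function. This is a minimal sketch; the function name `median` is chosen only for illustration.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([36, 31, 24, 45, 50]))      # 36
print(median([36, 31, 24, 25, 50, 47]))  # 33.5
```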

• MEDIAN FOR GROUPED DATA


Formula:

$Md = L + \left[ \dfrac{\frac{n}{2} - F}{f} \right] i$

where: L = exact lower limit of the median class

n = total number of items

F = "less than or equal to" cumulative frequency of the class preceding the class interval containing the median

f = frequency of the median class

i = size of the class interval

Problem

Scores    Frequency (f)    Exact Lower Limit or     Cumulative        Cumulative
                           Lower Boundary (L)       Frequency (F)     Percent
95-99     5                94.5                     100               100.0
90-94     11               89.5                     95                95.0
85-89     17               84.5                     84                84.0
80-84     25               79.5                     67                67.0
75-79     20               74.5                     42                42.0
70-74     12               69.5                     22                22.0
65-69     7                64.5                     10                10.0
60-64     3                59.5                     3                 3.0

i = 5     n = 100

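Solution: n = 100; $\frac{n}{2}$ = 50; L = 79.5; F = 42; f = 25; i = 5

The median class is 80-84 because it is the first class whose cumulative frequency (67) reaches 50.

$Md = 79.5 + \left[ \dfrac{50 - 42}{25} \right] 5$

$= 79.5 + 1.6$

$Md = 81.1$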
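
The same steps can be sketched in Python: walk up the classes from the lowest, keep a running cumulative frequency, and interpolate inside the first class whose cumulative frequency reaches n/2. The data layout and the name `grouped_median` are assumptions for this illustration.

```python
# (exact lower limit, frequency) for each class, listed from lowest to highest.
score_classes = [(59.5, 3), (64.5, 7), (69.5, 12), (74.5, 20),
                 (79.5, 25), (84.5, 17), (89.5, 11), (94.5, 5)]
class_size = 5


def grouped_median(classes, i):
    """Md = L + ((n/2 - F) / f) * i, locating the median class by cumulative frequency."""
    n = sum(f for _, f in classes)
    cumulative = 0  # F: cumulative frequency of the classes below the current one
    for lower_limit, f in classes:
        if cumulative + f >= n / 2:
            return lower_limit + ((n / 2 - cumulative) / f) * i
        cumulative += f
    raise ValueError("the distribution has no items")


print(f"{grouped_median(score_classes, class_size):.1f}")  # 81.1
```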

• MODE FOR UNGROUPED DATA

For ungrouped data, finding the mode requires no calculation, just counting, and it can be determined for qualitative as well as quantitative data.

Example:

A. The sizes of 15 classes selected at random are:
40, 39, 42, 48, 45, 46, 42, 49, 43, 42, 41, 44, 38, 42, and 47
The mode is 42 because it is the measure that occurs the most number of times.

B. The sizes of 15 families in a barangay chosen at random are:
8, 7, 4, 6, 12, 6, 7, 6, 8, 10, 7, 8, 5, 3, 4
The modes are 6, 7, and 8, each of which occurs three times. The distribution is multimodal.
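
Counting occurrences is easy to automate. The sketch below uses `collections.Counter` from the Python standard library; the function name `modes` and the convention of returning an empty list when no value repeats are assumptions made for this example.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency (empty if nothing repeats)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so no mode exists
    return sorted(v for v, c in counts.items() if c == highest)


class_sizes = [40, 39, 42, 48, 45, 46, 42, 49, 43, 42, 41, 44, 38, 42, 47]
family_sizes = [8, 7, 4, 6, 12, 6, 7, 6, 8, 10, 7, 8, 5, 3, 4]

print(modes(class_sizes))   # [42]
print(modes(family_sizes))  # [6, 7, 8]
```

Because the function only counts, it works for qualitative data such as a list of colors as well.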

• MODE FOR GROUPED DATA

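In a grouped distribution, the class interval with the highest frequency is the modal class. The midpoint of the modal class gives a rough estimate of the mode; the formula below interpolates within the modal class.

Formula:

$Mo = L_{mo} + \left[ \dfrac{d_1}{d_1 + d_2} \right] i$

where: $L_{mo}$ = exact lower limit of the modal class

$d_1$ = difference between the frequency of the modal class and the frequency of the next lower class

$d_2$ = difference between the frequency of the modal class and the frequency of the next higher class

i = size of the class interval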

Example:

Consider the distribution of the weekly wages of the factory workers in KRNRD
Garments Factory. Where is the highest frequency in the distribution located? What
is the modal class in the distribution?
Weekly Wages (in Php)    No. of Workers
1,380 - 1,399            4
1,360 - 1,379            6
1,340 - 1,359            12
1,320 - 1,339            31   (modal class)
1,300 - 1,319            24
1,280 - 1,299            15
1,260 - 1,279            11
1,240 - 1,259            8

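Substituting the values

$Mo = L_{mo} + \left[ \dfrac{d_1}{d_1 + d_2} \right] i$

$= 1{,}319.5 + \left[ \dfrac{31 - 24}{(31 - 24) + (31 - 12)} \right] 20$

$= 1{,}319.5 + \left[ \dfrac{7}{7 + 19} \right] 20$

$= 1{,}319.5 + \left[ \dfrac{7}{26} \right] 20$

$= 1{,}319.5 + \dfrac{140}{26}$

$= 1{,}319.5 + 5.38$

$Mo = 1{,}324.88$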
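
The substitution can be checked with a small Python sketch. It assumes, as the worked values suggest, that d1 compares the modal class with the next lower class (24 workers) and d2 with the next higher class (12 workers); the data layout and the name `grouped_mode` are illustrative.

```python
# (exact lower limit, frequency) for each class, listed from lowest to highest.
wage_classes = [(1239.5, 8), (1259.5, 11), (1279.5, 15), (1299.5, 24),
                (1319.5, 31), (1339.5, 12), (1359.5, 6), (1379.5, 4)]
class_size = 20


def grouped_mode(classes, i):
    """Mo = Lmo + (d1 / (d1 + d2)) * i for the class with the highest frequency."""
    freqs = [f for _, f in classes]
    k = freqs.index(max(freqs))                            # position of the modal class
    below = freqs[k - 1] if k > 0 else 0                   # next lower class
    above = freqs[k + 1] if k < len(freqs) - 1 else 0      # next higher class
    d1 = freqs[k] - below
    d2 = freqs[k] - above
    return classes[k][0] + d1 / (d1 + d2) * i


print(round(grouped_mode(wage_classes, class_size), 2))  # 1324.88
```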

• STANDARD DEVIATION for Ungrouped Data

To find the standard deviation of ungrouped data, use the formula:

$s = \sqrt{\dfrac{n \sum x^2 - (\sum x)^2}{n(n-1)}}$

where: s - standard deviation

$\sum x^2$ - sum of the squared values of x

n - number of items

$\sum x$ - summation of x

Example:

Calculate the standard deviation of the given scores in an Algebra quiz: 18,
20, 22, 15, 16, 12, 17, 21, 10, 19.

Step 1: Construct a table of values.

X          X²

18         324
20         400
22         484
15         225
16         256
12         144
17         289
21         441
10         100
19         361
∑X = 170   ∑X² = 3024

Step 2: Substitute in the formula

$s = \sqrt{\dfrac{n \sum x^2 - (\sum x)^2}{n(n-1)}}$

$= \sqrt{\dfrac{10(3024) - (170)^2}{10(10-1)}}$

$= \sqrt{\dfrac{30240 - 28900}{90}}$

$= \sqrt{\dfrac{1340}{90}}$

$= \sqrt{14.89}$

$s = 3.86$
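
The computational formula translates directly into Python. This is a minimal sketch; the name `sample_sd` is an assumption, and the function divides by n(n − 1) exactly as the formula above does.

```python
import math

quiz_scores = [18, 20, 22, 15, 16, 12, 17, 21, 10, 19]


def sample_sd(values):
    """s = sqrt((n * sum(x^2) - (sum x)^2) / (n * (n - 1)))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


print(round(sample_sd(quiz_scores), 2))  # 3.86
```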

• STANDARD DEVIATION of Grouped Data

To find the standard deviation of grouped data, use the formula:

$s = \sqrt{\dfrac{\sum f d^2}{n}}$

where: s - standard deviation

$\sum f d^2$ - sum of the products of each frequency and its squared deviation

n - number of items

Example

Using the data of The Arts and Craft Shop shown below, calculate the
standard deviation.

Amount in Pesos   f (days)   x (midpoint)   fx           d (deviation)   d²     fd²
172-180           3          176            528          25              625    1875
163-171           5          167            835          16              256    1280
154-162           9          158            1422         7               49     441
145-153           12         149            1788         -2              4      48
136-144           5          140            700          -11             121    605
127-135           4          131            524          -20             400    1600
118-126           2          122            244          -29             841    1682
                  n = 40                    ∑fx = 6041                          ∑fd² = 7531

Step 1: Prepare the frequency distribution with appropriate class intervals and write the corresponding frequency (f).

Step 2: Get the midpoint (x) of each class interval.

Step 3: Multiply the frequency (f) by the midpoint (x) of each interval to get fx.

Step 4: Add the fx of each interval to get ∑fx.

Step 5: Compute the mean ($\bar{x}$):

$\bar{x} = \dfrac{\sum fx}{n} = \dfrac{6041}{40} = 151.03$ or 151

Step 6: Calculate the deviation (d) by subtracting the mean ($\bar{x}$) from each midpoint (x). Thus, $d = x - \bar{x}$.

Step 7: Square the deviation (d) of each interval to get $d^2$.

Step 8: Multiply the frequency (f) by $d^2$. Find the sum of the products to get $\sum f d^2$.

Step 9: Calculate the standard deviation (s) using the formula.

$s = \sqrt{\dfrac{\sum f d^2}{n}}$

Substitute the values in the formula

$s = \sqrt{\dfrac{7531}{40}}$

$= \sqrt{188.275}$

$s = 13.72$
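
Steps 3 to 9 can be sketched in Python as follows. To reproduce the worked figures, the sketch rounds the mean to the nearest whole number before taking deviations, as Step 5 does; the data layout and the name `grouped_sd` are assumptions made for illustration.

```python
import math

# (midpoint x, frequency f) for each class interval in the table.
shop_classes = [(176, 3), (167, 5), (158, 9), (149, 12), (140, 5), (131, 4), (122, 2)]


def grouped_sd(classes):
    """s = sqrt(sum(f * d^2) / n), with d measured from the rounded mean."""
    n = sum(f for _, f in classes)
    mean = round(sum(f * x for x, f in classes) / n)  # 6041 / 40, rounded to 151
    sum_fd2 = sum(f * (x - mean) ** 2 for x, f in classes)
    return math.sqrt(sum_fd2 / n)


print(round(grouped_sd(shop_classes), 2))  # 13.72
```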

SAMPLE PROBLEMS FOR INFERENTIAL STATISTICS

• T-TEST (Pooled Estimate)

Problem: The data are collected to determine the difference between the
supplies of laundry detergent and fabric conditioner in a laundry shop during
the past 6 months.

Months      Laundry Detergent (X1)   X1²           Fabric Conditioner (X2)   X2²
January     40                       1600          30                        900
February    50                       2500          25                        625
March       35                       1225          30                        900
April       20                       400           15                        225
May         15                       225           20                        400
June        45                       2025          50                        2500

            ∑X1 = 205                ∑X1² = 7975   ∑X2 = 170                 ∑X2² = 5550

I. Statement of the Problem

Is there a difference between the supplies of laundry detergent and fabric conditioner in a laundry shop for the past 6 months?

Statement of Null Hypothesis

There is no difference between the supplies of laundry detergent and fabric conditioner in a laundry shop for the past 6 months.

II. Level of Significance

α = 5%

III. Critical Value

df = n1 + n2 − 2
df = 6 + 6 − 2
df = 10

Tabular t-value = 2.228


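IV. Computation

Formula:

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[ \dfrac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} \right] \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$

$S^2 = \dfrac{n \sum x^2 - (\sum x)^2}{n(n-1)}$

Solution for the variances ($S^2$, the squared standard deviations):

$S_1^2 = \dfrac{6(7975) - (205)^2}{6(6-1)} = \dfrac{47850 - 42025}{30} = \dfrac{5825}{30} = 194.17$

$S_2^2 = \dfrac{6(5550) - (170)^2}{6(6-1)} = \dfrac{33300 - 28900}{30} = \dfrac{4400}{30} = 146.67$

Solution:

The sample means are $\bar{x}_1 = 205/6 = 34.2$ and $\bar{x}_2 = 170/6 = 28.3$.

$t = \dfrac{34.2 - 28.3}{\sqrt{\left[ \dfrac{(5)194.17 + (5)146.67}{6 + 6 - 2} \right] \left( \dfrac{1}{6} + \dfrac{1}{6} \right)}}$

$= \dfrac{5.9}{\sqrt{\left[ \dfrac{970.85 + 733.35}{10} \right] (0.3333)}}$

$= \dfrac{5.9}{\sqrt{\dfrac{1704.2}{10} (0.3333)}} = \dfrac{5.9}{\sqrt{56.8067}} = \dfrac{5.9}{7.5370}$

$t = 0.7828$
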

V. Interpretation
Since the computed t-value of 0.7828 is less than the tabular t-value of 2.228 at the 5% level of significance with df = 10, the null hypothesis is accepted. Thus, there is no significant difference between the supplies of laundry detergent and fabric conditioner in a laundry shop for the past 6 months.
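
The pooled t-test can be sketched in Python from the raw monthly figures. The function and variable names are assumptions for this example. Because the sketch keeps the unrounded means (34.17 and 28.33), its t-value differs from the hand computation in the third decimal place, but the decision is the same.

```python
import math

detergent = [40, 50, 35, 20, 15, 45]
conditioner = [30, 25, 30, 15, 20, 50]


def sample_variance(values):
    """S^2 = (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(values)
    return (n * sum(x * x for x in values) - sum(values) ** 2) / (n * (n - 1))


def pooled_t(sample1, sample2):
    """t statistic for two independent samples using the pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sum(sample1) / n1, sum(sample2) / n2
    pooled = ((n1 - 1) * sample_variance(sample1)
              + (n2 - 1) * sample_variance(sample2)) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled * (1 / n1 + 1 / n2))


t_value = pooled_t(detergent, conditioner)
critical_t = 2.228  # tabular value for df = 10 at the 5% level, taken from the text

print(round(t_value, 3))          # 0.774
print(abs(t_value) < critical_t)  # True: the null hypothesis is not rejected
```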

• T-test (Paired Estimate)


Problem: Cybernetic Company, one of the leading computer companies in the Philippines, wants to know the difference between the number of computers it produced in 2019 and in 2020.

2019 (in thousands)   2020 (in thousands)   d        d²
20                    22                    -2       4
15                    13                    2        4
40                    35                    5        25
12                    15                    -3       9
10                    17                    -7       49
25                    15                    10       100
                                            ∑d = 5   ∑d² = 191

I. Statement of the Problem

Is there a difference between the number of computers produced by Cybernetic Company in 2019 and 2020?

Statement of Null Hypothesis

There is no difference between the number of computers produced by Cybernetic Company in 2019 and 2020.

II. Level of Significance

α = 5%

III. Critical Value

df = n − 1
df = 6 − 1
df = 5

Tabular t-value = 2.571

IV. Computation

Formula:

$\bar{d} = \dfrac{\sum d}{n}$

$S_d = \sqrt{\dfrac{n \sum d^2 - (\sum d)^2}{n(n-1)}}$

$t = \dfrac{\bar{d}}{S_d / \sqrt{n}}$

Solution:

$\bar{d} = \dfrac{5}{6} = 0.8333$

$S_d = \sqrt{\dfrac{6(191) - (5)^2}{6(6-1)}} = \sqrt{\dfrac{1121}{30}} = \sqrt{37.3667} = 6.1128$

$t = \dfrac{0.8333}{6.1128 / \sqrt{6}} = \dfrac{0.8333}{2.4956} = 0.3339$

V. Interpretation
Since the computed t-value of 0.3339 is less than the tabular t-value of 2.571 at the 5% level of significance with df = 5, the null hypothesis is accepted. Thus, there is no significant difference between the number of computers produced by Cybernetic Company in 2019 and 2020.
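
A paired t-test works on the differences between matched observations. The sketch below recomputes the statistic directly from the six pairs in the table, with n = 6 and ∑d = 5; the names `produced_2019`, `produced_2020`, and `paired_t` are assumptions for this illustration.

```python
import math

produced_2019 = [20, 15, 40, 12, 10, 25]  # in thousands
produced_2020 = [22, 13, 35, 15, 17, 15]  # in thousands


def paired_t(first, second):
    """t = mean(d) / (Sd / sqrt(n)), where d is the difference within each pair."""
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    mean_d = sum(d) / n
    sd = math.sqrt((n * sum(x * x for x in d) - sum(d) ** 2) / (n * (n - 1)))
    return mean_d / (sd / math.sqrt(n))


t_value = paired_t(produced_2019, produced_2020)
critical_t = 2.571  # tabular value for df = 5 at the 5% level, taken from the text

print(round(t_value, 4))
print(abs(t_value) < critical_t)  # True: the null hypothesis is not rejected
```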

• F-test ANOVA
Problem: The data below are the 5-month sales of three branches of Oriental Milk Tea in Oriental Mindoro.

Month      OMT 1 (X1)   X1²     OMT 2 (X2)   X2²      OMT 3 (X3)   X3²
January    30           900     40           1600     20           400
February   40           1600    60           3600     45           2025
March      35           1225    25           625      15           225
April      50           2500    45           2025     30           900
May        45           2025    55           3025     40           1600

t.j        200                  225                   150                   T = 575
m.j        5                    5                     5                     N = 15
∑X²                     8250                 10875                 5150     ∑∑x² = 24275

I. Statement of the Problem

Is there a difference in the sales of Oriental Milk Tea 1, Oriental Milk Tea 2, and Oriental Milk Tea 3 in Oriental Mindoro for the first 5 months of their operations?

Statement of Null Hypothesis

There is no difference in the sales of Oriental Milk Tea 1, Oriental Milk Tea 2, and Oriental Milk Tea 3 in Oriental Mindoro for the first 5 months of their operations.

II. Level of Significance

α = 5%

III. Critical Value

df = (k − 1, N − k) = (3 − 1, 15 − 3) = (2, 12)

Tabular F-value = 3.88

IV. Computation

Formula:

$SST = \sum \sum x^2 - \dfrac{T^2}{N}$

$SS_{tr} = \sum \left[ \dfrac{(t_{.j})^2}{m_{.j}} \right] - \dfrac{T^2}{N}$

$SSE = SST - SS_{tr}$

$MS_{tr} = \dfrac{SS_{tr}}{k - 1}$

$MSE = \dfrac{SSE}{N - k}$

$F = \dfrac{MS_{tr}}{MSE}$

Solution:

$SST = 24275 - \dfrac{(575)^2}{15} = 24275 - 22041.7 = 2233.3$

$SS_{tr} = \left[ \dfrac{(200)^2 + (225)^2 + (150)^2}{5} \right] - \dfrac{330625}{15} = 22625 - 22041.7 = 583.3$

$SSE = 2233.3 - 583.3 = 1650$

ANOVA TABLE

Source of Variation   Sum of Squares (SS)   Degrees of Freedom (DF)   Mean Square (MS)       F-value
Treatment             583.3                 (k − 1) = 2               583.3 / 2 = 291.65     291.65 / 137.5 = 2.1211
Error                 1650                  (N − k) = 12              1650 / 12 = 137.5
Total                 2233.3                (N − 1) = 14

V. Interpretation
Since the computed F-value of 2.1211 is less than the tabular F-value of 3.88 at the 5% level of significance with df = (2, 12), the null hypothesis is accepted. Thus, there is no significant difference in the sales of the three branches of Oriental Milk Tea in Oriental Mindoro for the first 5 months of their operations.
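
The sums of squares and the F ratio can be sketched in Python from the monthly sales. The dictionary layout and the function name `one_way_anova_f` are assumptions for this example; since no intermediate value is rounded, the result agrees with the table to about three decimal places.

```python
branch_sales = {
    "OMT 1": [30, 40, 35, 50, 45],
    "OMT 2": [40, 60, 25, 45, 55],
    "OMT 3": [20, 45, 15, 30, 40],
}


def one_way_anova_f(groups):
    """F = MStr / MSE, computed from the total and treatment sums of squares."""
    all_values = [x for group in groups for x in group]
    total_n, k = len(all_values), len(groups)
    correction = sum(all_values) ** 2 / total_n                       # T^2 / N
    sst = sum(x * x for x in all_values) - correction                 # total SS
    sstr = sum(sum(g) ** 2 / len(g) for g in groups) - correction     # treatment SS
    sse = sst - sstr                                                  # error SS
    return (sstr / (k - 1)) / (sse / (total_n - k))


f_value = one_way_anova_f(list(branch_sales.values()))
critical_f = 3.88  # tabular value for df = (2, 12) at the 5% level, taken from the text

print(round(f_value, 2))     # 2.12
print(f_value < critical_f)  # True: the null hypothesis is not rejected
```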

• CHI-SQUARE
Problem: A public opinion poll surveyed a simple random sample of 1000 voters. Respondents were classified by gender (male or female) and by voting preference (Republican, Democrat, or Independent). Results are shown in the table below.

                Voting Preferences
                Rep     Dem     Ind     Row total
Male            200     150     50      400
Female          250     300     50      600
Column total    450     450     100     1000

I. Statement of the Problem

Is there a difference in the opinion of 1000 voters when it comes to their gender and voting preference?

Statement of Null Hypothesis

There is no difference in the opinion of 1000 voters when it comes to their gender and voting preference.

II. Level of Significance

α = 5%

III. Critical Value

df = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2

Tabular χ² value = 5.991

IV. Computation

Formula:

$E = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$

where: O = the observed frequency
E = the expected frequency
∑ = the "sum of"

Observed Frequency
                Republican   Democrat   Independent   Row Total
Male            200          150        50            400
Female          250          300        50            600
Column Total    450          450        100           1000

Expected Frequency
                Republican   Democrat   Independent   Row Total
Male            180          180        40            400
Female          270          270        60            600
Column Total    450          450        100           1000

CHI-SQUARE
                Republican   Democrat   Independent   Row Total
Male            2.2222       5.0000     2.5000        9.7222
Female          1.4815       3.3333     1.6667        6.4815
Column Total    3.7037       8.3333     4.1667        16.2037

V. Interpretation

Since the computed chi-square (χ²) value of 16.2037 is greater than the tabular chi-square (χ²) value of 5.991 at the 5% level of significance with df = 2, the null hypothesis is rejected. Thus, there is a difference in the opinion of 1000 voters when it comes to their gender and voting preferences.

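
The expected frequencies and the chi-square statistic can be sketched in Python from the observed counts alone. The row order follows the problem table (male first, then female), and the function name `chi_square` is an assumption for this example.

```python
# Observed counts: rows are gender, columns are Republican, Democrat, Independent.
observed = [
    [200, 150, 50],   # male
    [250, 300, 50],   # female
]


def chi_square(table):
    """Sum of (O - E)^2 / E, where E = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    return statistic


chi2_value = chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi2_value, 4), df)  # 16.2037 2
```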
• Pearson’s R
Problem: Red Marasigan, an AB History student, wants to examine the relationship between the amount of time spent studying for an exam (X), in hours, and the score that a student makes on the exam (Y). Data are shown below.

X (no. of hours in studying)   Y (scores on exam)
3                              90
1                              80
2                              85
4                              95
1.5                            83
2.5                            87
4                              95
2                              85
5                              97
3                              90

I. Statement of the Problem

Is there a relationship between the number of hours in studying and the score that a student makes on an exam?

Statement of Null Hypothesis

There is no relationship between the number of hours in studying and the score that a student makes on an exam.

II. Level of Significance

α = 5%

III. Critical Value

df = n − 2
= 10 − 2
= 8

Tabular r value = 0.632

IV. Computation

Formula:

$r = \dfrac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{\left[ n \sum X^2 - (\sum X)^2 \right] \left[ n \sum Y^2 - (\sum Y)^2 \right]}}$

Solution:

X         Y          X²           Y²            XY
3         90         9            8100          270
1         80         1            6400          80
2         85         4            7225          170
4         95         16           9025          380
1.5       83         2.25         6889          124.5
2.5       87         6.25         7569          217.5
4         95         16           9025          380
2         85         4            7225          170
5         97         25           9409          485
3         90         9            8100          270

∑X = 28   ∑Y = 887   ∑X² = 92.5   ∑Y² = 78967   ∑XY = 2547

Substitute in the formula

$r = \dfrac{10(2547) - (28)(887)}{\sqrt{\left[ 10(92.5) - (28)^2 \right] \left[ 10(78967) - (887)^2 \right]}}$

$= \dfrac{25470 - 24836}{\sqrt{[925 - 784][789670 - 786769]}}$

$= \dfrac{634}{\sqrt{(141)(2901)}}$

$= \dfrac{634}{\sqrt{409041}}$

$= \dfrac{634}{639.5631}$

$r = 0.9913$

V. Interpretation

1. Since the computed r-value of 0.9913 is greater than the tabular r-value of 0.632 at the 5% level of significance with df = 8, the null hypothesis is rejected. Thus, there is a relationship between the number of hours in studying and the scores that a student makes on an exam.

2. The r-value of 0.9913 indicates a direct (positive) relationship, which means that as the number of hours in studying increases, the score that a student makes on an exam also increases, and as the number of hours in studying decreases, the score also decreases.

3. The r-value indicates a very high relationship between the number of hours in studying and the scores that a student makes on an exam.
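
Pearson's r can be sketched in Python from the five sums in the solution table. The scores below are the ones used in that table, and the function name `pearson_r` is an assumption for this example. A useful sanity check is that r must always fall between −1 and 1.

```python
import math

hours = [3, 1, 2, 4, 1.5, 2.5, 4, 2, 5, 3]
scores = [90, 80, 85, 95, 83, 87, 95, 85, 97, 90]  # Y values as in the solution table


def pearson_r(x, y):
    """r = (n*SXY - SX*SY) / sqrt((n*SXX - SX^2) * (n*SYY - SY^2))."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r_value = pearson_r(hours, scores)
print(round(r_value, 4))
print(-1 <= r_value <= 1)  # True for any valid correlation coefficient
```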

• Spearman’s Rho
Problem: Red Marasigan, an AB History student, wants to examine the relationship between the amount of time spent studying for an exam (X), in hours, and the score that a student makes on the exam (Y). Data are shown below.

X (no. of hours in studying)   Y (scores on exam)
3                              87
1                              81
2                              85
4                              93
1.5                            84
2.5                            88
4                              92
2                              86
5                              98
3                              89

I. Statement of the Problem

Is there a relationship between the number of hours in studying and the score that a student makes on an exam?

Statement of Null Hypothesis

There is no relationship between the number of hours in studying and the score that a student makes on an exam.

II. Level of Significance

α = 5%

III. Critical Value

df = n
= 10

Tabular ρ value = 0.564



IV. Computation

Formula:

$\rho = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$

Solution:

Ranks are assigned from highest (1) to lowest (10). Tied values share the average of the ranks they occupy, so the two X values of 4 both receive rank 2.5.

X      Rank   Y     Rank   d       d²
3      4.5    87    6      -1.5    2.25
1      10     81    10     0       0
2      7.5    85    8      -0.5    0.25
4      2.5    93    2      0.5     0.25
1.5    9      84    9      0       0
2.5    6      88    5      1       1
4      2.5    92    3      -0.5    0.25
2      7.5    86    7      0.5     0.25
5      1      98    1      0       0
3      4.5    89    4      0.5     0.25

∑d² = 4.5

Substitute:

$\rho = 1 - \dfrac{6(4.5)}{10(10^2 - 1)}$

$= 1 - \dfrac{27}{990} = 1 - 0.0273 = 0.9727$

V. Interpretation

1. Since the computed rho value of 0.9727 is greater than the tabular value of 0.564 at the 5% level of significance with df = 10, the null hypothesis is rejected. Thus, there is a relationship between the number of hours in studying and the scores that a student makes on an exam.

2. Since the computed rho value is 0.9727, we can say that there is a very high relationship between the number of hours in studying and the scores that a student makes on an exam.

3. Since the computed rho value is 0.9727, we can say that there is a positive relationship between the number of hours in studying and the scores that a student makes on an exam.
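
The ranking step is where hand computations most often go wrong, so it helps to sketch it in Python. The sketch assumes the standard Spearman formula ρ = 1 − 6∑d² / (n(n² − 1)) with tied values sharing their average rank, and it uses the X and Y values from the problem statement; the function names are illustrative.

```python
hours = [3, 1, 2, 4, 1.5, 2.5, 4, 2, 5, 3]
scores = [87, 81, 85, 93, 84, 88, 92, 86, 98, 89]


def average_ranks(values):
    """Rank from highest (1) to lowest; tied values share the average of their ranks."""
    ordered = sorted(values, reverse=True)
    # A value first found at 0-based position p and occurring c times
    # occupies ranks p+1 .. p+c, whose average is p + (c + 1) / 2.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d the difference in ranks."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


print(average_ranks(hours))  # [4.5, 10.0, 7.5, 2.5, 9.0, 6.0, 2.5, 7.5, 1.0, 4.5]
rho = spearman_rho(hours, scores)
print(round(rho, 4))
```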

• Regression Analysis

X1   X2   Y    X1²   X2²   Y²    X1X2   X1Y   X2Y

3    2    11   9     4     121   6      33    22
1    1    8    1     1     64    1      8     8
4    3    9    16    9     81    12     36    27
2    2    10   4     4     100   4      20    20
5    2    12   25    4     144   10     60    24
8 32 64
78 81
70 30
14 56
45 74

∑ X 1 ∑ X 2 ∑ Y = ∑ X 21 ∑❑
= = =
