MMW Statistics in the Modern World
Data Management
4.1 Descriptive Statistics
Statistics is a branch of mathematics that deals
with data collection, organization, analysis,
interpretation and presentation.
Continuous data, for example:
• height of students in class
• weight of students in class
• time it takes to get to school
• distance traveled between classes
Types of Statistical Data
1. Numerical data. These data have meaning as a
measurement, such as a person’s height, weight, IQ,
or blood pressure, or the number of shares of stock a person owns.
2. Ordinal – data at this level can be ordered, but differences between the
data values are not meaningful (e.g., ten cities are ranked from one to ten, but
differences between the ranks don't make much sense; with letter grades, we can
order things so that A is higher than B, but without any other information).
3. Interval – deals with data that can be ordered, and in which differences
between the data values do make sense. However, data at this level have no true
zero as a starting point (e.g., the Fahrenheit and Celsius temperature scales).
4. Ratio – the highest level of measurement. Data possess all of the features of
the interval level, in addition to an absolute zero. Because of the absolute zero,
it now makes sense to compare the ratios of measurements.
4.2 Data Collection Method
Methods of Collecting Data
1. In-Person Interviews
Pros: In-depth, with a high degree of confidence in the data
Cons: Time consuming, expensive and can be dismissed as anecdotal
2. Mail Surveys
Pros: Can reach anyone and everyone – no barrier
Cons: Expensive, data collection errors, lag time
3. Phone Surveys
Pros: High degree of confidence in the data collected, can reach almost
anyone
Cons: Expensive, cannot self-administer, need to hire an agency
4. Web/Online Surveys
Pros: Cheap, can self-administer, very low probability of data errors
Cons: Not all your customers might have an email address/be on the
internet, customers may be wary of divulging information online
Three Ways of Presenting Data
Example 1
Jack joins football practice every Wednesday morning,
Sunday morning and afternoon.
The following are the 3rd year math grades of an applied math student:
1.6 1.2 1.9 1.5 1.5 1.5 1.0 1.3 1.0
Mean:
\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_9}{9} \]
Median:
1.0 1.0 1.2 1.3 1.5 1.5 1.5 1.6 1.9
Mode: 1.5
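The three measures for the grades above can be checked with a short Python sketch. It assumes Python's standard `statistics` module; the variable names are illustrative and not part of the original example.

```python
from statistics import mean, median, mode

# The nine 3rd year math grades from the example above.
grades = [1.6, 1.2, 1.9, 1.5, 1.5, 1.5, 1.0, 1.3, 1.0]

grade_mean = mean(grades)      # sum of the grades divided by 9
grade_median = median(grades)  # middle (5th) value of the sorted grades
grade_mode = mode(grades)      # most frequent grade

print(f"Mean:   {grade_mean:.2f}")
print(f"Median: {grade_median}")
print(f"Mode:   {grade_mode}")
```

The grades sum to 12.5, so the mean is 12.5 / 9, or about 1.39; the median and the mode are both 1.5.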
Example 2 (for grouped data)
\[ Md = LCB + \left( \frac{\frac{n}{2} - cf_p}{f_m} \right) i \]
\[ \text{Median} = 35.5 + \left( \frac{15 - 10}{10} \right)(8) = 39.5 \]
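The substitution above can be reproduced with a small helper. This is only a sketch: the function name and parameter names are hypothetical, and n = 30 is inferred from the value n/2 = 15 used in the substitution, since the frequency table itself is not shown here.

```python
def grouped_median(lcb, n, cf_p, f_m, i):
    """Median for grouped data: LCB + ((n/2 - cf_p) / f_m) * i."""
    return lcb + ((n / 2 - cf_p) / f_m) * i

# Values taken from the worked substitution; n = 30 is an inferred assumption.
md = grouped_median(lcb=35.5, n=30, cf_p=10, f_m=10, i=8)
print(md)
```

With these inputs the result matches the 39.5 obtained by hand.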
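The mode for grouped data is given by
\[ Mo = LCB + \left( \frac{f_m - f_1}{2f_m - f_1 - f_2} \right) i \]
where LCB is the lower boundary of the modal class
i is the size of the class interval
fm is the frequency of the modal class
f1 is the frequency of the class before the modal class
f2 is the frequency of the class after the modal class
\[ \text{Mode} = 35.5 + \left( \frac{10 - 7}{20 - 7 - 6} \right)(8) \approx 38.9 \]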
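The same computation can be sketched in Python. The function and parameter names are hypothetical; the inputs are the values substituted above, with 7 and 6 taken to be the frequencies of the classes before and after the modal class.

```python
def grouped_mode(lcb, f_m, f_1, f_2, i):
    """Mode for grouped data: LCB + ((f_m - f_1) / (2*f_m - f_1 - f_2)) * i."""
    return lcb + ((f_m - f_1) / (2 * f_m - f_1 - f_2)) * i

# Values taken from the worked substitution above.
mo = grouped_mode(lcb=35.5, f_m=10, f_1=7, f_2=6, i=8)
print(round(mo, 1))
```

The unrounded value is 35.5 + 24/7, which rounds to the 38.9 shown above.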
4.5 Measures of Variability
Variability for Ungrouped Data
• Range
R = HV − LV, where HV is the highest value and LV is the lowest value.
• Variance
It is defined as the average of the squared deviations from the mean.
It is the measure that considers the position of each observation
relative to the mean.
\[ s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1} \quad \text{or} \quad s^2 = \frac{n \sum x^2 - (\sum x)^2}{n(n - 1)} \]
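As a quick check that the two forms of the sample variance agree, here is a minimal sketch. Reusing the nine grades from the earlier mean/median/mode example is an assumption made for illustration, and the function names are hypothetical.

```python
from statistics import variance

# Illustrative data: the nine grades from the earlier example.
grades = [1.6, 1.2, 1.9, 1.5, 1.5, 1.5, 1.0, 1.3, 1.0]

def variance_definition(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def variance_shortcut(xs):
    """Computational form: (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))

s2_def = variance_definition(grades)
s2_short = variance_shortcut(grades)
s2_lib = variance(grades)  # the standard library also divides by n - 1

print(round(s2_def, 4), round(s2_short, 4), round(s2_lib, 4))
```

All three values agree up to floating-point rounding. The shortcut form is convenient by hand because it only needs the totals of x and x².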
• Standard Deviation (the most widely encountered) - It is
the measure of the spread or dispersion of scores from the
mean of distribution. It is the square root of the variance.
\[ s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}} \quad \text{or} \quad s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}} \]
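Since the standard deviation is the square root of the variance, it can be sketched in a few lines. The data (the nine grades from the earlier example) and the function name are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import stdev

# Illustrative data: the nine grades from the earlier example.
grades = [1.6, 1.2, 1.9, 1.5, 1.5, 1.5, 1.0, 1.3, 1.0]

def sample_std(xs):
    """Square root of the sample variance (computational form)."""
    n = len(xs)
    return sqrt((n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1)))

s = sample_std(grades)
print(round(s, 3))
print(round(stdev(grades), 3))  # standard-library result for comparison
```

Both lines print the same value, confirming that the hand formula and `statistics.stdev` use the same n − 1 divisor.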
4.6 Testing a Statistical Hypothesis
Hypothesis testing is the most significant area of statistical
inference. It is a step-by-step process in making inferences
(conclusions) about a population.
Types of Error
2. A study shows that the daily consumption depends on the age level of a person.
H0: The daily consumption does not depend on the age level of a person.
H1: The daily consumption depends on the age level of a person.
C. Correlation
To determine whether two variables (usually x and y) are
linearly related, correlation is the statistical method to be used.
In this method, the data collected on two numerical variables
are tested to determine the strength of their relationship
estimated by the sample correlation coefficient r given by
\[ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[ n(\sum x^2) - (\sum x)^2 \right] \left[ n(\sum y^2) - (\sum y)^2 \right]}} \]
where −1 ≤ r ≤ 1 and n = number of data pairs.
If the value of 𝑟 is close to positive 1, then there is a strong positive linear
relationship between the two variables. If 𝑟 is close to negative 1, there is a
strong negative linear relationship between them. However, if the two
variables have a weak or no linear relationship, 𝑟 is close to 0.
Example 3
1. A study is conducted to show how strong the relationship is between the sleeping habits of
employees and their level of performance at work.
H0: The sleeping habits of employees are not related to their level of performance at work.
H1: The sleeping habits of employees are related to their level of performance at work.
2. A student wants to know if his grade in Mathematics is associated with his grade in English.
Student   Hours of Study (x)   Grade (y)
A         7                    83
B         3                    63
C         2                    60
D         6                    88
E         3                    68
F         4                    75
Solution:
To solve for the correlation coefficient r, we must first find the values of
xy, x², and y².
Student   x    y     xy    x²    y²
A         7    83    581   49    6889
B         3    63    189   9     3969
C         2    60    120   4     3600
D         6    88    528   36    7744
E         3    68    204   9     4624
F         4    75    300   16    5625
Σx = 25, Σy = 437, Σxy = 1922, Σx² = 123, Σy² = 32451
Substituting the values to the formula,
\[ r = \frac{6(1922) - (25)(437)}{\sqrt{\left[ 6(123) - (25)^2 \right] \left[ 6(32451) - (437)^2 \right]}} \]
\[ r = 0.934 \]
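The substitution can be verified with a short Python sketch that builds the sums from the six data pairs and applies the formula for r. The function and variable names are illustrative only.

```python
from math import sqrt

# Hours of study (x) and grades (y) for students A to F.
hours = [7, 3, 2, 6, 3, 4]
scores = [83, 63, 60, 88, 68, 75]

def pearson_r(xs, ys):
    """Sample correlation coefficient from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(hours, scores)
print(round(r, 3))
```

Here the numerator is 607 and the denominator is the square root of 113 × 3737, which gives r ≈ 0.934, a strong positive linear relationship.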
Let us take the example from the correlation section, since a strong linear relationship exists
between the number of hours of study and the exam scores of the students.
Solution:
Since xy, x², and y² are necessary to solve for a and b, we must compute them first.
Student   Hours of Study (x)   Grade (y)   xy    x²    y²
A         7                    83          581   49    6889
B         3                    63          189   9     3969
C         2                    60          120   4     3600
D         6                    88          528   36    7744
E         3                    68          204   9     4624
F         4                    75          300   16    5625