MAT 361 Lecture 15 16
Professor
Dept. of Statistics, JnU, Dhaka
&
Adjunct Faculty
Dept. of Mathematics and Physics
North South University, Bangladesh
E-mail: atiqul.islam07@northsouth.edu
atique@stat.jnu.ac.bd
Lecture: 15 - 16 (Section: 6.1 – 6.2)
▷ Descriptive Statistics
What is Statistics?
▷ Statistics is the quantitative information of any inquiry. For example:
▻ the statistics of births and deaths
▻ the statistics of exports and imports
▻ the statistics of the evolution of human races
▻ the statistics of the products of all human activity in society
▻ the statistics of disease outbreaks, comorbidity, manpower, and
the profit and loss of different organizations.
▷ According to Lovitt (1929), Statistics is the science which deals
with collection, classification and tabulation of numerical facts
as the basis for explanation, description and comparison of
phenomenon.
Types of Statistics
▷ Descriptive Statistics: It deals with collection, tabulation,
presentation and analysis of data without considering the
theory of probability.
▷ Inferential Statistics: It is based on inductive logic.
Inferential statistics is concerned with making estimates,
predictions and generalizations, or reaching decisions about a
population based on sample observations.
▷ In short, it comprises the methods used to determine something
about a population on the basis of a sample.
Descriptive Statistics
▷ Collect data
▻ e.g., Survey
▷ Present data
▻ e.g., Tables and graphs
▷ Summarize data
▻ e.g., Sample mean $\bar{X} = \frac{\sum X}{n}$
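As a minimal sketch of the sample mean formula, the snippet below sums the observations and divides by the sample size. The five weights are hypothetical values chosen only for illustration; they are not data from this lecture.

```python
# Hypothetical sample of body weights in kg (illustrative values only).
weights_kg = [68, 72, 65, 70, 75]

n = len(weights_kg)                # sample size
sample_mean = sum(weights_kg) / n  # sum of X divided by n

print(f"n = {n}, sample mean = {sample_mean:.1f} kg")
```

In inferential statistics the same number would serve as a point estimate of the population mean weight.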
Inferential Statistics
▷ Estimation
▻ e.g., Estimate the population mean
weight using the sample mean weight.
▷ Hypothesis testing
▻ e.g., Test the claim that the population
mean weight is 70 kg.
Important Definitions
▷ Data: Data are the facts and figures collected, analyzed, and
summarized for presentation and interpretation.
▷ Any measurements of one or more characteristics recorded (as
a result of observation, interview and so on) from either
population or sample units are called data.
▷ Data are the raw, disorganized facts and figures collected from
any field of inquiry.
▷ Data can be numerical (e.g., age, temperature, GDP) or
non-numerical (e.g., having cancer, feelings, attitudes).
▷ The size of a data set is its number of observations.
Definitions Cont’d
Variables
▷ Qualitative: e.g., gender, color
▷ Quantitative: e.g., CGPA, family size
Scales of Measurements
▷ Nominal scale: The scale is nominal when the data for a variable
consist of labels or names used to identify an attribute of the element.
▻ The categories are in no logical order and have no particular
relationship.
▻ Variables such as name, ID, address and cell number are on this scale.
Arithmetic analysis is not possible.
▷ Ordinal scale: The scale is ordinal if the data exhibit the properties
of nominal data and, in addition, the order or rank of the data is
meaningful. One can count and order, but not measure, ordinal data.
▻ Examples are qualitative data such as test performance (excellent,
good, poor, etc.), quality of food (good or bad) and disease stages.
Ordering is possible, so some analysis is possible.
▷ Nominal and ordinal scales describe qualitative data.
Scales of Measurements
▷ Interval scale: The data show the properties of ordinal data, and the
interval between values is meaningful.
▷ An interval scale is a scale of measurement where the distance
between any two adjacent units of measurement is the same but the
zero point is arbitrary.
▻ Example - The Celsius scale is a clear example of the interval
scale of measurement: 0 degrees Celsius is an arbitrary zero point,
not the absence of temperature.
▷ Ratio scale: The data have the properties of interval data and, in
addition, the ratio of two data values is meaningful. The zero value on
this scale is an absolute zero.
▻ For example - height and weight of a person.
▷ Interval and ratio scales describe quantitative data.
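The difference between the interval and ratio scales can be illustrated with temperature. The sketch below assumes two hypothetical readings, 10 and 20 degrees Celsius, and converts them to Kelvin (a scale with an absolute zero, brought in here only for comparison) to show that differences survive the change of zero point while ratios do not.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to Kelvin (absolute zero is 0 K)."""
    return c + 273.15

# Hypothetical readings, for illustration only.
t1_c, t2_c = 10.0, 20.0
t1_k, t2_k = celsius_to_kelvin(t1_c), celsius_to_kelvin(t2_c)

# Differences agree on both scales: intervals are meaningful.
diff_c = t2_c - t1_c
diff_k = t2_k - t1_k

# Ratios disagree: "twice as hot" is not meaningful in Celsius.
ratio_c = t2_c / t1_c
ratio_k = t2_k / t1_k

print(f"difference: {diff_c:.1f} (Celsius) vs {diff_k:.1f} (Kelvin)")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```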
Variables and Measurement Scales
▷ State whether the following variables are qualitative (categorical) or
quantitative and indicate their measurement scale.
Summary of Raw data
▷ Raw data can be summarized in three forms:
▻ The tabular form (frequency distribution)
▻ The graphical form, and
▻ The numerical form: measures of central tendency, measures of
dispersion, and others
Summarizing Data for a Qualitative Variable
▷ Frequency distribution
▷ A frequency distribution is a tabular summary of data
showing the number (frequency) of observations in each of
several nonoverlapping categories or classes.
▷ It provides a summary of how the values of a variable are
distributed across the different categories.
▷ Frequency is denoted by $f_i$ $(i = 1, 2, 3, \ldots, k)$.
Solution
▷ Table 1 shows the test performance of $n = 15$ students selected in a sample.

Table 1: Test performance
Good        Good        Excellent
Excellent   Poor        Excellent
Poor        Excellent   Good
Excellent   Excellent   Poor
Poor        Good        Good

Solution: Tabular Summary
Performance   Frequency ($f_i$)   Relative (percent) Frequency   Angle of Pie Slice
Excellent     6                   6/15 = 0.40 (40%)              (6/15) × 360° = 144°
Good          5                   5/15 = 0.33 (33%)              (5/15) × 360° = 120°
Poor          4                   4/15 = 0.27 (27%)              (4/15) × 360° = 96°
Total         $n = 15$            100%                           360°
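The frequency and relative frequency columns of the tabular summary can be reproduced from the 15 responses of Table 1. This is a minimal sketch using Python's standard `collections.Counter`; the variable names are illustrative choices, not part of the lecture.

```python
from collections import Counter

# The 15 test performances of Table 1, read row by row.
performance = [
    "Good", "Good", "Excellent",
    "Excellent", "Poor", "Excellent",
    "Poor", "Excellent", "Good",
    "Excellent", "Excellent", "Poor",
    "Poor", "Good", "Good",
]

n = len(performance)
freq = Counter(performance)  # frequency f_i of each category

for category in ["Excellent", "Good", "Poor"]:
    f = freq[category]
    rel = f / n              # relative frequency
    print(f"{category:<10} {f:>2}  {rel:.2f} ({rel:.0%})")
```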
Graphical Summary using Excel
[Fig. 1: Bar diagram of test performance. Horizontal axis: Performances (Excellent, Good, Poor); vertical axis: No. of Students.]
[Fig. 2: Pie diagram of test performance. Slices: Excellent 40%, Good 33%, Poor 27%.]
Graphical Summary
Breakdown Cause   Frequency   Angle of Slice (Percent)
Electrical        9           70.43° (19.57%)
Mechanical        24          187.83° (52.17%)
Misuse            13          101.74° (28.26%)
Total             46          360° (100%)
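Each pie slice angle is the category's share of the total frequency multiplied by 360 degrees. The sketch below recomputes the angles and percentages for the breakdown causes in the table; the helper name `slice_angle` is a hypothetical choice for illustration.

```python
def slice_angle(frequency, total):
    """Angle in degrees of a pie slice: (f / total) * 360."""
    return frequency / total * 360

# Frequencies taken from the breakdown-cause table.
breakdowns = {"Electrical": 9, "Mechanical": 24, "Misuse": 13}
total = sum(breakdowns.values())

angles = {cause: slice_angle(f, total) for cause, f in breakdowns.items()}
percents = {cause: f / total * 100 for cause, f in breakdowns.items()}

for cause in breakdowns:
    print(f"{cause:<11} {angles[cause]:7.2f} degrees ({percents[cause]:.2f}%)")
```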
5 Steps in Construction of Frequency Distribution
1) Find the range ($R$) by subtracting the lowest value ($L$) from the highest
value ($H$) of the variable, i.e., $R = H - L$.
2) The number of classes ($k$) should be neither less than 5 nor more
than 20. The value of $k$ can be found by the formula
$k = 1 + 3.322 \log_{10} n$, where $n$ is the total number of observations
(Sturges' rule for the number of classes).
OR, use the $2^k$ rule to determine the number of classes, i.e., choose the
smallest $k$ such that $2^k \geq n$.
OR, take the number of classes as $k = \sqrt{n}$, the square root of the
number of observations.
Then the width (size) of a class is found by $c = \frac{R}{k}$.
3) Arrange the table with three columns having headings: Variable, Tally
Marks and Frequency. The first class interval will start with the lowest
value and continue until the interval with the highest value of the given
series of data is reached.
4) Read the items and give a tick mark or circle to each of the values and put a
tally mark against the appropriate class interval.
5) Count the number of tally marks corresponding to each class interval and
write the result in the respective frequency column.
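Steps 1 and 2 can be sketched numerically. The values below ($n = 40$, highest score 78, lowest score 20) are taken from the MAT 361 test-score example in this lecture; rounding the class width up to a convenient 10 is an assumption made here for illustration.

```python
import math

n = 40                    # number of observations (test-score example)
highest, lowest = 78, 20  # H and L in the test-score example

data_range = highest - lowest            # R = H - L

# Sturges' rule: k = 1 + 3.322 * log10(n)
k_sturges = 1 + 3.322 * math.log10(n)

# 2^k rule: smallest k with 2**k >= n
k_power = 1
while 2 ** k_power < n:
    k_power += 1

k = round(k_sturges)                     # use a whole number of classes
width = data_range / k                   # c = R / k
convenient_width = math.ceil(width)      # assumption: round up to a whole number

print(f"R = {data_range}, Sturges k = {k_sturges:.2f}, 2^k rule k = {k_power}")
print(f"c = {width:.2f}, rounded up to {convenient_width}")
```

Both rules suggest 6 classes here, and a width of 10 starting at 20 covers every score up to 78.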
Example
▷ Test scores obtained in Probability and Statistics course (MAT 361)
by 40 students are given below:
40 38 44 28 30 22 35 42 40 36
50 67 25 58 53 48 65 35 55 39
72 44 70 55 62 20 78 46 57 68
59 34 41 56 60 42 64 73 38 41
Solution
▷ Summary: Out of 40 students, 10 scored from 40 to below 50, 8 scored
from 50 to below 60, and so on.
▷ About 25% of students scored from 40 to below 50, 20% scored from 50
to below 60, and so on. 12 students scored below 40, and 4 students
scored 70 or above.
▷ Each class includes its lower limit and excludes its upper limit,
e.g., a score of 40 falls in the class 40 – 50.

Test Score   Tally Marks   Frequency                          Relative Frequency $rf_i$
                           ($f_i$, $i = 1, 2, \ldots, 6$)     (Percent Frequency $p \times rf_i$)
20 – 30      ||||          4                                  4 ÷ 40 = 0.10 (10%)
30 – 40      |||| |||      8                                  8 ÷ 40 = 0.20 (20%)
40 – 50      |||| ||||     10                                 10 ÷ 40 = 0.25 (25%)
50 – 60      |||| |||      8                                  8 ÷ 40 = 0.20 (20%)
60 – 70      |||| |        6                                  6 ÷ 40 = 0.15 (15%)
70 – 80      ||||          4                                  4 ÷ 40 = 0.10 (10%)
Total                      $n = 40$                           100%
Here $p = 100$.
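The tallying in steps 3 to 5 can be sketched in code for the 40 test scores. The sketch assumes six classes of width 10 starting at 20, with each class including its lower limit and excluding its upper limit, which is the convention that reproduces the frequencies in the table.

```python
# The 40 MAT 361 test scores from the example.
scores = [
    40, 38, 44, 28, 30, 22, 35, 42, 40, 36,
    50, 67, 25, 58, 53, 48, 65, 35, 55, 39,
    72, 44, 70, 55, 62, 20, 78, 46, 57, 68,
    59, 34, 41, 56, 60, 42, 64, 73, 38, 41,
]

low, width, k = 20, 10, 6  # first lower limit, class width, number of classes
n = len(scores)

freq = [0] * k
for x in scores:
    # Class index: lower limit included, upper limit excluded.
    freq[(x - low) // width] += 1

for i, f in enumerate(freq):
    lower = low + i * width
    rel = f / n            # relative frequency
    print(f"{lower} - {lower + width}: {f:>2}  {rel:.2f} ({rel:.0%})")
```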
Histogram
▷ A common graphical presentation of quantitative data is a
histogram.
▷ The variable of interest is placed on the horizontal axis ($X$)
and the frequency, relative frequency, or percent frequency is
placed on the vertical axis ($Y$).
▷ A rectangle is drawn above each class interval with its height
corresponding to the interval’s frequency, relative frequency,
or percent frequency.
▷ Unlike a bar graph, a histogram has no natural separation
between rectangles of adjacent classes.
Stem and Leaf Plot
▷ A stem-and-leaf plot shows both the rank order and shape of
the distribution of the data.
▷ It is similar to a histogram on its side, but it has the advantage
of showing the actual data values.
▷ The first digits of each data item are arranged to the left of a
vertical line.
▷ To the right of the vertical line we record the last digit for each
item in rank order.
▷ Each line in the display is referred to as a stem.
▷ Each digit on a stem is a leaf.
Stem and Leaf Plot: Test Score data
Example: Leaf Unit = 0.1
▷ If we have data with values such as
8.6 11.7 9.4 9.1 10.2 11.0 8.8
▷ A stem-and-leaf display of these data will be
Stem Leaf
8 6 8
9 1 4
10 2
11 0 7
Example: Leaf Unit = 10
▷ If we have data with values such as
1806 1717 1974 1791 1682 1910 1838
▷ A stem-and-leaf display of these data will be
Stem Leaf
16 8
17 1 9
18 0 3
19 1 7
Key: 16 | 8 represents the value of 1680
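The two leaf-unit examples can be reproduced with one small routine. This is a sketch under stated assumptions: the function names `stem_and_leaf` and `show` are hypothetical, values are assumed non-negative, and digits smaller than the leaf unit are dropped (truncated) rather than rounded, which is what the display of 1806 as 18 | 0 implies. `Decimal` is used so that values such as 8.6 divide by 0.1 exactly.

```python
from collections import defaultdict
from decimal import Decimal

def stem_and_leaf(values, leaf_unit=1):
    """Return {stem: [leaves]} with leaves in rank order.

    Each value is expressed in leaf units, digits below the leaf
    unit are truncated, and the last remaining digit is the leaf.
    """
    unit = Decimal(str(leaf_unit))
    plot = defaultdict(list)
    for x in sorted(values):
        scaled = int(Decimal(str(x)) / unit)  # e.g. 1806 -> 180 for unit 10
        stem, leaf = divmod(scaled, 10)
        plot[stem].append(leaf)
    return dict(plot)

def show(plot):
    """Print one line per stem, with a vertical bar before the leaves."""
    for stem in sorted(plot):
        print(f"{stem:>3} | {' '.join(str(leaf) for leaf in plot[stem])}")

show(stem_and_leaf([1806, 1717, 1974, 1791, 1682, 1910, 1838], leaf_unit=10))
show(stem_and_leaf([8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8], leaf_unit=0.1))
```

Calling the same function with `leaf_unit=1` on the 40 test scores would give stems 2 to 7, one per class of width 10. Stems with no leaves are omitted in this sketch.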
Example: C.W.
▷ The data in Table 2.5 represent the blood cholesterol levels of 40
first-year students at a particular college.
Thank You!