
MAT 361 Lecture 15 16

The document outlines a course on Probability and Statistics, focusing on descriptive and inferential statistics, including definitions of key concepts such as data, variables, populations, and samples. It explains the types of statistics, methods for summarizing data, and different measurement scales. Additionally, it provides examples of frequency distributions and graphical representations of data analysis.

Uploaded by

sabab.fahim
Copyright © All Rights Reserved

Course Title: Probability and Statistics

Course Code: MAT 361

Prof. Dr. Md. Atiqul Islam


M.Sc. (SUST, BD), M.Sc. (UHasselt, BE), Ph.D. (RuG, NL)

Professor
Dept. of Statistics, JnU, Dhaka
&
Adjunct Faculty
Dept. of Mathematics and Physics
North South University, Bangladesh
E-mail: atiqul.islam07@northsouth.edu
atique@stat.jnu.ac.bd
Lecture: 15 - 16 (Section: 6.1 – 6.2)

▷ Descriptive Statistics

What is Statistics?
▷ Statistics is the quantitative information obtained from any inquiry.
▻ For example, the statistics of birth and death,
▻ the statistics of export and import,
▻ the statistics of the evolution of human races,
▻ the statistics of the products of all human activity in society,
▻ the statistics of disease outbreaks, comorbidity, manpower, and the loss and profit of different organizations.
▷ According to Lovitt (1929), statistics is the science that deals with the collection, classification and tabulation of numerical facts as the basis for the explanation, description and comparison of phenomena.

Types of Statistics
▷ Descriptive Statistics: It deals with collection, tabulation,
presentation and analysis of data without considering the
theory of probability.
▷ Inferential Statistics: Inferential statistics is based on inductive logic. It is concerned with making estimates, predictions and generalizations, or reaching decisions about a population based on sample observations.
▷ In short, inferential statistics comprises the methods used to determine something about a population on the basis of a sample.

Descriptive Statistics

▷ Collect data
▻ e.g., Survey

▷ Present data
▻ e.g., Tables and graphs

▷ Summarize data
▻ e.g., Sample mean $= \frac{\sum X}{n}$
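A minimal Python sketch of the sample-mean formula, $\frac{\sum X}{n}$. As an illustration it reuses the 40 MAT 361 test scores from the frequency-distribution example later in this lecture; the function name `sample_mean` is our own, not part of the course material.

```python
# The 40 MAT 361 test scores from the frequency-distribution example.
scores = [
    40, 38, 44, 28, 30, 22, 35, 42, 40, 36,
    50, 67, 25, 58, 53, 48, 65, 35, 55, 39,
    72, 44, 70, 55, 62, 20, 78, 46, 57, 68,
    59, 34, 41, 56, 60, 42, 64, 73, 38, 41,
]

def sample_mean(x):
    """Sample mean: the sum of the observations divided by n."""
    return sum(x) / len(x)

print(sample_mean(scores))  # 48.25
```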

Inferential Statistics
▷ Estimation
▻ e.g., Estimate the population mean
weight using the sample mean weight.

▷ Hypothesis testing
▻ e.g., Test the claim that the population
mean weight is 70 kg.

Inference is the process of drawing conclusions or making decisions about a population based on sample results.

Important Definitions
▷ Data: Data are the facts and figures collected, analyzed, and
summarized for presentation and interpretation.
▷ Any measurement of one or more characteristics recorded (as a result of observation, interview and so on) from either population or sample units is called data.
▷ Data are the raw, disorganized facts and figures collected from any field of inquiry.
▷ Data can be numerical (e.g., age, temperature, GDP) or non-numerical (e.g., having cancer, feeling, attitude).
▷ Each recorded value is called an observation; the number of observations is the size of the data.

Definitions Cont’d

▷ Variable: A variable is a characteristic whose value can vary from person to person, object to object or phenomenon to phenomenon.
▷ A variable is a changeable characteristic of interest.
▷ A variable is denoted by $X$, $Y$, $Z$ or by its first letter (e.g., Score: $S$, Age: $A$).
▻ For example
1) Gender is a variable composed of two categories, male and female; it varies from person to person.
2) Age is a variable that may vary from person to person and may assume values such as 10 years, 15 years or 20 years.
3) The level of arsenic in water may vary between wells.
4) Grade, family size, exam score, etc.
Types of Variables based on Characteristics

Variables
▷ Qualitative (e.g., gender, color)
▻ Dichotomic: gender, smoking status
▻ Polynomic: brand of PC, hair color, disease grade
▷ Quantitative (e.g., CGPA, family size)
▻ Discrete: children in a family, strokes on a golf hole, hospital beds
▻ Continuous: income, exam score, age, height, weight

Definitions Cont’d

▷ Population: An aggregate of all individuals or items under investigation, defined by some common characteristics, is called a population.
▷ For example, if the objective is to estimate the average income in 2023 of female employees working in the garment industries of Bangladesh, all female employees in all garment industries of Bangladesh during a particular time period constitute the population.
▷ Target Population: The set of all elements of interest in a particular study, e.g., all female garment workers in Bangladesh.
▷ Study Population: The subset of the target population available for study, e.g., all female garment workers in Dhaka.

Definitions Cont’d

▷ Sample: A small but representative part of the population under investigation is called a sample.
▻ For example, a group of female employees selected from all garment industries of Bangladesh during a particular time period constitutes a sample.
▷ Random Sample: If each individual or item in the population from which a sample is drawn has an equal chance of being included in the sample, the sample is called a random sample.
▻ For example, if we select a sample of 20 students from 100 students completely at random, then each of the students has an equal chance of being included in the sample. Therefore, the sample of 20 students is a random sample.
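To make the idea of "equal chance" concrete, here is a minimal Python sketch of simple random sampling without replacement using the standard-library `random.sample`. The student IDs 1 to 100 are hypothetical labels, the helper `simple_random_sample` is an illustrative name, and the fixed seed (361) only makes the run reproducible. Repeating the draw many times should include each student in about 20/100 = 20% of the samples.

```python
import random

def simple_random_sample(population, size, rng=random):
    """Draw `size` items without replacement; each item has inclusion chance size/N."""
    return rng.sample(population, size)

students = list(range(1, 101))   # hypothetical student IDs 1..100
rng = random.Random(361)         # seeded only so the run is reproducible
sample = simple_random_sample(students, 20, rng)
print(sorted(sample))

# Empirical check of "equal chance": repeat the draw many times and record
# how often each student is included (expected 20 / 100 = 0.20).
reps = 20_000
hits = {s: 0 for s in students}
for _ in range(reps):
    for s in simple_random_sample(students, 20, rng):
        hits[s] += 1
inclusion = [hits[s] / reps for s in students]
print(min(inclusion), max(inclusion))
```

Every student's inclusion rate should land close to 0.20, which is what distinguishes a random sample from, say, taking the first 20 names on a list.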

Definitions Cont’d

▷ Population: A population consists of all possible observations available from a particular probability distribution.
▷ Sample: A sample is a particular subset of the population that an
experimenter measures and uses to investigate the unknown
probability distribution.
▷ Random Sample: A random sample is one in which the elements
of the sample are chosen at random from the population, and this
procedure is often used to ensure that the sample is
representative of the population.

Scales of Measurements
▷ Nominal scale: When the data for a variable consist of labels or names used to identify an attribute of the element, the variable is measured on a nominal scale.
▻ The categories are in no logical order and have no particular relationship.
▻ Variables such as name, ID, address and cell phone number are measured on this scale. Apart from counting, analysis is not possible.
▷ Ordinal scale: The data exhibit the properties of nominal data, and in addition the order or rank of the data is meaningful. One can count and order, but not measure, ordinal data.
▻ Examples are qualitative data such as test performance (excellent, good, poor, etc.), quality of food (good or bad) and disease stages. Ordering is possible, so some analysis is possible.
▷ Nominal and ordinal - Qualitative data.
Scales of Measurements
▷ Interval scale: The data show the properties of ordinal data, and the intervals between values are meaningful.
▷ An interval scale is a scale of measurement where the distance between any two adjacent units of measurement is the same, but the zero point is arbitrary.
▻ Example: The Celsius scale is a clear example of the interval scale of measurement. Its zero point (0 °C) is arbitrary and does not mean the absence of temperature, so temperature measured in degrees Celsius is interval data.
▷ Ratio scale: The data have the properties of interval data, and in addition the ratio of two data values is meaningful. The zero value on this scale is an absolute (true) zero.
▻ For example, the height and weight of a person.
▷ Interval and ratio - Quantitative data

Variables and Measurement Scales
▷ State whether the following variables are qualitative (categorical) or
quantitative and indicate their measurement scale.

Variables        Qualitative / Quantitative   Measurement Scale
Class Size       Quantitative                 Ratio
Blood Group      Qualitative                  Nominal
Income           Quantitative                 Ratio
Jersey Number    Qualitative                  Nominal
Temperature      Quantitative                 Interval
Blood Pressure   Quantitative                 Ratio
Voice Quality    Qualitative                  Ordinal

Summary of Raw Data
▷ Raw data can be summarized in
▻ the tabular form (frequency distribution),
▻ the graphical form, and
▻ the numerical form: measures of central tendency, measures of dispersion, and others.

Summarizing Data for a Qualitative Variable

▷ Frequency distribution
▷ A frequency distribution is a tabular summary of data
showing the number (frequency) of observations in each of
several nonoverlapping categories or classes.
▷ It provides a summary of how the values of a variable are
distributed across the different categories.
▷ Frequency is denoted by $f_i$ ($i = 1, 2, 3, \dots, k$).

Frequency Distribution Cont’d

▷ Let us consider the following data on the test performance of students in MAT 361.

Good       Good       Excellent
Excellent  Poor       Excellent
Poor       Excellent  Good
Excellent  Excellent  Poor
Poor       Good       Good

▷ Make a tabular and graphical summary of the above data.

Solution
▷ Table 1 shows the test performance of $n = 15$ students selected in a sample.

Table 1: Test performance
Good       Good       Excellent
Excellent  Poor       Excellent
Poor       Excellent  Good
Excellent  Excellent  Poor
Poor       Good       Good

▷ These are qualitative data with 3 categories.
▷ To develop a frequency distribution for these data, we count the number of times each test performance appears and put tally marks in a column of a table.
▷ To indicate the assignment of an observation to a particular class, a tally mark (|) is used.
▷ Later, the numerical value (frequency) is recorded based on the tally marks.

Solution: Tabular Summary
Good Good Excellent
Excellent Poor Excellent
Poor Excellent Good
Excellent Excellent Poor
Poor Good Good

Test Performance   Tally Marks   Frequency ($f_i$, $i = 1, 2, 3$)   Relative Frequency ($rf_i$)   Percent Frequency ($p \times rf_i$, $p = 100$)
Excellent          ||||/ |       6                                   6/15 = 0.40                   40%
Good               ||||/         5                                   5/15 = 0.33                   33%
Poor               ||||          4                                   4/15 = 0.27                   27%
Total                            $n = 15$                                                          100%

▷ Summary: Out of 15 students, 6 performed excellently, 5 performed well and 4 performed poorly. Equivalently, 40% of the students showed excellent performance, 33% good performance and 27% poor performance.
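The counting-and-tallying step can be sketched in Python with `collections.Counter`. This is an illustrative sketch, assuming the responses are stored as strings exactly as in Table 1; `frequency_distribution` is a hypothetical helper name, not a library function.

```python
from collections import Counter

# Test performance of the n = 15 students, read row by row from Table 1.
performance = [
    "Good", "Good", "Excellent",
    "Excellent", "Poor", "Excellent",
    "Poor", "Excellent", "Good",
    "Excellent", "Excellent", "Poor",
    "Poor", "Good", "Good",
]

def frequency_distribution(values, categories):
    """Return (category, f_i, rf_i, percent) for each category of a qualitative variable."""
    n = len(values)
    counts = Counter(values)  # the "tally": number of times each category appears
    rows = []
    for category in categories:
        f = counts[category]
        rf = f / n
        rows.append((category, f, rf, 100 * rf))
    return rows

table = frequency_distribution(performance, ["Excellent", "Good", "Poor"])
for category, f, rf, pct in table:
    print(f"{category:<10} f = {f}  rf = {rf:.2f}  {pct:.0f}%")
```

Passing the category order explicitly keeps the table in the lecture's order (Excellent, Good, Poor) rather than the order in which categories first appear in the data.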
Graphical Summary: Pie Diagram
▷ We need to know the angle of each portion (slice) of the pie diagram.
▷ As a circle consists of 360°, the whole quantity is equated to 360°, so the angle of the $i$-th portion is $\frac{f_i}{n} \times 360^\circ$.

Test Performance   Frequency ($f_i$, $i = 1, 2, 3$)   Relative Frequency ($rf_i$)   Percent   Angle of the Portion
Excellent          6                                   6/15 = 0.40                   40%       (6/15) × 360° = 144°
Good               5                                   5/15 = 0.33                   33%       (5/15) × 360° = 120°
Poor               4                                   4/15 = 0.27                   27%       (4/15) × 360° = 96°
Total              $n = 15$                                                          100%      360°
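The angle rule for a pie diagram, angle $= \frac{f_i}{n} \times 360^\circ$, can be checked with a short Python sketch. The helper name `pie_angles` is illustrative; the second call uses the breakdown-cause frequencies (9, 24, 13) from the graphical summary later in this lecture.

```python
def pie_angles(freqs):
    """Map each category to its slice angle in degrees: (f_i / n) * 360."""
    n = sum(freqs.values())
    # Multiply before dividing so that small integer cases come out exact.
    return {category: f * 360 / n for category, f in freqs.items()}

performance = {"Excellent": 6, "Good": 5, "Poor": 4}
angles = pie_angles(performance)
print(angles)

breakdown = {"Electrical": 9, "Mechanical": 24, "Misuse": 13}
breakdown_angles = pie_angles(breakdown)
print({cause: round(angle, 2) for cause, angle in breakdown_angles.items()})
```

Because every slice is a fixed share of 360°, the angles always add up to a full circle, which is a quick check on hand calculations.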

Graphical Summary using Excel

(Fig. 1: Bar diagram of test performance, with Performances (Excellent, Good, Poor) on the horizontal axis and No. of Students on the vertical axis. Fig. 2: Pie diagram of test performance: Excellent 40%, Good 33%, Poor 27%.)

▷ Data Summary: The figures show that 40% of the students performed excellently, 33% performed well and 27% performed poorly.

Graphical Summary
Breakdown Cause   Frequency   Slice Angle (Percent)
Electrical        9           70.43° (19.57%)
Mechanical        24          187.83° (52.17%)
Misuse            13          101.74° (28.26%)
Total             46          360°

(Figure 3: Bar diagram. Figure 4: Pie diagram.)


Summarizing Data for a Quantitative Variable

▷ A frequency distribution is a tabular summary of data showing the number (frequency) of observations in each of several nonoverlapping categories or classes.
▷ This definition holds for quantitative as well as categorical data.
▷ Frequency is denoted by $f_i$ ($i = 1, 2, 3, \dots, k$).
▷ However, with quantitative data we must be more careful in defining the nonoverlapping classes to be used in the frequency distribution.

Five Steps in the Construction of a Frequency Distribution

1) Find the range ($R$) by subtracting the lowest value ($L$) from the highest value ($H$) of a variable, i.e., $R = H - L$.
2) The number of classes ($k$) should be no fewer than 5 and no more than 20. The value of $k$ can be found by a formula:
$k = 1 + 3.322 \log_{10} n$, where $n$ is the total number of observations (Sturges' rule for the number of classes).
OR, using the $2^k$ rule, choose the smallest $k$ such that $2^k \ge n$.
OR, take the number of classes as $k = \sqrt{\text{No. of observations}} = \sqrt{n}$.
Then, the width (size) of a class is found by $c = \frac{R}{k}$.
3) Arrange the table with three columns having the headings Variable, Tally Marks and Frequency. The first class interval starts with the lowest value, and the intervals continue until the interval containing the highest value of the given series of data is reached.
4) Read the items one by one, tick or circle each value as it is read, and put a tally mark against the appropriate class interval.
5) Count the number of tally marks corresponding to each class interval and write the result in the respective frequency column.
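The class-count rules in step 2 and the class-width formula can be sketched in Python as follows. The function names are hypothetical; following the worked example, $k$ from Sturges' rule is rounded to the nearest whole number, and the width $c = R/k$ is rounded up (an assumption, made so that the classes reach the highest value).

```python
import math

def sturges(n):
    """Sturges' rule: k = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

def two_power_rule(n):
    """2^k rule: the smallest k such that 2**k >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

def square_root_rule(n):
    """Square-root rule: k = sqrt(n)."""
    return math.sqrt(n)

def class_width(lowest, highest, k):
    """c = R / k with R = H - L, rounded up to a whole number (assumption)."""
    return math.ceil((highest - lowest) / k)

n, low, high = 40, 20, 78          # MAT 361 test-score example
k = round(sturges(n))              # 6.32 -> 6, as in the worked solution
print(round(sturges(n), 2), two_power_rule(n), round(square_root_rule(n), 2))
print(k, class_width(low, high, k))
```

All three rules agree on $k = 6$ for $n = 40$ here; for other sample sizes they can differ by a class or two, which is why the lecture presents them as alternatives.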
Example
▷ Test scores obtained in Probability and Statistics course (MAT 361)
by 40 students are given below:
40 38 44 28 30 22 35 42 40 36

50 67 25 58 53 48 65 35 55 39

72 44 70 55 62 20 78 46 57 68

59 34 41 56 60 42 64 73 38 41

a) Make a tabular and graphical summary of the above data.


b) What percentage of the scores are less than 40?
c) What proportion of the students have scores greater than or equal to 60?
d) What percentage of the scores are between 40 and 59 inclusive?
e) How many of the scores are greater than 50?
f) What proportion of the scores are either less than 30 or greater than 70?
Solution
40 38 44 28 30 22 35 42 40 36
50 67 25 58 53 48 65 35 55 39
72 44 70 55 62 20 78 46 57 68
59 34 41 56 60 42 64 73 38 41
▻ Here, the total number of observations is $n = 40$.
▷ The number of classes
▻ Lowest value ($L$) = 20
▻ Highest value ($H$) = 78
▻ Therefore, $R = H - L = 78 - 20 = 58$.
▻ Option 1 (Sturges' rule): The number of classes is
$k = 1 + 3.322 \log_{10} n = 1 + 3.322 \log_{10} 40 = 6.32 \approx 6$.
▻ Option 2 ($2^k$ rule): $2^k \ge n \Rightarrow 2^k \ge 40$.
If $k = 1$, $2^1 = 2$; if $k = 2$, $2^2 = 4$; if $k = 3$, $2^3 = 8$; if $k = 4$, $2^4 = 16$; if $k = 5$, $2^5 = 32$; if $k = 6$, $2^6 = 64$, i.e., $2^6 > 40$. So, $k = 6$.
▻ Option 3 (square-root rule): $k = \sqrt{40} = 6.32 \approx 6$.
▷ Hence, the class width is
$c = \frac{\text{Range}}{k} = \frac{58}{6} = 9.67 \approx 10$.
Solution
40 38 44 28 30 22 35 42 40 36
50 67 25 58 53 48 65 35 55 39
72 44 70 55 62 20 78 46 57 68
59 34 41 56 60 42 64 73 38 41
Test Score    Test Score    Tally         Frequency   Relative Frequency   Percent Frequency
(Inclusive)   (Exclusive)   Marks         ($f_i$)     ($rf_i$)             ($p \times rf_i$, $p = 100$)
20 – 29       20 – 30       ||||          4           4 ÷ 40 = 0.10        10%
30 – 39       30 – 40       ||||/ |||     8           8 ÷ 40 = 0.20        20%
40 – 49       40 – 50       ||||/ ||||/   10          10 ÷ 40 = 0.25       25%
50 – 59       50 – 60       ||||/ |||     8           8 ÷ 40 = 0.20        20%
60 – 69       60 – 70       ||||/ |       6           6 ÷ 40 = 0.15        15%
70 – 79       70 – 80       ||||          4           4 ÷ 40 = 0.10        10%
Total                                     $n = 40$                         100%

(In the exclusive form, a class such as 20 – 30 includes 20 but not 30.)
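The tallying in the table above can be reproduced with a small Python sketch that counts scores in exclusive classes $[\text{lower}, \text{lower} + c)$, starting at $L = 20$ with $c = 10$ and $k = 6$ as in the worked solution. The helper `grouped_frequency` is an illustrative name, not a library function.

```python
scores = [
    40, 38, 44, 28, 30, 22, 35, 42, 40, 36,
    50, 67, 25, 58, 53, 48, 65, 35, 55, 39,
    72, 44, 70, 55, 62, 20, 78, 46, 57, 68,
    59, 34, 41, 56, 60, 42, 64, 73, 38, 41,
]

def grouped_frequency(data, start, width, k):
    """Count observations in k exclusive classes [lower, lower + width)."""
    n = len(data)
    rows = []
    for i in range(k):
        lower = start + i * width
        upper = lower + width
        f = sum(1 for x in data if lower <= x < upper)  # upper limit excluded
        rows.append((lower, upper, f, f / n))
    return rows

rows = grouped_frequency(scores, start=20, width=10, k=6)
for lower, upper, f, rf in rows:
    # For whole-number scores, [lower, upper) is the inclusive class lower to upper - 1.
    print(f"{lower}-{upper - 1} ({lower}-{upper}): f = {f:>2}, rf = {rf:.2f}, {100 * rf:.0f}%")
```

The half-open interval `lower <= x < upper` is what makes the exclusive classes nonoverlapping: a score of exactly 30 is counted in 30 – 40, never in 20 – 30.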
Solution
▷ Summary: Out of 40 students, 10 students scored from 40 to under 50, 8 students scored from 50 to under 60, and so on.
▷ About 25% of the students scored from 40 to under 50, 20% scored from 50 to under 60, and so on. 12 students scored below 40, and 4 students scored 70 or above.
Test Score   Tally Marks   Frequency ($f_i$, $i = 1, 2, \dots, 6$)   Relative Frequency ($rf_i$)   Percent Frequency ($p \times rf_i$, $p = 100$)
20 – 30      ||||          4                                        4 ÷ 40 = 0.10                 10%
30 – 40      ||||/ |||     8                                        8 ÷ 40 = 0.20                 20%
40 – 50      ||||/ ||||/   10                                       10 ÷ 40 = 0.25                25%
50 – 60      ||||/ |||     8                                        8 ÷ 40 = 0.20                 20%
60 – 70      ||||/ |       6                                        6 ÷ 40 = 0.15                 15%
70 – 80      ||||          4                                        4 ÷ 40 = 0.10                 10%
Total                      $n = 40$                                                               100%
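Questions (b) to (f) of the example can be answered directly from the raw scores. A minimal Python sketch, with each expression mirroring one question; the variable names are our own.

```python
scores = [
    40, 38, 44, 28, 30, 22, 35, 42, 40, 36,
    50, 67, 25, 58, 53, 48, 65, 35, 55, 39,
    72, 44, 70, 55, 62, 20, 78, 46, 57, 68,
    59, 34, 41, 56, 60, 42, 64, 73, 38, 41,
]
n = len(scores)

pct_below_40 = 100 * sum(x < 40 for x in scores) / n                   # (b) percentage < 40
prop_60_or_more = sum(x >= 60 for x in scores) / n                      # (c) proportion >= 60
pct_40_to_59 = 100 * sum(40 <= x <= 59 for x in scores) / n             # (d) percentage in 40..59
count_above_50 = sum(x > 50 for x in scores)                            # (e) count > 50
prop_below_30_or_above_70 = sum(x < 30 or x > 70 for x in scores) / n   # (f) proportion < 30 or > 70

print(pct_below_40, prop_60_or_more, pct_40_to_59, count_above_50, prop_below_30_or_above_70)
# 30.0 0.25 45.0 17 0.175
```

Working from the raw data matters for (e) and (f): the grouped table cannot tell us how many scores in the 50 – 60 class are exactly 50, or how many in the 70 – 80 class are exactly 70, whereas the raw data can.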
Histogram
▷ A common graphical presentation of quantitative data is a
histogram.
▷ The variable of interest is placed on the horizontal axis ($X$) and the frequency, relative frequency, or percent frequency is placed on the vertical axis ($Y$).
▷ A rectangle is drawn above each class interval with its height
corresponding to the interval’s frequency, relative frequency,
or percent frequency.
▷ Unlike a bar graph, a histogram has no natural separation
between rectangles of adjacent classes.

Histogram

▷ Data Summary: The histogram shows that 10 (25%) students obtained from 40 to under 50 marks, 8 (20%) students obtained from 50 to under 60 marks, and so on.
Shape of the Distribution
▷ Frequency distributions and their histograms may be described in a
number of ways depending on their shape.
▻ For example, they may be symmetric (the left half is at least
approximately a mirror image of the right half: Figure-1).
▻ Skewed to the right (the frequencies tend to decrease as the
measurements increase in size: Figure-2).
▻ Skewed to the left (the frequencies tend to increase as the
measurements increase in size: Figure-3).
(Figure 1: symmetric distribution. Figure 2: distribution skewed to the right. Figure 3: distribution skewed to the left.)

Stem and Leaf Plot
▷ A stem-and-leaf plot shows both the rank order and shape of
the distribution of the data.
▷ It is similar to a histogram on its side, but it has the advantage
of showing the actual data values.
▷ The first digits of each data item are arranged to the left of a
vertical line.
▷ To the right of the vertical line we record the last digit for each
item in rank order.
▷ Each line in the display is referred to as a stem.
▷ Each digit on a stem is a leaf.

Stem and Leaf Plot: Test Score data

Key: 2 | 5 represents the test score of 25

▷ Summary: There are 10 students whose scores range from 40 to 48, and so on.
Stem and Leaf Plot

(Figure: Histogram of test score shown alongside the stem-and-leaf plot.) Key: 2 | 5 represents the test score of 25

Example: Leaf Unit = 0.1
▷ If we have data with values such as
8.6 11.7 9.4 9.1 10.2 11.0 8.8
▷ A stem-and-leaf display of these data will be

Stem Leaf
8 6 8
9 1 4
10 2
11 0 7

Key: 8 | 6 represents the value of 8.6

Example: Leaf Unit = 10
▷ If we have data with values such as
1806 1717 1974 1791 1682 1910 1838
▷ A stem-and-leaf display of these data will be
Stem Leaf
16 8
17 1 9
18 0 3
19 1 7
Key: 16 | 8 represents the value of 1680

▷ Thus, 168 × 10 = 1680 is an approximation of the original data value (1682) used to construct the three-digit stem-and-leaf plot; the last digit of each value is dropped (truncated), not rounded.
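A stem-and-leaf display can be generated in Python. This sketch assumes non-negative data and, as in the leaf unit = 10 example, truncates rather than rounds; `stem_and_leaf` and `show` are hypothetical helper names. It reproduces the test-score plot (leaf unit = 1) and the two examples above (leaf units 0.1 and 10).

```python
from collections import defaultdict
from decimal import Decimal

def stem_and_leaf(values, leaf_unit=1):
    """Return {stem: sorted leaves} for non-negative data.

    Each value is divided by leaf_unit and truncated (not rounded) to a whole
    number; its last digit is the leaf and the remaining digits are the stem.
    Decimal avoids binary floating-point error, e.g. in 8.6 / 0.1.
    """
    unit = Decimal(str(leaf_unit))
    scaled = sorted(int(Decimal(str(v)) / unit) for v in values)
    leaves = defaultdict(list)
    for s in scaled:
        stem, leaf = divmod(s, 10)
        leaves[stem].append(leaf)
    # Keep empty stems so that gaps in the data remain visible.
    return {stem: leaves.get(stem, []) for stem in range(min(leaves), max(leaves) + 1)}

def show(plot):
    """Print one line per stem, right-aligning the stems."""
    width = max(len(str(stem)) for stem in plot)
    for stem, leaf_digits in plot.items():
        print(f"{stem:>{width}} | {' '.join(str(d) for d in leaf_digits)}")

scores = [
    40, 38, 44, 28, 30, 22, 35, 42, 40, 36,
    50, 67, 25, 58, 53, 48, 65, 35, 55, 39,
    72, 44, 70, 55, 62, 20, 78, 46, 57, 68,
    59, 34, 41, 56, 60, 42, 64, 73, 38, 41,
]
score_plot = stem_and_leaf(scores)  # key: 2 | 5 represents 25
show(score_plot)
show(stem_and_leaf([8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8], leaf_unit=0.1))
show(stem_and_leaf([1806, 1717, 1974, 1791, 1682, 1910, 1838], leaf_unit=10))
```

Sorting the scaled values before splitting them puts the leaves on each stem in rank order, which is what lets the display show both the shape of the distribution and the ordered data values.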

Example: C.W.
▷ The data in Table 2.5 represent the blood cholesterol levels of 40
first-year students at a particular college.

a) Construct a frequency distribution of the above data.


b) Find the relative and percent frequency distributions of the above data.
c) Draw the histogram and stem and leaf plot of blood cholesterol levels.

Thank You!
