Waqar Ansari's RISE QM Ch#07
Chapter-07
STATISTICS
A science of facts and figures is called statistics. OR
A science of collection, presentation, analysis and interpretation of numerical data is called
statistics.
BRANCHES / TYPES OF STATISTICS
• Descriptive Statistics
• Inferential Statistics
Descriptive Statistics
Descriptive statistics deals with collection, presentation and analysis of numerical data.
Inferential Statistics
Inferential statistics deals with drawing conclusions about the population on the basis of a sample.
DATA AND INFORMATION
A collection of related objects is called data. OR
Data is the raw form of facts and figures which must be processed to make it meaningful and useful
for further analysis.
Data → Processing → Information
TYPES OF DATA
Types of data (By Nature):
• Quantitative Data (Numerical Data)
• Qualitative Data (Categorical Data)
Quantitative (Numerical) Data:
Data that can be measured numerically. e.g height, weight, age, salary, temperature, number of
family members, number of students in a class etc. It is further divided into two types.
Discrete Data: Data that contain countable (integer) values are called discrete data.
e.g. Number of students in a class, number of road accidents, number of customers, number of
units produced each day etc.
Continuous Data: Data that can take each and every value within a given interval are called
continuous data.
e.g. Age, Height, Weight, Temperature etc.
Qualitative (Categorical) Data: Data that cannot be measured numerically but its presence or
absence can be observed. e.g. Eye colour, Gender, Hobbies, Favourite food, Location, etc.
(Chapter-07): Data Collection and Representation RISE Quantitative Methods
Nominal data: Data classified into categories which have no meaningful order, e.g. Mode
of transportation (bus, car, motorbike, bicycle), Gender (Male/Female), Result (Pass/Fail).
Ordinal data: Data classified into categories which have a natural order, e.g. Views of
customers about services (worst, poor, normal, good, excellent), Size (small, medium, large),
Result in grades (A, B, C).
TYPES OF DATA (By Source)
• Primary Data
• Secondary Data
Primary Data
Data that have not undergone any sort of statistical treatment are known as primary data.
Primary data is always: Raw data, First hand collected data, Uncreated / Unorganized / Ungrouped
data.
Secondary Data
Data that have undergone any sort of statistical treatment at least once are known as
secondary data.
Secondary data is always: Processed, Created, Organized and Grouped data.
Sources for Collecting Primary Data
Personal Interview, Telephonic Interview, Questionnaire, Direct Observation etc
Sources for Collecting Secondary Data
Published Data by Government / Semi–Government / Private Organization, Newspaper, College
Record, Websites, Magazines, Books, Journals etc
CROSS SECTIONAL AND TIME SERIES DATA
Cross Sectional
Data collected by observing one or more subjects or variables (like firms, countries,
regions, individuals) at the same point in time.
e.g. Production of a unit for year 1, sales of products in January, multiple stock positions at the end
of the year,
GDP of 3 countries in 2021
Country Time GDP
India 2021 –
China 2021 –
Pakistan 2021 –
Time Series Data
Data collected in a sequence at discrete and equally spaced intervals of time. Usually
the collected data consist of one variable or subject.
e.g. Year–wise sales revenue, year–wise GDP of Pakistan, month–wise production etc.
STRUCTURED AND UNSTRUCTURED DATA
Structured Data
Data can be referred to as structured when there is a pre–defined model or format
e.g Hourly weather statistics, Customer Address, Product size and value etc
Unstructured Data
Data can be referred to as unstructured when there is no pre–defined model or format.
e.g Text documents, PDFs, Images, Videos etc.
Population
The totality of observations with which we are concerned is called population.
e.g The total number of enrolled students in a college is the population of college.
Sample
A small and representative part of population is called sample.
e.g Ten students are randomly selected from a class of hundred.
Parameter
A numerical quantity computed from a population is known as a parameter, e.g. $\mu = \frac{\sum x}{N}$
Statistic
A numerical quantity computed from a sample is known as a statistic, e.g. $\bar{x} = \frac{\sum x}{n}$
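As a rough illustration of the difference between a parameter and a statistic, the sketch below computes the mean of a small population and the mean of a random sample drawn from it. The marks, the sample size of 4 and the seed are made-up assumptions chosen only for illustration; they do not come from these notes.

```python
import random
from statistics import mean

# Hypothetical population: marks of 10 students (illustrative values only).
population = [52, 61, 47, 70, 66, 58, 49, 73, 64, 60]

# Parameter: computed from every value in the population (sum of x / N).
mu = mean(population)

# Statistic: computed from a sample drawn from the population (sum of x / n).
random.seed(1)  # fixed seed so the sketch is repeatable
sample = random.sample(population, 4)
x_bar = mean(sample)

print("Population mean (parameter):", mu)
print("Sample:", sample)
print("Sample mean (statistic):", x_bar)
```

The parameter stays fixed for a given population, while the statistic changes with the sample that happens to be drawn.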
Variable
Any characteristic which varies in quantity from one individual to another is called variable.
Age, sex, height, weight, income, expenses, class grades and vehicle type are examples of
variables.
These are the examples of variables as their values will change from one individual to another. All
recorded values against a variable are called data.
Constant
Any characteristic that does not vary from one individual to another is called a constant. OR
A quantity which can assume only one value is called a constant.
e.g. π ≈ 22/7
π ≈ 3.1415
e ≈ 2.71828
Qualities of Data
The following characteristics define data quality:
Important MCQs
5) Data that have undergone any statistical treatment at least once are known as:
A) Discrete data B) Continuous data
C) Primary data D) Secondary data
8) Data consisting of multiple observations at the same point in time are known as:
A) Time series data B) Structured data
C) Unstructured data D) Cross-sectional data
Array Form
An arrangement of available data into ascending or descending order.
Tabular Form
To manage available data in tabular form, we adopt the following two interlinked methods:
(i) Classification
(ii) Tabulation
Classification
The process of arranging data into different groups or categories according to some common
characteristics is called classification.
Types of Classification
• Quantitative Classification
• Qualitative Classification
• Temporal Classification
• Geographical Classification
Quantitative Classification
Classification of students by their weight, age and marks etc.
Qualitative Classification
Classification of students in a class by their eye colour (Black, Brown, Blue) is an example of
qualitative classification.
Temporal Classification
Birth rate in Lahore during 2016–2020 is an example of temporal classification.
Geographical Classification
Country, province, city or district wise grouping (classification) of data, e.g. crime rate in Lahore,
birth rate in Lahore.
Tabulation
The arrangement of data into different rows and columns in a table is called tabulation.
Methods for Tabulating Data
• Frequency Distribution
• Stem and Leaf Display
Frequency Distribution
• A compact form of data in a table is called frequency distribution.
• Frequency distribution is a tabular arrangement of quantitative data into different classes
prepared on the basis of magnitude along with their corresponding class frequencies.
Types of Frequency Distribution
• Discrete Frequency Distribution
• Continuous Frequency Distribution
Discrete Frequency Distribution
• Discrete Data
• Range ≤ 15
• Class Interval 1
Example 01:
The following data show the number of mistakes in a typing test of 25 candidates for the post of typist.
3, 4, 5, 6, 7, 1, 0, 2, 3, 4, 5, 7, 8, 4, 2, 1, 5, 6, 7, 8, 9, 10, 6, 7, 3
Make a frequency distribution.
Solution:
Range = R = Xm – Xo = 10 – 0 = 10
Number of Mistakes Tally Number of Candidate
0 I 1
1 II 2
2 II 2
3 III 3
4 III 3
5 III 3
6 III 3
7 IIII 4
8 II 2
9 I 1
10 I 1
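The tally above can be reproduced with a short script. This is a minimal sketch that uses Python's `collections.Counter` to count each value in the Example 01 data; the variable names are illustrative only.

```python
from collections import Counter

# Number of typing mistakes made by the 25 candidates (Example 01).
mistakes = [3, 4, 5, 6, 7, 1, 0, 2, 3, 4, 5, 7, 8, 4, 2, 1, 5,
            6, 7, 8, 9, 10, 6, 7, 3]

# Range = largest value - smallest value.
data_range = max(mistakes) - min(mistakes)

# Count how many candidates made each number of mistakes.
frequency = Counter(mistakes)

print("Range =", data_range)
print("Mistakes  Tally  Candidates")
for value in range(min(mistakes), max(mistakes) + 1):
    count = frequency[value]
    print(f"{value:>8}  {'I' * count:<5}  {count}")
print("Total candidates =", sum(frequency.values()))
```

Because the range is 10 (not more than 15), every distinct value gets its own row, which is what makes this a discrete frequency distribution.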
Continuous Frequency Distribution
• Discrete data with range more than 15
• Continuous Data
How to Prepare / Construct a Frequency Distribution
Steps
(i) Range = R = Xm – Xo
(ii) Number of classes = C = 1 + 3.3 log n
(iii) Class Interval / Size: $h = \frac{R}{C}$
Example 02:
The height (in cms) of 30 students measured at the time of registration is given by 91, 89, 88, 87,
89, 91, 87, 92, 90, 98, 95, 97, 96, 100, 101, 96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107, 105,
106, 107, 112 make a suitable frequency distribution.
Solution:
Range = R = Xm – Xo
= 112 – 87 = 25
Number of Classes = C = 1 + 3.3 log n
= 1 + 3.3 log 30
= 5.87 ≈ 6
Class Interval: $h = \frac{R}{C} = \frac{25}{6}$
= 4.167 ≈ 5
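The three steps can be checked numerically. The sketch below assumes the logarithm is base 10 and follows the rounding used in this example (number of classes rounded to the nearest whole number, class interval rounded up); other texts may round differently.

```python
import math

# Heights (in cm) of the 30 students from Example 02.
heights = [91, 89, 88, 87, 89, 91, 87, 92, 90, 98, 95, 97, 96, 100, 101,
           96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107, 105, 106, 107, 112]

n = len(heights)

# Step (i): Range = largest value - smallest value.
data_range = max(heights) - min(heights)

# Step (ii): Number of classes C = 1 + 3.3 log n (log to base 10).
classes_exact = 1 + 3.3 * math.log10(n)
classes = round(classes_exact)

# Step (iii): Class interval h = R / C, rounded up to a convenient whole number.
interval_exact = data_range / classes
interval = math.ceil(interval_exact)

print("n =", n)
print("Range =", data_range)
print(f"C = {classes_exact:.2f}, taken as {classes}")
print(f"h = {interval_exact:.3f}, taken as {interval}")
```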
Example 03:
Prepare class boundaries, class mark (mid–value), relative frequency, less than cumulative
frequency and more than cumulative frequency for the following frequency distribution.
Class Limits   Frequency   C.B             Class Mark   R.F     L.C.F   M.C.F
86 – 90        6           85.5 – 90.5     88           6/30    6       30
91 – 95        4           90.5 – 95.5     93           4/30    10      24
96 – 100       10          95.5 – 100.5    98           10/30   20      20
101 – 105      6           100.5 – 105.5   103          6/30    26      10
106 – 110      3           105.5 – 110.5   108          3/30    29      4
111 – 115      1           110.5 – 115.5   113          1/30    30      1
Total          30
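Each column of the table can be derived from the class limits and frequencies alone. The following is a minimal sketch of those calculations; the variable names are illustrative, and the half-gap adjustment for class boundaries assumes equal gaps between consecutive classes, as in this example.

```python
from itertools import accumulate

# Class limits and frequencies from Example 03.
classes = [(86, 90), (91, 95), (96, 100), (101, 105), (106, 110), (111, 115)]
freqs = [6, 4, 10, 6, 3, 1]
total = sum(freqs)

# Class boundaries: extend each limit by half the gap between consecutive classes.
gap = classes[1][0] - classes[0][1]  # 91 - 90 = 1
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in classes]

# Class mark: average of the lower and upper class limits.
marks = [(lo + hi) / 2 for lo, hi in classes]

# Relative frequency: class frequency divided by total frequency.
rel_freqs = [f / total for f in freqs]

# Less than cumulative frequency: running total from top to bottom.
lcf = list(accumulate(freqs))

# More than cumulative frequency: running total from bottom to top.
mcf = list(accumulate(reversed(freqs)))[::-1]

for row in zip(classes, freqs, boundaries, marks, rel_freqs, lcf, mcf):
    print(row)
```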
Class:
A class is a grouping of values by which data is placed for computation of a frequency distribution
e.g 86 – 90.
Class Limits
The largest and the smallest values of a class are called class limits. Classes written this way are also
called inclusive classes, e.g. 86 – 90, 91 – 95 etc.
Class Frequency
Number of items / observations falling in a class is called class frequency.
Class Boundaries
A term which is used to express overlapping groups or classes. It is also called exclusive classes
or true class limits e.g 85.5 – 90.5, 90.5 – 95.5 etc.
Mid–Point / Class Mark
The average of the lower and upper class limits (or class boundaries) is called the class mark.
Class mark = $\frac{\text{upper class limit/boundary} + \text{lower class limit/boundary}}{2}$
Relative Frequency
It is obtained by dividing the frequency of a class by the total frequency (f/∑f). The sum of relative
frequencies is always equal to 1.
Less Than Cumulative Frequency / Cumulative Frequency
It is obtained by adding the frequency of each class with the preceding class frequencies from top
to bottom.
More Than Cumulative Frequency
It is obtained by adding the frequency of each class with the frequencies of the preceding class
from bottom to top.
Class Interval
The difference between the lower and upper class boundaries of the same class is called class
interval.
Open–End Classes
If the first class has no lower limit and the last class has no upper limit, then these
are called open–end classes.
e.g
Below – 5
6 – 10
11 – 15
16 – Above
Important MCQs
13) The process of arranging data into different group or categories according to some common
characteristics is known as:
A) Frequency distribution B) Tabulation
C) Classification D) None of these
14) A tabular arrangement of quantitative data into different classes prepared on the basis of
magnitude along with the corresponding class frequencies is known as:
A) Probability distribution B) Frequency distribution
C) Sampling distribution D) Classification
15) Which of the following statements is/are true about frequency distribution?
i) It is the most common way of summarizing data
ii) It represents the number of times a particular value occurs.
A) Both statements are correct B) Both statements are not correct
C) Only statement (i) is correct D) Only statement (ii) is correct
Arrayed Form
Stem Leaf
1 2, 8
2 6, 7, 9
3 0, 1, 5, 7, 9
4 0, 2, 3, 8, 8, 9
5 1, 2, 3, 4, 7, 8
6 1, 2, 4, 5, 7, 8
7 1, 4
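The display above can be rebuilt from the values it encodes. This sketch assumes two-digit values, with the tens digit as the stem and the units digit as the leaf; the list of values is read back from the display itself.

```python
from collections import defaultdict

# Values read back from the stem and leaf display above.
values = [12, 18, 26, 27, 29, 30, 31, 35, 37, 39, 40, 42, 43, 48, 48, 49,
          51, 52, 53, 54, 57, 58, 61, 62, 64, 65, 67, 68, 71, 74]

# Stem = tens digit, leaf = units digit (assumes two-digit values).
display = defaultdict(list)
for value in sorted(values):
    stem, leaf = divmod(value, 10)
    display[stem].append(leaf)

print("Stem  Leaf")
for stem in sorted(display):
    leaves = ", ".join(str(leaf) for leaf in display[stem])
    print(f"{stem:>4}  {leaves}")
```

Unlike a frequency distribution, the display keeps every original value, so the data can be recovered from it.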
GRAPHICAL REPRESENTATION OF DATA
Chart / Diagram
A chart is a diagrammatic representation of statistical data in simple and effective manner.
Types of Charts
• Simple Bar Chart
• Multiple Bar Chart
• Component / Sub–divided Bar Chart
• Percentage Component Bar Chart
• Pie Chart
Key Points
• Bar charts are usually used for plotting discrete data.
• Both bar charts and pie charts are used for representing categorical data.
• The bars of bar charts can be plotted vertically or horizontally.
• The widths of the bars are the same and the heights/lengths of the bars are proportional to the
magnitude of the values/items.
Simple Bar Chart
A simple bar chart consists of horizontal or vertical bars of equal width and length proportional to
the magnitude of the values.
Example 05:
The top five car dealers of Lahore, ranked by the number of cars sold in the last month, are listed
below.
Car Dealer Cars Sold
Bari Motors 39
Siddiqui Motors 24
Atlantic Motors 21
Ravi Motors 18
Drive Line 15
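A simple horizontal bar chart for this data can be sketched in plain text, with each bar's length proportional to the number of cars sold. The scale of one `#` per 3 cars and the helper name `bar` are assumptions made for this illustration; a plotting library would normally be used for a real chart.

```python
# Cars sold last month by each dealer (Example 05).
cars_sold = {
    "Bari Motors": 39,
    "Siddiqui Motors": 24,
    "Atlantic Motors": 21,
    "Ravi Motors": 18,
    "Drive Line": 15,
}

SCALE = 3  # assumption: one '#' stands for 3 cars, to keep the bars short


def bar(value, scale=SCALE):
    """Return a bar whose length is proportional to the value."""
    return "#" * round(value / scale)


# Every bar has the same width (one text row); only the length varies.
for dealer, sold in cars_sold.items():
    print(f"{dealer:<16} {bar(sold)} {sold}")
```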
Pie Chart
A pie chart is a circular chart that displays the components of a quantity in proportion (as percentages
or degrees) within a circle. It is constructed by dividing the total angle of a circle (360°) or the total
percentage (100%) into different components.
Example 09:
Draw Pie Chart for the following data.
Item           Expenditure (Rs.)   Angle of Sector (Degrees)   Percentage (%)
Food           95                  95/240 × 360 = 142.5        142.5/360 × 100 = 39.58 ≈ 40
Clothing       32                  32/240 × 360 = 48           48/360 × 100 = 13.33 ≈ 13
Rent           50                  50/240 × 360 = 75           75/360 × 100 = 20.83 ≈ 21
Medical Care   23                  23/240 × 360 = 34.5         34.5/360 × 100 = 9.58 ≈ 10
Others         40                  40/240 × 360 = 60           60/360 × 100 = 16.66 ≈ 16
Total          240                 360                         100
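The sector angles and percentages follow directly from each item's share of the total. The sketch below recomputes them from the expenditure figures of Example 09 and prints the percentages to two decimal places without rounding them to whole numbers.

```python
# Expenditure (Rs.) per item from Example 09.
expenditure = {
    "Food": 95,
    "Clothing": 32,
    "Rent": 50,
    "Medical Care": 23,
    "Others": 40,
}

total = sum(expenditure.values())

# Angle of each sector = (item / total) x 360 degrees.
angles = {item: amount / total * 360 for item, amount in expenditure.items()}

# Percentage of each sector = (item / total) x 100.
percentages = {item: amount / total * 100 for item, amount in expenditure.items()}

print(f"{'Item':<14}{'Rs.':>5}{'Angle':>9}{'Percent':>10}")
for item, amount in expenditure.items():
    print(f"{item:<14}{amount:>5}{angles[item]:>9.1f}{percentages[item]:>10.2f}")
print(f"{'Total':<14}{total:>5}{sum(angles.values()):>9.1f}{sum(percentages.values()):>10.2f}")
```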
Important MCQs
Graph
A graph consists of curves or straight lines that show fluctuations and trends in statistical data.
Types of Graphs
• Histogram
• Frequency Polygon
• Less and More than Ogive
• Historigram
Histogram
• A histogram consists of a set of adjacent rectangles having class boundaries on x–axis and
the frequency on the y–axis.
• A histogram is a set of adjacent rectangles in which the area of each rectangle is proportional
to the corresponding class frequency.
• In a histogram, the frequency of the class is represented by the area of the rectangle.
• A histogram is a graph of frequency distribution.
• Mode can be determined by histogram.
Example 10:
Construct histogram for the following distribution.
Number of Visits Frequency Class Boundaries
41 – 50 6 40.5 – 50.5
51 – 60 8 50.5 – 60.5
61 – 70 10 60.5 – 70.5
71 – 80 12 70.5 – 80.5
81 – 90 9 80.5 – 90.5
91 – 100 5 90.5 – 100.5
Histogram
• For unequal class Intervals
Number of Visits   No. of Invoices (f)   Adjustment Factor   Adjusted Frequency
50 – 100           5                     50 ÷ 50 = 1         5 ÷ 1 = 5
100 – 200          12                    100 ÷ 50 = 2        12 ÷ 2 = 6
200 – 300          17                    100 ÷ 50 = 2        17 ÷ 2 = 8.5
300 – 500          16                    200 ÷ 50 = 4        16 ÷ 4 = 4
500 – 700          8                     200 ÷ 50 = 4        8 ÷ 4 = 2
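The adjustment for unequal class intervals can be sketched as follows. It assumes, as the table does, that the smallest class width (50) is the base width, so that each rectangle's area stays proportional to its class frequency.

```python
# Classes and frequencies from the unequal class interval table.
classes = [(50, 100), (100, 200), (200, 300), (300, 500), (500, 700)]
freqs = [5, 12, 17, 16, 8]

# Width of each class and the smallest width, used as the base.
widths = [hi - lo for lo, hi in classes]
base = min(widths)

# Adjustment factor = class width / base width.
factors = [width / base for width in widths]

# Adjusted frequency = frequency / adjustment factor (the height of the rectangle).
adjusted = [f / k for f, k in zip(freqs, factors)]

for (lo, hi), f, k, adj in zip(classes, freqs, factors, adjusted):
    print(f"{lo} - {hi}: f = {f}, factor = {k}, adjusted frequency = {adj}")
```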
Frequency Polygon
It can be constructed in two ways:
(i) By joining the mid–points of the upper horizontal side of each rectangle of histogram.
(ii) By joining mid–point or class mark of each class proportional to the corresponding frequency.
Example 11:
Construct the frequency polygon for the following distribution.
No. of Visits Frequency Class Boundaries X
41 – 50 6 40.5 – 50.5 45.5
51 – 60 8 50.5 – 60.5 55.5
61 – 70 10 60.5 – 70.5 65.5
71 – 80 12 70.5 – 80.5 75.5
81 – 90 9 80.5 – 90.5 85.5
91 – 100 5 90.5 – 100.5 95.5
Ogive
• Less than Ogive
• More than Ogive
An ogive is a graph used in statistics to show the cumulative frequency distribution. OR
It is a graph of cumulative frequency distribution.
It is also known as cumulative frequency polygon.
Less than Ogive
It is obtained by plotting the less than cumulative frequencies corresponding to the upper class
boundaries.
More than Ogive
It is obtained by plotting the more than cumulative frequencies corresponding to the lower class
boundaries.
Median, quartiles, deciles and percentiles can be determined by ogive.
Example 12:
Draw less than and more than ogive for the following distribution.
Class Limits Frequency Class Boundaries L.C.f M.C.f
41 – 50 6 40.5 – 50.5 6 50
51 – 60 8 50.5 – 60.5 14 44
61 – 70 10 60.5 – 70.5 24 36
71 – 80 12 70.5 – 80.5 36 26
81 – 90 9 80.5 – 90.5 45 14
91 – 100 5 90.5 – 100.5 50 5
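Reading the median or a quartile off a less than ogive amounts to finding the point where the curve reaches the required cumulative frequency. The sketch below does this numerically for the distribution of Example 12, assuming the plotted points are joined by straight lines (linear interpolation); the helper name `value_at` is hypothetical.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from Example 12.
upper_boundaries = [50.5, 60.5, 70.5, 80.5, 90.5, 100.5]
freqs = [6, 8, 10, 12, 9, 5]

# Less than cumulative frequencies.
lcf = list(accumulate(freqs))

# Points of the less than ogive, starting from zero at the first lower boundary.
xs = [40.5] + upper_boundaries
ys = [0] + lcf


def value_at(target):
    """Read the x-value where the ogive reaches the target cumulative frequency."""
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1:
            # Straight-line interpolation between two neighbouring points.
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target is outside the range of the ogive")


n = lcf[-1]
median = value_at(n / 2)
lower_quartile = value_at(n / 4)

print("Less than cumulative frequencies:", lcf)
print(f"Median read from the ogive: {median:.2f}")
print(f"Lower quartile read from the ogive: {lower_quartile:.2f}")
```

Deciles and percentiles can be read the same way by changing the target cumulative frequency.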
Historigram
A historigram is a graphical representation of time series data that reveals the changes that occurred
over different time periods.
Example 13:
The following table shows the property damaged by road accidents in Punjab for the years 2004 to
2010.
Year 2004 2005 2006 2007 2008 2009 2010
Property Damaged 201 238 392 507 484 649 742
Trend Line
Important MCQs
21) If the peak of a histogram is at the center and the frequencies are distributed evenly on both
sides of it, then the distribution is:
A) Positively Skewed B) Negatively Skewed
C) Symmetrical D) None of these
Example 14:
Draw the box and whisker plot for the following data set.
86, 102, 78, 90, 88, 98, 100, 94, 82, 92, 88, 86, 96, 88, 84, 90, 88
Arrayed Data
78, 82, 84, 86, 86, 88, 88, 88, 88, 90, 90, 92, 94, 96, 98, 100, 102
Median
$\tilde{x} = \left(\frac{n+1}{2}\right)^{\text{th}}$ value
$\tilde{x} = \left(\frac{17+1}{2}\right)^{\text{th}}$ value
$\tilde{x}$ = 9th value
$\tilde{x}$ = 88
Lower Quartile
Q1 = $\left(\frac{n+1}{4}\right)^{\text{th}}$ item
Q1 = $\left(\frac{17+1}{4}\right)^{\text{th}}$ item
Q1 = 4.5th item
Q1 = 4th item + 0.5 (5th item – 4th item)
Q1 = 86 + 0.5 (86 – 86)
Q1 = 86
Upper Quartile
Q3 = $\left[3\left(\frac{n+1}{4}\right)\right]^{\text{th}}$ item
Q3 = $\left[3\left(\frac{17+1}{4}\right)\right]^{\text{th}}$ item
Q3 = 13.5th item
Q3 = 13th item + 0.5 (14th item – 13th item)
Q3 = 94 + 0.5 (96 – 94)
Q3 = 95
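The values needed for the box and whisker plot can be verified with a short script. This sketch uses the same (n + 1) position method as the working above, interpolating between neighbouring items when the position is fractional; the helper name `item_at` is hypothetical. Library quantile functions may use other conventions and give slightly different quartiles.

```python
# Data set from Example 14.
data = [86, 102, 78, 90, 88, 98, 100, 94, 82, 92, 88, 86, 96, 88, 84, 90, 88]

arrayed = sorted(data)
n = len(arrayed)


def item_at(position):
    """Value at a 1-based, possibly fractional, position in the arrayed data."""
    whole = int(position)
    fraction = position - whole
    lower = arrayed[whole - 1]
    if fraction == 0:
        return lower
    upper = arrayed[whole]
    return lower + fraction * (upper - lower)


median = item_at((n + 1) / 2)      # 9th item
q1 = item_at((n + 1) / 4)          # 4.5th item
q3 = item_at(3 * (n + 1) / 4)      # 13.5th item

# Five values used to draw the box and whisker plot.
summary = (arrayed[0], q1, median, q3, arrayed[-1])
print("Minimum, Q1, Median, Q3, Maximum:", summary)
```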
Important MCQs
26) From the above box and whisker plot, identify the value of median:
A) 16 B) 23
C) 20.5 D) 23.5