Business Statistics
BILLY ABRAHAM.K
LECTURER IN MATHEMATICS DEPARTMENT
KODAIKANAL
CONTENT
1. INTRODUCTION STATISTICS
2. DIAGRAMMATIC PRESENTATION
3. MEASURE OF CENTRAL TENDENCY
4. MEASURE OF DISPERSION
5. INDEX NUMBERS
1.INTRODUCTION STATISTICS
Introduction
Application
Collection of Data
Sampling Introduction
Types of Sampling
Types of Distribution
2.DIAGRAMMATIC PRESENTATION
Introduction
Types of Diagrams
Types of Graphs
Examples
3.MEASURE OF CENTRAL TENDENCY
Mean
Median
Mode
Geometric Mean
Harmonic Mean
Quartiles, Deciles
4. MEASURE OF DISPERSION
Types of Dispersion
Lorenz Curve
Deviation
Coefficient of Variation
Consistency of data
5. INDEX NUMBERS
Types of Methods
Simple Average of Price Relatives
Number
UNIT-I
Introduction Statistics
Application
Collection of Data
Sampling Introduction
Types of Sampling
Types of Distribution
Learning Objectives
1. Define Statistics
2. Describe the Uses of Statistics
3. Distinguish Descriptive & Inferential Statistics
4. Define Population, Sample, Parameter, and
Statistic
5. Define Quantitative and Qualitative Data
6. Define Random Sample
What Is Statistics?
1. Collecting Data
e.g., Sample, Survey, Observe, Simulate
2. Characterizing Data
e.g., Organize/Classify, Count, Summarize
3. Presenting Data
e.g., Tables, Charts, Statements
4. Interpreting Results
e.g., Infer, Conclude, Specify Confidence
Why? Data Analysis, Decision-Making
Subset
The values of a variable are the range of possible values it can take.
E.g. student marks (0..100)
Sports: Individual & Team Performance
Business: Consumer Preferences, Financial Trends
Statistical Methods
• Descriptive Statistics
• Inferential Statistics
Descriptive Statistics
1. Involves
• Collecting Data
• Organizing Data
• Presenting Data
• Characterizing Data
2. Purpose
• Describe Data
[Figure: bar chart of quarterly amounts in $ (0 to 50) for Q1 to Q4]
X̄ = 30.5, S² = 113
Types of Statistical Applications
in Business
Descriptive Statistics - describe
collected data
“51.4% of all credit card purchases in
2003 were made with a Visa Card”
2. Purpose
• Make decisions about
population characteristics
3. Example
• Retail CEOs were overpaid
Key Terms
1. Population (Universe)
• All items of interest
2. Sample
• Portion of population
3. Parameter
• Summary measure about population
4. Statistic
• Summary measure about sample
Memory aid: P in Population & Parameter, S in Sample & Statistic
Fundamental Elements of Statistics
graduation
Value – symbol [number, letter, word(s), …] associating
one option of a variable with one item: graduating senior Anne
Baker’s age at graduation will be 22
Triplet – fundamental data unit: (Anne Baker, age, 22)
Data Organized in Tables
Graduating Senior   Age   Major        Home
Anne Baker          22    Accounting   Santa Fe
Quantitative Data, Qualitative Data
Types of Variables
Quantitative Variables
• measured on a naturally occurring scale
• equal intervals along scale (allows for meaningful
mathematical calculations)
• Ratio scale
zero value properly describes the underlying phenomenon - e.g.,
bank balance, length of a material entity
ratios of scale values properly describe relative values – e.g., 4 feet
long is indeed twice as long as 2 feet
• Interval scale
zero value is arbitrarily assigned - e.g., zero on the Fahrenheit or Celsius scale does not mean no heat at all, and zero calendar time is not the beginning of time
ratios of scale values do not describe relative values correctly - e.g., 40 °F is not twice as much heat as 20 °F
Types of Variables
Qualitative Variables
• measured by classification only
• Non-numerical in nature
• Meaningfully ordered categories identify ordinal
data (best to worst ranking, income categories,
price ranges)
• Categories without a meaningful order identify
nominal data (gender, political affiliation, industry
classification, ethnic/cultural groups, cause of
defectives)
Types of Data & Information…
Data
  Categorical? N: Interval Data
  Categorical? Y: Categorical Data
    Ordered? Y: Ordinal Data
    Ordered? N: Nominal Data
E.g. Representing Student Grades…
Data
  Categorical? N: Interval Data, e.g. {0..100}
  Categorical? Y: Categorical Data
    Ordered? Y: Ordinal Data, e.g. {F, D, C, B, A}
    Ordered? N: Nominal Data, e.g. {Pass | Fail}
Ordinal
Values must represent the ranked order of the data.
Calculations based on an ordering process are valid.
Data may be treated as nominal but not as interval.
Nominal
Values are the arbitrary numbers that represent categories.
Only calculations based on the frequencies of occurrence are valid.
Data may not be treated as ordinal or interval.
Relationships between Variables.
(Source. Rowntree 2000: 33)
Variables
  Category
    Nominal
    Ordinal (ordered categories, ranks)
  Quantity
    Discrete (counting)
    Continuous (measuring)
Classroom Exercise
Enter Yes or No
2. Need Statistical
Understanding
• Assumptions
• Limitations
Total Population
particular event.
A probability sampling method is any
method of sampling that uses some form of
random selection. In order to have a
random selection method, you must set up
some process or procedure that assures
that the different units in your population
have equal probabilities of being chosen
(Clark 2002: 37).
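The idea of equal-probability selection can be sketched in a few lines of Python. This is only an illustration: the numbered population of 100 units, the sample size of 10 and the helper name `simple_random_sample` are assumptions made for the example, not part of the source.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every unit has the same chance of selection."""
    rng = random.Random(seed)  # a seed is used only to make the illustration repeatable
    return rng.sample(population, n)  # sampling without replacement


# Hypothetical sampling frame: 100 numbered units
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

Fixing the seed is a convenience for teaching; in a real survey the selection would normally not be seeded in this way.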
Most Common Types of Probability
Sampling
Types of Graphs
Examples
Chapter Topics
Organizing numerical data
The ordered array and stem-leaf display
Tabulating and graphing Univariate numerical
data
Frequency distributions: tables, histograms,
polygons
Cumulative distributions: tables, the Ogive
Graphing Bivariate numerical data
Tabulating and graphing Univariate categorical
data
The summary table
Bar and pie charts, the Pareto diagram
Tabulating and graphing Bivariate categorical data
Contingency tables
Side by side bar charts
Graphical excellence and common errors in
presenting data
Organizing Numerical Data
Numerical Data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21
Ordered Array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Frequency Distributions
Cumulative Distributions
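As a small illustration, the ordered array and a stem-and-leaf display for the ten values above might be produced as follows. The choice of tens as stems and units as leaves is an assumption that suits two-digit data.

```python
from collections import defaultdict

data = [41, 24, 32, 26, 27, 27, 30, 24, 38, 21]

# Ordered array: the raw values sorted from smallest to largest
ordered = sorted(data)
print(ordered)

# Stem-and-leaf display: tens digit as the stem, units digit as the leaf
stems = defaultdict(list)
for value in ordered:
    stems[value // 10].append(value % 10)

for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
```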
[Figure: histogram of frequency against class midpoints, with class boundaries marked and no gaps between bars]
Graphing Numerical Data:
The Frequency Polygon
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: frequency polygon, frequency (0 to 7) against class midpoints]
Tabulating Numerical Data:
Cumulative Frequency
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class             Cumulative Frequency   Cumulative % Frequency
10 but under 20   3                      15
20 but under 30   9                      45
30 but under 40   14                     70
40 but under 50   18                     90
50 but under 60   20                     100
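The cumulative table can be reproduced from the ordered array with a short Python sketch. The variable names are illustrative; the classes are the "10 but under 20" style intervals used in the table.

```python
from itertools import accumulate

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

lower_bounds = [10, 20, 30, 40, 50]  # each class is "lower but under lower + width"
width = 10

# Count the values falling in each class
frequencies = [sum(lo <= x < lo + width for x in data) for lo in lower_bounds]

# Running totals give the cumulative frequency; divide by n for the cumulative %
cumulative = list(accumulate(frequencies))
cumulative_pct = [100 * c / len(data) for c in cumulative]

for lo, c, pct in zip(lower_bounds, cumulative, cumulative_pct):
    print(f"{lo} but under {lo + width}: {c} ({pct:.0f}%)")
```

Plotting `cumulative_pct` against the upper class boundaries would give the ogive.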
Ogive…
first class…
next class: .355+.185=.540
:
:
“around $35”
(Refer also to Fig. 2.13 in your textbook)
Graphing Numerical Data:
The Ogive (Cumulative % Polygon)
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: ogive, cumulative % (0 to 100) against class boundaries (10 to 60)]
[Figure: chart with horizontal axis Net Asset Values (0 to 40)]
Tabulating and Graphing Categorical Data: Univariate Data
Categorical Data
Graphing Data
Tabulating Data
The Summary Table
Pie Charts
[Figure: bar chart, Investor's Portfolio, amount in K$ (0 to 50) for Savings, CD, Bonds, Stocks]
Pie Chart
(for an Investor’s Portfolio)
Amount Invested in K$
Stocks 42%
Bonds 29%
Savings 15%
CD 14%
Percentages are rounded to the nearest percent.
Pareto Diagram
[Figure: Pareto diagram for Stocks, Bonds, Savings, CD. The axis for the bar chart shows % invested in each category; the axis for the line graph shows cumulative % invested]
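The two series behind a Pareto diagram can be sketched in Python using the rounded portfolio percentages from the pie chart (Stocks 42%, Bonds 29%, Savings 15%, CD 14%). The variable names are illustrative.

```python
from itertools import accumulate

# Rounded percentages of the investor's portfolio
shares = {"Savings": 15, "Stocks": 42, "CD": 14, "Bonds": 29}

# Bars: categories ranked from the largest share to the smallest
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)

# Line graph: running total of the ranked percentages
cumulative = list(accumulate(pct for _, pct in ranked))

for (category, pct), cum in zip(ranked, cumulative):
    print(f"{category:<8} {pct:>3}%  cumulative {cum:>3}%")
```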
Tabulating and Graphing
Bivariate Categorical Data
Contingency tables: investment in thousands of dollars
Investment Category   Investor A   Investor B   Investor C   Total
[Figure: side-by-side bar chart of amounts (0 to 60) for Savings, CD, Bonds, Stocks]
[Figure: two quarterly charts (Q1 to Q4) with vertical scales 0 to 100 and 0 to 25]
No Zero Point on Vertical Axis
Median
Mode
Geometric Mean
Harmonic Mean
Quartiles, Deciles
Measures of Variability
Range, Standard Deviation, Variance, Coefficient of Variation
Arithmetic Mean…
        Population   Sample
Size    N            n
Mean    μ            x̄
The Arithmetic Mean…
…is appropriate for describing measurement
data, e.g. heights of people, marks of student
papers, etc.
A set of data may have one mode (or modal class), or two, or
more modes.
The mode is useful for all data types, though it is mainly used for
nominal data.
For large data sets the modal class is much more relevant than a
single-value mode.
of “central” location?
[Figure: frequency against variable, with a modal class marked]
=MODE(range) in Excel…
Note: if you are using Excel for your data
analysis and your data is multi-modal (i.e. there
is more than one mode), Excel only calculates
the smallest one.
Mean, Median, Mode…
If a distribution is asymmetrical, say skewed to the left
or to the right, the three measures may differ. E.g.:
[Figure: skewed distribution with the mode, median, and mean marked at different positions]
Mean, Median, Mode…
If data are symmetric, the mean, median, and
mode will be approximately the same.
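A quick way to compare the three measures is Python's standard `statistics` module. The sketch below reuses the ten values from the ordered-array example; `multimode` (Python 3.8 or later) returns every mode, which matters when the data are multi-modal.

```python
import statistics

data = [41, 24, 32, 26, 27, 27, 30, 24, 38, 21]

mean = statistics.mean(data)        # sum of the values divided by their count
median = statistics.median(data)    # middle value of the ordered array
modes = statistics.multimode(data)  # all most-frequent values, not just one

print("mean:", mean)
print("median:", median)
print("modes:", modes)
```

Here the data have two modes (24 and 27 each occur twice), so a single-value mode function would report only one of them.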
$$\bar{X}_g = \sqrt[n]{X_1 \times X_2 \times \cdots \times X_n}$$
This would indicate we should have $1,250 at the end of our investment, not
$1,000.
Solving for the geometric mean yields a rate of 0%, which is correct.
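The contrast between the two averages can be checked numerically. The returns below (+100% in the first period, -50% in the second, on a $1,000 investment) are assumed for illustration; they are chosen to be consistent with the 0% geometric rate and the $1,000 final value mentioned above, but the source does not state them here.

```python
import math

initial = 1000.0
returns = [1.00, -0.50]  # hypothetical period returns: +100%, then -50%

growth_factors = [1 + r for r in returns]

# Arithmetic mean of the period returns
arithmetic_rate = sum(returns) / len(returns)

# Geometric mean rate: n-th root of the product of growth factors, minus 1
geometric_rate = math.prod(growth_factors) ** (1 / len(growth_factors)) - 1

# What the investment is actually worth at the end
final_value = initial * math.prod(growth_factors)

print(f"arithmetic mean rate: {arithmetic_rate:.0%}")
print(f"geometric mean rate:  {geometric_rate:.0%}")
print(f"final value:          ${final_value:,.0f}")
```

Under these assumed returns the arithmetic mean suggests growth, while the geometric mean matches the fact that the investment ends where it started.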
$$Q_1 = L + \frac{\frac{n+1}{4} - F}{f} \times i; \qquad Q_3 = L + \frac{\frac{3(n+1)}{4} - F}{f} \times i$$
Lorenz Curve
Deviation
Coefficient of Variation
Consistency of data
Definition
Measures of dispersion are descriptive
statistics that describe how similar a set of
scores are to each other
The more similar the scores are to each other, the
lower the measure of dispersion will be
The less similar the scores are to each other, the
higher the measure of dispersion will be
In general, the more spread out a distribution is,
the larger the measure of dispersion will be
Measures of Dispersion
Which of the distributions of scores has the larger dispersion?
[Figure: two bar charts of score distributions, frequency (0 to 125) against score (1 to 10)]
The upper distribution has more dispersion
Interquartile Range = Q3 – Q1
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
What Does the Variance
Formula Mean?
First, it says to subtract the mean from each of
the scores
This difference is called a deviate or a deviation
score
The deviate tells us how far a given score is from
the typical, or average, score
Thus, the deviate is a measure of dispersion for a
given score
What Does the Variance
Formula Mean?
Why can’t we simply take the average of the
deviates? That is, why isn’t variance defined
as:
$$\sigma^2 = \frac{\sum (X - \mu)}{N}$$
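One way to see the problem is to compute the deviates for a small data set. The sketch below uses the six scores 9, 8, 6, 5, 8, 6 from the worked variance example in this unit; the variable names are illustrative.

```python
scores = [9, 8, 6, 5, 8, 6]

mu = sum(scores) / len(scores)          # the mean of the scores
deviates = [x - mu for x in scores]     # how far each score is from the mean

sum_of_deviates = sum(deviates)                     # positive and negative deviates cancel
sum_of_squares = sum(d ** 2 for d in deviates)      # squaring removes the signs

print("deviates:", deviates)
print("sum of deviates:", sum_of_deviates)
print("sum of squared deviates:", sum_of_squares)
```

Because the deviates always sum to zero, their plain average would be zero for any data set, which is why the variance formula squares them first.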
                           Population   Sample
Size                       N            n
Mean                       μ            x̄
Variance                   σ²           s²
Standard Deviation         σ            s
Coefficient of Variation   CV           cv
Coefficient of Variation…
This coefficient provides a
proportionate measure of variation, e.g.
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$
X        X²        X − μ     (X − μ)²
9        81         2         4
8        64         1         1
6        36        -1         1
5        25        -2         4
8        64         1         1
6        36        -1         1
Σ = 42   Σ = 306   Σ = 0     Σ = 12
Computational Formula
Example
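$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} = \frac{306 - \frac{42^2}{6}}{6} = \frac{306 - 294}{6} = \frac{12}{6} = 2$$
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{12}{6} = 2$$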
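The agreement between the definitional and computational formulas can be verified with a short Python check on the same six scores. The variable names are illustrative.

```python
scores = [9, 8, 6, 5, 8, 6]
n = len(scores)

sum_x = sum(scores)                      # sum of the scores (42)
sum_x2 = sum(x * x for x in scores)      # sum of the squared scores (306)

# Computational formula: (sum of X^2 - (sum of X)^2 / N) / N
computational = (sum_x2 - sum_x ** 2 / n) / n

# Definitional formula: mean of the squared deviates
mu = sum_x / n
definitional = sum((x - mu) ** 2 for x in scores) / n

print(computational, definitional)
```

The computational form avoids calculating each deviate, which is convenient when working by hand.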
Variance of a Sample
Because the sample mean is not a perfect estimate of
the population mean, the formula for the variance of a
sample is slightly different from the formula for the
variance of a population:
$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$
s² is the sample variance, X is a score, X̄ is the
sample mean, and N is the number of scores
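The difference between the two divisors can be seen with Python's `statistics` module, where `pvariance` divides by N and `variance` divides by N - 1. Treating the six scores from the earlier worked example as a sample is an assumption made only for this comparison.

```python
import statistics

scores = [9, 8, 6, 5, 8, 6]

population_variance = statistics.pvariance(scores)  # sum of squares / N
sample_variance = statistics.variance(scores)       # sum of squares / (N - 1)

print("population variance:", population_variance)
print("sample variance:", sample_variance)
```

With a sum of squared deviates of 12, the population formula gives 12/6 and the sample formula gives 12/5, so the sample variance is slightly larger.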
Presentation of data & descriptive statistics
measurement.
Thus when comparing the degree of dispersion between
$$CV = \frac{s}{\bar{X}}$$
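As a rough sketch, and assuming the coefficient of variation is taken as the sample standard deviation divided by the sample mean, it could be computed as below. The six scores are reused from the variance example, and the helper name `coefficient_of_variation` is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (an assumed definition)."""
    return statistics.stdev(values) / statistics.mean(values)


scores = [9, 8, 6, 5, 8, 6]
cv = coefficient_of_variation(scores)
print(f"cv = {cv:.3f} ({cv:.1%} of the mean)")
```

Because the result is a ratio, it can be compared across data sets measured in different units or with very different means.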
UNIT-V
Index Numbers
Index Numbers
Types of Methods
Number
Index Numbers
Index numbers are used to summarize many
variables or numbers with one number
The most common index numbers are price
indexes
Consumer Price Index (CPI – TÜFE)
Producer Price Index (PPI – ÜFE)
ISE Index
Dow Jones Industrial Average
Index Numbers
Index numbers may be computed for other
things than prices
quantity indexes
quality indexes
Price Indexes
Price indexes are used to measure the general
movement of prices (inflation)
Common types of indexes
price relatives
unweighted
Laspeyres
Paasche
Price Relatives
Price relatives are used to find the change in price of
a single item
$$PR = \frac{P_t}{P_0} \times 100$$
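A price relative is a one-line calculation. In the sketch below the prices are hypothetical values chosen for illustration (the source price table is not reproduced here), and the function name `price_relative` is an assumption.

```python
def price_relative(p_t, p_0):
    """Current-period price as a percentage of the base-period price."""
    return p_t / p_0 * 100


# Hypothetical prices per kg: (base year, current year)
banana = price_relative(p_t=3.0, p_0=2.0)
kiwi = price_relative(p_t=5.0, p_0=4.0)

print("banana:", banana)
print("kiwi:", kiwi)
```

A value of 150 would mean the item costs 50% more than in the base year.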
Prices of different fruits in different years (NTL/kg)
Price Relatives for Banana and Kiwi
$$PR = \frac{P_t}{P_0} \times 100$$
$$P_{uw} = \frac{\sum P_t}{\sum P_0} \times 100$$
Unweighted Fruits Price Index
Base Year=2000
$$P_{uw} = \frac{\sum P_t}{\sum P_0} \times 100$$
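An unweighted aggregate index simply compares the sum of current prices with the sum of base-year prices. The three fruits and their prices below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Hypothetical prices per kg
base_prices = {"banana": 2.0, "kiwi": 4.0, "apple": 1.0}      # base year
current_prices = {"banana": 3.0, "kiwi": 5.0, "apple": 1.5}   # current year


def unweighted_index(current, base):
    """Sum of current prices over sum of base prices, times 100."""
    return sum(current.values()) / sum(base.values()) * 100


p_uw = unweighted_index(current_prices, base_prices)
print(f"unweighted price index: {p_uw:.1f}")
```

Note that the most expensive item (kiwi in this made-up data) contributes most to both sums, regardless of how much of it is bought.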
Problems with Unweighted Price
Indexes
Unweighted Price Indexes have a couple of
problems:
they may be influenced by items with high prices
items that are relatively unimportant in the goods
bundle may have undue influence
The usual solution is to weight the prices by
some quantities
Weighted Price Indexes
If the price index is to be weighted by
quantities, which quantities?
base year quantities (Laspeyres)
current year quantities (Paasche)
$$P_L = \frac{\sum P_t Q_0}{\sum P_0 Q_0} \times 100$$
Quantities purchased (1000 kg)
Laspeyres fruit price index
(BY=2000)
$$P_L = \frac{\sum_{i=1}^{3} P_{i,t} Q_{i,2000}}{\sum_{i=1}^{3} P_{i,2000} Q_{i,2000}} \times 100$$
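The Laspeyres calculation can be sketched in Python as follows. All prices and quantities are hypothetical illustration values, not the figures from the fruit tables, and the function name `laspeyres_index` is an assumption.

```python
# Hypothetical data for three fruits
base_prices = {"banana": 2.0, "kiwi": 4.0, "apple": 1.0}        # prices in the base year
current_prices = {"banana": 3.0, "kiwi": 5.0, "apple": 1.5}     # prices in year t
base_quantities = {"banana": 100, "kiwi": 20, "apple": 200}     # base-year quantities


def laspeyres_index(p_t, p_0, q_0):
    """Cost of the base-year basket at current prices relative to base prices."""
    numerator = sum(p_t[item] * q_0[item] for item in q_0)
    denominator = sum(p_0[item] * q_0[item] for item in q_0)
    return numerator / denominator * 100


p_l = laspeyres_index(current_prices, base_prices, base_quantities)
print(f"Laspeyres price index: {p_l:.1f}")
```

Only the prices need to be refreshed each period; the base-year quantities stay fixed.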
Paasche Price Index
The Paasche index uses current quantities as the
weighting factor
$$P_P = \frac{\sum P_t Q_t}{\sum P_0 Q_t} \times 100$$
Paasche fruit price index (BY=2000)
$$P_P = \frac{\sum_{i=1}^{3} P_{i,t} Q_{i,t}}{\sum_{i=1}^{3} P_{i,2000} Q_{i,t}} \times 100$$
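A matching Python sketch for the Paasche index differs only in the quantities used as weights. As before, the prices and quantities are hypothetical illustration values and the function name `paasche_index` is an assumption.

```python
# Hypothetical data for three fruits
base_prices = {"banana": 2.0, "kiwi": 4.0, "apple": 1.0}         # prices in the base year
current_prices = {"banana": 3.0, "kiwi": 5.0, "apple": 1.5}      # prices in year t
current_quantities = {"banana": 120, "kiwi": 30, "apple": 180}   # quantities in year t


def paasche_index(p_t, p_0, q_t):
    """Cost of the current basket at current prices relative to base prices."""
    numerator = sum(p_t[item] * q_t[item] for item in q_t)
    denominator = sum(p_0[item] * q_t[item] for item in q_t)
    return numerator / denominator * 100


p_p = paasche_index(current_prices, base_prices, current_quantities)
print(f"Paasche price index: {p_p:.1f}")
```

Because the weights change every period, both the prices and the quantities must be collected again for each update.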
Which is Best?
The advantage of the Laspeyres index is that
once the quantities are set they do not change.
This index is easy to update.
The advantage of the Paasche is that the
quantities reflect the current
production/consumption. However, it is
difficult to update and may not be as easy to
compare over time.
Weights – CPI (442 items)