
Business Statistics

Uploaded by

lakshay187
Copyright
© Attribution Non-Commercial (BY-NC)

BUSINESS STATISTICS

BILLY ABRAHAM.K
LECTURER IN MATHEMATICS DEPARTMENT

KODAIKANAL CHRISTIAN COLLEGE

KODAIKANAL
CONTENT
1. INTRODUCTION TO STATISTICS
2. DIAGRAMMATIC PRESENTATION
3. MEASURE OF CENTRAL TENDENCY
4. MEASURE OF DISPERSION
5. INDEX NUMBERS
1. INTRODUCTION TO STATISTICS
• Introduction
• Application
• Collection of Data
• Sampling Introduction
• Types of Sampling
• Types of Distribution
2. DIAGRAMMATIC PRESENTATION
• Introduction
• Types of Diagrams
• Types of Graphs
• Examples
3. MEASURE OF CENTRAL TENDENCY
• Mean
• Median
• Mode
• Geometric Mean
• Harmonic Mean
• Quartiles, Deciles
• Merits & Demerits
4. MEASURE OF DISPERSION
• Types of Dispersion
• Lorenz Curve
• Combined Mean & Standard Deviation
• Coefficient of Variation
• Consistency of Data
5. INDEX NUMBERS
• Types of Methods
• Simple Average of Price Relatives
• Weighted Index Numbers
• Laspeyres’, Bowley’s, Fisher’s & Marshall-Edgeworth Index Numbers
• Test of Consistency of Index Numbers
• Fisher’s Index Number as an Ideal Index Number
UNIT-I
• Introduction to Statistics
• Application
• Collection of Data
• Sampling Introduction
• Types of Sampling
• Types of Distribution
Learning Objectives
1. Define Statistics
2. Describe the Uses of Statistics
3. Distinguish Descriptive & Inferential Statistics
4. Define Population, Sample, Parameter, and
Statistic
5. Define Quantitative and Qualitative Data
6. Define Random Sample
What Is Statistics?
1. Collecting Data
e.g., Sample, Survey, Observe, Simulate
2. Characterizing Data
e.g., Organize/Classify, Count, Summarize
3. Presenting Data
e.g., Tables, Charts, Statements
4. Interpreting Results
e.g., Infer, Conclude, Specify Confidence
Why? Data Analysis → Decision-Making


Populations & Samples
[Figure: a sample shown as a subset of the population]
• The graphical & tabular methods presented here apply to both entire populations and samples drawn from populations.
Definitions…
A variable is some characteristic of a population or sample.
• E.g. student grades.
• Typically denoted with a capital letter: X, Y, Z…

The values of the variable are the range of possible values for a variable.
• E.g. student marks (0..100)

Data are the observed values of a variable.
• E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
Application Areas
• Economics
  - Forecasting
  - Demographics
• Sports
  - Individual & Team Performance
• Product Development
  - Design
  - Quality
• Business
  - Consumer Preferences
  - Financial Trends
Statistical Methods
• Descriptive Statistics
• Inferential Statistics
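Descriptive Statistics
1. Involves
• Collecting Data
• Organizing Data
• Presenting Data
• Characterizing Data
2. Purpose
• Describe Data
[Figure: bar chart of quarterly values (Q1 to Q4) on a $ axis from 0 to 50, annotated with sample mean X̄ = 30.5 and sample variance S² = 113]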
Types of Statistical Applications in Business
• Descriptive Statistics - describe collected data
  - “51.4% of all credit card purchases in 2003 were made with a Visa Card”
  - “The average Pay-to-Return Rating of Retailing Industry CEOs in 2005 was 126.6”
Inferential Statistics
1. Involves
• Estimation
• Hypothesis Testing
2. Purpose
• Make decisions about population characteristics
3. Example
• Retail CEOs were overpaid
Key Terms
1. Population (Universe)
• All items of interest
2. Sample
• Portion of population
3. Parameter
• Summary measure about population
4. Statistic
• Summary measure about sample
Memory aid: P in Population & Parameter; S in Sample & Statistic
Fundamental Elements of Statistics
• Item of interest – experimental unit: graduating senior
• Population – the set of items we are interested in learning about: all 1450 graduating seniors at “State U”
• Variable – characteristic of a single population unit: age at graduation
• Value – symbol [number, letter, word(s), …] associating one option of a variable with one item: graduating senior Anne Baker’s age at graduation will be 22
• Triplet – fundamental data unit: (Anne Baker, age, 22)
Data Organized in Tables
Graduating Senior   Age   Major        Home
Anne Baker          22    Accounting   Santa Fe
Charles Durango     21    Comp Lit     Ruidoso
Ellen Fong          22    Ecology      Taiwan
Rows for items in population, columns for variables, cells for values – variables are the focus
Definitions
• 2 types of variables:
• Independent Variable (IV): A variable that is manipulated by the researcher (Example: I assign you to drink either 1) coffee with caffeine or 2) decaf)
• Dependent Variable (DV): The variable that is measured to see if the independent variable had an effect (Example: I measure how alert you are after you drink the coffee)
Types of Data
• Quantitative Data
• Qualitative Data
Types of Variables
• Quantitative Variables
  • measured on a naturally occurring scale
  • equal intervals along scale (allows for meaningful mathematical calculations)
  • Ratio scale
    • zero value properly describes the underlying phenomenon - e.g., bank balance, length of a material entity
    • ratios of scale values properly describe relative values – e.g., 4 feet long is indeed twice as long as 2 feet
  • Interval scale
    • zero value is arbitrarily assigned - e.g., zero on the Fahrenheit or Celsius scale does not mean no heat at all, and zero calendar time is not the beginning of time
    • ratios of scale values do not describe relative values correctly – e.g., 40 °F is not twice as hot as 20 °F
Types of Variables
 Qualitative Variables
• measured by classification only
• Non-numerical in nature
• Meaningfully ordered categories identify ordinal
data (best to worst ranking, income categories,
price ranges)
• Categories without a meaningful order identify
nominal data (gender, political affiliation, industry
classification, ethnic/cultural groups, cause of
defectives)
Types of Data & Information…
Data → Categorical?
• No → Interval Data
• Yes → Categorical Data → Ordered?
  • Yes → Ordinal Data
  • No → Nominal Data
E.g. Representing Student Grades…
Data → Categorical?
• No → Interval Data, e.g. {0..100}
• Yes → Categorical Data → Ordered?
  • Yes → Ordinal Data, e.g. {F, D, C, B, A} (rank order to data)
  • No → Nominal Data, e.g. {Pass | Fail} (NO rank order to data)


Hierarchy of Data…
Interval
 Values are real numbers.
 All calculations are valid.
 Data may be treated as ordinal or nominal.

Ordinal
 Values must represent the ranked order of the data.
 Calculations based on an ordering process are valid.
 Data may be treated as nominal but not as interval.

Nominal
 Values are the arbitrary numbers that represent categories.
 Only calculations based on the frequencies of occurrence are valid.
 Data may not be treated as ordinal or interval.
Relationships between Variables.
(Source. Rowntree 2000: 33)
Variables
• Category
  • Nominal
  • Ordinal (ordered categories, ranks)
• Quantity
  • Discrete (counting)
  • Continuous (measuring)
Classroom Exercise
• For undergraduate students, what type of variable is the following:
• Student Status (e.g., Freshman)
Enter: A for Ratio, B for Interval, C for Ordinal, D for Nominal
Why bother with variable types?
 Different statistical techniques used for
quantitative and qualitative variables
 Quantitative variables can be transformed into
Qualitative data through category creation
 Qualitative variables cannot be meaningfully
transformed into Quantitative data – coding
their values with numbers does not make them
quantitative
Collecting Data
 Sampling
 When all elements of a population cannot be measured then sampling
is necessary
 inferential statistics are then used to make estimates of population
parameters (e.g., average age) from the sample values
 Samples need to be representative
 Reflect population of interest
 Random Sampling
 Most common sampling method to ensure sample is largely representative
 Ensures that each subset of fixed size is equally likely to be selected
 Stratified Sampling
 Most representative sample technique
 Requires prior knowledge of population strata (sub-population)
 Uses random sampling within strata
Question (10th Person)
• A local TV station conducts exit polling during an election, selecting every 10th person who exits the polling station. Is this a random sample?
• Enter Yes or No
• Why or why not?


Common Sources of Error in
Survey Data
 Selection bias – exclusion of a subset of the population of
interest prior to sampling
 Non-response bias – introduced when responses are not
received from all sample members – what can be done?
 Measurement error – inaccuracy in recorded data. Can be
due to survey design, transcription error, or surveyor sabotage
 Example – prediction of CEO performance based on their golf
handicap (Chapter 1 Statistics In Action, p22)
The Role of Statistics in Managerial
Decision Making
 Statistical literacy is useful, if not necessary, to make
informed decisions both at work (selecting a new
employee based on education) and at home (selecting
a new car based on repair data)
 Requires statistical thinking to critically assess data
and the inferences drawn from it
 Statistical thinking assists you in identifying research
resulting from unethical or uninformed statistical
practices
Statistical
Computer Packages
1. Typical Software
• Excel
• SPSS
• SAS
• MINITAB

2. Need Statistical
Understanding
• Assumptions
• Limitations
Total Population
• The total collection of units, elements or individuals that you want to analyse.
• These can be countries, lab-rats, light bulbs, university students, banks, residents of a particular area, regional health authorities etc.
• The population for a study of infant health might be all children born in the U.K. in the 1980s.
Sample
• A sample is a group of units selected from a larger group (the population). By studying the sample it is hoped to draw valid conclusions about the larger group.
• Continuing the infant health example, the sample might be all babies born on 7th May in any of the years.
• Samples are selected because the population is too large to study in its entirety.
• It is important that the researcher carefully and completely defines the population, including a description of the members to be included.
Representative sample
• A sample whose characteristics correspond to, or reflect, those of the original population or reference population.
• To ensure representativeness, the sample may be either completely random or stratified depending upon the conceptualized population and the sampling objective (i.e., upon the decision to be made).
• A thorny issue in the social sciences: is it possible to achieve?
Probability Sampling
A probability provides a quantitative description of the likely occurrence of a particular event.
• A probability sampling method is any method of sampling that uses some form of random selection. In order to have a random selection method, you must set up some process or procedure that assures that the different units in your population have equal probabilities of being chosen (Clark 2002: 37).
Most Common Types of Probability Sampling
• Simple Random Sampling
• Stratified Random Sampling
• Systematic Random Sampling
• Cluster Or Multistage Sampling
Simple Random Sampling
• We select a group of subjects (a sample) for study from a larger group (a population). Each individual is chosen randomly and each member of the population has an equal chance of being included in the sample.
• Every possible sample of a given size has the same chance of selection; that is, each member of the population is equally likely to be chosen at any stage in the sampling process. (Easton & McColl 2004).
• A lottery draw is a good example of simple random sampling. A sample of 6 numbers is randomly generated from a population of 45, with each number having an equal chance of being selected.
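The lottery draw can be sketched in a few lines of Python. This is only an illustration: the function name `lottery_draw` and the optional `seed` argument are assumptions introduced here so that the draw can be repeated.

```python
import random

def lottery_draw(population_size=45, sample_size=6, seed=None):
    """Draw a simple random sample, without replacement, from 1..population_size."""
    rng = random.Random(seed)  # the seed only makes this sketch reproducible
    numbers = range(1, population_size + 1)
    # random.sample gives every subset of the requested size the same chance
    return sorted(rng.sample(numbers, sample_size))

draw = lottery_draw(seed=1)
print(draw)
```

Because the sample is drawn without replacement, no number can appear twice in a draw.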
Stratified Random Sampling
• Often there are factors which divide up the population into sub-populations (groups / strata).
• The measurement of interest may vary among the different sub-populations.
• This has to be accounted for when we select a sample from the population to ensure our sample is representative of the population.
• This is achieved by stratified sampling.
• A stratified sample is obtained by taking samples from each stratum or sub-group of a population.
• Suppose a farmer wishes to work out the average milk yield of each cow type in his herd which consists of Ayrshire, Friesian, Galloway and Jersey cows. He could divide up his herd into the four sub-groups and take samples from these (Easton and McColl 2004).
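A minimal sketch of the farmer's approach, assuming hypothetical herd sizes (40, 60, 20 and 30 cows) and a 10% sampling fraction within each stratum; none of these numbers come from the source, and `stratified_sample` is an illustrative name.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Take a simple random sample of the given fraction within every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        k = max(1, round(len(units) * fraction))  # at least one unit per stratum
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical herd: cow identifiers grouped by breed (the strata)
herd = {
    "Ayrshire": [f"A{i}" for i in range(40)],
    "Friesian": [f"F{i}" for i in range(60)],
    "Galloway": [f"G{i}" for i in range(20)],
    "Jersey": [f"J{i}" for i in range(30)],
}
picked = stratified_sample(herd, fraction=0.10, seed=0)
sizes = {breed: len(cows) for breed, cows in picked.items()}
print(sizes)
```

Sampling the same fraction in every stratum keeps each breed's share of the sample equal to its share of the herd.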
Systematic Random Sampling
• Systematic sampling, sometimes called interval sampling, means that there is a gap, or interval, between each selection.
• Often used in industry, where an item is selected for testing from a production line (say, every fifteen minutes) to ensure that machines and equipment are working to specification.
• Alternatively, the manufacturer might decide to select every 20th item on a production line to test for defects and quality. This technique requires the first item to be selected at random as a starting point for testing and, thereafter, every 20th item is chosen.
• Also used when questioning people in surveys, e.g. a market researcher selecting every 10th person who enters a particular store, after selecting a person at random as a starting point;
• or interviewing occupants of every 5th house in a street, after selecting a house at random as a starting point.
• If the researcher wants to select a fixed-size sample, it is first necessary to know the whole population size from which the sample is being selected. The appropriate sampling interval, I, is then calculated by dividing population size, N, by required sample size, n: I = N/n.
• If a systematic sample of 500 students were to be carried out in a university with an enrolled population of 10,000, the sampling interval would be I = N/n = 10,000/500 = 20.
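The university example can be sketched as follows. The student ID numbers 1 to 10,000 are hypothetical stand-ins for an enrolment list, and the helper name `systematic_sample` is an assumption made for illustration.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start, then every I-th unit, where I = N // n."""
    N = len(population)
    interval = N // n                  # sampling interval I = N/n
    rng = random.Random(seed)
    start = rng.randrange(interval)    # random starting point in the first interval
    return [population[start + k * interval] for k in range(n)]

students = list(range(1, 10_001))      # hypothetical ID numbers 1..10,000
sample = systematic_sample(students, 500, seed=0)
print(len(sample), sample[:3])
```

Only the starting point is random; every later selection is fixed by the interval of 20.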
Cluster Or Multistage Sampling
• Cluster sampling is a sampling technique where the entire population is divided into groups, or clusters, and a random sample of these clusters is selected. All observations in the selected clusters are included in the sample.
• Every element should have a specified (equal) chance of being selected into the final sample.
• Typically used when the researcher cannot get a complete list of the members of a population they wish to study but can get a complete list of groups or 'clusters' of the population.
• A cheap, easy and economical method of data collection.
Non-Probability Sampling
Main Types
 Convenience/ opportunity/accidental
sampling.
 Purposive/ judgemental sampling
 Quota sampling
 Snowball sampling
Convenience/ opportunity/accidental
sampling.
 volunteer samples
 Sometimes access through contacts or
gatekeepers
 ‘easy to reach’ population.
Purposive/ judgemental sampling
 Involves selecting a group of people because
they have particular traits that the researcher
wants to study
 e.g. consumers of a particular product or
service in some types of market research
 My own questionnaire research on ‘New-
Age’ Travellers.
Quota sampling
• Widely used in opinion polls and market research.
• Interviewers are given a quota of subjects of a specified type to attempt to recruit.
• E.g. an interviewer might be told to go out and select 20 male smokers and 20 female smokers so that they could interview them about their health and smoking behaviours.
Snowball sampling
• Involves two main steps.
1. Identify a few key individuals
2. Ask these individuals to volunteer to distribute the questionnaire to people they know who fit the traits of the desired sample (e.g. my research on Travellers)
Sample Size
• In general, the larger the sample size (selected with the use of probability techniques) the better. The more heterogeneous a population is on a variety of characteristics (e.g. race, age, sexual orientation, religion), the larger the sample needed to reflect that diversity. (Papadopoulos 2003)
• Response rates vary with the type of survey (e.g. mail surveys, telephone surveys). Response rates under 60 or 70 per cent may compromise the integrity of the random sample. (ibid)
Frequencies and Distributions
 Frequency-A frequency is the number of
times a value is observed in a distribution or
the number of times a particular event occurs.
 Distribution-When the observed values are
arranged in order they are called a rank order
distribution or an array. Distributions
demonstrate how the frequencies of
observations are distributed across a range of
values.
Two elements to a distribution
 Scale with a number of values -(Usually
arrange the scores from the highest to lowest).
 Corresponding observations- Tally up the
scores, convert them into frequencies.
Types of Distribution
 Frequency distribution
 Class Intervals
 Relative (Proportional or percentage
distributions)
 Cumulative distributions.
Frequency Distributions
• Show the number of cases having each of the attributes of a particular variable. Divided into two types:
1. Ungrouped distribution: scores are not collapsed into categories; each score is represented as a separate value.
2. Grouped distribution: scores are collapsed into categories so that several scores are presented together as a group. Each group is usually referred to as a class interval.
Relative (proportional or percentage) distributions
• The proportion of cases in the whole distribution observed at each score or value.
Cumulative distribution.
• The number of cases up to and including the scale value. Can appear in grouped or ungrouped format.
• The cumulative relative distribution for any particular value is the total proportion of cases up to, and including, that value.
Example
Look at the distribution below:
This distribution shows the recorded ages of patients
receiving treatment for heart disease in the Stroud
district. There are 50 observed values. We can easily see
how often each value occurs. What is the frequency of
the following values, 79; 81; 94? What is the range of
this distribution?(r = h – l ). What is the mode? What is
the median? From this distribution we can also tell that
most of the values tend to cluster around the middle of
the range.
62 64 65 66 68 70 71 71 72 72
73 74 74 74 75 75 76 77 77 78
78 78 79 79 79 80 80 80 81 81
81 81 81 82 82 82 83 83 85 85
86 87 87 88 89 90 90 92 94 96
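One way to check answers to the questions above is a short Python sketch using only the standard library. The variable names are illustrative; the ages are the 50 values listed in the example.

```python
from collections import Counter
from statistics import median

ages = [
    62, 64, 65, 66, 68, 70, 71, 71, 72, 72,
    73, 74, 74, 74, 75, 75, 76, 77, 77, 78,
    78, 78, 79, 79, 79, 80, 80, 80, 81, 81,
    81, 81, 81, 82, 82, 82, 83, 83, 85, 85,
    86, 87, 87, 88, 89, 90, 90, 92, 94, 96,
]

freq = Counter(ages)                           # frequency of every observed value
value_range = max(ages) - min(ages)            # r = h - l
mode_value, mode_count = freq.most_common(1)[0]
middle = median(ages)                          # mean of the 25th and 26th ordered values

print(freq[79], freq[81], freq[94])
print(value_range, mode_value, middle)
```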
UNIT-II
Diagrammatic Presentation
Types of Diagrams

Types of Graphs

Examples
Chapter Topics
 Organizing numerical data
 The ordered array and stem-leaf display
 Tabulating and graphing Univariate numerical
data
 Frequency distributions: tables, histograms,
polygons
 Cumulative distributions: tables, the Ogive
 Graphing Bivariate numerical data
 Tabulating and graphing Univariate categorical
data
 The summary table
 Bar and pie charts, the Pareto diagram
 Tabulating and graphing Bivariate categorical data
 Contingency tables
 Side by side bar charts
 Graphical excellence and common errors in
presenting data
Organizing Numerical Data
Numerical Data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21
• Ordered Array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
• Stem and Leaf Display:
2 | 144677
3 | 028
4 | 1
• Frequency Distributions and Cumulative Distributions: Tables, Histograms, Polygons, Ogive
Organizing Numerical Data
(continued)
 Data in raw form (as collected):
24, 26, 24, 21, 27, 27, 30, 41, 32, 38
 Data in ordered array from smallest to largest:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
 Stem-and-leaf display:
2 144677
3 028
4 1
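The stem-and-leaf display can be built mechanically: the tens digit is the stem and the units digit is the leaf. The sketch below assumes two-digit whole numbers, as in the slide's data; `stem_and_leaf` is an illustrative helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's units digit (leaf) under its tens digit (stem)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}

data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(stem, "|", leaves)
```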
Graphing Numerical Data: The Histogram
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: histogram of frequency against class midpoints 15, 25, 35, 45, 55, with class boundaries at 10, 20, 30, 40, 50, 60; there are no gaps between the bars]
Graphing Numerical Data: The Frequency Polygon
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: frequency polygon plotting frequency against class midpoints 15, 25, 35, 45, 55]
Tabulating Numerical Data: Cumulative Frequency
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class             Cumulative Frequency   Cumulative % Frequency
10 but under 20   3                      15
20 but under 30   9                      45
30 but under 40   14                     70
40 but under 50   18                     90
50 but under 60   20                     100
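The table can be reproduced from the ordered array with a short sketch. The class lower bounds and the width of 10 are read off the table; the variable names are illustrative.

```python
from itertools import accumulate

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
lower_bounds = [10, 20, 30, 40, 50]
width = 10

# Count the observations falling in each class "lo but under lo + width"
freq = [sum(lo <= x < lo + width for x in data) for lo in lower_bounds]
cum_freq = list(accumulate(freq))                    # running total of frequencies
cum_pct = [100 * c / len(data) for c in cum_freq]    # cumulative % frequency

for lo, c, p in zip(lower_bounds, cum_freq, cum_pct):
    print(f"{lo} but under {lo + width}: {c} ({p:.0f}%)")
```

The cumulative percentages are the values an ogive plots against the upper class boundaries.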
Ogive…
• Is a graph of a cumulative frequency distribution.
• We create an ogive in three steps…
1) Calculate relative frequencies.
2) Calculate cumulative relative frequencies by adding the current class’ relative frequency to the previous class’ cumulative relative frequency.
(For the first class, its cumulative relative frequency is just its relative frequency)
Cumulative Relative
Frequencies…

first class…
next class: .355+.185=.540
:
:

last class: .930+.070=1.00


Ogive…
3) Graph the cumulative relative frequencies…

Ogive…
The ogive can be used to answer questions like:
What telephone bill value is at the 50th percentile?
“around $35”
(Refer also to Fig. 2.13 in your textbook)
Graphing Numerical Data: The Ogive (Cumulative % Polygon)
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: ogive plotting cumulative % frequency (0 to 100) against class boundaries 10, 20, 30, 40, 50, 60 (not midpoints)]


Graphing Bivariate Numerical Data (Scatter Plot)
[Figure: Mutual Funds Scatter Plot of total year-to-date return (%) against net asset values, both axes running from 0 to 40]
Tabulating and Graphing Categorical Data: Univariate Data
Categorical Data
• Tabulating Data: The Summary Table
• Graphing Data: Pie Charts, Bar Charts, Pareto Diagram

Summary Table (for an Investor’s Portfolio)

Investment Category   Amount (in thousands $)   Percentage
Stocks                46.5                      42.27
Bonds                 32                        29.09
CD                    15.5                      14.09
Savings               16                        14.55
Total                 110                       100
Variables are Categorical
Bar Chart (for an Investor’s Portfolio)
[Figure: horizontal bar chart “Investor's Portfolio” showing amount in K$ (axis 0 to 50) for Stocks, Bonds, CD and Savings]
Pie Chart (for an Investor’s Portfolio)
[Figure: pie chart of amount invested in K$: Stocks 42%, Bonds 29%, Savings 15%, CD 14%]
Percentages are rounded to the nearest percent.
Pareto Diagram
[Figure: Pareto diagram with categories ordered Stocks, Bonds, Savings, CD. The left axis (0% to 45%) for the bar chart shows % invested in each category; the right axis (0% to 100%) for the line graph shows cumulative % invested]
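The two series in a Pareto diagram (bars in descending order and a cumulative line) can be computed from the investor's summary table. A minimal sketch; the variable names are illustrative, and the amounts are those of the summary table, in thousands of dollars.

```python
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}
total = sum(portfolio.values())

# A Pareto diagram orders the categories from largest to smallest share
ordered = sorted(portfolio.items(), key=lambda item: item[1], reverse=True)

pareto = []
running = 0.0
for category, amount in ordered:
    pct = 100 * amount / total     # bar height: % invested in the category
    running += pct                 # line height: cumulative % invested
    pareto.append((category, round(pct, 2), round(running, 2)))

for row in pareto:
    print(row)
```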
Tabulating and Graphing Bivariate Categorical Data
• Contingency tables: investment in thousands of dollars

Investment Category   Investor A   Investor B   Investor C   Total
Stocks                46.5         55           27.5         129
Bonds                 32           44           19           95
CD                    15.5         20           13.5         49
Savings               16           28           7            51
Total                 110          147          67           324
Tabulating and Graphing Bivariate Categorical Data
• Side by side charts
[Figure: “Comparing Investors”, a side-by-side horizontal bar chart of Stocks, Bonds, CD and Savings for Investor A, Investor B and Investor C; axis 0 to 60]


Principles of Graphical
Excellence
 Presents data in a way that provides
substance, statistics and design
 Communicates complex ideas with clarity,
precision and efficiency
 Gives the largest number of ideas in the most
efficient manner
 Almost always involves several dimensions
 Tells the truth about the data
Errors in Presenting Data
• Using “chart junk”
• Failing to provide a relative basis in comparing data between groups
• Compressing the vertical axis
• Providing no zero point on the vertical axis
“Chart Junk”
Bad Presentation: [Figure: Minimum Wage shown with chart junk; 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80]
Good Presentation: [Figure: plain chart of Minimum Wage ($, axis 0 to 4) for 1960, 1970, 1980 and 1990]
No Relative Basis
Bad Presentation: [Figure: A’s received by students, shown as frequencies (axis 0 to 300) for FR, SO, JR, SR]
Good Presentation: [Figure: A’s received by students, shown as percentages (axis 0% to 30%) for FR, SO, JR, SR]
FR = Freshmen, SO = Sophomore, JR = Junior, SR = Senior


Compressing Vertical Axis
Bad Presentation: [Figure: Quarterly Sales ($) for Q1 to Q4 on a vertical axis from 0 to 200]
Good Presentation: [Figure: Quarterly Sales ($) for Q1 to Q4 on a vertical axis from 0 to 50]
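No Zero Point on Vertical Axis
Bad Presentation: [Figure: Monthly Sales ($) for J F M A M J on a vertical axis from 36 to 45, with no zero point]
Good Presentation: [Figure: Monthly Sales ($) for J F M A M J on a vertical axis marked 0, 36, 39, 42, 45]
Graphing the first six months of sales.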
Summary II…
                                     Interval Data                                Nominal Data
Single Set of Data                   Histogram, Ogive, or Stem-and-Leaf Display   Frequency and Relative Frequency Tables, Bar and Pie Charts
Relationship Between Two Variables   Scatter Diagram                              Contingency Table, Bar Charts
Chapter Summary
 Organized numerical data
 The ordered array and stem-leaf display
 Tabulated and graphed univariate numerical
data
 Frequency distributions: tables, histograms, polygon
 Cumulative distributions: tables and the Ogive
 Graphed bivariate numerical data
Chapter Summary
(continued)
 Tabulated and graphed univariate categorical data
 The summary table
 Bar and pie charts, the Pareto diagram
 Tabulated and graphed bivariate categorical data
 Contingency tables
 Side by side charts
 Discussed graphical excellence and common errors in
presenting data
UNIT-III
 Measure of Central Tendency
 Mean

 Median

 Mode

 Geometric Mean

 Harmonic Mean

 Quartiles, Deciles

 Merits & Demerits


WHAT DO THEY ALL MEAN?
Numerical Descriptive Techniques…
• Measures of Central Location
  • Mean, Median, Mode
• Measures of Variability
  • Range, Standard Deviation, Variance, Coefficient of Variation
• Measures of Relative Standing
  • Percentiles, Quartiles
• Measures of Linear Relationship
  • Covariance, Correlation, Least Squares Line
Measures of Central Location…
The arithmetic mean, a.k.a. average, shortened to mean, is the most popular & useful measure of central location.

It is computed by simply adding up all the observations and dividing by the total number of observations:

$$\text{Mean} = \frac{\text{Sum of the observations}}{\text{Number of observations}}$$
Notation…
When referring to the number of observations in a population, we use uppercase letter N.

When referring to the number of observations in a sample, we use lower case letter n.

The arithmetic mean for a population is denoted with Greek letter “mu”: $\mu$

The arithmetic mean for a sample is denoted with an “x-bar”: $\bar{x}$


Statistics is a pattern language…

       Population   Sample
Size   N            n
Mean   $\mu$        $\bar{x}$
Arithmetic Mean…

Population Mean: $$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Sample Mean: $$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The Arithmetic Mean…
…is appropriate for describing measurement data, e.g. heights of people, marks of student papers, etc.

…is seriously affected by extreme values called “outliers”. E.g. as soon as a billionaire moves into a neighborhood, the average household income increases beyond what it was previously!
Measures of Central Location…
The median is calculated by placing all the observations in order; the observation that falls in the middle is the median.
Data: {0, 7, 12, 5, 14, 8, 0, 9, 22} N=9 (odd)
Sort them bottom to top, find the middle:
0 0 5 7 8 9 12 14 22
median = 8

Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33} N=10 (even)
Sort them bottom to top; the middle is the simple average of 8 & 9:
0 0 5 7 8 9 12 14 22 33
median = (8+9)÷2 = 8.5
Sample and population medians are computed the same way.
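The odd and even cases can be handled by one small function. This is a sketch of the sort-and-take-the-middle rule described above; the function name `median_of` is illustrative.

```python
def median_of(values):
    """Sort the observations and return the middle one (or the mean of the middle two)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                # odd count: a single middle observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle ones

odd_data = [0, 7, 12, 5, 14, 8, 0, 9, 22]
even_data = [0, 7, 12, 5, 14, 8, 0, 9, 22, 33]
print(median_of(odd_data), median_of(even_data))
```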
Measures of Central Location…
The mode of a set of observations is the value that occurs most frequently.

A set of data may have one mode (or modal class), or two, or more modes.

The mode is useful for all data types, though mainly used for nominal data.

For large data sets the modal class is much more relevant than a single-value mode.

Sample and population modes are computed the same way.


Mode…
• E.g. Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33} N=10

Which observation appears most often?
The mode for this data set is 0. How is this a measure of “central” location?
[Figure: frequency plotted against the variable, with a modal class marked]
=MODE(range) in Excel…
Note: if you are using Excel for your data analysis and your data is multi-modal (i.e. there is more than one mode), =MODE(range) returns only one of them.

• You will have to use other techniques (e.g. a histogram) to determine if your data is bimodal, trimodal, etc.
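Outside Excel, one way to see every mode at once is Python's standard `statistics.multimode` (available from Python 3.8). The bimodal list below is made-up illustration data, not from the slides.

```python
from statistics import multimode

data = [0, 7, 12, 5, 14, 8, 0, 9, 22, 33]    # the slide's data set: one mode
bimodal = [1, 1, 2, 3, 3]                    # hypothetical data with two modes

modes_single = multimode(data)               # every value tied for the highest frequency
modes_double = multimode(bimodal)

print(modes_single, modes_double)
```

A result with more than one entry signals that the data are multi-modal.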
Mean, Median, Mode…
If a distribution is symmetrical, the mean, median and mode may coincide…
[Figure: symmetrical distribution with the mean, median and mode at the same point]
Mean, Median, Mode…
• If a distribution is asymmetrical, say skewed to the left or to the right, the three measures may differ. E.g.:
[Figure: skewed distribution with the mode, median and mean at different points]
Mean, Median, Mode…
If data are symmetric, the mean, median, and mode will be approximately the same.

If data are multimodal, report the mean, median and/or mode for each subgroup.

If data are skewed, report the median.
Mean, Median, & Modes for Ordinal & Nominal Data…
For ordinal and nominal data the calculation of the mean is not valid.

Median is appropriate for ordinal data.

For nominal data, a mode calculation is useful for determining highest frequency but not “central location”.
Geometric Mean…
The geometric mean is used when the variable is a growth rate or rate of change, such as the value of an investment over periods of time.

If $R_i$ denotes the rate of return in period $i$ ($i = 1, 2, \dots, n$), then the geometric mean $R_g$ of the returns $R_1, R_2, \dots, R_n$ is defined such that:

$$(1 + R_g)^n = (1 + R_1)(1 + R_2)\cdots(1 + R_n)$$

Solving for $R_g$ we produce the following formula:

$$R_g = \sqrt[n]{(1 + R_1)(1 + R_2)\cdots(1 + R_n)} - 1$$



Harmonic Mean

The harmonic mean of a set of n values is defined as the reciprocal of the mean of the reciprocals of these values.
That is, if $x_1, x_2, x_3, \dots, x_n$ are the n values,

$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
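The definition translates directly into code. In this sketch the speeds of 40 and 60 are hypothetical illustration values, and `harmonic_mean_of` is an illustrative name.

```python
def harmonic_mean_of(values):
    """Reciprocal of the mean of the reciprocals: n / sum(1/x)."""
    return len(values) / sum(1 / x for x in values)

# Hypothetical example: 40 km/h on the way out, 60 km/h on the way back
speeds = [40, 60]
h = harmonic_mean_of(speeds)
print(round(h, 2))
```

All values must be non-zero, since each one is inverted.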
Presentation of data & descriptive statistics
The geometric mean:
• An alternative measure which is particularly applicable when the average rate of growth is to be measured is the geometric mean

$$\bar{X}_g = \sqrt[n]{X_1 X_2 \cdots X_n} - 1$$

where each $X_i$ is a growth factor (1 + growth rate).

• E.g.: annual growth rates: 10%, 20%, 15%, -30%, 20%
• The annual growth rate is

$$\bar{X}_g = \sqrt[5]{1.1 \times 1.2 \times 1.15 \times 0.7 \times 1.2} - 1 = 1.0498 - 1 = 4.98\%$$


Finance Example…
Suppose a 2-year investment of $1,000 grows by 100% to $2,000 in the first year, but loses 50% from $2,000 back to the original $1,000 in the second year. What is your average return?

• Using the arithmetic mean, we have

$$\bar{R} = \frac{R_1 + R_2}{2} = \frac{100\% + (-50\%)}{2} = 25\%$$

This would suggest the investment earned 25% a year, which compounds to $1,000 × 1.25² = $1,562.50 after two years, not the $1,000 we actually have.

• Solving for the geometric mean yields a rate of 0%, which is correct:

$$R_g = \sqrt[n]{\prod_{i=1}^{n} (1 + R_i)} - 1 = \sqrt{(1 + 1)(1 - 0.5)} - 1 = 0$$

The upper case Greek Letter “Pi” represents a product of terms…
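Both geometric-mean examples can be checked with a short sketch. It assumes returns are written as decimals (100% as 1.0, -50% as -0.5) and uses `math.prod`, available from Python 3.8; the function name is illustrative.

```python
import math

def geometric_mean_return(returns):
    """n-th root of the product of growth factors (1 + R_i), minus 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

finance = [1.0, -0.5]                        # +100% then -50%
growth_rates = [0.10, 0.20, 0.15, -0.30, 0.20]

arithmetic = sum(finance) / len(finance)     # the misleading 25%
geometric = geometric_mean_return(finance)   # the correct 0%
annual = geometric_mean_return(growth_rates)

print(arithmetic, geometric, round(annual, 4))
```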


Measures of Central Location: Summary…
• Compute the Mean to
  • Describe the central location of a single set of interval data
• Compute the Median to
  • Describe the central location of a single set of interval or ordinal data
• Compute the Mode to
  • Describe a single set of nominal data
• Compute the Geometric Mean to
  • Describe a single set of interval data based on growth rates
Presentation of data & descriptive statistics
 Calculation of quartiles from grouped data:

\[ Q_1 = L + i \left( \frac{\frac{n+1}{4} - F}{f} \right); \qquad Q_3 = L + i \left( \frac{\frac{3(n+1)}{4} - F}{f} \right) \]

 L= lower bound of the quartile group


 i = width of quartile group
 F = cumulative frequency up to the quartile group
 f = frequency in the quartile group
Presentation of data & descriptive statistics
 Example of calculation of quartiles from grouped data. See Table 2.9 (page 55):

\[ Q_1 = -3 + 1 \left( \frac{\frac{51+1}{4} - 12}{3} \right) = -2.667\% \]

\[ Q_3 = 3 + 1 \left( \frac{\frac{3(51+1)}{4} - 37}{3} \right) = 3.666\% \]

 Quartile range = Q3 – Q1 = 3.666 – (–2.667) = 6.333%


 Quartile deviation = 6.333/2 = 3.1666%
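The grouped-data quartile formula can be sketched in Python as follows. The function name `grouped_quartile` is hypothetical, and the inputs are read off the worked example (n = 51, width 1, lower bounds of -3 and 3 as implied by the stated results); Table 2.9 itself is not reproduced here.

```python
def grouped_quartile(k, n, L, i, F, f):
    """k-th quartile (k = 1 or 3) from grouped data.

    n: total number of observations
    L: lower bound of the quartile group
    i: width of the quartile group
    F: cumulative frequency up to the quartile group
    f: frequency in the quartile group
    """
    position = k * (n + 1) / 4
    return L + i * (position - F) / f

q1 = grouped_quartile(1, n=51, L=-3, i=1, F=12, f=3)
q3 = grouped_quartile(3, n=51, L=3, i=1, F=37, f=3)
quartile_range = q3 - q1
quartile_deviation = quartile_range / 2
print(round(q1, 3), round(q3, 3), round(quartile_range, 3), round(quartile_deviation, 3))
```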
How is the range of a set of numbers
identified?

 Arrange the numbers in the set in order from


least to greatest.

 Subtract the lowest number from the highest


number in the set of numbers.

 The difference of the two numbers is the range


of a set of numbers.
Back to your test Scores:
95, 87, 92, 100, and 94

Your highest score is a 100


Your lowest score is 87
Your range is 100 – 87 = 13
The range = 13
It’s Time To Practice!
Number of pets owned by 7 students:
2, 1, 1, 4, 3, 2, 1

What is the mean? 2

What is the median? 2

What is the mode? 1

What is the range? 3
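The practice answers can be checked with Python's standard `statistics` module; this is only a quick verification sketch using the pet counts from the exercise.

```python
from statistics import mean, median, mode

pets = [2, 1, 1, 4, 3, 2, 1]  # number of pets owned by the 7 students

pets_mean = mean(pets)                # arithmetic mean
pets_median = median(pets)            # middle value of the sorted data
pets_mode = mode(pets)                # most frequent value
pets_range = max(pets) - min(pets)    # largest minus smallest
print(pets_mean, pets_median, pets_mode, pets_range)
```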


4 ways to describe data
The Mean – what we usually think of as
the average…..
The Median – The middle number in a
data set…..50% of the numbers are above and
50% are below the median
The Mode – The number that occurs
most often…..
The Range – the difference between the
smallest and largest number…..
UNIT-IV
 Measure of Dispersion
 Types of Dispersion

 Lorenz Curve

 Combined mean & Standard

Deviation
 Coefficient of Variation

 Consistency of data
Definition
 Measures of dispersion are descriptive
statistics that describe how similar a set of
scores are to each other
 The more similar the scores are to each other, the
lower the measure of dispersion will be
 The less similar the scores are to each other, the
higher the measure of dispersion will be
 In general, the more spread out a distribution is,
the larger the measure of dispersion will be
Measures of Dispersion
 Which of the distributions of scores has the larger dispersion?
[Figure: two bar charts of score distributions, one above the other; each has scores 1 to 10 on the horizontal axis and a vertical axis from 0 to 125]
The upper distribution has more dispersion because the scores are more spread out
That is, they are less similar to each other
Measures of Dispersion
 There are three main measures of dispersion:
 The range
 The semi-interquartile range (SIR)
 Variance / standard deviation
The Range
 The range is defined as the difference between
the largest score in the set of data and the
smallest score in the set of data, XL - XS
 What is the range of the following data:
4 8 1 6 6 2 9 3 6 9
 The largest score (XL) is 9; the smallest score
(XS) is 1; the range is XL - XS = 9 - 1 = 8
When To Use the Range
 The range is used when
 you have ordinal data or
 you are presenting your results to people with little or
no knowledge of statistics
 The range is rarely used in scientific work as it is
fairly insensitive
 It depends on only two scores in the set of data, XL
and XS
 Two very different sets of data can have the same
range:
1 1 1 1 9 vs 1 3 5 7 9
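To illustrate why the range is insensitive, this sketch computes the range of the two data sets above and, for contrast, their population variance (one of the other measures listed earlier) using `statistics.pvariance`. The helper name `data_range` is hypothetical.

```python
from statistics import pvariance

a = [1, 1, 1, 1, 9]
b = [1, 3, 5, 7, 9]

def data_range(xs):
    """Largest score minus smallest score."""
    return max(xs) - min(xs)

range_a, range_b = data_range(a), data_range(b)
var_a, var_b = pvariance(a), pvariance(b)
print(range_a, range_b)  # identical ranges
print(var_a, var_b)      # different variances
```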
The Semi-Interquartile Range
 The semi-interquartile range (or SIR) is
defined as the difference of the first and third
quartiles divided by two
 The first quartile is the 25th percentile
 The third quartile is the 75th percentile
 SIR = (Q3 - Q1) / 2
Interquartile Range…
The quartiles can be used to create another measure of
variability, the interquartile range, which is defined as
follows:

 Interquartile Range = Q3 – Q1

The interquartile range measures the spread of the middle


50% of the observations.
Large values of this statistic mean that the 1st and 3rd

quartiles are far apart indicating a high level of variability.
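A hedged sketch of the interquartile range and the semi-interquartile range in Python, using the ten scores from the range example. Quartile conventions differ between textbooks and software; this uses `statistics.quantiles` with `method="inclusive"`, which is an assumption and may not match the convention used on the slides.

```python
from statistics import quantiles

scores = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]

# quantiles() sorts the data itself; n=4 returns the three quartile cut points.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1   # interquartile range
sir = iqr / 2   # semi-interquartile range
print(q1, q3, iqr, sir)
```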


Deciles
SIR Example
 What is the SIR for the following data?
2 4 6 8 10 12 14 20 30 60
 5 = 25th %tile
 25 % of the scores are below 5
 5 is the first quartile
 25 = 75th %tile
 25 % of the scores are above 25
 25 is the third quartile
 SIR = (Q3 - Q1) / 2 = (25 - 5) / 2 = 10
When To Use the SIR
 The SIR is often used with skewed data as it is
insensitive to the extreme scores
Variance
 Variance is defined as the average of the squared deviations:

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]
What Does the Variance
Formula Mean?
 First, it says to subtract the mean from each of
the scores
 This difference is called a deviate or a deviation
score
 The deviate tells us how far a given score is from
the typical, or average, score
 Thus, the deviate is a measure of dispersion for a
given score
What Does the Variance
Formula Mean?
 Why can’t we simply take the average of the deviates? That is, why isn’t variance defined as:

\[ \sigma^2 = \frac{\sum (X - \mu)}{N} \]

This is not the formula for variance!
What Does the Variance
Formula Mean?
 One of the definitions of the mean was that it
always made the sum of the scores minus the
mean equal to 0
 Thus, the average of the deviates must be 0
since the sum of the deviates must equal 0
 To avoid this problem, statisticians square the
deviate score prior to averaging them
 Squaring the deviate score makes all the squared
scores positive
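A short sketch of this argument: the deviates always sum to zero, so their plain average is useless as a measure of spread, while the squared deviates average to the variance. The scores are the six values used in the deck's computational-formula example.

```python
scores = [9, 8, 6, 5, 8, 6]
mu = sum(scores) / len(scores)          # population mean

deviates = [x - mu for x in scores]     # score minus mean
mean_deviate = sum(deviates) / len(deviates)

squared = [d ** 2 for d in deviates]    # squaring removes the signs
variance = sum(squared) / len(squared)
print(mean_deviate, variance)
```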
What Does the Variance
Formula Mean?
 Variance is the mean of the squared deviation
scores
 The larger the variance is, the more the scores
deviate, on average, away from the mean
 The smaller the variance is, the less the scores
deviate, on average, from the mean
Coefficient of Variation…
The coefficient of variation of a set of
observations is the standard deviation of the
observations divided by their mean,
that is:

Population coefficient of variation = CV = \( \sigma / \mu \)

Sample coefficient of variation = cv = \( s / \bar{x} \)


Statistics is a pattern language…
                            Population        Sample

Size                        N                 n
Mean                        \( \mu \)         \( \bar{x} \)
Variance                    \( \sigma^2 \)    \( s^2 \)
Standard Deviation          \( \sigma \)      \( s \)
Coefficient of Variation    CV                cv
Coefficient of Variation…
This coefficient provides a
proportionate measure of variation, e.g.

A standard deviation of 10 may be perceived as


large when the mean value is 100, but only
moderately large when the mean value is 500.
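The comparison above amounts to dividing the same standard deviation by two different means. A minimal sketch, with a hypothetical helper name:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a proportion of the mean."""
    return std_dev / mean

cv_mean_100 = coefficient_of_variation(10, 100)
cv_mean_500 = coefficient_of_variation(10, 500)
print(cv_mean_100, cv_mean_500)
```

The same standard deviation of 10 is 10% of a mean of 100 but only 2% of a mean of 500.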
Standard Deviation

 When the deviate scores are squared in variance,


their unit of measure is squared as well
 E.g. If people’s weights are measured in pounds,
then the variance of the weights would be expressed
in pounds² (or squared pounds)
 Since squared units of measure are often
awkward to deal with, the square root of variance
is often used instead
 The standard deviation is the square root of variance
Standard Deviation
 Standard deviation = \( \sqrt{\text{variance}} \)
 Variance = (standard deviation)²
Computational Formula
 When calculating variance, it is often easier to use a computational formula which is algebraically equivalent to the definitional formula:

\[ \sigma^2 = \frac{\sum X^2 - \frac{\left( \sum X \right)^2}{N}}{N} = \frac{\sum (X - \mu)^2}{N} \]

\( \sigma^2 \) is the population variance, X is a score, \( \mu \) is the population mean, and N is the number of scores
Computational Formula
Example
\( X \)           \( X^2 \)          \( X - \mu \)     \( (X - \mu)^2 \)

9                 81                 2                 4
8                 64                 1                 1
6                 36                 -1                1
5                 25                 -2                4
8                 64                 1                 1
6                 36                 -1                1
\( \Sigma = 42 \) \( \Sigma = 306 \) \( \Sigma = 0 \)  \( \Sigma = 12 \)
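Computational Formula
Example
\[ \sigma^2 = \frac{\sum X^2 - \frac{\left( \sum X \right)^2}{N}}{N} = \frac{306 - \frac{42^2}{6}}{6} = \frac{306 - 294}{6} = \frac{12}{6} = 2 \]

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{12}{6} = 2 \]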
Variance of a Sample
 Because the sample mean is not a perfect estimate of
the population mean, the formula for the variance of a
sample is slightly different from the formula for the
variance of a population:

\[ s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} \]

\( s^2 \) is the sample variance, X is a score, \( \bar{X} \) is the
sample mean, and N is the number of scores
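As a sketch of the difference between the two formulas, the following treats the six scores from the earlier example first as a population (divide by N) and then as a sample (divide by N - 1), and cross-checks against `statistics.pvariance` and `statistics.variance` from the standard library.

```python
from statistics import pvariance, variance

scores = [9, 8, 6, 5, 8, 6]
n = len(scores)
mean = sum(scores) / n
sum_sq_dev = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations

population_var = sum_sq_dev / n        # divide by N
sample_var = sum_sq_dev / (n - 1)      # divide by N - 1

print(population_var, pvariance(scores))
print(sample_var, variance(scores))
```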
Presentation of data & descriptive statistics

 Which measure of dispersion?


 If the median is used -> quartile deviation
 If the arithmetic mean is used -> variance/std.
deviation/ negative semi-variance
Presentation of data & descriptive statistics
Coefficient of variation
 The SD is expressed in the underlying units of measurement.
 Thus when comparing the degree of dispersion between variables, we must take into account the difference in magnitude of the variables, e.g. FTSE index vs. S&P 500 index or Dow Jones index vs. Ibovespa
 We can use the SD for returns but not for levels!
 The CV overcomes this problem

\[ CV = \frac{\sigma}{\bar{X}} \]
UNIT-V
Index Numbers
 Index Numbers
 Types of Methods
 Simple Average of Price Relatives
 Weighted Index Numbers
 Laspeyres’, Bowley’s, Fisher’s & Marshall-Edgeworth Index Numbers


 Test of Consistency of Index Numbers

 Fisher’s Index Number an Ideal Index

Number
Index Numbers
 Index numbers are used to summarize many
variables or numbers with one number
 The most common index numbers are price
indexes
 Consumer Price Index (CPI – TÜFE)
 Producer Price Index (PPI – ÜFE)
 ISE Index
 Dow Jones Industrial Average
Index Numbers
 Index numbers may be computed for other
things than prices
 quantity indexes
 quality indexes
Price Indexes
 Price indexes are used to measure the general
movement of prices (inflation)
 Common types of indexes
 price relatives
 unweighted
 Laspeyres
 Paasche
Price Relatives
 Price relatives are used to find the change in price of
a single item

\[ PR = \frac{P_t}{P_0} \times 100 \]
Prices of different fruits in
different years (NTL/kg)
Price Relatives for Banana and Kiwi
\[ PR = \frac{P_t}{P_0} \times 100 \]

 From 2001 to 2002

 Banana: PRB = (0.94/0.91)*100 = 103.3


 Kiwi: PRK = (2.10/1.90)*100 = 110.5
 OR
Banana price index
BASE YEAR=2000
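The two price relatives quoted above can be reproduced with a one-line function. The helper name `price_relative` is hypothetical; the prices are the 2001 and 2002 banana and kiwi prices from the example.

```python
def price_relative(p_t, p_0):
    """Price in the year of interest relative to the base price, times 100."""
    return p_t / p_0 * 100

banana = price_relative(0.94, 0.91)  # 2002 price over 2001 price
kiwi = price_relative(2.10, 1.90)
print(round(banana, 1), round(kiwi, 1))
```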
Unweighted Price Indexes
 Price Relatives only represent the change in price of one
item over time
 Unweighted Price indexes are formed by adding the
prices in the year of interest and dividing by the sum of
the prices in the base year

\[ P_{uw} = \frac{\sum P_t}{\sum P_0} \times 100 \]
Unweighted Fruits Price Index
Base Year=2000
\[ P_{uw} = \frac{\sum P_t}{\sum P_0} \times 100 \]
Problems with Unweighted Price
Indexes
 Unweighted Price Indexes have a couple of
problems:
 they may be influenced by items with high prices
 items that are relatively unimportant in the goods
bundle may have undue influence
 The usual solution is to weight the prices by
some quantities
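A minimal sketch of the unweighted index, and of why high-priced items dominate it. Because the deck's price table is not reproduced in the text, this uses only the banana and kiwi prices quoted for 2001 and 2002 and treats 2001 as the base year, which is an assumption made for illustration.

```python
def unweighted_index(prices_t, prices_0):
    """Sum of current prices over sum of base prices, times 100."""
    return sum(prices_t) / sum(prices_0) * 100

prices_2001 = [0.91, 1.90]  # banana, kiwi (base year assumed here)
prices_2002 = [0.94, 2.10]

index = unweighted_index(prices_2002, prices_2001)

# Share of the total price change that comes from the more expensive item.
changes = [p1 - p0 for p0, p1 in zip(prices_2001, prices_2002)]
kiwi_share = changes[1] / sum(changes)
print(round(index, 1), round(kiwi_share, 2))
```

Most of the movement in the index comes from kiwi simply because its price is higher, regardless of how much of each fruit is actually bought.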
Weighted Price Indexes
 If the price index is to be weighted by quantities, which quantities?
 base year quantities (Laspeyres)
 current year quantities (Paasche)
 Most CPIs and PPIs use these or some variation (e.g. Fisher’s ideal price index)
Laspeyres Index
 The Laspeyres index uses base year quantities as
weights

\[ P_L = \frac{\sum P_t Q_0}{\sum P_0 Q_0} \times 100 \]
Quantities purchased (1000 kg)
Laspeyres fruit price index
(BY=2000)

\[ P_L = \frac{\sum_{i=1}^{3} P_{i,t} Q_{i,2000}}{\sum_{i=1}^{3} P_{i,2000} Q_{i,2000}} \times 100 \]
Paasche Price Index
 The Paasche index uses current quantities as the
weighting factor

\[ P_P = \frac{\sum P_t Q_t}{\sum P_0 Q_t} \times 100 \]
Paasche fruit price index (BY=2000)

\[ P_P = \frac{\sum_{i=1}^{3} P_{i,t} Q_{i,t}}{\sum_{i=1}^{3} P_{i,2000} Q_{i,t}} \times 100 \]
Which is Best?
 The advantage of the Laspeyres index is that
once the quantities are set they do not change.
This index is easy to update.
 The advantage of the Paasche is that the
quantities reflect the current
production/consumption. However, it is
difficult to update and may not be as easy to
compare over time.
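The two weighting schemes can be compared side by side in a short sketch. The prices are the banana and kiwi prices quoted earlier (with 2001 treated as the base year for illustration); the quantities are hypothetical, since the deck's quantity table is not reproduced in the text.

```python
def laspeyres(p0, pt, q0):
    """Price index weighted by base-year quantities."""
    return sum(p * q for p, q in zip(pt, q0)) / sum(p * q for p, q in zip(p0, q0)) * 100

def paasche(p0, pt, qt):
    """Price index weighted by current-year quantities."""
    return sum(p * q for p, q in zip(pt, qt)) / sum(p * q for p, q in zip(p0, qt)) * 100

p0 = [0.91, 1.90]   # base-year prices: banana, kiwi
pt = [0.94, 2.10]   # current-year prices
q0 = [100, 50]      # hypothetical base-year quantities (1000 kg)
qt = [110, 40]      # hypothetical current-year quantities (1000 kg)

index_l = laspeyres(p0, pt, q0)
index_p = paasche(p0, pt, qt)
print(round(index_l, 2), round(index_p, 2))
```

With these made-up quantities the Paasche index is lower because buyers shifted away from kiwi, the item whose price rose most; the Laspeyres weights stay fixed at the base-year basket.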
Weights – CPI (442 items)
