STATISTICS
Inferential Statistics
1. Decide on the number of class intervals to use, between 5 and 15. Too many class intervals result in several empty class intervals, while too few lose detail. Use Sturges' formula (k = 1 + 3.322 log n) whenever possible.
2. Compute the range. This is the difference between the highest value and the lowest value in the set of data.
R = HO − LO

$CM = \frac{\text{lower limit} + \text{upper limit}}{2}$
8. Tally the raw scores and indicate the frequency for each of the class intervals.
9. Get the relative frequency. This gives the percentage of observations in a particular class of interest. It is obtained by dividing the frequency of the class by the total number of observations.
$rf = \frac{f}{n} \times 100$
rf = relative frequency
f = frequency
n = number of observations
10. Add the frequencies and indicate the sum.

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Xi = observed value
∑ = summation notation

Weighted Mean
There are some cases where values are given more importance than others.

MODULE 3 & 4
MEASURES OF CENTRAL TENDENCY
Measures of central tendency are commonly referred to as averages. The purpose of an average is to pinpoint the center of a set of observations.

The 3 measures of central tendency commonly used in business and economics are the arithmetic mean, median and mode.

A statistic is any measure calculated from sample data; thus, measures of location from a sample are statistics.

Parameters – numerical values calculated from population data.

Measures of Central Tendency    Statistic (sample)    Parameter (population)
Mean                            𝒙̅                     𝝁
Median                          mdn                   Mdn
Mode                            mo                    Mo
n = sample size, N = population size

Methods of Central Tendency of Ungrouped Data
Ungrouped or raw data are data that have not yet been organized or arranged into a frequency distribution.

A. Arithmetic Mean
The sum of the values of the variable divided by the number of observations. The definition is the same for both the sample and the population, although we use a different symbol to refer to each.
Population or Sample Mean:
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ or $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
𝝁 or 𝒙̅ = population or sample mean

B. Median
The median of ungrouped data arranged in an array (increasing or decreasing order of magnitude) is the middle observation for an odd number of items, or the arithmetic mean of the two middle values when the number of items in the distribution is even.

C. Mode
The mode for ungrouped data is defined as the value that appears with the highest frequency. That is, the item that appears most often, usually denoted by 𝒙̂ (read as x hat).
It is generally used with nominal data. It can be easily identified by inspection of an ungrouped set of data by getting the score or item which occurs most frequently.
When all values appear with the same frequency, the mode does not exist. A distribution with only one mode is called unimodal, a distribution with two modes is bimodal, and a distribution with three or more modes is multimodal.

Measures of Relative Standing: Ungrouped Data
Another set of measures that helps us describe a data set is the measures of relative standing.
Quantiles are an extension of the median concept; these are the values which divide a set of data into equal parts. The measures of relative standing are the quartiles, deciles, and percentiles.
Percentile: The whole data set is divided into 100 equal parts.
Decile: The data set is divided into 10 equal parts.
Quartile: The data set is divided into 4 equal parts.
Note: At certain points, these three measures will have the same values (for example, the second quartile, the fifth decile and the 50th percentile all equal the median).
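The ungrouped measures above can be sketched in Python. This is only an illustration under stated assumptions: the `scores` list is hypothetical, `modes` is a helper written for this example, and the quartiles use the "inclusive" interpolation rule of the standard `statistics` module, which is one of several conventions and may differ slightly from a hand method.

```python
import statistics
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode exists."""
    counts = Counter(values)
    top = max(counts.values())
    # When every value appears equally often, the notes say no mode exists.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


# Hypothetical ungrouped sample (illustrative values only).
scores = [12, 15, 15, 18, 20, 22, 25]

mean = statistics.mean(scores)        # sum of the values / number of observations
median = statistics.median(scores)    # middle value of the sorted array
mode_values = modes(scores)           # one value returned, so the data are unimodal

# Quartiles: the three cut points that split the sorted data into 4 equal parts.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
```

The second quartile `q2` coincides with the median, which is the overlap between measures mentioned in the note on relative standing.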
Measures of Relative Standing: Grouped Data

Quartiles: Grouped Data
The formula for quartiles is patterned after the median formula. Example Q1:
$Q_1 = lb_{Q1} + \left( \frac{\frac{1n}{4} - cf_{Q1}}{f_{Q1}} \right) i$
lbQ1 = lower boundary of the quartile 1 class
n = number of observations
cfQ1 = cumulative frequency before the quartile 1 class
fQ1 = frequency of the quartile 1 class
i = class interval
Steps in Identifying the Quartile 1 Class
1. Divide the number of observations by 4.
2. Go over the entries in the less-than cumulative frequency column. The first class whose cumulative frequency is greater than or equal to n/4 is the quartile 1 class.

Deciles: Grouped Data
The formula for deciles is also patterned after the median formula. Example D7:
$D_7 = lb_{D7} + \left( \frac{\frac{7n}{10} - cf_{D7}}{f_{D7}} \right) i$
lbD7 = lower boundary of the decile 7 class
n = number of observations

Percentiles: Grouped Data
Example P85:
$P_{85} = lb_{P85} + \left( \frac{\frac{85n}{100} - cf_{P85}}{f_{P85}} \right) i$
Steps in Identifying the Percentile 85 Class
1. Multiply the number of observations by 85 and divide the product by 100.
2. The percentile 85 class is the first class whose cumulative frequency is greater than or equal to the result of step 1.

MODULE 5
Measures of Variability (Dispersion): Range
A. Range
It is the difference between the largest and the smallest values in a data set.
Ungrouped data – the range is the difference between the highest and the lowest value.
Grouped data – the range is determined by subtracting the lower boundary of the lowest class interval from the upper boundary of the highest class interval of a frequency distribution. (Class boundaries are considered the true limits.)

Interquartile Range
Quartiles divide a distribution of numerical data into four equal parts.
The first or lower quartile lies at 25% of the total number of values, while the third or upper quartile lies at 75%.
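The grouped-data quartile, decile and percentile formulas in this module share one pattern: locate the class that contains position kn/parts, then interpolate inside it. A minimal sketch, assuming a hypothetical frequency table with equal class widths; the function name and the data are illustrative, not part of the notes.

```python
def grouped_quantile(lower_bounds, freqs, width, k, parts):
    """Interpolate the k-th of `parts` quantiles from a frequency table.

    lower_bounds: lower class boundary of each class, in ascending order
    freqs:        frequency of each class
    width:        class interval i (assumed equal for all classes)
    """
    n = sum(freqs)
    target = k * n / parts      # 1n/4 for Q1, 7n/10 for D7, 85n/100 for P85
    cf_before = 0               # cumulative frequency before the current class
    for lb, f in zip(lower_bounds, freqs):
        # First class whose cumulative frequency reaches the target position.
        if f > 0 and cf_before + f >= target:
            return lb + (target - cf_before) / f * width
        cf_before += f
    raise ValueError("target position lies beyond the last class")


# Hypothetical frequency table: classes 10-19, 20-29, ..., 50-59.
lower_bounds = [9.5, 19.5, 29.5, 39.5, 49.5]
freqs = [4, 10, 20, 10, 6]

q1 = grouped_quantile(lower_bounds, freqs, 10, 1, 4)       # quartile 1
d7 = grouped_quantile(lower_bounds, freqs, 10, 7, 10)      # decile 7
p85 = grouped_quantile(lower_bounds, freqs, 10, 85, 100)   # percentile 85
```

Writing one function with `k` and `parts` as parameters reflects that all three formulas are patterned after the median formula; the median itself is `grouped_quantile(..., 2, 4)`.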
The interquartile range (IQR) – the difference between the value of the third quartile (𝑄3) or upper quartile and the first quartile (𝑄1) or lower quartile.
IQR = Q3 – Q1

Semi-Interquartile Range
The semi-interquartile range (SIQR) or quartile deviation (QD) indicates the variation or dispersion of the values covering the middle 50% of the distribution. It is found by taking half of the distance between the third (upper) quartile and the first (lower) quartile.
SIQR or QD = (Q3 – Q1)/2

IQR (Grouped Data)
$Q_1 = lb_{Q1} + \left( \frac{\frac{1n}{4} - cf_{Q1}}{f_{Q1}} \right) i$
lbQ1 = lower boundary of the quartile 1 class
n = number of observations
cfQ1 = cumulative frequency before the quartile 1 class
fQ1 = frequency of the quartile 1 class
i = class interval

Measures of Variability (Dispersion): Mean Absolute Deviation (MAD)
Mean deviation or average deviation is defined as the average of the absolute deviations of the individual values of a set of numerical data from either the mean, the median or the mode. Among the three, the mean is the most preferred and commonly used measure of central tendency for computing the mean deviation or average deviation.
Ungrouped Data
$MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
Grouped Data
$MAD = \frac{\sum f\,|x - \bar{x}|}{n}$

For ungrouped data, the mean deviation or average deviation is determined by the following procedure:
1. Arrange the values from lowest to highest or vice versa.
2. Compute the value of the mean.
3. Find the absolute value of each individual deviation from the mean.
4. Find the sum of the absolute values in Step 3.
5. Substitute the values in the formula and solve.

For grouped data, the mean deviation or average deviation is determined by the following procedure:
1. Compute the mean 𝑥̅ of the distribution.
2. Subtract the mean from each of the midpoints and write the absolute values of the results under the column |𝑥 − 𝑥̅|.
3. Find the product of the items under column f and the items under column |𝑥 − 𝑥̅|.
4. Add the products in Step 3 to obtain the value of ∑ 𝑓|𝑥 − 𝑥̅|.
5. Divide the sum obtained in Step 4 by n.

Measures of Variability (Dispersion): Variance
Variance is defined as the average of the squared deviations from the mean.
The square root of the variance is known as the standard deviation.
The variance for sample data is denoted by S² (read as S squared or the square of S), while the symbol for the variance of the population is σ² (read as sigma squared).
Ungrouped Data
To determine the variance of ungrouped data, follow the steps below:
1. Arrange the values according to magnitude, lowest to highest or vice versa.
2. Calculate the mean.
3. Obtain the individual deviations from the mean.
4. Square each deviation and write the results under column |𝒙 − 𝒙̅|².
5. Find the sum of the squared deviations.
6. Divide the sum in Step 5 by n − 1 for sample data or by n for population data.

Measures of Variability (Dispersion): Standard Deviation
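As a minimal sketch of the dispersion measures in this module, the functions below follow the listed procedures for the mean absolute deviation and the variance, and take the standard deviation as the square root of the variance. The function names and the sample values are hypothetical, chosen only for illustration.

```python
import math


def mean_absolute_deviation(values):
    """Ungrouped MAD: average absolute deviation from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def grouped_mad(midpoints, freqs):
    """Grouped MAD: sum of f * |x - mean| over the class midpoints, divided by n."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / n


def variance(values, sample=True):
    """Sum of squared deviations divided by n - 1 (sample) or n (population)."""
    mean = sum(values) / len(values)
    squared = sum((x - mean) ** 2 for x in values)
    divisor = len(values) - 1 if sample else len(values)
    return squared / divisor


def standard_deviation(values, sample=True):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample))


# Hypothetical ungrouped data (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mad = mean_absolute_deviation(data)
pop_var = variance(data, sample=False)
pop_sd = standard_deviation(data, sample=False)
sample_var = variance(data)           # divides by n - 1
```

The `sample` flag mirrors the last step of the variance procedure, where the divisor depends on whether the data are a sample or the whole population.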
Standard deviation is the most important measure of variability.
By knowing the standard deviation, we can determine the position of the scores in a frequency distribution in relation to the mean.
A small standard deviation means that the values in a distribution are clustered near the mean, while a large standard deviation means that they are spread out far from it.
$S = \sqrt{S^2}$
$S^2 = \frac{\sum f\,|x - \bar{x}|^2}{n - 1}$
and
$\sigma = \sqrt{\sigma^2}$

Measures of Shape: Skewness
Skewness is defined as the degree of departure from symmetry.
A frequency curve that has a longer tail to the right than to the left is said to be skewed to the right, or described as a positively skewed distribution. Conversely, a distribution is negatively skewed, or skewed to the left, if its tail is longer to the left than to the right.
When the value of the skewness is zero (0), the distribution is symmetric, indicating that the mean is equal to the median.
$SK = \frac{3(\bar{x} - mdn)}{S}$

FORECASTING
It is a prediction, estimate or determination of what will occur in the future based on a certain set of factors.
The value being forecast may be sales, interest rates, funds, gross national product (GNP), technological status, and others.
The factors on which a forecast is based may be any of the following: past data, opinion or judgement, company data, or a perceived pattern related to time.
A forecast answers the following questions:
1. What is the purpose of the forecast?
2. What are the dynamics and components of the system for which the forecast will be made?
3. How important is the past in estimating the future?

Forecasting in Different Departments
- Accounting – new product/process cost estimates, profit projections, cash management
- Finance – equipment/equipment replacement needs, timing, and amount of funding/borrowing needs
- Human resources – hiring activities, including recruitment, interviewing, and training; layoff planning, including outplacement counseling
- Marketing – pricing and promotion, e-business strategies, global competition strategies
- MIS – new/revised information systems, internet services
- Operations – schedules, capacity planning, work assignments and workloads, inventory planning, make-or-buy decisions, outsourcing, project management
- Product/service design – revision of current features, design of new products or services
Forecasts have two uses: one is to help managers plan the system, and the other is to help them plan the use of the system.

Features Common to All Forecasts
- Forecasting techniques generally assume that the same underlying causal system that existed in the past will continue to exist in the future.
- Forecasts are not perfect; actual results usually differ from predicted values; the presence of randomness precludes a perfect forecast. Allowances should be made for forecast errors.
- Forecasts for groups of items tend to be more accurate than forecasts for individual items because forecasting errors among items in a group usually have a canceling effect.
- Forecast accuracy decreases as the time period covered by the forecast (the time horizon) increases. Short-range forecasts must contend with fewer uncertainties than longer-range forecasts, so they tend to be more accurate.

Elements of a Good Forecast
1. Timely
2. Accurate
3. Reliable
4. Expressed in meaningful units
5. In writing
6. Simple to understand and use
7. Cost-effective

Categories of Forecasting in Time Horizon
Short-term Forecast – covers one day to one year and is used mainly for short-run control such as employment, purchasing, scheduling, sales and production rates.
Intermediate-term Forecast – covers a period ranging from one season to one or two years and is used for production schedules, revenues, cash flow and budget planning.
Forecasting Techniques
$F = \frac{\sum (N)(W)(S)}{\sum W}$
F = forecast for the time period
S = actual values
W = weight given
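A small sketch of how such a weighted forecast might be computed, assuming the formula is the usual weighted moving average, F = ∑(W × S) / ∑W. The legend does not define N, so it is left out here; that omission, the function name and the data are all assumptions made for illustration.

```python
def weighted_moving_average(actuals, weights):
    """Forecast = sum of (weight * actual value) divided by the sum of the weights.

    Assumption: the N term in the notes' formula is not defined in the legend,
    so this sketch uses only W and S.
    """
    if len(actuals) != len(weights):
        raise ValueError("each actual value needs exactly one weight")
    return sum(w * s for w, s in zip(weights, actuals)) / sum(weights)


# Hypothetical actual values for the last three periods, oldest first.
actuals = [42, 48, 54]
# Heavier weights on more recent periods.
weights = [1, 2, 3]

forecast = weighted_moving_average(actuals, weights)
```

Giving the most recent period the largest weight pulls the forecast toward recent data, compared with a plain average of the same three values.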
Exponential Smoothing – refers to a family of forecasting models that are very similar to the weighted moving average in that they weight the most recent past data more than the distant past data.
Regression Analysis – a simple statistical tool used to model the dependence of a variable on one or more explanatory variables.
Simple linear regression – the least squares estimator of a linear regression model with a single predictor (one independent variable). The least squares model determines a regression equation by minimizing the sum of the squares of the vertical distances between actual and predicted values of Y.

Shortcut Method: Trend Formula (Excel)
=TREND(known Y, known X, independent variable)

Making Graphs
- Scatter Diagram
- Chart Element (Trendline)
- Display Equation on Chart

Correlation – the method of measuring the strength of association among variables. It is a descriptive statistical method that measures the relationship between two different variables.
The correlation coefficient is measured on a scale that varies from +1 to -1. When one variable increases as the other increases, the correlation is positive. When one decreases and the other increases, it is negative. Complete absence of correlation is represented by 0.

When to Use
Parametric: (Pearson’s Coefficient) Where the data must be handled in relation to the parameters of populations or probability distributions. Typically used with quantitative data already set out within said parameters.
Nonparametric: (Spearman’s Rank) Where no assumptions can be made about the probability distribution. Typically used with qualitative data but can be used with quantitative data if Pearson’s coefficient proves inadequate.

Interpretation
Shortcut: