Management Science L5
CSTC COLLEGE OF SCIENCES TECHNOLOGY AND
COMMUNICATION, INC.
CSTC College Bldg. Gen. Luna St. Maharlika Hi-way, Pob. 3, Arellano
Sub. Sariaya Province of Quezon R4A
Registrar’s Office: 042 3290850 / 042 7192818
CSTC IT Center: 042 7192805
Atimonan Contact Number: 042 7171420
Preliminaries
I. Lesson Number 5
II. Lesson Title Descriptive Statistics: Probability, Distribution, Univariate Data
III. Brief Introduction of the Lesson
Statistics has become the universal language of the sciences, and data analysis can lead to
powerful results. As scientists, researchers, and managers working in the natural resources
sector, we all rely on statistical analysis to help us answer the questions that arise in the
populations we manage.
IV. Lesson Objectives
a. Describe the basic features of the data in a study.
b. Identify simple summaries for the sample and the measurements.
c. Describe the following:
Univariate Analysis
Probability
Distribution
Lesson Proper
I. Getting Started
Briefly explain the chart below.
II. Discussion
Descriptive Statistics
A population is the group to be studied, and population data is a collection of all elements in the
population.
Populations are characterized by descriptive measures called parameters. Inferences about
parameters are based on sample statistics. For example, the population mean (µ) is estimated
by the sample mean (x̄). The population variance (σ²) is estimated by the sample variance (s²).
Variables are the characteristics we are interested in. For example:
The length of fish in Long Lake.
The pH of lakes in the Adirondack Park.
The weight of grizzly bears in Yellowstone National Park.
Variables are divided into two major groups: qualitative and quantitative. Qualitative variables
have values that are attributes or categories. Mathematical operations cannot be applied to
qualitative variables. Examples of qualitative variables are gender, race, and petal color.
Quantitative variables have values that are typically numeric, such as measurements.
Mathematical operations can be applied to these data. Examples of quantitative variables are
the fish lengths, lake pH values, and bear weights listed above.
Descriptive Measures
Descriptive measures of populations are called parameters and are typically written using Greek
letters. The population mean is μ (mu). The population variance is σ² (sigma squared) and the
population standard deviation is σ (sigma).
Descriptive measures of samples are called statistics and are typically written using Roman
letters. The sample mean is x̄ (x-bar). The sample variance is s² and the sample standard
deviation is s. Sample statistics are used to estimate unknown population parameters.
In this section, we will examine descriptive statistics in terms of measures of center and
measures of dispersion. These descriptive statistics help us to identify the center and spread of
the data.
Measures of Center
Mean
The arithmetic mean of a variable, often called the average, is computed by adding up all the
values and dividing by the total number of values.
The population mean is represented by the Greek letter μ (mu). The sample mean is
represented by x̄ (x-bar). The sample mean is usually the best unbiased estimate of the
population mean. However, the mean is influenced by extreme values (outliers) and may not be
the best measure of center for strongly skewed data. The following equations compute the
population mean and sample mean:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where x_i is an element in the data set, N is the number of elements in the population, and n is
the number of elements in the sample data set.
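As a minimal sketch of the mean calculation described above, the following Python snippet adds up the values and divides by their count. The fish lengths are hypothetical numbers chosen only for illustration, and the function name `mean` is an assumption of this sketch.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Hypothetical fish lengths in centimetres (illustrative numbers only).
lengths = [20, 22, 24, 26, 28]
print(mean(lengths))         # 24.0

# A single extreme value (an outlier) pulls the mean upward.
print(mean(lengths + [90]))  # 35.0
```

Adding one unusually long fish moves the mean from 24 to 35, even though five of the six fish are 28 cm or shorter. This is the sensitivity to outliers noted above.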
Median
The median of a variable is the middle value of the data set when the data are sorted in order
from least to greatest. It splits the data into two equal halves with 50% of the data below the
median and 50% above the median. The median is resistant to the influence of outliers, and
may be a better measure of center with strongly skewed data.
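The calculation of the median depends on the number of observations in the data set.
To calculate the median, first sort the data from smallest to largest. With an odd number of
values (n is odd), the median is the single middle value, at position (n + 1)/2 in the sorted
list. With an even number of values (n is even), the median is the mean of the two middle
values, at positions n/2 and n/2 + 1.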
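The median calculation can be sketched in Python as follows. The sketch uses the standard rule: sort the data, take the middle value when n is odd, and average the two middle values when n is even. The sample numbers are hypothetical.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: one middle value.
        return ordered[mid]
    # Even n: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 9, 1, 5]))      # 5
print(median([7, 3, 9, 1, 5, 11]))  # 6.0
```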
Mode
The mode is the most frequently occurring value and is commonly used with qualitative data as
the values are categorical. Categorical data cannot be added, subtracted, multiplied or divided,
so the mean and median cannot be computed. The mode is less commonly used with
quantitative data as a measure of center. Sometimes each value occurs only once and the
mode will not be meaningful.
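A short Python sketch of finding the mode is given below. It counts how often each value occurs and returns the most frequent value or values. The petal colors are hypothetical data, and returning an empty list when every value occurs only once is a design choice of this sketch, not a universal convention.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list if no value repeats."""
    if not values:
        raise ValueError("modes requires at least one value")
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # Every value occurs once, so the mode is not meaningful.
        return []
    return sorted(value for value, count in counts.items() if count == top)


# Hypothetical qualitative data: petal colors of six flowers.
petal_colors = ["white", "pink", "white", "red", "pink", "white"]
print(modes(petal_colors))  # ['white']
print(modes([1, 2, 3]))     # []
```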
Understanding the relationship between the mean and median is important. It gives us insight
into the distribution of the variable. For example, if the distribution is skewed right (positively
skewed), the mean will increase to account for the few larger observations that pull the
distribution to the right. The median will be less affected by these extreme large values, so in
this situation, the mean will be larger than the median. In a symmetric distribution, the mean,
median, and mode will all be similar in value. If the distribution is skewed left (negatively
skewed), the mean will decrease to account for the few smaller observations that pull the
distribution to the left. Again, the median will be less affected by these extreme small
observations, and in this situation, the mean will be less than the median.
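The relationship between the mean, the median, and the direction of skew can be checked with a small Python sketch. The three data sets are hypothetical, and the helper name `compare_center` is an assumption made for illustration. Comparing the mean with the median is only a rough indicator of skew, not a formal test.

```python
from statistics import mean, median


def compare_center(values):
    """Describe how the mean sits relative to the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "mean > median (suggests right skew)"
    if m < med:
        return "mean < median (suggests left skew)"
    return "mean = median (suggests symmetry)"


right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]        # one large value in the right tail
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]   # one small value in the left tail
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, mean(data), median(data), compare_center(data))
```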
Measures of Dispersion
Measures of center look at the average or middle values of a data set. Measures of dispersion
look at the spread or variation of the data. Variation refers to the amount that the values vary
among themselves. Values in a data set that are relatively close to each other have lower
measures of variation. Values that are spread farther apart have higher measures of variation.
Examine the two histograms below. Both groups have the same mean weight of 267 lb, but the
weights in Group A are more spread out, and therefore more variable, than the weights in
Group B.
Range
The range of a variable is the largest value minus the smallest value. It is the simplest measure
of dispersion and uses only these two values of a quantitative data set.
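Because the range depends on only the two extreme values, two data sets with different spreads can share the same range. The sketch below illustrates this with hypothetical numbers. The function name `value_range` is an assumption of this example.

```python
def value_range(values):
    """Range: largest value minus smallest value."""
    if not values:
        raise ValueError("value_range requires at least one value")
    return max(values) - min(values)


# Hypothetical data: group_a is concentrated at 20, group_b is pushed toward the extremes.
group_a = [10, 20, 20, 20, 30]
group_b = [10, 10, 20, 30, 30]
print(value_range(group_a))  # 20
print(value_range(group_b))  # 20
```

Both groups have a range of 20 even though the values in the second group sit farther from the center, which is why the range alone is a limited description of spread.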
Variance
The variance uses the difference between each value and the arithmetic mean. The differences
are squared so that positive and negative differences do not cancel. The sample variance (s²) is
an unbiased estimator of the population variance (σ²), with n – 1 degrees of freedom.
Degrees of freedom: In general, the degrees of freedom for an estimate is equal to the number
of values minus the number of parameters estimated en route to the estimate in question.
The sample variance is unbiased due to the difference in the denominator. If we used “n” in the
denominator instead of “n – 1”, we would consistently underestimate the true population
variance. To correct this bias, the denominator is modified to “n – 1”.
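$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \qquad s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$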
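The two variance calculations differ only in the denominator: N for a population and n – 1 for a sample. The following Python sketch shows both. The data values are hypothetical and the function names are assumptions of this example.

```python
def population_variance(values):
    """Mean squared deviation from the mean, dividing by N."""
    n = len(values)
    if n == 0:
        raise ValueError("population_variance requires at least one value")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample_variance requires at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical data: mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))        # 4.0
print(round(sample_variance(data), 4))  # 4.5714
```

Dividing by n – 1 instead of n always gives the larger value, which offsets the tendency of the n denominator to underestimate the population variance.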
Standard Deviation
The standard deviation is the square root of the variance (both population and sample). While
the sample variance is the positive, unbiased estimator for the population variance, the units for
the variance are squared. Taking the square root returns the measure to the original units of the
data, which makes the standard deviation a common way to describe numerically the spread of
a variable. The population standard deviation is σ (sigma) and the sample standard deviation
is s.
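Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
Coefficient of variation (population): $CV = \frac{\sigma}{\mu} \times 100$
Coefficient of variation (sample): $CV = \frac{s}{\bar{x}} \times 100$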
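The sketch below computes the standard deviation and the coefficient of variation (CV) with Python's standard `statistics` module. It assumes the common convention of expressing the CV as a percentage of the mean. The data are hypothetical lengths in inches, and the helper name `sample_cv` is an assumption of this example.

```python
import math
from statistics import mean, pstdev, stdev


def sample_cv(values):
    """Sample coefficient of variation, as a percentage of the sample mean."""
    return stdev(values) / mean(values) * 100


# Hypothetical measurements in inches.
inches = [2, 4, 4, 4, 5, 5, 7, 9]
print(pstdev(inches))               # 2.0   (population standard deviation)
print(round(stdev(inches), 4))      # 2.1381 (sample standard deviation)
print(round(sample_cv(inches), 2))  # 42.76 (percent)

# Converting to centimetres changes the standard deviation but not the CV.
centimetres = [x * 2.54 for x in inches]
print(math.isclose(sample_cv(inches), sample_cv(centimetres)))  # True
```

The last line illustrates why the CV is useful for comparing variability across variables measured in different units.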
Variability
Variability is described in many different ways. Standard deviation measures point to point
variability within a sample, i.e., variation among individual sampling units. Coefficient of variation
also measures point to point variability but on a relative basis (relative to the mean), and is not
influenced by measurement units. Standard error measures the sample to sample variability, i.e.
variation among repeated samples in the sampling process. Typically, we only have one sample
and standard error allows us to quantify the uncertainty in our sampling process.
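The text does not give a formula for the standard error. The sketch below assumes the usual standard error of the sample mean, s divided by the square root of n, and contrasts it with the standard deviation. The data are hypothetical.

```python
import math
from statistics import stdev


def standard_error(values):
    """Standard error of the sample mean: s / sqrt(n) (assumed formula)."""
    n = len(values)
    if n < 2:
        raise ValueError("standard_error requires at least two values")
    return stdev(values) / math.sqrt(n)


# Hypothetical sample of eight measurements.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(stdev(sample), 4))           # 2.1381 (variation among individual units)
print(round(standard_error(sample), 4))  # 0.7559 (variation among sample means)
```

The standard error is smaller than the standard deviation because a mean of several observations varies less from sample to sample than a single observation does.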
Basic Statistics Example using Excel and Minitab Software
Consider the following tally from 11 sample plots on Heiburg Forest, where Xi is the number of
downed logs per acre. Compute basic statistics for the sample plots.
Probability Distribution
Once we have organized and summarized our sample data, the next step is to identify the
underlying distribution of our random variable. Computing probabilities for continuous random
variables is complicated by the fact that there are an infinite number of possible values that
our random variable can take on, so the probability of observing any one particular value is
zero. Therefore, to find the probabilities associated with a continuous random
variable, we use a probability density function (PDF).
A PDF is an equation used to find probabilities for continuous random variables. The PDF must
satisfy the following two rules:
The area under the curve must equal one (over all possible values of the random variable).
The value of the function must be equal to or greater than zero for all possible values of the
random variable.
The area under the curve of the probability density function over some interval represents the
probability of observing those values of the random variable in that interval.
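The two rules and the area interpretation can be checked numerically. The sketch below uses a hypothetical density, f(x) = 2x on the interval from 0 to 1 (and zero elsewhere), chosen only because its areas are easy to verify by hand. The integration helper `area` is an assumption of this example and uses a simple midpoint rule.

```python
import math


def pdf(x):
    """Hypothetical density: f(x) = 2x for 0 <= x <= 1, and 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# Rule 1: the total area under the curve is one.
print(math.isclose(area(pdf, 0.0, 1.0), 1.0, abs_tol=1e-9))   # True

# Rule 2: the density is never negative.
print(all(pdf(i / 100) >= 0 for i in range(-100, 201)))       # True

# Probability of an interval = area over that interval: P(0.5 <= X <= 1) = 0.75.
print(math.isclose(area(pdf, 0.5, 1.0), 0.75, abs_tol=1e-9))  # True
```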
The Normal Distribution
Many continuous random variables have a bell-shaped, symmetric distribution known as the
normal distribution. In other words, the relative frequency histogram of such a variable
follows a normal curve. The curve is bell-shaped, symmetric about the mean, and
defined by µ and σ (the mean and standard deviation).
There are normal curves for every combination of µ and σ. The mean (µ) shifts the curve to the
left or right. The standard deviation (σ) alters the spread of the curve. The first pair of curves
have different means but the same standard deviation. The second pair of curves share the
same mean (µ) but have different standard deviations. The pink curve has a smaller standard
deviation. It is narrower and taller, and the probability is spread over a smaller range of values.
The blue curve has a larger standard deviation. The curve is flatter and the tails are thicker. The
probability is spread over a larger range of values.
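The effect of µ and σ on the normal curve can be illustrated with `statistics.NormalDist` from Python's standard library (available from Python 3.8). The parameter values below are hypothetical and are not the ones used in the figures.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0, sigma=1)   # smaller standard deviation
wide = NormalDist(mu=0, sigma=2)     # same mean, larger standard deviation
shifted = NormalDist(mu=3, sigma=1)  # same spread as `narrow`, different mean

# A smaller sigma gives a taller peak at the mean.
print(round(narrow.pdf(0), 4))  # 0.3989
print(round(wide.pdf(0), 4))    # 0.1995

# A smaller sigma also concentrates more probability within one unit of the mean.
print(round(narrow.cdf(1) - narrow.cdf(-1), 4))  # 0.6827
print(round(wide.cdf(1) - wide.cdf(-1), 4))      # 0.3829

# Changing the mean only slides the curve along the axis; the peak height is unchanged.
print(round(shifted.pdf(3), 4))  # 0.3989
```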
For example, if a normally distributed random variable has μ = 6 and σ = 2, then a value of x =
7 corresponds to a Z-score of z = (x − μ)/σ = (7 − 6)/2 = 0.5.
This tells you that 7 is one-half a standard deviation above its mean. We can use this
relationship to find probabilities for any normal random variable.
To find the area for values of X, a normal random variable, draw a picture of the area of interest,
convert the x-values to Z-scores using the Z-score formula, and then use the standard normal
table to find areas to the left, to the right, or in between.
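The Z-score example with μ = 6 and σ = 2 can be reproduced in Python. In this sketch, `statistics.NormalDist` (Python 3.8 or later) stands in for the standard normal table. The function name `z_score` and the interval from 4 to 8 are assumptions added for illustration.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


mu, sigma = 6, 2
z = z_score(7, mu, sigma)
print(z)  # 0.5

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area to the left of x = 7, i.e. P(X <= 7).
p_left = standard_normal.cdf(z)
print(round(p_left, 4))       # 0.6915

# Area to the right of x = 7, i.e. P(X > 7).
print(round(1 - p_left, 4))   # 0.3085

# Area in between x = 4 and x = 8, i.e. P(4 <= X <= 8).
p_between = standard_normal.cdf(z_score(8, mu, sigma)) - standard_normal.cdf(z_score(4, mu, sigma))
print(round(p_between, 4))    # 0.6827
```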
Assessing Normality
If the distribution is unknown and the sample size is not greater than 30 (too small to rely on
the Central Limit Theorem), we have to assess the assumption of normality. Our primary method
is the normal
probability plot. This plot graphs the observed data, ranked in ascending order, against the
“expected” Z-score of that rank. If the sample data were taken from a normally distributed
random variable, then the plot would be approximately linear.
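The points of a normal probability plot can be computed by hand, as in the Python sketch below. It pairs each ranked observation with an expected Z-score obtained from the plotting position (i − 0.375)/(n + 0.25). That formula is one common choice and is an assumption here, since statistical packages may use a different one. The correlation between the two columns is used as a rough numerical stand-in for how linear the plot looks. Both samples are hypothetical.

```python
import math
from statistics import NormalDist


def normal_plot_points(sample):
    """Pair each ranked observation with the expected Z-score of its rank."""
    ordered = sorted(sample)
    n = len(ordered)
    standard_normal = NormalDist()
    # Plotting position (i - 0.375) / (n + 0.25): an assumed, commonly used choice.
    expected = [standard_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(expected, ordered))


def pearson_r(points):
    """Pearson correlation of (x, y) pairs; values near 1 indicate a nearly linear plot."""
    xs, ys = zip(*points)
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical samples: one roughly bell-shaped, one strongly right-skewed.
roughly_normal = [4.1, 5.0, 5.6, 6.0, 6.3, 6.9, 7.4, 8.2]
right_skewed = [1, 1, 1, 2, 2, 3, 5, 20]

r_normal = pearson_r(normal_plot_points(roughly_normal))
r_skewed = pearson_r(normal_plot_points(right_skewed))
print(r_normal > r_skewed)  # True: the first sample plots closer to a straight line
```

This correlation is only an informal aid for reading the plot. It is not the Anderson-Darling test and does not produce a p-value.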
Examine the following probability plot. The center line is the relationship we would expect to see
if the data were drawn from a perfectly normal distribution. Notice how the observed data (red
dots) loosely follow this linear relationship. Minitab also computes an Anderson-Darling test to
assess normality. The null hypothesis for this test is that the sample data have been drawn from
a normally distributed population. A p-value greater than 0.05 supports the assumption of
normality.
The observed data do not follow a linear pattern and the p-value for the A-D test is less than
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
Prepared by:
Professor