Management Science L5

This document is an instructional module on descriptive statistics from CSTC College. It begins with an introduction to descriptive statistics and objectives to describe data features, identify summaries, and describe univariate analysis, probability, and distributions. The lesson proper then discusses descriptive measures including parameters and statistics, variables, measures of center like the mean, median and mode, and measures of dispersion like range, variance and standard deviation.

Uploaded by

Santos Jewel

CSTC COLLEGE OF SCIENCES TECHNOLOGY AND

COMMUNICATION, INC.
CSTC College Bldg. Gen. Luna St. Maharlika Hi-way, Pob. 3, Arellano
Sub. Sariaya Province of Quezon R4A
Registrar’s Office: 042 3290850 / 042 7192818
CSTC IT Center: 042 7192805
Atimonan Contact Number: 042 7171420

SCHOOL OF TEACHER EDUCATION

Instructional Module in Management Science

Preliminaries
I. Lesson Number: 5
II. Lesson Title: Descriptive Statistics: Probability, Distribution, Univariate Data
III. Brief Introduction of the Lesson: Statistics has become the universal language of the sciences, and
data analysis can lead to powerful results. As scientists,
researchers, and managers working in the natural resources sector,
we all rely on statistical analysis to help us answer the questions that
arise in the populations we manage.
IV. Lesson Objectives:
a. Describe the basic features of the data in a study.
b. Identify simple summaries for sample and measurements.
c. Describe the following:
• Univariate Analysis
• Probability
• Distribution

Lesson Proper
I. Getting Started
Briefly explain the chart below.

II. Discussion
Descriptive Statistics
A population is the group to be studied, and population data is a collection of all elements in the
population.
Populations are characterized by descriptive measures called parameters. Inferences about
parameters are based on sample statistics. For example, the population mean (µ) is estimated
by the sample mean (x̄). The population variance (σ²) is estimated by the sample variance (s²).
Variables are the characteristics we are interested in. For example:
The length of fish in Long Lake.
The pH of lakes in the Adirondack Park.
The weight of grizzly bears in Yellowstone National Park.
Variables are divided into two major groups: qualitative and quantitative. Qualitative variables
have values that are attributes or categories. Mathematical operations cannot be applied to
qualitative variables. Examples of qualitative variables are gender, race, and petal color.
Quantitative variables have values that are typically numeric, such as measurements.
Mathematical operations can be applied to these data. Examples of quantitative variables are
age, height, and length.


Quantitative variables can be broken down further into two more categories: discrete and
continuous variables. Discrete variables have a finite or countable number of possible values.
Think of discrete variables as “hens.” Hens can lay 1 egg, or 2 eggs, or 13 eggs… There are a
limited, definable number of values that the variable could take on.
Continuous variables can take any value within an interval, so they have infinitely many possible
values. Think of continuous variables as “cows.” Cows can give 4.6713245 gallons of milk, or
7.0918754 gallons of milk, or 13.272698 gallons of milk … Between any two values there is always
another possible value, limited only by how precisely we can measure.

Descriptive Measures
Descriptive measures of populations are called parameters and are typically written using Greek
letters. The population mean is μ (mu). The population variance is σ² (sigma squared) and the
population standard deviation is σ (sigma).
Descriptive measures of samples are called statistics and are typically written using Roman
letters. The sample mean is x̄ (x-bar). The sample variance is s² and the sample standard
deviation is s. Sample statistics are used to estimate unknown population parameters.
In this section, we will examine descriptive statistics in terms of measures of center and
measures of dispersion. These descriptive statistics help us to identify the center and spread of
the data.
Measures of Center
Mean
The arithmetic mean of a variable, often called the average, is computed by adding up all the
values and dividing by the total number of values.
The population mean is represented by the Greek letter μ (mu). The sample mean is
represented by x̄ (x-bar). The sample mean is usually the best, unbiased estimate of the
population mean. However, the mean is influenced by extreme values (outliers) and may not be
the best measure of center with strongly skewed data. The following equations compute the
population mean and sample mean.

$\mu = \frac{\sum_{i=1}^{N} x_i}{N} \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

where x_i is an element in the data set, N is the number of elements in the population, and n is
the number of elements in the sample data set.
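As a minimal sketch of the mean calculation, the Python snippet below adds up the values and divides by their count, then shows how a single extreme value pulls the mean upward. The fish lengths are hypothetical numbers chosen only for illustration.

```python
# Hypothetical fish lengths in centimetres (illustrative values only).
lengths = [12, 14, 15, 15, 19]

# Sample mean: sum of the values divided by the number of values.
x_bar = sum(lengths) / len(lengths)

# Adding one extreme value (an outlier) shifts the mean noticeably.
lengths_with_outlier = lengths + [75]
x_bar_with_outlier = sum(lengths_with_outlier) / len(lengths_with_outlier)

print(x_bar)               # 15.0
print(x_bar_with_outlier)  # 25.0
```

The same arithmetic gives the population mean when the list holds every element of the population; only the symbol (μ instead of x̄) and the count (N instead of n) change.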
Median
The median of a variable is the middle value of the data set when the data are sorted in order
from least to greatest. It splits the data into two equal halves with 50% of the data below the
median and 50% above the median. The median is resistant to the influence of outliers, and
may be a better measure of center with strongly skewed data.

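The calculation of the median depends on the number of observations in the data set.
To calculate the median with an odd number of values (n is odd), first sort the data from
smallest to largest. The median is the middle value, at position (n + 1)/2 in the sorted list.
With an even number of values (n is even), sort the data the same way and take the average of
the two middle values, at positions n/2 and n/2 + 1.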
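The usual rule for the median can be sketched in a few lines of Python: sort the data, take the middle value when n is odd, and average the two middle values when n is even. The function name and the sample values are assumptions made for this illustration.

```python
def median_of(values):
    """Return the median of a list of numbers."""
    ordered = sorted(values)      # sort from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n is odd: the single middle value
        return ordered[mid]
    # n is even: the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median_of([7, 3, 9])       # sorted: 3, 7, 9
even_case = median_of([7, 3, 9, 1])   # sorted: 1, 3, 7, 9

print(odd_case)   # 7
print(even_case)  # 5.0
```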
Mode
The mode is the most frequently occurring value and is commonly used with qualitative data as
the values are categorical. Categorical data cannot be added, subtracted, multiplied or divided,
so the mean and median cannot be computed. The mode is less commonly used with
quantitative data as a measure of center. Sometimes each value occurs only once and the
mode will not be meaningful.
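A short sketch using `multimode` from Python's standard `statistics` module illustrates both points: the mode works directly on categorical values, and it is uninformative when every value occurs once. The petal colors and measurements are hypothetical.

```python
from statistics import multimode

# Hypothetical categorical data: petal colors of sampled flowers.
petal_colors = ["red", "white", "red", "pink", "red", "white"]
most_common_color = multimode(petal_colors)

# Hypothetical quantitative data in which each value occurs only once.
weights = [267.4, 251.9, 280.3, 259.0]
weight_modes = multimode(weights)

print(most_common_color)  # ['red']
print(weight_modes)       # every value is returned, so the mode tells us nothing
```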
Understanding the relationship between the mean and median is important. It gives us insight
into the distribution of the variable. For example, if the distribution is skewed right (positively
skewed), the mean will increase to account for the few larger observations that pull the
distribution to the right. The median will be less affected by these extreme large values, so in
this situation, the mean will be larger than the median. In a symmetric distribution, the mean,
median, and mode will all be similar in value. If the distribution is skewed left (negatively
skewed), the mean will decrease to account for the few smaller observations that pull the
distribution to the left. Again, the median will be less affected by these extreme small
observations, and in this situation, the mean will be less than the median.
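The relationship between skew and the mean and median can be checked on small made-up data sets. In this sketch the three samples are hypothetical and were chosen so that one has a long right tail, one a long left tail, and one is symmetric.

```python
from statistics import mean, median

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]         # a few large values
left_skewed = [-20, -4, -3, -3, -3, -2, -2, -1]  # a few small values
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed),
                   ("left", left_skewed),
                   ("symmetric", symmetric)]:
    print(name, "mean =", mean(data), "median =", median(data))

# right:     mean 4.75 is larger than the median 3
# left:      mean -4.75 is smaller than the median -3
# symmetric: mean and median are both 3
```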

Measures of Dispersion
Measures of center look at the average or middle values of a data set. Measures of dispersion
look at the spread or variation of the data. Variation refers to the amount that the values vary
among themselves. Values in a data set that are relatively close to each other have lower
measures of variation. Values that are spread farther apart have higher measures of variation.
Examine the two histograms below. Both groups have the same mean weight, but the values of
Group A are more spread out compared to the values in Group B. Both groups have an average
weight of 267 lb. but the weights of Group A are more variable.

Range
The range of a variable is the largest value minus the smallest value. It is the simplest measure
and uses only these two values in a quantitative data set.
Variance
The variance uses the difference between each value and the arithmetic mean of the data set. The
differences are squared so that positive and negative differences do not cancel. The sample variance (s²) is an
unbiased estimator of the population variance (σ²), with n – 1 degrees of freedom.
Degrees of freedom: In general, the degrees of freedom for an estimate is equal to the number
of values minus the number of parameters estimated en route to the estimate in question.
The sample variance is unbiased due to the difference in the denominator. If we used “n” in the
denominator instead of “n – 1”, we would consistently underestimate the true population
variance. To correct this bias, the denominator is modified to “n – 1”.
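Population variance                          Sample variance

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \qquad s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$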
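The effect of the n – 1 denominator can be verified exactly on a tiny example. The sketch below assumes a hypothetical population of three values, lists every possible sample of size 2 drawn with replacement, and averages the two versions of the variance over all of those samples. `pvariance` divides by the number of values and `variance` divides by one less than that; both come from Python's standard `statistics` module.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [1, 2, 3]            # hypothetical population
sigma2 = pvariance(population)    # true population variance, 2/3

# Every ordered sample of size 2 drawn with replacement (9 samples).
samples = list(product(population, repeat=2))

# Average of the sample variances using n - 1 in the denominator.
avg_unbiased = mean(variance(s) for s in samples)

# Average of the variances using n in the denominator.
avg_biased = mean(pvariance(s) for s in samples)

print(sigma2, avg_unbiased, avg_biased)
```

Averaged over all samples, the n – 1 version matches the population variance, while the n version comes out at half of it for samples of size 2, which is the underestimation described above.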
Standard Deviation
The standard deviation is the square root of the variance (both population and sample). While
the sample variance is the positive, unbiased estimator for the population variance, the units for
the variance are squared. Taking the square root returns the measure to the original units of the
data, which is why the standard deviation is a common method for numerically
describing the spread of a variable. The population standard deviation is σ (sigma) and the
sample standard deviation is s.
Population standard deviation                 Sample standard deviation

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Standard Error of the Mean
Commonly, we use the sample mean x̄ to estimate the population mean μ. For example, if we
want to estimate the heights of eighty-year-old cherry trees, we can proceed as follows:
Randomly select 100 trees
Compute the sample mean of the 100 heights
Use that as our estimate
We want to use this sample mean to estimate the true but unknown population mean. But our
sample of 100 trees is just one of many possible samples (of the same size) that could have
been randomly selected. Imagine if we take a series of different random samples from the same
population and all the same size:
Sample 1: we compute sample mean x̄₁
Sample 2: we compute sample mean x̄₂
Sample 3: we compute sample mean x̄₃
Etc.
Each time we sample, we may get a different result as we are using a different subset of data to
compute the sample mean. This shows us that the sample mean is a random variable!
The sample mean (x̄) is a random variable with its own probability distribution called the
sampling distribution of the sample mean. The distribution of the sample mean will have a mean
equal to µ and a standard deviation equal to σ/√n.
The standard error σ_x̄ = σ/√n is the standard deviation of all possible sample means.
In reality, we would only take one sample, but we need to understand and quantify the sample
to sample variability that occurs in the sampling process.
The standard error is the standard deviation of the sample means and can be expressed in
different ways. When σ is unknown, it is estimated from the sample:

$s_{\bar{x}} = \sqrt{\frac{s^2}{n}} = \frac{s}{\sqrt{n}}$

Note: s² is the sample variance and s is the sample standard deviation.
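As a small illustration, the estimated standard error of the mean can be computed from one sample in either of its two equivalent forms, the square root of the sample variance divided by n, or the sample standard deviation divided by the square root of n. The tree heights below are hypothetical.

```python
import math
from statistics import stdev, variance

tree_heights = [10, 12, 14, 16]   # hypothetical heights, n = 4
n = len(tree_heights)

s2 = variance(tree_heights)       # sample variance (n - 1 denominator)
s = stdev(tree_heights)           # sample standard deviation

se_from_variance = math.sqrt(s2 / n)
se_from_std_dev = s / math.sqrt(n)

print(round(se_from_variance, 3))  # 1.291
print(round(se_from_std_dev, 3))   # 1.291
```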


The Central Limit Theorem (CLT) states that the sampling distribution of the sample mean will
approach a normal distribution as the sample size increases. If the population is not normally
distributed, or we know nothing about the distribution of our random variable, the CLT tells us that
the distribution of the x̄’s will become approximately normal as n increases. How large does n have to be? A
general rule of thumb is n ≥ 30, although strongly skewed populations may need larger samples.
In short, regardless of the shape of the population (provided its variance is finite), the sampling
distribution of the sample mean will be approximately normal when the sample size is large.
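A small simulation can make the Central Limit Theorem concrete. This sketch assumes a strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration), draws 2,000 samples of size n = 30, and looks at the resulting sample means. The seed, the number of samples, and the population are all assumptions of the example.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # size of each sample
num_samples = 2000       # number of repeated samples

# Each entry is the mean of one sample of size n from the skewed population.
sample_means = [
    mean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
expected_se = 1.0 / n ** 0.5     # sigma / sqrt(n) with sigma = 1

# Share of sample means within 1.96 standard errors of the population mean.
within = sum(abs(m - 1.0) < 1.96 * expected_se for m in sample_means) / num_samples

print(mean_of_means, sd_of_means, expected_se, within)
```

If the sampling distribution is close to normal, the mean of the sample means should sit near 1, their standard deviation near σ/√n ≈ 0.183, and roughly 95% of them should fall within 1.96 standard errors of the population mean.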
Coefficient of Variation
To compare standard deviations between different populations or samples is difficult because
the standard deviation depends on units of measure. The coefficient of variation expresses the
standard deviation as a percentage of the sample or population mean. It is a unitless measure.
Population data               Sample data

$CV = \frac{\sigma}{\mu} \times 100 \qquad CV = \frac{s}{\bar{x}} \times 100$
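Because the coefficient of variation divides the standard deviation by the mean, changing the unit of measure leaves it unchanged. The sketch below uses hypothetical heights recorded in centimetres and then converted to metres; the function name is an assumption for this example.

```python
from statistics import mean, stdev

def coefficient_of_variation(sample):
    """Sample CV: standard deviation as a percentage of the mean."""
    return stdev(sample) / mean(sample) * 100

heights_cm = [150, 160, 170, 180, 190]      # hypothetical heights
heights_m = [h / 100 for h in heights_cm]   # same heights in metres

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(round(stdev(heights_cm), 2), round(stdev(heights_m), 4))  # standard deviations differ with the unit
print(round(cv_cm, 2), round(cv_m, 2))                          # CVs agree
```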
Variability
Variability is described in many different ways. Standard deviation measures point to point
variability within a sample, i.e., variation among individual sampling units. Coefficient of variation
also measures point to point variability but on a relative basis (relative to the mean), and is not
influenced by measurement units. Standard error measures the sample to sample variability, i.e.
variation among repeated samples in the sampling process. Typically, we only have one sample
and standard error allows us to quantify the uncertainty in our sampling process.
Basic Statistics Example using Excel and Minitab Software
Consider the following tally from 11 sample plots on Heiburg Forest, where Xi is the number of
downed logs per acre. Compute basic statistics for the sample plots.
(1) Sample mean: 


(2) Median = 35
(3) Variance:

(4) Standard deviation:  


(5) Range: 55 – 5 = 50
(6) Coefficient of variation:

(7) Standard error of the mean:
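The seven statistics listed above can also be computed in a few lines of Python instead of Excel or Minitab. The sketch below is one possible way to do it with the standard library; the function name is an assumption, and the counts are hypothetical values, not the Heiburg Forest tally.

```python
import math
from statistics import mean, median, stdev, variance

def summarize(sample):
    """Return the seven basic statistics for one sample."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)
    return {
        "mean": x_bar,
        "median": median(sample),
        "variance": variance(sample),        # n - 1 denominator
        "std_dev": s,
        "range": max(sample) - min(sample),
        "cv_percent": s / x_bar * 100,
        "std_error": s / math.sqrt(n),
    }

# Hypothetical downed-log counts per acre (NOT the data from the example).
logs_per_acre = [12, 18, 20, 25, 35]
summary = summarize(logs_per_acre)

for name, value in summary.items():
    print(name, round(value, 3))
```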

Probability Distribution
Once we have organized and summarized our sample data, the next step is to identify the
underlying distribution of our random variable. Computing probabilities for continuous random
variables is complicated by the fact that there are an infinite number of possible values that
our random variable can take on, so the probability of observing any one particular value
is zero. Therefore, to find the probabilities associated with a continuous random
variable, we use a probability density function (PDF).
A PDF is an equation used to find probabilities for continuous random variables. The PDF must
satisfy the following two rules:
The area under the curve must equal one (over all possible values of the random variable).
The value of the function must be equal to or greater than zero for all possible values of the random
variable.
The area under the curve of the probability density function over some interval represents the
probability of observing those values of the random variable in that interval.
The Normal Distribution
Many continuous random variables have a bell-shaped or somewhat symmetric distribution.
This is a normal distribution. In other words, the probability distribution of its relative frequency
histogram follows a normal curve. The curve is bell-shaped, symmetric about the mean, and
defined by µ and σ (the mean and standard deviation).
There are normal curves for every combination of µ and σ. The mean (µ) shifts the curve to the
left or right. The standard deviation (σ) alters the spread of the curve. The first pair of curves
have different means but the same standard deviation. The second pair of curves share the
same mean (µ) but have different standard deviations. The pink curve has a smaller standard
deviation. It is narrower and taller, and the probability is spread over a smaller range of values.
The blue curve has a larger standard deviation. The curve is flatter and the tails are thicker. The
probability is spread over a larger range of values.
Figure 10. A comparison of normal curves.


Properties of the normal curve:
The mean is the center of this distribution and the highest point.
The curve is symmetric about the mean. (The area to the left of the mean equals the area to the
right of the mean.)
The total area under the curve is equal to one.
As x increases or decreases away from the mean, the curve approaches zero but never touches it.

The PDF of a normal curve is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$.


A normal curve can be used to estimate probabilities.
A normal curve can be used to estimate proportions of a population that have certain x-values.
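The properties of the normal curve can be checked numerically. This sketch writes the normal density with mean µ and standard deviation σ as a Python function and approximates areas under it with the trapezoid rule. The function names, the integration limits, and the number of steps are choices made for this illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def area_under_curve(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Approximate the area under the normal curve between a and b (trapezoid rule)."""
    width = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * width, mu, sigma)
    return total * width

total_area = area_under_curve(-8, 8)         # practically the whole curve
left_half = area_under_curve(-8, 0)          # area to the left of the mean
within_one_sd = area_under_curve(-1, 1)      # area within one standard deviation

print(round(total_area, 4))     # 1.0
print(round(left_half, 4))      # 0.5
print(round(within_one_sd, 4))  # 0.6827
```

The area between two x-values is the estimated probability, or the estimated proportion of the population, falling in that interval.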
The Standard Normal Distribution
There are infinitely many possible combinations of means and standard deviations for continuous
random variables. Finding probabilities associated with these variables would require us to
integrate the PDF over the range of values we are interested in. To avoid this, we can rely on
the standard normal distribution. The standard normal distribution is a special normal
distribution with µ = 0 and σ = 1. We can use the Z-score to standardize any normal random
variable, converting the x-values to Z-scores, thus allowing us to use probabilities from the
standard normal table. So how do we find area under the curve associated with a Z-score?
Standard Normal Table
The standard normal table gives probabilities associated with specific Z-scores.
The table we use is cumulative from the left.
The negative side is for all Z-scores less than zero (all values less than the mean).
The positive side is for all Z-scores greater than zero (all values greater than the mean).
Not all standard normal tables work the same way.
Reading the Standard Normal Table
Read down the Z-column to get the first part of the Z-score (1.6).
Read across the top row to get the second decimal place in the Z-score (0.02).
The intersection of this row and column gives the area under the curve to the left of the Z-score.
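The table lookup can be reproduced in software. As a sketch, `NormalDist` from Python's standard `statistics` module gives the cumulative area to the left of a Z-score, which is what a left-cumulative table lists. The Z-score 1.62 is the one used in the reading steps above.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

z = 1.6 + 0.02              # row 1.6, column 0.02
area_left = std_normal.cdf(z)
area_right = 1 - area_left

print(round(area_left, 4))   # 0.9474
print(round(area_right, 4))  # 0.0526
```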
Finding Z-scores for a Given Area
What if we have an area and we want to find the Z-score associated with that area?
Instead of Z-score → area, we want area → Z-score.
We can use the standard normal table to find the area in the body of values and read
backwards to find the associated Z-score.


Using the table, search the probabilities to find an area that is closest to the probability you are
interested in.
Common Z-scores
There are many commonly used Z-scores:
Z_0.05 = 1.645 and the area between -1.645 and 1.645 is 90%
Z_0.025 = 1.96 and the area between -1.96 and 1.96 is 95%
Z_0.005 = 2.575 and the area between -2.575 and 2.575 is 99%
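Going from an area back to a Z-score is the inverse of the table lookup. In this sketch, `inv_cdf` from Python's `statistics.NormalDist` plays the role of reading the table backwards; the helper names are assumptions for the example. The subscript on each common Z-score is the area in the upper tail, so the area to the left is one minus that value.

```python
from statistics import NormalDist

std_normal = NormalDist()

def z_for_upper_tail(alpha):
    """Z-score that leaves an area of alpha in the upper tail."""
    return std_normal.inv_cdf(1 - alpha)

def central_area(z):
    """Area under the standard normal curve between -z and z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

for alpha in (0.05, 0.025, 0.005):
    z = z_for_upper_tail(alpha)
    print(alpha, round(z, 3), round(central_area(z), 2))
```

The computed values are about 1.645, 1.960, and 2.576. The last one differs slightly from the table-based 2.575 because a printed table only allows the closest available area to be read off.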
Applications of the Normal Distribution
Typically, our normally distributed data do not have μ = 0 and σ = 1, but we can relate any
normal distribution to the standard normal distribution using the Z-score. We can transform
values of x to values of z.

$z = \frac{x - \mu}{\sigma}$

For example, if a normally distributed random variable has μ = 6 and σ = 2, then a value of x =
7 corresponds to a Z-score of 0.5.

$z = \frac{7 - 6}{2} = 0.5$

This tells you that 7 is one-half a standard deviation above its mean. We can use this
relationship to find probabilities for any normal random variable.

To find the area for values of X, a normal random variable, draw a picture of the area of interest,
convert the x-values to Z-scores using the Z-score formula, and then use the standard normal table to
find areas to the left, to the right, or in between.
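The procedure can be sketched in Python for the example with μ = 6 and σ = 2: convert each x-value to a Z-score, then use the standard normal cumulative area in place of the table. The helper name and the second x-value (5) are assumptions added for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mu) / sigma

mu, sigma = 6, 2

z_upper = z_score(7, mu, sigma)    # 0.5
z_lower = z_score(5, mu, sigma)    # -0.5

area_left = std_normal.cdf(z_upper)                               # P(X < 7)
area_right = 1 - area_left                                        # P(X > 7)
area_between = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)  # P(5 < X < 7)

print(z_upper, round(area_left, 4), round(area_right, 4), round(area_between, 4))
# 0.5 0.6915 0.3085 0.3829
```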

Assessing Normality
If the distribution is unknown and the sample size is less than 30, so that we cannot rely on the Central Limit
Theorem, we have to assess the assumption of normality. Our primary method is the normal
probability plot. This plot graphs the observed data, ranked in ascending order, against the
“expected” Z-score of that rank. If the sample data were taken from a normally distributed
random variable, then the plot would be approximately linear.
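The coordinates of a normal probability plot can be computed by hand. This sketch ranks the data, assigns each rank an expected Z-score, and measures how linear the relationship is with a correlation coefficient. The plotting positions (i − 0.375)/(n + 0.25) are one common convention and are an assumption here, since statistical packages differ in the exact formula they use; both data sets are hypothetical.

```python
from statistics import NormalDist, mean

def expected_z_scores(n):
    """Expected Z-score for each rank 1..n (assumed plotting-position formula)."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def probability_plot_correlation(data):
    """Correlation between the sorted data and their expected Z-scores."""
    xs = sorted(data)
    zs = expected_z_scores(len(xs))
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5

# Values that fall exactly on a line against the expected Z-scores.
normal_like = [10 + 2 * z for z in expected_z_scores(15)]

# A strongly right-skewed set of values.
skewed = [1, 1, 1, 1, 2, 2, 3, 5, 9, 40]

r_normal_like = probability_plot_correlation(normal_like)
r_skewed = probability_plot_correlation(skewed)

print(round(r_normal_like, 3), round(r_skewed, 3))
```

A correlation close to 1 corresponds to points that hug the straight line; the skewed data give a clearly lower value, matching the curved pattern such data show on the plot.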
Examine the following probability plot. The center line is the relationship we would expect to see
if the data were drawn from a perfectly normal distribution. Notice how the observed data (red
dots) loosely follow this linear relationship. Minitab also computes an Anderson-Darling test to
assess normality. The null hypothesis for this test is that the sample data have been drawn from
a normally distributed population. A p-value greater than 0.05 means we fail to reject this
hypothesis, so the data are consistent with the assumption of normality.
The observed data do not follow a linear pattern and the p-value for the A-D test is less than
0.005 indicating a non-normal population distribution.


Normality cannot be assumed. You must always verify this assumption. Remember, the
probabilities we are finding come from the standard NORMAL table. If our data are NOT
normally distributed, then these probabilities DO NOT APPLY.
Do you know if the population is normally distributed?
Do you have a large enough sample size (n≥30)? Remember the Central Limit Theorem?
Did you construct a normal probability plot?
III. Application (Performance Task - 40%)
ACTIVITY 5
Answer the following questions:
1. Suppose that a study on the number of hours college students spend watching Netflix per
month is normally distributed with a mean of 48 hours and standard deviation of 3.24 hours.
What does the 68-95-99.7% rule tell us about the number of hours spent on Netflix in a given
month?
2. Suppose a class of 45 students take an exam and the recorded scores are normally
distributed with a mean score of 76 points and a standard deviation of 5.6 points. How many
scores will fall between 59.2 and 92.8?
IV. Assessment (Written Works - 30%)
Solve for the following:
1. Given the data set: 6, 2, 3, 5, 4, 9, 12, 27.
Find the following:
a. Mean:
b. Median:
c. Mode:
2. Given the data set:
Number      1  2  3  7  8
Frequency   1  2  5  4  2
Find the following:
a. Mean:
b. Median:
c. Mode:
V. Reflection (Performance Task - 40%)
You learnt about descriptive statistics. Discuss the use of statistical methods in processing
data. Write your response in paragraphs and use specific situations as examples. You will
be graded using the criteria below.
Criteria:
Content (10 pts.)
Organization of Ideas (3 pts.)
Relevance to the topic (5 pts.)
Brevity (2 pts.)

_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________
_________________________________________________________________________

Prepared by:

JEWELSON M. SANTOS, LPT,EdD,DHum


Professor

Reviewed by: Approved by:

JOHN MARC R. MENDOZA, MAEd, MLIS JESS JAY M. SAJISE, DBA


Program Head, School of Teacher Education Vice President of Academic Affairs External
