
Descriptive Statistics MBA

Descriptive statistics are used to describe basic features of data through simple summaries. There are three major characteristics examined: distribution, central tendency (mean, median, mode), and dispersion (range, variance, standard deviation). The normal distribution is a bell-shaped curve where the mean, median and mode are equal and about 68% of values fall within one standard deviation of the mean. Descriptive statistics provide simple descriptions of data, while inferential statistics are used to make generalizations beyond the sample data.

Uploaded by

Kritika Jaiswal
Copyright
© All Rights Reserved

Descriptive Statistics

Descriptive statistics are used to describe the basic features of the data in a study. They
provide simple summaries about the sample and the measures. Together with simple graphics
analysis, they form the basis of virtually every quantitative analysis of data. Descriptive
statistics are typically distinguished from inferential statistics. With descriptive statistics you
are simply describing what is or what the data shows. With inferential statistics, you are
trying to reach conclusions that extend beyond the immediate data alone. For instance, we use
inferential statistics to try to infer from the sample data what the population might think. Or,
we use inferential statistics to make judgments of the probability that an observed difference
between groups is a dependable one or one that might have happened by chance in this study.

There are three major characteristics of a single variable that we tend to look at:

• The distribution
• The central tendency
• The dispersion

In most situations, we would describe all three of these characteristics for each of the
variables in our study.

The Distribution: Data can be "distributed" or spread out in different ways. It can be spread
out more on the left or more on the right, or it can be all jumbled up.

But there are many cases where the data tends to cluster around a central value with no bias
left or right, forming a symmetric, bell-shaped curve.

[Figure: bell-shaped curve of a normal distribution]

This bell-shaped distribution is a Normal Distribution. It is often called a "Bell Curve"
because it looks like a bell.
The Normal Distribution has the following properties:
• It works on the principle of probability (the likelihood that an event will occur).
• Mean = median = mode.
• It is symmetric about the center.
• 50% of values are less than the mean and 50% are greater than the mean.

Normal Distribution: The Concept

The normal distribution is a probability distribution whose values are plotted symmetrically, with
most of the results situated around the mean. Values are equally likely to fall either above or
below the mean. Values cluster close to the mean, and the curve then tails off
symmetrically away from the mean. The normal distribution is also known as a "Gaussian
distribution" or "bell curve".

The normal distribution is produced by the normal density function,

\[ p(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)} \]

In this exponential function, e is the constant 2.71828…, μ is the mean, and σ is the standard deviation.
The probability of a random variable falling within any given range of values is equal to the
proportion of the area enclosed under the function's graph between the given values and
above the x-axis. Because the denominator σ√(2π), known as the normalizing coefficient,
causes the total area enclosed by the graph to be exactly equal to unity, probabilities can be
obtained directly from the corresponding area; for example, an area of 0.5 corresponds to a
probability of 0.5. Tables were generated in the 19th century for the special case of μ = 0 and σ =
1, known as the standard normal distribution, and these tables can be used for any normal
distribution after the variables are suitably rescaled by subtracting their mean and dividing by
their standard deviation, (x − μ)/σ.
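The link between area and probability, and the rescaling to the standard normal distribution, can be checked numerically. The following is a minimal Python sketch, not a definitive implementation: the function names, the trapezoidal integration, and the example values (mean 170, standard deviation 10) are illustrative assumptions and do not come from the text.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area(a, b, mu, sigma, steps=10_000):
    """Trapezoidal approximation of the area under the density between a and b."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

# Hypothetical variable with mean 170 and standard deviation 10.
mu, sigma = 170.0, 10.0

# The total area under the curve is (very nearly) 1.
total_area = area(mu - 10 * sigma, mu + 10 * sigma, mu, sigma)

# Probability of a value between 160 and 180, computed on the raw scale ...
p_raw = area(160.0, 180.0, mu, sigma)

# ... and on the standard normal scale after rescaling with (x - mu) / sigma.
z_low = (160.0 - mu) / sigma
z_high = (180.0 - mu) / sigma
p_standard = area(z_low, z_high, 0.0, 1.0)

print(round(total_area, 6), round(p_raw, 4), round(p_standard, 4))
```

Both probabilities agree, which is why a single table for μ = 0 and σ = 1 is enough for any normal distribution.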

Measures of Shape
As described earlier, the normal distribution is bell shaped. The shape of a distribution is
assessed by examining its skewness and kurtosis.
Skewness:
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution,
or data set, is symmetric if it looks the same to the left and right of the center point. Skewness
is the tendency of deviation from the mean to be larger in one direction than in another. A
positively skewed distribution has a "tail" which is pulled in the positive direction. A
negatively skewed distribution has a "tail" which is pulled in the negative direction.

It is calculated by the formula:

\[ \text{Skewness} = \frac{\sum_{i=1}^{N} (X_i - \mu)^3}{N\sigma^3} \]

where μ is the mean, σ is the standard deviation, and N is the number of data points. A
normal distribution has a skewness of 0.
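As a rough illustration, skewness can be computed in a few lines of Python. This sketch assumes the population form of the coefficient (third central moment divided by the cube of the standard deviation, both using N in the denominator); the function name and the two small data sets are hypothetical.

```python
import math

def skewness(values):
    """Population skewness: mean cubed deviation divided by sigma cubed."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    return sum((x - mu) ** 3 for x in values) / (n * sigma ** 3)

symmetric = [1, 2, 3, 4, 5]        # looks the same left and right of the center
right_tailed = [1, 1, 2, 2, 3, 10]  # tail pulled in the positive direction
left_tailed = [-10, -3, -2, -2, -1, -1]  # mirror image of right_tailed

print(skewness(symmetric))          # 0.0
print(skewness(right_tailed) > 0)   # True
print(skewness(left_tailed) < 0)    # True
```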
Kurtosis:
Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution.
That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather
rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean
rather than a sharp peak. A uniform distribution would be the extreme case.
A normal distribution is a mesokurtic distribution. A pure leptokurtic distribution has a higher
peak than the normal distribution and has heavier tails. A pure platykurtic distribution has a
lower peak than a normal distribution and lighter tails.

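It is calculated by the formula:

\[ \text{Kurtosis} = \frac{\sum_{i=1}^{N} (X_i - \mu)^4}{N\sigma^4} - 3 \]

where μ is the mean, σ is the standard deviation, and N is the number of data points.
Subtracting 3 gives the excess kurtosis, so that a normal distribution has kurtosis equal to 0.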
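A short Python sketch can make the sign of kurtosis concrete. It assumes the excess form (fourth central moment divided by the squared variance, minus 3), which is the convention under which a normal distribution scores 0; the function name and both data sets are made up for illustration.

```python
def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (0 for a normal curve)."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n
    fourth_moment = sum((x - mu) ** 4 for x in values) / n
    return fourth_moment / variance ** 2 - 3.0

# Evenly spread values (roughly uniform): flat top, light tails.
flat = list(range(1, 101))

# Mostly identical values with two extreme ones: sharp peak, heavy tails.
heavy_tailed = [0] * 98 + [-10, 10]

print(excess_kurtosis(flat) < 0)          # True  (platykurtic)
print(excess_kurtosis(heavy_tailed) > 0)  # True  (leptokurtic)
```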

Need for the Normal Distribution

Many things actually are normally distributed, or very close to it. For example, height and intelligence are
approximately normally distributed; measurement errors also often have a normal distribution.

The normal distribution is easy to work with mathematically. In many practical cases, the methods
developed using normal theory work quite well even when the distribution is not normal.

There is a very strong connection between the size of a sample N and the extent to which a sampling
distribution approaches the normal form. Many sampling distributions based on large N can be
approximated by the normal distribution even though the population distribution itself is definitely not
normal.
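The connection between sample size and normality can be illustrated with a small simulation. The sketch below is only a demonstration under stated assumptions: the population is taken to be exponential (clearly not normal), the sample sizes 2 and 50 and the 5,000 repetitions are arbitrary choices, and skewness is used as a simple indicator of how far the distribution of sample means is from the symmetric normal shape.

```python
import random

def skewness(values):
    """Population skewness; 0 for a perfectly symmetric distribution."""
    n = len(values)
    mu = sum(values) / n
    sigma = (sum((x - mu) ** 2 for x in values) / n) ** 0.5
    return sum((x - mu) ** 3 for x in values) / (n * sigma ** 3)

def sample_means(sample_size, repetitions, rng):
    """Means of repeated samples drawn from a skewed (exponential) population."""
    return [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(repetitions)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable

skew_small_n = skewness(sample_means(2, 5000, rng))
skew_large_n = skewness(sample_means(50, 5000, rng))

# The means of larger samples are much closer to symmetric.
print(abs(skew_large_n) < abs(skew_small_n))
```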
Central Tendency: The central tendency of a distribution is an estimate of the "center" of a
distribution of values. There are three major types of estimates of central tendency:

• Mean
• Median
• Mode

The Mean or average is probably the most commonly used method of describing central
tendency. It is given by the formula:

\[ \mu = \frac{\sum X}{N} \]

μ = Mean

X = Random Variable

N = No. of Respondents

To compute the mean, add up all the values and divide by the number of values.
For example, the mean or average quiz score is determined by summing all the scores and
dividing by the number of students taking the exam. Consider the test score
values:

15, 20, 21, 20, 36, 15, 25, 15

The sum of these 8 values is 167, so the mean is 167/8 = 20.875.

The Median is the value above which half of the values fall and below which half of the
values fall; it is the 50th percentile. To find it, arrange the data in ascending or descending
order. If the number of values is odd, the median is the middle value. If the number of
values is even, the median is calculated by adding the two middle values and
dividing their sum by 2.

If we order the 8 scores shown above, we would get:

15,15,15,20,20,21,25,36

There are 8 scores, and scores #4 and #5 represent the halfway point. Since both of these scores
are 20, the median is 20.

The mode is the most frequently occurring value in the set of scores. To determine the mode,
you might again order the scores as shown above, and then count each one. The most
frequently occurring value is the mode. In the example given above, the value 15 occurs three
times and is the mode. In some distributions there is more than one modal value. For
instance, in a bimodal distribution there are two values that occur most frequently.

Note: Notice that for the same set of 8 scores we got three different values -- 20.875, 20,
and 15 -- for the mean, median and mode respectively. If the distribution is truly normal
(i.e., bell-shaped), the mean, median and mode are all equal to each other.
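The three values for the 8 test scores can be reproduced with Python's standard `statistics` module. This is a small sketch for checking the arithmetic; the variable names are illustrative, and the bimodal list at the end is a made-up example.

```python
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]

mean_score = statistics.mean(scores)      # sum of the scores divided by 8
median_score = statistics.median(scores)  # average of the 4th and 5th ordered scores
mode_score = statistics.mode(scores)      # most frequently occurring score

print(mean_score, median_score, mode_score)  # 20.875 20.0 15

# A bimodal example: two values tie for the highest frequency.
bimodal_modes = statistics.multimode([1, 1, 2, 2, 3])
print(bimodal_modes)  # [1, 2]
```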

Dispersion: Dispersion refers to the spread of the values around the central tendency. There
are three common measures of dispersion: the range, the variance and the standard deviation.

The range is simply the highest value minus the lowest value. In our example distribution,
the high value is 36 and the low is 15, so the range is 36 - 15 = 21.

Variance is a measure of the dispersion or deviation of a set of data points around their
mean value. It is the mathematical expectation of the squared deviations from
the mean, and is denoted by the symbol σ². To calculate the variance, first calculate the
mean, then subtract the mean from each value, square each result, and take the average of
the squared results.
It is calculated using the formula:

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]

Example: Five people have Rs. 600, 470, 170, 430 and 300. Find the variance.
Answer: Mean (μ) = (600 + 470 + 170 + 430 + 300)/5 = 1970/5 = 394.
Variance = [(600 − 394)² + (470 − 394)² + (170 − 394)² + (430 − 394)² + (300 − 394)²]/5 = 108,520/5 = 21,704.

The Standard Deviation is a more accurate and detailed estimate of dispersion than the range
because it shows the relation that the set of scores has to the mean of the sample. It is the
square root of the variance and is calculated as:

\[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]

In the above example of variance, the standard deviation would be √21,704 ≈ Rs. 147.32.
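The dispersion measures from the examples can be verified with a short Python sketch. It assumes the population formulas (dividing by N), which match the worked example; `statistics.pvariance` and `statistics.pstdev` are the standard-library functions for that case, and the variable names are illustrative.

```python
import statistics

# Range of the 8 test scores used earlier.
scores = [15, 20, 21, 20, 36, 15, 25, 15]
score_range = max(scores) - min(scores)
print(score_range)  # 21

# Variance and standard deviation of the five amounts (in Rs.).
amounts = [600, 470, 170, 430, 300]
mean_amount = statistics.mean(amounts)
variance = statistics.pvariance(amounts)   # average squared deviation from the mean
std_dev = statistics.pstdev(amounts)       # square root of the variance

print(mean_amount, variance, round(std_dev, 2))  # 394 21704 147.32
```

If the five amounts were treated as a sample rather than the whole population, `statistics.variance` and `statistics.stdev` (which divide by N − 1) would be the usual choice and would give larger values.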
Note: Once you know the mean and standard deviation of the population, you can tell how
far your data points lie from the mean and what percentage of the data falls within a given
distance of it. In a normal distribution:
• About 68% of the distribution lies within one standard deviation of the mean.
• About 95% of the distribution lies within two standard deviations of the mean.
• About 99.7% of the distribution lies within three standard deviations of the mean.
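These three percentages follow from the normal curve itself and can be recovered with the error function in Python's `math` module. The sketch below is illustrative; the helper name is an assumption, and it relies on the standard identity that the share of a normal distribution within k standard deviations of the mean equals erf(k/√2).

```python
import math

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```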
Presenting the Univariate Data Analysis

A basic way of presenting univariate data is to create a frequency distribution of the
individual cases, which involves presenting the number of cases in the sample that fall into
each category of values of the variable studied. The frequency (f) of a particular observation is the
number of times the observation occurs in the data. The distribution of a variable is the
pattern of frequencies of the observations. This can be presented in a table, a histogram,
a bar chart or a similar form of graphical representation.
Frequency distributions can show either the actual number of observations falling in each
range or the percentage of observations. When a frequency distribution is expressed in
percentages, it is called a relative frequency distribution. Frequency distribution
tables can be used for both categorical and numeric variables. Continuous variables should
only be used with class intervals.
A sample distribution table and a bar chart for a univariate analysis are presented below:

Age range    Frequency    Percent
under 18     10           5
18–29        50           25
29–45        40           20
45–65        40           20
over 65      60           30

Valid cases: 200
Missing cases: 0

[Figure: bar chart of the age distribution]
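The percentages in the table are a relative frequency distribution, and they can be derived from the raw counts. The following Python sketch uses the frequencies from the table; the dictionary layout and the text-based bar chart (one mark per 5 cases) are illustrative choices rather than part of the original analysis.

```python
# Frequencies taken from the age-range table.
frequencies = {
    "under 18": 10,
    "18–29": 50,
    "29–45": 40,
    "45–65": 40,
    "over 65": 60,
}

total = sum(frequencies.values())  # number of valid cases

# Relative frequency: each count as a percentage of all valid cases.
relative = {label: 100 * count / total for label, count in frequencies.items()}

for label, count in frequencies.items():
    bar = "#" * (count // 5)  # simple text bar chart, one mark per 5 cases
    print(f"{label:<9} {count:>4} {relative[label]:>5.1f}%  {bar}")
```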

Apart from frequency distribution tables, descriptive statistics also includes the measures of
central tendency, dispersion and shape.
Dealing with Missing Data: There are certain situations in which respondents knowingly or
unknowingly do not answer certain questions. The missing responses from such
respondents are known as missing data.
The most common approach to dealing with missing data is listwise deletion, whereby we
simply omit the cases with missing data and run our analyses on what remains. This
approach is also known as complete case analysis. It results in a reduced sample size and
sometimes a biased estimate of the population parameter. Another approach is pairwise
deletion, in which each element of the inter-correlation matrix is estimated using all
available data. If one participant reports his income
and expenditure, but not his age, he is included in the correlation of income and expenditure,
but not in the correlations involving age. This approach also has several
disadvantages: the parameter estimates will be based on different sets of data, with
different sample sizes and different standard errors. Some researchers also substitute the
mean value for the missing data. Others use regression analysis to estimate the missing
values. Missing data coding should also be done with caution: the code assigned to missing
data should be a number that cannot be equal to any valid value of the variable obtained in
the survey. All other methods of presenting univariate data have been explained in the data
processing chapter.
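The difference between these approaches can be seen on a tiny data set. The Python sketch below is a minimal illustration under stated assumptions: the four respondent records are invented, missing answers are represented by `None`, and the helper names are hypothetical.

```python
# Hypothetical survey records; None marks a question the respondent skipped.
records = [
    {"income": 50, "expenditure": 40, "age": 30},
    {"income": 60, "expenditure": 45, "age": None},
    {"income": None, "expenditure": 30, "age": 24},
    {"income": 80, "expenditure": 60, "age": 45},
]

def listwise_delete(rows):
    """Keep only complete cases (no missing value in any variable)."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise_cases(rows, var_a, var_b):
    """Keep every case that has both variables, whatever else is missing."""
    return [(r[var_a], r[var_b]) for r in rows
            if r[var_a] is not None and r[var_b] is not None]

def mean_impute(rows, var):
    """Replace missing values of one variable with the mean of the observed values."""
    observed = [r[var] for r in rows if r[var] is not None]
    fill = sum(observed) / len(observed)
    return [r[var] if r[var] is not None else fill for r in rows]

complete = listwise_delete(records)
income_expenditure = pairwise_cases(records, "income", "expenditure")
income_age = pairwise_cases(records, "income", "age")
ages = mean_impute(records, "age")

print(len(complete))            # 2 cases survive listwise deletion
print(len(income_expenditure))  # 3 cases available for income vs. expenditure
print(len(income_age))          # 2 cases available for income vs. age
print(ages)                     # [30, 33.0, 24, 45]
```

The differing counts for the two pairs show why pairwise estimates end up based on different sample sizes.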
