100% found this document useful (1 vote)
1K views10 pages

Descriptive and Inferential Statistics

The document describes descriptive and inferential statistics. Descriptive statistics describe patterns in data through measures of central tendency and spread. Inferential statistics allow generalization from samples to populations through hypothesis testing, confidence intervals, and statistical modeling. Key differences are that descriptive statistics only describe sample data while inferential statistics enable broader conclusions.

Uploaded by

Ryan Menina
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
100% found this document useful (1 vote)
1K views10 pages

Descriptive and Inferential Statistics

The document describes descriptive and inferential statistics. Descriptive statistics describe patterns in data through measures of central tendency and spread. Inferential statistics allow generalization from samples to populations through hypothesis testing, confidence intervals, and statistical modeling. Key differences are that descriptive statistics only describe sample data while inferential statistics enable broader conclusions.

Uploaded by

Ryan Menina
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
You are on page 1/ 10

DESCRIPTIVE AND INFERENTIAL STATISTICS

The field of statistics is divided into two major divisions: descriptive and inferential. Each of
these segments is important, offering different techniques that accomplish different
objectives. Descriptive statistics describe what is going on in a population or data set.
Inferential statistics, by contrast, allow scientists to take findings from a sample group and
generalize them to a larger population.

The two types of statistics have some important differences.

Descriptive Statistics

Descriptive statistics is the type of statistics that probably springs to most people’s minds
when they hear the word “statistics.” In this branch of statistics, the goal is to describe.
Numerical measures are used to tell about features of a set of data. There are a number of
items that belong in this portion of statistics, such as:

 The average, or measure of the center of a data set, consisting of the mean, median,
mode, or midrange
 The spread of a data set, which can be measured with the range or standard deviation
 Overall descriptions of data such as the five number summary
 Measurements such as skewness and kurtosis
 The exploration of relationships and correlation between paired data
 The presentation of statistical results in graphical form

These measures are important and useful because they allow scientists to see patterns
among data, and thus to make sense of that data.

Descriptive statistics can only be used to describe the population or data set under study:
The results cannot be generalized to any other group or population.

Types of Descriptive Statistics

There are two kinds of descriptive statistics that social scientists use:

Measures of central tendency capture general trends within the data and are calculated and
expressed as the mean, median, and mode.

A mean tells scientists the mathematical average of all of a data set, such as the average age
at first marriage; the median represents the middle of the data distribution, like the age that
sits in the middle of the range of ages at which people first marry; and, the mode might be
the most common age at which people first marry.

Measures of spread describe how the data are distributed and relate to each other, including:
 The range, the entire range of values present in a data set
 The frequency distribution, which defines how many times a particular value occurs
within a data set
 Quartiles, subgroups formed within a data set when all values are divided into four
equal parts across the range
 Mean absolute deviation, the average of how much each value deviates from the mean
 Variance, which illustrates how much of a spread exists in the data
 Standard deviation, which illustrates the spread of data relative to the mean

Measures of spread are often visually represented in tables, pie and bar charts, and
histograms to aid in the understanding of the trends within the data.

Inferential Statistics

Inferential statistics are produced through complex mathematical calculations that allow
scientists to infer trends about a larger population based on a study of a sample taken from
it.

Scientists use inferential statistics to examine the relationships between variables within a
sample and then make generalizations or predictions about how those variables will
relate to a larger population.

It is usually impossible to examine each member of the population individually. So scientists


choose a representative subset of the population, called a statistical sample, and from this
analysis, they are able to say something about the population from which the sample came.
There are two major divisions of inferential statistics:

 A confidence interval gives a range of values for an unknown parameter of the


population by measuring a statistical sample. This is expressed in terms of an interval
and the degree of confidence that the parameter is within the interval.
 Tests of significance or hypothesis testing where scientists make a claim about the
population by analyzing a statistical sample. By design, there is some uncertainty in
this process. This can be expressed in terms of a level of significance.

Techniques that social scientists use to examine the relationships between variables, and
thereby to create inferential statistics, include linear regression analyses, logistic regression
analyses, ANOVA, correlation analyses, structural equation modeling, and survival analysis.
When conducting research using inferential statistics, scientists conduct a test of
significance to determine whether they can generalize their results to a larger population.
Common tests of significance include the chi-square and t-test. These tell scientists the
probability that the results of their analysis of the sample are representative of the
population as a whole.

Descriptive vs. Inferential Statistics


Although descriptive statistics is helpful in learning things such as the spread and center of
the data, nothing in descriptive statistics can be used to make any generalizations. In
descriptive statistics, measurements such as the mean and standard deviation are stated as
exact numbers.

Even though inferential statistics uses some similar calculations—such the mean and
standard deviation—the focus is different for inferential statistics. Inferential statistics start
with a sample and then generalizes to a population. This information about a population is
not stated as a number. Instead, scientists express these parameters as a range of potential
numbers, along with a degree of confidence.

HOW TO CALCULATE A SAMPLE STANDARD DEVIATION

A common way to quantify the spread of a set of data is to use the sample standard
deviation. Your calculator may have a built in standard deviation button, which typically has
a sx on it. Sometimes it’s nice to know what your calculator is doing behind the scenes.

The steps below break down the formula for a standard deviation into a process. If you're
ever asked to do a problem like this on a test, know that sometimes it’s easier to remember a
step by step process rather than memorizing a formula.

After we look at the process we will see how to use it to calculate a standard deviation.

The Process

1. Calculate the mean of your data set.


2. Subtract the mean from each of the data values and list the differences.
3. Square each of the differences from the previous step and make a list of the squares.
 In other words, multiply each number by itself.
 Be careful with negatives. A negative times a negative makes a positive.
4. Add the squares from the previous step together.
5. Subtract one from the number of data values you started with.
6. Divide the sum from step four by the number from step five.
7. Take the square root of the number from the previous step. This is the standard
deviation.
 You may need to use a basic calculator to find the square root.
 Be sure to use significant figures when rounding your answer.

A Worked Example
Suppose you're given the data set 1,2,2,4,6. Work through each of the steps to find the
standard deviation.

1. Calculate the mean of your data set.

The the mean of the data is (1+2+2+4+6)/5 = 15/5 = 3.

2. Subtract the mean from each of the data values and list the differences.

Subtract 3 from each of the values 1,2,2,4,6


1-3 = -2
2-3 = -1
2-3 = -1
4-3 = 1
6-3 = 3Your list of differences is -2,-1,-1,1,3

3. Square each of the differences from the previous step and make a list of the squares.

You need to square each of the numbers -2,-1,-1,1,3


Your list of differences is -2,-1,-1,1,3
(-2)2 = 4
(-1)2=1
(-1)2=1
12=1
32=9Your list of squares is 4,1,1,1,9

1. Add the squares from the previous step together.

You need to add 4+1+1+1+9=16

2. Subtract one from the number of data values you started with.

You began this process (it may seem like awhile ago) with five data values. One less
than this is 5-1 = 4.

3. Divide the sum from step four by the number from step five.

The sum was 16, and the number from the previous step was 4. You divide these two
numbers 16/4 = 4.

4. Take the square root of the number from the previous step. This is the standard
deviation.

Your standard deviation is the square root of 4, which is 2.

Tip: It’s sometimes helpful to keep everything organized in a table, like the one shown
below.
DataData-Mean(Data-Mean)2
1 -2 4
2 -1 1
2 -1 1
4 1 1
6 3 9

We next add up all of entries in the right column. This is the sum of the squared deviations.
Next divide by one less than the number of data values. Finally, we take the square root of
this quotient and we are done.

WHAT IS SKEWNESS IN STATISTICS


Some distributions of data, such as the bell curve are symmetric. This means that the right and the
left of the distribution are perfect mirror images of one another. Not every distribution of data is
symmetric. Sets of data that are not symmetric are said to be asymmetric. The measure of how
asymmetric a distribution can be is called skewness.

The mean, median and mode are all measures of the center of a set of data.

The skewness of the data can be determined by how these quantities are related to one another.

Skewed to the Right

Data that are skewed to the right have a long tail that extends to the right. An alternate way of
talking about a data set skewed to the right is to say that it is positively skewed. In this situation, the
mean and the median are both greater than the mode. As a general rule, most of the time for data
skewed to the right, the mean will be greater than the median. In summary, for a data set skewed to
the right:

 Always: mean greater than the mode


 Always: median greater than the mode
 Most of the time: mean greater than median

Skewed to the Left

The situation reverses itself when we deal with data skewed to the left. Data that are skewed to the
left have a long tail that extends to the left. An alternate way of talking about a data set skewed to the
left is to say that it is negatively skewed.

In this situation, the mean and the median are both less than the mode. As a general rule, most of
the time for data skewed to the left, the mean will be less than the median. In summary, for a data
set skewed to the left:
 Always: mean less than the mode
 Always: median less than the mode
 Most of the time: mean less than median

Measures of Skewness

It’s one thing to look at two sets of data and determine that one is symmetric while the other is
asymmetric. It’s another to look at two sets of asymmetric data and say that one is more skewed
than the other. It can be very subjective to determine which is more skewed by simply looking at the
graph of the distribution. This is why there are ways to numerically calculate the measure of
skewness.

One measure of skewness, called Pearson’s first coefficient of skewness, is to subtract the mean from
the mode, and then divide this difference by the standard deviation of the data. The reason for
dividing the difference is so that we have a dimensionless quantity. This explains why data skewed
to the right has positive skewness. If the data set is skewed to the right, the mean is greater than the
mode, and so subtracting the mode from the mean gives a positive number. A similar argument
explains why data skewed to the left has negative skewness.

Pearson’s second coefficient of skewness is also used to measure the asymmetry of a data set. For
this quantity, we subtract the mode from the median, multiply this number by three and then divide
by the standard deviation.

Applications of Skewed Data

Skewed data arises quite naturally in various situations.

Incomes are skewed to the right because even just a few individuals who earn millions of dollars can
greatly affect the mean, and there are no negative incomes. Similarly, data involving the lifetime of a
product, such as a brand of light bulb, are skewed to the right. Here the smallest that a lifetime can
be is zero, and long lasting light bulbs will impart a positive skewness to the data.

WHAT IS KURTOSIS

Distributions of data and probability distributions are not all the same shape. Some are
asymmetric and skewed to the left or to the right. Other distributions are bimodaland have
two peaks. Another feature to consider when talking about a distribution is the shape of the
tails of the distribution on the far left and the far right. Kurtosis is the measure of the
thickness or heaviness of the tails of a distribution.

The kurtosis of a distributions is in one of three categories of classification:

 Mesokurtic
 Leptokurtic
 Platykurtic
We will consider each of these classifications in turn. Our examination of these categories
will not be as precise as we could be if we used the technical mathematical definition of
kurtosis.

Mesokurtic

Kurtosis is typically measured with respect to the normal distribution. A distribution that
has tails shaped in roughly the same way as any normal distribution, not just the standard
normal distribution, is said to be mesokurtic. The kurtosis of a mesokurtic distribution is
neither high nor low, rather it is considered to be a baseline for the two other classifications.

Besides normal distributions, binomial distributions for which p is close to 1/2 are
considered to be mesokurtic.

Leptokurtic

A leptokurtic distribution is one that has kurtosis greater than a mesokurtic distribution.

Leptokurtic distributions are sometimes identified by peaks that are thin and tall. The tails
of these distributions, to both the right and the left, are thick and heavy. Leptokurtic
distributions are named by the prefix "lepto" meaning "skinny."

There are many examples of leptokurtic distributions.

One of the most well known leptokurtic distributions is Student's t distribution.

Platykurtic

The third classification for kurtosis is platykurtic. Platykurtic distributions are those that
have slender tails. Many times they possess a peak lower than a mesokurtic distribution.
The name of these types of distributions come from the meaning of the prefix "platy"
meaning "broad."

All uniform distributions are platykurtic. In addition to this the discrete probability
distribution from a single flip of a coin is platykurtic.

Calculation of Kurtosis

These classifications of kurtosis are still somewhat subjective and qualitative. While we
might be able to see that a distribution has thicker tails than a normal distribution, what if
we don’t have the graph of a normal distribution to compare with? What if we want to say
that one distribution is more leptokurtic than another?

To answer these kinds of questions we need not just a qualitative description of kurtosis, but
a quantitative measure. The formula used is μ 4/σ4 where μ4 is Pearson’s fourth moment
about the mean and sigma is the standard deviation.
Excess Kurtosis

Now that we have a way to calculate kurtosis, we can compare the values obtained rather
than shapes.

The normal distribution is found to have a kurtosis of three. This now becomes our basis for
mesokurtic distributions. A distribution with kurtosis greater than three is leptokurtic and a
distribution with kurtosis less than three is platykurtic.

Since we treat a mesokurtic distribution as a baseline for our other distributions, we can
subtract three from our standard calculation for kurtosis. The formula μ 4/σ4 - 3 is the
formula for excess kurtosis. We could then classify a distribution from its excess kurtosis:

 Mesokurtic distributions have excess kurtosis of zero.


 Platykurtic distributions have negative excess kurtosis.
 Leptokurtic distributions have positive excess kurtosis.

A Note on the Name

The word "kurtosis" seems odd on the first or second reading. It actually makes sense, but
we need to know Greek to recognize this.

Kurtosis is derived from a transliteration of the Greek word kurtos. This Greek word has the
meaning "arched" or "bulging," making it an apt description of the concept known as
kurtosis.

WHAT IS THE 5 NUMBER SUMMARY

There are a variety of descriptive statistics. Numbers such as the mean, median,
mode, skewness, kurtosis, standard deviation, first quartile and third quartile, to name a
few, each tell us something about our data. Rather than looking at these descriptive
statistics individually, sometimes combining them helps to give us a complete picture. With
this end in mind, the five-number summary is a convenient way to combine five descriptive
statistics.

Which Five Numbers?

It is clear that there are to be five numbers in our summary, but which five? The numbers
chosen are to help us know the center of our data, as well as how spread out the data points
are. With this in mind, the five-number summary consists of the following:

 The minimum – this is the smallest value in our data set.


 The first quartile – this number is denoted Q1 and 25% of our data falls below the first
quartile.
 The median – this is the midway point of the data. 50% of all data falls below the
median.
 The third quartile – this number is denoted Q3 and 75% of our data falls below the
third quartile.
 The maximum – this is the largest value in our data set.

The mean and standard deviation can also be used together to convey the center and the
spread of a set of data. However, both of these statistics are susceptible to outliers. The
median, first quartile, and third quartile are not as heavily influenced by outliers.

An Example

Given the following set of data, we will report the five number summary:

1, 2, 2, 3, 4, 6, 6, 7, 7, 7, 8, 11, 12, 15, 15, 15, 17, 17, 18, 20

There are a total of twenty points in the dataset. The median is thus the average of the tenth
and eleventh data values or:

(7 + 8)/2 = 7.5.

The median of the bottom half of the data is the first quartile.

The bottom half is:

1, 2, 2, 3, 4, 6, 6, 7, 7, 7

Thus we calculateQ1= (4 + 6)/2 = 5.

The median of the top half of the original data set is the third quartile. We need to find the
median of:

8, 11, 12, 15, 15, 15, 17, 17, 18, 20

Thus we calculateQ3= (15 + 15)/2 = 15.

We assemble all of the above results together and report that the five number summary for
the above set of data is 1, 5, 7.5, 12, 20.

Graphical Representation

Five number summaries can be compared to one another. We will find that two sets with the
similar means and standard deviations may have very different five number summaries. To
easily compare two five number summaries at a glance, we can use a boxplot, or box and
whiskers graph.
SOURCES:

Taylor, Courtney. (2018, March 2). Descriptive vs. Inferential Statistics. Retrieved from
https://www.thoughtco.com/differences-in-descriptive-and-inferential-statistics-3126224

Taylor, Courtney. (2018, April 9). What Is Skewness in Statistics? Retrieved from https://www.thoughtco.com/what-is-
skewness-in-statistics-3126242

Taylor, Courtney. (2016, December 27). What Is Kurtosis? Retrieved from https://www.thoughtco.com/what-is-kurtosis-
3126241

Taylor, Courtney. (2017, September 18). What Is the 5 Number Summary? Retrieved from
https://www.thoughtco.com/what-is-the-five-number-summary-3126237

Taylor, Courtney. (2017, August 10). How to Calculate a Sample Standard Deviation. Retrieved from
https://www.thoughtco.com/calculate-a-sample-standard-deviation-3126345

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy