
Fundamentals of Data Science unit 2

Unit 2 covers various aspects of describing data, focusing on frequency distributions, including types such as ungrouped, grouped, relative, and cumulative frequency distributions. It emphasizes the importance of organizing raw data to identify patterns, outliers, and variability through measures like range, variance, and standard deviation. The document also outlines the process for constructing frequency distributions, highlighting essential rules and steps for accurate representation of data.

Uploaded by kaleeswaranmmcas

UNIT-2

DESCRIBING DATA
Syllabus: UNIT II
Frequency distributions–Outliers–relative frequency distributions–
cumulative frequency distributions–frequency distributions for nominal
data–interpreting distributions–graphs–averages–mode–median–mean–
averages for qualitative and ranked data – describing variability–range–
variance–standard deviation–degrees of freedom–interquartile range–
variability for qualitative and ranked data.

Frequency Distributions
 A frequency distribution is a collection of observations produced by
sorting observations into classes and showing their frequency (f) of
occurrence in each class.
 A frequency distribution helps us to detect any pattern in the data
(assuming a pattern exists) by superimposing some order on the
inevitable variability among observations.
 The advantage of using frequency distributions is that they present raw
data in an organized, easy-to-read format. The most frequently occurring
scores are easily identified, as are score ranges, lower and upper limits,
cases that are not common, outliers, and total number of observations
between any given scores.
 A frequency distribution shows whether the observations are high or low
and whether they are concentrated in one area or spread out across
the entire scale.
 Different Types of Frequency distributions:
 Ungrouped frequency distribution.
 Grouped frequency distribution.
 Relative frequency distribution.
 Cumulative frequency distribution
Frequency Distribution for Ungrouped Data
 A frequency distribution produced whenever observations are sorted into
classes of single values is referred to as a frequency distribution for
ungrouped data.
 Frequency distributions for ungrouped data are much more informative
when the number of possible values is less than about 20.
 Example:
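As a minimal sketch, an ungrouped frequency distribution can be tallied with Python's `collections.Counter`. The scores and the helper name `ungrouped_frequency` are made up for illustration and are not taken from the source; the helper assumes whole-number observations.

```python
from collections import Counter

# Hypothetical quiz scores (illustration only, not from the source).
scores = [3, 5, 4, 5, 2, 5, 4, 3, 5, 4]


def ungrouped_frequency(observations):
    """Return (value, frequency) pairs, one class per single value, highest first."""
    counts = Counter(observations)
    # Walk every whole value between max and min so that
    # classes with zero frequency are still listed.
    return [
        (value, counts.get(value, 0))
        for value in range(max(observations), min(observations) - 1, -1)
    ]


table = ungrouped_frequency(scores)
print("Score   f")
for value, f in table:
    print(f"{value:>5} {f:>3}")
print(f"Total {sum(f for _, f in table):>3}")
```

Listing the classes from highest to lowest is only a layout choice; the frequencies are the same in either order.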

Frequency Distribution for Grouped Data


 A frequency distribution produced whenever observations are sorted into
classes of more than one value is referred to as a frequency distribution
for grouped data.
 Example:
Relative Frequency Distributions:
 Relative frequency distributions show the frequency of each class as a part or
fraction of the total frequency for the entire distribution.
 This type of distribution allows us to focus on the relative concentration of observations
among different classes within the same distribution.
 This type of distribution is especially helpful when we must compare two or more
distributions based on different total numbers of observations.
 The conversion to relative frequencies allows a direct comparison of the shapes of these
two distributions without having to adjust for the radically different total numbers of
observations.
 To convert a frequency distribution into a relative frequency distribution, divide the
frequency for each class by the total frequency for the entire distribution.
 Example:
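The conversion rule (divide each class frequency by the total frequency) can be sketched as follows. Both distributions are hypothetical and were chosen so that their totals differ by a factor of ten; the function name `relative_frequencies` is likewise an assumption.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total frequency of the distribution."""
    total = sum(frequencies.values())
    return {cls: f / total for cls, f in frequencies.items()}


# Two hypothetical grouped distributions with very different totals (20 vs. 200).
small_group = {"90-99": 2, "80-89": 5, "70-79": 8, "60-69": 5}
large_group = {"90-99": 20, "80-89": 50, "70-79": 80, "60-69": 50}

rel_small = relative_frequencies(small_group)
rel_large = relative_frequencies(large_group)

for cls in small_group:
    print(f"{cls}: {rel_small[cls]:.2f} vs {rel_large[cls]:.2f}")
```

Once converted, the two columns can be compared class by class even though one group is ten times larger than the other, and each column of relative frequencies sums to 1.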

Cumulative Frequency Distributions:
 Cumulative frequency distributions show the total number of observations in each
class and in all lower-ranked classes.
 This type of distribution can be used effectively with sets of scores, such as test scores for
intellectual or academic aptitude, when relative standing within the distribution assumes
primary importance. Under these circumstances, cumulative frequencies are usually
converted, in turn, to cumulative percentages.
 Cumulative percentages are often referred to as percentile ranks.
 To convert a frequency distribution into a cumulative frequency distribution, add to
the frequency of each class the sum of the frequencies of all classes ranked below it.
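A short sketch of the running-total rule, using `itertools.accumulate` from the standard library. The classes and frequencies are hypothetical and are listed from the lowest-ranked class upward so that each running total includes all lower-ranked classes.

```python
from itertools import accumulate

# Hypothetical grouped distribution, lowest-ranked class first.
classes = ["60-69", "70-79", "80-89", "90-99"]
frequencies = [5, 8, 5, 2]

# Each cumulative frequency is the class frequency plus all frequencies below it.
cumulative = list(accumulate(frequencies))

# Convert to cumulative percentages by dividing by the total frequency.
total = cumulative[-1]
cumulative_percent = [100 * c / total for c in cumulative]

for cls, f, cf, cp in zip(classes, frequencies, cumulative, cumulative_percent):
    print(f"{cls}  f={f:>2}  cumulative f={cf:>2}  cumulative %={cp:5.1f}")
```

The cumulative frequency of the highest class always equals the total number of observations, which is a quick check on the arithmetic.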
Constructing Frequency Distributions
 For producing a well-constructed frequency distribution, three rules are
essential and should not be violated.
1. Each observation should be included in one, and only one, class.
2. List all classes, even those with zero frequencies.
3. All classes should have equal intervals.
 Step-by-step procedure for constructing Frequency Distributions:
1. Find the range, that is, the difference between the largest and smallest
observations.
2. Find the class interval required to span the range by dividing the range
by the desired number of classes (ordinarily 10).
3. Round off to the nearest convenient value.
4. Determine where the lowest class should begin. (Ordinarily, this
number should be a multiple of the class interval.)
5. Determine where the lowest class should end by adding the class
interval to the lower boundary and then subtracting one unit of
measurement.
6. Working upward, list as many equivalent classes as are required to
include the largest observation.
7. Indicate with a tally the class in which each observation falls.
8. Replace the tally count for each class with a number, the frequency (f),
and show the total of all frequencies.
9. Supply headings for both columns and a title for the table.
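The procedure above can be sketched in Python. This is an illustration under stated assumptions, not a definitive implementation: the weights are hypothetical, the observations are whole numbers measured to the nearest unit, and the "convenient" class interval in step 3 is chosen by hand rather than computed.

```python
def grouped_frequency(observations, interval, unit=1):
    """Tally observations into equal-width classes, lowest class first.

    `interval` is the class interval after rounding to a convenient value.
    """
    lowest, highest = min(observations), max(observations)
    # Step 4: begin the lowest class at a multiple of the class interval.
    start = (lowest // interval) * interval
    table = []
    # Step 6: keep adding classes until the largest observation is covered.
    while start <= highest:
        # Step 5: a class ends one unit of measurement below the next class.
        end = start + interval - unit
        # Steps 7-8: count the observations in this class; zeros are kept.
        f = sum(1 for x in observations if start <= x <= end)
        table.append((start, end, f))
        start += interval
    return table


# Hypothetical weights (illustration only).
weights = [133, 145, 152, 158, 160, 165, 172, 180, 190, 205, 221, 245]

data_range = max(weights) - min(weights)  # step 1
raw_interval = data_range / 10            # step 2, aiming for about 10 classes
interval = 10                             # step 3, rounded by hand

table = grouped_frequency(weights, interval)
print("Weight     f")                     # step 9: column headings
for low, high, f in table:
    print(f"{low}-{high}  {f:>3}")
print(f"Total    {sum(f for _, _, f in table):>3}")
```

Because the classes are generated from a single starting point and a fixed interval, every observation falls in exactly one class, empty classes are listed, and all intervals are equal, which matches the three rules stated above.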
Frequency Distributions for Nominal Data
 When, among a set of observations, any single observation is a word, letter, or
numerical code, the data are nominal.
 Frequency distributions for qualitative data are easy to construct. Simply determine the
frequency with which observations occupy each class, and report these frequencies.
 Example:
In the example frequency distribution, Yes replies are approximately twice as
prevalent as No replies.
 Frequency distributions for nominal data can also be converted into relative frequency
distributions and, if the data can be ordered because of ordinal measurement, into
percentile ranks.

Interpreting Distributions
 When inspecting a distribution for the first time, train yourself to look at the entire table,
not just the distribution. Read the title, column headings, and any footnotes.
 After these preliminaries, inspect the content of the frequency distribution.
 When interpreting distributions, including distributions constructed by someone else,
keep an open mind.
Outliers
 A very extreme score that requires special attention because of its potential impact
on a summary of the data is called an outlier.
 Examples: a GPA of 0.06, an IQ of 170, summer wages of $62,000.
Dealing with Outliers
Check for Accuracy:
 Whenever you encounter an outlier, attempt to verify its accuracy.
 Example: Was a GPA of 3.06 recorded erroneously as 0.06?
 If the outlier survives an accuracy check, it should be treated as a legitimate score.
Might Exclude from Summaries:
 We might choose to segregate an outlier from any summary of the data.
 For example, we might relegate it to a footnote instead of using excessively wide
class intervals in order to include it in a frequency distribution. Or we might use
numerical summaries, such as the median and interquartile range, that are less
affected by extreme scores.
Might Enhance Understanding:
 A valid outlier can be viewed as the product of special circumstances; it can help us
understand the data.
 For example, we might understand better why crime rates differ among communities
by studying the special circumstances that produce a community with an extremely
low (or high) crime rate, or why learning rates differ among third graders by
studying a third grader who learns very rapidly (or very slowly).
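To illustrate why the median and interquartile range are suggested when an outlier is present, the sketch below compares summaries with and without one extreme score. The wage figures are hypothetical (only the $62,000 value echoes the example above), and `iqr` is a helper defined here on top of `statistics.quantiles`, which uses its default "exclusive" method.

```python
import statistics

# Hypothetical summer wages; the last value added below is the outlier.
wages = [2100, 2400, 2500, 2700, 2900, 3100, 3300]
with_outlier = wages + [62000]


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


for label, data in (("without outlier", wages), ("with outlier", with_outlier)):
    print(
        f"{label:>15}: mean={statistics.mean(data):.0f} "
        f"median={statistics.median(data):.0f} "
        f"range={max(data) - min(data)} IQR={iqr(data):.0f}"
    )
```

In this made-up data set the mean and range change drastically when the outlier is included, while the median and interquartile range move only slightly.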

Graphs
(Describing Data using Graphs)
 Data can be described clearly and concisely with the aid of a well-constructed frequency
distribution.
 Data can often be described even more vividly by converting frequency distributions
into graphs.
 Most common types of graphs:
 Graphs for Quantitative Data
 Histograms
 Frequency Polygon
 Stem and Leaf Displays
 Graphs for Qualitative Data
 Bar graph
Histogram
 A bar-type graph for quantitative data. The common boundaries between adjacent bars
emphasize the continuity of the data, as with continuous variables.
 Important features of histograms:
 Equal units along the horizontal axis (the X axis, or abscissa) reflect the
various class intervals of the frequency distribution.
 Equal units along the vertical axis (the Y axis, or ordinate) reflect increases in
frequency.
 The intersection of the two axes defines the origin at which both numerical
scales equal 0.
 Numerical scales always increase from left to right along the horizontal axis
and from bottom to top along the vertical axis.
 The body of the histogram consists of a series of bars whose heights reflect
the frequencies for the various classes.
 Example:

Frequency Polygon
 An important variation on a histogram is the frequency polygon, or line
graph.
 Frequency polygons are particularly useful when two or more frequency
distributions or relative frequency distributions are to be included in the
same graph.
 Frequency polygons can be constructed directly from frequency distributions.
They can also be constructed from a histogram.
 The step-by-step transformation of a histogram into a frequency polygon:
 A: This panel shows the histogram for the weight distribution.
 B: Place dots at the midpoints of each bar top or, in the absence of
bar tops, at midpoints for classes on the horizontal axis, and
connect them with straight lines.
 C: Anchor the frequency polygon to the horizontal axis. First,
extend the upper tail to the midpoint of the first unoccupied class
on the upper flank of the histogram. Then extend the lower tail to
the midpoint of the first unoccupied class on the lower flank of the
histogram. Now all of the area under the frequency polygon is
enclosed completely.
 D: Finally, erase all of the histogram bars, leaving only the
frequency polygon.
 Example:
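The points of a frequency polygon can also be computed directly from a grouped frequency distribution. The sketch below follows panels B and C: it takes class midpoints and then anchors both tails at the midpoints of the first unoccupied classes. The classes and frequencies are hypothetical, and `polygon_points` is an illustrative helper name.

```python
def polygon_points(classes, frequencies, interval):
    """Return (midpoint, frequency) points for a frequency polygon.

    `classes` holds (lower limit, upper limit) pairs, lowest class first.
    """
    # Panel B: one dot per class, placed at the class midpoint.
    points = [
        ((low + high) / 2, f) for (low, high), f in zip(classes, frequencies)
    ]
    # Panel C: anchor each tail to the horizontal axis, one class beyond the data.
    lower_anchor = (points[0][0] - interval, 0)
    upper_anchor = (points[-1][0] + interval, 0)
    return [lower_anchor] + points + [upper_anchor]


# Hypothetical grouped distribution with a class interval of 10.
classes = [(130, 139), (140, 149), (150, 159)]
frequencies = [3, 7, 4]

points = polygon_points(classes, frequencies, interval=10)
for x, y in points:
    print(f"midpoint {x:6.1f}  frequency {y}")
```

Joining these points with straight lines in any plotting tool gives the polygon; the two zero-frequency anchors are what enclose the area under it.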
Stem and Leaf Displays
 Stem and leaf displays are ideal for summarizing distributions, such as that for
weight data, without destroying the identities of individual observations.
 A stem and leaf display is a device for sorting quantitative data on the basis of leading and
trailing digits.
 Stem and leaf displays represent statistical bargains. Just a few minutes of work produces
a description of data that is both clear and complete.
 Even though rarely appearing in published reports, stem and leaf displays often serve as
the first step toward organizing data.
 A good stem and leaf display
 shows the first digits of the number (thousands, hundreds or tens) as the stem and
shows the last digit (ones) as the leaf.
 usually uses whole numbers. Anything that has a decimal point is rounded to the
nearest whole number. For example, test results, speeds, heights, weights, etc.
 looks like a bar graph when it is turned on its side.
 shows how the data are spread—that is, highest number, lowest number,
most common number and outliers
 To construct the stem and leaf display
 On the left hand side of the page, write down the thousands, hundreds or
tens (all digits but the last one). These will be your stems.
 Draw a line to the right of these stems.
 On the other side of the line, write down the ones (the last digit of a number).
These will be your leaves.
 Example 1: A teacher asked 10 of her students how many books they had read in the last
12 months. Their answers were as follows: 12, 23, 19, 6, 10, 7, 15, 25, 21, 12. Prepare a
stem and leaf display for these data.
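A minimal sketch of the construction steps applied to the data in Example 1. The function name `stem_and_leaf` is an assumption, and the code assumes non-negative whole numbers, with the tens (and any higher digits) as the stem and the ones digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return (stem, leaves) pairs with stems and leaves in ascending order."""
    stems = defaultdict(list)
    for v in sorted(values):
        # All digits but the last form the stem; the last digit is the leaf.
        stems[v // 10].append(v % 10)
    # Include stems that have no leaves so gaps in the data remain visible.
    return [(s, stems.get(s, [])) for s in range(min(stems), max(stems) + 1)]


books = [12, 23, 19, 6, 10, 7, 15, 25, 21, 12]
display = stem_and_leaf(books)

for stem, leaves in display:
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

For these ten answers the display has three stems: 0 with leaves 6 7, 1 with leaves 0 2 2 5 9, and 2 with leaves 1 3 5.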
Bimodal
 It reflects the coexistence of two different types of observations in the same
distribution.
 For instance, the distribution of the ages of residents in a neighborhood consisting
largely of either new parents or their infants has a bimodal shape.
Positively Skewed
 A lopsided distribution caused by a few extreme observations in the positive
direction (to the right of the majority of observations) is a positively skewed
distribution.
 The distribution of incomes among U.S. families has a pronounced positive skew,
with most family incomes under $200,000 and relatively few family incomes
spanning a wide range of values above $200,000.
Negatively Skewed
 A lopsided distribution caused by a few extreme observations in the negative
direction (to the left of the majority of observations) is a negatively skewed
distribution.
 The distribution of ages at retirement among U.S. job holders has a pronounced
negative skew, with most retirement ages at 60 years or older and relatively few
retirement ages spanning the wide range of ages younger than 60.
Bar graphs: A Graph for Qualitative (Nominal) Data
 Bar graphs are often used with qualitative data and sometimes with discrete
quantitative data.
 They resemble histograms except that gaps separate adjacent bars in bar graphs.
Example 1:
Interpreting graphs
 When interpreting graphs, beware of various unscrupulous techniques, such as
using bizarre combinations of axes to either exaggerate or suppress a particular data
pattern.
Describing Data with Averages
 Averages consist of numbers (or words) about which the data are, in some sense,
centered. They are often referred to as measures of central tendency.
 A measure of center is a single number used to describe a set of numeric data. It
describes a typical value from the data set.
 Several types of average yield numbers or words that attempt to describe, most
generally, the middle or typical value for a distribution.
 Three different measures of central tendency are:
 Mode
 Median
 Mean.
 Each of these has its special uses, but the mean is the most important average in both
descriptive and inferential statistics.
 Mode
  The mode equals the value of the most frequently occurring or typical
 score.
  It is easy to assign a value to the mode. If the data are organized.
 However, if the data are not organized, some counting may be required.
  The mode is readily understood as the most prevalent or typical value.
  Distributions can have more than one mode (or no mode at all).
  Distributions with two obvious peaks, even though they are not exactly
 the same height, are referred to as bimodal.
  Distributions with more than two peaks are referred to as multimodal.
  The presence of more than one mode might reflect important differences
 among subsets of data. For instance, the distribution of weights for both
 male and female statistics students would most likely be bimodal,
 reflecting the combination of two separate weight distributions—a
 heavier one for males and a lighter one for females.
  Example 1: Determine the mode for the following retirement ages: 60, 63,
 45, 63, 65, 70, 55, 63, 60, 65, 63.
 Answer: mode = 63
  Example 2: The owner of a new car conducts six gas mileage tests and
 obtains the following results, expressed in miles per gallon: 26.3, 28.7,
 27.4, 26.6, 27.4, 26.9. Find the mode for these data.
 Answer: mode = 27.4
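The counting described above can be sketched in a few lines of Python. This is only an illustration: the helper name `modes` is an assumption introduced here, and it returns a list so that distributions with more than one mode are handled as well.

```python
from collections import Counter

def modes(scores):
    """Return every value tied for the highest frequency, in first-seen order."""
    counts = Counter(scores)          # value -> number of occurrences
    top = max(counts.values())        # highest frequency in the data
    return [value for value, count in counts.items() if count == top]

retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
gas_mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

print(modes(retirement_ages))  # [63]
print(modes(gas_mileage))      # [27.4]
```

If every score occurs exactly once, this helper returns all of the values, which would normally be reported as "no mode".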

 Median
  The median reflects the middle value when observations are ordered from
 least to most.
  The median splits a set of ordered observations into two equal parts, the
 upper and lower halves.
  In other words, the median has a percentile rank of 50, since observations
 with equal or smaller values constitute 50 percent of the entire distribution.
  To find the median, scores always must be ordered from least to most (or
 vice versa). This task is straightforward with small sets of data but becomes
 increasingly cumbersome with larger sets of data that must be ordered
 manually.
 When the total number of scores is odd, there is a single middle-ranked
 score, and the value of the median equals the value of this score. When the
 total number of scores is even, the value of the median equals a value
 midway between the values of the two middlemost scores.
  In either case, the value of the median always reflects the value of
 middle-ranked scores, not the position of these scores among the set of
 ordered scores.
  Example 1: Find the median for the following retirement ages: 60, 63, 45,
 63, 65, 70, 55, 63, 60, 65, 63.
 Solution: median = 63
  Example 2: Find the median for the following gas mileage tests: 26.3,
 28.7, 27.4, 26.6, 27.4, 26.9.
 Solution: median = 27.15 (halfway between 26.9 and 27.4)
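The odd and even cases described above can be checked with a short Python sketch. The function name `median` is an illustrative choice made here. Python's standard `statistics.median` follows the same rule.

```python
def median(scores):
    """Middle value of the ordered scores (mean of the two middle values if n is even)."""
    ordered = sorted(scores)          # scores must be ordered first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: a single middle-ranked score
        return ordered[middle]
    # even count: midway between the two middlemost scores
    return (ordered[middle - 1] + ordered[middle]) / 2

retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
gas_mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

print(median(retirement_ages))        # 63
print(round(median(gas_mileage), 2))  # 27.15
```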
 Mean
  The mean is the most common average.
  The mean is found by adding all scores and then dividing by the number
 of scores.
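  That is, mean = (sum of all scores) / (number of scores), or in symbols
 $\bar{X} = \frac{\sum X}{n}$, where $n$ is the number of scores.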
  There is no requirement that presidential terms be ranked before calculating
 the mean.
  Even when large sets of unorganized data are involved, the calculation of
 the mean is usually straightforward, particularly with the aid of a
 calculator or computer.
  The mean serves as the balance point for its frequency distribution.
  The mean cannot be used with qualitative data.
  Example 1: Find the mean for the following retirement ages: 60, 63, 45,
 63, 65, 70, 55, 63, 60, 65, 63.
 Solution: mean = 672 / 11 ≈ 61.09
  Example 2: Find the mean for the following gas mileage tests: 26.3, 28.7,
 27.4, 26.6, 27.4, 26.9.
 Solution: mean = 163.3 / 6 ≈ 27.22
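As a rough check of the two mean examples, the calculation can be sketched in Python. The helper name `mean` is an illustrative choice made here. The last lines also illustrate the "balance point" property: deviations from the mean sum to zero, apart from floating-point rounding.

```python
def mean(scores):
    """Sum of all scores divided by the number of scores."""
    return sum(scores) / len(scores)

retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
gas_mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

print(round(mean(retirement_ages), 2))  # 61.09
print(round(mean(gas_mileage), 2))      # 27.22

# Balance point: positive and negative deviations from the mean cancel out.
m = mean(retirement_ages)
total_deviation = sum(score - m for score in retirement_ages)
print(abs(total_deviation) < 1e-9)      # True
```

Note that no sorting is needed, unlike the median.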

Which Average?
 When a distribution of scores is not too skewed, the values of the mode,
median, and mean are similar, and any of them can be used to describe
the central tendency of the distribution.

 When extreme scores cause a distribution to be skewed, the values of the three averages
can differ appreciably.

 Unlike the mode and median, the mean is very sensitive to extreme scores, or
outliers.

 Ideally, when a distribution is skewed, report both the mean and the median.
Appreciable differences between the values of the mean and median signal the
presence of a skewed distribution.

 If the mean exceeds the median, the underlying distribution is positively skewed
because of one or more scores with relatively large values.

 On the other hand, if the median exceeds the mean, the underlying distribution is
negatively skewed because of one or more scores with relatively small values.

 In the long run, however, the mean is the single most preferred average for quantitative
data.

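 In summary, the relationship between the averages and the two types of skewed
distributions is as follows: in a positively skewed distribution the mean exceeds the
median, and in a negatively skewed distribution the median exceeds the mean.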
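The comparison of mean and median described above can be illustrated with a small Python sketch. The function name `skew_direction` and the income figures are hypothetical, introduced only for this example. The comparison is a rough signal of skew, not a formal test.

```python
from statistics import mean, median

def skew_direction(scores):
    """Rough signal of skew from comparing the mean with the median."""
    m, md = mean(scores), median(scores)
    if m > md:
        return "positive"
    if m < md:
        return "negative"
    return "none indicated"

retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
# Hypothetical family incomes in thousands of dollars, with one extreme score.
incomes = [30, 35, 40, 45, 50, 400]

print(skew_direction(retirement_ages))  # negative (mean 61.09 < median 63)
print(skew_direction(incomes))          # positive (mean 100 > median 42.5)

# Sensitivity to outliers: dropping the extreme income moves the mean far more
# than it moves the median.
trimmed = incomes[:-1]
print(f"{mean(incomes):.2f} -> {mean(trimmed):.2f}")      # 100.00 -> 40.00
print(f"{median(incomes):.2f} -> {median(trimmed):.2f}")  # 42.50 -> 40.00
```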
Averages for Qualitative and Ranked Data
Mode Always Appropriate for Qualitative Data
 For quantitative data, in principle, all three averages can be used.
 The mode always can be used with qualitative data.
Median Sometimes Appropriate for Qualitative Data
 The median can be used whenever it is possible to order qualitative data from
least to most because the level of measurement is ordinal.
 It’s easiest to determine the median class for ordered qualitative data by using
relative frequencies.
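One way to locate the median class, assumed here for illustration, is to cumulate the relative frequencies from the lowest class upward until the running total first reaches 50 percent. The survey categories and counts below are hypothetical, as are the helper names `median_class` and `modal_class`.

```python
# Hypothetical ordinal data: (class, frequency), listed from least to most.
responses = [
    ("strongly disagree", 18),
    ("disagree", 6),
    ("neutral", 10),
    ("agree", 9),
    ("strongly agree", 7),
]

def median_class(ordered_freqs):
    """First class whose cumulative relative frequency reaches 0.50."""
    total = sum(count for _, count in ordered_freqs)
    cumulative = 0.0
    for label, count in ordered_freqs:
        cumulative += count / total   # add this class's relative frequency
        if cumulative >= 0.5:
            return label

def modal_class(freqs):
    """Class with the largest frequency (usable with any qualitative data)."""
    return max(freqs, key=lambda pair: pair[1])[0]

print(median_class(responses))  # neutral
print(modal_class(responses))   # strongly disagree
```

The mode needs only the counts, whereas the median depends on the classes being listed in order.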
Mean cannot be used with qualitative data

Averages for Ranked Data.


 When the data consist of a series of ranks, with its ordinal level of
measurement, the median rank always can be obtained. It is simply the
middlemost rank or the average of the two middlemost ranks.
 The mean and modal ranks tend not to be very informative and will not be
discussed.
