Fundamentals of Data Science unit 2

Unit 2 covers various aspects of describing data, focusing on frequency distributions, including types such as ungrouped, grouped, relative, and cumulative frequency distributions. It emphasizes the importance of organizing raw data to identify patterns, outliers, and variability through measures like range, variance, and standard deviation. The document also outlines the process for constructing frequency distributions, highlighting essential rules and steps for accurate representation of data.

Uploaded by

kaleeswaranmmcas

UNIT-2

DESCRIBING DATA
Syllabus: UNIT II
Frequency distributions–Outliers–relative frequency distributions–
cumulative frequency distributions–frequency distributions for nominal
data–interpreting distributions–graphs–averages–mode–median–mean–
averages for qualitative and ranked data – describing variability–range–
variance–standard deviation–degrees of freedom–interquartile range–
variability for qualitative and ranked data.

Frequency Distributions
 A frequency distribution is a collection of observations produced by
sorting observations into classes and showing their frequency (f) of
occurrence in each class.
 A frequency distribution helps us to detect any pattern in the data
(assuming a pattern exists) by superimposing some order on the
inevitable variability among observations.
 The advantage of using frequency distributions is that they present raw
data in an organized, easy-to-read format. The most frequently occurring
scores are easily identified, as are score ranges, lower and upper limits,
cases that are not common, outliers, and total number of observations
between any given scores.
 A frequency distribution shows whether the observations are high or low
and also whether they are concentrated in one area or spread out across
the entire scale.
 Different types of frequency distributions:
 Ungrouped frequency distribution.
 Grouped frequency distribution.
 Relative frequency distribution.
 Cumulative frequency distribution.
Frequency Distribution for Ungrouped Data
 A frequency distribution produced whenever observations are sorted into
classes of single values is referred to as a frequency distribution for
ungrouped data.
 Frequency distributions for ungrouped data are much more informative
when the number of possible values is less than about 20.
 Example:
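A minimal Python sketch of an ungrouped frequency distribution, assuming whole-number scores. The `scores` list is hypothetical illustration data (it is not taken from these notes), and `ungrouped_frequency` is a helper name introduced here only for the example.

```python
from collections import Counter

# Hypothetical observations, for illustration only.
scores = [3, 5, 4, 5, 2, 5, 4, 3, 5, 4]

def ungrouped_frequency(observations):
    """Sort whole-number observations into classes of single values.

    Returns (value, frequency) pairs from the highest value to the lowest.
    Every value between the maximum and minimum is listed, even if f = 0.
    """
    counts = Counter(observations)
    highest, lowest = max(observations), min(observations)
    return [(value, counts.get(value, 0))
            for value in range(highest, lowest - 1, -1)]

table = ungrouped_frequency(scores)
for value, f in table:
    print(f"{value:>5} {f:>3}")
print("Total", sum(f for _, f in table))
```

Each row is one class (a single value) with its frequency f, and the total equals the number of observations.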

Frequency Distribution for Grouped Data

 A frequency distribution produced whenever observations are sorted into
classes of more than one value is referred to as a frequency distribution
for grouped data.
 Example:
Relative Frequency Distributions:
 Relative frequency distributions show the frequency of each class as a part or
fraction of the total frequency for the entire distribution.
 This type of distribution allows us to focus on the relative concentration of observations
among different classes within the same distribution.
 This type of distribution is especially helpful when we must compare two or more
distributions based on different total numbers of observations.
 The conversion to relative frequencies allows a direct comparison of the shapes of these
two distributions without having to adjust for the radically different total numbers of
observations.
 To convert a frequency distribution into a relative frequency distribution, divide the
frequency for each class by the total frequency for the entire distribution.
 Example:
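The conversion rule above can be sketched in Python as follows. The two frequency distributions are hypothetical and were chosen only to show that distributions with very different totals can be compared directly once they are expressed as proportions; `relative_frequencies` is an illustrative helper name.

```python
def relative_frequencies(freqs):
    """Divide each class frequency by the total frequency.

    freqs maps a class label to its frequency; the result maps the same
    labels to proportions that sum to 1.
    """
    total = sum(freqs.values())
    return {label: f / total for label, f in freqs.items()}

# Hypothetical distributions based on very different numbers of observations.
small_sample = {"A": 2, "B": 6, "C": 2}        # total = 10
large_sample = {"A": 200, "B": 600, "C": 200}  # total = 1000

rel_small = relative_frequencies(small_sample)
rel_large = relative_frequencies(large_sample)

for label in small_sample:
    print(label, rel_small[label], rel_large[label])
```

Multiplying each proportion by 100 gives the same information as percentages.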

Cumulative Frequency Distributions:
 Cumulative frequency distributions show the total number of observations in each
class and in all lower-ranked classes.
 This type of distribution can be used effectively with sets of scores, such as test scores for
intellectual or academic aptitude, when relative standing within the distribution assumes
primary importance. Under these circumstances, cumulative frequencies are usually
converted, in turn, to cumulative percentages.
 Cumulative percentages are often referred to as percentile ranks.
 To convert a frequency distribution into a cumulative frequency distribution, add to
the frequency of each class the sum of the frequencies of all classes ranked below it.
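A short Python sketch of the conversion described above, assuming the classes are listed from the lowest-ranked to the highest-ranked. The class labels and frequencies are hypothetical illustration values.

```python
from itertools import accumulate

# Hypothetical grouped distribution, listed from the lowest class upward.
classes = ["130-139", "140-149", "150-159", "160-169"]
freqs = [3, 7, 8, 2]

# Cumulative frequency: the class frequency plus all lower-ranked frequencies.
cum_freqs = list(accumulate(freqs))

# Cumulative percentage: cumulative frequency as a percent of the total.
total = cum_freqs[-1]
cum_pcts = [100 * c / total for c in cum_freqs]

for label, f, cf, cp in zip(classes, freqs, cum_freqs, cum_pcts):
    print(f"{label:>8} {f:>3} {cf:>4} {cp:>6.1f}")
```

The cumulative frequency of the highest class always equals the total number of observations, so its cumulative percentage is 100.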
Constructing Frequency Distributions
 For producing a well-constructed frequency distribution, three rules are
essential and should not be violated.
1. Each observation should be included in one, and only one, class.
2. List all classes, even those with zero frequencies.
3. All classes should have equal intervals.
 Step-by-step procedure for constructing Frequency Distributions:
1. Find the range, that is, the difference between the largest and smallest
observations.
2. Find the class interval required to span the range by dividing the range
by the desired number of classes (ordinarily 10).
3. Round off to the nearest convenient value.
4. Determine where the lowest class should begin. (Ordinarily, this
number should be a multiple of the class interval.)
5. Determine where the lowest class should end by adding the class
interval to the lower boundary and then subtracting one unit of
measurement.
6. Working upward, list as many equivalent classes as are required to
include the largest observation.
7. Indicate with a tally the class in which each observation falls.
8. Replace the tally count for each class with a number, the frequency (f),
and show the total of all frequencies.
9. Supply headings for both columns and a title for the table.
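The procedure above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the observations are whole numbers, the `weights` list is hypothetical, and because rounding to a "convenient" value is a judgment call, the caller may pass `width` directly. The function and parameter names are introduced here for the example only.

```python
def grouped_frequency(observations, desired_classes=10, width=None, unit=1):
    """Build a grouped frequency distribution as (lower, upper, f) rows."""
    lowest, highest = min(observations), max(observations)
    data_range = highest - lowest                      # step 1: range
    if width is None:
        # Steps 2-3: divide the range by the desired number of classes and
        # round; a human would usually pick a more convenient value.
        width = max(unit, round(data_range / desired_classes))
    lower = (lowest // width) * width                  # step 4: multiple of width
    table = []
    while lower <= highest:                            # step 6: work upward
        upper = lower + width - unit                   # step 5: class end
        # Steps 7-8: count the observations falling in this class.
        f = sum(lower <= x <= upper for x in observations)
        table.append((lower, upper, f))                # zero-frequency classes stay
        lower += width
    return table

# Hypothetical weights (in pounds), for illustration only.
weights = [133, 145, 152, 158, 160, 165, 172, 172, 180, 205, 245]
table = grouped_frequency(weights, width=10)

print("WEIGHT      f")                                 # step 9: headings
for lower, upper, f in table:
    print(f"{lower}-{upper} {f:>5}")
print("Total", sum(f for _, _, f in table))
```

Because each class ends one unit of measurement below the start of the next, every observation falls into exactly one class, and classes with no observations are still listed with f = 0.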
Frequency Distributions for Nominal Data
 When, among a set of observations, any single observation is a word, letter, or
numerical code, the data are nominal.
 Frequency distributions for qualitative data are easy to construct. Simply determine the
frequency with which observations occupy each class, and report these frequencies.
 Example:
The frequency distribution below reveals that Yes replies are
approximately twice as prevalent as No replies.
 Frequency distributions for qualitative data can also be converted into relative
frequency distributions and, if the data can be ordered because of ordinal
measurement, into percentile ranks.

Interpreting Distributions
 When inspecting a distribution for the first time, train yourself to look at the entire table,
not just the distribution. Read the title, column headings, and any footnotes.
 After these preliminaries, inspect the content of the frequency distribution.
 When interpreting distributions, including distributions constructed by someone else,
keep an open mind.
Outliers
 A very extreme score that requires special attention because of its potential impact
on a summary of the data is called an outlier.
 Example: A GPA of 0.06, an IQ of 170, summer wages of $62,000
Dealing with Outliers
Check for Accuracy:
 Whenever you encounter an outlier, attempt to verify its accuracy.
 Example: Was a GPA of 3.06 recorded erroneously as 0.06?
 If the outlier survives an accuracy check, it should be treated as a legitimate score.
Might Exclude from Summaries:
 We might choose to segregate an outlier from any summary of the data.
 For example, we might relegate it to a footnote instead of using excessively wide
class intervals in order to include it in a frequency distribution. Or we might use
numerical summaries, such as the median and interquartile range, that are resistant to extreme scores.
Might Enhance Understanding:
 A valid outlier can be viewed as the product of special circumstances; it can help to
understand the data.
 For example, we might understand better why crime rates differ among communities
by studying the special circumstances that produce a community with an extremely
low (or high) crime rate, or why learning rates differ among third graders by
studying a third grader who learns very rapidly (or very slowly).

Graphs
(Describing Data using Graphs)
 Data can be described clearly and concisely with the aid of a well constructed frequency
distribution.
 Data can often be described even more vividly, by converting frequency distributions
into graphs.
 Most common types of graphs:
 Graphs for quantitative data: histograms, frequency polygons, and stem and leaf displays.
 Graphs for qualitative data: bar graphs.
Histogram
 A histogram is a bar-type graph for quantitative data. The common boundaries between adjacent bars
emphasize the continuity of the data, as with continuous variables.
 Important features of histograms:
 Equal units along the horizontal axis (the X axis, or abscissa) reflect the
various class intervals of the frequency distribution.
 Equal units along the vertical axis (the Y axis, or ordinate) reflect increases in
frequency.
 The intersection of the two axes defines the origin at which both numerical
scales equal 0.
 Numerical scales always increase from left to right along the horizontal axis
and from bottom to top along the vertical axis.
 The body of the histogram consists of a series of bars whose heights reflect
the frequencies for the various classes.
 Example:

Frequency Polygon
 An important variation on a histogram is the frequency polygon, or line
graph.
 Frequency polygons are particularly useful when two or more frequency
distributions or relative frequency distributions are to be included in the
same graph.
 Frequency polygons can be constructed directly from frequency distributions.
They can also be constructed from a histogram.
 The step-by-step transformation of a histogram into a frequency polygon:
 A: This panel shows the histogram for the weight distribution.
 B: Place dots at the midpoints of each bar top or, in the absence of
bar tops, at midpoints for classes on the horizontal axis, and
connect them with straight lines.
 C: Anchor the frequency polygon to the horizontal axis. First,
extend the upper tail to the midpoint of the first unoccupied class
on the upper flank of the histogram. Then extend the lower tail to
the midpoint of the first unoccupied class on the lower flank of the
histogram. Now all of the area under the frequency polygon is
enclosed completely.
 D: Finally, erase all of the histogram bars, leaving only the
frequency polygon.
 Example:
Stem and Leaf Displays
 Stem and leaf displays are ideal for summarizing distributions, such as that for
weight data, without destroying the identities of individual observations.
 A stem and leaf display is a device for sorting quantitative data on the basis of leading and
trailing digits.
 Stem and leaf displays represent statistical bargains. Just a few minutes of work produces
a description of data that is both clear and complete.
 Even though rarely appearing in published reports, stem and leaf displays often serve as
the first step toward organizing data.
 A good stem and leaf display
 shows the first digits of the number (thousands, hundreds or tens) as the stem and
shows the last digit (ones) as the leaf.
 usually uses whole numbers. Anything that has a decimal point is rounded to the
nearest whole number. For example, test results, speeds, heights, weights, etc.
 looks like a bar graph when it is turned on its side.
 shows how the data are spread—that is, highest number, lowest number,
most common number and outliers
 To construct the stem and leaf display
 On the left hand side of the page, write down the thousands, hundreds or
tens (all digits but the last one). These will be your stems.
 Draw a line to the right of these stems.
 On the other side of the line, write down the ones (the last digit of a number).
These will be your leaves.
 Example 1: A teacher asked 10 of her students how many books they had read in the last
12 months. Their answers were as follows: 12, 23, 19, 6, 10, 7, 15, 25, 21, 12. Prepare a
stem and leaf display for these data.
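One way to work Example 1 is the Python sketch below. It assumes non-negative whole numbers, takes the tens digit (all digits but the last) as the stem and the ones digit as the leaf, and uses the book counts given in the example; `stem_and_leaf` is a helper name introduced for illustration.

```python
from collections import defaultdict

# Books read by the 10 students in Example 1.
books = [12, 23, 19, 6, 10, 7, 15, 25, 21, 12]

def stem_and_leaf(values):
    """Return {stem: [leaves]} with stems and leaves in ascending order."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)   # all digits but the last | last digit
    # List every stem between the smallest and largest, even if it has no leaves.
    return {stem: stems.get(stem, [])
            for stem in range(min(stems), max(stems) + 1)}

display = stem_and_leaf(books)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

For these data the display has three stems: stem 0 with leaves 6 and 7; stem 1 with leaves 0, 2, 2, 5, and 9; and stem 2 with leaves 1, 3, and 5. Every original observation can still be read back from the display.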
Bimodal
 It reflects the coexistence of two different types of observations in the same
distribution.
 For instance, the distribution of the ages of residents in a neighborhood consisting
largely of either new parents or their infants has a bimodal shape.
Positively Skewed
 A lopsided distribution caused by a few extreme observations in the positive
direction (to the right of the majority of observations) is a positively skewed
distribution.
 The distribution of incomes among U.S. families has a pronounced positive skew,
with most family incomes under $200,000 and relatively few family incomes
spanning a wide range of values above $200,000.
Negatively Skewed
 A lopsided distribution caused by a few extreme observations in the negative
direction (to the left of the majority of observations) is a negatively skewed
distribution.
 The distribution of ages at retirement among U.S. job holders has a pronounced
negative skew, with most retirement ages at 60 years or older and relatively few
retirement ages spanning the wide range of ages younger than 60.
Bar graphs: A Graph for Qualitative (Nominal) Data
 Bar graphs are often used with qualitative data and sometimes with discrete
quantitative data.
 They resemble histograms except that gaps separate adjacent bars in bar graphs.
Example 1:
Interpreting graphs
 When interpreting graphs, beware of various unscrupulous techniques, such as
using bizarre combinations of axes to either exaggerate or suppress a particular data
pattern.
Describing Data with Averages
 Averages consist of numbers (or words) about which the data are, in some sense,
centered. They are often referred to as measures of central tendency.
 A measure of center is a single number used to describe a set of numeric data. It
describes a typical value from the data set.
 Several types of average yield numbers or words that attempt to describe, most
generally, the middle or typical value for a distribution.
 Three different measures of central tendency are:
 Mode
 Median
 Mean.
 Each of these has its special uses, but the mean is the most important average in both
descriptive and inferential statistics.
Mode
 The mode equals the value of the most frequently occurring or typical score.
 It is easy to assign a value to the mode if the data are organized. However, if
the data are not organized, some counting may be required.
 The mode is readily understood as the most prevalent or typical value.
 Distributions can have more than one mode (or no mode at all).
 Distributions with two obvious peaks, even though they are not exactly the same
height, are referred to as bimodal.
 Distributions with more than two peaks are referred to as multimodal.
 The presence of more than one mode might reflect important differences among subsets
of data. For instance, the distribution of weights for both male and female statistics
students would most likely be bimodal, reflecting the combination of two separate
weight distributions—a heavier one for males and a lighter one for females.
 Example 1: Determine the mode for the following retirement ages: 60, 63, 45, 63, 65, 70,
55, 63, 60, 65, 63.
Answer: mode = 63
 Example 2: The owner of a new car conducts six gas mileage tests and obtains
the following results, expressed in miles per gallon: 26.3, 28.7, 27.4, 26.6, 27.4,
26.9. Find the mode for these data.
Answer: mode = 27.4
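The two mode examples can be checked with Python's standard `statistics` module (Python 3.8 or later). `multimode` returns a list, so it also handles distributions with more than one mode; the `two_peaks` data are hypothetical and added only to illustrate the bimodal case.

```python
from statistics import multimode

ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]   # Example 1
mpg = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]            # Example 2

age_modes = multimode(ages)   # most frequently occurring value(s)
mpg_modes = multimode(mpg)

# Hypothetical data with two equally frequent values (bimodal).
two_peaks = multimode([1, 1, 2, 3, 3])

print(age_modes, mpg_modes, two_peaks)
```

When every value occurs exactly once, `multimode` returns all of the values, which corresponds to the "no mode at all" case mentioned above.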
Median
 The median reflects the middle value when observations are ordered from least to most.
 The median splits a set of ordered observations into two equal parts, the upper and lower
halves.
 In other words, the median has a percentile rank of 50, since observations with equal or
smaller values constitute 50 percent of the entire distribution.
 To find the median, scores always must be ordered from least to most (or vice versa).
This task is straightforward with small sets of data but becomes increasingly
cumbersome with larger sets of data that must be ordered manually.
 When the total number of scores is odd, there is a single middle-ranked score,
and the value of the median equals the value of this score. When the total number of
scores is even, the value of the median equals a value midway between the values
of the two middlemost scores.
 In either case, the value of the median always reflects the value of middle-ranked
scores, not the position of these scores among the set of ordered scores.
 Example 1: Find the median for the following retirement ages: 60, 63, 45, 63, 65, 70, 55,
63, 60, 65, 63.
Solution: median = 63
 Example 2: Find the median for the following gas mileage tests: 26.3, 28.7, 27.4,
26.6, 27.4, 26.9.
Solution: median = 27.15 (halfway between 26.9 and 27.4)
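The odd and even cases described above can be sketched in Python as follows. The `median` function is written out by hand to mirror the rule in the notes; the standard library's `statistics.median` follows the same rule.

```python
def median(scores):
    """Return the middle value of the scores after ordering them."""
    ordered = sorted(scores)            # scores must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]             # odd n: the single middle-ranked score
    # even n: midway between the two middlemost scores
    return (ordered[mid - 1] + ordered[mid]) / 2

ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]   # 11 scores (odd)
mpg = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]            # 6 scores (even)

median_age = median(ages)
median_mpg = median(mpg)
print(median_age, round(median_mpg, 2))
```

For the retirement ages the sixth of the eleven ordered scores is the median; for the gas mileage tests the median lies midway between the third and fourth ordered scores.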
Mean
 The mean is the most common average.
 The mean is found by adding all scores and then dividing by the number of scores.
 That is,
$$\bar{X} = \frac{\sum X}{n}$$
where $\sum X$ is the sum of all scores and $n$ is the number of scores.
 There is no requirement that scores be ranked before calculating the mean.
 Even when large sets of unorganized data are involved, the calculation of the mean is
usually straightforward, particularly with the aid of a calculator or computer.
 The mean serves as the balance point for its frequency distribution.
 Mean cannot be used with qualitative data.
 Example 1: Find the mean for the following retirement ages: 60, 63, 45, 63, 65, 70, 55,
63, 60, 65, 63.
 Solution:
 Example 2: Find the mean for the following gas mileage tests: 26.3, 28.7, 27.4, 26.6,
27.4, 26, 9.
 Solution:

Which average?

 When a distribution of scores is not too skewed, the values of the mode, median, and
mean are similar, and any of them can be used to describe the central tendency of the
distribution.

 When extreme scores cause a distribution to be skewed, the values of the three averages
can differ appreciably.

 Unlike the mode and median, the mean is very sensitive to extreme scores, or
outliers.

 Ideally, when a distribution is skewed, report both the mean and the median.
Appreciable differences between the values of the mean and median signal the
presence of a skewed distribution.

 If the mean exceeds the median, the underlying distribution is positively skewed
because of one or more scores with relatively large values.

 On the other hand, if the median exceeds the mean, the underlying distribution is
negatively skewed because of one or more scores with relatively small values.

 In the long run, however, the mean is the single most preferred average for quantitative
data.

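 The relationship between the various averages and the two types of skewed
distributions (shown as smoothed curves in the original figure) can be summarized as
follows: in a positively skewed distribution the mean typically exceeds the median, and
in a negatively skewed distribution the median typically exceeds the mean.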
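
The sensitivity of the mean to extreme scores can be illustrated with a small Python sketch. The helper `skew_direction` and the `incomes` list are hypothetical, introduced only for illustration; comparing the mean with the median is a rough signal of skew, not a formal skewness statistic.

```python
from statistics import mean, median


def skew_direction(scores):
    """Rough skew signal from comparing the mean with the median."""
    m, md = mean(scores), median(scores)
    if m > md:
        return "positive"   # pulled up by relatively large scores
    if m < md:
        return "negative"   # pulled down by relatively small scores
    return "none indicated"


# Retirement ages from the examples above: the low score of 45 pulls the mean down.
retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
print(skew_direction(retirement_ages))

# Hypothetical incomes (in thousands), before and after adding one extreme score.
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [400]
print(mean(incomes), median(incomes))
print(round(mean(with_outlier), 2), median(with_outlier))
print(skew_direction(with_outlier))
```

Adding the single extreme score moves the mean a long way while the median barely changes, which is why reporting both is useful for skewed data.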
Averages for Qualitative and Ranked Data
Mode Always Appropriate for Qualitative Data
 For quantitative data, in principle, all three averages can be used.
 The mode always can be used with qualitative data.
Median Sometimes Appropriate for Qualitative Data
 The median can be used whenever it is possible to order qualitative data from
least to most because the level of measurement is ordinal.
 It is easiest to determine the median class for ordered qualitative data by
accumulating relative frequencies until they reach 50 percent.
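
A minimal sketch of locating the median class for ordered qualitative data is shown below. The rating categories and counts are hypothetical and not taken from the notes; `median_class` is an illustrative helper that accumulates relative frequencies from the lowest class upward and returns the first class at which the cumulative proportion reaches 0.50.

```python
def median_class(ordered_classes, frequencies):
    """Return the first class whose cumulative relative frequency reaches 0.50.

    ordered_classes must be listed from least to most (ordinal data).
    If the cumulative proportion lands exactly on 0.50, the median falls
    between two classes; this sketch simply returns the lower one.
    """
    total = sum(frequencies)
    cumulative = 0
    for label, count in zip(ordered_classes, frequencies):
        cumulative += count
        if cumulative / total >= 0.5:
            return label
    raise ValueError("frequencies must contain at least one observation")


# Hypothetical ordered ratings with their observed frequencies.
ratings = ["poor", "fair", "good", "excellent"]
counts = [5, 15, 20, 10]

# Cumulative proportions are 0.10, 0.40, 0.80, 1.00, so 0.50 is reached in "good".
print(median_class(ratings, counts))
```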
Mean cannot be used with qualitative data

Averages for Ranked Data

 When the data consist of a series of ranks, with its ordinal level of
measurement, the median rank can always be obtained. It is simply the
middlemost rank or the average of the two middlemost ranks.
 The mean and modal ranks tend not to be very informative and will not be
discussed.
